{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T15:39:58Z","timestamp":1765294798946,"version":"3.37.3"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2016,12,6]],"date-time":"2016-12-06T00:00:00Z","timestamp":1480982400000},"content-version":"vor","delay-in-days":15,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Calcul Quebec and Compute Canada"},{"name":"Natural Sciences and Engineering Council","award":["RGPGR 448167-2013"],"award-info":[{"award-number":["RGPGR 448167-2013"]}]},{"DOI":"10.13039\/501100000024","name":"Canadian Institutes of Health Research","doi-asserted-by":"publisher","award":["EP1-120608","EP1-120609"],"award-info":[{"award-number":["EP1-120608","EP1-120609"]}],"id":[{"id":"10.13039\/501100000024","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis, using labels that encode qualitative judgments about which genomic regions contain or do not contain peaks. The main idea is to manually label a small subset of the genome, and then learn a model that makes consistent peak predictions on the rest of the genome.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We created 7 new histone mark datasets with 12 826 visually determined labels, and analyzed 3 existing transcription factor datasets. We observed that default peak detection parameters yield high false positive rates, which can be reduced by learning parameters using a relatively small training set of labeled data from the same experiment type. We also observed that labels from different people are highly consistent. 
Overall, these data indicate that our supervised labeling method is useful for quantitatively training and testing peak detection algorithms.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>Labeled histone mark data http:\/\/cbio.ensmp.fr\/~thocking\/chip-seq-chunk-db\/, R package to compute the label error of predicted peaks https:\/\/github.com\/tdhock\/PeakError<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btw672","type":"journal-article","created":{"date-parts":[[2016,10,21]],"date-time":"2016-10-21T19:05:23Z","timestamp":1477076723000},"page":"491-499","source":"Crossref","is-referenced-by-count":27,"title":["Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning"],"prefix":"10.1093","volume":"33","author":[{"given":"Toby Dylan","family":"Hocking","sequence":"first","affiliation":[{"name":"Department of Human Genetics, McGill University, Montr\u00e9al, Canada"}]},{"given":"Patricia","family":"Goerner-Potvin","sequence":"additional","affiliation":[{"name":"Department of Human Genetics, McGill University, Montr\u00e9al, Canada"}]},{"given":"Andreanne","family":"Morin","sequence":"additional","affiliation":[{"name":"Department of Human Genetics, McGill University, Montr\u00e9al, Canada"}]},{"given":"Xiaojian","family":"Shao","sequence":"additional","affiliation":[{"name":"Department of Human Genetics, McGill University, Montr\u00e9al, Canada"}]},{"given":"Tomi","family":"Pastinen","sequence":"additional","affiliation":[{"name":"Department of Human Genetics, McGill University, Montr\u00e9al, Canada"}]},{"given":"Guillaume","family":"Bourque","sequence":"additional","affiliation":[{"name":"Department of Human Genetics, McGill University, Montr\u00e9al, 
Canada"}]}],"member":"286","published-online":{"date-parts":[[2016,11,21]]},"reference":[{"key":"2023020204414230000_btw672-B1","doi-asserted-by":"crossref","first-page":"2979","DOI":"10.1093\/bioinformatics\/btt524","article-title":"HMCan: a method for detecting chromatin modifications in cancer samples using ChIP-seq data","volume":"29","author":"Ashoor","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020204414230000_btw672-B2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pcbi.1003326","article-title":"Practical guidelines for the comprehensive analysis of ChIP-seq data","volume":"9","author":"Bailey","year":"2013","journal-title":"PLoS Comput. Biol"},{"key":"2023020204414230000_btw672-B3","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1016\/j.cell.2007.05.009","article-title":"High-resolution profiling of histone methylations in the human genome","volume":"129","author":"Barski","year":"2007","journal-title":"Cell"},{"key":"2023020204414230000_btw672-B4","doi-asserted-by":"crossref","first-page":"938","DOI":"10.1038\/nmeth.3038","article-title":"Epiviz: interactive visual analytics for functional genomics data","volume":"11","author":"Chelaru","year":"2014","journal-title":"Nat. Methods"},{"key":"2023020204414230000_btw672-B5","doi-asserted-by":"crossref","first-page":"i504","DOI":"10.1093\/bioinformatics\/btq379","article-title":"A varying threshold method for chip peak-calling using multiple sources of information","volume":"26","author":"Chen","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020204414230000_btw672-B6","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"ENCODE Project","year":"2012","journal-title":"Nature"},{"key":"2023020204414230000_btw672-B7","doi-asserted-by":"crossref","first-page":"2290","DOI":"10.1101\/gr.139360.112","article-title":"Integration of chip-seq and machine learning reveals enhancers and a predictive regulatory sequence vocabulary in melanocytes","volume":"22","author":"Gorkin","year":"2012","journal-title":"Genome 
Res"},{"key":"2023020204414230000_btw672-B8","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1016\/j.molcel.2010.05.004","article-title":"Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities","volume":"38","author":"Heinz","year":"2010","journal-title":"Mol. Cell"},{"key":"2023020204414230000_btw672-B9","doi-asserted-by":"crossref","first-page":"1539","DOI":"10.1093\/bioinformatics\/btu072","article-title":"SegAnnDB: interactive web-based genomic segmentation","volume":"30","author":"Hocking","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020204414230000_btw672-B10","first-page":"324","article-title":"PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data","author":"Hocking","year":"2015","journal-title":"Proc. 32nd ICML"},{"key":"2023020204414230000_btw672-B11","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1186\/1471-2105-14-164","article-title":"Learning smoothing models of copy number profiles using breakpoint annotations","volume":"14","author":"Hocking","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020204414230000_btw672-B12","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1093\/bioinformatics\/btu568","article-title":"JAMM: a peak finder for joint analysis of NGS replicates","volume":"31","author":"Ibrahim","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020204414230000_btw672-B13","doi-asserted-by":"crossref","first-page":"1826","DOI":"10.1073\/pnas.0808843106","article-title":"Scoring diverse cellular morphologies in image-based screens with iterative feedback and machine learning","volume":"106","author":"Jones","year":"2009","journal-title":"Proc. Natl. Acad. Sci"},{"key":"2023020204414230000_btw672-B14","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1101\/gr.229102","article-title":"The human genome browser at UCSC","volume":"12","author":"Kent","year":"2002","journal-title":"Genome 
Res"},{"key":"2023020204414230000_btw672-B15","doi-asserted-by":"crossref","first-page":"2204","DOI":"10.1093\/bioinformatics\/btq351","article-title":"BigWig and BigBed: enabling browsing of large distributed datasets","volume":"26","author":"Kent","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020204414230000_btw672-B16","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1186\/1471-2105-13-176","article-title":"The Triform algorithm: improved sensitivity and specificity in ChIP-Seq peak finding","volume":"13","author":"Kornacker","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020204414230000_btw672-B17","doi-asserted-by":"crossref","first-page":"R93","DOI":"10.1186\/gb-2013-14-8-r93","article-title":"Web Apollo: a web-based genomic annotation editing platform","volume":"14","author":"Lee","year":"2013","journal-title":"Genome Biol"},{"key":"2023020204414230000_btw672-B18","doi-asserted-by":"crossref","first-page":"e70","DOI":"10.1093\/nar\/gks048","article-title":"Picking chip-seq peak detectors for analyzing chromatin modification experiments","volume":"40","author":"Micsinai","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020204414230000_btw672-B19","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1038\/nature06008","article-title":"Genome-wide maps of chromatin state in pluripotent and lineage-committed cells","volume":"448","author":"Mikkelsen","year":"2007","journal-title":"Nature"},{"key":"2023020204414230000_btw672-B20","doi-asserted-by":"crossref","first-page":"2262","DOI":"10.1101\/gr.140665.112","article-title":"Spark: a navigational paradigm for genomic data exploration","volume":"22","author":"Nielsen","year":"2012","journal-title":"Genome Res"},{"key":"2023020204414230000_btw672-B21","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2164-13-S1-S1","article-title":"Improving chip-seq peak-calling for functional co-regulator binding by integrating multiple sources of biological information","volume":"13","author":"Osmanbeyoglu","year":"2012","journal-title":"BMC 
Genomics"},{"key":"2023020204414230000_btw672-B22","doi-asserted-by":"crossref","first-page":"e25","DOI":"10.1093\/nar\/gkq1187","article-title":"A manually curated chip-seq benchmark demonstrates room for improvement in current peak-finder programs","volume":"39","author":"Rye","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020204414230000_btw672-B23","doi-asserted-by":"crossref","first-page":"870","DOI":"10.1093\/bioinformatics\/btr030","article-title":"Identifying dispersed epigenomic domains from chip-seq data","volume":"27","author":"Song","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020204414230000_btw672-B24","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1093\/bib\/bbq068","article-title":"Rapid innovation in ChIP-seq peak-calling algorithms is outdistancing benchmarking efforts","volume":"12","author":"Szalkowski","year":"2011","journal-title":"Brief. Bioinf"},{"key":"2023020204414230000_btw672-B25","doi-asserted-by":"crossref","first-page":"1199","DOI":"10.1093\/bioinformatics\/btq128","article-title":"A signal-noise model for significance analysis of ChIP-seq with negative control","volume":"26","author":"Xu","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020204414230000_btw672-B26","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1111\/cgf.12096","article-title":"An interactive analysis and exploration tool for epigenomic data","volume":"32","author":"Younesy","year":"2013","journal-title":"Comput. Graph. 
Forum"},{"key":"2023020204414230000_btw672-B27","doi-asserted-by":"crossref","first-page":"1952","DOI":"10.1093\/bioinformatics\/btp340","article-title":"A clustering approach for identification of enriched domains from histone modification ChIP-Seq data","volume":"25","author":"Zang","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020204414230000_btw672-B28","doi-asserted-by":"crossref","first-page":"2568","DOI":"10.1093\/bioinformatics\/btu372","article-title":"PePr: a peak-calling prioritization pipeline to identify consistent or differential peaks from replicated ChIP-Seq data","volume":"30","author":"Zhang","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020204414230000_btw672-B29","doi-asserted-by":"crossref","first-page":"R137","DOI":"10.1186\/gb-2008-9-9-r137","article-title":"Model-based analysis of ChIP-Seq (MACS)","volume":"9","author":"Zhang","year":"2008","journal-title":"Genome Biol"},{"key":"2023020204414230000_btw672-B30","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1038\/nature08934","article-title":"Genetic analysis of variation in transcription factor binding in 
yeast","volume":"464","author":"Zheng","year":"2010","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/4\/491\/49037984\/bioinformatics_33_4_491.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/4\/491\/49037984\/bioinformatics_33_4_491.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T04:46:13Z","timestamp":1675313173000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/4\/491\/2608653"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2016,11,21]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw672","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2017,2,15]]},"published":{"date-parts":[[2016,11,21]]}}}