{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,26]],"date-time":"2023-01-26T05:19:53Z","timestamp":1674710393304},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Many ChIP-Seq experiments are aimed at developing gold standards for determining the locations of various genomic features such as transcription start or transcription factor binding sites on the whole genome. Many such pioneering experiments lack rigorous testing methods and adequate \u2018gold standard\u2019 annotations to compare against as they themselves are the most reliable source of empirical data available. To overcome this problem, we propose a self-consistency test whereby a dataset is tested against itself. It relies on a supervised machine learning style protocol for in silico annotation of a genome and accuracy estimation to guarantee, at least, self-consistency.<\/jats:p>\n               <jats:p>Results: The main results use a novel performance metric (a calibrated precision) in order to assess and compare the robustness of the proposed supervised learning method across different test sets. As a proof of principle, we applied the whole protocol to two recent ChIP-Seq ENCODE datasets of STAT1 and Pol-II binding sites. STAT1 is benchmarked against in silico detection of binding sites using available position weight matrices. Pol-II, the main focus of this paper, is benchmarked against 17 algorithms for the closely related and well-studied problem of in silico transcription start site (TSS) prediction. 
Our results also demonstrate the feasibility of in silico genome annotation extension with encouraging results from a small portion of annotated genome to the remainder.<\/jats:p>\n               <jats:p>Availability: Available from http:\/\/www.genomics.csse.unimelb.edu.au\/gat.<\/jats:p>\n               <jats:p>Contact: \u00a0justin.bedo@nicta.com.au; adam.kowalczyk@nicta.com.au<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr263","type":"journal-article","created":{"date-parts":[[2011,5,10]],"date-time":"2011-05-10T05:17:58Z","timestamp":1305004678000},"page":"1610-1617","source":"Crossref","is-referenced-by-count":0,"title":["Genome annotation test with validation on transcription start site and ChIP-Seq for Pol-II binding data"],"prefix":"10.1093","volume":"27","author":[{"given":"Justin","family":"Bedo","sequence":"first","affiliation":[{"name":"1 National ICT Australia, Victoria Research Laboratories, The University of Melbourne, VIC 3010, Australia and 2Informatique Biologie Int\u00e9grative et Syst\u00e8mes Complexes, Tour Evry II, 523 Place des Terrasses de l'Agora, 91000 Evry, France"}]},{"given":"Adam","family":"Kowalczyk","sequence":"additional","affiliation":[{"name":"1 National ICT Australia, Victoria Research Laboratories, The University of Melbourne, VIC 3010, Australia and 2Informatique Biologie Int\u00e9grative et Syst\u00e8mes Complexes, Tour Evry II, 523 Place des Terrasses de l'Agora, 91000 Evry, 
France"}]}],"member":"286","published-online":{"date-parts":[[2011,5,9]]},"reference":[{"key":"2023012511151349600_B1","doi-asserted-by":"crossref","first-page":"i24","DOI":"10.1093\/bioinformatics\/btn172","article-title":"ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles","volume":"24","author":"Abeel","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012511151349600_B2","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1101\/gr.6991408","article-title":"Generic eukaryotic core promoter prediction using structural features of DNA","volume":"18","author":"Abeel","year":"2008","journal-title":"Genome Res."},{"key":"2023012511151349600_B3","doi-asserted-by":"crossref","first-page":"i313","DOI":"10.1093\/bioinformatics\/btp191","article-title":"Toward a gold standard for promoter prediction evaluation","volume":"25","author":"Abeel","year":"2009","journal-title":"Bioinformatics"},{"issue":"Suppl. 1","key":"2023012511151349600_B4","first-page":"S3.1","article-title":"Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment","volume":"7","author":"Bajic","year":"2006","journal-title":"Genome Biol."},{"key":"2023012511151349600_B5","doi-asserted-by":"crossref","DOI":"10.1038\/npre.2009.3811.1","article-title":"Simple SVM based whole-genome segmentation","volume-title":"Nat. Preced","author":"Bedo","year":"2009"},{"key":"2023012511151349600_B6","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1038\/nmeth.1223","article-title":"Stem cell transcriptome profiling via massive-scale mrna sequencing","volume":"5","author":"Cloonan","year":"2008","journal-title":"Nat. 
Methods"},{"key":"2023012511151349600_B7","doi-asserted-by":"crossref","DOI":"10.1145\/1143844.1143874","article-title":"The relationship between Precision-Recall and ROC curves","volume-title":"Proceedings of the 23rd International Conference on Machine Learning","author":"Davis","year":"2006"},{"key":"2023012511151349600_B8","doi-asserted-by":"crossref","first-page":"627","DOI":"10.2144\/000112802","article-title":"Deep cap analysis gene expression (CAGE): genome-wide identification of promoters, quantification of their expression, and network inference","volume":"44","author":"de Hoon","year":"2008","journal-title":"Biotechniques"},{"key":"2023012511151349600_B9","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1101\/gr.216102","article-title":"Computational detection and location of transcription start sites in mammalian genomic DNA","volume":"12","author":"Down","year":"2002","journal-title":"Genome Res."},{"key":"2023012511151349600_B10","doi-asserted-by":"crossref","first-page":"1921","DOI":"10.1101\/gad.1643208","article-title":"NELF-mediated stalling of Pol II can enhance gene expression by blocking promoter-proximal nucleosome assembly","volume":"22","author":"Gilchrist","year":"2008","journal-title":"Genes Dev."},{"key":"2023012511151349600_B11","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","article-title":"The meaning and use of the area under a receiver operating characteristic (ROC) curve","volume":"143","author":"Hanley","year":"1982","journal-title":"Radiology"},{"key":"2023012511151349600_B12","doi-asserted-by":"crossref","first-page":"2953","DOI":"10.1101\/gad.1371305","article-title":"Global changes in STAT target selection and transcription regulation upon interferon treatments","volume":"19","author":"Hartman","year":"2005","journal-title":"Genes Dev."},{"key":"2023012511151349600_B13","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1038\/nmeth0306-211","article-title":"CAGE: cap 
analysis of gene expression.","volume":"3","author":"Kodzius","year":"2006","journal-title":"Nat. Methods"},{"key":"2023012511151349600_B14","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1007\/978-3-642-12683-3_19","article-title":"The Poisson Margin Test for normalisation free significance analysis of NGS data","volume":"6044","author":"Kowalczyk","year":"2010","journal-title":"Lect. Notes Comput. Sci."},{"key":"2023012511151349600_B15","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1038\/nbt.1518","article-title":"PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls","volume":"27","author":"Rozowsky","year":"2009","journal-title":"Nat. Biotechnol."},{"key":"2023012511151349600_B16","doi-asserted-by":"crossref","first-page":"e423","DOI":"10.1093\/bioinformatics\/btl250","article-title":"Arts: accurate recognition of transcription starts in human","volume":"22","author":"Sonnenburg","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012511151349600_B17","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012511151349600_B18","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1101\/gr.081638.108","article-title":"High-resolution human core-promoter prediction with CoreBoost_HM","volume":"19","author":"Wang","year":"2008","journal-title":"Genome Res."},{"key":"2023012511151349600_B19","doi-asserted-by":"crossref","first-page":"R17","DOI":"10.1186\/gb-2007-8-2-r17","article-title":"Boosting with stumps for predicting transcription start sites","volume":"8","author":"Zhao","year":"2007","journal-title":"Genome Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/12\/1610\/48862821\/bioinformatics_27_12_1610.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/12\/1610\/48862821\/bioinformatics_27_12_1610.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T11:21:06Z","timestamp":1674645666000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/12\/1610\/258386"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,5,9]]},"references-count":19,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2011,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr263","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,6,15]]},"published":{"date-parts":[[2011,5,9]]}}}