{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T17:57:24Z","timestamp":1773856644664,"version":"3.50.1"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":3015,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet a common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work.<\/jats:p><jats:p>Results: Comparing the average structural profile based on base stacking energy of transcribed, promoter and intergenic sequences demonstrates that the core promoter has unique features that cannot be found in other sequences. We show that unsupervised clustering by using self-organizing maps can clearly distinguish between the structural profiles of promoter sequences and other genomic sequences. An implementation of this promoter prediction program, called ProSOM, is available and has been compared with the state-of-the-art. We propose an objective, accurate and biologically sound validation scheme for core promoter predictors. 
ProSOM performs at least as well as the software currently available, but our technique is more balanced in terms of the number of predicted sites and the number of false predictions, resulting in a better all-round performance. Additional tests on the ENCODE regions of the human genome show that 98% of all predictions made by ProSOM can be associated with transcriptionally active regions, which demonstrates the high precision.<\/jats:p><jats:p>Availability: Predictions for the human genome, the validation datasets and the program (ProSOM) are available upon request.<\/jats:p><jats:p>Contact: yves.vandepeer@psb.ugent.be<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn172","type":"journal-article","created":{"date-parts":[[2008,6,27]],"date-time":"2008-06-27T07:43:13Z","timestamp":1214552593000},"page":"i24-i31","source":"Crossref","is-referenced-by-count":69,"title":["ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles"],"prefix":"10.1093","volume":"24","author":[{"given":"Thomas","family":"Abeel","sequence":"first","affiliation":[{"name":"1 Department of Plant Systems Biology, VIB, 2Department of Molecular Genetics and 3Laboratoire Associ\u00e9 de l'INRA, Ghent University, Technologiepark 927, 9052 Gent, Belgium"},{"name":"1 Department of Plant Systems Biology, VIB, 2Department of Molecular Genetics and 3Laboratoire Associ\u00e9 de l'INRA, Ghent University, Technologiepark 927, 9052 Gent, Belgium"}]},{"given":"Yvan","family":"Saeys","sequence":"additional","affiliation":[{"name":"1 Department of Plant Systems Biology, VIB, 2Department of Molecular Genetics and 3Laboratoire Associ\u00e9 de l'INRA, Ghent University, Technologiepark 927, 9052 Gent, Belgium"},{"name":"1 Department of Plant Systems Biology, VIB, 2Department of Molecular Genetics and 3Laboratoire Associ\u00e9 de l'INRA, Ghent University, Technologiepark 927, 9052 Gent, 
Belgium"}]},{"given":"Pierre","family":"Rouz\u00e9","sequence":"additional","affiliation":[{"name":"1 Department of Plant Systems Biology, VIB, 2Department of Molecular Genetics and 3Laboratoire Associ\u00e9 de l'INRA, Ghent University, Technologiepark 927, 9052 Gent, Belgium"},{"name":"1 Department of Plant Systems Biology, VIB, 2Department of Molecular Genetics and 3Laboratoire Associ\u00e9 de l'INRA, Ghent University, Technologiepark 927, 9052 Gent, Belgium"}]},{"given":"Yves","family":"Van de Peer","sequence":"additional","affiliation":[{"name":"1 Department of Plant Systems Biology, VIB, 2Department of Molecular Genetics and 3Laboratoire Associ\u00e9 de l'INRA, Ghent University, Technologiepark 927, 9052 Gent, Belgium"},{"name":"1 Department of Plant Systems Biology, VIB, 2Department of Molecular Genetics and 3Laboratoire Associ\u00e9 de l'INRA, Ghent University, Technologiepark 927, 9052 Gent, Belgium"}]}],"member":"286","published-online":{"date-parts":[[2008,7,1]]},"reference":[{"key":"2023020210381875500_B1","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1101\/gr.6991408","article-title":"Generic eukaryotic core promoter prediction using structural features of DNA","volume":"18","author":"Abeel","year":"2008","journal-title":"Genome Res"},{"key":"2023020210381875500_B2","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1186\/1471-2164-5-34","article-title":"Comprehensive analysis of the base composition around the transcription start site in Metazoa","volume":"5","author":"Aerts","year":"2004","journal-title":"BMC Genomics"},{"key":"2023020210381875500_B3","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1016\/S0076-6879(03)70021-4","article-title":"Computational detection of vertebrate RNA polymerase II promoters","volume":"370","author":"Bajic","year":"2003","journal-title":"Methods 
Enzymol"},{"key":"2023020210381875500_B4","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1093\/bioinformatics\/18.1.198","article-title":"Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters","volume":"18","author":"Bajic","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020210381875500_B5","doi-asserted-by":"crossref","first-page":"1467","DOI":"10.1038\/nbt1032","article-title":"Promoter prediction analysis on the whole human genome","volume":"22","author":"Bajic","year":"2004","journal-title":"Nat. Biotechnol"},{"key":"2023020210381875500_B6","doi-asserted-by":"crossref","first-page":"S3.1","DOI":"10.1186\/gb-2006-7-s1-s3","article-title":"Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment","volume":"7","author":"Bajic","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210381875500_B7","first-page":"35","article-title":"Computational applications of DNA structural scales","volume":"6","author":"Baldi","year":"1998","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol"},{"key":"2023020210381875500_B8","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1038\/nrg2220","article-title":"Steady progress and recent breakthroughs in the accuracy of automated genome annotation","volume":"9","author":"Brent","year":"2008","journal-title":"Nat. Rev. Genet"},{"key":"2023020210381875500_B9","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1038\/ng1789","article-title":"Genome-wide analysis of mammalian promoter architecture and evolution","volume":"38","author":"Carninci","year":"2006","journal-title":"Nat. Genet"},{"key":"2023020210381875500_B10","doi-asserted-by":"crossref","first-page":"29","DOI":"10.54254\/2755-2721\/13\/20230705","article-title":"PromFD 1.0: a computer program that predicts eukaryotic pol II promoters using strings and IMD matrices","volume":"13","author":"Chen","year":"1997","journal-title":"Comput. Appl. 
Biosci"},{"key":"2023020210381875500_B11","doi-asserted-by":"crossref","first-page":"1584","DOI":"10.1093\/nar\/gkh335","article-title":"DNA dynamically directs its own transcription initiation","volume":"32","author":"Choi","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023020210381875500_B12","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1038\/ng780","article-title":"Computational identification of promoters and first exons in the human genome","volume":"29","author":"Davuluri","year":"2001","journal-title":"Nat. Genet"},{"key":"2023020210381875500_B13","doi-asserted-by":"crossref","first-page":"2418","DOI":"10.1101\/gad.342405","article-title":"A core promoter element downstream of the TATA box that is recognized by TFIIB","volume":"19","author":"Deng","year":"2005","journal-title":"Genes Dev"},{"key":"2023020210381875500_B14","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1101\/gr.216102","article-title":"Computational detection and location of transcription start sites in mammalian genomic DNA","volume":"12","author":"Down","year":"2002","journal-title":"Genome Res"},{"key":"2023020210381875500_B15","doi-asserted-by":"crossref","first-page":"1455","DOI":"10.1101\/gr.4140006","article-title":"Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques","volume":"16","author":"Elnitski","year":"2006","journal-title":"Genome Res"},{"key":"2023020210381875500_B16","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1101\/gr.7.9.861","article-title":"Eukaryotic promoter recognition","volume":"7","author":"Fickett","year":"1997","journal-title":"Genome Res"},{"key":"2023020210381875500_B17","doi-asserted-by":"crossref","first-page":"D707","DOI":"10.1093\/nar\/gkm988","article-title":"Ensembl 2008","volume":"36","author":"Flicek","year":"2008","journal-title":"Nucleic Acids 
Res"},{"key":"2023020210381875500_B18","doi-asserted-by":"crossref","first-page":"4255","DOI":"10.1093\/nar\/gki737","article-title":"Large-scale structural analysis of the core promoter in mammalian and plant genomes","volume":"33","author":"Florquin","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023020210381875500_B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1101\/gr.6831208","article-title":"A code for transcription initiation in mammalian genomes","volume":"18","author":"Frith","year":"2008","journal-title":"Genome Res"},{"key":"2023020210381875500_B20","doi-asserted-by":"crossref","first-page":"R263","DOI":"10.1186\/gb-2007-8-12-r263","article-title":"Determining promoter location based on DNA structure first-principles calculations","volume":"8","author":"Goni","year":"2007","journal-title":"Genome Biol"},{"key":"2023020210381875500_B21","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1089\/cmb.2006.13.379","article-title":"Using multiple alignments to improve gene prediction","volume":"13","author":"Gross","year":"2006","journal-title":"J. Comput. 
Biol"},{"key":"2023020210381875500_B22","doi-asserted-by":"crossref","first-page":"S2.1","DOI":"10.1186\/gb-2006-7-s1-s2","article-title":"EGASP: the human ENCODE Genome Annotation Assessment Project","volume":"7","author":"Guig\u00f3","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210381875500_B23","doi-asserted-by":"crossref","first-page":"3165","DOI":"10.1093\/nar\/gki627","article-title":"Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes","volume":"33","author":"Kanhere","year":"2005","journal-title":"Nucleic Acids Res"},{"issue":"(Database issue)","key":"2023020210381875500_B24","first-page":"D773","article-title":"The UCSC genome browser database: 2008 update","volume":"36","author":"Karolchik","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023020210381875500_B25","doi-asserted-by":"crossref","first-page":"R118","DOI":"10.1186\/gb-2006-7-12-r118","article-title":"Dynamic usage of transcription start sites within core promoters","volume":"7","author":"Kawaji","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210381875500_B26","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1093\/bioinformatics\/15.5.356","article-title":"Promoter2.0: for the recognition of PolII promoter sequences","volume":"15","author":"Knudsen","year":"1999","journal-title":"Bioinformatics"},{"key":"2023020210381875500_B27","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-56927-2","volume-title":"Self-Organizing Maps","author":"Kohonen","year":"2001","edition":"3rd ed"},{"key":"2023020210381875500_B28","doi-asserted-by":"crossref","first-page":"3347","DOI":"10.1073\/pnas.97.7.3347","article-title":"Insertion site preferences of the P transposable element in Drosophila melanogaster","volume":"97","author":"Liao","year":"2000","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"issue":"(Database issue)","key":"2023020210381875500_B29","doi-asserted-by":"crossref","first-page":"D332","DOI":"10.1093\/nar\/gkj145","article-title":"The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide","volume":"34","author":"Liolios","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023020210381875500_B30","first-page":"380","article-title":"Stochastic segment models of eukaryotic promoter regions","volume":"1","author":"Ohler","year":"2000","journal-title":"Pac. Symp. Biocomput"},{"key":"2023020210381875500_B31","doi-asserted-by":"crossref","first-page":"2341","DOI":"10.1002\/bip.1978.360171005","article-title":"Optimized potential function for calculation of nucleic-acid interaction energies. 1. Base stacking","volume":"17","author":"Ornstein","year":"1978","journal-title":"Biopolymers"},{"key":"2023020210381875500_B32","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1016\/S0097-8485(99)00015-7","article-title":"The biology of eukaryotic promoter prediction\u2013a review","volume":"23","author":"Pedersen","year":"1999","journal-title":"Comput. Chem"},{"key":"2023020210381875500_B33","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/MCAS.2006.1688199","article-title":"Ensemble based systems in decision making","volume":"6","author":"Polikar","year":"2006","journal-title":"IEEE Circuit Syst. 
Mag"},{"key":"2023020210381875500_B34","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1093\/bioinformatics\/18.4.631","article-title":"CpGProD: identifying CpG islands associated with transcription start sites in large genomic mammalian sequences","volume":"18","author":"Ponger","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020210381875500_B35","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1006\/jmbi.1995.0349","article-title":"Predicting Pol II promoter sequences using transcription factor binding sites","volume":"249","author":"Prestridge","year":"1995","journal-title":"J. Mol. Biol"},{"key":"2023020210381875500_B36","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1016\/S0097-8485(01)00099-7","article-title":"Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome","volume":"26","author":"Reese","year":"2001","journal-title":"Comput. Chem"},{"key":"2023020210381875500_B37","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","article-title":"A review of feature selection techniques in bioinformatics","volume":"23","author":"Saeys","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020210381875500_B38","doi-asserted-by":"crossref","first-page":"424","DOI":"10.1038\/nrg2026","article-title":"Mammalian RNA polymerase II core promoters: insights from genome-wide studies","volume":"8","author":"Sandelin","year":"2007","journal-title":"Nat. Rev. Genet"},{"key":"2023020210381875500_B39","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1006\/jmbi.2000.3589","article-title":"Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel context analysis approach","volume":"297","author":"Scherf","year":"2000","journal-title":"J. Mol. 
Biol"},{"key":"2023020210381875500_B40","doi-asserted-by":"crossref","first-page":"15776","DOI":"10.1073\/pnas.2136655100","article-title":"Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage","volume":"100","author":"Shiraki","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020210381875500_B41","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1146\/annurev.biochem.72.121801.161520","article-title":"The RNA polymerase II core promoter","volume":"72","author":"Smale","year":"2003","journal-title":"Annu. Rev. Biochem"},{"key":"2023020210381875500_B42","first-page":"S10.1","article-title":"Automatic annotation of eukaryotic genes, pseudogenes and promoters","volume":"7 (Suppl 1)","author":"Solovyev","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210381875500_B43","doi-asserted-by":"crossref","first-page":"e472","DOI":"10.1093\/bioinformatics\/btl250","article-title":"ARTS: accurate recognition of transcription starts in human","volume":"22","author":"Sonnenburg","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020210381875500_B44","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1038\/nature05874","article-title":"Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project","volume":"447","author":"The ENCODE Project Consortium","year":"2007","journal-title":"Nature"},{"key":"2023020210381875500_B45","volume-title":"Information Retrieval","author":"Van Rijsbergen","year":"1979","edition":"2nd edition"},{"issue":"(Database issue)","key":"2023020210381875500_B46","first-page":"D97","article-title":"DBTSS: database of transcription start sites, progress report 2008","volume":"36","author":"Wakaguri","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023020210381875500_B47","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1016\/j.bbrc.2006.06.062","article-title":"A 
mammalian promoter model links cis elements to genetic networks","volume":"347","author":"Wang","year":"2006","journal-title":"Biochem. Biophys. Res. Commun"},{"key":"2023020210381875500_B48","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1186\/1471-2164-8-374","article-title":"MetaProm: a neural network based meta-predictor for alternative human promoter prediction","volume":"8","author":"Wang","year":"2007","journal-title":"BMC Genomics"},{"key":"2023020210381875500_B49","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1142\/9781860948732_0021","article-title":"Prediction of transcription start sites based on feature selection using AMOSA","volume":"6","author":"Wang","year":"2007","journal-title":"Comput. Syst. Bioinformatics Conf"},{"key":"2023020210381875500_B50","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1016\/j.ygeno.2007.11.001","article-title":"Ensempro: an ensemble approach to predicting transcription start sites in human genomic DNA sequences","volume":"91","author":"Won","year":"2008","journal-title":"Genomics"},{"key":"2023020210381875500_B51","doi-asserted-by":"crossref","first-page":"2722","DOI":"10.1093\/bioinformatics\/btl482","article-title":"PromoterExplorer: an effective promoter identification method based on the AdaBoost 
algorithm","volume":"22","author":"Xie","year":"2006","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/i24\/49054147\/bioinformatics_24_13_i24.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/i24\/49054147\/bioinformatics_24_13_i24.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,27]],"date-time":"2024-02-27T22:21:26Z","timestamp":1709072486000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/13\/i24\/233949"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,7,1]]},"references-count":51,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2008,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn172","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,7,1]]},"published":{"date-parts":[[2008,7,1]]}}}