{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:37Z","timestamp":1772138077245,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2807,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Motivation: Several functional gene annotation databases have been developed in the recent years, and are widely used to infer the biological function of gene sets, by scrutinizing the attributes that appear over- and underrepresented. However, this strategy is not directly applicable to the study of non-coding DNA, as the non-coding sequence span varies greatly among different gene loci in the human genome and longer loci have a higher likelihood of being selected purely by chance. Therefore, conclusions involving the function of non-coding elements that are drawn based on the annotation of neighboring genes are often biased. We assessed the systematic bias in several particular Gene Ontology (GO) categories using the standard hypergeometric test, by randomly sampling non-coding elements from the human genome and inferring their function based on the functional annotation of the closest genes. While no category is expected to occur significantly over- or underrepresented for a random selection of elements, categories such as \u2018cell adhesion\u2019, \u2018nervous system development\u2019 and \u2018transcription factor activities\u2019 appeared to be systematically overrepresented, while others such as \u2018olfactory receptor activity\u2019\u2014underrepresented.<\/jats:p>\n                  <jats:p>Results: Our results suggest that functional inference for non-coding elements using gene annotation databases requires a special correction. We introduce a set of correction coefficients for the probabilities of the GO categories that accounts for the variability in the length of the non-coding DNA across different loci and effectively eliminates the ascertainment bias from the functional characterization of non-coding elements. Our approach can be easily generalized to any other gene annotation database.<\/jats:p>\n                  <jats:p>Contact: \u00a0ovcharei@ncbi.nlm.nih.gov<\/jats:p>\n                  <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics Online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp043","type":"journal-article","created":{"date-parts":[[2009,1,25]],"date-time":"2009-01-25T20:13:06Z","timestamp":1232914386000},"page":"578-584","source":"Crossref","is-referenced-by-count":23,"title":["Variable locus length in the human genome leads to ascertainment bias in functional inference for non-coding elements"],"prefix":"10.1093","volume":"25","author":[{"given":"Leila","family":"Taher","sequence":"first","affiliation":[{"name":"Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]},{"given":"Ivan","family":"Ovcharenko","sequence":"additional","affiliation":[{"name":"Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,1,25]]},"reference":[{"key":"2023013110115703500_B1","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1093\/bioinformatics\/btg455","article-title":"FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes","volume":"20","author":"Al-Shahrour","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013110115703500_B2","doi-asserted-by":"crossref","first-page":"W91","DOI":"10.1093\/nar\/gkm260","article-title":"FatiGO+: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments","volume":"35","author":"Al-Shahrour","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023013110115703500_B3","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1242\/jcs.114.4.629","article-title":"The cadherin superfamily: diversity in form and function","volume":"114","author":"Angst","year":"2001","journal-title":"J. Cell Sci."},{"key":"2023013110115703500_B4","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023013110115703500_B5","doi-asserted-by":"crossref","first-page":"1464","DOI":"10.1093\/bioinformatics\/bth088","article-title":"GOstat: find statistically overrepresented Gene Ontologies within a group of genes","volume":"20","author":"Beissbarth","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013110115703500_B6","doi-asserted-by":"crossref","first-page":"1321","DOI":"10.1126\/science.1098119","article-title":"Ultraconserved elements in the human genome","volume":"304","author":"Bejerano","year":"2004","journal-title":"Science"},{"key":"2023013110115703500_B7","first-page":"13","article-title":"Il Calcolo delle assicurazioni su gruppi di teste","volume-title":"Studi in Onore del Professore Salvatore Ortu Carboni.","author":"Bonferroni","year":"1935"},{"key":"2023013110115703500_B8","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1034\/j.1399-0004.2000.570403.x","article-title":"Online Mendelian Inheritance in Man (OMIM) as a knowledgebase for human developmental disorders","volume":"57","author":"Boyadjiev","year":"2000","journal-title":"Clin. Genet."},{"key":"2023013110115703500_B9","doi-asserted-by":"crossref","first-page":"3710","DOI":"10.1093\/bioinformatics\/bth456","article-title":"GO::TermFinder\u2013open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes","volume":"20","author":"Boyle EI","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013110115703500_B10","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1038\/nrg1527","article-title":"Conserved non-genic sequences - an unexpected feature of mammalian genomes","volume":"6","author":"Dermitzakis","year":"2005","journal-title":"Nat. Rev. Genet."},{"key":"2023013110115703500_B11","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1093\/nar\/30.1.52","article-title":"Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders","volume":"30","author":"Hamosh","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023013110115703500_B12","doi-asserted-by":"crossref","first-page":"D514","DOI":"10.1093\/nar\/gki033","article-title":"Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders","volume":"33","author":"Hamosh","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023013110115703500_B13","doi-asserted-by":"crossref","first-page":"R257","DOI":"10.1186\/gb-2007-8-12-r257","article-title":"Prediction of synergistic transcription factors by function conservation","volume":"8","author":"Hu","year":"2007","journal-title":"Genome Biol."},{"key":"2023013110115703500_B14","first-page":"299","article-title":"Predicting gene function from gene expressions and ontologies","volume-title":"Pacific Symposium in Biocomputing.","author":"Hvidsten","year":"2001"},{"key":"2023013110115703500_B15","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1093\/nar\/gkj102","article-title":"From genomics to chemical genomics: new developments in KEGG","volume":"34","author":"Kanehisa","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023013110115703500_B16","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1093\/nar\/gkm882","article-title":"KEGG for linking genomes to life and the environment","volume":"36","author":"Kanehisa","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013110115703500_B17","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1093\/nar\/gkg129","article-title":"The UCSC Genome Browser Database","volume":"31","author":"Karolchik","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023013110115703500_B18","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1101\/gr.440803","article-title":"Predicting gene function from patterns of annotation","volume":"13","author":"King","year":"2003","journal-title":"Genome Res."},{"key":"2023013110115703500_B19","doi-asserted-by":"crossref","first-page":"1725","DOI":"10.1093\/hmg\/ddg180","article-title":"A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly","volume":"12","author":"Lettice","year":"2003","journal-title":"Hum. Mol. Genet."},{"key":"2023013110115703500_B20","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1016\/S0306-4522(02)00053-2","article-title":"Forebrain-specific promoter\/enhancer D6 derived from the mouse Dach1 gene controls expression in neural stem cells","volume":"112","author":"Machon","year":"2002","journal-title":"Neuroscience"},{"key":"2023013110115703500_B21","doi-asserted-by":"crossref","first-page":"3448","DOI":"10.1093\/bioinformatics\/bti551","article-title":"BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks","volume":"21","author":"Maere","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110115703500_B22","doi-asserted-by":"crossref","first-page":"R101","DOI":"10.1186\/gb-2004-5-12-r101","article-title":"GOToolBox: functional investigation of gene datasets based on Gene Ontology","volume":"5","author":"Martin","year":"2004","journal-title":"Genome Biol."},{"key":"2023013110115703500_B23","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1101\/gr.4143406","article-title":"Ancient duplicated conserved noncoding elements in vertebrates: a genomic and functional analysis","volume":"16","author":"McEwen","year":"2006","journal-title":"Genome Res."},{"key":"2023013110115703500_B24","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1126\/science.1088328","article-title":"Scanning human gene deserts for long-range enhancers","volume":"302","author":"Nobrega","year":"2003","journal-title":"Science"},{"key":"2023013110115703500_B25","doi-asserted-by":"crossref","first-page":"1668","DOI":"10.1093\/molbev\/msn116","article-title":"Widespread ultraconservation divergence in primates","volume":"25","author":"Ovcharenko","year":"2008","journal-title":"Mol. Biol. Evol."},{"key":"2023013110115703500_B26","first-page":"1668","article-title":"Interpreting mammalian evolution using Fugu genome comparisons","volume":"25","author":"Ovcharenko","year":"2004","journal-title":"Genomics"},{"key":"2023013110115703500_B27","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1101\/gr.3015505","article-title":"Evolution and functional classification of vertebrate gene deserts","volume":"15","author":"Ovcharenko","year":"2005","journal-title":"Genome Res."},{"key":"2023013110115703500_B28","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1038\/nature05295","article-title":"In vivo enhancer analysis of human conserved non-coding sequences","volume":"444","author":"Pennacchio","year":"2006","journal-title":"Nature"},{"key":"2023013110115703500_B29","doi-asserted-by":"crossref","first-page":"D61","DOI":"10.1093\/nar\/gkl842","article-title":"NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"35","author":"Pruitt","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023013110115703500_B30","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1186\/1471-2164-5-99","article-title":"Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes","volume":"5","author":"Sandelin","year":"2004","journal-title":"BMC Genomics"},{"key":"2023013110115703500_B31","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/genetics\/165.1.235","article-title":"Identification of Cis-regulatory elements in the mouse Pax9\/Nkx2-9 genomic region: implication for evolutionary conserved synteny","volume":"165","author":"Santagati","year":"2003","journal-title":"Genetics"},{"key":"2023013110115703500_B32","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1093\/bioinformatics\/btm551","article-title":"SNPtoGO: characterizing SNPs by enriched GO terms","volume":"24","author":"Schwarz","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013110115703500_B33","doi-asserted-by":"crossref","first-page":"1251","DOI":"10.1038\/nbt1346","article-title":"The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration","volume":"25","author":"Smith","year":"2007","journal-title":"Nat. Biotechnol."},{"key":"2023013110115703500_B34","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1038\/nature01262","article-title":"Initial sequencing and comparative analysis of the mouse genome","volume":"420","author":"Waterston","year":"2002","journal-title":"Nature"},{"key":"2023013110115703500_B35","doi-asserted-by":"crossref","first-page":"e7","DOI":"10.1371\/journal.pbio.0030007","article-title":"Highly conserved non-coding sequences are associated with vertebrate development","volume":"3","author":"Woolfe","year":"2005","journal-title":"PLoS Biol."},{"key":"2023013110115703500_B36","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1186\/1471-213X-7-100","article-title":"CONDOR: a database resource of developmentally associated conserved non-coding elements","volume":"7","author":"Woolfe","year":"2007","journal-title":"BMC Dev. Biol."},{"key":"2023013110115703500_B37","doi-asserted-by":"crossref","first-page":"3124","DOI":"10.1073\/pnas.97.7.3124","article-title":"Large exons encoding multiple ectodomains are a characteristic feature of protocadherin genes","volume":"97","author":"Wu","year":"2000","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013110115703500_B38","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1101\/gr.167301","article-title":"Comparative DNA sequence analysis of mouse and human protocadherin gene clusters","volume":"11","author":"Wu","year":"2001","journal-title":"Genome Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/5\/578\/48985489\/bioinformatics_25_5_578.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/5\/578\/48985489\/bioinformatics_25_5_578.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T15:08:42Z","timestamp":1675177722000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/5\/578\/183706"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,1,25]]},"references-count":38,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2009,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp043","relation":{"has-review":[{"id-type":"doi","id":"10.3410\/f.1157594.617784","asserted-by":"object"}]},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,3,1]]},"published":{"date-parts":[[2009,1,25]]}}}