{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,3]],"date-time":"2024-07-03T11:59:13Z","timestamp":1720007953053},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Massively parallel sequencing technology allows small research labs to generate sequences of the genomes they study. However, genome annotation still relies on a manual process, which has become a serious bottleneck for high-throughput genome projects. Automatic annotation methods have recently become increasingly accurate, but several issues remain. One important challenge in using automatic annotation methods is to distinguish the annotation quality of individual ORFs or genes. Annotation quality scores of this kind can reduce human labor costs dramatically, since manual inspection can then focus only on genes with low annotation quality scores.<\/jats:p>\n               <jats:p>Results: In this article, we propose a novel annotation quality or confidence scoring scheme, called Annotation Confidence Score (ACS), using a genome comparison approach. The score is computed by combining sequence similarity and textual annotation similarity using a modified version of a logistic curve. The most important feature of the proposed scheme is that it generates a score reflecting the annotation quality of genes by automatically adjusting for the number of genomes used to compute the score and their phylogenetic distance.
Extensive experiments with bacterial genomes showed that the proposed scheme generated scores that reflected annotation quality regardless of the number of reference genomes and their phylogenetic distance.<\/jats:p>\n               <jats:p>Availability: \u00a0http:\/\/microbial.informatics.indiana.edu\/acs.<\/jats:p>\n               <jats:p>Contact: \u00a0sumkim2@indiana.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp613","type":"journal-article","created":{"date-parts":[[2009,10,25]],"date-time":"2009-10-25T00:12:49Z","timestamp":1256429569000},"page":"22-29","source":"Crossref","is-referenced-by-count":9,"title":["Annotation confidence score for genome annotation: a genome comparison approach"],"prefix":"10.1093","volume":"26","author":[{"given":"Youngik","family":"Yang","sequence":"first","affiliation":[{"name":"School of Informatics and Computing and Department of Biology, Indiana University, Bloomington, IN 47408, USA"}]},{"given":"Donald","family":"Gilbert","sequence":"additional","affiliation":[{"name":"School of Informatics and Computing and Department of Biology, Indiana University, Bloomington, IN 47408, USA"}]},{"given":"Sun","family":"Kim","sequence":"additional","affiliation":[{"name":"School of Informatics and Computing and Department of Biology, Indiana University, Bloomington, IN 47408, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,10,24]]},"reference":[{"key":"2023012507531423100_B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol.
Biol."},{"key":"2023012507531423100_B2","first-page":"26","volume-title":"Survey of Text Mining I: Clustering, Classification, and Retrieval","author":"Berry","year":"2003"},{"key":"2023012507531423100_B3","volume-title":"The Richards Function","author":"Centre for Horticulture and Landscape"},{"key":"2023012507531423100_B4","first-page":"49","volume-title":"Mining the Web: Discovering Knowledge from Hypertext Data","author":"Chakrabarti","year":"2002","edition":"1st"},{"key":"2023012507531423100_B5","volume-title":"WordNet: An Electronic Lexical Database","author":"Fellbaum","year":"1998"},{"key":"2023012507531423100_B6","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1093\/nar\/gkg044","article-title":"iProClass: an integrated database of protein family, function and structure information","volume":"31","author":"Huang","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023012507531423100_B7","doi-asserted-by":"crossref","first-page":"717","DOI":"10.1093\/bioinformatics\/btg077","article-title":"Evaluation of annotation strategies using an entire genome sequence","volume":"19","author":"Iliopoulos","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012507531423100_B8","author":"Illumina","year":"2007","journal-title":"DNA sequencing with Solexa technology."},{"key":"2023012507531423100_B9","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1093\/bioinformatics\/bti749","article-title":"BioThesaurus: a web-based thesaurus of protein and gene names","volume":"22","author":"Liu","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012507531423100_B10","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1197\/jamia.M2085","article-title":"Quantitative Assessment of Dictionary-based Protein Named Entity Tagging","volume":"13","author":"Liu","year":"2006","journal-title":"J. Am. Med. Inform.
Assoc."},{"issue":"Database issue","key":"2023012507531423100_B11","first-page":"528","article-title":"The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions","volume":"36","author":"Markowitz","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012507531423100_B12","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1093\/bioinformatics\/bti027","article-title":"Improving genome annotations using phylogenetic profile anomaly detection","volume":"21","author":"Mikkelsen","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012507531423100_B13","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1186\/1471-2105-9-353","article-title":"Identification and correction of abnormal, incomplete and mispredicted proteins in public databases","volume":"9","author":"Nagy","year":"2008","journal-title":"BMC Bioinform."},{"key":"2023012507531423100_B14","author":"NIH","year":"2007","journal-title":"An Overview of MeSH."},{"key":"2023012507531423100_B15","doi-asserted-by":"crossref","first-page":"2896","DOI":"10.1073\/pnas.96.6.2896","article-title":"The use of gene clusters to infer functional coupling","volume":"96","author":"Overbeek","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507531423100_B16","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1108\/eb046814","article-title":"An algorithm for suffix stripping","volume":"4","author":"Porter","year":"1980","journal-title":"Program"},{"key":"2023012507531423100_B17"},{"key":"2023012507531423100_B18","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1093\/jxb\/10.2.290","article-title":"A flexible growth function for empirical use","volume":"10","author":"Richards","year":"1959","journal-title":"J. Exp. 
Bot."},{"key":"2023012507531423100_B19","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1016\/0022-2836(75)90213-2","article-title":"A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase","volume":"94","author":"Sanger","year":"1975","journal-title":"J. Mol. Biol."},{"key":"2023012507531423100_B20","doi-asserted-by":"crossref","first-page":"5463","DOI":"10.1073\/pnas.74.12.5463","article-title":"DNA sequencing with chain-terminating inhibitors","volume":"74","author":"Sanger","year":"1977","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507531423100_B21","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1126\/science.278.5338.631","article-title":"A genomic perspective on protein families","volume":"278","author":"Tatusov","year":"1997","journal-title":"Science"},{"key":"2023012507531423100_B22","doi-asserted-by":"crossref","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","article-title":"CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice","volume":"22","author":"Thompson","year":"1994","journal-title":"Nucleic Acids Res."},{"key":"2023012507531423100_B23","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1186\/1471-2164-7-275","article-title":"454 sequencing put to the test using the complex genome of barley","volume":"7","author":"Wicker","year":"2006","journal-title":"BMC
Genomics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/1\/22\/48852033\/bioinformatics_26_1_22.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/1\/22\/48852033\/bioinformatics_26_1_22.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T07:53:36Z","timestamp":1674633216000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/1\/22\/182256"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,10,24]]},"references-count":23,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp613","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,1,1]]},"published":{"date-parts":[[2009,10,24]]}}}