{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T19:54:52Z","timestamp":1751399692286},"reference-count":15,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2005,3,8]],"date-time":"2005-03-08T00:00:00Z","timestamp":1110240000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"},{"start":{"date-parts":[[2005,3,8]],"date-time":"2005-03-08T00:00:00Z","timestamp":1110240000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>Computational protein annotation methods occasionally introduce errors. False-positive (FP) errors are annotations that are mistakenly associated with a protein. Such false annotations introduce errors that may spread into databases through similarity with other proteins. Generally, methods used to minimize the chance for FPs result in decreased sensitivity or low throughput. We present a novel protein-clustering method that enables automatic separation of FP from true hits. The method quantifies the biological similarity between pairs of proteins by examining each protein's annotations, and then proceeds by clustering sets of proteins that received similar annotation into biological groups.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>Using a test set of all PROSITE signatures that are marked as FPs, we show that the method successfully separates FPs in 69% of the 327 test cases supplied by PROSITE. Furthermore, we constructed an extensive random FP simulation test and show a high degree of success in detecting FP, indicating that the method is not specifically tuned for PROSITE and performs well on larger scales. We also suggest some means of predicting in which cases this approach would be successful.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusion<\/jats:title>\n                        <jats:p>Automatic detection of FPs may greatly facilitate the manual validation process and increase annotation sensitivity. With the increasing number of automatic annotations, the tendency of biological properties to be clustered, once a biological similarity measure is introduced, may become exceedingly helpful in the development of such automatic methods.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-6-46","type":"journal-article","created":{"date-parts":[[2005,3,9]],"date-time":"2005-03-09T07:25:58Z","timestamp":1110353158000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Automatic detection of false annotations via binary property clustering"],"prefix":"10.1186","volume":"6","author":[{"given":"Noam","family":"Kaplan","sequence":"first","affiliation":[]},{"given":"Michal","family":"Linial","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,3,8]]},"reference":[{"key":"371_CR1","doi-asserted-by":"publisher","first-page":"298","DOI":"10.1016\/S0167-7799(03)00139-2","volume":"21","author":"M Linial","year":"2003","unstructured":"Linial M: How incorrect annotations evolve-the case of short ORFs. Trends Biotechnol 2003, 21: 298\u2013300. 10.1016\/S0167-7799(03)00139-2","journal-title":"Trends Biotechnol"},{"key":"371_CR2","doi-asserted-by":"publisher","first-page":"1641","DOI":"10.1093\/bioinformatics\/18.12.1641","volume":"18","author":"WR Gilks","year":"2002","unstructured":"Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA: Modeling the percolation of annotation errors in a database of protein sequences. Bioinformatics 2002, 18: 1641\u20131649. 10.1093\/bioinformatics\/18.12.1641","journal-title":"Bioinformatics"},{"key":"371_CR3","doi-asserted-by":"publisher","first-page":"717","DOI":"10.1093\/bioinformatics\/btg077","volume":"19","author":"I Iliopoulos","year":"2003","unstructured":"Iliopoulos I, Tsoka S, Andrade MA, Enright AJ, Carroll M, Poullet P, Promponas V, Liakopoulos T, Palaios G, Pasquier C, Hamodrakas S, Tamames J, Yagnik AT, Tramontano A, Devos D, Blaschke C, Valencia A, Brett D, Martin D, Leroy C, Rigoutsos I, Sander C, Ouzounis CA: Evaluation of annotation strategies using an entire genome sequence. Bioinformatics 2003, 19: 717\u2013726. 10.1093\/bioinformatics\/btg077","journal-title":"Bioinformatics"},{"key":"371_CR4","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1093\/nar\/gkg005","volume":"31","author":"D Frishman","year":"2003","unstructured":"Frishman D, Mokrejs M, Kosykh D, Kastenmuller G, Kolesov G, Zubrzycki I, Gruber C, Geier B, Kaps A, Albermann K, Volz A, Wagner C, Fellenberg M, Heumann K, Mewes HW: The PEDANT genome database. Nucleic Acids Res 2003, 31: 207\u2013211. 10.1093\/nar\/gkg005","journal-title":"Nucleic Acids Res"},{"key":"371_CR5","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1093\/bioinformatics\/15.5.391","volume":"15","author":"MA Andrade","year":"1999","unstructured":"Andrade MA, Brown NP, Leroy C, Hoersch S, De Daruvar A, Reich C, Franchini A, Tamames J, Valencia A, Ouzounis C, Sander C.: Automated genome sequence analysis and annotation. Bioinformatics 1999, 15: 391\u2013412. 10.1093\/bioinformatics\/15.5.391","journal-title":"Bioinformatics"},{"key":"371_CR6","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1016\/S0168-9525(01)02348-4","volume":"17","author":"D Devos","year":"2001","unstructured":"Devos D, Valencia A: Intrinsic errors in genome annotation. Trends Genet 2001, 17: 429\u2013431. 10.1016\/S0168-9525(01)02348-4","journal-title":"Trends Genet"},{"key":"371_CR7","doi-asserted-by":"publisher","first-page":"5617","DOI":"10.1093\/nar\/gkg769","volume":"31","author":"N Kaplan","year":"2003","unstructured":"Kaplan N, Vaaknin A, Linial M: PANDORA: keyword-based analysis of protein sets by integration of annotation sources. Nucleic Acids Res 2003, 31: 5617\u20135626. 10.1093\/nar\/gkg769","journal-title":"Nucleic Acids Res"},{"key":"371_CR8","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1093\/bib\/3.3.265","volume":"3","author":"CJ Sigrist","year":"2002","unstructured":"Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P: PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform 2002, 3: 265\u2013274.","journal-title":"Brief Bioinform"},{"key":"371_CR9","doi-asserted-by":"publisher","first-page":"1145","DOI":"10.1093\/bioinformatics\/16.12.1145","volume":"16","author":"R Apweiler","year":"2000","unstructured":"Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, Durbin R, Falquet L, Fleischmann W, Gouzy J, Hermjakob H, Hulo N, Jonassen I, Kahn D, Kanapin A, Karavidopoulou Y, Lopez R, Marx B, Mulder NJ, Oinn TM, Pagni M, Servant F, Sigrist CJ, Zdobnov EM: InterPro \u2013 an integrated documentation resource for protein families, domains and functional sites. Bioinformatics 2000, 16: 1145\u20131150. 10.1093\/bioinformatics\/16.12.1145","journal-title":"Bioinformatics"},{"key":"371_CR10","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1093\/nar\/gkg095","volume":"31","author":"B Boeckmann","year":"2003","unstructured":"Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31: 365\u2013370. 10.1093\/nar\/gkg095","journal-title":"Nucleic Acids Res"},{"key":"371_CR11","doi-asserted-by":"publisher","first-page":"662","DOI":"10.1101\/gr.461403","volume":"13","author":"E Camon","year":"2003","unstructured":"Camon E, Magrane M, Barrell D, Binns D, Fleischmann W, Kersey P, Mulder N, Oinn T, Maslen J, Cox A, Apweiler R: The Gene Ontology Annotation (GOA) Project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res 2003, 13: 662\u2013672. 10.1101\/gr.461403","journal-title":"Genome Res"},{"key":"371_CR12","doi-asserted-by":"publisher","first-page":"1257","DOI":"10.1006\/jmbi.1999.3233","volume":"293","author":"A Muller","year":"1999","unstructured":"Muller A, MacCallum RM, Sternberg MJ: Benchmarking PSI-BLAST in genome annotation. J Mol Biol 1999, 293: 1257\u20131271. 10.1006\/jmbi.1999.3233","journal-title":"J Mol Biol"},{"key":"371_CR13","doi-asserted-by":"publisher","first-page":"846","DOI":"10.1093\/bioinformatics\/14.10.846","volume":"14","author":"K Karplus","year":"1998","unstructured":"Karplus K, Barrett C, Hughey R: Hidden Markov models for detecting remote protein homologies. Bioinformatics 1998, 14: 846\u2013856. 10.1093\/bioinformatics\/14.10.846","journal-title":"Bioinformatics"},{"key":"371_CR14","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1093\/bioinformatics\/18.1.147","volume":"18","author":"R Karchin","year":"2002","unstructured":"Karchin R, Karplus K, Haussler D: Classifying G-protein coupled receptors with support vector machines. Bioinformatics 2002, 18: 147\u2013159. 10.1093\/bioinformatics\/18.1.147","journal-title":"Bioinformatics"},{"issue":"Suppl 1","key":"371_CR15","doi-asserted-by":"publisher","first-page":"i342","DOI":"10.1093\/bioinformatics\/bth938","volume":"20","author":"D Wieser","year":"2004","unstructured":"Wieser D, Kretschmann E, Apweiler R: Filtering erroneous protein annotation. Bioinformatics 2004, 20(Suppl 1):i342-i347. 10.1093\/bioinformatics\/bth938","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-46.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-6-46\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-46.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:06:51Z","timestamp":1728302811000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-46"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,3,8]]},"references-count":15,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2005,12]]}},"alternative-id":["371"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-46","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2005,3,8]]},"assertion":[{"value":"5 September 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 March 2005","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 March 2005","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"46"}}