{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T12:47:23Z","timestamp":1767962843739,"version":"3.49.0"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2016,10,12]],"date-time":"2016-10-12T00:00:00Z","timestamp":1476230400000},"content-version":"vor","delay-in-days":219,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"The Scientific and Technological Research Council of Turkey, Post-doctoral Research Fellowship Program","award":["TUBITAK BIDEB-2219"],"award-info":[{"award-number":["TUBITAK BIDEB-2219"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Similarity-based methods have been widely used in order to infer the properties of genes and gene products containing little or no experimental annotation. New approaches that overcome the limitations of methods that rely solely upon sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins.<\/jats:p>\n               <jats:p>Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain architectures, classifying proteins based on the similarities and propagating functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a sub-set of UniProtKB\/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity with an average F-score: 0.85. 
Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB\/TrEMBL resulted in 44\u00a0818\u00a0178 GO term predictions for 12\u00a0172\u00a0114 proteins. 22% of these predictions were for 2\u00a0812\u00a0016 previously non-annotated protein entries, indicating the significance of the value added by this approach.<\/jats:p>\n               <jats:p>Availability and implementation: The results of the method are available at: ftp:\/\/ftp.ebi.ac.uk\/pub\/contrib\/martin\/DAAC\/.<\/jats:p>\n               <jats:p>Contact: \u00a0tdogan@ebi.ac.uk<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw114","type":"journal-article","created":{"date-parts":[[2016,3,9]],"date-time":"2016-03-09T01:38:23Z","timestamp":1457487503000},"page":"2264-2271","source":"Crossref","is-referenced-by-count":40,"title":["UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB"],"prefix":"10.1093","volume":"32","author":[{"given":"Tunca","family":"Do\u011fan","sequence":"first","affiliation":[{"name":"European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK"}]},{"given":"Alistair","family":"MacDougall","sequence":"additional","affiliation":[{"name":"European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK"}]},{"given":"Rabie","family":"Saidi","sequence":"additional","affiliation":[{"name":"European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK"}]},{"given":"Diego","family":"Poggioli","sequence":"additional","affiliation":[{"name":"European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK"}]},{"given":"Alex","family":"Bateman","sequence":"additional","affiliation":[{"name":"European 
Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK"}]},{"given":"Claire","family":"O\u2019Donovan","sequence":"additional","affiliation":[{"name":"European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK"}]},{"given":"Maria J.","family":"Martin","sequence":"additional","affiliation":[{"name":"European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK"}]}],"member":"286","published-online":{"date-parts":[[2016,3,7]]},"reference":[{"key":"2023020112463758600_btw114-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023020112463758600_btw114-B2","doi-asserted-by":"crossref","first-page":"W202","DOI":"10.1093\/nar\/gkp335","article-title":"MEME SUITE: tools for motif discovery and searching","volume":"37 (Suppl. 2)","author":"Bailey","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B67","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.str.2006.11.009","article-title":"The generation of new protein functions by the combination of domains","volume":"15","author":"Bashton","year":"2007","journal-title":"Structure"},{"key":"2023020112463758600_btw114-B3","first-page":"D25","article-title":"GenBank","volume":"36 (Suppl. 1)","author":"Benson","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B4","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1016\/j.jmb.2005.08.067","article-title":"Domain rearrangements in protein evolution","volume":"353","author":"Bj\u00f6rklund","year":"2005","journal-title":"J. Mol. 
Biol"},{"key":"2023020112463758600_btw114-B5","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1016\/j.jtbi.2010.12.024","article-title":"Some remarks on protein attribute prediction and pseudo amino acid composition","volume":"273","author":"Chou","year":"2011","journal-title":"J. Theor. Biol"},{"key":"2023020112463758600_btw114-B6","doi-asserted-by":"crossref","first-page":"D565","DOI":"10.1093\/nar\/gkr1048","article-title":"The UniProt-GO annotation database in 2011","volume":"40","author":"Dimmer","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B7","doi-asserted-by":"crossref","first-page":"e75458","DOI":"10.1371\/journal.pone.0075458","article-title":"Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences","volume":"8","author":"Do\u011fan","year":"2013","journal-title":"PLoS One"},{"key":"2023020112463758600_btw114-B8","doi-asserted-by":"crossref","first-page":"W389","DOI":"10.1093\/nar\/gkv332","article-title":"JPred4: a protein secondary structure prediction server","volume":"43 (W1)","author":"Drozdetskiy","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B9","doi-asserted-by":"crossref","first-page":"D536","DOI":"10.1093\/nar\/gks1080","article-title":"dcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more","volume":"41","author":"Fang","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B10","doi-asserted-by":"crossref","first-page":"D222","DOI":"10.1093\/nar\/gkt1223","article-title":"The Pfam protein families database","volume":"42","author":"Finn","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B12","doi-asserted-by":"crossref","first-page":"8613","DOI":"10.1074\/jbc.M513590200","article-title":"Dynamic association between the catalytic and lectin domains of human UDP-GalNAc: 
polypeptide \u03b1-N-acetylgalactosaminyltransferase-2","volume":"281","author":"Fritz","year":"2006","journal-title":"J. Biol. Chem"},{"key":"2023020112463758600_btw114-B13","doi-asserted-by":"crossref","first-page":"1619","DOI":"10.1101\/gr.278202","article-title":"CDART: protein homology by domain architecture","volume":"12","author":"Geer","year":"2002","journal-title":"Genome Res"},{"key":"2023020112463758600_btw114-B14","doi-asserted-by":"crossref","first-page":"D1049","DOI":"10.1093\/nar\/gku1179","article-title":"Gene ontology consortium: going forward","volume":"43","author":"Gene Ontology Consortium","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B15","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1186\/1471-2105-10-39","article-title":"Protein domain organisation: adding order","volume":"10","author":"Kummerfeld","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023020112463758600_btw114-B16","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/1471-2105-10-S15-S5","article-title":"Protein comparison at the domain architecture level","volume":"10 (Suppl. 15)","author":"Lee","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023020112463758600_btw114-B35","first-page":"D28","article-title":"The European nucleotide archive","volume":"39 (Suppl. 
1)","author":"Leinonen","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B17","doi-asserted-by":"crossref","first-page":"2081","DOI":"10.1093\/bioinformatics\/btl366","article-title":"An initial strategy for comparing proteins at the domain architecture level","volume":"22","author":"Lin","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020112463758600_btw114-B18","doi-asserted-by":"crossref","first-page":"i444","DOI":"10.1093\/bioinformatics\/bts398","article-title":"Protein domain recurrence and order can enhance prediction of protein functions","volume":"28","author":"Messih","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020112463758600_btw114-B19","first-page":"D213\u2013D21","article-title":"The InterPro protein families database: the classification resource after 15 years","volume":"43 (D1)","author":"Mitchell","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B20","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol"},{"key":"2023020112463758600_btw114-B21","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023020112463758600_btw114-B22","doi-asserted-by":"crossref","first-page":"D1064","DOI":"10.1093\/nar\/gku1002","article-title":"HAMAP in 2015: updates to the protein family classification and annotation system","volume":"43","author":"Pedruzzi","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B23","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat. Methods"},{"key":"2023020112463758600_btw114-B24","doi-asserted-by":"crossref","first-page":"e12382","DOI":"10.1371\/journal.pone.0012382","article-title":"GOPred: GO molecular function prediction by combined classifiers","volume":"5","author":"Sara\u00e7","year":"2010","journal-title":"PLoS One"},{"key":"2023020112463758600_btw114-B25","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1186\/1471-2105-6-152","article-title":"pSLIP: SVM based protein subcellular localization prediction using multiple physicochemical properties","volume":"6","author":"Sarda","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020112463758600_btw114-B26","doi-asserted-by":"crossref","first-page":"D344","DOI":"10.1093\/nar\/gks1067","article-title":"New and continuing developments at PROSITE","volume":"41 (D1)","author":"Sigrist","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B27","doi-asserted-by":"crossref","first-page":"W244","DOI":"10.1093\/nar\/gki408","article-title":"The HHpred interactive server for protein homology detection and structure prediction","volume":"33 (Suppl. 
2)","author":"S\u00f6ding","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B28","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1089\/cmb.2007.A009","article-title":"Domain architecture comparison for multidomain homology identification","volume":"14","author":"Song","year":"2007","journal-title":"J. Comput. Biol"},{"key":"2023020112463758600_btw114-B29","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1093\/bioinformatics\/btt379","article-title":"Rapid similarity search of proteins using alignments of domain arrangements","volume":"30","author":"Terrapon","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020112463758600_btw114-B30","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/j.molcel.2014.05.032","article-title":"A million peptide motifs for the molecular biologist","volume":"55","author":"Tompa","year":"2014","journal-title":"Mol. Cell"},{"key":"2023020112463758600_btw114-B31","first-page":"667","volume-title":"Data Mining and Knowledge Discovery Handbook","author":"Tsoumakas","year":"2010"},{"key":"2023020112463758600_btw114-B32","doi-asserted-by":"crossref","first-page":"D204","DOI":"10.1093\/nar\/gku989","article-title":"UniProt: a hub for protein information","volume":"43","author":"UniProt Consortium","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020112463758600_btw114-B33","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1073\/pnas.70.3.697","article-title":"Nucleation, rapid folding, and globular intrachain regions in proteins","volume":"70","author":"Wetlaufer","year":"1973","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020112463758600_btw114-B34","doi-asserted-by":"crossref","first-page":"D380","DOI":"10.1093\/nar\/gkn762","article-title":"SUPERFAMILY\u2014sophisticated comparative genomics, data mining, visualization and phylogeny","volume":"37 (Suppl. 
1)","author":"Wilson","year":"2009","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/15\/2264\/49020149\/bioinformatics_32_15_2264.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/15\/2264\/49020149\/bioinformatics_32_15_2264.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:49:53Z","timestamp":1675291793000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/15\/2264\/1742842"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,3,7]]},"references-count":35,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2016,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw114","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2016,8,1]]},"published":{"date-parts":[[2016,3,7]]}}}