{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T21:32:56Z","timestamp":1767907976465,"version":"3.49.0"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"18","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1490,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Burgeoning sequencing technologies have generated massive amounts of genomic and proteomic data. Annotating the functions of proteins identified in this data has become a big and crucial problem. Various computational methods have been developed to infer the protein functions based on either the sequences or domains of proteins. The existing methods, however, ignore the recurrence and the order of the protein domains in this function inference.<\/jats:p>\n               <jats:p>Results: We developed two new methods to infer protein functions based on protein domain recurrence and domain order. Our first method, DRDO, calculates the posterior probability of the Gene Ontology terms based on domain recurrence and domain order information, whereas our second method, DRDO-NB, relies on the na\u00efve Bayes methodology using the same domain architecture information. 
Our large-scale benchmark comparisons show strong improvements in the accuracy of the protein function inference achieved by our new methods, demonstrating that domain recurrence and order can provide important information for inference of protein functions.<\/jats:p>\n               <jats:p>Availability: The new models are provided as open source programs at http:\/\/sfb.kaust.edu.sa\/Pages\/Software.aspx.<\/jats:p>\n               <jats:p>Contact: \u00a0dkihara@cs.purdue.edu, xin.gao@kaust.edu.sa<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics Online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts398","type":"journal-article","created":{"date-parts":[[2012,9,7]],"date-time":"2012-09-07T20:35:22Z","timestamp":1347050122000},"page":"i444-i450","source":"Crossref","is-referenced-by-count":27,"title":["Protein domain recurrence and order can enhance prediction of protein functions"],"prefix":"10.1093","volume":"28","author":[{"given":"Mario Abdel","family":"Messih","sequence":"first","affiliation":[{"name":"1 Mathematical and Computer Sciences and Engineering Division"},{"name":"2 Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Meghana","family":"Chitale","sequence":"additional","affiliation":[{"name":"3 Department of Computer Science"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vladimir B.","family":"Bajic","sequence":"additional","affiliation":[{"name":"2 Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daisuke","family":"Kihara","sequence":"additional","affiliation":[{"name":"3 Department of Computer Science"},{"name":"4 Department of Biological Sciences, College of 
Science"},{"name":"5 Markey Center for Structural Biology, Purdue University, West Lafayette, Indiana, USA."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[{"name":"1 Mathematical and Computer Sciences and Engineering Division"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2012,9,3]]},"reference":[{"key":"2023012513030906200_B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped blast and psi blast: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012513030906200_B2","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1093\/nar\/29.1.37","article-title":"The interpro database, an integrated documentation resource for protein families, domains and functional sites","volume":"29","author":"Apweiler","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012513030906200_B3","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1093\/bioinformatics\/btm240","article-title":"Automated improvement of domain annotations using context analysis of domain arrangements (aidan)","volume":"23","author":"Beaussart","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012513030906200_B4","doi-asserted-by":"crossref","first-page":"2007","DOI":"10.1002\/prot.22715","article-title":"Real-time ligand binding pocket database search using local surface descriptors","volume":"78","author":"Chikhi","year":"2010","journal-title":"Proteins"},{"key":"2023012513030906200_B5","doi-asserted-by":"crossref","first-page":"1739","DOI":"10.1093\/bioinformatics\/btp309","article-title":"ESG: extended similarity group method for automated protein function 
prediction","volume":"25","author":"Chitale","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012513030906200_B6","doi-asserted-by":"crossref","first-page":"4516","DOI":"10.1073\/pnas.0737502100","article-title":"Enhanced protein domain discovery by using language modeling techniques from speech recognition","volume":"100","author":"Coin","year":"2003","journal-title":"Proc. Nat. Acad. Sci."},{"key":"2023012513030906200_B7","doi-asserted-by":"crossref","first-page":"D427","DOI":"10.1093\/nar\/gkq1130","article-title":"Superfamily 1.75 including a domain-centric gene ontology method","volume":"39","author":"de Lima Morais","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012513030906200_B8","doi-asserted-by":"crossref","first-page":"D233","DOI":"10.1093\/nar\/gki057","article-title":"The RCSB protein data bank: a redesigned query system and relational database based on the mmCIF schema","volume":"33","author":"Deshpande","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012513030906200_B9","first-page":"109","article-title":"Global sequence properties for superfamily prediction: a machine learning approach","volume":"6","author":"Dobson","year":"2003","journal-title":"J. Integr. Bioinform."},{"key":"2023012513030906200_B10","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1038\/386279a0","article-title":"GRIP: a synaptic PDZ domain-containing protein that interacts with AMPA receptors","volume":"386","author":"Dong","year":"1997","journal-title":"Nature"},{"key":"2023012513030906200_B11","doi-asserted-by":"crossref","first-page":"e45","DOI":"10.1371\/journal.pcbi.0010045","article-title":"Protein molecular function prediction by Bayesian phylogenomics","volume":"1","author":"Engelhardt","year":"2005","journal-title":"PLoS Comput. 
Biol."},{"key":"2023012513030906200_B12","doi-asserted-by":"crossref","first-page":"1681","DOI":"10.1093\/bioinformatics\/btn312","article-title":"Predicting protein function from domain content","volume":"24","author":"Forslund","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012513030906200_B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1142\/S0219720007002503","article-title":"Function prediction of uncharacterized proteins","volume":"5","author":"Hawkins","year":"2007","journal-title":"J. Bioinform. Comput. Biol."},{"key":"2023012513030906200_B14","first-page":"127","article-title":"Gotrees: predicting GO associations from protein domain composition using decision trees","volume":"10","author":"Hayete","year":"2005","journal-title":"Pacific Symp. Biocomput."},{"key":"2023012513030906200_B15","first-page":"31","article-title":"Hierarchical protein classification based on gene ontology and decision trees","volume-title":"ICT Innovations 2010 Web Proceedings","author":"Ivanoska","year":"2010"},{"key":"2023012513030906200_B16","first-page":"65","article-title":"Automatic annotation of protein functional class from sparse and imbalanced data sets","volume":"Volume 4316","author":"Jung","year":"2006"},{"key":"2023012513030906200_B17","doi-asserted-by":"crossref","first-page":"2485","DOI":"10.1093\/bioinformatics\/btg338","article-title":"Gofigure: automated gene ontology annotation","volume":"19","author":"Khan","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012513030906200_B18","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1016\/j.copbio.2009.07.007","article-title":"Generation of new protein functions by nonhomologous combinations and rearrangements of domains and modules","volume":"20","author":"Koide","year":"2009","journal-title":"Curr. Opin. Biotechnol."},{"issue":"Suppl. 
1","key":"2023012513030906200_B19","doi-asserted-by":"crossref","first-page":"S12","DOI":"10.1186\/1471-2148-7-S1-S12","article-title":"Flowerpower: clustering proteins into domain architecture classes for phylogenomic inference of protein function","volume":"7","author":"Krishnamurthy","year":"2007","journal-title":"BMC Evol. Biol."},{"key":"2023012513030906200_B20","doi-asserted-by":"crossref","first-page":"4844","DOI":"10.1128\/MCB.18.8.4844","article-title":"Disabled is a putative adaptor protein that functions during signaling by the sevenless receptor tyrosine kinase","volume":"18","author":"Le","year":"1998","journal-title":"Mol. Cell. Biol."},{"key":"2023012513030906200_B21","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1016\/j.neuron.2005.07.006","article-title":"PICK1 interacts with ABP\/GRIP to regulate AMPA receptor trafficking","volume":"47","author":"Lu","year":"2005","journal-title":"Neuron"},{"key":"2023012513030906200_B22","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1186\/1471-2105-5-178","article-title":"A new method for prediction of protein function assessed by the annotation of seven genomes","volume":"5","author":"Martin","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012513030906200_B23","doi-asserted-by":"crossref","first-page":"2611","DOI":"10.1523\/JNEUROSCI.3670-08.2009","article-title":"A dual role for the adaptor protein DRK in drosophila olfactory learning and memory","volume":"29","author":"Moressis","year":"2009","journal-title":"J. 
Neurosci."},{"key":"2023012513030906200_B24","doi-asserted-by":"crossref","first-page":"D224","DOI":"10.1093\/nar\/gkl841","article-title":"New developments in the interpro database","volume":"35","author":"Mulder","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012513030906200_B25","first-page":"RE7","article-title":"PDZ domain proteins: plug and play!","volume":"179","author":"Nourry","year":"2003","journal-title":"Science STKE"},{"key":"2023012513030906200_B26","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1016\/0092-8674(93)90170-U","article-title":"Disabled is a putative adaptor protein that functions during signaling by the sevenless receptor tyrosine kinase","volume":"73","author":"Olivier","year":"1993","journal-title":"Cell"},{"key":"2023012513030906200_B27","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1038\/372631a0","article-title":"Protein superfamilies and domain superfolds","volume":"372","author":"Orengo","year":"1994","journal-title":"Nature"},{"key":"2023012513030906200_B28","article-title":"Computational approaches for protein function prediction. A Survey","author":"Pandey","year":"2006"},{"key":"2023012513030906200_B29","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1089\/10665270252935539","article-title":"Learning gene functional classifications from multiple data types","volume":"9","author":"Pavlidis","year":"2002","journal-title":"J. Comput. 
Biol."},{"key":"2023012513030906200_B30","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1126\/science.1083653","article-title":"Assembly of cell regulatory systems through protein interaction domains","volume":"300","author":"Pawson","year":"2003","journal-title":"Science"},{"key":"2023012513030906200_B31","doi-asserted-by":"crossref","first-page":"D247","DOI":"10.1093\/nar\/gki024","article-title":"The cath domain structure database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis","volume":"33","author":"Pearl","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012513030906200_B32","doi-asserted-by":"crossref","first-page":"e1000443","DOI":"10.1371\/journal.pcbi.1000443","article-title":"Semantic similarity in biomedical ontologies","volume":"5","author":"Pesquita","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023012513030906200_B33","first-page":"210","article-title":"Protein function prediction the power of multiplicity","volume":"27","author":"Rentzsch","year":"2009","journal-title":"Cell"},{"key":"2023012513030906200_B34","doi-asserted-by":"crossref","first-page":"1259","DOI":"10.1002\/prot.22030","article-title":"Fast protein tertiary structure retrieval based on global surface shape similarity","volume":"72","author":"Sael","year":"2008","journal-title":"Proteins"},{"key":"2023012513030906200_B35","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1007\/s10969-012-9126-6","article-title":"Structure- and sequence-based function prediction for non-homologous proteins","volume":"13","author":"Sael","year":"2012","journal-title":"J. Struct. Funct. 
Genomics"},{"key":"2023012513030906200_B36","volume-title":"Inter-Element Dependency Models for Sequence Classification","author":"Silvescu","year":"2004"},{"key":"2023012513030906200_B37","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1016\/0092-8674(93)90169-Q","article-title":"An SH3-SH2-SH3 protein is required for p21Ras1 activation and binds to sevenless and Sos proteins in vitro","volume":"73","author":"Simon","year":"1993","journal-title":"Cell"},{"key":"2023012513030906200_B38","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1089\/cmb.2007.A009","article-title":"Domain architecture comparison for multidomain homology identification","volume":"14","author":"Song","year":"2007","journal-title":"J. Comput. Biol."},{"key":"2023012513030906200_B39","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1002\/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L","article-title":"Pfam: a comprehensive database of protein domain families based on seed alignments","volume":"28","author":"Sonnhammer","year":"1997","journal-title":"Proteins"},{"key":"2023012513030906200_B40","first-page":"4244","article-title":"Domain content based protein function prediction using incomplete go annotation information","volume-title":"International Conference on Bioinformatics and Biomedicine Workshop","author":"Tan","year":"2009"},{"key":"2023012513030906200_B41","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1186\/1471-2105-5-116","article-title":"Applying support vector machine for gene ontology based gene function prediction","volume":"5","author":"Vinayagam","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012513030906200_B42","doi-asserted-by":"crossref","first-page":"809","DOI":"10.1016\/j.jmb.2003.12.026","article-title":"Supra-domains: evolutionary units larger than single protein domains","volume":"336","author":"Vogel","year":"2004","journal-title":"J. Mol. 
Biol."},{"key":"2023012513030906200_B43","doi-asserted-by":"crossref","first-page":"D187","DOI":"10.1093\/nar\/gkj161","article-title":"The universal protein resource (uniprot): an expanding universe of protein information","volume":"34","author":"Wu","year":"2006","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/18\/i444\/48884009\/bioinformatics_28_18_i444.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/18\/i444\/48884009\/bioinformatics_28_18_i444.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T18:53:19Z","timestamp":1674672799000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/18\/i444\/248130"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9,3]]},"references-count":43,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2012,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts398","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,9,15]]},"published":{"date-parts":[[2012,9,3]]}}}