{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,15]],"date-time":"2025-04-15T04:57:01Z","timestamp":1744693021741},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2016,11,10]],"date-time":"2016-11-10T00:00:00Z","timestamp":1478736000000},"content-version":"vor","delay-in-days":162,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the \u2018next generation\u2019 of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA.<\/jats:p>\n               <jats:p>Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM\u2009+\u2009Clan, SCOP\/Superfamily or CATH\/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics calculated. We further measured congruence of domain architecture assignments in the three domain databases.<\/jats:p>\n               <jats:p>Results: CSBLAST and PHMMER had overall highest accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization.<\/jats:p>\n               <jats:p>Conclusion: Profile methods are superior at inferring remote homologs but the difference in accuracy between methods is relatively small. PHMMER and CSBLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity.<\/jats:p>\n               <jats:p>Availability and Implementation: Benchmark datasets and all scripts are placed at ( http:\/\/sonnhammer.org\/download\/Homology_benchmark ).<\/jats:p>\n               <jats:p>Contact: \u00a0forslund@embl.de<\/jats:p>\n               <jats:p>Supplementary information : Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw305","type":"journal-article","created":{"date-parts":[[2016,6,3]],"date-time":"2016-06-03T01:12:51Z","timestamp":1464916371000},"page":"2636-2641","source":"Crossref","is-referenced-by-count":16,"title":["Benchmarking the next generation of homology inference tools"],"prefix":"10.1093","volume":"32","author":[{"given":"Ganapathi Varma","family":"Saripella","sequence":"first","affiliation":[{"name":"1 Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden"}]},{"given":"Erik L. L.","family":"Sonnhammer","sequence":"additional","affiliation":[{"name":"1 Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden"}]},{"given":"Kristoffer","family":"Forslund","sequence":"additional","affiliation":[{"name":"2 European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg 69117, Germany"}]}],"member":"286","published-online":{"date-parts":[[2016,6,1]]},"reference":[{"key":"2023020112590447800_btw305-B201","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs","volume":"25","author":"Altschul","year":"1997","journal-title":"NAR"},{"key":"2023020112590447800_btw305-B1","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1016\/S0968-0004(98)01298-5","article-title":"Iterated profile searches with PSI-BLAST \u2013 a tool for discovery in protein databases","volume":"23","author":"Altschul","year":"1998","journal-title":"Trends Biochem. Sci"},{"key":"2023020112590447800_btw305-B2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"JMB"},{"key":"2023020112590447800_btw305-B3","doi-asserted-by":"crossref","first-page":"7353","DOI":"10.1093\/nar\/gkq625","article-title":"Issues in bioinformatics benchmarking: the case study of multiple sequence alignment","volume":"38","author":"Aniba","year":"2010","journal-title":"NAR"},{"key":"2023020112590447800_btw305-B4","doi-asserted-by":"crossref","first-page":"3770","DOI":"10.1073\/pnas.0810767106","article-title":"Sequence context-specific profiles for homology searching","volume":"106","author":"Biegert","year":"2009","journal-title":"PNAS"},{"key":"2023020112590447800_btw305-B5","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/1745-6150-7-12","article-title":"Domain enhanced lookup time accelerated BLAST","volume":"7","author":"Boratyn","year":"2012","journal-title":"Biol. Direct"},{"key":"2023020112590447800_btw305-B6","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkt282","article-title":"BLAST: a more efficient report with usability improvements","volume":"41","author":"Boratyn","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020112590447800_btw305-B7","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1006\/jmbi.1998.2144","article-title":"Predicting function: from genes to genomes and back","volume":"283","author":"Bork","year":"1998","journal-title":"J. Mol. Biol"},{"key":"2023020112590447800_btw305-B8","doi-asserted-by":"crossref","first-page":"D189","DOI":"10.1093\/nar\/gkh034","article-title":"The ASTRAL Compendium in 2004","volume":"32","author":"Chandonia","year":"2004","journal-title":"NAR"},{"key":"2023020112590447800_btw305-B9","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1002\/j.1460-2075.1986.tb04288.x","article-title":"The Relation between the Divergence of Sequence and Structure in Proteins","volume":"5","author":"Chothia","year":"1986","journal-title":"Embo J"},{"key":"2023020112590447800_btw305-B10","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1186\/1472-6807-9-23","article-title":"Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis","volume":"9","author":"Csaba","year":"2009","journal-title":"BMC Struct. Biol"},{"key":"2023020112590447800_btw305-B11","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile Hidden Markov Models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023020112590447800_btw305-B12","first-page":"2460","article-title":"Search and clustering orders of magnitude faster than BLAST","volume":"26","author":"Edgar","year":"2010","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023020112590447800_btw305-B13","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1002\/prot.10043","article-title":"A study on protein sequence alignment quality","volume":"339","author":"Elofsson","year":"2002","journal-title":"Proteins: Struct. Funct. Bioinf"},{"key":"2023020112590447800_btw305-B14","doi-asserted-by":"crossref","first-page":"D222","DOI":"10.1093\/nar\/gkt1223","article-title":"Pfam: the protein families database","volume":"42","author":"Finn","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020112590447800_btw305-B15","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"HMMER web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020112590447800_btw305-B202","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1093\/molbev\/msm254","article-title":"Domain tree-based analysis of protein architecture evolution","volume":"25","author":"Forslund","year":"2008","journal-title":"Mol. Biol. Evol."},{"key":"2023020112590447800_btw305-B16","doi-asserted-by":"crossref","first-page":"2500","DOI":"10.1093\/bioinformatics\/btp446","article-title":"Benchmarking homology detection procedures with low complexity filters","volume":"25","author":"Forslund","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020112590447800_btw305-B17","doi-asserted-by":"crossref","first-page":"D304","DOI":"10.1093\/nar\/gkt1240","article-title":"SCOPe: structural classification of proteins \u2013 extended, integrating SCOP and ASTRAL data and classification of new structures","volume":"42","author":"Fox","year":"2014","journal-title":"NAR"},{"key":"2023020112590447800_btw305-B18","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1006\/jmbi.2001.5080","article-title":"Assignment of homology to genome sequences using a library of Hidden Markov Models that represent all proteins of known structure","volume":"313","author":"Gough","year":"2001","journal-title":"JMB"},{"key":"2023020112590447800_btw305-B19","doi-asserted-by":"crossref","first-page":"1464","DOI":"10.1093\/bioinformatics\/bti204","article-title":"Convergent evolution of domain architectures (is rare)","volume":"21","author":"Gough","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020112590447800_btw305-B20","doi-asserted-by":"crossref","first-page":"4355","DOI":"10.1073\/pnas.84.13.4355","article-title":"Profile analysis: detection of distantly related proteins","volume":"84","author":"Gribskov","year":"1987","journal-title":"PNAS"},{"key":"2023020112590447800_btw305-B21","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/S0097-8485(96)80004-0","article-title":"Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching","volume":"20","author":"Gribskov","year":"1996","journal-title":"Comput. Chem"},{"key":"2023020112590447800_btw305-B22","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"PNAS"},{"key":"2023020112590447800_btw305-B23","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1093\/nar\/27.1.254","article-title":"SCOP: a structural classification of proteins database","volume":"27","author":"Hubbard","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023020112590447800_btw305-B24","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1093\/nar\/gkt1205","article-title":"Gene3D: multi-domain annotations for protein sequence and comparative genome analysis","volume":"42","author":"Lees","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020112590447800_btw305-B25","doi-asserted-by":"crossref","first-page":"D213","DOI":"10.1093\/nar\/gku1243","article-title":"The InterPro protein families database: the classification resource after 15 years","volume":"43","author":"Mitchell","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020112590447800_btw305-B26","doi-asserted-by":"crossref","first-page":"1257","DOI":"10.1006\/jmbi.1999.3233","article-title":"Benchmarking PSI-BLAST in Genome Annotation","volume":"293","author":"M\u00fcller","year":"1999","journal-title":"J. Mol. Biol"},{"key":"2023020112590447800_btw305-B27","doi-asserted-by":"crossref","first-page":"D227","DOI":"10.1093\/nar\/gku1041","article-title":"The SUPERFAMILY 1.75 database in 2014: a doubling of data","volume":"43","author":"Oates","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020112590447800_btw305-B28","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"PNAS"},{"key":"2023020112590447800_btw305-B29","doi-asserted-by":"crossref","first-page":"14717.","DOI":"10.1038\/srep14717","article-title":"An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life","volume":"5","author":"Roche","year":"2015","journal-title":"Sci. Rep"},{"key":"2023020112590447800_btw305-B30","doi-asserted-by":"crossref","first-page":"D13","DOI":"10.1093\/nar\/gkr1184","article-title":"Database resources of the national center for biotechnology information","volume":"40","author":"Sayers","year":"2012","journal-title":"NAR"},{"key":"2023020112590447800_btw305-B31","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023020112590447800_btw305-B32","first-page":"951","article-title":"Protein homology detection by HMM-HMM Comparison","volume":"21","author":"S\u00f6ding","year":"2005","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023020112590447800_btw305-B33","doi-asserted-by":"crossref","first-page":"D142","DOI":"10.1093\/nar\/gkp846","article-title":"The Universal Protein Resource (UniProt) in 2010","volume":"38","author":"The Uniprot Consortium","year":"2010","journal-title":"NAR"},{"key":"2023020112590447800_btw305-B34","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1006\/jmbi.2001.4513","article-title":"Evolution of function in protein superfamilies, from a structural perspective","volume":"307","author":"Todd","year":"2001","journal-title":"JMB"},{"key":"2023020112590447800_btw305-B35","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1016\/j.sbi.2004.03.011","article-title":"Structure, function and evolution of multidomain proteins","volume":"14","author":"Vogel","year":"2004","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023020112590447800_btw305-B36","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1016\/S0022-2836(02)01336-0","article-title":"An accurate, sensitive, and scalable method to identify functional sites in protein structures","volume":"326","author":"Yao","year":"2003","journal-title":"JMB"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/17\/2636\/49021522\/bioinformatics_32_17_2636.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/17\/2636\/49021522\/bioinformatics_32_17_2636.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:59:13Z","timestamp":1675292353000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/17\/2636\/2450749"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,1]]},"references-count":38,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2016,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw305","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,9,1]]},"published":{"date-parts":[[2016,6,1]]}}}