{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T02:21:38Z","timestamp":1769048498639,"version":"3.49.0"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Recent developments in sequence alignment software have made possible multiple sequence alignments (MSAs) of &gt;100000 sequences in reasonable times. At present, there are no systematic analyses concerning the scalability of the alignment quality as the number of aligned sequences is increased.<\/jats:p>\n               <jats:p>Results: We benchmarked a wide range of widely used MSA packages using a selection of protein families with some known structures and found that the accuracy of such alignments decreases markedly as the number of sequences grows. This is more or less true of all packages and protein families. The phenomenon is mostly due to the accumulation of alignment errors, rather than problems in guide-tree construction. This is partly alleviated by using iterative refinement or selectively adding sequences. The average accuracy of progressive methods by comparison with structure-based benchmarks can be improved by incorporating information derived from high-quality structural alignments of sequences with solved structures. 
This suggests that the availability of high quality curated alignments will have to complement algorithmic and\/or software developments in the long-term.<\/jats:p>\n               <jats:p>Availability and implementation: Benchmark data used in this study are available at http:\/\/www.clustal.org\/omega\/homfam-20110613-25.tar.gz and http:\/\/www.clustal.org\/omega\/bali3fam-26.tar.gz.<\/jats:p>\n               <jats:p>Contact: \u00a0fabian.sievers@ucd.ie<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt093","type":"journal-article","created":{"date-parts":[[2013,2,22]],"date-time":"2013-02-22T01:50:54Z","timestamp":1361497854000},"page":"989-995","source":"Crossref","is-referenced-by-count":49,"title":["Making automated multiple alignments of very large numbers of protein sequences"],"prefix":"10.1093","volume":"29","author":[{"given":"Fabian","family":"Sievers","sequence":"first","affiliation":[{"name":"1 School of Medicine and Medical Science, Conway Institute, University College Dublin, Dublin 4, Ireland, 2Department of Bioengineering, University of California, Berkeley, CA 94729-1762, USA and 3Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Dineen","sequence":"additional","affiliation":[{"name":"1 School of Medicine and Medical Science, Conway Institute, University College Dublin, Dublin 4, Ireland, 2Department of Bioengineering, University of California, Berkeley, CA 94729-1762, USA and 3Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andreas","family":"Wilm","sequence":"additional","affiliation":[{"name":"1 School of Medicine and Medical Science, Conway Institute, University College Dublin, Dublin 4, Ireland, 2Department of 
Bioengineering, University of California, Berkeley, CA 94729-1762, USA and 3Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Desmond G.","family":"Higgins","sequence":"additional","affiliation":[{"name":"1 School of Medicine and Medical Science, Conway Institute, University College Dublin, Dublin 4, Ireland, 2Department of Bioengineering, University of California, Berkeley, CA 94729-1762, USA and 3Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2013,2,21]]},"reference":[{"key":"2023012810303989200_btt093-B1","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1093\/protein\/1.2.89","article-title":"Evaluation and improvements in the automatic alignment of protein sequences","volume":"1","author":"Barton","year":"1987","journal-title":"Protein Eng."},{"key":"2023012810303989200_btt093-B2","doi-asserted-by":"crossref","first-page":"2068","DOI":"10.1093\/bioinformatics\/btr320","article-title":"Aligning short reads to reference alignments and trees","volume":"27","author":"Berger","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B3","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1186\/1748-7188-5-21","article-title":"Sequence emBedding for fast construction of guide trees for multiple sequence alignment","volume":"5","author":"Blackshields","year":"2010","journal-title":"Algorithms Mol. Biol."},{"key":"2023012810303989200_btt093-B4","doi-asserted-by":"crossref","first-page":"e1000392","DOI":"10.1371\/journal.pcbi.1000392","article-title":"Fast statistical alignment","volume":"5","author":"Bradley","year":"2009","journal-title":"PLoS Comput. 
Biol."},{"key":"2023012810303989200_btt093-B5","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1038\/nature11510","article-title":"Epistasis as the primary factor in molecular evolution","volume":"490","author":"Breen","year":"2012","journal-title":"Nature"},{"key":"2023012810303989200_btt093-B6","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1101\/gr.2821705","article-title":"ProbCons: probabilistic consistency-based multiple sequence alignment","volume":"15","author":"Do","year":"2005","journal-title":"Genome Res."},{"key":"2023012810303989200_btt093-B7","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012810303989200_btt093-B8","doi-asserted-by":"crossref","first-page":"2460","DOI":"10.1093\/bioinformatics\/btq461","article-title":"Search and clustering orders of magnitude faster than BLAST","volume":"26","author":"Edgar","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B9","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"HMMER web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012810303989200_btt093-B10","doi-asserted-by":"crossref","first-page":"3059","DOI":"10.1093\/nar\/gkf436","article-title":"MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform","volume":"30","author":"Katoh","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023012810303989200_btt093-B11","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1093\/nar\/gki198","article-title":"MAFFT version 5: improvement in accuracy of multiple sequence alignment","volume":"33","author":"Katoh","year":"2005","journal-title":"Nucleic Acids 
Res."},{"key":"2023012810303989200_btt093-B12","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1093\/bioinformatics\/btl592","article-title":"PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences","volume":"23","author":"Katoh","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B13","doi-asserted-by":"crossref","first-page":"3144","DOI":"10.1093\/bioinformatics\/bts578","article-title":"Adding unaligned sequences into an existing alignment using MAFFT and LAST","volume":"28","author":"Katoh","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B14","doi-asserted-by":"crossref","first-page":"2455","DOI":"10.1093\/bioinformatics\/btp452","article-title":"Upcoming challenges for multiple sequence alignment methods in the high-throughput era","volume":"25","author":"Kemena","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B16","doi-asserted-by":"crossref","first-page":"2947","DOI":"10.1093\/bioinformatics\/btm404","article-title":"Clustal W and Clustal X version 2.0","volume":"23","author":"Larkin","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B17","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1186\/1471-2105-6-298","article-title":"Kalign\u2013an accurate and fast multiple sequence alignment algorithm","volume":"6","author":"Lassmann","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012810303989200_btt093-B18","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1093\/bioinformatics\/18.3.452","article-title":"Multiple sequence alignment using partial order graphs","volume":"18","author":"Lee","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B19","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein 
or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B20","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1126\/science.1171243","article-title":"Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees","volume":"324","author":"Liu","year":"2009","journal-title":"Science"},{"key":"2023012810303989200_btt093-B21","doi-asserted-by":"crossref","first-page":"1958","DOI":"10.1093\/bioinformatics\/btq338","article-title":"MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities","volume":"26","author":"Liu","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B22","doi-asserted-by":"crossref","first-page":"1632","DOI":"10.1126\/science.1158395","article-title":"Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis","volume":"320","author":"L\u00f6ytynoja","year":"2008","journal-title":"Science"},{"key":"2023012810303989200_btt093-B23","doi-asserted-by":"crossref","first-page":"1684","DOI":"10.1093\/bioinformatics\/bts198","article-title":"Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm","volume":"28","author":"L\u00f6ytynoja","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B24","doi-asserted-by":"crossref","first-page":"e28766","DOI":"10.1371\/journal.pone.0028766","article-title":"Protein 3D structure computed from evolutionary sequence variation","volume":"6","author":"Marks","year":"2011","journal-title":"PLoS One"},{"key":"2023012810303989200_btt093-B25","doi-asserted-by":"crossref","first-page":"2469","DOI":"10.1002\/pro.5560071126","article-title":"HOMSTRAD: a database of protein structure alignments for homologous families","volume":"7","author":"Mizuguchi","year":"2008","journal-title":"Protein 
Sci."},{"key":"2023012810303989200_btt093-B26","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1093\/bioinformatics\/14.3.290","article-title":"DIALIGN: finding local similarities by multiple sequence alignment","volume":"14","author":"Morgenstern","year":"1998","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B27","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1006\/jmbi.2000.4042","article-title":"T-Coffee: a novel method for fast and accurate multiple sequence alignment","volume":"302","author":"Notredame","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023012810303989200_btt093-B28","doi-asserted-by":"crossref","first-page":"4364","DOI":"10.1093\/nar\/gkl514","article-title":"MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information","volume":"34","author":"Pei","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012810303989200_btt093-B29","doi-asserted-by":"crossref","first-page":"802","DOI":"10.1093\/bioinformatics\/btm017","article-title":"PROMALS: towards accurate multiple sequence alignments of distantly related proteins","volume":"23","author":"Pei","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B30","doi-asserted-by":"crossref","first-page":"D290","DOI":"10.1093\/nar\/gkr1065","article-title":"The Pfam protein families database","volume":"40","author":"Punta","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012810303989200_btt093-B31","doi-asserted-by":"crossref","first-page":"2715","DOI":"10.1093\/bioinformatics\/btl472","article-title":"Probalign: multiple sequence alignment using partition function posterior probabilities","volume":"22","author":"Roshan","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012810303989200_btt093-B32","doi-asserted-by":"crossref","first-page":"e14454","DOI":"10.1371\/journal.pone.0014454","article-title":"A complete analysis of HA and NA genes of 
influenza A viruses","volume":"5","author":"Shi","year":"2010","journal-title":"PLoS One"},{"key":"2023012810303989200_btt093-B33","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol. Syst. Biol."},{"key":"2023012810303989200_btt093-B34","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1089\/cmb.2006.13.309","article-title":"A polynomial time solvable formulation of multiple sequence alignment","volume":"13","author":"Sze","year":"2006","journal-title":"J. Comput. Biol."},{"key":"2023012810303989200_btt093-B35","doi-asserted-by":"crossref","first-page":"W289","DOI":"10.1093\/nar\/gki390","article-title":"PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information","volume":"33","author":"Simossis","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012810303989200_btt093-B36","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1002\/prot.20527","article-title":"BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark","volume":"61","author":"Thompson","year":"2005","journal-title":"Proteins"},{"key":"2023012810303989200_btt093-B37","doi-asserted-by":"crossref","first-page":"e18093","DOI":"10.1371\/journal.pone.0018093","article-title":"A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives","volume":"6","author":"Thompson","year":"2011","journal-title":"PloS One"},{"key":"2023012810303989200_btt093-B38","doi-asserted-by":"crossref","first-page":"i559","DOI":"10.1093\/bioinformatics\/btm226","article-title":"Multiple alignment by aligning 
alignments","volume":"23","author":"Wheeler","year":"2007","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/8\/989\/48900723\/bioinformatics_29_8_989.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/8\/989\/48900723\/bioinformatics_29_8_989.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,28]],"date-time":"2023-01-28T12:15:28Z","timestamp":1674908128000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/8\/989\/229582"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,2,21]]},"references-count":37,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2013,4,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt093","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2013,4,15]]},"published":{"date-parts":[[2013,2,21]]}}}