{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T03:32:07Z","timestamp":1775791927752,"version":"3.50.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2016,11,7]],"date-time":"2016-11-07T00:00:00Z","timestamp":1478476800000},"content-version":"vor","delay-in-days":126,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones.<\/jats:p>\n               <jats:p>Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included into the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use.<\/jats:p>\n               <jats:p>Availability and Implementation: \u00a0http:\/\/mafft.cbrc.jp\/alignment\/software\/<\/jats:p>\n               <jats:p>Contact: \u00a0katoh@ifrec.osaka-u.ac.jp<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw412","type":"journal-article","created":{"date-parts":[[2016,7,5]],"date-time":"2016-07-05T01:59:34Z","timestamp":1467683974000},"page":"3246-3251","source":"Crossref","is-referenced-by-count":319,"title":["Application of the MAFFT sequence alignment program to large data\u2014reexamination of the usefulness of chained guide trees"],"prefix":"10.1093","volume":"32","author":[{"given":"Kazunori D.","family":"Yamada","sequence":"first","affiliation":[{"name":"1 Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan"},{"name":"2 Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan"}]},{"given":"Kentaro","family":"Tomii","sequence":"additional","affiliation":[{"name":"2 Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan"},{"name":"3 Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan"}]},{"given":"Kazutaka","family":"Katoh","sequence":"additional","affiliation":[{"name":"2 Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan"},{"name":"4 Immunology Frontier Research Center, Osaka University, Suita 565-0871, Japan"}]}],"member":"286","published-online":{"date-parts":[[2016,7,4]]},"reference":[{"key":"2023020113515753900_btw412-B1","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/0022-2836(87)90316-0","article-title":"A strategy for the rapid multiple alignment of protein sequences. confidence levels from tertiary structure comparisons","volume":"198","author":"Barton","year":"1987","journal-title":"J. Mol. Biol"},{"key":"2023020113515753900_btw412-B2","first-page":"479","article-title":"A novel randomized iterative strategy for aligning multiple protein sequences","volume":"7","author":"Berger","year":"1991","journal-title":"Comput. Appl. Biosci"},{"key":"2023020113515753900_btw412-B3","doi-asserted-by":"crossref","first-page":"10556","DOI":"10.1073\/pnas.1405628111","article-title":"Simple chained guide trees give high-quality protein multiple sequence alignments","volume":"111","author":"Boyce","year":"2014","journal-title":"Proc. Natl Acad. Sci. U.S.A"},{"key":"2023020113515753900_btw412-B4","doi-asserted-by":"crossref","first-page":"1625","DOI":"10.1093\/molbev\/msu117","article-title":"TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction","volume":"31","author":"Chang","year":"2014","journal-title":"Mol. Biol. Evol"},{"key":"2023020113515753900_btw412-B5","doi-asserted-by":"crossref","first-page":"113.","DOI":"10.1186\/1471-2105-5-113","article-title":"MUSCLE: a multiple sequence alignment method with reduced time and space complexity","volume":"5","author":"Edgar","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023020113515753900_btw412-B6","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023020113515753900_btw412-B7","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1007\/BF02603120","article-title":"Progressive sequence alignment as a prerequisite to correct phylogenetic trees","volume":"25","author":"Feng","year":"1987","journal-title":"J. Mol. Evol"},{"key":"2023020113515753900_btw412-B8","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"Hmmer web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020113515753900_btw412-B9","doi-asserted-by":"crossref","first-page":"814","DOI":"10.1093\/bioinformatics\/btv592","article-title":"Using \n              de novo\n               protein structure predictions to measure the quality of very large multiple sequence alignments","volume":"32","author":"Fox","year":"2016","journal-title":"Bioinformatics"},{"key":"2023020113515753900_btw412-B10","first-page":"361","article-title":"Optimal alignment between groups of sequences and its application to multiple sequence alignment","volume":"9","author":"Gotoh","year":"1993","journal-title":"Comput. Appl. Biosci"},{"key":"2023020113515753900_btw412-B11","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1016\/0378-1119(88)90330-7","article-title":"CLUSTAL: a package for performing multiple sequence alignment on a microcomputer","volume":"73","author":"Higgins","year":"1988","journal-title":"Gene"},{"key":"2023020113515753900_btw412-B12","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1007\/BF02257378","article-title":"The alignment of sets of sequences and the construction of phyletic trees: an integrated method","volume":"20","author":"Hogeweg","year":"1984","journal-title":"J. Mol. Evol"},{"key":"2023020113515753900_btw412-B13","doi-asserted-by":"crossref","first-page":"15674","DOI":"10.1073\/pnas.1314045110","article-title":"Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era","volume":"110","author":"Kamisetty","year":"2013","journal-title":"Proc. Natl Acad. Sci. U.S.A"},{"key":"2023020113515753900_btw412-B14","doi-asserted-by":"crossref","first-page":"3144","DOI":"10.1093\/bioinformatics\/bts578","article-title":"Adding unaligned sequences into an existing alignment using MAFFT and LAST","volume":"28","author":"Katoh","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020113515753900_btw412-B15","doi-asserted-by":"crossref","first-page":"1933","DOI":"10.1093\/bioinformatics\/btw108","article-title":"A simple method to control over-alignment in the MAFFT multiple sequence alignment program","volume":"32","author":"Katoh","year":"2016","journal-title":"Bioinformatics"},{"key":"2023020113515753900_btw412-B16","doi-asserted-by":"crossref","first-page":"3059","DOI":"10.1093\/nar\/gkf436","article-title":"Mafft: a novel method for rapid multiple sequence alignment based on fast Fourier transform","volume":"30","author":"Katoh","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023020113515753900_btw412-B17","doi-asserted-by":"crossref","first-page":"3250","DOI":"10.1093\/bioinformatics\/btr553","article-title":"FastSP: linear time calculation of alignment accuracy","volume":"27","author":"Mirarab","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020113515753900_btw412-B18","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1089\/cmb.2014.0156","article-title":"PASTA: ultra-large multiple sequence alignment for nucleotide and amino-acid sequences","volume":"22","author":"Mirarab","year":"2015","journal-title":"J. Comput. Biol"},{"key":"2023020113515753900_btw412-B19","doi-asserted-by":"crossref","first-page":"2469","DOI":"10.1002\/pro.5560071126","article-title":"HOMSTRAD: a database of protein structure alignments for homologous families","volume":"7","author":"Mizuguchi","year":"1998","journal-title":"Protein Sci"},{"key":"2023020113515753900_btw412-B20","doi-asserted-by":"crossref","first-page":"124.","DOI":"10.1186\/s13059-015-0688-z","article-title":"Ultra-large alignments using phylogeny-aware profiles","volume":"16","author":"Nguyen","year":"2015","journal-title":"Genome Biol"},{"key":"2023020113515753900_btw412-B21","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1093\/bioinformatics\/14.5.407","article-title":"COFFEE: an objective function for multiple sequence alignments","volume":"14","author":"Notredame","year":"1998","journal-title":"Bioinformatics"},{"key":"2023020113515753900_btw412-B22","doi-asserted-by":"crossref","first-page":"1759","DOI":"10.1093\/molbev\/msq066","article-title":"An alignment confidence score capturing robustness to guide tree uncertainty","volume":"27","author":"Penn","year":"2010","journal-title":"Mol. Biol. Evol"},{"key":"2023020113515753900_btw412-B23","doi-asserted-by":"crossref","first-page":"e9490.","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2\u2014approximately maximum-likelihood trees for large alignments","volume":"5","author":"Price","year":"2010","journal-title":"PLoS One"},{"key":"2023020113515753900_btw412-B24","doi-asserted-by":"crossref","first-page":"47.","DOI":"10.1186\/1471-2105-4-47","article-title":"OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy","volume":"4","author":"Raghava","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023020113515753900_btw412-B25","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol Syst Biol"},{"key":"2023020113515753900_btw412-B26","doi-asserted-by":"crossref","first-page":"338.","DOI":"10.1186\/1471-2105-15-338","article-title":"Systematic exploration of guide-tree topology effects for small protein alignments","volume":"15","author":"Sievers","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023020113515753900_btw412-B27","doi-asserted-by":"crossref","first-page":"E99","DOI":"10.1073\/pnas.1417526112","article-title":"Simple chained guide trees give poorer multiple sequence alignments than inferred trees in simulation and phylogenetic benchmarks","volume":"112","author":"Tan","year":"2015","journal-title":"Proc. Natl Acad. Sci. U.S.A"},{"key":"2023020113515753900_btw412-B28","doi-asserted-by":"crossref","first-page":"2682","DOI":"10.1093\/nar\/27.13.2682","article-title":"A comprehensive comparison of multiple sequence alignment programs","volume":"27","author":"Thompson","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023020113515753900_btw412-B29","doi-asserted-by":"crossref","first-page":"i559","DOI":"10.1093\/bioinformatics\/btm226","article-title":"Multiple alignment by aligning alignments","volume":"23","author":"Wheeler","year":"2007","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/21\/3246\/49021591\/bioinformatics_32_21_3246.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/21\/3246\/49021591\/bioinformatics_32_21_3246.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T23:55:00Z","timestamp":1675295700000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/21\/3246\/2415233"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,7,4]]},"references-count":29,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2016,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw412","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,11,1]]},"published":{"date-parts":[[2016,7,4]]}}}