{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T09:03:45Z","timestamp":1776762225719,"version":"3.51.2"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T00:00:00Z","timestamp":1606694400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["ABI-1458652"],"award-info":[{"award-number":["ABI-1458652"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,7,19]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>The estimation of large multiple sequence alignments (MSAs) is a basic bioinformatics challenge. Divide-and-conquer is a useful approach that has been shown to improve the scalability and accuracy of MSA estimation in established methods such as SAT\u00e9 and PASTA. In these divide-and-conquer strategies, a sequence dataset is divided into disjoint subsets, alignments are computed on the subsets using base MSA methods (e.g. MAFFT), and then merged together into an alignment on the full dataset.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We present MAGUS, Multiple sequence Alignment using Graph clUStering, a new technique for computing large-scale alignments. MAGUS is similar to PASTA in that it uses nearly the same initial steps (starting tree, similar decomposition strategy, and MAFFT to compute subset alignments), but then merges the subset alignments using the Graph Clustering Merger, a new method for combining disjoint alignments that we present in this study. 
Our study, on a heterogeneous collection of biological and simulated datasets, shows that MAGUS produces improved accuracy and is faster than PASTA on large datasets, and matches it on smaller datasets.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>MAGUS: https:\/\/github.com\/vlasmirnov\/MAGUS<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa992","type":"journal-article","created":{"date-parts":[[2020,11,17]],"date-time":"2020-11-17T20:12:39Z","timestamp":1605643959000},"page":"1666-1672","source":"Crossref","is-referenced-by-count":71,"title":["MAGUS: Multiple sequence Alignment using Graph clUStering"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7826-1214","authenticated-orcid":false,"given":"Vladimir","family":"Smirnov","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign , Urbana, IL 61801, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7717-3514","authenticated-orcid":false,"given":"Tandy","family":"Warnow","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign , Urbana, IL 61801, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,11,30]]},"reference":[{"key":"2023051709553394500_btaa992-B1","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/1471-2105-3-2","article-title":"The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs","volume":"3","author":"Cannone","year":"2002","journal-title":"BMC Bioinf"},{"key":"2023051709553394500_btaa992-B2","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1101\/gr.2821705","article-title":"Probcons: probabilistic 
consistency-based multiple sequence alignment","volume":"15","author":"Do","year":"2005","journal-title":"Genome Res"},{"key":"2023051709553394500_btaa992-B3","author":"Eddy","year":"2020"},{"key":"2023051709553394500_btaa992-B4","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1186\/1471-2105-5-113","article-title":"MUSCLE: a multiple sequence alignment method with reduced time and space complexity","volume":"5","author":"Edgar","year":"2004","journal-title":"BMC Bioinf"},{"key":"2023051709553394500_btaa992-B5","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1038\/s41587-019-0333-6","article-title":"Large multiple sequence alignments with a root-to-leaf regressive method","volume":"37","author":"Garriga","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023051709553394500_btaa992-B6","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/TSSC.1968.300136","article-title":"A formal basis for the heuristic determination of minimum cost paths","volume":"4","author":"Hart","year":"1968","journal-title":"IEEE Trans. Syst. Sci. Cyber"},{"key":"2023051709553394500_btaa992-B7","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1093\/bib\/bbn013","article-title":"Recent developments in the MAFFT multiple sequence alignment program","volume":"9","author":"Katoh","year":"2008","journal-title":"Brief. 
Bioinf"},{"key":"2023051709553394500_btaa992-B8","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1093\/nar\/gki198","article-title":"MAFFT version 5: improvement in accuracy of multiple sequence alignment","volume":"33","author":"Katoh","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023051709553394500_btaa992-B9","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1007\/BFb0029800","volume-title":"Annual Symposium on Combinatorial Pattern Matching","author":"Kececioglu","year":"1993"},{"key":"2023051709553394500_btaa992-B10","doi-asserted-by":"crossref","first-page":"1928","DOI":"10.1093\/bioinformatics\/btz795","article-title":"Kalign 3: multiple sequence alignment of large datasets","volume":"36","author":"Lassmann","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051709553394500_btaa992-B11","doi-asserted-by":"crossref","first-page":"2178","DOI":"10.1101\/gr.1224503","article-title":"OrthoMCL: identification of ortholog groups for eukaryotic genomes","volume":"13","author":"Li","year":"2003","journal-title":"Genome Res"},{"key":"2023051709553394500_btaa992-B12","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1126\/science.1171243","article-title":"Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees","volume":"324","author":"Liu","year":"2009","journal-title":"Science"},{"key":"2023051709553394500_btaa992-B13","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1093\/sysbio\/syr095","article-title":"SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees","volume":"61","author":"Liu","year":"2012","journal-title":"Syst. 
Biol"},{"key":"2023051709553394500_btaa992-B14","doi-asserted-by":"crossref","first-page":"3250","DOI":"10.1093\/bioinformatics\/btr553","article-title":"FastSP: linear time calculation of alignment accuracy","volume":"27","author":"Mirarab","year":"2011","journal-title":"Bioinformatics"},{"key":"2023051709553394500_btaa992-B15","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1089\/cmb.2014.0156","article-title":"PASTA: ultra-large multiple sequence alignment for nucleotide and amino-acid sequences","volume":"22","author":"Mirarab","year":"2015","journal-title":"J. Comput. Biol"},{"key":"2023051709553394500_btaa992-B16","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1186\/s13059-015-0688-z","article-title":"Ultra-large alignments using phylogeny-aware profiles","volume":"16","author":"Nguyen","year":"2015","journal-title":"Genome Biol"},{"key":"2023051709553394500_btaa992-B17","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1006\/jmbi.2000.4042","article-title":"T-Coffee: a novel method for fast and accurate multiple sequence alignment","volume":"302","author":"Notredame","year":"2000","journal-title":"J. Mol. 
Biol"},{"key":"2023051709553394500_btaa992-B18","volume-title":"Intelligent Search Strategies for Computer Problem Solving","author":"Pearl","year":"1984"},{"key":"2023051709553394500_btaa992-B19","doi-asserted-by":"crossref","first-page":"802","DOI":"10.1093\/bioinformatics\/btm017","article-title":"PROMALS: towards accurate multiple sequence alignments of distantly related proteins","volume":"23","author":"Pei","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051709553394500_btaa992-B20","doi-asserted-by":"crossref","first-page":"e9490","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2\u2014approximately maximum-likelihood trees for large alignments","volume":"5","author":"Price","year":"2010","journal-title":"PLoS One"},{"key":"2023051709553394500_btaa992-B21","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol. Syst. 
Biol"},{"key":"2023051709553394500_btaa992-B22","author":"Smirnov","year":"2020"},{"key":"2023051709553394500_btaa992-B23","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1093\/bioinformatics\/15.1.87","article-title":"BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs","volume":"15","author":"Thompson","year":"1999","journal-title":"Bioinformatics"},{"key":"2023051709553394500_btaa992-B24","author":"Van Dongen","year":"2000"},{"key":"2023051709553394500_btaa992-B25","article-title":"MCL manual","author":"Van Dongen","year":"2012"},{"key":"2023051709553394500_btaa992-B26","doi-asserted-by":"crossref","first-page":"i559","DOI":"10.1093\/bioinformatics\/btm226","article-title":"Multiple alignment by aligning alignments","volume":"23","author":"Wheeler","year":"2007","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa992\/35064786\/btaa992.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/12\/1666\/50361316\/btaa992.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/12\/1666\/50361316\/btaa992.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T14:54:41Z","timestamp":1697122481000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/12\/1666\/6012350"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,11,30]]},"references-count":26,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2021,7,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa992","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,6,15]]},"published":{"date-parts":[[2020,11,30]]}}}