{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T17:11:00Z","timestamp":1763399460089},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1623,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Accurate alignment of large numbers of sequences is demanding and the computational burden is further increased by downstream analyses depending on these alignments. With the abundance of sequence data, an integrative approach of adding new sequences to existing alignments without their full re-computation and maintaining the relative matching of existing sequences is an attractive option. Another current challenge is the extension of reference alignments with fragmented sequences, as those coming from next-generation metagenomics, that contain relatively little information. Widely used methods for alignment extension are based on profile representation of reference sequences. These do not incorporate and use phylogenetic information and are affected by the composition of the reference alignment and the phylogenetic positions of query sequences.<\/jats:p>\n               <jats:p>Results: We have developed a method for phylogeny-aware alignment of partial-order sequence graphs and apply it here to the extension of alignments with new data. Our new method, called PAGAN, infers ancestral sequences for the reference alignment and adds new sequences in their phylogenetic context, either to predefined positions or by finding the best placement for sequences of unknown origin. Unlike profile-based alternatives, PAGAN considers the phylogenetic relatedness of the sequences and is not affected by inclusion of more diverged sequences in the reference set. Our analyses show that PAGAN outperforms alternative methods for alignment extension and provides superior accuracy for both DNA and protein data, the improvement being especially large for fragmented sequences. Moreover, PAGAN-generated alignments of noisy next-generation sequencing (NGS) sequences are accurate enough for the use of RNA-seq data in evolutionary analyses.<\/jats:p>\n               <jats:p>Availability: PAGAN is written in C++, licensed under the GPL and its source code is available at http:\/\/code.google.com\/p\/pagan-msa.<\/jats:p>\n               <jats:p>Contact: \u00a0ari.loytynoja@helsinki.fi<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts198","type":"journal-article","created":{"date-parts":[[2012,4,25]],"date-time":"2012-04-25T00:58:05Z","timestamp":1335315485000},"page":"1684-1691","source":"Crossref","is-referenced-by-count":121,"title":["Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm"],"prefix":"10.1093","volume":"28","author":[{"given":"Ari","family":"L\u00f6ytynoja","sequence":"first","affiliation":[{"name":"1 EMBL-European Bioinformatics Institute, Hinxton, CB10 1SD, UK and 2 Institute of Biotechnology, 00014 University of Helsinki, Finland"},{"name":"1 EMBL-European Bioinformatics Institute, Hinxton, CB10 1SD, UK and 2 Institute of Biotechnology, 00014 University of Helsinki, Finland"}]},{"given":"Albert J.","family":"Vilella","sequence":"additional","affiliation":[{"name":"1 EMBL-European Bioinformatics Institute, Hinxton, CB10 1SD, UK and 2 Institute of Biotechnology, 00014 University of Helsinki, Finland"}]},{"given":"Nick","family":"Goldman","sequence":"additional","affiliation":[{"name":"1 EMBL-European Bioinformatics Institute, Hinxton, CB10 1SD, UK and 2 Institute of Biotechnology, 00014 University of Helsinki, Finland"}]}],"member":"286","published-online":{"date-parts":[[2012,4,23]]},"reference":[{"key":"2023012512380742100_B1","doi-asserted-by":"crossref","first-page":"2068","DOI":"10.1093\/bioinformatics\/btr320","article-title":"Aligning short reads to reference alignments and trees","volume":"27","author":"Berger","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012512380742100_B2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1093\/sysbio\/syr010","article-title":"Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood","volume":"60","author":"Berger","year":"2011","journal-title":"Syst. Biol."},{"key":"2023012512380742100_B3","doi-asserted-by":"crossref","first-page":"R37","DOI":"10.1186\/gb-2010-11-4-r37","article-title":"Phylogenetic assessment of alignments reveals neglected tree signal in gaps","volume":"11","author":"Dessimoz","year":"2010","journal-title":"Genome Biol."},{"key":"2023012512380742100_B4","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated Profile HMM Searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comput. Biol."},{"key":"2023012512380742100_B5","doi-asserted-by":"crossref","first-page":"1879","DOI":"10.1093\/molbev\/msp098","article-title":"INDELible: a flexible simulator of biological sequence evolution","volume":"26","author":"Fletcher","year":"2009","journal-title":"Mol. Biol. Evol."},{"key":"2023012512380742100_B6","doi-asserted-by":"crossref","first-page":"2257","DOI":"10.1093\/molbev\/msq115","article-title":"The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection","volume":"27","author":"Fletcher","year":"2010","journal-title":"Mol. Biol. Evol."},{"key":"2023012512380742100_B7","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/0022-2836(82)90398-9","article-title":"An improved algorithm for matching biological sequences","volume":"162","author":"Gotoh","year":"1982","journal-title":"J. Mol. Biol."},{"key":"2023012512380742100_B8","first-page":"649","article-title":"A new method that simultaneously aligns and reconstructs ancestral sequences for any number of homologous sequences, when the phylogeny is given","volume":"6","author":"Hein","year":"1989","journal-title":"Mol. Biol. Evol."},{"key":"2023012512380742100_B9","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1093\/molbev\/msr272","article-title":"The effects of alignment error and alignment filtering on the sitewise detection of positive selection","volume":"29","author":"Jordan","year":"2012","journal-title":"Mol. Biol. Evol."},{"key":"2023012512380742100_B10","doi-asserted-by":"crossref","first-page":"3059","DOI":"10.1093\/nar\/gkf436","article-title":"MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform","volume":"30","author":"Katoh","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023012512380742100_B11","first-page":"265","article-title":"An anthology of algorithms and concepts for sequence comparison","volume-title":"Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison","author":"Kruskal","year":"1983"},{"key":"2023012512380742100_B12","doi-asserted-by":"crossref","first-page":"2947","DOI":"10.1093\/bioinformatics\/btm404","article-title":"Clustal W and Clustal X version 2.0","volume":"23","author":"Larkin","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012512380742100_B13","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1093\/bioinformatics\/18.3.452","article-title":"Multiple sequence alignment using partial order graphs","volume":"18","author":"Lee","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012512380742100_B14","doi-asserted-by":"crossref","first-page":"10557","DOI":"10.1073\/pnas.0409137102","article-title":"An algorithm for progressive multiple alignment of sequences with insertions","volume":"102","author":"L\u00f6ytynoja","year":"2005","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012512380742100_B15","doi-asserted-by":"crossref","first-page":"1632","DOI":"10.1126\/science.1158395","article-title":"Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis","volume":"320","author":"L\u00f6ytynoja","year":"2008","journal-title":"Science"},{"key":"2023012512380742100_B16","doi-asserted-by":"crossref","first-page":"1528","DOI":"10.1126\/science.1175949","article-title":"Uniting alignments and trees","volume":"324","author":"L\u00f6ytynoja","year":"2009","journal-title":"Science"},{"key":"2023012512380742100_B17","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1101\/gr.115949.110","article-title":"High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes","volume":"21","author":"Markova-Raina","year":"2011","journal-title":"Genome Res."},{"key":"2023012512380742100_B18","author":"Massingham","year":"2012","journal-title":"simNGS and simLibrary \u2013 software for simulating next-gen sequencing data."},{"key":"2023012512380742100_B19","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1186\/1471-2105-11-538","article-title":"pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree","volume":"11","author":"Matsen","year":"2010","journal-title":"BMC Bioinform."},{"key":"2023012512380742100_B20","first-page":"247","article-title":"SEPP: SAT\u00e9-enabled phylogenetic placement","volume":"17","author":"Mirarab","year":"2012","journal-title":"Proc. Pac. Symp. Biocomput."},{"key":"2023012512380742100_B21","doi-asserted-by":"crossref","first-page":"1829","DOI":"10.1101\/gr.076521.108","article-title":"Genome-wide nucleotide-level mammalian ancestor reconstruction","volume":"18","author":"Paten","year":"2008","journal-title":"Genome Res."},{"key":"2023012512380742100_B22","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1137\/0128004","article-title":"Minimal mutation trees of sequences","volume":"28","author":"Sankoff","year":"1975","journal-title":"SIAM J. Appl. Math."},{"key":"2023012512380742100_B23","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol. Syst. Biol."},{"key":"2023012512380742100_B24","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/1471-2105-6-31","article-title":"Automated generation of heuristics for biological sequence comparison","volume":"6","author":"Slater","year":"2005","journal-title":"BMC Bioinform."},{"key":"2023012512380742100_B25","doi-asserted-by":"crossref","first-page":"2688","DOI":"10.1093\/bioinformatics\/btl446","article-title":"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models","volume":"22","author":"Stamatakis","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012512380742100_B26","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1186\/1471-2164-11-461","article-title":"MLTreeMap\u2013accurate maximum likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies","volume":"11","author":"Stark","year":"2010","journal-title":"BMC Genomics"},{"key":"2023012512380742100_B27","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1093\/genetics\/155.1.431","article-title":"Codon-substitution models for heterogeneous selection pressure at amino acid sites","volume":"155","author":"Yang","year":"2000","journal-title":"Genetics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/13\/1684\/48867580\/bioinformatics_28_13_1684.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/13\/1684\/48867580\/bioinformatics_28_13_1684.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T16:39:44Z","timestamp":1674664784000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/13\/1684\/234353"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4,23]]},"references-count":27,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2012,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts198","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,7,1]]},"published":{"date-parts":[[2012,4,23]]}}}