{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T00:41:09Z","timestamp":1773016869534,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2017,1,16]],"date-time":"2017-01-16T00:00:00Z","timestamp":1484524800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001602","name":"Science Foundation Ireland","doi-asserted-by":"publisher","award":["11\/PI\/1034"],"award-info":[{"award-number":["11\/PI\/1034"]}],"id":[{"id":"10.13039\/501100001602","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Multiple sequence alignment (MSA) is commonly used to analyze sets of homologous protein or DNA sequences. This has led to the development of many methods and packages for MSA over the past 30 years. Being able to compare different methods has been problematic and has relied on gold standard benchmark datasets of \u2018true\u2019 alignments or on MSA simulations. A number of protein benchmark datasets have been produced which rely on a combination of manual alignment and\/or automated superposition of protein structures. These are either restricted to very small MSAs with few sequences or require manual alignment which can be subjective. In both cases, it remains very difficult to properly test MSAs of more than a few dozen sequences. 
PREFAB and HomFam both rely on using a small subset of sequences of known structure and do not fairly test the quality of a full MSA.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this paper we describe QuanTest, a fully automated and highly scalable test system for protein MSAs which is based on using secondary structure prediction accuracy (SSPA) to measure alignment quality. This is based on the assumption that better MSAs will give more accurate secondary structure predictions when we include sequences of known structure. SSPA, however, measures the quality of an entire alignment, not just the accuracy on a handful of selected sequences. It can be scaled to alignments of any size but here we demonstrate its use on alignments of either 200 or 1000 sequences. This allows the testing of slow accurate programs as well as faster, less accurate ones. We show that the scores from QuanTest are highly correlated with existing benchmark scores. 
We also validate the method by comparing a wide range of MSA alignment options and by including different levels of mis-alignment into MSA, and examining the effects on the scores.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>QuanTest is available from http:\/\/www.bioinf.ucd.ie\/download\/QuanTest.tgz<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btw840","type":"journal-article","created":{"date-parts":[[2017,1,17]],"date-time":"2017-01-17T01:43:51Z","timestamp":1484617431000},"page":"1331-1337","source":"Crossref","is-referenced-by-count":34,"title":["Protein multiple sequence alignment benchmarking through secondary structure prediction"],"prefix":"10.1093","volume":"33","author":[{"given":"Quan","family":"Le","sequence":"first","affiliation":[{"name":"Conway Institute, UCD School of Medicine and Medical Science, University College Dublin, Belfield, Dublin, Dublin 4, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fabian","family":"Sievers","sequence":"additional","affiliation":[{"name":"Conway Institute, UCD School of Medicine and Medical Science, University College Dublin, Belfield, Dublin, Dublin 4, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Desmond G","family":"Higgins","sequence":"additional","affiliation":[{"name":"Conway Institute, UCD School of Medicine and Medical Science, University College Dublin, Belfield, Dublin, Dublin 4, 
Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2017,1,16]]},"reference":[{"key":"2023020205032039200_btw840-B1","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/1748-7188-5-21","article-title":"Sequence embedding for fast construction of guide trees for multiple sequence alignment","volume":"5","author":"Blackshields","year":"2010","journal-title":"Algorithms Mol. Biol"},{"key":"2023020205032039200_btw840-B2","doi-asserted-by":"crossref","first-page":"E101","DOI":"10.1073\/pnas.1419351112","article-title":"Reply to tan et al.: Differences between real and simulated proteins in multiple sequence alignments","volume":"112","author":"Boyce","year":"2015","journal-title":"Proc. Natl. Acad. Sci. U. S. A"},{"key":"2023020205032039200_btw840-B3","first-page":"bbv099.","article-title":"Multiple sequence alignment modeling: methods and applications","author":"Chatzou","year":"2015","journal-title":"Brief. Bioinf"},{"key":"2023020205032039200_btw840-B4","doi-asserted-by":"crossref","first-page":"502","DOI":"10.1002\/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-Q","article-title":"Application of multiple sequence alignment profiles to improve protein secondary structure prediction","volume":"40","author":"Cuff","year":"2000","journal-title":"Proteins Struct. Funct. 
Bioinf"},{"key":"2023020205032039200_btw840-B5","doi-asserted-by":"crossref","first-page":"R37.","DOI":"10.1186\/gb-2010-11-4-r37","article-title":"Research phylogenetic assessment of alignments reveals neglected tree signal in gaps","volume":"11","author":"Dessimoz","year":"2010","journal-title":"Genome Biol"},{"key":"2023020205032039200_btw840-B6","doi-asserted-by":"crossref","first-page":"W389","DOI":"10.1093\/nar\/gkv332","article-title":"Jpred4: a protein secondary structure prediction server","volume":"43","author":"Drozdetskiy","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020205032039200_btw840-B7","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023020205032039200_btw840-B8","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"Muscle: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023020205032039200_btw840-B9","doi-asserted-by":"crossref","first-page":"2145","DOI":"10.1093\/nar\/gkp1196","article-title":"Quality measures for protein alignment benchmarks","volume":"38","author":"Edgar","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023020205032039200_btw840-B10","first-page":"d222","article-title":"Pfam: the protein families database","author":"Finn","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020205032039200_btw840-B11","doi-asserted-by":"crossref","first-page":"814","DOI":"10.1093\/bioinformatics\/btv592","article-title":"Using de novo protein structure predictions to measure the quality of very large multiple sequence 
alignments","volume":"32","author":"Fox","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020205032039200_btw840-B12","doi-asserted-by":"crossref","first-page":"W100","DOI":"10.1093\/nar\/gkh464","article-title":"Ce-mc: a multiple protein structure alignment server","volume":"32","author":"Guda","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023020205032039200_btw840-B13","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.sbi.2009.04.003","article-title":"Advances and pitfalls of protein structural alignment","volume":"19","author":"Hasegawa","year":"2009","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023020205032039200_btw840-B14","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/978-1-62703-646-7_4","article-title":"Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment","volume":"1079","author":"Iantorno","year":"2014","journal-title":"Multiple Seq. Alignment Methods"},{"key":"2023020205032039200_btw840-B15","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1006\/jmbi.1999.3091","article-title":"Protein secondary structure prediction based on position-specific scoring matrices","volume":"292","author":"Jones","year":"1999","journal-title":"J. Mol. Biol"},{"key":"2023020205032039200_btw840-B16","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1093\/bioinformatics\/btr638","article-title":"Psicov: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments","volume":"28","author":"Jones","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020205032039200_btw840-B17","doi-asserted-by":"crossref","first-page":"772","DOI":"10.1093\/molbev\/mst010","article-title":"Mafft multiple sequence alignment software version 7: improvements in performance and usability","volume":"30","author":"Katoh","year":"2013","journal-title":"Mol. Biol. 
Evol"},{"key":"2023020205032039200_btw840-B18","doi-asserted-by":"crossref","first-page":"1173","DOI":"10.1016\/j.jmb.2004.12.032","article-title":"Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures","volume":"346","author":"Kolodny","year":"2005","journal-title":"J. Mol. Biol"},{"key":"2023020205032039200_btw840-B19","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1002\/prot.20921","article-title":"Mustang: a multiple structural alignment algorithm","volume":"64","author":"Konagurthu","year":"2006","journal-title":"Proteins Struct. Funct. Bioinf"},{"key":"2023020205032039200_btw840-B20","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1007\/11560500_7","volume-title":"International Symposium on Computational Life Science","author":"Krissinel","year":"2005"},{"key":"2023020205032039200_btw840-B21","doi-asserted-by":"crossref","first-page":"2947","DOI":"10.1093\/bioinformatics\/btm404","article-title":"Clustal w and clustal x version 2.0","volume":"23","author":"Larkin","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020205032039200_btw840-B22","doi-asserted-by":"crossref","first-page":"298.","DOI":"10.1186\/1471-2105-6-298","article-title":"Kalign\u2014an accurate and fast multiple sequence alignment algorithm","volume":"6","author":"Lassmann","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020205032039200_btw840-B23","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020205032039200_btw840-B24","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1038\/nbt.2419","article-title":"Protein structure prediction from sequence variation","volume":"30","author":"Marks","year":"2012","journal-title":"Nat. 
Biotechnol"},{"key":"2023020205032039200_btw840-B25","first-page":"177","volume-title":"International Conference on Research in Computational Molecular Biology","author":"Mirarab","year":"2014"},{"key":"2023020205032039200_btw840-B26","doi-asserted-by":"crossref","first-page":"2469","DOI":"10.1002\/pro.5560071126","article-title":"Homstrad: a database of protein structure alignments for homologous families","volume":"7","author":"Mizuguchi","year":"1998","journal-title":"Protein Sci"},{"key":"2023020205032039200_btw840-B27","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1006\/jmbi.2000.4042","article-title":"T-coffee: a novel method for fast and accurate multiple sequence alignment","volume":"302","author":"Notredame","year":"2000","journal-title":"J. Mol. Biol"},{"key":"2023020205032039200_btw840-B28","doi-asserted-by":"crossref","first-page":"1719","DOI":"10.1093\/bioinformatics\/bti203","article-title":"Porter: a new, accurate server for protein secondary structure prediction","volume":"21","author":"Pollastri","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020205032039200_btw840-B29","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/1471-2105-4-47","article-title":"Oxbench: a benchmark for evaluation of protein multiple sequence alignment accuracy","volume":"4","author":"Raghava","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023020205032039200_btw840-B30","doi-asserted-by":"crossref","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol. Syst. 
Biol"},{"key":"2023020205032039200_btw840-B31","doi-asserted-by":"crossref","first-page":"989","DOI":"10.1093\/bioinformatics\/btt093","article-title":"Making automated multiple alignments of very large numbers of protein sequences","volume":"29","author":"Sievers","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020205032039200_btw840-B32","doi-asserted-by":"crossref","first-page":"E99","DOI":"10.1073\/pnas.1417526112","article-title":"Simple chained guide trees give poorer multiple sequence alignments than inferred trees in simulation and phylogenetic benchmarks","volume":"112","author":"Tan","year":"2015","journal-title":"Proc. Natl. Acad. Sci. U. S. A"},{"key":"2023020205032039200_btw840-B33","doi-asserted-by":"crossref","first-page":"300.","DOI":"10.1186\/s12859-016-1059-9","article-title":"Reduction, alignment and visualisation of large diverse sequence families","volume":"17","author":"Taylor","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023020205032039200_btw840-B34","doi-asserted-by":"crossref","first-page":"1858","DOI":"10.1002\/pro.5560031025","article-title":"Multiple protein structure alignment","volume":"3","author":"Taylor","year":"1994","journal-title":"Protein Sci"},{"key":"2023020205032039200_btw840-B35","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1093\/bioinformatics\/15.1.87","article-title":"Balibase: a benchmark alignment database for the evaluation of multiple alignment programs","volume":"15","author":"Thompson","year":"1999","journal-title":"Bioinformatics"},{"key":"2023020205032039200_btw840-B36","doi-asserted-by":"crossref","first-page":"2682","DOI":"10.1093\/nar\/27.13.2682","article-title":"A comprehensive comparison of multiple sequence alignment programs","volume":"27","author":"Thompson","year":"1999","journal-title":"Nucleic Acids 
Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/9\/1331\/49038951\/bioinformatics_33_9_1331.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/9\/1331\/49038951\/bioinformatics_33_9_1331.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T05:08:06Z","timestamp":1675314486000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/9\/1331\/2908435"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2017,1,16]]},"references-count":36,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2017,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw840","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,5,1]]},"published":{"date-parts":[[2017,1,16]]}}}