{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T12:58:55Z","timestamp":1765544335932},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"23","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2568,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, it is expected that even the best alignment will contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to manually remove them. Although considered necessary in some cases, manual editing is time consuming and not reproducible. We present here an automated editing method based on the classification of \u2018valid\u2019 and \u2018invalid\u2019 sites.<\/jats:p>\n               <jats:p>Results: A support vector machine (SVM) classifier is trained to reproduce the decisions made during manual editing with an accuracy of 95.0%. This implies that manual editing can be made reproducible and applied to large-scale analyses. We further demonstrate that it is possible to retrain\/extend the training of the classifier by providing examples of multiple sequence alignment (MSA) annotation. Near optimal training can be achieved with only 1000 annotated sites, or roughly three samples of protein sequence alignments.<\/jats:p>\n               <jats:p>Availability: This method is implemented in the software MANUEL, licensed under the GPL. 
A web-based application for single and batch job is available at http:\/\/fester.cs.dal.ca\/manuel.<\/jats:p>\n               <jats:p>Contact: \u00a0cblouin@cs.dal.ca<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp552","type":"journal-article","created":{"date-parts":[[2009,9,22]],"date-time":"2009-09-22T01:46:16Z","timestamp":1253583976000},"page":"3093-3098","source":"Crossref","is-referenced-by-count":6,"title":["Reproducing the manual annotation of multiple sequence alignments using a SVM classifier"],"prefix":"10.1093","volume":"25","author":[{"given":"Christian","family":"Blouin","sequence":"first","affiliation":[{"name":"1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, Canada"},{"name":"1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, Canada"},{"name":"1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, 
Canada"}]},{"given":"Scott","family":"Perry","sequence":"additional","affiliation":[{"name":"1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, Canada"}]},{"given":"Allan","family":"Lavell","sequence":"additional","affiliation":[{"name":"1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, Canada"}]},{"given":"Edward","family":"Susko","sequence":"additional","affiliation":[{"name":"1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, Canada"},{"name":"1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, Canada"}]},{"given":"Andrew J.","family":"Roger","sequence":"additional","affiliation":[{"name":"1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS 
B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, Canada"},{"name":"1 Department of Biochemistry and Molecular Biology, Dalhousie University, Sir Charles Tupper Medical Building, Halifax NS B3H 1X5, 2 Faculty of Computer Science, Dalhousie University, Halifax NS B3H 5W1, 3 Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University and 4 Department of Mathematics and Statistics, Dalhousie University, Halifax NS B3H 6J3, Canada"}]}],"member":"286","published-online":{"date-parts":[[2009,9,21]]},"reference":[{"key":"2023013112192996100_B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023013112192996100_B2","doi-asserted-by":"crossref","first-page":"14332","DOI":"10.1073\/pnas.0504068102","article-title":"Highways of gene sharing in prokaryotes","volume":"102","author":"Beiko","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112192996100_B3","doi-asserted-by":"crossref","first-page":"e1000392","DOI":"10.1371\/journal.pcbi.1000392","article-title":"Fast statistical alignment","volume":"5","author":"Bradley","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023013112192996100_B4","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1093\/oxfordjournals.molbev.a026334","article-title":"Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis","volume":"17","author":"Castresana","year":"2000","journal-title":"Mol. Biol. 
Evol."},{"key":"2023013112192996100_B5","author":"Chang","year":"2001","journal-title":"LIBSVM: a library for support vector machines."},{"key":"2023013112192996100_B6","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1101\/gr.2821705","article-title":"ProbCons: probabilistic consistency-based multiple sequence alignment","volume":"15","author":"Do","year":"2005","journal-title":"Genome Res."},{"key":"2023013112192996100_B7","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1186\/1471-2105-7-188","article-title":"Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics","volume":"7","author":"Dutheil","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023013112192996100_B8","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023013112192996100_B9","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023013112192996100_B10","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recogn. Lett."},{"key":"2023013112192996100_B11","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1007\/BF02603120","article-title":"Progressive sequence alignment as a prerequisite to correct phylogenetic trees","volume":"25","author":"Feng","year":"1987","journal-title":"J. Mol. 
Evol."},{"key":"2023013112192996100_B12","doi-asserted-by":"crossref","first-page":"D281","DOI":"10.1093\/nar\/gkm960","article-title":"The Pfam protein families database","volume":"36","author":"Finn","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013112192996100_B13","doi-asserted-by":"crossref","first-page":"1576","DOI":"10.1093\/molbev\/msn103","article-title":"How well does the HoT score reflect sequence alignment accuracy?","volume":"25","author":"Hall","year":"2008","journal-title":"Mol. Biol. Evol."},{"key":"2023013112192996100_B14","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1089\/cmb.1998.5.493","article-title":"Dynamic programming alignment accuracy","volume":"5","author":"Holmes","year":"1998","journal-title":"J. Comput. Biol."},{"key":"2023013112192996100_B15","first-page":"275","article-title":"The rapid generation of mutation data matrices from protein sequences","volume":"8","author":"Jones","year":"1992","journal-title":"Comput. Appl. Biosci."},{"key":"2023013112192996100_B16","doi-asserted-by":"crossref","first-page":"1380","DOI":"10.1093\/molbev\/msm060","article-title":"Heads or tails: a simple reliability check for multiple sequence alignments","volume":"24","author":"Landan","year":"2007","journal-title":"Mol. Biol. Evol."},{"key":"2023013112192996100_B17","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1016\/j.gene.2008.05.016","article-title":"Characterization of pairwise and multiple sequence alignment errors","volume":"441","author":"Landan","year":"2009","journal-title":"Gene"},{"issue":"Suppl. 
5","key":"2023013112192996100_B18","doi-asserted-by":"crossref","first-page":"S9","DOI":"10.1186\/1471-2105-8-S5-S9","article-title":"Automatic extraction of reliable regions from multiple sequence alignments","volume":"8","author":"Lassmann","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013112192996100_B19","doi-asserted-by":"crossref","first-page":"7120","DOI":"10.1093\/nar\/gki1020","article-title":"Automatic assessment of alignment quality","volume":"33","author":"Lassmann","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023013112192996100_B20","doi-asserted-by":"crossref","first-page":"1632","DOI":"10.1126\/science.1158395","article-title":"Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis","volume":"320","author":"L\u00f6ytynoja","year":"2008","journal-title":"Science"},{"key":"2023013112192996100_B21","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1101\/gr.6725608","article-title":"Uncertainty in homology inferences: assessing and improving genomic sequence alignment","volume":"18","author":"Lunter","year":"2008","journal-title":"Genome Res."},{"key":"2023013112192996100_B22","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol."},{"key":"2023013112192996100_B23","doi-asserted-by":"crossref","first-page":"e123","DOI":"10.1371\/journal.pcbi.0030123","article-title":"Recent evolutions of multiple sequence alignment algorithms","volume":"3","author":"Notredame","year":"2007","journal-title":"PLoS Comput. 
Biol."},{"key":"2023013112192996100_B24","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1006\/jmbi.2000.4042","article-title":"T-Coffee: a novel method for fast and accurate multiple sequence alignment","volume":"302","author":"Notredame","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023013112192996100_B25","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1186\/1471-2105-7-471","article-title":"The accuracy of several multiple sequence alignment programs for proteins","volume":"7","author":"Nuin","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023013112192996100_B26","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1080\/10635150500541730","article-title":"Multiple sequence alignment accuracy and phylogenetic inference","volume":"55","author":"Ogden","year":"2006","journal-title":"Syst. Biol."},{"key":"2023013112192996100_B27","doi-asserted-by":"crossref","first-page":"700","DOI":"10.1093\/bioinformatics\/17.8.700","article-title":"AL2CO: calculation of positional conservation in a protein sequence alignment","volume":"17","author":"Pei","year":"2001","journal-title":"Bioinformatics"},{"key":"2023013112192996100_B28","doi-asserted-by":"crossref","first-page":"1931","DOI":"10.1093\/molbev\/msp105","article-title":"A machine-learning approach reveals that alignment properties alone can accurately predict inference of lateral gene transfer from discordant phylogenies","volume":"26","author":"Roettger","year":"2009","journal-title":"Mol. Biol. Evol."},{"key":"2023013112192996100_B29","first-page":"406","article-title":"The neighbor-joining method: a new method for reconstructing phylogenetic trees","volume":"4","author":"Saitou","year":"1987","journal-title":"Mol. Biol. 
Evol."},{"key":"2023013112192996100_B30","doi-asserted-by":"crossref","first-page":"482","DOI":"10.1109\/CSB.2003.1227381","article-title":"Automatic recognition of regions of intrinsically poor multiple alignment using machine learning","author":"Shan","year":"2003","journal-title":"Proceedings of the 2003 IEEE Bioinformatics Conference (CSB2003)"},{"key":"2023013112192996100_B31","doi-asserted-by":"crossref","first-page":"3940","DOI":"10.1093\/bioinformatics\/bti623","article-title":"ROCR: visualizing classifier performance in R","volume":"21","author":"Sing","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112192996100_B32","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"2023013112192996100_B33","volume-title":"R: A Language and Environment for Statistical Computing.","author":"R Development Core Team","year":"2009"},{"key":"2023013112192996100_B34","doi-asserted-by":"crossref","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","article-title":"CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice","volume":"22","author":"Thompson","year":"1994","journal-title":"Nucleic Acids Res."},{"key":"2023013112192996100_B35","doi-asserted-by":"crossref","first-page":"2682","DOI":"10.1093\/nar\/27.13.2682","article-title":"A comprehensive comparison of multiple sequence alignment programs","volume":"27","author":"Thompson","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"2023013112192996100_B36","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1002\/prot.20527","article-title":"BAliBASE 3.0: latest developments of the multiple sequence alignment 
benchmark","volume":"61","author":"Thompson","year":"2005","journal-title":"Proteins"},{"key":"2023013112192996100_B37","doi-asserted-by":"crossref","first-page":"1428","DOI":"10.1093\/bioinformatics\/bth116","article-title":"Align-m\u2013a new algorithm for multiple alignment of highly divergent sequences","volume":"20","author":"Van Walle","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013112192996100_B38","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1126\/science.1151532","article-title":"Alignment uncertainty and genomic analysis","volume":"319","author":"Wong","year":"2008","journal-title":"Science"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/23\/3093\/48997890\/bioinformatics_25_23_3093.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/23\/3093\/48997890\/bioinformatics_25_23_3093.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:59:12Z","timestamp":1675202352000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/23\/3093\/215514"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,9,21]]},"references-count":38,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2009,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp552","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,12,1]]},"published":{"date-parts":[[2009,9,21]]}}}