{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T11:06:28Z","timestamp":1779879988650,"version":"3.53.1"},"reference-count":20,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Worldwide effort on sampling and characterization of molecular variation within a large number of human and animal pathogens has led to the emergence of multi-locus sequence typing (MLST) databases as an important tool for studying the epidemiology and evolution of pathogens. Many of these databases are currently harboring several thousands of multi-locus DNA sequence types (STs) enriched with metadata over traits such as serotype, antibiotic resistance, host organism etc of the isolates. Curators of the databases have thus the possibility of dividing the pathogen populations into subsets representing different evolutionary lineages, geographically associated groups, or other subpopulations, which are defined in terms of molecular similarities and dissimilarities residing within a database. When combined with the existing metadata, such subsets may provide invaluable information for assessing the position of a new set of isolates in relation to the whole pathogen population.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>To enable users of MLST schemes to query the databases with sets of new bacterial isolates and to automatically analyze their relation to existing curated sequences, we introduce here a Bayesian model-based method for semi-supervised classification of MLST data. 
Our method can use an MLST database as a training set and assign simultaneously any set of query sequences into the earlier discovered lineages\/populations, while also allowing some or all of these sequences to form previously undiscovered genetically distinct groups. This tool provides probabilistic quantification of the classification uncertainty and is highly efficient computationally, thus enabling rapid analyses of large databases and sets of query sequences. The latter feature is a necessary prerequisite for an automated access through the MLST web interface. We demonstrate the versatility of our approach by analyzing both real and synthesized data from MLST databases. The introduced method for semi-supervised classification of sets of query STs is freely available for Windows, Mac OS X and Linux operating systems in BAPS 5.4 software which is downloadable at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/web.abo.fi\/fak\/mnf\/mate\/jc\/software\/baps.html\" ext-link-type=\"uri\">http:\/\/web.abo.fi\/fak\/mnf\/mate\/jc\/software\/baps.html<\/jats:ext-link>. The query functionality is also directly available for the <jats:italic>Staphylococcus aureus<\/jats:italic> database at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/www.mlst.net\" ext-link-type=\"uri\">http:\/\/www.mlst.net<\/jats:ext-link> and shortly will be available for other species databases hosted at this web portal.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>We have introduced a model-based tool for automated semi-supervised classification of new pathogen samples that can be integrated into the web interface of the MLST databases. 
In particular, when combined with the existing metadata, the semi-supervised labeling may provide invaluable information for assessing the position of a new set of query strains in relation to the particular pathogen population represented by the curated database.<\/jats:p><jats:p>Such information will be useful both for clinical and basic research purposes.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-12-302","type":"journal-article","created":{"date-parts":[[2011,7,27]],"date-time":"2011-07-27T06:23:49Z","timestamp":1311747829000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Bayesian semi-supervised classification of bacterial samples using MLST databases"],"prefix":"10.1186","volume":"12","author":[{"given":"Lu","family":"Cheng","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thomas R","family":"Connor","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David M","family":"Aanensen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Brian G","family":"Spratt","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jukka","family":"Corander","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2011,7,26]]},"reference":[{"issue":"6","key":"4712_CR1","doi-asserted-by":"publisher","first-page":"3140","DOI":"10.1073\/pnas.95.6.3140","volume":"95","author":"M Maiden","year":"1998","unstructured":"Maiden M, Bygraves J, Feil E, Morelli G, Russell J, Urwin R, Zhang Q, Zhou J, Zurth K, Caugant D, Feavers I, Achtman M, Spratt B: Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. 
Proceedings of the National Academy of Sciences of the United States of America 1998, 95(6):3140\u20133145. 10.1073\/pnas.95.6.3140","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"issue":"3","key":"4712_CR2","doi-asserted-by":"publisher","first-page":"312","DOI":"10.1016\/S1369-5274(99)80054-X","volume":"2","author":"B Spratt","year":"1999","unstructured":"Spratt B: Multilocus sequence typing: molecular typing of bacterial pathogens in an era of rapid DNA sequencing and the internet. Current opinion in microbiology 1999, 2(3):312\u2013316. 10.1016\/S1369-5274(99)80054-X","journal-title":"Current opinion in microbiology"},{"issue":"5","key":"4712_CR3","doi-asserted-by":"publisher","first-page":"1518","DOI":"10.1128\/JB.186.5.1518-1530.2004","volume":"186","author":"E Feil","year":"2004","unstructured":"Feil E, Li B, Aanensen D, Hanage W, Spratt B: eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data. Journal of bacteriology 2004, 186(5):1518\u20131530. 10.1128\/JB.186.5.1518-1530.2004","journal-title":"Journal of bacteriology"},{"key":"4712_CR4","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1016\/j.mbs.2006.09.015","volume":"205","author":"J Corander","year":"2007","unstructured":"Corander J, Tang J: Bayesian analysis of population structure based on linked molecular information. Mathematical biosciences 2007, 205: 19\u201331. 10.1016\/j.mbs.2006.09.015","journal-title":"Mathematical biosciences"},{"issue":"10","key":"4712_CR5","doi-asserted-by":"publisher","first-page":"2833","DOI":"10.1111\/j.1365-294X.2006.02994.x","volume":"15","author":"J Corander","year":"2006","unstructured":"Corander J, Marttinen P: Bayesian identification of admixture events using multilocus molecular markers. Molecular ecology 2006, 15(10):2833\u20132843. 
10.1111\/j.1365-294X.2006.02994.x","journal-title":"Molecular ecology"},{"key":"4712_CR6","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1186\/1471-2105-9-539","volume":"9","author":"J Corander","year":"2008","unstructured":"Corander J, Marttinen P, Sir\u00e9n J, Tang J: Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC bioinformatics 2008, 9: 539. 10.1186\/1471-2105-9-539","journal-title":"BMC bioinformatics"},{"issue":"8","key":"4712_CR7","doi-asserted-by":"publisher","first-page":"e1000455","DOI":"10.1371\/journal.pcbi.1000455","volume":"5","author":"J Tang","year":"2009","unstructured":"Tang J, Hanage W, Fraser C, Corander J: Identifying currents in the gene pool for bacterial populations using an integrative approach. PLoS Computational Biology 2009, 5(8):e1000455. 10.1371\/journal.pcbi.1000455","journal-title":"PLoS Computational Biology"},{"issue":"S1","key":"4712_CR8","doi-asserted-by":"publisher","first-page":"S73","DOI":"10.1186\/1471-2105-10-S1-S73","volume":"10","author":"C Lee","year":"2009","unstructured":"Lee C, Abdool A, Huang C: PCA-based population structure inference with generic clustering algorithms. BMC bioinformatics 2009, 10(S1):S73.","journal-title":"BMC bioinformatics"},{"key":"4712_CR9","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/1471-2156-11-94","volume":"11","author":"T Jombart","year":"2010","unstructured":"Jombart T, Devillard S, Balloux F: Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC genetics 2010, 11: 94.","journal-title":"BMC genetics"},{"key":"4712_CR10","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198522195.001.0001","volume-title":"Graphical models","author":"S Lauritzen","year":"1996","unstructured":"Lauritzen S: Graphical models. 
Oxford: Oxford University Press; 1996."},{"key":"4712_CR11","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316870","volume-title":"Bayesian Theory","author":"JS Bernardo","year":"1994","unstructured":"Bernardo JS, Smith AFM: Bayesian Theory. Chichester: Wiley; 1994."},{"key":"4712_CR12","volume-title":"Pattern recognition and machine learning","author":"C Bishop","year":"2007","unstructured":"Bishop C: Pattern recognition and machine learning. New York: Springer; 2007."},{"key":"4712_CR13","volume-title":"Monte Carlo statistical methods","author":"C Robert","year":"2005","unstructured":"Robert C, Casella G: Monte Carlo statistical methods. New York: Springer; 2005."},{"issue":"5933","key":"4712_CR14","doi-asserted-by":"publisher","first-page":"1454","DOI":"10.1126\/science.1171908","volume":"324","author":"W Hanage","year":"2009","unstructured":"Hanage W, Fraser C, Tang J, Connor T, Corander J: Hyper-recombination, diversity, and antibiotic resistance in pneumococcus. Science 2009, 324(5933):1454\u20131457. 10.1126\/science.1171908","journal-title":"Science"},{"key":"4712_CR15","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1186\/1471-2105-10-90","volume":"10","author":"P Marttinen","year":"2009","unstructured":"Marttinen P, Myllykangas S, Corander J: Bayesian clustering and feature selection for cancer tissue samples. BMC bioinformatics 2009, 10: 90. 10.1186\/1471-2105-10-90","journal-title":"BMC bioinformatics"},{"key":"4712_CR16","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1186\/1471-2105-5-86","volume":"5","author":"K Jolley","year":"2004","unstructured":"Jolley K, Chan M, Maiden M: mlstdbNet - distributed multi-locus sequence typing (MLST) databases. BMC bioinformatics 2004, 5: 86. 
10.1186\/1471-2105-5-86","journal-title":"BMC bioinformatics"},{"issue":"3","key":"4712_CR17","doi-asserted-by":"crossref","first-page":"1008","DOI":"10.1128\/JCM.38.3.1008-1015.2000","volume":"38","author":"M Enright","year":"2000","unstructured":"Enright M, Day N, Davies C, Peacock S, Spratt B: Multilocus sequence typing for characterization of methicillin-resistant and methicillin-susceptible clones of Staphylococcus aureus. Journal of clinical microbiology 2000, 38(3):1008\u20131015.","journal-title":"Journal of clinical microbiology"},{"key":"4712_CR18","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert L, Arabie P: Comparing partitions. Journal of classification 1985, 2: 193\u2013218. 10.1007\/BF01908075","journal-title":"Journal of classification"},{"issue":"8","key":"4712_CR19","doi-asserted-by":"publisher","first-page":"1596","DOI":"10.1093\/molbev\/msm092","volume":"24","author":"K Tamura","year":"2007","unstructured":"Tamura K, Dudley J, Nei M, Kumar S: MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular biology and evolution 2007, 24(8):1596\u20131599. 10.1093\/molbev\/msm092","journal-title":"Molecular biology and evolution"},{"key":"4712_CR20","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1186\/1471-2105-9-421","volume":"9","author":"P Marttinen","year":"2008","unstructured":"Marttinen P, Baldwin A, Hanage W, Dowson C, Mahenthiralingam E, Corander J: Bayesian modeling of recombination events in bacterial populations. BMC bioinformatics 2008, 9: 421. 
10.1186\/1471-2105-9-421","journal-title":"BMC bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-302.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,9]],"date-time":"2024-04-09T08:44:57Z","timestamp":1712652297000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-302"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,7,26]]},"references-count":20,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["4712"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-302","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,7,26]]},"assertion":[{"value":"18 April 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 July 2011","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 July 2011","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"302"}}