{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T00:55:49Z","timestamp":1780448149254,"version":"3.54.1"},"reference-count":39,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2020,12,15]],"date-time":"2020-12-15T00:00:00Z","timestamp":1607990400000},"content-version":"vor","delay-in-days":349,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for the identification of MTBC lineages using classical genotyping methods such as mycobacterial interspersed repetitive units\u2014variable number of tandem DNA repeats and spoligotyping-based families. In the recently released SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe, a large number of spoligotype families were assigned by either manual curation\/expertise or using an in-house algorithm. In this study, we present two complementary data-driven approaches allowing fast and precise family prediction from spoligotyping patterns. The first one is based on data transformation and the use of decision tree classifiers. In contrast, the second one searches for a set of simple rules using binary masks through a specifically designed evolutionary algorithm. The comparison with the three main approaches in the field highlighted the good performances of our contributions and the significant runtime gain. Finally, we propose the \u2018SpolLineages\u2019 software tool (https:\/\/github.com\/dcouvin\/SpolLineages), which implements these approaches for MTBC spoligotype families\u2019 identification.<\/jats:p>","DOI":"10.1093\/database\/baaa108","type":"journal-article","created":{"date-parts":[[2020,11,20]],"date-time":"2020-11-20T20:24:00Z","timestamp":1605903840000},"source":"Crossref","is-referenced-by-count":15,"title":["Novel methods included in SpolLineages tool for fast and precise prediction of <i>Mycobacterium tuberculosis<\/i> complex spoligotype families"],"prefix":"10.1093","volume":"2020","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7897-9023","authenticated-orcid":false,"given":"David","family":"Couvin","sequence":"first","affiliation":[{"name":"WHO Supranational TB Reference Laboratory, Tuberculosis and Mycobacteria Unit, Institut Pasteur de la Guadeloupe, F-97183, Abymes, Guadeloupe, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wilfried","family":"Segretier","sequence":"additional","affiliation":[{"name":"Laboratoire de Math\u00e9matiques Informatique et Applications (LAMIA), Universit\u00e9 des Antilles, F-97154, Pointe-\u00e0-Pitre, Guadeloupe, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Erick","family":"Stattner","sequence":"additional","affiliation":[{"name":"Laboratoire de Math\u00e9matiques Informatique et Applications (LAMIA), Universit\u00e9 des Antilles, F-97154, Pointe-\u00e0-Pitre, Guadeloupe, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nalin","family":"Rastogi","sequence":"additional","affiliation":[{"name":"WHO Supranational TB Reference Laboratory, Tuberculosis and Mycobacteria Unit, Institut Pasteur de la Guadeloupe, F-97183, Abymes, Guadeloupe, France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2020,12,15]]},"reference":[{"key":"2020121509335072100_R1","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1099\/ijsem.0.002507","article-title":"Phylogenomic analysis of the species of the Mycobacterium tuberculosis complex demonstrates that Mycobacterium africanum, Mycobacterium bovis, Mycobacterium caprae, Mycobacterium microti and Mycobacterium pinnipedii are later heterotypic synonyms of Mycobacterium tuberculosis","volume":"68","author":"Riojas","year":"2018","journal-title":"Int. J. Syst. Evol. Microbiol."},{"key":"2020121509335072100_R2","volume-title":"Global Tuberculosis Report 2019","author":"World Health Organization (WHO)","year":"2019"},{"key":"2020121509335072100_R3","doi-asserted-by":"crossref","first-page":"4498","DOI":"10.1128\/JCM.01392-06","article-title":"Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis","volume":"44","author":"Supply","year":"2006","journal-title":"J. Clin. Microbiol."},{"key":"2020121509335072100_R4","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1128\/JCM.35.4.907-914.1997","article-title":"Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology","volume":"35","author":"Kamerbeek","year":"1997","journal-title":"J. Clin. Microbiol."},{"key":"2020121509335072100_R5","doi-asserted-by":"crossref","first-page":"1535","DOI":"10.1038\/ng.3704","article-title":"Mycobacterium tuberculosis lineage 4 comprises globally distributed and geographically restricted sublineages","volume":"48","author":"Stucki","year":"2016","journal-title":"Nat. Genet."},{"key":"2020121509335072100_R6","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/j.meegid.2018.12.030","article-title":"Macro-geographical specificities of the prevailing tuberculosis epidemic as seen through SITVIT2, an updated version of the Mycobacterium tuberculosis genotyping database","volume":"72","author":"Couvin","year":"2019","journal-title":"Infect. Genet. Evol."},{"key":"2020121509335072100_R7","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2180-6-23","article-title":"Mycobacterium tuberculosis complex genetic diversity: mining the fourth international spoligotyping database (SpolDB4) for classification, population genetics and epidemiology","volume":"6","author":"Brudey","year":"2006","journal-title":"BMC Microbiol."},{"key":"2020121509335072100_R8","doi-asserted-by":"crossref","DOI":"10.1016\/j.meegid.2012.02.004","article-title":"SITVITWEB \u2013 a publicly available international multimarker database for studying Mycobacterium tuberculosis genetic diversity and molecular epidemiology","volume":"12","author":"Demay","year":"2012","journal-title":"Infect. Genet. Evol."},{"key":"2020121509335072100_R9","first-page":"2917","article-title":"A sister lineage of the Mycobacterium tuberculosis complex discovered in the African Great Lakes region","volume":"11","author":"Ngabonziza","year":"2020","journal-title":"Nat. Commun."},{"key":"2020121509335072100_R10","article-title":"Phylogenomics of Mycobacterium africanum reveals a new lineage and a complex evolutionary history","author":"Coscolla","year":"2020","journal-title":"bioRxiv"},{"key":"2020121509335072100_R11","first-page":"82","article-title":"Knowledge discovery and data mining: towards a unifying framework","author":"Fayyad","year":"1996","journal-title":"KDD-96"},{"key":"2020121509335072100_R12","volume-title":"Adaptation in Natural and Artificial Systems","author":"Holland","year":"1975"},{"key":"2020121509335072100_R13","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780195099713.001.0001","volume-title":"Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms","author":"B\u00e4ck","year":"1996"},{"key":"2020121509335072100_R14","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-662-04923-5","volume-title":"Data Mining and Knowledge Discovery with Evolutionary Algorithms","author":"Freitas","year":"2002"},{"key":"2020121509335072100_R15","first-page":"844","article-title":"Evolutionary predictive modelling for flash floods","author":"Segretier","year":"2013"},{"key":"2020121509335072100_R16","first-page":"185","article-title":"SM2D: a modular knowledge discovery approach applied to hydrological forecasting","author":"Segretier","year":"2013"},{"key":"2020121509335072100_R17","doi-asserted-by":"crossref","first-page":"W326","DOI":"10.1093\/nar\/gkq351","article-title":"MIRU-VNTRplus: a web tool for polyphasic genotyping of Mycobacterium tuberculosis complex bacteria","volume":"38","author":"Weniger","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2020121509335072100_R18","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.meegid.2018.06.029","article-title":"Towards better prediction of Mycobacterium tuberculosis lineages from MIRU-VNTR data","volume":"72","author":"Thain","year":"2019","journal-title":"Infect. Genet. Evol."},{"key":"2020121509335072100_R19","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1016\/j.meegid.2012.02.010","article-title":"TB-Lineage: an online tool for classification and analysis of strains of Mycobacterium tuberculosis complex","volume":"12","author":"Shabbeer","year":"2012","journal-title":"Infect. Genet. Evol."},{"key":"2020121509335072100_R20","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0130912","article-title":"Genomics and machine learning for taxonomy consensus: the Mycobacterium tuberculosis complex paradigm","volume":"10","author":"Az\u00e9","year":"2015","journal-title":"PLoS One"},{"key":"2020121509335072100_R21","doi-asserted-by":"crossref","first-page":"2869","DOI":"10.1073\/pnas.0511240103","article-title":"Variable host-pathogen compatibility in Mycobacterium tuberculosis","volume":"103","author":"Gagneux","year":"2006","journal-title":"Proc. Natl. Acad. Sci. U.S.A."},{"key":"2020121509335072100_R22","doi-asserted-by":"crossref","first-page":"4457","DOI":"10.1128\/JCM.40.12.4457-4465.2002","article-title":"Microevolution of the direct repeat region of Mycobacterium tuberculosis: implications for interpretation of spoligotyping data","volume":"40","author":"Warren","year":"2002","journal-title":"J. Clin. Microbiol."},{"key":"2020121509335072100_R23","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-12-224","article-title":"Using affinity propagation for identifying subspecies among clonal organisms: lessons from M. tuberculosis","volume":"12","author":"Borile","year":"2011","journal-title":"BMC Bioinform."},{"key":"2020121509335072100_R24","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.tube.2017.04.007","article-title":"SpolSimilaritySearch \u2013 a web tool to compare and search similarities between spoligotypes of Mycobacterium tuberculosis complex","volume":"105","author":"Couvin","year":"2017","journal-title":"Tuberculosis"},{"key":"2020121509335072100_R25","doi-asserted-by":"crossref","first-page":"2991","DOI":"10.1093\/bioinformatics\/bts544","article-title":"SpolPred: rapid and accurate prediction of Mycobacterium tuberculosis spoligotypes from short genomic sequences","volume":"28","author":"Coll","year":"2012","journal-title":"Bioinformatics"},{"key":"2020121509335072100_R26","doi-asserted-by":"crossref","DOI":"10.1186\/s13073-016-0270-7","article-title":"SpoTyping: fast and accurate in silico Mycobacterium spoligotyping from sequence reads","volume":"8","author":"Xia","year":"2016","journal-title":"Genome Med."},{"key":"2020121509335072100_R27","doi-asserted-by":"crossref","DOI":"10.7717\/peerj.5090","article-title":"MIRU-profiler: a rapid tool for determination of 24-loci MIRU-VNTR profiles from assembled genomes of Mycobacterium tuberculosis","volume":"6","author":"Rajwani","year":"2018","journal-title":"PeerJ"},{"key":"2020121509335072100_R28","doi-asserted-by":"crossref","first-page":"1625","DOI":"10.1093\/bioinformatics\/btz771","article-title":"MIRUReader: MIRU-VNTR typing directly from long sequencing reads","volume":"36","author":"Tang","year":"2019","journal-title":"Bioinformatics"},{"key":"2020121509335072100_R29","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2164-15-881","article-title":"KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes","volume":"15","author":"Steiner","year":"2014","journal-title":"BMC Genomics"},{"key":"2020121509335072100_R30","doi-asserted-by":"crossref","DOI":"10.1186\/s13073-019-0650-x","article-title":"Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs","volume":"11","author":"Phelan","year":"2019","journal-title":"Genome Med."},{"key":"2020121509335072100_R31","doi-asserted-by":"crossref","first-page":"1908","DOI":"10.1128\/JCM.00025-15","article-title":"PhyResSE: a web tool delineating Mycobacterium tuberculosis antibiotic resistance and lineage from whole-genome sequencing data","volume":"53","author":"Feuerriegel","year":"2015","journal-title":"J. Clin. Microbiol."},{"key":"2020121509335072100_R32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0142951","article-title":"TGS-TB: total genotyping solution for Mycobacterium tuberculosis using short-read whole-genome sequencing","volume":"10","author":"Sekizuka","year":"2015","journal-title":"PLoS One"},{"key":"2020121509335072100_R33","doi-asserted-by":"crossref","first-page":"482","DOI":"10.3201\/eid2503.180894","article-title":"SNP-IT tool for identifying subspecies and associated lineages of Mycobacterium tuberculosis complex","volume":"25","author":"Lipworth","year":"2019","journal-title":"Emerging Infect. Dis."},{"key":"2020121509335072100_R34","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1109\/21.97458","article-title":"A survey of decision tree classifier methodology","volume":"21","author":"Safavian","year":"1991","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"2020121509335072100_R35","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the predicted and observed secondary structure of T4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"Biochim. Biophys. Acta Protein Struct."},{"key":"2020121509335072100_R36","first-page":"1114","article-title":"Decision tree analysis on j48 algorithm for data mining","volume":"3","author":"Bhargava","year":"2013","journal-title":"Int. J. Adv. Res. Comput. Sci. Softw. Eng."},{"key":"2020121509335072100_R37","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1007\/978-3-540-48765-4_16","volume-title":"Multiple Approaches to Intelligent Systems","author":"Rocha","year":"1999"},{"key":"2020121509335072100_R38","first-page":"73","article-title":"Crossover, macromutation, and population-based search","author":"Jones","year":"1995"},{"key":"2020121509335072100_R39","first-page":"316","volume-title":"Foundations of Genetic Algorithms","author":"Muhlenbein","year":"1991"}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaa108\/34910894\/baaa108.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaa108\/34910894\/baaa108.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,13]],"date-time":"2023-10-13T00:28:31Z","timestamp":1697156911000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baaa108\/6035122"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,1]]},"references-count":39,"URL":"https:\/\/doi.org\/10.1093\/database\/baaa108","relation":{},"ISSN":["1758-0463"],"issn-type":[{"value":"1758-0463","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,1,1]]},"published":{"date-parts":[[2020,1,1]]},"article-number":"baaa108"}}