{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:34Z","timestamp":1772138074341,"version":"3.50.1"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,4,3]],"date-time":"2025-04-03T00:00:00Z","timestamp":1743638400000},"content-version":"vor","delay-in-days":5,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Wellcome Trust\/DBT","award":["IA\/I\/17\/2\/503323"],"award-info":[{"award-number":["IA\/I\/17\/2\/503323"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Multi-drug resistant or hetero-resistant tuberculosis (TB) hinders the successful treatment of TB. Hetero-resistant TB occurs when multiple strains of the TB-causing bacterium with varying degrees of drug susceptibility are present in an individual. Existing studies predicting the proportion and identity of strains in a mixed infection sample rely on a reference database of known strains. A main challenge then is to identify de novo strains not present in the reference database, while quantifying the proportion of known strains.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present Demixer, a probabilistic generative model that uses a combination of reference-based and reference-free techniques to delineate mixed infection strains in whole genome sequencing (WGS) data. Demixer extends a topic model widely used in text mining to represent known mutations and discover novel ones. Parallelization and other heuristics enabled Demixer to process large datasets like CRyPTIC (Comprehensive Resistance Prediction for Tuberculosis: an International Consortium). In both synthetic and experimental benchmark datasets, our proposed method precisely detected the identity (e.g. 91.67% accuracy on the experimental in vitro dataset) as well as the proportions of the mixed strains. In real-world applications, Demixer revealed novel high confidence mixed infections (101 out of 1963 Malawi samples analysed), and new insights into the global frequency of mixed infection (2% at the most stringent threshold in the CRyPTIC dataset) and its significant association to drug resistance. Our approach is generalizable and hence applicable to any bacterial and viral WGS data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>All code relevant to Demixer is available at https:\/\/github.com\/BIRDSgroup\/Demixer.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf139","type":"journal-article","created":{"date-parts":[[2025,4,2]],"date-time":"2025-04-02T09:57:46Z","timestamp":1743587866000},"source":"Crossref","is-referenced-by-count":1,"title":["Demixer: a probabilistic generative model to delineate different strains of a microbial species in a mixed infection sample"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-2300-8171","authenticated-orcid":false,"given":"Brintha","family":"VP","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras , Chennai, 600036,","place":["India"]},{"name":"Center for Integrative Biology and Systems Medicine (IBSE), IIT Madras , Chennai, 600036,","place":["India"]},{"name":"Wadhwani School of Data Science and AI, IIT Madras , Chennai, 600036,","place":["India"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8490-4087","authenticated-orcid":false,"given":"Manikandan","family":"Narayanan","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras , Chennai, 600036,","place":["India"]},{"name":"Center for Integrative Biology and Systems Medicine (IBSE), IIT Madras , Chennai, 600036,","place":["India"]},{"name":"Wadhwani School of Data Science and AI, IIT Madras , Chennai, 600036,","place":["India"]}]}],"member":"286","published-online":{"date-parts":[[2025,4,3]]},"reference":[{"key":"2025042117315447100_btaf139-B1","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1186\/s12864-020-6486-3","article-title":"QuantTB \u2013 a method to classify mixed Mycobacterium tuberculosis infections within whole genome sequencing data","volume":"21","author":"Anyansi","year":"2020","journal-title":"BMC Genomics"},{"key":"2025042117315447100_btaf139-B2","doi-asserted-by":"crossref","first-page":"1925","DOI":"10.3389\/fmicb.2020.01925","article-title":"Computational methods for strain-level microbial detection in colony and metagenome sequencing data","volume":"11","author":"Anyansi","year":"2020","journal-title":"Front Microbiol"},{"key":"2025042117315447100_btaf139-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10462-023-10419-1","article-title":"Impact of word embedding models on text analytics in deep learning environment: a review","volume":"56","author":"Asudani","year":"2023","journal-title":"Artif Intell Rev"},{"key":"2025042117315447100_btaf139-B4","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1164\/rccm.2401001","article-title":"Tuberculosis due to multiple strains a concern for the patient? A concern for tuberculosis control?","volume":"169","author":"Behr","year":"2004","journal-title":"Am J Respir Crit Care Med"},{"key":"2025042117315447100_btaf139-B5","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J Machine Learn Res"},{"key":"2025042117315447100_btaf139-B6","doi-asserted-by":"crossref","first-page":"e0004124","DOI":"10.1371\/journal.pntd.0004124","article-title":"Trends of Mycobacterium bovis isolation and first-line anti-tuberculosis drug susceptibility profile: a fifteen-year laboratory-based surveillance","volume":"9","author":"Bobadilla-del Valle","year":"2015","journal-title":"PLoS Negl Trop Dis"},{"key":"2025042117315447100_btaf139-B7","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1128\/CMR.00021-12","article-title":"Mixed-strain Mycobacterium tuberculosis infections and the implications for tuberculosis treatment and control","volume":"25","author":"Cohen","year":"2012","journal-title":"Clin Microbiol Rev"},{"key":"2025042117315447100_btaf139-B8","doi-asserted-by":"crossref","first-page":"e3001755","DOI":"10.1371\/journal.pbio.3001755","article-title":"Genome-wide association studies of global Mycobacterium tuberculosis resistance to 13 antimicrobials in 10,228 genomes identify new resistance mechanisms","volume":"20","author":"CRyPTIC Consortium","year":"2022","journal-title":"PLoS Biol"},{"key":"2025042117315447100_btaf139-B9","doi-asserted-by":"crossref","first-page":"e3001721","DOI":"10.1371\/journal.pbio.3001721","article-title":"A data compendium associating the genomes of 12,289 Mycobacterium tuberculosis isolates with quantitative resistance phenotypes to 13 antibiotics","volume":"20","author":"CRyPTIC Consortium","year":"2022","journal-title":"PLoS Biol"},{"key":"2025042117315447100_btaf139-B10","doi-asserted-by":"crossref","first-page":"btad648","DOI":"10.1093\/bioinformatics\/btad648","article-title":"fastlin: an ultra-fast program for Mycobacterium tuberculosis complex lineage typing","volume":"39","author":"Derelle","year":"2023","journal-title":"Bioinformatics"},{"key":"2025042117315447100_btaf139-B11","first-page":"000607","article-title":"SplitStrains, a tool to identify and separate mixed Mycobacterium tuberculosis infections from WGS data","volume-title":"Microb Genom","author":"Gabbasov","year":"2021"},{"key":"2025042117315447100_btaf139-B12","doi-asserted-by":"crossref","first-page":"1138","DOI":"10.3201\/eid1907.130313","article-title":"Undetected multidrug-resistant tuberculosis amplified by first-line therapy in mixed infection","volume":"19","author":"Hingley-Wilson","year":"2013","journal-title":"Emerg Infect Dis"},{"key":"2025042117315447100_btaf139-B13","doi-asserted-by":"crossref","first-page":"4474","DOI":"10.1128\/JCM.00930-10","article-title":"Mixed infection with Beijing and non-Beijing strains and drug resistance pattern of Mycobacterium tuberculosis","volume":"48","author":"Huang","year":"2010","journal-title":"J Clin Microbiol"},{"key":"2025042117315447100_btaf139-B14","doi-asserted-by":"crossref","first-page":"157","DOI":"10.2147\/IDR.S341817","article-title":"Effect of mixed infections with Mycobacterium tuberculosis and nontuberculous mycobacteria on diagnosis of multidrug-resistant tuberculosis: a retrospective multicentre study in China","volume":"15","author":"Huang","year":"2022","journal-title":"Infect Drug Resist"},{"key":"2025042117315447100_btaf139-B15","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: a next-generation sequencing read simulator","volume":"28","author":"Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"2025042117315447100_btaf139-B16","first-page":"1298542","article-title":"Mycobacterium tuberculosis next-generation whole genome sequencing: opportunities and challenges","volume":"2018","author":"Iketleng","year":"2018","journal-title":"Tuberc Res Treat"},{"key":"2025042117315447100_btaf139-B17","doi-asserted-by":"crossref","first-page":"lqaa009","DOI":"10.1093\/nargab\/lqaa009","article-title":"DeepMicrobes: taxonomic classification for metagenomics with deep learning","volume":"2","author":"Liang","year":"2020","journal-title":"NAR Genom Bioinform"},{"key":"2025042117315447100_btaf139-B18","doi-asserted-by":"crossref","first-page":"1608","DOI":"10.1186\/s40064-016-3252-8","article-title":"An overview of topic modeling and its current applications in bioinformatics","volume":"5","author":"Liu","year":"2016","journal-title":"Springerplus"},{"key":"2025042117315447100_btaf139-B19","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1016\/j.ccm.2019.08.002","article-title":"Treatment of drug-resistant tuberculosis","volume":"40","author":"Mase","year":"2019","journal-title":"Clin Chest Med"},{"key":"2025042117315447100_btaf139-B20","doi-asserted-by":"crossref","DOI":"10.1093\/femspd\/ftx020","article-title":"Relapse, re-infection and mixed infections in tuberculosis disease","volume":"75","author":"McIvor","year":"2017","journal-title":"Pathog Dis"},{"key":"2025042117315447100_btaf139-B21","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1186\/s13073-020-00817-3","article-title":"Robust barcoding and identification of Mycobacterium tuberculosis lineages for epidemiological and clinical studies","volume":"12","author":"Napier","year":"2020","journal-title":"Genome Med"},{"key":"2025042117315447100_btaf139-B22","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/s13073-019-0650-x","article-title":"Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs","volume":"11","author":"Phelan","year":"2019","journal-title":"Genome Med"},{"key":"2025042117315447100_btaf139-B23","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1093\/biostatistics\/kxy018","article-title":"Latent variable modeling for the microbiome","volume":"20","author":"Sankaran","year":"2019","journal-title":"Biostatistics"},{"key":"2025042117315447100_btaf139-B24","doi-asserted-by":"crossref","first-page":"a017863","DOI":"10.1101\/cshperspect.a017863","article-title":"Multidrug-resistant tuberculosis and extensively drug-resistant tuberculosis","volume":"5","author":"Seung","year":"2015","journal-title":"Cold Spring Harb Perspect Med"},{"key":"2025042117315447100_btaf139-B25","doi-asserted-by":"crossref","first-page":"e01594\u201321","DOI":"10.1128\/spectrum.01594-21","article-title":"Mycobacterium tuberculosis lineages associated with mutations and drug resistance in isolates from India","volume":"10","author":"Shanmugam","year":"2022","journal-title":"Microbiol Spectr"},{"key":"2025042117315447100_btaf139-B26","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1186\/s12864-018-4988-z","article-title":"Identifying mixed Mycobacterium tuberculosis infections from whole genome sequence data","volume":"19","author":"Sobkowiak","year":"2018","journal-title":"BMC Genomics"},{"key":"2025042117315447100_btaf139-B27","first-page":"1973","article-title":"Rethinking LDA: why priors matter","volume":"22","author":"Wallach","year":"2009","journal-title":"Adv Neural Inform Process Syst"},{"key":"2025042117315447100_btaf139-B28","author":"Zhou","year":"2023"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf139\/62856461\/btaf139.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf139\/62856461\/btaf139.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf139\/62856461\/btaf139.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,21]],"date-time":"2025-04-21T17:32:03Z","timestamp":1745256723000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf139\/8105556"}},"subtitle":[],"editor":[{"given":"Christina","family":"Kendziorski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,3,29]]},"references-count":28,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,3,29]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf139","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.04.11.589150","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,4]]},"published":{"date-parts":[[2025,3,29]]},"article-number":"btaf139"}}