{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T10:50:58Z","timestamp":1767178258031,"version":"build-2238731810"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1008518","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,1,5]],"date-time":"2021-01-05T00:00:00Z","timestamp":1609804800000}}],"reference-count":57,"publisher":"Public Library of Science (PLoS)","issue":"12","license":[{"start":{"date-parts":[[2020,12,21]],"date-time":"2020-12-21T00:00:00Z","timestamp":1608508800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>\n                    Tuberculosis disease is a major global public health concern and the growing prevalence of drug-resistant\n                    <jats:italic>Mycobacterium tuberculosis<\/jats:italic>\n                    is making disease control more difficult. However, the increasing application of whole-genome sequencing as a diagnostic tool is leading to the profiling of drug resistance to inform clinical practice and treatment decision making. Computational approaches for identifying established and novel resistance-conferring mutations in genomic data include genome-wide association study (GWAS) methodologies, tests for convergent evolution and machine learning techniques. These methods may be confounded by extensive co-occurrent resistance, where statistical models for a drug include unrelated mutations known to be causing resistance to other drugs. Here, we introduce a novel \u2018cannibalistic\u2019 elimination algorithm (\u201cHungry, Hungry SNPos\u201d) that attempts to remove these co-occurrent resistant variants. Using an\n                    <jats:italic>M. tuberculosis<\/jats:italic>\n                    genomic dataset for the virulent Beijing strain-type (n = 3,574) with phenotypic resistance data across five drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, and streptomycin), we demonstrate that this new approach is considerably more robust than traditional methods and detects resistance-associated variants too rare to be likely picked up by correlation-based techniques like GWAS.\n                  <\/jats:p>","DOI":"10.1371\/journal.pcbi.1008518","type":"journal-article","created":{"date-parts":[[2020,12,21]],"date-time":"2020-12-21T14:39:20Z","timestamp":1608561560000},"page":"e1008518","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":18,"title":["Robust detection of point mutations involved in multidrug-resistant Mycobacterium tuberculosis in the presence of co-occurrent resistance markers"],"prefix":"10.1371","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2208-0895","authenticated-orcid":true,"given":"Julian","family":"Libiseller-Egger","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8323-7019","authenticated-orcid":true,"given":"Jody","family":"Phelan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1403-6138","authenticated-orcid":true,"given":"Susana","family":"Campino","sequence":"additional","affiliation":[]},{"given":"Fady","family":"Mohareb","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8985-9265","authenticated-orcid":true,"given":"Taane G.","family":"Clark","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2020,12,21]]},"reference":[{"key":"pcbi.1008518.ref001","unstructured":"World Health Organisation. Global Tuberculosis Report; 2018."},{"issue":"6685","key":"pcbi.1008518.ref002","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1038\/31159","article-title":"Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence","volume":"393","author":"ST Cole","year":"1998","journal-title":"Nature"},{"key":"pcbi.1008518.ref003","doi-asserted-by":"crossref","first-page":"4812","DOI":"10.1038\/ncomms5812","article-title":"A robust SNP barcode for typing Mycobacterium tuberculosis complex strains","volume":"5","author":"F Coll","year":"2014","journal-title":"Nature communications"},{"issue":"1","key":"pcbi.1008518.ref004","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12916-016-0575-9","article-title":"Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance","volume":"14","author":"J Phelan","year":"2016","journal-title":"BMC Medicine"},{"issue":"5","key":"pcbi.1008518.ref005","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1038\/nrg3664","article-title":"Genomic insights into tuberculosis","volume":"15","author":"JE Galagan","year":"2014","journal-title":"Nature Reviews Genetics"},{"issue":"1","key":"pcbi.1008518.ref006","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1186\/s12864-019-5615-3","article-title":"Genome-wide analysis of Mycobacterium tuberculosis polymorphisms reveals lineage-specific associations with drug resistance","volume":"20","author":"YEA Oppong","year":"2019","journal-title":"BMC Genomics"},{"issue":"3","key":"pcbi.1008518.ref007","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1038\/ng.3195","article-title":"Evolutionary history and global spread of the Mycobacterium tuberculosis Beijing lineage","volume":"47","author":"M Merker","year":"2015","journal-title":"Nature Genetics"},{"issue":"MAR","key":"pcbi.1008518.ref008","article-title":"Multiple introductions of Mycobacterium tuberculosis Lineage 2-Beijing into Africa over centuries","volume":"7","author":"LK Rutaihwa","year":"2019","journal-title":"Frontiers in Ecology and Evolution"},{"issue":"2","key":"pcbi.1008518.ref009","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1038\/s41588-017-0029-0","article-title":"Genome-wide analysis of multi- and extensively drug-resistant Mycobacterium tuberculosis","volume":"50","author":"F Coll","year":"2018","journal-title":"Nature genetics"},{"issue":"1","key":"pcbi.1008518.ref010","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/s13073-019-0650-x","article-title":"Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs","volume":"11","author":"JE Phelan","year":"2019","journal-title":"Genome Medicine"},{"issue":"4","key":"pcbi.1008518.ref011","doi-asserted-by":"crossref","DOI":"10.1099\/mgen.0.000361","article-title":"Bayesian reconstruction of Mycobacterium tuberculosis transmission networks in a high incidence area over two decades in Malawi reveals associated risk factors and genomic variants","volume":"6","author":"B Sobkowiak","year":"2020","journal-title":"Microbial Genomics"},{"issue":"10","key":"pcbi.1008518.ref012","doi-asserted-by":"crossref","first-page":"1183","DOI":"10.1038\/ng.2747","article-title":"Genomic analysis identifies targets of convergent positive selection in drug-resistant Mycobacterium tuberculosis","volume":"45","author":"MR Farhat","year":"2013","journal-title":"Nature Genetics"},{"issue":"2","key":"pcbi.1008518.ref013","doi-asserted-by":"crossref","first-page":"e1005958","DOI":"10.1371\/journal.pcbi.1005958","article-title":"A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination","volume":"14","author":"C Collins","year":"2018","journal-title":"PLoS Computational Biology"},{"key":"pcbi.1008518.ref014","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/j.mib.2015.03.002","article-title":"The advent of genome-wide association studies for bacteria","volume":"25","author":"PE Chen","year":"2015","journal-title":"Current Opinion in Microbiology"},{"issue":"5","key":"pcbi.1008518.ref015","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/nmicrobiol.2016.41","article-title":"Identifying lineage effects when controlling for population structure improves power in bacterial association studies","volume":"1","author":"SG Earle","year":"2016","journal-title":"Nature Microbiology"},{"key":"pcbi.1008518.ref016","article-title":"Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes","volume":"7","author":"JA Lees","year":"2016","journal-title":"Nature Communications"},{"issue":"10","key":"pcbi.1008518.ref017","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1038\/nmeth.1681","article-title":"FaST linear mixed models for genome-wide association studies","volume":"8","author":"C Lippert","year":"2011","journal-title":"Nature Methods"},{"issue":"11","key":"pcbi.1008518.ref018","doi-asserted-by":"crossref","first-page":"e1007758","DOI":"10.1371\/journal.pgen.1007758","article-title":"A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events","volume":"14","author":"M Jaillard","year":"2018","journal-title":"PLoS genetics"},{"issue":"12","key":"pcbi.1008518.ref019","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pcbi.1006258","article-title":"Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data","volume":"14","author":"D Moradigaravand","year":"2018","journal-title":"PLoS Computational Biology"},{"issue":"13","key":"pcbi.1008518.ref020","doi-asserted-by":"crossref","first-page":"i89","DOI":"10.1093\/bioinformatics\/bty276","article-title":"A pan-genome-based machine learning approach for predicting antimicrobial resistance activities of the Escherichia coli strains","volume":"34","author":"HL Her","year":"2018","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1008518.ref021","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-018-2403-z","article-title":"Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection","volume":"19","author":"P Mah\u00e9","year":"2018","journal-title":"BMC Bioinformatics"},{"issue":"10","key":"pcbi.1008518.ref022","doi-asserted-by":"crossref","first-page":"1666","DOI":"10.1093\/bioinformatics\/btx801","article-title":"Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data","volume":"34","author":"Y Yang","year":"2018","journal-title":"Bioinformatics"},{"issue":"November 2018","key":"pcbi.1008518.ref023","first-page":"2276","article-title":"Application of machine learning techniques to tuberculosis drug resistance analysis","volume":"35","author":"S Kouchaki","year":"2018","journal-title":"Bioinformatics"},{"key":"pcbi.1008518.ref024","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1016\/j.ebiom.2019.04.016","article-title":"Beyond multidrug resistance: Leveraging rare variants with machine and statistical learning models in Mycobacterium tuberculosis resistance prediction","volume":"43","author":"ML Chen","year":"2019","journal-title":"EBioMedicine"},{"issue":"January","key":"pcbi.1008518.ref025","first-page":"1","article-title":"DeepAMR for predicting co-occurrent resistance of Mycobacterium tuberculosis","author":"Y Yang","year":"2019","journal-title":"Bioinformatics"},{"issue":"922","key":"pcbi.1008518.ref026","article-title":"Machine Learning Predicts Accurately Mycobacterium tuberculosis Drug Resistance From Whole Genome Sequencing Data","volume":"10","author":"W Deelder","year":"2019","journal-title":"Front Genet"},{"issue":"21","key":"pcbi.1008518.ref027","doi-asserted-by":"crossref","first-page":"2987","DOI":"10.1093\/bioinformatics\/btr509","article-title":"A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data","volume":"27","author":"H Li","year":"2011","journal-title":"Bioinformatics (Oxford, England)"},{"issue":"3","key":"pcbi.1008518.ref028","doi-asserted-by":"crossref","first-page":"e9490","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2\u2014Approximately Maximum-Likelihood Trees for Large Alignments","volume":"5","author":"MN Price","year":"2010","journal-title":"PLoS ONE"},{"key":"pcbi.1008518.ref029","author":"EM Ortiz","year":"2019","journal-title":"vcf2phylip v2.0: convert a VCF matrix into several matrix formats for phylogenetic analysis"},{"key":"pcbi.1008518.ref030","article-title":"RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference","author":"AM Kozlov","year":"2019","journal-title":"Bioinformatics"},{"issue":"6","key":"pcbi.1008518.ref031","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1109\/TAC.1974.1100705","article-title":"A new look at the statistical model identification","volume":"19","author":"H Akaike","year":"1974","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"2","key":"pcbi.1008518.ref032","doi-asserted-by":"crossref","first-page":"3041","DOI":"10.1093\/molbev\/msy194","article-title":"Two methods for mapping and visualizing associated data on phylogeny using ggtree","volume":"35","author":"G Yu","year":"2018","journal-title":"Molecular Biology and Evolution"},{"issue":"24","key":"pcbi.1008518.ref033","doi-asserted-by":"crossref","first-page":"4310","DOI":"10.1093\/bioinformatics\/bty539","article-title":"pyseer: A comprehensive tool for microbial pangenome-wide association studies","volume":"34","author":"JA Lees","year":"2018","journal-title":"Bioinformatics"},{"key":"pcbi.1008518.ref034","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"F Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"pcbi.1008518.ref035","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression Shrinkage and Selection via the Lasso","volume":"58","author":"R Tibshirani","year":"1996","journal-title":"Journal of the Royal Statistical Society Series B (Methodological)"},{"issue":"3","key":"pcbi.1008518.ref036","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"C Cortes","year":"1995","journal-title":"Machine Learning"},{"key":"pcbi.1008518.ref037","doi-asserted-by":"crossref","DOI":"10.1201\/9781315139470","volume-title":"Classification And Regression Trees","author":"L Breiman","year":"2017"},{"issue":"1","key":"pcbi.1008518.ref038","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"L Breiman","year":"2001","journal-title":"Machine Learning"},{"key":"pcbi.1008518.ref039","first-page":"1189","article-title":"Greedy function approximation: a gradient boosting machine","author":"JH Friedman","year":"2001","journal-title":"Annals of statistics"},{"key":"pcbi.1008518.ref040","unstructured":"Chollet F, et al. Keras; 2015. Available from: https:\/\/keras.io."},{"key":"pcbi.1008518.ref041","author":"M Abadi","year":"2015","journal-title":"TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems"},{"key":"pcbi.1008518.ref042","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1007\/978-3-540-31865-1_25","volume-title":"Advances in Information Retrieval","author":"C Goutte","year":"2005"},{"key":"pcbi.1008518.ref043","doi-asserted-by":"crossref","DOI":"10.1002\/9781118548387","volume-title":"Applied logistic regression","author":"DW Hosmer","year":"2013"},{"key":"pcbi.1008518.ref044","author":"RJ Nowling","year":"2017","journal-title":"Testing Feature Significance with the Likelihood Ratio Test"},{"issue":"10","key":"pcbi.1008518.ref045","doi-asserted-by":"crossref","first-page":"1340","DOI":"10.1093\/bioinformatics\/btq134","article-title":"Permutation importance: a corrected feature importance measure","volume":"26","author":"A Altmann","year":"2010","journal-title":"Bioinformatics"},{"key":"pcbi.1008518.ref046","volume-title":"A guide to NumPy","author":"TE Oliphant","year":"2006"},{"issue":"2","key":"pcbi.1008518.ref047","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/MCSE.2011.37","article-title":"The NumPy array: a structure for efficient numerical computation","volume":"13","author":"S Van Der Walt","year":"2011","journal-title":"Computing in Science & Engineering"},{"key":"pcbi.1008518.ref048","article-title":"SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python","author":"P Virtanen","year":"2020","journal-title":"Nature Methods"},{"key":"pcbi.1008518.ref049","doi-asserted-by":"crossref","unstructured":"McKinney W. Data Structures for Statistical Computing in Python. In: van der Walt S, Millman J, editors. Proceedings of the 9th Python in Science Conference; 2010. p. 51\u201356.","DOI":"10.25080\/Majora-92bf1922-00a"},{"key":"pcbi.1008518.ref050","doi-asserted-by":"crossref","unstructured":"Lam SK, Pitrou A, Seibert S. Numba: A LLVM-based Python JIT Compiler. In: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC. LLVM\u201915. New York, NY, USA: ACM; 2015. p. 7:1\u20137:6. Available from: http:\/\/doi.acm.org\/10.1145\/2833157.2833162.","DOI":"10.1145\/2833157.2833162"},{"key":"pcbi.1008518.ref051","doi-asserted-by":"crossref","unstructured":"Matsakis ND, Klock II FS. The rust language. In: ACM SIGAda Ada Letters. vol. 34. ACM; 2014. p. 103\u2013104.","DOI":"10.1145\/2692956.2663188"},{"key":"pcbi.1008518.ref052","article-title":"Genome-wide identification of lineage and locus specific variation associated with pneumococcal carriage duration","volume":"6","author":"JA Lees","year":"2017","journal-title":"eLife"},{"issue":"1","key":"pcbi.1008518.ref053","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-40561-2","article-title":"Interpretable genotype-to-phenotype classifiers with performance guarantees","volume":"9","author":"A Drouin","year":"2019","journal-title":"Scientific Reports"},{"issue":"1","key":"pcbi.1008518.ref054","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","article-title":"Ridge Regression: Biased Estimation for Nonorthogonal Problems","volume":"12","author":"AE Hoerl","year":"1970","journal-title":"Technometrics"},{"issue":"2","key":"pcbi.1008518.ref055","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"H Zou","year":"2005","journal-title":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)"},{"key":"pcbi.1008518.ref056","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1016\/j.ijid.2016.11.422","article-title":"Removing the bottleneck in whole genome sequencing of Mycobacterium tuberculosis for rapid drug resistance analysis: a call to action","volume":"56","author":"R McNerney","year":"2017","journal-title":"International journal of infectious diseases: IJID: official publication of the International Society for Infectious Diseases"},{"issue":"1","key":"pcbi.1008518.ref057","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1007\/s10994-014-5451-2","article-title":"The Effect of Splitting on Random Forests","volume":"99","author":"H Ishwaran","year":"2015","journal-title":"Machine learning"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1008518","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,1,5]],"date-time":"2021-01-05T00:00:00Z","timestamp":1609804800000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008518","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,5]],"date-time":"2021-01-05T16:09:01Z","timestamp":1609862941000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008518"}},"subtitle":[],"editor":[{"given":"Roger Dimitri","family":"Kouyos","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,12,21]]},"references-count":57,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2020,12,21]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008518","relation":{},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,21]]}}}