{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T07:31:40Z","timestamp":1768635100375,"version":"3.49.0"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2020,7,13]],"date-time":"2020-07-13T00:00:00Z","timestamp":1594598400000},"content-version":"vor","delay-in-days":12,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["1252318"],"award-info":[{"award-number":["1252318"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The human body hosts more microbial organisms than human cells. Analysis of this microbial diversity provides key insight into the role played by these microorganisms on human health. Metagenomics is the collective DNA sequencing of coexisting microbial organisms in an environmental sample or a host. This has several applications in precision medicine, agriculture, environmental science and forensics. State-of-the-art predictive models for phenotype predictions from metagenomic data rely on alignments, assembly, extensive pruning, taxonomic profiling and reference sequence databases. These processes are time consuming and they do not consider novel microbial sequences when aligned with the reference genome, limiting the potential of whole metagenomics. We formulate the problem of predicting human disease from whole-metagenomic data using Multiple Instance Learning (MIL), a popular supervised learning paradigm. Our proposed alignment-free approach provides higher accuracy in prediction by harnessing the capability of deep convolutional neural network (CNN) within a MIL framework and provides interpretability via neural attention mechanism.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The MIL formulation combined with the hierarchical feature extraction capability of deep-CNN provides significantly better predictive performance compared to popular existing approaches. The attention mechanism allows for the identification of groups of sequences that are likely to be correlated to diseases providing the much-needed interpretation. Our proposed approach does not rely on alignment, assembly and reference sequence databases; making it fast and scalable for large-scale metagenomic data. We evaluate our method on well-known large-scale metagenomic studies and show that our proposed approach outperforms comparative state-of-the-art methods for disease prediction.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/mrahma23\/IDMIL.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa477","type":"journal-article","created":{"date-parts":[[2020,5,5]],"date-time":"2020-05-05T19:19:51Z","timestamp":1588706391000},"page":"i39-i47","source":"Crossref","is-referenced-by-count":14,"title":["IDMIL: an alignment-free Interpretable Deep Multiple Instance Learning (MIL) for predicting disease from whole-metagenomic data"],"prefix":"10.1093","volume":"36","author":[{"given":"Mohammad Arifur","family":"Rahman","sequence":"first","affiliation":[{"name":"Department of Computer Science, George Mason University , Fairfax, VA 22030, USA"}]},{"given":"Huzefa","family":"Rangwala","sequence":"additional","affiliation":[{"name":"Department of Computer Science, George Mason University , Fairfax, VA 22030, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,7,13]]},"reference":[{"key":"2024021913361995100_btaa477-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2024021913361995100_btaa477-B2","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.artint.2013.06.003","article-title":"Multiple instance classification: review, taxonomy and comparative study","volume":"201","author":"Amores","year":"2013","journal-title":"Artif. Intell"},{"key":"2024021913361995100_btaa477-B3","first-page":"577","author":"Andrews","year":"2003"},{"key":"2024021913361995100_btaa477-B4","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1186\/s40168-018-0401-z","article-title":"DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data","volume":"6","author":"Arango-Argoty","year":"2018","journal-title":"Microbiome"},{"key":"2024021913361995100_btaa477-B5","author":"Ba","year":"2016"},{"key":"2024021913361995100_btaa477-B6","doi-asserted-by":"crossref","first-page":"1915","DOI":"10.1126\/science.1104816","article-title":"Host-bacterial mutualism in the human intestine","volume":"307","author":"Backhed","year":"2005","journal-title":"Science"},{"key":"2024021913361995100_btaa477-B7","first-page":"105","author":"Bunescu","year":"2007"},{"key":"2024021913361995100_btaa477-B8","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1038\/s41576-019-0113-7","article-title":"Clinical metagenomics","volume":"20","author":"Chiu","year":"2019","journal-title":"Nat. Rev. Genet"},{"key":"2024021913361995100_btaa477-B9","author":"Chung","year":"2014"},{"key":"2024021913361995100_btaa477-B10","first-page":"933","author":"Dauphin","year":"2017"},{"key":"2024021913361995100_btaa477-B11","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/S0004-3702(96)00034-3","article-title":"Solving the multiple instance problem with axis-parallel rectangles","volume":"89","author":"Dietterich","year":"1997","journal-title":"Artif. Intell"},{"key":"2024021913361995100_btaa477-B12","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1186\/s12859-018-2033-5","article-title":"Phylogenetic convolutional neural networks in metagenomics","volume":"19","author":"Fioravanti","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2024021913361995100_btaa477-B13","volume-title":"Deep learning","author":"Goodfellow","year":"2016"},{"key":"2024021913361995100_btaa477-B14","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1016\/j.patcog.2017.10.013","article-title":"Recent advances in convolutional neural networks","volume":"77","author":"Gu","year":"2018","journal-title":"Pattern Recogn"},{"key":"2024021913361995100_btaa477-B15","doi-asserted-by":"crossref","first-page":"669","DOI":"10.1128\/MMBR.68.4.669-685.2004","article-title":"Metagenomics: application of genomics to uncultured microorganisms","volume":"68","author":"Handelsman","year":"2004","journal-title":"Microbiol. Mol. Biol. Rev"},{"key":"2024021913361995100_btaa477-B16","volume-title":"Inequalities","author":"Hardy","year":"1952"},{"key":"2024021913361995100_btaa477-B17","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1038\/455481a","article-title":"Microbiology: metagenomics","volume":"455","author":"Hugenholtz","year":"2008","journal-title":"Nature"},{"key":"2024021913361995100_btaa477-B18","author":"Ilse","year":"2018"},{"key":"2024021913361995100_btaa477-B19","first-page":"597","author":"Kotzias","year":"2015"},{"key":"2024021913361995100_btaa477-B20","first-page":"1097","author":"Krizhevsky","year":"2012"},{"key":"2024021913361995100_btaa477-B21","doi-asserted-by":"crossref","first-page":"383","DOI":"10.1053\/j.gastro.2018.04.028","article-title":"Association between bacteremia from specific microbes and subsequent diagnosis of colorectal cancer","volume":"155","author":"Kwong","year":"2018","journal-title":"Gastroenterology"},{"key":"2024021913361995100_btaa477-B22","first-page":"33","author":"LaPierre","year":"2016"},{"key":"2024021913361995100_btaa477-B23","first-page":"1188","author":"Le","year":"2014"},{"key":"2024021913361995100_btaa477-B24","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1038\/nature12506","article-title":"Richness of human gut microbiome correlates with metabolic markers","volume":"500","author":"Le Chatelier","year":"2013","journal-title":"Nature"},{"key":"2024021913361995100_btaa477-B25","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1186\/s13059-017-1299-7","article-title":"Comprehensive benchmarking and ensemble approaches for metagenomic classifiers","volume":"18.1","author":"McIntyre","year":"2017","journal-title":"Genome Biol"},{"key":"2024021913361995100_btaa477-B26","author":"Mikolov","year":"2010"},{"key":"2024021913361995100_btaa477-B27","author":"Mikolov","year":"2013"},{"key":"2024021913361995100_btaa477-B28","author":"Ng","year":"2017"},{"key":"2024021913361995100_btaa477-B29","author":"Nguyen","year":"2017"},{"key":"2024021913361995100_btaa477-B30","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1109\/TASLP.2016.2520371","article-title":"Deep sentence embedding using long short-term memory networks: analysis and application to information retrieval","volume":"24","author":"Palangi","year":"2016","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"2024021913361995100_btaa477-B31","doi-asserted-by":"crossref","first-page":"e1004977","DOI":"10.1371\/journal.pcbi.1004977","article-title":"Machine learning meta-analysis of large metagenomic datasets: tools and biological insights","volume":"12","author":"Pasolli","year":"2016","journal-title":"PLoS Comput. Biol"},{"key":"2024021913361995100_btaa477-B32","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2024021913361995100_btaa477-B33","author":"Perez","year":"2017"},{"key":"2024021913361995100_btaa477-B34","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature08821","article-title":"A human gut microbial gene catalogue established by metagenomic sequencing","volume":"464","author":"Qin","year":"2010","journal-title":"Nature"},{"key":"2024021913361995100_btaa477-B35","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1038\/nature11450","article-title":"A metagenome-wide association study of gut microbiota in type 2 diabetes","volume":"490","author":"Qin","year":"2012","journal-title":"Nature"},{"key":"2024021913361995100_btaa477-B36","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature13568","article-title":"Alterations of the human gut microbiome in liver cirrhosis","volume":"513","author":"Qin","year":"2014","journal-title":"Nature"},{"key":"2024021913361995100_btaa477-B37","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1038\/nbt.3935","article-title":"Shotgun metagenomics, from sampling to analysis","volume":"35","author":"Quince","year":"2017","journal-title":"Nat. Biotechnol"},{"key":"2024021913361995100_btaa477-B38","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139058452","volume-title":"Mining of massive datasets","author":"Rajaraman","year":"2011"},{"key":"2024021913361995100_btaa477-B39","author":"Rahman","year":"2018"},{"key":"2024021913361995100_btaa477-B40","author":"Rahman","year":"2017"},{"key":"2024021913361995100_btaa477-B41","doi-asserted-by":"crossref","first-page":"1740006. World Scientific","DOI":"10.1142\/S0219720017400066","article-title":"Metagenome sequence clustering with hash-based canopies","volume":"15","author":"Rahman","year":"2017","journal-title":"J. Bioinf. Comput. Biol"},{"key":"2024021913361995100_btaa477-B42","author":"Ruckle","year":"2018"},{"key":"2024021913361995100_btaa477-B43","doi-asserted-by":"crossref","first-page":"1782","DOI":"10.1053\/j.gastro.2011.06.072","article-title":"Gastrointestinal microbiome signatures of pediatric patients with irritable bowel syndrome","volume":"141","author":"Saulnier","year":"2011","journal-title":"Gastroenterology"},{"key":"2024021913361995100_btaa477-B44","first-page":"1177","author":"Sculley","year":"2010"},{"key":"2024021913361995100_btaa477-B45","author":"Simonyan","year":"2014"},{"key":"2024021913361995100_btaa477-B46","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res"},{"key":"2024021913361995100_btaa477-B47","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1038\/nmeth.3589","article-title":"MetaPhlAn2 for enhanced metagenomic taxonomic profiling","volume":"12","author":"Truong","year":"2015","journal-title":"Nat. Methods"},{"key":"2024021913361995100_btaa477-B48","doi-asserted-by":"crossref","first-page":"804","DOI":"10.1038\/nature06244","article-title":"The human microbiome project","volume":"449","author":"Turnbaugh","year":"2007","journal-title":"Nature"},{"key":"2024021913361995100_btaa477-B49","first-page":"5998","author":"Vaswani","year":"2017"},{"key":"2024021913361995100_btaa477-B50","first-page":"81","article-title":"Unculturable bacteria\u2014the uncharacterized organisms that cause oral infections","volume":"95","author":"Wade","year":"2002","journal-title":"J. R. Soc. Med"},{"key":"2024021913361995100_btaa477-B51","doi-asserted-by":"crossref","first-page":"766","DOI":"10.15252\/msb.20145645","article-title":"Potential of fecal microbiota for early-stage detection of colorectal cancer","volume":"10","author":"Zeller","year":"2014","journal-title":"Mol. Syst. Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i39\/56702832\/bioinformatics_36_supplement1_i39.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i39\/56702832\/bioinformatics_36_supplement1_i39.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T13:47:10Z","timestamp":1708350430000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/Supplement_1\/i39\/5870478"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,1]]},"references-count":51,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2020,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa477","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,7]]},"published":{"date-parts":[[2020,7,1]]}}}