{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:18:37Z","timestamp":1772173117697,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1010598","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,10,26]],"date-time":"2022-10-26T00:00:00Z","timestamp":1666742400000}}],"reference-count":41,"publisher":"Public Library of Science (PLoS)","issue":"10","license":[{"start":{"date-parts":[[2022,10,14]],"date-time":"2022-10-14T00:00:00Z","timestamp":1665705600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000060","name":"National Institute of Allergy and Infectious Diseases","doi-asserted-by":"publisher","award":["R01-AI087520"],"award-info":[{"award-number":["R01-AI087520"]}],"id":[{"id":"10.13039\/100000060","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000060","name":"National Institute of Allergy and Infectious Diseases","doi-asserted-by":"publisher","award":["R01-AI135946"],"award-info":[{"award-number":["R01-AI135946"]}],"id":[{"id":"10.13039\/100000060","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000060","name":"National Institute of Allergy and Infectious Diseases","doi-asserted-by":"publisher","award":["R01-AI152703"],"award-info":[{"award-number":["R01-AI152703"]}],"id":[{"id":"10.13039\/100000060","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Pathogen genomic sequence data are increasingly made available for epidemiological monitoring. A main interest is to identify and assess the potential of infectious disease outbreaks. 
While popular methods to analyze sequence data often involve phylogenetic tree inference, they are vulnerable to errors from recombination and impose a high computational cost, making it difficult to obtain real-time results when the number of sequences is in or above the thousands.<\/jats:p>\n                  <jats:p>\n                    Here, we propose an alternative strategy to outbreak detection using genomic data based on deep learning methods developed for image classification. The key idea is to use a pairwise genetic distance matrix calculated from viral sequences as an image, and develop convolutional neural network (CNN) models to classify areas of the images that show signatures of active outbreak, leading to identification of subsets of sequences taken from an active outbreak. We showed that our method is efficient in finding HIV-1 outbreaks with\n                    <jats:italic>R<\/jats:italic>\n                    <jats:sub>0<\/jats:sub>\n                    \u2265 2.5, and overall a specificity exceeding 98% and sensitivity better than 92%. We validated our approach using data from HIV-1 CRF01 in Europe, containing both endemic sequences and a well-known dual outbreak in intravenous drug users. Our model accurately identified known outbreak sequences in the background of slower spreading HIV. Importantly, we detected both outbreaks early on, before they were over, implying that had this method been applied in real-time as data became available, one would have been able to intervene and possibly prevent the extent of these outbreaks. 
This approach is scalable to processing hundreds of thousands of sequences, making it useful for current and future real-time epidemiological investigations, including public health monitoring using large databases and especially for rapid outbreak identification.\n                  <\/jats:p>","DOI":"10.1371\/journal.pcbi.1010598","type":"journal-article","created":{"date-parts":[[2022,10,14]],"date-time":"2022-10-14T14:33:38Z","timestamp":1665758018000},"page":"e1010598","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":15,"title":["A deep learning approach to real-time HIV outbreak detection using genetic data"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0340-488X","authenticated-orcid":true,"given":"Michael D.","family":"Kupperman","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8160-2588","authenticated-orcid":true,"given":"Thomas","family":"Leitner","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5307-8934","authenticated-orcid":true,"given":"Ruian","family":"Ke","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,10,14]]},"reference":[{"issue":"2","key":"pcbi.1010598.ref001","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/S1473-3099(18)30647-9","article-title":"Global and regional molecular epidemiology of HIV-1, 1990\u20132015: a systematic review, global survey, and trend analysis","volume":"19","author":"J Hemelaar","year":"2019","journal-title":"The Lancet Infectious Diseases"},{"issue":"3","key":"pcbi.1010598.ref002","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1093\/sysbio\/syq010","article-title":"New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0","volume":"59","author":"S Guindon","year":"2010","journal-title":"Systematic 
Biology"},{"issue":"9","key":"pcbi.1010598.ref003","doi-asserted-by":"crossref","first-page":"1312","DOI":"10.1093\/bioinformatics\/btu033","article-title":"RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies","volume":"30","author":"A Stamatakis","year":"2014","journal-title":"Bioinformatics"},{"issue":"5","key":"pcbi.1010598.ref004","doi-asserted-by":"crossref","first-page":"1530","DOI":"10.1093\/molbev\/msaa015","article-title":"IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era","volume":"37","author":"BQ Minh","year":"2020","journal-title":"Molecular Biology and Evolution"},{"issue":"3","key":"pcbi.1010598.ref005","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2\u2014Approximately Maximum-Likelihood Trees for Large Alignments","volume":"5","author":"MN Price","year":"2010","journal-title":"PLOS ONE"},{"issue":"4","key":"pcbi.1010598.ref006","doi-asserted-by":"crossref","first-page":"e1003570","DOI":"10.1371\/journal.pcbi.1003570","article-title":"Phylodynamic Inference for Structured Epidemiological Models","volume":"10","author":"DA Rasmussen","year":"2014","journal-title":"PLoS Computational Biology"},{"key":"pcbi.1010598.ref007","unstructured":"Leitner T, Romero-Severson E. Phylogenetic patterns recover known HIV epidemiological relationships and reveal common transmission of multiple variants; 2018. 
Available from: https:\/\/www.nature.com\/articles\/s41564-018-0204-9."},{"issue":"1","key":"pcbi.1010598.ref008","doi-asserted-by":"crossref","first-page":"e1005316","DOI":"10.1371\/journal.pcbi.1005316","article-title":"Inference of Transmission Network Structure from HIV Phylogenetic Trees","volume":"13","author":"F Giardina","year":"2017","journal-title":"PLoS Computational Biology"},{"issue":"4","key":"pcbi.1010598.ref009","first-page":"997","article-title":"Genomic infectious disease epidemiology in partially sampled and ongoing outbreaks","volume":"34","author":"X Didelot","year":"2017","journal-title":"Molecular Biology and Evolution"},{"issue":"3","key":"pcbi.1010598.ref010","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1093\/molbev\/msx304","article-title":"PHYLOSCANNER: Inferring transmission from within- and between-host pathogen genetic diversity","volume":"35","author":"C Wymant","year":"2018","journal-title":"Molecular Biology and Evolution"},{"issue":"1","key":"pcbi.1010598.ref011","doi-asserted-by":"crossref","first-page":"vey016","DOI":"10.1093\/ve\/vey016","article-title":"Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10","volume":"4","author":"MA Suchard","year":"2018","journal-title":"Virus Evolution"},{"issue":"7","key":"pcbi.1010598.ref012","doi-asserted-by":"crossref","first-page":"1812","DOI":"10.1093\/molbev\/msy016","article-title":"HIV-TRACE (TRAnsmission Cluster Engine): a Tool for Large Scale Molecular Epidemiology of HIV-1 and Other Rapidly Evolving Pathogens","volume":"35","author":"SL Kosakovsky Pond","year":"2018","journal-title":"Molecular Biology and Evolution"},{"issue":"3","key":"pcbi.1010598.ref013","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1089\/aid.2016.0205","article-title":"Identifying Transmission Clusters with Cluster Picker and HIV-TRACE","volume":"33","author":"R Rose","year":"2017","journal-title":"AIDS Research and Human 
Retroviruses"},{"issue":"5","key":"pcbi.1010598.ref014","doi-asserted-by":"crossref","DOI":"10.1097\/QAI.0000000000001856","article-title":"Identifying Clusters of Recent and Rapid HIV Transmission Through Analysis of Molecular Surveillance Data","volume":"79","author":"AM Oster","year":"2018","journal-title":"JAIDS Journal of Acquired Immune Deficiency Syndromes"},{"issue":"3","key":"pcbi.1010598.ref015","doi-asserted-by":"crossref","DOI":"10.1097\/QAI.0000000000002448","article-title":"Geographic Distribution of HIV Transmission Networks in the United States","volume":"85","author":"AR Board","year":"2020","journal-title":"JAIDS Journal of Acquired Immune Deficiency Syndromes"},{"issue":"4","key":"pcbi.1010598.ref016","doi-asserted-by":"crossref","DOI":"10.1136\/bmjopen-2021-060184","article-title":"Beyond HIV outbreaks: protocol, rationale and implementation of a prospective study quantifying the benefit of incorporating viral sequence clustering analysis into routine public health interventions","volume":"12","author":"JA Steingrimsson","year":"2022","journal-title":"BMJ Open"},{"issue":"4","key":"pcbi.1010598.ref017","doi-asserted-by":"crossref","DOI":"10.1097\/QAI.0000000000000809","article-title":"Using Molecular HIV Surveillance Data to Understand Transmission Between Subpopulations in the United States","volume":"70","author":"AM Oster","year":"2015","journal-title":"JAIDS Journal of Acquired Immune Deficiency Syndromes"},{"issue":"4","key":"pcbi.1010598.ref018","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation Applied to Handwritten Zip Code Recognition","volume":"1","author":"Y LeCun","year":"1989","journal-title":"Neural Computation"},{"key":"pcbi.1010598.ref019","unstructured":"Krizhevsky A, Sutskever I, Hinton GE. ImageNet Classification with Deep Convolutional Neural Networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, editors. 
Advances in Neural Information Processing Systems. vol. 25. Curran Associates, Inc.; 2012. Available from: https:\/\/proceedings.neurips.cc\/paper\/2012\/file\/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf."},{"issue":"7553","key":"pcbi.1010598.ref020","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"Y LeCun","year":"2015","journal-title":"Nature"},{"key":"pcbi.1010598.ref021","first-page":"1","article-title":"A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects","author":"Z Li","year":"2021","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"6088","key":"pcbi.1010598.ref022","doi-asserted-by":"crossref","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"DE Rumelhart","year":"1986","journal-title":"Nature"},{"issue":"5","key":"pcbi.1010598.ref023","doi-asserted-by":"crossref","first-page":"884","DOI":"10.1093\/sysbio\/syaa009","article-title":"Identification of Hidden Population Structure in Time-Scaled Phylogenies","volume":"69","author":"EM Volz","year":"2020","journal-title":"Systematic Biology"},{"key":"pcbi.1010598.ref024","unstructured":"M\u00fcllner D. Modern hierarchical, agglomerative clustering algorithms; 2011."},{"key":"pcbi.1010598.ref025","doi-asserted-by":"crossref","unstructured":"Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society; 2014. p. 
580\u2013587.","DOI":"10.1109\/CVPR.2014.81"},{"issue":"2","key":"pcbi.1010598.ref026","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1016\/j.epidem.2012.04.002","article-title":"Agent-based and phylogenetic analyses reveal how HIV-1 moves between risk groups: Injecting drug users sustain the heterosexual epidemic in Latvia","volume":"4","author":"F Graw","year":"2012","journal-title":"Epidemics"},{"issue":"5","key":"pcbi.1010598.ref027","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1086\/590501","article-title":"HIV-1 Transmission, by Stage of Infection","volume":"198","author":"TD Hollingsworth","year":"2008","journal-title":"The Journal of Infectious Diseases"},{"issue":"12","key":"pcbi.1010598.ref028","first-page":"1","article-title":"HIV-1 Transmission during Early Infection in Men Who Have Sex with Men: A Phylodynamic Analysis","volume":"10","author":"EM Volz","year":"2013","journal-title":"PLOS Medicine"},{"issue":"6","key":"pcbi.1010598.ref029","doi-asserted-by":"crossref","first-page":"1795","DOI":"10.1093\/ije\/dyz100","article-title":"Getting more from heterogeneous HIV-1 surveillance data in a high immigration country: estimation of incidence and undiagnosed population size using multiple biomarkers","volume":"48","author":"F Giardina","year":"2019","journal-title":"International Journal of Epidemiology"},{"issue":"11","key":"pcbi.1010598.ref030","doi-asserted-by":"crossref","first-page":"1554","DOI":"10.1002\/sim.3570","article-title":"A multistate approach for estimating the incidence of human immunodeficiency virus by using HIV and AIDS French surveillance data","volume":"28","author":"C Sommen","year":"2009","journal-title":"Statistics in Medicine"},{"issue":"19","key":"pcbi.1010598.ref031","doi-asserted-by":"crossref","first-page":"10752","DOI":"10.1073\/pnas.96.19.10752","article-title":"The molecular clock of HIV-1 unveiled through analysis of a known transmission history","volume":"96","author":"T 
Leitner","year":"1999","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"pcbi.1010598.ref032","unstructured":"Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems; 2015. Available from: https:\/\/www.tensorflow.org\/."},{"key":"pcbi.1010598.ref033","first-page":"1","article-title":"ADAM: A method for stochastic optimization","author":"DP Kingma","year":"2015","journal-title":"ICLR"},{"issue":"1","key":"pcbi.1010598.ref034","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.ppat.1006000","article-title":"Social and Genetic Networks of HIV-1 Transmission in New York City","volume":"13","author":"JO Wertheim","year":"2017","journal-title":"PLOS Pathogens"},{"issue":"1","key":"pcbi.1010598.ref035","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1128\/JVI.01413-10","article-title":"Dynamics of Two Separate but Linked HIV-1 CRF01_AE Outbreaks among Injection Drug Users in Stockholm, Sweden, and Helsinki, Finland","volume":"85","author":"H Skar","year":"2011","journal-title":"Journal of Virology"},{"key":"pcbi.1010598.ref036","doi-asserted-by":"crossref","first-page":"526","DOI":"10.1093\/bioinformatics\/bty633","article-title":"ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R","volume":"35","author":"E Paradis","year":"2019","journal-title":"Bioinformatics"},{"key":"pcbi.1010598.ref037","unstructured":"R Core Team. R: A Language and Environment for Statistical Computing; 2020. 
Available from: https:\/\/www.R-project.org\/."},{"issue":"4","key":"pcbi.1010598.ref038","doi-asserted-by":"crossref","first-page":"772","DOI":"10.1093\/molbev\/mst010","article-title":"MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability","volume":"30","author":"K Katoh","year":"2013","journal-title":"Molecular Biology and Evolution"},{"key":"pcbi.1010598.ref039","doi-asserted-by":"crossref","unstructured":"Foley BT, Korber BTM, Leitner TK, Apetrei C, Hahn B, Mizrachi I, et al. HIV Sequence Compendium 2018. 2018.","DOI":"10.2172\/1458915"},{"key":"pcbi.1010598.ref040","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"F Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"issue":"5971","key":"pcbi.1010598.ref041","doi-asserted-by":"crossref","first-page":"1376","DOI":"10.1126\/science.1182300","article-title":"Toward Extracting All Phylogenetic Information from Matrices of Evolutionary Distances","volume":"327","author":"S Roch","year":"2010","journal-title":"Science"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1010598","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,10,26]],"date-time":"2022-10-26T00:00:00Z","timestamp":1666742400000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010598","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,26]],"date-time":"2022-10-26T14:32:00Z","timestamp":1666794720000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010598"}},"subtitle":[],"editor":[{"given":"Sergei L.","family":"Kosakovsky 
Pond","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,10,14]]},"references-count":41,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2022,10,14]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010598","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.12.17.473204","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,14]]}}}