{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T01:42:23Z","timestamp":1772502143796,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2020,10,16]],"date-time":"2020-10-16T00:00:00Z","timestamp":1602806400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11871061"],"award-info":[{"award-number":["11871061"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Collaborative Research project for Overseas Scholars"},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61828203"],"award-info":[{"award-number":["61828203"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Infection with strains of different subtypes and the subsequent crossover reading between the two strands of genomic RNAs by host cells\u2019 reverse transcriptase are the main causes of the vast HIV-1 sequence diversity. Such inter-subtype genomic recombinants can become circulating recombinant forms (CRFs) after widespread transmissions in a population. Complete prediction of all the subtype sources of a CRF strain is a complicated machine learning problem. It is also difficult to understand whether a strain is an emerging new subtype and if so, how to accurately identify the new components of the genetic source.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce a multi-label learning algorithm for the complete prediction of multiple sources of a CRF sequence as well as the prediction of its chronological number. The prediction is strengthened by a voting of various multi-label learning methods to avoid biased decisions. In our steps, frequency and position features of the sequences are both extracted to capture signature patterns of pure subtypes and CRFs. The method was applied to 7185 HIV-1 sequences, comprising 5530 pure subtype sequences and 1655 CRF sequences. Results have demonstrated that the method can achieve very high accuracy (reaching 99%) in the prediction of the complete set of labels of HIV-1 recombinant forms. A few wrong predictions are actually incomplete predictions, very close to the complete set of genuine labels.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/Runbin-tang\/The-source-of-HIV-CRFs-prediction.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa887","type":"journal-article","created":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T12:38:19Z","timestamp":1601469499000},"page":"750-758","source":"Crossref","is-referenced-by-count":7,"title":["Genetic source completeness of HIV-1 circulating recombinant forms (CRFs) predicted by multi-label learning"],"prefix":"10.1093","volume":"37","author":[{"given":"Runbin","family":"Tang","sequence":"first","affiliation":[{"name":"Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University , Hunan 411105, China"},{"name":"Advanced Analytics Institute, University of Technology Sydney , Sydney, NSW 2007, Australia"}]},{"given":"Zuguo","family":"Yu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University , Hunan 411105, China"},{"name":"School of Electrical Engineering and Computer Science, Queensland University of Technology , Brisbane, QLD 4001, Australia"}]},{"given":"Yuanlin","family":"Ma","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University , Hunan 411105, China"}]},{"given":"Yaoqun","family":"Wu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University , Hunan 411105, China"}]},{"given":"Yi-Ping","family":"Phoebe Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Technology, La Trobe University , Melbourne, VIC 3086, Australia"}]},{"given":"Limsoon","family":"Wong","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Singapore , Singapore 117417, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1833-7413","authenticated-orcid":false,"given":"Jinyan","family":"Li","sequence":"additional","affiliation":[{"name":"Advanced Analytics Institute, University of Technology Sydney , Sydney, NSW 2007, Australia"}]}],"member":"286","published-online":{"date-parts":[[2020,10,16]]},"reference":[{"key":"2023051705201698500_btaa887-B1","first-page":"847","author":"Benites","year":"2015"},{"key":"2023051705201698500_btaa887-B2","doi-asserted-by":"crossref","first-page":"5155","DOI":"10.1073\/pnas.83.14.5155","article-title":"A measure of the similarity of sets of sequences not requiring sequence alignment","volume":"83","author":"Blaisdell","year":"1986","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051705201698500_btaa887-B3","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1038\/srep08543","article-title":"Reliable genotypic tropism tests for the major HIV-1 subtypes","volume":"5","author":"Cashin","year":"2015","journal-title":"Sci. Rep"},{"key":"2023051705201698500_btaa887-B4","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1038\/421217a","article-title":"Microbial phylogenomics: branching out","volume":"421","author":"Charlebois","year":"2003","journal-title":"Nature"},{"key":"2023051705201698500_btaa887-B5","doi-asserted-by":"crossref","first-page":"3797","DOI":"10.1093\/bioinformatics\/bti607","article-title":"An automated genotyping system for analysis of HIV-1 and other microbial sequences","volume":"21","author":"De Oliveira","year":"2005","journal-title":"Bioinformatics"},{"key":"2023051705201698500_btaa887-B6","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1016\/j.jtbi.2012.10.010","article-title":"A simple k-word interval method for phylogenetic analysis of DNA sequences","volume":"317","author":"Ding","year":"2013","journal-title":"J. Theor. Biol"},{"key":"2023051705201698500_btaa887-B7","doi-asserted-by":"crossref","first-page":"2827","DOI":"10.1128\/JCM.00656-17","article-title":"Comparative evaluation of subtyping tools for surveillance of newly emerging HIV-1 strains","volume":"55","author":"Fabeni","year":"2017","journal-title":"J. Clin. Microbiol"},{"key":"2023051705201698500_btaa887-B8","author":"Foley","year":"2018"},{"key":"2023051705201698500_btaa887-B9","doi-asserted-by":"crossref","first-page":"i556","DOI":"10.1093\/bioinformatics\/btu464","article-title":"Drug susceptibility prediction against a panel of drugs using kernelized Bayesian multitask learning","volume":"30","author":"G\u00f6nen","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051705201698500_btaa887-B10","doi-asserted-by":"crossref","first-page":"1946","DOI":"10.1093\/bioinformatics\/btt331","article-title":"Multilabel classification for exploiting cross-resistance information in HIV-1 drug resistance prediction","volume":"29","author":"Heider","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051705201698500_btaa887-B11","doi-asserted-by":"crossref","first-page":"1556","DOI":"10.1073\/pnas.87.4.1556","article-title":"Genetic consequences of packaging two RNA genomes in one retroviral particle: pseudodiploidy and high rate of genetic recombination","volume":"87","author":"Hu","year":"1990","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051705201698500_btaa887-B12","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1097\/00002030-200403260-00002","article-title":"HIV-1 pol gene variation is sufficient for reconstruction of transmissions in the era of antiretroviral therapy","volume":"18","author":"Hue","year":"2004","journal-title":"AIDS (London, England)"},{"key":"2023051705201698500_btaa887-B13","doi-asserted-by":"crossref","first-page":"e0119815","DOI":"10.1371\/journal.pone.0119815","article-title":"Mapping the space of genomic signatures","volume":"10","author":"Kari","year":"2015","journal-title":"PLoS One"},{"key":"2023051705201698500_btaa887-B14","doi-asserted-by":"crossref","first-page":"1870","DOI":"10.1093\/molbev\/msw054","article-title":"Mega7: molecular evolutionary genetics analysis version 7.0 for bigger datasets","volume":"33","author":"Kumar","year":"2016","journal-title":"Mol. Biol. Evol"},{"key":"2023051705201698500_btaa887-B15","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051705201698500_btaa887-B16","doi-asserted-by":"crossref","first-page":"255","DOI":"10.3390\/e22020255","article-title":"Phylogenetic analysis of HIV-1 genomes based on the position-weighted k-mers method","volume":"22","author":"Ma","year":"2020","journal-title":"Entropy"},{"key":"2023051705201698500_btaa887-B17","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1089\/08892220252781301","article-title":"Identification of a new circulating recombinant form of HIV type 1, CRF11-cpx, involving subtypes A, G, J, and CRF01-AE, in central Africa","volume":"18","author":"Montavon","year":"2002","journal-title":"AIDS Res. Hum. Retroviruses"},{"key":"2023051705201698500_btaa887-B18","doi-asserted-by":"crossref","first-page":"6106","DOI":"10.1073\/pnas.93.12.6106","article-title":"Recombination leads to the rapid emergence of HIV-1 dually resistant mutants under selective drug pressure","volume":"93","author":"Moutouh","year":"1996","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051705201698500_btaa887-B19","doi-asserted-by":"crossref","first-page":"399","DOI":"10.2217\/fvl-2017-0159","article-title":"Molecular evolution methods to study HIV-1 epidemics","volume":"13","author":"Pati\u00f1o-Galindo","year":"2018","journal-title":"Fut. Virol"},{"key":"2023051705201698500_btaa887-B20","doi-asserted-by":"crossref","first-page":"e1000581","DOI":"10.1371\/journal.pcbi.1000581","article-title":"An evolutionary model-based algorithm for accurate phylogenetic breakpoint mapping and subtype prediction in HIV-1","volume":"5","author":"Pond","year":"2009","journal-title":"PLoS Comput. Biol"},{"key":"2023051705201698500_btaa887-B21","first-page":"1","article-title":"Genes and genome of HIV-1","volume":"02","author":"Rajarapu","year":"2014","journal-title":"J. Phylogenet. Evol. Biol"},{"key":"2023051705201698500_btaa887-B22","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1038\/nrg1246","article-title":"The causes and consequences of HIV evolution","volume":"5","author":"Rambaut","year":"2004","journal-title":"Nat. Rev. Genet"},{"key":"2023051705201698500_btaa887-B23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-019-5571-y","article-title":"ML-DSP: machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels","volume":"20","author":"Randhawa","year":"2019","journal-title":"BMC Genomics"},{"key":"2023051705201698500_btaa887-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-017-1602-3","article-title":"A machine learning approach for viral genome classification","volume":"18","author":"Remita","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023051705201698500_btaa887-B25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13040-016-0089-1","article-title":"Exploiting HIV-1 protease and reverse transcriptase cross-resistance information for improved drug resistance prediction by means of multi-label classification","volume":"9","author":"Riemenschneider","year":"2016","journal-title":"BioData Mining"},{"key":"2023051705201698500_btaa887-B26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/srep24883","article-title":"Genotypic prediction of co-receptor tropism of HIV-1 subtypes A and C","volume":"6","author":"Riemenschneider","year":"2016","journal-title":"Sci. Rep"},{"key":"2023051705201698500_btaa887-B27","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1126\/science.288.5463.55d","article-title":"HIV-1 nomenclature proposal","volume":"288","author":"Robertson","year":"2000","journal-title":"Science"},{"key":"2023051705201698500_btaa887-B28","doi-asserted-by":"crossref","first-page":"e0206409","DOI":"10.1371\/journal.pone.0206409","article-title":"An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes","volume":"13","author":"Solis-Reyes","year":"2018","journal-title":"PLoS One"},{"key":"2023051705201698500_btaa887-B29","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-018-04217-5","article-title":"Tracking HIV-1 recombination to resolve its contribution to HIV-1 evolution in natural infection","volume":"9","author":"Song","year":"2018","journal-title":"Nat. Commun"},{"key":"2023051705201698500_btaa887-B30","author":"Spyromitros","year":"2008"},{"key":"2023051705201698500_btaa887-B31","doi-asserted-by":"crossref","first-page":"e144","DOI":"10.1093\/nar\/gku739","article-title":"COMET: adaptive context-based modeling for ultrafast HIV-1 subtype identification","volume":"42","author":"Struck","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051705201698500_btaa887-B32","article-title":"A scikit-based Python environment for performing multi-label classification","author":"Szyma\u0144ski","year":"2017","journal-title":"ArXiv"},{"key":"2023051705201698500_btaa887-B33","first-page":"1","article-title":"The challenge of HIV-1 subtype diversity origin of HIV and mechanisms of HIV diversity","volume":"15","author":"Taylor","year":"2008","journal-title":"N. Engl. J. Med"},{"key":"2023051705201698500_btaa887-B34","author":"Thomas","year":"2005"},{"key":"2023051705201698500_btaa887-B35","doi-asserted-by":"crossref","first-page":"2214","DOI":"10.1093\/bioinformatics\/btx158","article-title":"DLTree: efficient and accurate phylogeny reconstruction using the dynamical language method","volume":"33","author":"Wu","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051705201698500_btaa887-B36","doi-asserted-by":"crossref","first-page":"1744","DOI":"10.1093\/bioinformatics\/btm248","article-title":"Nucleotide composition string selection in HIV-1 subtyping using whole genomes","volume":"23","author":"Wu","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051705201698500_btaa887-B37","doi-asserted-by":"crossref","DOI":"10.1186\/1742-4690-7-25","article-title":"The role of recombination in the emergence of a complex and dynamic HIV epidemic","volume":"7","author":"Zhang","year":"2010","journal-title":"Retrovirology"},{"key":"2023051705201698500_btaa887-B38","doi-asserted-by":"crossref","first-page":"2038","DOI":"10.1016\/j.patcog.2006.12.019","article-title":"ML-KNN: a lazy learning approach to multi-label learning","volume":"40","author":"Zhang","year":"2007","journal-title":"Pattern Recogn"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa887\/35069156\/btaa887.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/6\/750\/50356505\/btaa887.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/6\/750\/50356505\/btaa887.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T05:21:50Z","timestamp":1684300910000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/6\/750\/5924547"}},"subtitle":[],"editor":[{"given":"Valencia","family":"Alfonso","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,10,16]]},"references-count":38,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,5,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa887","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,3,15]]},"published":{"date-parts":[[2020,10,16]]}}}