{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T15:30:24Z","timestamp":1770996624225,"version":"3.50.1"},"reference-count":64,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,11,10]],"date-time":"2021-11-10T00:00:00Z","timestamp":1636502400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,11,10]],"date-time":"2021-11-10T00:00:00Z","timestamp":1636502400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Pan African University, Institute of Basic Sciences, Technology and Innovation"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Host population structure is a key determinant of pathogen and infectious disease transmission patterns. Pathogen phylogenetic trees are useful tools to reveal the population structure underlying an epidemic. Determining whether a population is structured or not is useful in informing the type of phylogenetic methods to be used in a given study. We employ tree statistics derived from phylogenetic trees and machine learning classification techniques to reveal an underlying population structure.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>In this paper, we simulate phylogenetic trees from both structured and non-structured host populations. We compute eight statistics for the simulated trees, which are: the number of cherries; Sackin, Colless and total cophenetic indices; ladder length; maximum depth; maximum width, and width-to-depth ratio. 
Based on the estimated tree statistics, we classify the simulated trees as from either a non-structured or a structured population using the decision tree (DT), K-nearest neighbor (KNN) and support vector machine (SVM). We incorporate the basic reproductive number (<jats:inline-formula><jats:alternatives><jats:tex-math>$$R_0$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:msub>\n                      <mml:mi>R<\/mml:mi>\n                      <mml:mn>0<\/mml:mn>\n                    <\/mml:msub>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula>) in our tree simulation procedure. Sensitivity analysis is done to investigate whether the classifiers are robust to different choices of model parameters and to tree size. Cross-validated results for area under the curve (AUC) for receiver operating characteristic (ROC) curves yield mean values of over 0.9 for most of the classification models.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>Our classification procedure distinguishes well between trees from structured and non-structured populations using the classifiers, the two-sample Kolmogorov-Smirnov, Cucconi and Podgor-Gastwirth tests and the box plots. SVM models were more robust to changes in model parameters and tree size compared to KNN and DT classifiers. 
Our classification procedure was applied to real-world data and the structured population was revealed with a high accuracy of <jats:inline-formula><jats:alternatives><jats:tex-math>$$92.3\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:mrow>\n                      <mml:mn>92.3<\/mml:mn>\n                      <mml:mo>%<\/mml:mo>\n                    <\/mml:mrow>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula> using the SVM-polynomial classifier.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-021-04465-1","type":"journal-article","created":{"date-parts":[[2021,11,10]],"date-time":"2021-11-10T11:04:48Z","timestamp":1636542288000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Employing phylogenetic tree shape statistics to resolve the underlying host population structure"],"prefix":"10.1186","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8607-6636","authenticated-orcid":false,"given":"Hassan W.","family":"Kayondo","sequence":"first","affiliation":[]},{"given":"Alfred","family":"Ssekagiri","sequence":"additional","affiliation":[]},{"given":"Grace","family":"Nabakooza","sequence":"additional","affiliation":[]},{"given":"Nicholas","family":"Bbosa","sequence":"additional","affiliation":[]},{"given":"Deogratius","family":"Ssemwanga","sequence":"additional","affiliation":[]},{"given":"Pontiano","family":"Kaleebu","sequence":"additional","affiliation":[]},{"given":"Samuel","family":"Mwalili","sequence":"additional","affiliation":[]},{"given":"John M.","family":"Mango","sequence":"additional","affiliation":[]},{"given":"Andrew J.","family":"Leigh Brown","sequence":"additional","affiliation":[]},{"given":"Roberto A.","family":"Saenz","sequence":"additional","affiliation":[]},{"given":"Ronald","family":"Galiwango","sequence":"additional","affiliation":[]},{"given":"John 
M.","family":"Kitayimbwa","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,11,10]]},"reference":[{"issue":"94","key":"4465_CR1","doi-asserted-by":"publisher","first-page":"20131106","DOI":"10.1098\/rsif.2013.1106","volume":"11","author":"D K\u00fchnert","year":"2014","unstructured":"K\u00fchnert D, Stadler T, Vaughan TG, Drummond AJ. Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death SIR model. J R Soc Interface. 2014;11(94):20131106.","journal-title":"J R Soc Interface"},{"issue":"6","key":"4465_CR2","doi-asserted-by":"publisher","first-page":"1203","DOI":"10.1111\/jeb.12139","volume":"26","author":"T Stadler","year":"2013","unstructured":"Stadler T. Recovering speciation and extinction dynamics based on phylogenies. J Evol Biol. 2013;26(6):1203\u201319.","journal-title":"J Evol Biol"},{"key":"4465_CR3","doi-asserted-by":"crossref","unstructured":"Maddison WP, Midford PE, Otto SP. Estimating a binary character\u2019s effect on speciation and extinction. Syst Biol. 2007;56(5):701\u201310.","DOI":"10.1080\/10635150701607033"},{"issue":"1","key":"4465_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/srep29890","volume":"6","author":"P Duda","year":"2016","unstructured":"Duda P, Zrzav\u1ef3 J. Human population history revealed by a supertree approach. Sci Rep. 2016;6(1):1\u201310.","journal-title":"Sci Rep"},{"issue":"3","key":"4465_CR5","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1016\/j.jtbi.2010.09.010","volume":"267","author":"T Stadler","year":"2010","unstructured":"Stadler T. Sampling-through-time in birth-death trees. J Theor Biol. 2010;267(3):396\u2013404.","journal-title":"J Theor Biol"},{"issue":"1","key":"4465_CR6","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1016\/j.jtbi.2009.07.018","volume":"261","author":"T Stadler","year":"2009","unstructured":"Stadler T. 
On incomplete sampling under birth-death models and connections to the sampling-based coalescent. J Theor Biol. 2009;261(1):58\u201366.","journal-title":"J Theor Biol"},{"issue":"1","key":"4465_CR7","first-page":"19","volume":"68","author":"GR Jones","year":"2019","unstructured":"Jones GR. Divergence estimation in the presence of incomplete lineage sorting and migration. Syst Biol. 2019;68(1):19\u201331.","journal-title":"Syst Biol"},{"issue":"4","key":"4465_CR8","doi-asserted-by":"publisher","first-page":"769","DOI":"10.1016\/j.jtbi.2008.04.005","volume":"253","author":"T Gernhard","year":"2008","unstructured":"Gernhard T. The conditioned reconstructed process. J Theor Biol. 2008;253(4):769\u201378.","journal-title":"J Theor Biol"},{"issue":"9","key":"4465_CR9","doi-asserted-by":"publisher","first-page":"2577","DOI":"10.1093\/molbev\/msr095","volume":"28","author":"S H\u00f6hna","year":"2011","unstructured":"H\u00f6hna S, Stadler T, Ronquist F, Britton T. Inferring speciation and extinction rates under different sampling schemes. Mol Biol Evol. 2011;28(9):2577\u201389.","journal-title":"Mol Biol Evol"},{"issue":"4","key":"4465_CR10","doi-asserted-by":"publisher","first-page":"465","DOI":"10.1093\/sysbio\/syq026","volume":"59","author":"K Hartmann","year":"2010","unstructured":"Hartmann K, Wong D, Stadler T. Sampling trees from evolutionary models. Syst Biol. 2010;59(4):465\u201376.","journal-title":"Syst Biol"},{"issue":"1","key":"4465_CR11","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1534\/genetics.111.134627","volume":"190","author":"EM Volz","year":"2012","unstructured":"Volz EM. Complex population dynamics and the coalescent under neutrality. Genetics. 2012;190(1):187\u2013201.","journal-title":"Genetics"},{"issue":"11","key":"4465_CR12","doi-asserted-by":"publisher","first-page":"e1003913","DOI":"10.1371\/journal.pcbi.1003913","volume":"10","author":"V Boskova","year":"2014","unstructured":"Boskova V, Bonhoeffer S, Stadler T. 
Inference of epidemiological dynamics based on simulated phylogenies using birth-death and coalescent models. PLoS Comput Biol. 2014;10(11):e1003913.","journal-title":"PLoS Comput Biol"},{"issue":"1614","key":"4465_CR13","doi-asserted-by":"publisher","first-page":"20120314","DOI":"10.1098\/rstb.2012.0314","volume":"368","author":"B Dearlove","year":"2013","unstructured":"Dearlove B, Wilson DJ. Coalescent inference for infectious disease: meta-analysis of hepatitis C. Philos Trans R Soc B Biol Sci. 2013;368(1614):20120314.","journal-title":"Philos Trans R Soc B Biol Sci"},{"key":"4465_CR14","doi-asserted-by":"crossref","unstructured":"Kendall DG, et al. On the generalized \u201cbirth-and-death\u201d process. Ann Math Stat. 1948;19(1):1\u201315.","DOI":"10.1214\/aoms\/1177730285"},{"issue":"5","key":"4465_CR15","doi-asserted-by":"publisher","first-page":"676","DOI":"10.1093\/sysbio\/syr029","volume":"60","author":"T Stadler","year":"2011","unstructured":"Stadler T. Simulating trees with a fixed number of extant species. Syst Biol. 2011;60(5):676\u201384.","journal-title":"Syst Biol"},{"issue":"11","key":"4465_CR16","doi-asserted-by":"publisher","first-page":"1367","DOI":"10.1093\/bioinformatics\/btt153","volume":"29","author":"S H\u00f6hna","year":"2013","unstructured":"H\u00f6hna S. Fast simulation of reconstructed phylogenies under global time-dependent birth-death processes. Bioinformatics. 2013;29(11):1367\u201374.","journal-title":"Bioinformatics"},{"issue":"1614","key":"4465_CR17","doi-asserted-by":"publisher","first-page":"20120198","DOI":"10.1098\/rstb.2012.0198","volume":"368","author":"T Stadler","year":"2013","unstructured":"Stadler T, Bonhoeffer S. Uncovering epidemiological dynamics in heterogeneous host populations using phylogenetic methods. Philos Trans R Soc B Biol Sci. 
2013;368(1614):20120198.","journal-title":"Philos Trans R Soc B Biol Sci"},{"issue":"1614","key":"4465_CR18","doi-asserted-by":"publisher","first-page":"20120208","DOI":"10.1098\/rstb.2012.0208","volume":"368","author":"SD Frost","year":"2013","unstructured":"Frost SD, Volz EM. Modelling tree shape and structure in viral phylodynamics. Philos Trans R Soc B Biol Sci. 2013;368(1614):20120208.","journal-title":"Philos Trans R Soc B Biol Sci"},{"issue":"12","key":"4465_CR19","doi-asserted-by":"publisher","first-page":"e1003919","DOI":"10.1371\/journal.pcbi.1003919","volume":"10","author":"A Gavryushkina","year":"2014","unstructured":"Gavryushkina A, Welch D, Stadler T, Drummond AJ. Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration. PLoS Comput Biol. 2014;10(12):e1003919.","journal-title":"PLoS Comput Biol"},{"issue":"2","key":"4465_CR20","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1016\/j.epidem.2012.04.002","volume":"4","author":"F Graw","year":"2012","unstructured":"Graw F, Leitner T, Ribeiro RM. Agent-based and phylogenetic analyses reveal how HIV-1 moves between risk groups: injecting drug users sustain the heterosexual epidemic in Latvia. Epidemics. 2012;4(2):104\u201316.","journal-title":"Epidemics"},{"issue":"8","key":"4465_CR21","doi-asserted-by":"publisher","first-page":"2102","DOI":"10.1093\/molbev\/msw064","volume":"33","author":"D K\u00fchnert","year":"2016","unstructured":"K\u00fchnert D, Stadler T, Vaughan TG, Drummond AJ. Phylodynamics with migration: a computational framework to quantify population structure from genomic data. Mol Biol Evol. 2016;33(8):2102\u201316.","journal-title":"Mol Biol Evol"},{"key":"4465_CR22","doi-asserted-by":"crossref","unstructured":"De\u00a0Bruyn A, Martin DP, Lefeuvre P. Phylogenetic reconstruction methods: an overview. In: Molecular Plant Taxonomy. Springer; 2014. p. 
257\u2013277.","DOI":"10.1007\/978-1-62703-767-9_13"},{"issue":"4","key":"4465_CR23","doi-asserted-by":"publisher","first-page":"561","DOI":"10.1111\/j.1365-313X.2005.02611.x","volume":"45","author":"C Jill Harrison","year":"2006","unstructured":"Jill Harrison C, Langdale JA. A step by step guide to phylogeny reconstruction. Plant J. 2006;45(4):561\u201372.","journal-title":"Plant J"},{"issue":"4","key":"4465_CR24","doi-asserted-by":"publisher","first-page":"2195","DOI":"10.1214\/105051606000000547","volume":"16","author":"MG Blum","year":"2006","unstructured":"Blum MG, Fran\u00e7ois O, Janson S, et al. The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance. Ann Appl Probab. 2006;16(4):2195\u2013214.","journal-title":"Ann Appl Probab"},{"issue":"1","key":"4465_CR25","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1093\/emph\/eou018","volume":"2014","author":"C Colijn","year":"2014","unstructured":"Colijn C, Gardy J. Phylogenetic tree shapes resolve disease transmission patterns. Evol Med Public Health. 2014;2014(1):96\u2013108.","journal-title":"Evol Med Public Health"},{"issue":"8","key":"4465_CR26","doi-asserted-by":"publisher","first-page":"540","DOI":"10.1038\/nrg2583","volume":"10","author":"OG Pybus","year":"2009","unstructured":"Pybus OG, Rambaut A. Evolutionary analysis of the dynamics of viral infectious disease. Nat Rev Genet. 2009;10(8):540.","journal-title":"Nat Rev Genet"},{"key":"4465_CR27","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1016\/j.tpb.2013.10.002","volume":"90","author":"A Lambert","year":"2013","unstructured":"Lambert A, Stadler T. Birth-death models and coalescent point processes: the shape and probability of reconstructed phylogenies. Theor Popul Biol. 
2013;90:113\u201328.","journal-title":"Theor Popul Biol"},{"issue":"4","key":"4465_CR28","doi-asserted-by":"publisher","first-page":"1143","DOI":"10.1017\/jpr.2016.70","volume":"53","author":"G Plazzotta","year":"2016","unstructured":"Plazzotta G, Colijn C. Asymptotic frequency of shapes in supercritical branching trees. J Appl Probab. 2016;53(4):1143\u201355.","journal-title":"J Appl Probab"},{"issue":"7","key":"4465_CR29","doi-asserted-by":"publisher","first-page":"e1004312","DOI":"10.1371\/journal.pcbi.1004312","volume":"11","author":"BL Dearlove","year":"2015","unstructured":"Dearlove BL, Frost SD. Measuring asymmetry in time-stamped phylogenies. PLoS Comput Biol. 2015;11(7):e1004312.","journal-title":"PLoS Comput Biol"},{"issue":"2","key":"4465_CR30","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1016\/j.mbs.2005.03.003","volume":"195","author":"MG Blum","year":"2005","unstructured":"Blum MG, Fran\u00e7ois O. On statistical tests of phylogenetic tree imbalance: the Sackin and other indices revisited. Math Biosci. 2005;195(2):141\u201353.","journal-title":"Math Biosci"},{"issue":"3","key":"4465_CR31","doi-asserted-by":"publisher","first-page":"e1002413","DOI":"10.1371\/journal.pcbi.1002413","volume":"8","author":"GE Leventhal","year":"2012","unstructured":"Leventhal GE, Kouyos R, Stadler T, Von Wyl V, Yerly S, B\u00f6ni J, et al. Inferring epidemic contact structure from phylogenetic trees. PLoS Comput Biol. 2012;8(3):e1002413.","journal-title":"PLoS Comput Biol"},{"issue":"1","key":"4465_CR32","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/S0025-5564(99)00060-7","volume":"164","author":"A McKenzie","year":"2000","unstructured":"McKenzie A, Steel M. Distributions of cherries for two models of trees. Math Biosci. 
2000;164(1):81\u201392.","journal-title":"Math Biosci"},{"issue":"1","key":"4465_CR33","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1016\/j.mbs.2012.10.005","volume":"241","author":"A Mir","year":"2013","unstructured":"Mir A, Rossell\u00f3 F, et al. A new balance index for phylogenetic trees. Math Biosci. 2013;241(1):125\u201336.","journal-title":"Math Biosci"},{"issue":"1\u20132","key":"4465_CR34","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1016\/S0025-5564(02)00108-6","volume":"180","author":"P Van den Driessche","year":"2002","unstructured":"Van den Driessche P, Watmough J. Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission. Math Biosci. 2002;180(1\u20132):29\u201348.","journal-title":"Math Biosci"},{"issue":"11","key":"4465_CR35","doi-asserted-by":"publisher","first-page":"e1006546","DOI":"10.1371\/journal.pcbi.1006546","volume":"14","author":"EM Volz","year":"2018","unstructured":"Volz EM, Siveroni I. Bayesian phylodynamic inference with complex models. PLoS Comput Biol. 2018;14(11):e1006546.","journal-title":"PLoS Comput Biol"},{"issue":"6","key":"4465_CR36","doi-asserted-by":"publisher","first-page":"1635","DOI":"10.1093\/molbev\/msw046","volume":"33","author":"J Huerta-Cepas","year":"2016","unstructured":"Huerta-Cepas J, Serra F, Bork P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol Biol Evol. 2016;33(6):1635\u20138.","journal-title":"Mol Biol Evol"},{"issue":"1","key":"4465_CR37","doi-asserted-by":"publisher","first-page":"347","DOI":"10.1093\/molbev\/msr217","volume":"29","author":"T Stadler","year":"2012","unstructured":"Stadler T, Kouyos R, von Wyl V, Yerly S, B\u00f6ni J, B\u00fcrgisser P, et al. Estimating the basic reproductive number from viral sequence data. Mol Biol Evol. 
2012;29(1):347\u201357.","journal-title":"Mol Biol Evol"},{"issue":"1","key":"4465_CR38","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1177\/096228029300200103","volume":"2","author":"K Dietz","year":"1993","unstructured":"Dietz K. The estimation of the basic reproduction number for infectious diseases. Stat Methods Med Res. 1993;2(1):23\u201341.","journal-title":"Stat Methods Med Res"},{"issue":"47","key":"4465_CR39","doi-asserted-by":"publisher","first-page":"873","DOI":"10.1098\/rsif.2009.0386","volume":"7","author":"O Diekmann","year":"2009","unstructured":"Diekmann O, Heesterbeek J, Roberts MG. The construction of next-generation matrices for compartmental epidemic models. J R Soc Interface. 2009;7(47):873\u201385.","journal-title":"J R Soc Interface"},{"issue":"5","key":"4465_CR40","doi-asserted-by":"publisher","first-page":"1111","DOI":"10.1007\/s00285-012-0581-2","volume":"67","author":"JM Kitayimbwa","year":"2013","unstructured":"Kitayimbwa JM, Mugisha JY, Saenz RA. The role of backward mutations on the within-host dynamics of HIV-1. J Math Biol. 2013;67(5):1111\u201339.","journal-title":"J Math Biol"},{"key":"4465_CR41","unstructured":"UNAIDS. Country factsheets. https:\/\/www.unaids.org\/en\/regionscountries\/countries\/uganda; 2019."},{"issue":"8","key":"4465_CR42","doi-asserted-by":"publisher","first-page":"e70770","DOI":"10.1371\/journal.pone.0070770","volume":"8","author":"A Opio","year":"2013","unstructured":"Opio A, Muyonga M, Mulumba N. HIV infection in fishing communities of Lake Victoria Basin of Uganda-a cross-sectional sero-behavioral survey. PLoS ONE. 2013;8(8):e70770.","journal-title":"PLoS ONE"},{"issue":"1","key":"4465_CR43","doi-asserted-by":"publisher","first-page":"e83778","DOI":"10.1371\/journal.pone.0083778","volume":"9","author":"RN Nsubuga","year":"2014","unstructured":"Nsubuga RN, White RG, Mayanja BN, Shafer LA. Estimation of the HIV basic reproduction number in rural South West Uganda: 1991\u20132008. PLoS ONE. 
2014;9(1):e83778.","journal-title":"PLoS ONE"},{"issue":"3","key":"4465_CR44","doi-asserted-by":"publisher","first-page":"331","DOI":"10.3390\/v12030331","volume":"12","author":"N Bbosa","year":"2020","unstructured":"Bbosa N, Ssemwanga D, Ssekagiri A, Xi X, Mayanja Y, Bahemuka U, et al. Phylogenetic and demographic characterization of directed HIV-1 transmission using deep sequences from high-risk and general population cohorts\/groups in Uganda. Viruses. 2020;12(3):331.","journal-title":"Viruses"},{"issue":"6","key":"4465_CR45","doi-asserted-by":"publisher","first-page":"1818","DOI":"10.2307\/2410033","volume":"46","author":"SB Heard","year":"1992","unstructured":"Heard SB. Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees. Evolution. 1992;46(6):1818\u201326.","journal-title":"Evolution"},{"key":"4465_CR46","unstructured":"Kendall M, Boyd M, Colijn C. phyloTop: Calculating Topological Properties of Phylogenies, 2016. R package version. 2016;2(0)."},{"issue":"319","key":"4465_CR47","doi-asserted-by":"publisher","first-page":"932","DOI":"10.1080\/01621459.1967.10500904","volume":"62","author":"J Klotz","year":"1967","unstructured":"Klotz J. Asymptotic efficiency of the two sample Kolmogorov\u2013Smirnov test. J Am Stat Assoc. 1967;62(319):932\u20138.","journal-title":"J Am Stat Assoc"},{"issue":"6","key":"4465_CR48","doi-asserted-by":"publisher","first-page":"1298","DOI":"10.1080\/03610918.2012.665546","volume":"42","author":"M Marozzi","year":"2013","unstructured":"Marozzi M. Nonparametric simultaneous tests for location and scale testing: a comparison of several methods. Commun Stat Simul Comput. 2013;42(6):1298\u2013317.","journal-title":"Commun Stat Simul Comput"},{"key":"4465_CR49","doi-asserted-by":"crossref","unstructured":"Wickham H, Chang W, Wickham MH. Package \u2018ggplot2\u2019. Create Elegant Data Visualisations Using the Grammar of Graphics Version. 
2016;2(1):1\u2013189.","DOI":"10.1007\/978-3-319-24277-4_9"},{"issue":"5","key":"4465_CR50","first-page":"605","volume":"3","author":"SB Imandoust","year":"2013","unstructured":"Imandoust SB, Bolandraftar M. Application of k-nearest neighbor (knn) approach for predicting economic events: theoretical background. Int J Eng Res Appl. 2013;3(5):605\u201310.","journal-title":"Int J Eng Res Appl"},{"issue":"3","key":"4465_CR51","doi-asserted-by":"publisher","first-page":"671","DOI":"10.1109\/TNN.2006.873281","volume":"17","author":"ME Mavroforakis","year":"2006","unstructured":"Mavroforakis ME, Theodoridis S. A geometric approach to support vector machine (SVM) classification. IEEE Trans Neural Networks. 2006;17(3):671\u201382.","journal-title":"IEEE Trans Neural Networks"},{"issue":"3","key":"4465_CR52","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/BF00994018","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273\u201397.","journal-title":"Mach Learn"},{"issue":"3","key":"4465_CR53","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1016\/S0034-4257(97)00049-7","volume":"61","author":"MA Friedl","year":"1997","unstructured":"Friedl MA, Brodley CE. Decision tree classification of land cover from remotely sensed data. Remote Sens Environ. 1997;61(3):399\u2013409.","journal-title":"Remote Sens Environ"},{"key":"4465_CR54","unstructured":"Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F, Chang CC, et\u00a0al. Package \u2018e1071\u2019. R J. 2019."},{"issue":"1","key":"4465_CR55","first-page":"37","volume":"2","author":"DM Powers","year":"2011","unstructured":"Powers DM. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J Mach Learn Technol. 
2011;2(1):37\u201363.","journal-title":"J Mach Learn Technol"},{"issue":"1","key":"4465_CR56","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-12-77","volume":"12","author":"X Robin","year":"2011","unstructured":"Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(1):1\u20138.","journal-title":"BMC Bioinformatics"},{"issue":"8","key":"4465_CR57","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","volume":"27","author":"T Fawcett","year":"2006","unstructured":"Fawcett T. An introduction to ROC analysis. Pattern Recogn Lett. 2006;27(8):861\u201374.","journal-title":"Pattern Recogn Lett"},{"issue":"10","key":"4465_CR58","doi-asserted-by":"publisher","first-page":"1113","DOI":"10.1097\/QAD.0b013e32830184a1","volume":"22","author":"RJ Murray","year":"2008","unstructured":"Murray RJ, Lewis FI, Miller MD, Brown AJL. Genetic basis of variation in tenofovir drug susceptibility in HIV-1. AIDS. 2008;22(10):1113\u201323.","journal-title":"AIDS"},{"issue":"5","key":"4465_CR59","doi-asserted-by":"publisher","first-page":"2242","DOI":"10.1128\/JVI.78.5.2242-2246.2004","volume":"78","author":"AJL Brown","year":"2004","unstructured":"Brown AJL, Frost SD, Good B, Daar ES, Simon V, Markowitz M, et al. Genetic basis of hypersusceptibility to protease inhibitors and low replicative capacity of human immunodeficiency virus type 1 strains in primary infection. J Virol. 2004;78(5):2242\u20136.","journal-title":"J Virol"},{"key":"4465_CR60","unstructured":"Kuhn M. The caret package. R Foundation for Statistical Computing, Vienna, Austria. https:\/\/cran.r-project.org\/package=caret. 
2012."},{"issue":"1","key":"4465_CR61","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-018-37458-x","volume":"9","author":"N Bbosa","year":"2019","unstructured":"Bbosa N, Ssemwanga D, Nsubuga RN, Salazar-Gonzalez JF, Salazar MG, Nanyonjo M, et al. Phylogeography of HIV-1 suggests that Ugandan fishing communities are a sink for, not a source of, virus from general populations. Sci Rep. 2019;9(1):1\u20138.","journal-title":"Sci Rep"},{"key":"4465_CR62","first-page":"2","volume":"1","author":"JD Thompson","year":"2003","unstructured":"Thompson JD, Gibson TJ, Higgins DG. Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics. 2003;1:2\u20133.","journal-title":"Curr Protoc Bioinformatics"},{"issue":"1","key":"4465_CR63","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1093\/molbev\/msu300","volume":"32","author":"LT Nguyen","year":"2015","unstructured":"Nguyen LT, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268\u201374.","journal-title":"Mol Biol Evol"},{"issue":"2","key":"4465_CR64","doi-asserted-by":"publisher","first-page":"518","DOI":"10.1093\/molbev\/msx281","volume":"35","author":"DT Hoang","year":"2018","unstructured":"Hoang DT, Chernomor O, Von Haeseler A, Minh BQ, Vinh LS. UFBoot2: improving the ultrafast bootstrap approximation. Mol Biol Evol. 
2018;35(2):518\u201322.","journal-title":"Mol Biol Evol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04465-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-04465-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04465-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,8]],"date-time":"2023-02-08T19:05:25Z","timestamp":1675883125000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-04465-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,10]]},"references-count":64,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["4465"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-04465-1","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,10]]},"assertion":[{"value":"14 August 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 October 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 November 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"546"}}