{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,25]],"date-time":"2026-06-25T11:08:07Z","timestamp":1782385687007,"version":"3.54.5"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2018,9,24]],"date-time":"2018-09-24T00:00:00Z","timestamp":1537747200000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005189","name":"Estonian Research Competency Council","doi-asserted-by":"crossref","award":["PUT1476"],"award-info":[{"award-number":["PUT1476"]}],"id":[{"id":"10.13039\/501100005189","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004785","name":"NordForsk","doi-asserted-by":"publisher","award":["62721"],"award-info":[{"award-number":["62721"]}],"id":[{"id":"10.13039\/501100004785","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001729","name":"Stiftelsen f\u00f6r\u00a0Strategisk Forskning","doi-asserted-by":"publisher","award":["RB13-0011"],"award-info":[{"award-number":["RB13-0011"]}],"id":[{"id":"10.13039\/501100001729","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2018,12]]},"DOI":"10.1186\/s12859-018-2340-x","type":"journal-article","created":{"date-parts":[[2018,9,24]],"date-time":"2018-09-24T13:03:49Z","timestamp":1537794229000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":49,"title":["Machine Learning for detection of viral sequences in human metagenomic datasets"],"prefix":"10.1186","volume":"19","author":[{"given":"Zurab","family":"Bzhalava","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ardi","family":"Tampuu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Piotr","family":"Ba\u0142a","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Raul","family":"Vicente","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Joakim","family":"Dillner","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2018,9,24]]},"reference":[{"issue":"4","key":"2340_CR1","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1016\/j.trsl.2012.03.006","volume":"160","author":"KM Wylie","year":"2012","unstructured":"Wylie KM, Weinstock GM, Storch GA. Emerging view of the human virome. Transl Res. 2012; 160(4):283\u201390.","journal-title":"Transl Res"},{"issue":"10","key":"2340_CR2","doi-asserted-by":"publisher","first-page":"510","DOI":"10.1016\/j.tim.2013.07.001","volume":"21","author":"M Lecuit","year":"2013","unstructured":"Lecuit M, Eloit M. The human virome: new tools and concepts. Trends Microbiol. 2013; 21(10):510\u20135.","journal-title":"Trends Microbiol"},{"issue":"6","key":"2340_CR3","doi-asserted-by":"publisher","first-page":"27735","DOI":"10.1371\/journal.pone.0027735","volume":"7","author":"KM Wylie","year":"2012","unstructured":"Wylie KM, Mihindukulasuriya KA, Sodergren E, Weinstock GM, Storch GA. Sequence analysis of the human virome in febrile and afebrile children. PloS ONE. 2012; 7(6):27735.","journal-title":"PloS ONE"},{"issue":"10","key":"2340_CR4","doi-asserted-by":"publisher","first-page":"7370","DOI":"10.1371\/journal.pone.0007370","volume":"4","author":"D Willner","year":"2009","unstructured":"Willner D, Furlan M, Haynes M, Schmieder R, Angly FE, Silva J, Tammadoni S, Nosrat B, Conrad D, Rohwer F. Metagenomic analysis of respiratory tract dna viral communities in cystic fibrosis and non-cystic fibrosis individuals. PloS ONE. 2009; 4(10):7370.","journal-title":"PloS ONE"},{"issue":"2","key":"2340_CR5","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1016\/j.virol.2012.06.022","volume":"432","author":"D Bzhalava","year":"2012","unstructured":"Bzhalava D, Ekstr\u00f6m J, Lysholm F, Hultin E, Faust H, Persson B, Lehtinen M, de Villiers E-M, Dillner J. Phylogenetically diverse tt virus viremia among pregnant women. Virology. 2012; 432(2):427\u201334.","journal-title":"Virology"},{"issue":"9","key":"2340_CR6","doi-asserted-by":"publisher","first-page":"2212","DOI":"10.1002\/ijc.29666","volume":"138","author":"D Bzhalava","year":"2016","unstructured":"Bzhalava D, Hultin E, Arroyo M\u00fchr LS, Ekstr\u00f6m J, Lehtinen M, de Villiers E-M, Dillner J. Viremia during pregnancy and risk of childhood leukemia and lymphomas in the offspring: Nested case\u2013control study. Int J Cancer. 2016; 138(9):2212\u201320.","journal-title":"Int J Cancer"},{"issue":"6","key":"2340_CR7","doi-asserted-by":"publisher","first-page":"65953","DOI":"10.1371\/journal.pone.0065953","volume":"8","author":"D Bzhalava","year":"2013","unstructured":"Bzhalava D, Johansson H, Ekstr\u00f6m J, Faust H, M\u00f6ller B, Eklund C, Nordin P, Stenquist B, Paoli J, Persson B, et al.Unbiased approach for virus detection in skin lesions. PLoS ONE. 2013; 8(6):65953.","journal-title":"PLoS ONE"},{"key":"2340_CR8","doi-asserted-by":"publisher","first-page":"5807","DOI":"10.1038\/srep05807","volume":"4","author":"D Bzhalava","year":"2014","unstructured":"Bzhalava D, M\u00fchr LS, Lagheden C, Ekstr\u00f6m J, Forslund O, Dillner J, et al. Deep sequencing extends the diversity of human papillomaviruses in human skin. Sci Rep. 2014; 4:5807.","journal-title":"Sci Rep"},{"issue":"11","key":"2340_CR9","doi-asserted-by":"publisher","first-page":"2643","DOI":"10.1002\/ijc.26204","volume":"129","author":"J Ekstr\u00f6m","year":"2011","unstructured":"Ekstr\u00f6m J, Bzhalava D, Svenback D, Forslund O, Dillner J. High throughput sequencing reveals diversity of human papillomaviruses in cutaneous lesions. Int J Cancer. 2011; 129(11):2643\u201350.","journal-title":"Int J Cancer"},{"issue":"5866","key":"2340_CR10","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1126\/science.1152586","volume":"319","author":"H Feng","year":"2008","unstructured":"Feng H, Shuda M, Chang Y, Moore PS. Clonal integration of a polyomavirus in human merkel cell carcinoma. Science. 2008; 319(5866):1096\u2013100.","journal-title":"Science"},{"issue":"1","key":"2340_CR11","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1186\/1743-422X-9-164","volume":"9","author":"TL Meiring","year":"2012","unstructured":"Meiring TL, Salimo AT, Coetzee B, Maree HJ, Moodley J, Hitzeroth II, Freeborough M-J, Rybicki EP, Williamson A-L. Next-generation sequencing of cervical dna detects human papillomavirus types not detected by commercial kits. Virol J. 2012; 9(1):164.","journal-title":"Virol J"},{"issue":"6","key":"2340_CR12","doi-asserted-by":"publisher","first-page":"38499","DOI":"10.1371\/journal.pone.0038499","volume":"7","author":"V Foulongne","year":"2012","unstructured":"Foulongne V, Sauvage V, Hebert C, Dereure O, Cheval J, Gouilh MA, Pariente K, Segondy M, Burgui\u00e8re A, Manuguerra J-C, et al.Human skin microbiota: high diversity of dna viruses identified on the human skin by high throughput sequencing. PloS ONE. 2012; 7(6):38499.","journal-title":"PloS ONE"},{"issue":"11","key":"2340_CR13","doi-asserted-by":"publisher","first-page":"1000212","DOI":"10.1371\/journal.ppat.1000212","volume":"4","author":"JS Towner","year":"2008","unstructured":"Towner JS, Sealy TK, Khristova ML, Albari\u00f1o CG, Conlan S, Reeder SA, Quan P-L, Lipkin WI, Downing R, Tappero JW, et al.Newly discovered ebola virus associated with hemorrhagic fever outbreak in uganda. PLoS Pathog. 2008; 4(11):1000212.","journal-title":"PLoS Pathog"},{"issue":"2","key":"2340_CR14","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1165\/rcmb.2011-0253OC","volume":"46","author":"D Willner","year":"2012","unstructured":"Willner D, Haynes MR, Furlan M, Hanson N, Kirby B, Lim YW, Rainey PB, Schmieder R, Youle M, Conrad D, et al.Case studies of the spatial heterogeneity of dna viruses in the cystic fibrosis lung. Am J Respir Cell Mol Biol. 2012; 46(2):127\u201331.","journal-title":"Am J Respir Cell Mol Biol"},{"issue":"1","key":"2340_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.virol.2013.01.023","volume":"440","author":"H Johansson","year":"2013","unstructured":"Johansson H, Bzhalava D, Ekstr\u00f6m J, Hultin E, Dillner J, Forslund O. Metagenomic sequencing of \u201chpv-negative\u201d condylomas detects novel putative hpv types. Virology. 2013; 440(1):1\u20137.","journal-title":"Virology"},{"issue":"11","key":"2340_CR16","doi-asserted-by":"publisher","first-page":"2169","DOI":"10.1038\/ismej.2013.110","volume":"7","author":"JM Labont\u00e9","year":"2013","unstructured":"Labont\u00e9 JM, Suttle CA. Previously unknown and highly divergent ssdna viruses populate the oceans. ISME J. 2013; 7(11):2169.","journal-title":"ISME J"},{"issue":"1","key":"2340_CR17","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/2042-5783-2-3","volume":"2","author":"T Thomas","year":"2012","unstructured":"Thomas T, Gilbert J, Meyer F. Metagenomics-a guide from sampling to data analysis. Microb Inform Experimentation. 2012; 2(1):3.","journal-title":"Microb Inform Experimentation"},{"issue":"8","key":"2340_CR18","doi-asserted-by":"publisher","first-page":"105067","DOI":"10.1371\/journal.pone.0105067","volume":"9","author":"P Skewes-Cox","year":"2014","unstructured":"Skewes-Cox P, Sharpton TJ, Pollard KS, DeRisi JL. Profile hidden markov models for the detection of viruses within metagenomic sequence data. PLoS ONE. 2014; 9(8):105067.","journal-title":"PLoS ONE"},{"issue":"1","key":"2340_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0190938","volume":"13","author":"Z Bzhalava","year":"2018","unstructured":"Bzhalava Z, Hultin E, Dillner J. Extension of the viral ecology in humans using viral profile hidden markov models. Plos ONE. 2018; 13(1):1\u201312.","journal-title":"Plos ONE"},{"issue":"45","key":"2340_CR20","doi-asserted-by":"publisher","first-page":"14030","DOI":"10.1073\/pnas.1515387112","volume":"112","author":"YC Shin","year":"2015","unstructured":"Shin YC, Bischof GF, Lauer WA, Desrosiers RC. Importance of codon usage for the temporal regulation of viral gene expression. Proc Natl Acad Sci. 2015; 112(45):14030\u20135.","journal-title":"Proc Natl Acad Sci"},{"issue":"1","key":"2340_CR21","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1186\/s12859-017-1793-7","volume":"18","author":"J Athey","year":"2017","unstructured":"Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C. A new and updated resource for codon usage tables. BMC Bioinformatics. 2017; 18(1):391.","journal-title":"BMC Bioinformatics"},{"key":"2340_CR22","doi-asserted-by":"publisher","first-page":"25235","DOI":"10.1038\/srep25235","volume":"6","author":"V Smelov","year":"2016","unstructured":"Smelov V, Bzhalava D, M\u00fchr LSA, Eklund C, Komyakov B, Gorelov A, Dillner J, Hultin E. Detection of dna viruses in prostate cancer. Sci Rep. 2016; 6:25235.","journal-title":"Sci Rep"},{"key":"2340_CR23","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1016\/j.virol.2015.07.023","volume":"485","author":"LSA M\u00fchr","year":"2015","unstructured":"M\u00fchr LSA, Bzhalava D, Lagheden C, Eklund C, Johansson H, Forslund O, Dillner J, Hultin E. Does human papillomavirus-negative condylomata exist?Virology. 2015; 485:283\u20138.","journal-title":"Virology"},{"issue":"11","key":"2340_CR24","doi-asserted-by":"publisher","first-page":"2546","DOI":"10.1002\/ijc.29325","volume":"136","author":"LS Arroyo M\u00fchr","year":"2015","unstructured":"Arroyo M\u00fchr LS, Hultin E, Bzhalava D, Eklund C, Lagheden C, Ekstr\u00f6m J, Johansson H, Forslund O, Dillner J. Human papillomavirus type 197 is commonly present in skin tumors. Int J Cancer. 2015; 136(11):2546\u201355.","journal-title":"Int J Cancer"},{"key":"2340_CR25","first-page":"134","volume":"4","author":"D Bzhalava","year":"2013","unstructured":"Bzhalava D, Dillner J. Bioinformatics for viral metagenomics. J Data Min Genom Proteomics. 2013; 4:134.","journal-title":"J Data Min Genom Proteomics"},{"issue":"5","key":"2340_CR26","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","volume":"26","author":"H Li","year":"2010","unstructured":"Li H, Durbin R. Fast and accurate long-read alignment with burrows\u2013wheeler transform. Bioinformatics. 2010; 26(5):589\u201395.","journal-title":"Bioinformatics"},{"key":"2340_CR27","doi-asserted-by":"publisher","first-page":"644","DOI":"10.1038\/nbt.1883","volume":"29","author":"MG Grabherr","year":"2011","unstructured":"Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, et al.Trinity: reconstructing a full-length transcriptome without a genome from rna-seq data. Nat Biotechnol. 2011; 29:644\u201352.","journal-title":"Nat Biotechnol"},{"issue":"1","key":"2340_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/2047-217X-1-18","volume":"1","author":"R Luo","year":"2012","unstructured":"Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, et al.Soapdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience. 2012; 1(1):1\u20136.","journal-title":"GigaScience"},{"issue":"11","key":"2340_CR29","doi-asserted-by":"publisher","first-page":"1420","DOI":"10.1093\/bioinformatics\/bts174","volume":"28","author":"Y Peng","year":"2012","unstructured":"Peng Y, Leung HCM, Yiu SM, Chin FYL. Idba-ud: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012; 28(11):1420\u20138.","journal-title":"Bioinformatics"},{"issue":"8","key":"2340_CR30","doi-asserted-by":"publisher","first-page":"871","DOI":"10.1089\/cmb.2018.0079","volume":"25","author":"M Nowicki","year":"2018","unstructured":"Nowicki M, Bzhalava D, Ba\u0142a P. Massively parallel implementation of sequence alignment with basic local alignment search tool using parallel computing in java library. J Comput Biol. 2018; 25(8):871\u201381.","journal-title":"J Comput Biol"},{"issue":"13","key":"2340_CR31","doi-asserted-by":"publisher","first-page":"5125","DOI":"10.1093\/nar\/14.13.5125","volume":"14","author":"PM Sharp","year":"1986","unstructured":"Sharp PM, Tuohy TM, Mosurski KR. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 1986; 14(13):5125\u201343.","journal-title":"Nucleic Acids Res"},{"issue":"23","key":"2340_CR32","doi-asserted-by":"publisher","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","volume":"28","author":"L Fu","year":"2012","unstructured":"Fu L, Niu B, Zhu Z, Wu S, Li W. Cd-hit: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012; 28(23):3150\u20132.","journal-title":"Bioinformatics"},{"key":"2340_CR33","unstructured":"Van Asch V. Macro-and micro-averaged evaluation measures. Tech Rep. 2013."},{"issue":"1","key":"2340_CR34","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L. Random forests. Mach Learn. 2001; 45(1):5\u201332.","journal-title":"Mach Learn"},{"issue":"Oct","key":"2340_CR35","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al.Scikit-learn: Machine learning in python. J Mach Learn Res. 2011; 12(Oct):2825\u201330.","journal-title":"J Mach Learn Res"},{"issue":"1","key":"2340_CR36","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1186\/1471-2105-9-307","volume":"9","author":"C Strobl","year":"2008","unstructured":"Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A. Conditional variable importance for random forests. BMC Bioinformatics. 2008; 9(1):307.","journal-title":"BMC Bioinformatics"},{"issue":"4","key":"2340_CR37","doi-asserted-by":"publisher","first-page":"2249","DOI":"10.1016\/j.csda.2007.08.015","volume":"52","author":"KJ Archer","year":"2008","unstructured":"Archer KJ, Kimes RV. Empirical characterization of random forest variable importance measures. Comput Stat Data Anal. 2008; 52(4):2249\u201360.","journal-title":"Comput Stat Data Anal"},{"key":"2340_CR38","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198538493.001.0001","volume-title":"Neural networks for pattern recognition","author":"CM Bishop","year":"1995","unstructured":"Bishop CM. Neural networks for pattern recognition. Oxford: Oxford University Press; 1995."},{"issue":"4","key":"2340_CR39","first-page":"500","volume":"9","author":"F Castro-Chavez","year":"2011","unstructured":"Castro-Chavez F. Most used codons per amino acid and per genome in the code of man compared to other organisms according to the rotating circular genetic code. NeuroQuantology Interdiscip J Neurosci Quantum Phys. 2011; 9(4):500.","journal-title":"NeuroQuantology Interdiscip J Neurosci Quantum Phys"},{"issue":"8","key":"2340_CR40","doi-asserted-by":"publisher","first-page":"901","DOI":"10.2217\/pgs.12.72","volume":"13","author":"J Henson","year":"2012","unstructured":"Henson J, Tischler G, Ning Z. Next-generation sequencing and large genome assemblies. Pharmacogenomics. 2012; 13(8):901\u201315.","journal-title":"Pharmacogenomics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-018-2340-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-018-2340-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-018-2340-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,10]],"date-time":"2024-07-10T17:57:46Z","timestamp":1720634266000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-018-2340-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,9,24]]},"references-count":40,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,12]]}},"alternative-id":["2340"],"URL":"https:\/\/doi.org\/10.1186\/s12859-018-2340-x","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,9,24]]},"assertion":[{"value":"26 September 2017","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 August 2018","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 September 2018","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Our study is based on re-analysis of a series of previous studies on metagenomics sequencing, analysed with the bioinformatics pipeline that was most up-to-date at that time. The studies had the following Ethical Review Board (ERB) permissions: 2011\/1026-31\/4; 2012\/1028\/32; 53\/2005; 612\/2008; LU574-03; 104\/2006; R13149, 2\/2014; 2011-198-31M and 12\/780-32. In the Swedish system, the Ethical Review Board (ERB) is appointed by government and chaired by a senior judge. The ERB has the authority to specify the demands on information and consent and the ERB decisions were carefully followed and our study is thus in accordance with the Declaration of Helsinki.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Publisher\u2019s Note"}}],"article-number":"336"}}