{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T07:00:52Z","timestamp":1771484452037,"version":"3.50.1"},"reference-count":74,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2020,12,9]],"date-time":"2020-12-09T00:00:00Z","timestamp":1607472000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"FCT\/MEC","award":["UIDB\/50008\/2020"],"award-info":[{"award-number":["UIDB\/50008\/2020"]}]},{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["CA16226"],"award-info":[{"award-number":["CA16226"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Biology"],"abstract":"<jats:p>Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploiting the vast amounts of generated data. This study aims to scope the scientific literature in the field of metagenomic classification in the time interval 2008\u20132019 and provide an evolutionary timeline of data processing and machine learning in this field. This study follows the scoping review methodology and PRISMA guidelines to identify and process the available literature. Natural Language Processing (NLP) is deployed to ensure efficient and exhaustive search of the literary corpus of three large digital libraries: IEEE, PubMed, and Springer. The search is based on keywords and properties looked up using the digital libraries\u2019 search engines. 
The scoping review results reveal an increasing number of research papers related to metagenomic classification over the past decade. The research is mainly focused on metagenomic classifiers, identifying scope-specific metrics for model evaluation, data set sanitization, and dimensionality reduction. Out of all of these subproblems, data preprocessing is the least researched, with considerable potential for improvement.<\/jats:p>","DOI":"10.3390\/biology9120453","type":"journal-article","created":{"date-parts":[[2020,12,9]],"date-time":"2020-12-09T09:17:58Z","timestamp":1607505478000},"page":"453","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4593-219X","authenticated-orcid":false,"given":"Petar","family":"Tonkovic","sequence":"first","affiliation":[{"name":"Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3373-8637","authenticated-orcid":false,"given":"Slobodan","family":"Kalajdziski","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7664-0168","authenticated-orcid":false,"given":"Eftim","family":"Zdravevski","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5336-1796","authenticated-orcid":false,"given":"Petre","family":"Lameski","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, 
Macedonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8366-6059","authenticated-orcid":false,"given":"Roberto","family":"Corizzo","sequence":"additional","affiliation":[{"name":"Department of Computer Science, American University, Washington, DC 20016, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3394-6762","authenticated-orcid":false,"given":"Ivan Miguel","family":"Pires","sequence":"additional","affiliation":[{"name":"Instituto de Telecomunica\u00e7\u00f5es, Universidade da Beira Interior, 6200-001 Covilh\u00e3, Portugal"},{"name":"Computer Science Department, Polytechnic Institute of Viseu, 3504-510 Viseu, Portugal"},{"name":"Health Sciences Research Unit: Nursing, School of Health, Polytechnic Institute of Viseu, 3504-510 Viseu, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3195-3168","authenticated-orcid":false,"given":"Nuno M.","family":"Garcia","sequence":"additional","affiliation":[{"name":"Instituto de Telecomunica\u00e7\u00f5es, Universidade da Beira Interior, 6200-001 Covilh\u00e3, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3582-8073","authenticated-orcid":false,"given":"Tatjana","family":"Loncar-Turukalo","sequence":"additional","affiliation":[{"name":"Faculty of Technical Sciences, University of Novi Sad, 21102 Novi Sad, Serbia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8103-8059","authenticated-orcid":false,"given":"Vladimir","family":"Trajkovik","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/S0378-1119(97)00044-9","article-title":"Human BAC library: Construction and rapid screening","volume":"191","author":"Asakawa","year":"1997","journal-title":"Gene"},{"key":"ref_2","first-page":"25","article-title":"Advances in recovery of novel biocatalysts from 
metagenomes","volume":"16","author":"Steele","year":"2009","journal-title":"J. Mol. Microbiol. Biotechnol."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1016\/j.cell.2011.09.009","article-title":"Metagenomics and personalized medicine","volume":"147","author":"Virgin","year":"2011","journal-title":"Cell"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Pires, I.M., Marques, G., Garcia, N.M., Fl\u00f3rez-Revuelta, F., Ponciano, V., and Oniani, S. (2020). A Research on the Classification and Applicability of the Mobile Health Applications. J. Pers. Med., 10.","DOI":"10.3390\/jpm10010011"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Villasana, M.V., Pires, I.M., S\u00e1, J., Garcia, N.M., Zdravevski, E., Chorbev, I., Lameski, P., and Fl\u00f3rez-Revuelta, F. (2020). Promotion of Healthy Nutrition and Physical Activity Lifestyles for Teenagers: A Systematic Literature Review of The Current Methodologies. J. Pers. Med., 10.","DOI":"10.3390\/jpm10010012"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1007\/s13762-013-0299-8","article-title":"Biotechnological advances in bioremediation of heavy metals contaminated ecosystems: An overview with special reference to phytoremediation","volume":"11","author":"Mani","year":"2014","journal-title":"Int. J. Environ. Sci. Technol."},{"key":"ref_7","first-page":"12","article-title":"An Analysis of the Relation between Garbage Pickers and Women\u2019s Health Risk","volume":"4","author":"Pires","year":"2020","journal-title":"Acta Sci. Agric."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/s00414-013-0872-1","article-title":"The potential use of bacterial community succession in forensics as described by high throughput metagenomic sequencing","volume":"128","author":"Pechal","year":"2014","journal-title":"Int. J. Leg. Med."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kreil, D.P., and Hu, L. (2013). 
Proceedings of the Critical Assessment of Massive Data Analysis conferences: CAMDA 2011 (Vienna, Austria) and CAMDA 2012 (Long Beach, CA USA). Syst. Biomed., 1.","DOI":"10.4161\/sysb.28947"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1186\/s40168-016-0168-z","article-title":"The metagenomics and metadesign of the subways and urban biomes (MetaSUB) international consortium inaugural meeting report","volume":"4","author":"Mason","year":"2016","journal-title":"Microbiome"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zdravevski, E., Lameski, P., Trajkovik, V., Chorbev, I., Goleva, R., Pombo, N., and Garcia, N.M. (2019). Automation in systematic, scoping and rapid reviews by an NLP toolkit: A case study in enhanced living environments. Enhanced Living Environments, Springer.","DOI":"10.1007\/978-3-030-10752-9_1"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/2046-4053-4-1","article-title":"Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement","volume":"4","author":"Moher","year":"2015","journal-title":"Syst. Rev."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1186\/1748-5908-5-69","article-title":"Scoping studies: Advancing the methodology","volume":"5","author":"Levac","year":"2010","journal-title":"Implement. Sci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"e14017","DOI":"10.2196\/14017","article-title":"Literature on Wearable Technology for Connected Health: Scoping Review of Research Trends, Advances, and Barriers","volume":"21","author":"Zdravevski","year":"2019","journal-title":"J. Med. Internet Res."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J.R., Bethard, S., and McClosky, D. (2014, January 22\u201327). The Stanford CoreNLP natural language processing toolkit. 
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, Maryland.","DOI":"10.3115\/v1\/P14-5010"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/MCSE.2007.55","article-title":"Matplotlib: A 2D graphics environment","volume":"9","author":"Hunter","year":"2007","journal-title":"Comput. Sci. Eng."},{"key":"ref_17","unstructured":"Hagberg, A., Swart, P., and S Chult, D. (2008). Exploring Network Structure, Dynamics, and Function Using NetworkX. Technical report."},{"key":"ref_18","unstructured":"Tonkovic, P., Zdravevski, E., and Trajkovik, V. (2020). Metagenomic classification scoping review results. Zenodo."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1007\/s00294-017-0693-8","article-title":"The metagenomics worldwide research","volume":"63","year":"2017","journal-title":"Curr. Genet."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"803","DOI":"10.3748\/wjg.v21.i3.803","article-title":"Application of metagenomics in the human gut microbiome","volume":"21","author":"Wang","year":"2015","journal-title":"World J. Gastroenterol. WJG"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1111\/j.1574-6941.2002.tb00904.x","article-title":"Assessment of microbial diversity in human colonic samples by 16S rDNA sequence analysis","volume":"39","author":"Hold","year":"2002","journal-title":"FEMS Microbiol. Ecol."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Ehrlich, S.D., Consortium, M., and MetaHIT Consortium (2011). MetaHIT: The European Union Project on metagenomics of the human intestinal tract. 
Metagenomics of the Human Body, Springer.","DOI":"10.1007\/978-1-4419-7089-3_15"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"804","DOI":"10.1038\/nature06244","article-title":"The human microbiome project","volume":"449","author":"Turnbaugh","year":"2007","journal-title":"Nature"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1146\/annurev-pathmechdis-012418-012751","article-title":"Clinical metagenomic next-generation sequencing for pathogen detection","volume":"14","author":"Gu","year":"2019","journal-title":"Annu. Rev. Pathol. Mech. Dis."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1540","DOI":"10.1126\/science.280.5369.1540","article-title":"Shotgun sequencing of the human genome","volume":"280","author":"Venter","year":"1998","journal-title":"Science"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1126\/science.2448875","article-title":"Polymerase chain reaction","volume":"239","author":"Saiki","year":"1988","journal-title":"Science"},{"key":"ref_27","unstructured":"Goelet, P., Knapp, M.R., and Anderson, S. (1999). Method for Determining Nucleotide Identity through Primer Extension. (5,888,819), U.S. Patent."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1111\/2041-210X.12988","article-title":"On the universality of target-enrichment baits for phylogenomic research","volume":"9","author":"Bossert","year":"2018","journal-title":"Methods Ecol. Evol."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1373\/jalm.2018.026120","article-title":"Metagenomics to assist in the diagnosis of bloodstream infection","volume":"3","author":"Greninger","year":"2019","journal-title":"J. Appl. Lab. Med."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1038\/s41576-019-0113-7","article-title":"Clinical metagenomics","volume":"20","author":"Chiu","year":"2019","journal-title":"Nat. Rev. 
Genet."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1111\/j.1469-0691.2012.03868.x","article-title":"Metagenomics and antibiotics","volume":"18","author":"Garmendia","year":"2012","journal-title":"Clin. Microbiol. Infect."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1186\/s13062-018-0215-8","article-title":"Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles","volume":"13","author":"Walker","year":"2018","journal-title":"Biol. Direct"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13062-019-0245-x","article-title":"Application of machine learning techniques for creating urban microbial fingerprints","volume":"14","author":"Ryan","year":"2019","journal-title":"Biol. Direct"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1186\/s13062-019-0252-y","article-title":"Fingerprinting cities: Differentiating subway microbiome functionality","volume":"14","author":"Zhu","year":"2019","journal-title":"Biol. Direct"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/s13062-019-0242-0","article-title":"Massive metagenomic data analysis using abundance-based machine learning","volume":"14","author":"Harris","year":"2019","journal-title":"Biol. Direct"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13062-018-0220-y","article-title":"MetaBinG2: A fast and accurate metagenomic sequence classification system for samples with many unknown organisms","volume":"13","author":"Qiao","year":"2018","journal-title":"Biol. 
Direct"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: Ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-018-1568-0","article-title":"KrakenUniq: Confident and fast metagenomics classification using unique k-mer counts","volume":"19","author":"Breitwieser","year":"2018","journal-title":"Genome Biol."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"3750","DOI":"10.1093\/bioinformatics\/bty433","article-title":"LiveKraken\u2014real-time metagenomic classification of illumina data","volume":"34","author":"Tausch","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Saghir, H., and Megherbi, D.B. (2013, January 12\u201314). A random-forest-based efficient comparative machine learning predictive DNA-codon metagenomics binning technique for WMD events & applications. Proceedings of the 2013 IEEE International Conference on Technologies for Homeland Security (HST), Waltham, MA, USA.","DOI":"10.1109\/THS.2013.6698995"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Saghir, H., and Megherbi, D.B. (2013, January 15\u201317). An efficient comparative machine learning-based metagenomics binning technique via using Random forest. Proceedings of the 2013 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), Milan, Italy.","DOI":"10.1109\/CIVEMSA.2013.6617419"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhu, Q., Zhu, Q., Pan, M., Jiang, X., Hu, X., and He, T. (2018, January 3\u20136). The phylogenetic tree based deep forest for metagenomic data classification. 
Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain.","DOI":"10.1109\/BIBM.2018.8621463"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lo, C., and Marculescu, R. (2019). MetaNN: Accurate classification of host phenotypes from metagenomic data using neural networks. BMC Bioinform., 20.","DOI":"10.1186\/s12859-019-2833-2"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Kaufmann, J., Asalone, K., Corizzo, R., Saldanha, C., Bracht, J., and Japkowicz, N. (2020). One-Class Ensembles for Rare Genomic Sequences Identification. International Conference on Discovery Science, Springer.","DOI":"10.1007\/978-3-030-61527-7_23"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"156053","DOI":"10.1109\/ACCESS.2020.3019095","article-title":"ECHAD: Embedding-Based Change Detection From Multivariate Time Series in Smart Grids","volume":"8","author":"Ceci","year":"2020","journal-title":"IEEE Access"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-018-1554-6","article-title":"RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification","volume":"19","author":"Nasko","year":"2018","journal-title":"Genome Biol."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1186\/s40168-017-0323-1","article-title":"Learning microbial community structures with supervised and unsupervised non-negative matrix factorization","volume":"5","author":"Cai","year":"2017","journal-title":"Microbiome"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Guerrini, V., and Rosone, G. (2019). Lightweight metagenomic classification via eBWT. International Conference on Algorithms for Computational Biology, Springer.","DOI":"10.1007\/978-3-030-18174-1_8"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Cerulo, L., Elkan, C., and Ceccarelli, M. (2010). 
Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinform., 11.","DOI":"10.1186\/1471-2105-11-228"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Mignone, P., and Pio, G. (2018, January 29\u201331). Positive unlabeled link prediction via transfer learning for gene network reconstruction. Proceedings of the 24th International Symposium on Methodologies for Intelligent Systems, Limassol, Cyprus.","DOI":"10.1007\/978-3-030-01851-1_2"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1553","DOI":"10.1093\/bioinformatics\/btz781","article-title":"Exploiting transfer learning for the reconstruction of the human gene regulatory network","volume":"36","author":"Mignone","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-020-3392-2","article-title":"Prediction of new associations between ncRNAs and diseases exploiting multi-type hierarchical clustering","volume":"21","author":"Barracchia","year":"2020","journal-title":"BMC Bioinform."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"2936","DOI":"10.1093\/bioinformatics\/btx353","article-title":"FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation","volume":"33","author":"Min","year":"2017","journal-title":"Bioinformatics"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1002\/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L","article-title":"Pfam: A comprehensive database of protein domain families based on seed alignments","volume":"28","author":"Sonnhammer","year":"1997","journal-title":"PRoteins Struct. Funct. Bioinform."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Seppey, M., Manni, M., and Zdobnov, E.M. (2019). BUSCO: Assessing genome assembly and annotation completeness. 
Gene Prediction, Springer.","DOI":"10.1007\/978-1-4939-9173-0_14"},{"key":"ref_56","unstructured":"Korf, I., Yandell, M., and Bedell, J. (2003). Blast, O\u2019Reilly Media, Inc."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1101\/gr.5969107","article-title":"MEGAN analysis of metagenomic data","volume":"17","author":"Huson","year":"2007","journal-title":"Genome Res."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1186\/s13059-017-1299-7","article-title":"Comprehensive benchmarking and ensemble approaches for metagenomic classifiers","volume":"18","author":"McIntyre","year":"2017","journal-title":"Genome Biol."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Ounit, R., Wanamaker, S., Close, T.J., and Lonardi, S. (2015). CLARK: Fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genom., 16.","DOI":"10.1186\/s12864-015-1419-2"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"3823","DOI":"10.1093\/bioinformatics\/btw542","article-title":"Higher classification sensitivity of short metagenomic reads with CLARK-S","volume":"32","author":"Ounit","year":"2016","journal-title":"Bioinformatics"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"2253","DOI":"10.1093\/bioinformatics\/btt389","article-title":"Scalable metagenomic taxonomy classification using a reference genome database","volume":"29","author":"Ames","year":"2013","journal-title":"Bioinformatics"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Sobih, A., Tomescu, A.I., and M\u00e4kinen, V. (2016, January 22\u201323). MetaFlow: Metagenomic profiling based on whole-genome coverage analysis with min-cost flows. Proceedings of the International Conference on Research in Computational Molecular Biology, Philadelphia, PA, USA.","DOI":"10.1101\/038208"},{"key":"ref_63","unstructured":"Freitas, T., Chain, P., Lo, C.C., and Li, P.E. (2015). 
GOTTCHA Database, Version 1, Technical report."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1038\/nmeth.2066","article-title":"Metagenomic microbial community profiling using unique clade-specific marker genes","volume":"9","author":"Segata","year":"2012","journal-title":"Nat. Methods"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"e243","DOI":"10.7717\/peerj.243","article-title":"PhyloSift: Phylogenetic analysis of genomes and metagenomes","volume":"2","author":"Darling","year":"2014","journal-title":"PeerJ"},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"5262","DOI":"10.1109\/ACCESS.2017.2684913","article-title":"Improving Activity Recognition Accuracy in Ambient-Assisted Living Systems by Automated Feature Engineering","volume":"5","author":"Zdravevski","year":"2017","journal-title":"IEEE Access"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Zdravevski, E., Lameski, P., Kulakov, A., Jakimovski, B., Filiposka, S., and Trajanov, D. (2015, January 20\u201322). Feature Ranking Based on Information Gain for Large Classification Problems with MapReduce. Proceedings of the 2015 IEEE Trustcom\/BigDataSE\/ISPA, Helsinki, Finland.","DOI":"10.1109\/Trustcom.2015.580"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"106164","DOI":"10.1016\/j.asoc.2020.106164","article-title":"From Big Data to business analytics: The case study of churn prediction","volume":"90","author":"Zdravevski","year":"2020","journal-title":"Appl. 
Soft Comput."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"e109","DOI":"10.1093\/nar\/gkt215","article-title":"Probabilistic error correction for RNA sequencing","volume":"41","author":"Le","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-019-2684-x","article-title":"From trash to treasure: Detecting unexpected contamination in unmapped NGS data","volume":"20","author":"Sangiovanni","year":"2019","journal-title":"BMC Bioinform."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1007\/s12539-015-0281-x","article-title":"MetaObtainer: A Tool for Obtaining Specified Species from Metagenomic Reads of Next-generation Sequencing","volume":"7","author":"Pan","year":"2015","journal-title":"Interdiscip. Sci. Comput. Life Sci."},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/s13040-018-0170-z","article-title":"Feature selection for gene prediction in metagenomic fragments","volume":"11","year":"2018","journal-title":"BioData Min."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Saghir, H., and Megherbi, D.B. (2015, January 14\u201316). Big data biology-based predictive models via DNA-metagenomics binning for WMD events applications. Proceedings of the 2015 IEEE International Symposium on Technologies for Homeland Security (HST), Waltham, MA, USA.","DOI":"10.1109\/THS.2015.7225313"},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Kim, M., Zhang, X., Ligo, J.G., Farnoud, F., Veeravalli, V.V., and Milenkovic, O. (2016). MetaCRAM: An integrated pipeline for metagenomic taxonomy identification and compression. 
BMC Bioinform., 17.","DOI":"10.1186\/s12859-016-0932-x"}],"container-title":["Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-7737\/9\/12\/453\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:42:38Z","timestamp":1760179358000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-7737\/9\/12\/453"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,9]]},"references-count":74,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["biology9120453"],"URL":"https:\/\/doi.org\/10.3390\/biology9120453","relation":{},"ISSN":["2079-7737"],"issn-type":[{"value":"2079-7737","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,9]]}}}