{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T09:45:44Z","timestamp":1769247944356,"version":"3.49.0"},"reference-count":42,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,4,22]],"date-time":"2021-04-22T00:00:00Z","timestamp":1619049600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>MicroRNAs (miRNAs) are short RNA sequences that are actively involved in gene regulation. These regulators on the post-transcriptional level have been discovered in virtually all eukaryotic organisms. Additionally, miRNAs seem to exist in viruses and might also be produced in microbial pathogens. Initially, transcribed RNA is cleaved by Drosha, producing precursor miRNAs. We have previously shown that it is possible to distinguish between microRNA precursors of different clades by representing the sequences in a k-mer feature space. The k-mer representation considers the frequency of a k-mer in the given sequence. We further hypothesized that the relationship between k-mers (e.g., distance between k-mers) could be useful for classification. Three different distance-based features were created, tested, and compared. The three feature sets were entitled inter k-mer distance, k-mer location distance, and k-mer first\u2013last distance. Here, we show that classification performance above 80% (depending on the evolutionary distance) is possible with a combination of distance-based and regular k-mer features. With these novel features, classification at closer evolutionary distances is better than using k-mers alone. Combining the features leads to accurate classification for larger evolutionary distances. For example, categorizing Homo sapiens versus Brassicaceae leads to an accuracy of 93%. 
When considering average accuracy, the novel distance-based features lead to an overall increase in effectiveness. On the contrary, secondary-structure-based features did not lead to any effective separation among clades in this study. With this line of research, we support the differentiation between true and false miRNAs detected from next-generation sequencing data, provide an additional viewpoint for confirming miRNAs when the species of origin is known, and open up a new strategy for analyzing miRNA evolution.<\/jats:p>","DOI":"10.3390\/a14050132","type":"journal-article","created":{"date-parts":[[2021,4,22]],"date-time":"2021-04-22T13:59:14Z","timestamp":1619099954000},"page":"132","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Classification of Precursor MicroRNAs from Different Species Based on K-mer Distance Features"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8780-6303","authenticated-orcid":false,"given":"Malik","family":"Yousef","sequence":"first","affiliation":[{"name":"Department of Information Systems, Zefat Academic College, Zefat 13206, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2164-7335","authenticated-orcid":false,"given":"Jens","family":"Allmer","sequence":"additional","affiliation":[{"name":"Institute of Measurement and Sensor Technology, Hochschule Ruhr West University of Applied Sciences, 45479 M\u00fclheim an der Ruhr, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-1-62703-748-8_1","article-title":"Introduction to MicroRNAs in Biological Systems","volume":"1107","year":"2014","journal-title":"Methods Mol. 
Biol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"884","DOI":"10.1038\/nrg2179","article-title":"Specialization and Evolution of Endogenous Small RNA Pathways","volume":"8","author":"Chapman","year":"2007","journal-title":"Nat. Rev. Genet."},{"key":"ref_3","unstructured":"Yousef, M., Allmer, J., and Khalifa, W. (2021, April 21). Plant MicroRNA Prediction Employing Sequence Motifs Achieves High Accuracy. Available online: https:\/\/www.researchgate.net\/publication\/320402782_Plant_microRNA_prediction_employing_sequence_motifs_achieves_high_accuracy."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1099\/vir.0.070862-0","article-title":"Role of MicroRNAs in Herpesvirus Latency and Persistence","volume":"96","author":"Grey","year":"2015","journal-title":"J. Gen. Virol."},{"key":"ref_5","first-page":"3","article-title":"Current Limitations for Computational Analysis of MiRNAs in Cancer","volume":"1","author":"Allmer","year":"2013","journal-title":"Pak. J. Clin. Biomed. Res."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"445","DOI":"10.2174\/1389201015666140519120855","article-title":"Intersection of MicroRNA and Gene Regulatory Networks and Their Implication in Cancer","volume":"15","author":"Yousef","year":"2014","journal-title":"Curr. Pharm. Biotechnol."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"209","DOI":"10.3389\/fgene.2012.00209","article-title":"Computational Methods for Ab Initio Detection of MicroRNAs","volume":"3","author":"Allmer","year":"2012","journal-title":"Front. 
Genet."},{"key":"ref_8","first-page":"177","article-title":"Machine Learning Methods for MicroRNA Gene Prediction","volume":"Volume 1107","author":"Yousef","year":"2014","journal-title":"miRNomics: MicroRNA Biology and Computational Analysis SE-10"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1093\/bioinformatics\/btl094","article-title":"Combining Multi-Species Genomic Data for MicroRNA Identification Using a Naive Bayes Classifier","volume":"22","author":"Yousef","year":"2006","journal-title":"Bioinformatics"},{"key":"ref_10","unstructured":"Dang, H.T., Tho, H.P., Satou, K., and Tu, B.H. (2008, January 16\u201318). Prediction of MicroRNA Hairpins Using One-Class Support Vector Machines. Proceedings of the 2nd International Conference on Bioinformatics and Biomedical Engineering, iCBBE 2008, Shanghai, China."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"e2135","DOI":"10.7717\/peerj.2135","article-title":"The Impact of Feature Selection on One and Two-Class Classification Performance for Plant MicroRNAs","volume":"4","author":"Khalifa","year":"2016","journal-title":"PeerJ"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/1748-7188-3-2","article-title":"Learning from Positive Examples When the Negative Class Is Undetermined\u2014MicroRNA Gene Identification","volume":"3","author":"Yousef","year":"2008","journal-title":"Algorithms Mol. Biol. AMB"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"e3131","DOI":"10.7717\/peerj.3131","article-title":"Delineating the Impact of Machine Learning Elements in Pre-MicroRNA Detection","volume":"5","author":"Demirci","year":"2017","journal-title":"PeerJ"},{"key":"ref_14","first-page":"215","article-title":"Can MiRBase Provide Positive Data for Machine Learning for the Detection of MiRNA Hairpins?","volume":"10","author":"Hamzeiy","year":"2013","journal-title":"J. Integr. 
Bioinform."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1146\/annurev-genet-120213-092023","article-title":"A Uniform System for the Annotation of Vertebrate MicroRNA Genes and the Evolution of the Human MicroRNAome","volume":"49","author":"Fromm","year":"2015","journal-title":"Annu. Rev. Genet."},{"key":"ref_16","first-page":"20170032","article-title":"Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for Pre-MicroRNA Detection","volume":"14","author":"Duygu","year":"2017","journal-title":"J. Integr. Bioinform."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"D78","DOI":"10.1093\/nar\/gkt1266","article-title":"MiRTarBase Update 2014: An Information Resource for Experimentally Validated MiRNA-Target Interactions","volume":"42","author":"Hsu","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"D222","DOI":"10.1093\/nar\/gkr1161","article-title":"TarBase 6.0: Capturing the Exponential Growth of MiRNA Targets with Experimental Support","volume":"40","author":"Vergoulis","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"D152","DOI":"10.1093\/nar\/gkq1027","article-title":"MiRBase: Integrating MicroRNA Annotation and Deep-Sequencing Data","volume":"39","author":"Kozomara","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1038\/s41467-017-00403-z","article-title":"On the Performance of Pre-MicroRNA Detection Algorithms","volume":"8","author":"Demirci","year":"2017","journal-title":"Nat. Commun."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sacar, M.D., and Allmer, J. (2013, January 25\u201327). Data Mining for MicroRNA Gene Prediction: On the Impact of Class Imbalance and Feature Number for MicroRNA Gene Prediction. 
Proceedings of the 2013 8th International Symposium on Health Informatics and Bioinformatics, Ankara, Turkey.","DOI":"10.1109\/HIBIT.2013.6661685"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sewer, A., Paul, N., Landgraf, P., Aravin, A., Pfeffer, S., Brownstein, M.J., Tuschl, T., van Nimwegen, E., and Zavolan, M. (2005). Identification of Clustered MicroRNAs Using an Ab Initio Prediction Method. BMC Bioinform., 6.","DOI":"10.1186\/1471-2105-6-267"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"42230","DOI":"10.1074\/jbc.M404931200","article-title":"Structural Features of MicroRNA (MiRNA) Precursors and Their Relevance to MiRNA Biogenesis and Small Interfering RNA\/Short Hairpin RNA Design","volume":"279","author":"Krol","year":"2004","journal-title":"J. Biol. Chem."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.biosystems.2015.10.003","article-title":"MiRNAfe: A Comprehensive Tool for Feature Extraction in MicroRNA Prediction","volume":"138","author":"Yones","year":"2015","journal-title":"BioSystems"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"R42","DOI":"10.1186\/gb-2003-4-7-r42","article-title":"Computational Identification of Drosophila MicroRNA Genes","volume":"4","author":"Lai","year":"2003","journal-title":"Genome Biol."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yousef, M., Khalifa, W., Acar, I.E., and Allmer, J. (2017). MicroRNA Categorization Using Sequence Motifs and K-Mers. BMC Bioinform., 18.","DOI":"10.1186\/s12859-017-1584-1"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yousef, M., Nigatu, D., Levy, D., Allmer, J., and Henkel, W. (2017). Categorization of Species Based on Their MicroRNAs Employing Sequence Motifs, Information-Theoretic Sequence Feature Extraction, and k-Mers. EURASIP J. Adv. 
Signal Process., 2017.","DOI":"10.1186\/s13634-017-0506-8"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Cakir, M.V., and Allmer, J. (2010, January 20\u201322). Systematic Computational Analysis of Potential RNAi Regulation in Toxoplasma Gondii. Proceedings of the 2010 5th International Symposium on Health Informatics and Bioinformatics, Ankara, Turkey.","DOI":"10.1109\/HIBIT.2010.5478909"},{"key":"ref_29","unstructured":"Berthold, M.R., Cebron, N., Dill, F., Gabriel, T.R., K\u00f6tter, T., Meinl, T., Ohl, P., Sieb, C., Thiel, K., and Wiswedel, B. (2021, April 21). KNIME: The Konstanz Information Miner. Available online: https:\/\/www.knime.com\/sites\/default\/files\/knime_whitepaper.pdf."},{"key":"ref_30","first-page":"12.9.1","article-title":"MiRBase: MicroRNA Sequences and Annotation","volume":"29","year":"2010","journal-title":"Curr. Protoc. Bioinform."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1321","DOI":"10.1093\/bioinformatics\/btm026","article-title":"De Novo SVM Classification of Precursor MicroRNAs from Genomic Pseudo Hairpins Using Global and Intrinsic Folding Measures","volume":"23","author":"Ng","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1058","DOI":"10.1093\/bioinformatics\/bts114","article-title":"Defining and Providing Robust Controls for MicroRNA Prediction","volume":"28","author":"Ritchie","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"W339","DOI":"10.1093\/nar\/gkm368","article-title":"MiPred: Classification of Real and Pseudo MicroRNA Precursors Using Random Forest Prediction Model with Combined Features","volume":"35","author":"Jiang","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Xue, C., Li, F., He, T., Liu, G.-P., Li, Y., and Zhang, X. (2005). 
Classification of Real and Pseudo MicroRNA Precursors Using Local Structure-Sequence Features and Support Vector Machine. BMC Bioinform., 6.","DOI":"10.1186\/1471-2105-6-310"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Yousef, M., Allmer, J., and Khalifa, W. (2015). Sequence Motif-Based One-Class Classifiers Can Achieve Comparable Accuracy to Two-Class Learners for Plant MicroRNA Detection. J. Biomed. Sci. Eng.","DOI":"10.4236\/jbise.2015.810065"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"2460","DOI":"10.1093\/bioinformatics\/btq461","article-title":"Search and Clustering Orders of Magnitude Faster than BLAST","volume":"26","author":"Edgar","year":"2010","journal-title":"Bioinformatics"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0169-7439(00)00122-2","article-title":"Monte Carlo Cross Validation","volume":"56","author":"Xu","year":"2001","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the Predicted and Observed Secondary Structure of T4 Phage Lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"BBA Protein Struct."},{"key":"ref_40","unstructured":"Sa\u00e7ar Demirci, M.D., Ba\u011fci, C., and Allmer, J. (2021, April 21). Differential Expression of Toxoplasma Gondii MicroRNAs in Murine and Human Hosts. 
Available online: https:\/\/openaccess.iyte.edu.tr\/xmlui\/bitstream\/handle\/11147\/7918\/10.1007@978-3-319-39496-19.pdf;jsessionid=D7A7AB90CE83A13466B77615F319E128?sequence=1."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1016\/j.gpb.2014.09.002","article-title":"Computational Prediction of MicroRNAs from Toxoplasma Gondii Potentially Regulating the Hosts\u2019 Gene Expression","volume":"12","author":"Allmer","year":"2014","journal-title":"Genom. Proteom. Bioinform."},{"key":"ref_42","first-page":"335","article-title":"Evolution of MicroRNAs","volume":"342","author":"Tanzer","year":"2006","journal-title":"Methods Mol. Biol."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/5\/132\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:51:35Z","timestamp":1760161895000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/5\/132"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,22]]},"references-count":42,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["a14050132"],"URL":"https:\/\/doi.org\/10.3390\/a14050132","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,22]]}}}