{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T07:43:29Z","timestamp":1761896609133},"reference-count":58,"publisher":"Walter de Gruyter GmbH","issue":"2","license":[{"start":{"date-parts":[[2017,7,28]],"date-time":"2017-07-28T00:00:00Z","timestamp":1501200000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,7,28]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype leads to a great interest in establishing the miRNAs of an organism. Experimental methods are complicated, which led to the development of computational methods for pre-miRNA detection. Such methods generally employ machine learning to establish models for the discrimination between miRNAs and other sequences. Positive training data for model establishment, for the most part, stems from miRBase, the miRNA registry. The quality of the entries in miRBase has been questioned, though. This unknown quality led to the development of filtering strategies in attempts to produce high-quality positive datasets, which can lead to a scarcity of positive data. To analyze the quality of filtered data, we developed a machine learning model and found that it is well able to establish data quality based on intrinsic measures. Additionally, we analyzed which features describing pre-miRNAs could discriminate between low- and high-quality data. Both models are applicable to data from miRBase and can be used for establishing high-quality positive data. 
This will facilitate the development of better miRNA detection tools which will make the prediction of miRNAs in disease states more accurate. Finally, we applied both models to all miRBase data and provide the list of high quality hairpins.<\/jats:p>","DOI":"10.1515\/jib-2017-0032","type":"journal-article","created":{"date-parts":[[2017,7,28]],"date-time":"2017-07-28T10:07:53Z","timestamp":1501236473000},"source":"Crossref","is-referenced-by-count":2,"title":["Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for pre-microRNA Detection"],"prefix":"10.1515","volume":"14","author":[{"given":"M\u00fc\u015ferref Duygu Sa\u00e7ar","family":"Demirci","sequence":"first","affiliation":[{"name":"Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey"}]},{"given":"Jens","family":"Allmer","sequence":"additional","affiliation":[{"name":"Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey"}]}],"member":"374","reference":[{"key":"ref581","article-title":"On the performance of pre-microRNA detection algorithms","journal-title":"Nature Communications"},{"key":"ref501","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1186\/1471-2164-12-183","article-title":"Copy number variation of microRNA genes in the human genome","volume":"12","year":"2011","journal-title":"BMC Genomics"},{"key":"ref521","doi-asserted-by":"crossref","first-page":"1321","DOI":"10.1093\/bioinformatics\/btm026","article-title":"De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures","volume":"23","year":"2007","journal-title":"Bioinformatics"},{"key":"ref491","first-page":"215","article-title":"Can MiRBase provide positive data for machine learning for the detection of MiRNA hairpins?","volume":"10","year":"2013","journal-title":"J Integr 
Bioinform"},{"key":"ref201","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1186\/1471-2164-12-183","article-title":"Copy number variation of microRNA genes in the human genome","volume":"12","year":"2011","journal-title":"BMC Genomics"},{"key":"ref261","first-page":"1","year":"2013","journal-title":"Data mining for microrna gene prediction: On the impact of class imbalance and feature number for microrna gene prediction"},{"key":"ref21","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1016\/0092-8674(93)90529-Y","article-title":"The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14","volume":"75","year":"1993","journal-title":"Cell"},{"key":"ref121","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1186\/gb-2011-12-4-221","article-title":"Vive la diff\u00e9rence: biogenesis and evolution of microRNAs in plants and animals","volume":"12","year":"2011","journal-title":"Genome Biol"},{"key":"ref131","doi-asserted-by":"crossref","first-page":"D152","DOI":"10.1093\/nar\/gkq1027","article-title":"miRBase: integrating microRNA annotation and deep-sequencing data","volume":"39","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"ref41","first-page":"D154","article-title":"miRBase: tools for microRNA genomics","volume":"36","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"ref381","doi-asserted-by":"crossref","first-page":"992","DOI":"10.1101\/gad.1884710","article-title":"Mammalian microRNAs: experimental evaluation of novel and previously annotated genes","volume":"24","year":"2010","journal-title":"Genes Dev"},{"key":"ref421","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1186\/gb-2011-12-4-221","article-title":"Vive la diff\u00e9rence: biogenesis and evolution of microRNAs in plants and animals","volume":"12","year":"2011","journal-title":"Genome Biol"},{"key":"ref81","doi-asserted-by":"crossref","first-page":"992","DOI":"10.1101\/gad.1884710","article-title":"Mammalian 
microRNAs: experimental evaluation of novel and previously annotated genes","volume":"24","year":"2010","journal-title":"Genes Dev"},{"key":"ref231","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1007\/978-3-540-78246-9_38","volume-title":"Data analysis, machine learning and applications","year":"2008"},{"key":"ref271","doi-asserted-by":"crossref","first-page":"E1106","DOI":"10.1073\/pnas.1420955112","article-title":"Analysis of 13 cell types reveals evidence for the expression of numerous novel primate- and tissue-specific microRNAs","volume":"112","year":"2015","journal-title":"Proc Natl Acad Sci"},{"key":"ref71","doi-asserted-by":"crossref","first-page":"e3131","DOI":"10.7717\/peerj.3131","article-title":"Delineating the impact of machine learning elements in pre-microRNA detection","volume":"5","year":"2017","journal-title":"PeerJ"},{"key":"ref461","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1261\/rna.2183803","article-title":"A uniform system for microRNA annotation","volume":"9","year":"2003","journal-title":"RNA"},{"key":"ref161","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1261\/rna.2183803","article-title":"A uniform system for microRNA annotation","volume":"9","year":"2003","journal-title":"RNA"},{"key":"ref281","article-title":"On the performance of pre-microRNA detection algorithms","journal-title":"Nature Communications"},{"key":"ref551","doi-asserted-by":"crossref","first-page":"D68","DOI":"10.1093\/nar\/gkt1181","article-title":"miRBase: annotating high confidence microRNAs using deep sequencing data","volume":"42","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"ref51","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1038\/ng1794","article-title":"Approaches to microRNA discovery","volume":"38","year":"2006","journal-title":"Nat Genet"},{"key":"ref11","doi-asserted-by":"crossref","first-page":"D1070","DOI":"10.1093\/nar\/gkt1023","article-title":"HMDD v2.0: A database for experimentally 
supported human microRNA and disease associations","volume":"42","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"ref341","first-page":"D154","article-title":"miRBase: tools for microRNA genomics","volume":"36","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"ref401","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1186\/1471-2164-13-197","article-title":"Target mimics: an embedded layer of microRNA-involved gene regulatory networks in plants","volume":"13","year":"2012","journal-title":"BMC Genomics"},{"key":"ref571","doi-asserted-by":"crossref","first-page":"E1106","DOI":"10.1073\/pnas.1420955112","article-title":"Analysis of 13 cell types reveals evidence for the expression of numerous novel primate- and tissue-specific microRNAs","volume":"112","year":"2015","journal-title":"Proc Natl Acad Sci"},{"key":"ref141","doi-asserted-by":"crossref","first-page":"1153","DOI":"10.1038\/nsmb.2125","article-title":"Recognition of the pre-miRNA structure by Drosophila Dicer-1","volume":"18","year":"2011","journal-title":"Nat Struct Mol Biol"},{"key":"ref321","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1016\/0092-8674(93)90529-Y","article-title":"The C. 
elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14","volume":"75","year":"1993","journal-title":"Cell"},{"key":"ref61","first-page":"177","volume-title":"MiRNomics: microRNA biology and computational analysis SE \u2013 10","year":"2014"},{"key":"ref151","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.tplants.2013.11.008","article-title":"Evolutionary history of plant microRNAs","volume":"19","year":"2014","journal-title":"Trends Plant Sci"},{"key":"ref331","doi-asserted-by":"crossref","first-page":"D78","DOI":"10.1093\/nar\/gkt1266","article-title":"miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions","volume":"42","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"ref241","doi-asserted-by":"crossref","first-page":"1033","DOI":"10.1038\/nmeth.3583","article-title":"Comparing the performance of biomedical clustering methods","volume":"12","year":"2015","journal-title":"Nat Methods"},{"key":"ref351","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1038\/ng1794","article-title":"Approaches to microRNA discovery","volume":"38","year":"2006","journal-title":"Nat Genet"},{"key":"ref191","first-page":"215","article-title":"Can MiRBase provide positive data for machine learning for the detection of MiRNA hairpins?","volume":"10","year":"2013","journal-title":"J Integr Bioinform"},{"key":"ref431","doi-asserted-by":"crossref","first-page":"D152","DOI":"10.1093\/nar\/gkq1027","article-title":"miRBase: integrating microRNA annotation and deep-sequencing data","volume":"39","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"ref221","doi-asserted-by":"crossref","first-page":"1321","DOI":"10.1093\/bioinformatics\/btm026","article-title":"De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding 
measures","volume":"23","year":"2007","journal-title":"Bioinformatics"},{"key":"ref301","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1038\/nrg2290","article-title":"Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight?","volume":"9","year":"2008","journal-title":"Nat Rev Genet"},{"key":"ref541","doi-asserted-by":"crossref","first-page":"1033","DOI":"10.1038\/nmeth.3583","article-title":"Comparing the performance of biomedical clustering methods","volume":"12","year":"2015","journal-title":"Nat Methods"},{"key":"ref171","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1146\/annurev-genet-120213-092023","article-title":"A uniform system for the annotation of vertebrate microRNA genes and the evolution of the human microRNAome","volume":"49","year":"2015","journal-title":"Annu Rev Genet"},{"key":"ref411","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1002\/bies.201200055","article-title":"Do miRNAs have a deep evolutionary history?","volume":"34","year":"2012","journal-title":"BioEssays"},{"key":"ref311","doi-asserted-by":"crossref","first-page":"D1070","DOI":"10.1093\/nar\/gkt1023","article-title":"HMDD v2.0: A database for experimentally supported human microRNA and disease associations","volume":"42","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"ref101","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1186\/1471-2164-13-197","article-title":"Target mimics: an embedded layer of microRNA-involved gene regulatory networks in plants","volume":"13","year":"2012","journal-title":"BMC Genomics"},{"key":"ref451","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.tplants.2013.11.008","article-title":"Evolutionary history of plant microRNAs","volume":"19","year":"2014","journal-title":"Trends Plant Sci"},{"key":"ref561","first-page":"1","year":"2013","journal-title":"Data mining for microrna gene prediction: On the impact of class imbalance and feature number for microrna 
gene prediction"},{"key":"ref591","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.tplants.2013.11.008","article-title":"Evolutionary history of plant microRNAs","volume":"19","year":"2014","journal-title":"Trends Plant Sci"},{"key":"ref211","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1038\/nrm2632","article-title":"Biogenesis of small RNAs in animals","volume":"10","year":"2009","journal-title":"Nat Rev Mol Cell Biol"},{"key":"ref291","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.tplants.2013.11.008","article-title":"Evolutionary history of plant microRNAs","volume":"19","year":"2014","journal-title":"Trends Plant Sci"},{"key":"ref251","doi-asserted-by":"crossref","first-page":"D68","DOI":"10.1093\/nar\/gkt1181","article-title":"miRBase: annotating high confidence microRNAs using deep sequencing data","volume":"42","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"ref31","doi-asserted-by":"crossref","first-page":"D78","DOI":"10.1093\/nar\/gkt1266","article-title":"miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions","volume":"42","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"ref361","first-page":"177","volume-title":"MiRNomics: microRNA biology and computational analysis SE \u2013 10","year":"2014"},{"key":"ref371","doi-asserted-by":"crossref","first-page":"e3131","DOI":"10.7717\/peerj.3131","article-title":"Delineating the impact of machine learning elements in pre-microRNA detection","volume":"5","year":"2017","journal-title":"PeerJ"},{"key":"ref01","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1038\/nrg2290","article-title":"Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight?","volume":"9","year":"2008","journal-title":"Nat Rev Genet"},{"key":"ref531","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1007\/978-3-540-78246-9_38","volume-title":"Data analysis, machine learning and 
applications","year":"2008"},{"key":"ref391","first-page":"25","article-title":"Systematic curation of miRBase annotation using integrated small RNA high-throughput sequencing data for C. elegans and Drosophila","volume":"2","year":"2011","journal-title":"Front Genet"},{"key":"ref471","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1146\/annurev-genet-120213-092023","article-title":"A uniform system for the annotation of vertebrate microRNA genes and the evolution of the human microRNAome","volume":"49","year":"2015","journal-title":"Annu Rev Genet"},{"key":"ref441","doi-asserted-by":"crossref","first-page":"1153","DOI":"10.1038\/nsmb.2125","article-title":"Recognition of the pre-miRNA structure by Drosophila Dicer-1","volume":"18","year":"2011","journal-title":"Nat Struct Mol Biol"},{"key":"ref511","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1038\/nrm2632","article-title":"Biogenesis of small RNAs in animals","volume":"10","year":"2009","journal-title":"Nat Rev Mol Cell Biol"},{"key":"ref91","first-page":"25","article-title":"Systematic curation of miRBase annotation using integrated small RNA high-throughput sequencing data for C. 
elegans and Drosophila","volume":"2","year":"2011","journal-title":"Front Genet"},{"key":"ref111","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1002\/bies.201200055","article-title":"Do miRNAs have a deep evolutionary history?","volume":"34","year":"2012","journal-title":"BioEssays"}],"container-title":["Journal of Integrative Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.degruyter.com\/view\/journals\/jib\/14\/2\/article-20170032.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2017-0032\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,4,22]],"date-time":"2021-04-22T01:57:39Z","timestamp":1619056659000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2017-0032\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7,28]]},"references-count":58,"journal-issue":{"issue":"2"},"URL":"https:\/\/doi.org\/10.1515\/jib-2017-0032","relation":{},"ISSN":["1613-4516"],"issn-type":[{"value":"1613-4516","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,7,28]]}}}