{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T22:22:17Z","timestamp":1757456537403,"version":"3.37.3"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"S3","license":[{"start":{"date-parts":[[2021,5,1]],"date-time":"2021-05-01T00:00:00Z","timestamp":1619827200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,5,13]],"date-time":"2021-05-13T00:00:00Z","timestamp":1620864000000},"content-version":"vor","delay-in-days":12,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"startup grant of Harbin Institute of Technology Shenzhen"},{"name":"the National \u201c863\u201d Key Basic Research Development Program","award":["2014AA021505"],"award-info":[{"award-number":["2014AA021505"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61702134"],"award-info":[{"award-number":["61702134"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Shenzhen stable support program"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2021,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>The prediction of long non-coding RNA (lncRNA) has attracted great attention from researchers, as growing evidence indicates that various complex human diseases are closely related to lncRNAs. 
In the era of biomedical big data, alongside biological experimental methods for predicting lncRNAs, many machine-learning-based computational methods have been proposed to make better use of lncRNA sequence resources.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We developed an lncRNA prediction method that integrates information-entropy-based features with machine learning algorithms. We calculate generalized topological entropy and generate 6 novel features for lncRNA sequences. Using these 6 features together with other features such as open reading frame, we apply support vector machine, XGBoost and random forest algorithms to distinguish human lncRNAs. We compare our method with one that uses more K-mer features, and the results show that our method achieves a higher area under the curve, up to 99.7905%.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>We develop an accurate and efficient method that uses novel information entropy features to analyze and classify lncRNAs. 
Our method is also extendable for research on the other functional elements in DNA sequences.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-020-03884-w","type":"journal-article","created":{"date-parts":[[2021,5,12]],"date-time":"2021-05-12T23:04:37Z","timestamp":1620860677000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["IIMLP: integrated information-entropy-based method for LncRNA prediction"],"prefix":"10.1186","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8045-5264","authenticated-orcid":false,"given":"Junyi","family":"Li","sequence":"first","affiliation":[]},{"given":"Huinian","family":"Li","sequence":"additional","affiliation":[]},{"given":"Xiao","family":"Ye","sequence":"additional","affiliation":[]},{"given":"Li","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Qingzhe","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Yuan","family":"Ping","sequence":"additional","affiliation":[]},{"given":"Xiaozhu","family":"Jing","sequence":"additional","affiliation":[]},{"given":"Wei","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Qing","family":"Liao","sequence":"additional","affiliation":[]},{"given":"Bo","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Yadong","family":"Wang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,5,13]]},"reference":[{"issue":"5","key":"3884_CR1","doi-asserted-by":"publisher","first-page":"815","DOI":"10.1016\/j.cell.2007.02.029","volume":"128","author":"C Yanofsky","year":"2007","unstructured":"Yanofsky C. Establishing the triplet nature of the genetic code. Cell. 
2007;128(5):815\u20138.","journal-title":"Cell"},{"issue":"2","key":"3884_CR2","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1093\/bfgp\/elu034","volume":"14","author":"V Mohanty","year":"2015","unstructured":"Mohanty V, Gokmen-Polar Y, Badve S, Janga SC. Role of lncRNAs in health and disease-size and shape matter. Brief Funct Genom. 2015;14(2):115\u201329.","journal-title":"Brief Funct Genom"},{"issue":"12","key":"3884_CR3","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1038\/nrg3074","volume":"12","author":"M Esteller","year":"2011","unstructured":"Esteller M. Non-coding RNAs in human disease. Nat Rev Genet. 2011;12(12):861\u201374.","journal-title":"Nat Rev Genet"},{"issue":"3","key":"3884_CR4","doi-asserted-by":"publisher","first-page":"288","DOI":"10.1002\/bies.20544","volume":"29","author":"RJ Taft","year":"2007","unstructured":"Taft RJ, Pheasant M, Mattick JS. The relationship between non-protein-coding DNA and eukaryotic complexity. BioEssays. 2007;29(3):288\u201399.","journal-title":"BioEssays"},{"issue":"7291","key":"3884_CR5","doi-asserted-by":"publisher","first-page":"1071","DOI":"10.1038\/nature08975","volume":"464","author":"RA Gupta","year":"2010","unstructured":"Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, Tsai MC, Hung T, Argani P, Rinn JL, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464(7291):1071-U1148.","journal-title":"Nature"},{"issue":"1","key":"3884_CR6","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1093\/bib\/bbv031","volume":"17","author":"F Ferre","year":"2016","unstructured":"Ferre F, Colantoni A, Helmer-Citterich M. Revealing protein-lncRNA interaction. Brief Bioinform. 
2016;17(1):106\u201316.","journal-title":"Brief Bioinform"},{"issue":"5","key":"3884_CR7","doi-asserted-by":"publisher","first-page":"806","DOI":"10.1093\/bib\/bbu048","volume":"16","author":"JW Li","year":"2015","unstructured":"Li JW, Ma W, Zeng P, Wang JY, Geng B, Yang JC, Cui QH. LncTar: a tool for predicting the RNA targets of long noncoding RNAs. Brief Bioinform. 2015;16(5):806\u201312.","journal-title":"Brief Bioinform"},{"issue":"1","key":"3884_CR8","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1093\/bib\/bbv114","volume":"18","author":"S Yotsukura","year":"2017","unstructured":"Yotsukura S, Duverle D, Hancock T, Natsume-Kitatani Y, Mamitsuka H. Computational recognition for long non-coding RNA (lncRNA): software and databases. Brief Bioinform. 2017;18(1):9\u201327.","journal-title":"Brief Bioinform"},{"issue":"3","key":"3884_CR9","doi-asserted-by":"publisher","first-page":"527","DOI":"10.1016\/0092-8674(92)90520-M","volume":"71","author":"CJ Brown","year":"1992","unstructured":"Brown CJ, Hendrich BD, Rupert JL, Lafreniere RG, Xing Y, Lawrence J, Willard HF. The human Xist gene\u2014analysis of a 17 Kb inactive X-specific Rna that contains conserved repeats and is highly localized within the nucleus. Cell. 1992;71(3):527\u201342.","journal-title":"Cell"},{"issue":"7235","key":"3884_CR10","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1038\/nature07672","volume":"458","author":"M Guttman","year":"2009","unstructured":"Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009;458(7235):223\u20137.","journal-title":"Nature"},{"issue":"18","key":"3884_CR11","doi-asserted-by":"publisher","first-page":"1915","DOI":"10.1101\/gad.17446611","volume":"25","author":"MN Cabili","year":"2011","unstructured":"Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL. 
Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011;25(18):1915\u201327.","journal-title":"Genes Dev"},{"issue":"6","key":"3884_CR12","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1093\/nar\/gks1460","volume":"41","author":"L Wang","year":"2013","unstructured":"Wang L, Park HJ, Dasari S, Wang SQ, Kocher JP, Li W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013;41(6):56.","journal-title":"Nucleic Acids Res"},{"key":"3884_CR13","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1186\/1471-2105-15-16","volume":"15","author":"AM Li","year":"2014","unstructured":"Li AM, Zhang JY, Zhou ZY. PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. Bmc Bioinform. 2014;15:16.","journal-title":"Bmc Bioinform"},{"issue":"24","key":"3884_CR14","doi-asserted-by":"crossref","first-page":"3897","DOI":"10.1093\/bioinformatics\/btv480","volume":"31","author":"R Achawanantakun","year":"2015","unstructured":"Achawanantakun R, Chen J, Sun YN, Zhang Y. LncRNA-ID: long non-coding RNA IDentification using balanced random forests. Bioinformatics. 2015;31(24):3897\u2013905.","journal-title":"Bioinformatics"},{"key":"3884_CR15","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1186\/s12864-017-4178-4","volume":"18","author":"HW Schneider","year":"2017","unstructured":"Schneider HW, Raiol T, Brigido MM, Walter MEMT, Stadler PF. A support vector machine based method to distinguish long non-coding RNAs from protein coding transcripts. BMC Genom. 2017;18:56.","journal-title":"BMC Genom"},{"issue":"D1","key":"3884_CR16","doi-asserted-by":"publisher","first-page":"D754","DOI":"10.1093\/nar\/gkx1098","volume":"46","author":"DR Zerbino","year":"2018","unstructured":"Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, Billis K, Cummins C, Gall A, Giron CG, et al. 
Ensembl 2018. Nucleic Acids Res. 2018;46(D1):D754\u201361.","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"3884_CR17","doi-asserted-by":"publisher","first-page":"282","DOI":"10.1093\/bioinformatics\/17.3.282","volume":"17","author":"WZ Li","year":"2001","unstructured":"Li WZ, Jaroszewski L, Godzik A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics. 2001;17(3):282\u20133.","journal-title":"Bioinformatics"},{"issue":"1","key":"3884_CR18","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1093\/bioinformatics\/18.1.77","volume":"18","author":"WZ Li","year":"2002","unstructured":"Li WZ, Jaroszewski L, Godzik A. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics. 2002;18(1):77\u201382.","journal-title":"Bioinformatics"},{"issue":"13","key":"3884_CR19","doi-asserted-by":"publisher","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","volume":"22","author":"WZ Li","year":"2006","unstructured":"Li WZ, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658\u20139.","journal-title":"Bioinformatics"},{"issue":"8","key":"3884_CR20","doi-asserted-by":"publisher","first-page":"1061","DOI":"10.1093\/bioinformatics\/btr077","volume":"27","author":"D Koslicki","year":"2011","unstructured":"Koslicki D. Topological entropy of DNA sequences. Bioinformatics. 2011;27(8):1061\u20137.","journal-title":"Bioinformatics"},{"issue":"2","key":"3884_CR21","first-page":"56","volume":"9","author":"SL Jin","year":"2014","unstructured":"Jin SL, Tan RJ, Jiang QH, Xu L, Peng JJ, Wang Y, Wang YD. A generalized topological entropy for analyzing the complexity of DNA sequences. PLoS ONE. 
2014;9(2):56.","journal-title":"PLoS ONE"},{"issue":"Suppl 8","key":"3884_CR22","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1186\/s12859-019-2772-y","volume":"20","author":"J Li","year":"2019","unstructured":"Li J, Zhang L, Li H, Ping Y, Xu Q, Wang R, Tan R, Wang Z, Liu B, Wang Y. Integrated entropy-based approach for analyzing exons and introns in DNA sequences. BMC Bioinform. 2019;20(Suppl 8):283.","journal-title":"BMC Bioinform"},{"key":"3884_CR23","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1186\/s12859-017-1884-5","volume":"18","author":"D Nigatu","year":"2017","unstructured":"Nigatu D, Sobetzko P, Yousef M, Henkel W. Sequence-based information-theoretic features for gene essentiality prediction. BMC Bioinform. 2017;18:56.","journal-title":"BMC Bioinform"},{"issue":"4","key":"3884_CR24","first-page":"306","volume":"14","author":"CE Shannon","year":"1997","unstructured":"Shannon CE. The mathematical theory of communication (reprinted). M D Comput. 1997;14(4):306\u201317.","journal-title":"M D Comput"},{"issue":"1","key":"3884_CR25","first-page":"22","volume":"16","author":"KW Church","year":"1990","unstructured":"Church KW, Hanks P. Word association norms, mutual information, and lexicography. Comput Linguist. 1990;16(1):22\u20139.","journal-title":"Comput Linguist"},{"issue":"1","key":"3884_CR26","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1214\/aoms\/1177729694","volume":"22","author":"S Kullback","year":"1951","unstructured":"Kullback S, Leibler RA. On information and sufficiency. Ann Math Stat. 1951;22(1):79\u201386.","journal-title":"Ann Math Stat"},{"key":"3884_CR27","unstructured":"Platt J. Sequential minimal optimization: a fast algorithm for training support vector machines. 1998."},{"issue":"8","key":"3884_CR28","doi-asserted-by":"publisher","first-page":"832","DOI":"10.1109\/34.709601","volume":"20","author":"TK Ho","year":"1998","unstructured":"Ho TK. 
The random subspace method for constructing decision forests. IEEE T Pattern Anal. 1998;20(8):832\u201344.","journal-title":"IEEE T Pattern Anal"},{"key":"3884_CR29","doi-asserted-by":"crossref","unstructured":"Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. San Francisco, CA, USA: ACM; 2016: 785\u201394.","DOI":"10.1145\/2939672.2939785"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03884-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-020-03884-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03884-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,3]],"date-time":"2023-11-03T17:24:04Z","timestamp":1699032244000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-03884-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5]]},"references-count":29,"journal-issue":{"issue":"S3","published-print":{"date-parts":[[2021,5]]}},"alternative-id":["3884"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-03884-w","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2021,5]]},"assertion":[{"value":"15 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 November 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"13 May 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"243"}}