{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T08:48:19Z","timestamp":1775638099414,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2021,3,3]],"date-time":"2021-03-03T00:00:00Z","timestamp":1614729600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Louisiana Board of Regents through the Board of Regents Support Fund LEQSF","award":["2016-19)-RD-B-07"],"award-info":[{"award-number":["2016-19)-RD-B-07"]}]},{"name":"Louisiana Board of Regents Support Fund","award":["LEQSF(2017-20)-RD-A-26"],"award-info":[{"award-number":["LEQSF(2017-20)-RD-A-26"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,9,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Transposable Elements (TEs) or jumping genes are DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also cause an increase in the rate of mutation and can even mediate duplications and large insertions and deletions in the genome, promoting gross genetic rearrangements. The proper classification of identified jumping genes is important for analyzing their genetic and evolutionary effects. An effective classifier, which can explain the role of TEs in germline and somatic evolution more accurately, is needed. 
In this study, we examine the performance of a variety of machine learning (ML) techniques and propose a robust method, ClassifyTE, for the hierarchical classification of TEs with high accuracy, using a stacking-based ML method.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We propose a stacking-based approach for the hierarchical classification of TEs. When trained on three different benchmark datasets, our proposed system achieved 4%, 10.68% and 10.13% average percentage improvement (using the hF measure) compared to several state-of-the-art methods. We developed an end-to-end automated hierarchical classification tool based on the proposed approach, ClassifyTE, to classify TEs up to the super-family level. We further evaluated our method on a new TE library generated by a homology-based classification method and found relatively high concordance at higher taxonomic levels. Thus, ClassifyTE paves the way for a more accurate analysis of the role of TEs.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The source code and data are available at https:\/\/github.com\/manisa\/ClassifyTE.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab146","type":"journal-article","created":{"date-parts":[[2021,3,1]],"date-time":"2021-03-01T20:15:21Z","timestamp":1614629721000},"page":"2529-2536","source":"Crossref","is-referenced-by-count":21,"title":["ClassifyTE: a stacking-based prediction of hierarchical classification of transposable elements"],"prefix":"10.1093","volume":"37","author":[{"given":"Manisha","family":"Panta","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of New Orleans , New Orleans, LA 70148, USA"}]},{"given":"Avdesh","family":"Mishra","sequence":"additional","affiliation":[{"name":"Department of 
Electrical Engineering and Computer Science, Texas A&M University-Kingsville , Kingsville, TX 78363, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0110-2194","authenticated-orcid":false,"given":"Md Tamjidul","family":"Hoque","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of New Orleans , New Orleans, LA 70148, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2425-2395","authenticated-orcid":false,"given":"Joel","family":"Atallah","sequence":"additional","affiliation":[{"name":"Department of Biological Sciences, University of New Orleans , New Orleans, LA 70148, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,3,3]]},"reference":[{"key":"2023051609215639300_btab146-B1","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1093\/bioinformatics\/btp084","article-title":"TEclass\u2013a tool for automated classification of unknown eukaryotic transposable elements","volume":"25","author":"Abrus\u00e1n","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051609215639300_btab146-B2","doi-asserted-by":"crossref","first-page":"56","DOI":"10.3390\/make2020005","article-title":"AIBH: accurate identification of brain hemorrhage using genetic algorithm based feature selection and stacking","volume":"2","author":"Alawad","year":"2020","journal-title":"Mach. Learn. Knowl. 
Extr"},{"key":"2023051609215639300_btab146-B3","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1080\/00031305.1992.10475879","article-title":"An introduction to kernel and nearest-neighbor nonparametric regression","volume":"46","author":"Altman","year":"1992","journal-title":"Am Stat"},{"key":"2023051609215639300_btab146-B4","doi-asserted-by":"crossref","first-page":"2070","DOI":"10.1093\/bioinformatics\/btu152","article-title":"KAnalyze: a fast versatile pipelined K-mer toolkit","volume":"30","author":"Audano","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051609215639300_btab146-B5","first-page":"281","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"J. Mach. Learn. Res"},{"key":"2023051609215639300_btab146-B6","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023051609215639300_btab146-B7","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1186\/s12859-016-1232-1","article-title":"Reduction strategies for hierarchical multi-label classification in protein function prediction","volume":"17","author":"Cerri","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023051609215639300_btab146-B8","doi-asserted-by":"crossref","first-page":"1055","DOI":"10.1109\/72.788646","article-title":"Support vector machines for histogram-based image classification","volume":"10","author":"Chapelle","year":"1999","journal-title":"IEEE Trans. 
Neural Netw"},{"key":"2023051609215639300_btab146-B9","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1145\/2939672.2939785","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Chen","year":"2016"},{"key":"2023051609215639300_btab146-B10","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn"},{"key":"2023051609215639300_btab146-B11","first-page":"256","volume-title":"SIGIR \u201900 Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Dumais","year":"2000"},{"key":"2023051609215639300_btab146-B12","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1093\/gbe\/evp023","article-title":"Exploring repetitive DNA landscapes using REPCLASS, a tool that automates the classification of transposable elements in eukaryotic genomes","volume":"1","author":"Feschotte","year":"2009","journal-title":"Genome Biol. Evol"},{"key":"2023051609215639300_btab146-B13","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1007\/978-1-4939-9161-7_5","volume-title":"Protein Supersecondary Structures. Methods in Molecular Biology.","author":"Flot","year":"2019"},{"key":"2023051609215639300_btab146-B14","doi-asserted-by":"crossref","first-page":"9451","DOI":"10.1073\/pnas.1921046117","article-title":"RepeatModeler2 for automated genomic discovery of transposable element families","volume":"117","author":"Flynn","year":"2020","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023051609215639300_btab146-B15","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511815867","volume-title":"Statistical Models: Theory and Practice","author":"Freedman","year":"2009"},{"key":"2023051609215639300_btab146-B16","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1016\/S0167-9473(01)00065-2","article-title":"Stochastic gradient boosting","volume":"38","author":"Friedman","year":"2002","journal-title":"Comput. Stat. Data Anal"},{"key":"2023051609215639300_btab146-B17","doi-asserted-by":"crossref","first-page":"107857","DOI":"10.1016\/j.carres.2019.107857","article-title":"StackCBPred: a stacking based prediction of protein-carbohydrate binding sites from sequence","volume":"486","author":"Gattani","year":"2019","journal-title":"Carbohydr. Res"},{"key":"2023051609215639300_btab146-B18","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","article-title":"Extremely randomized trees","volume":"63","author":"Geurts","year":"2006","journal-title":"Mach. Learn"},{"key":"2023051609215639300_btab146-B19","doi-asserted-by":"crossref","first-page":"D1141","DOI":"10.1093\/nar\/gkv1130","article-title":"PGSB PlantsDB: updates to the database framework for comparative plant genome research","volume":"44","author":"Gundlach","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023051609215639300_btab146-B20","doi-asserted-by":"crossref","first-page":"349","DOI":"10.4310\/SII.2009.v2.n3.a8","article-title":"Multi-class adaboost","volume":"2","author":"Hastie","year":"2009","journal-title":"Stat. 
Interface"},{"key":"2023051609215639300_btab146-B21","doi-asserted-by":"crossref","DOI":"10.1201\/9780429499661","volume-title":"Introduction to the Theory of Neural Computation","author":"Hertz","year":"2018"},{"key":"2023051609215639300_btab146-B22","doi-asserted-by":"crossref","first-page":"e91929","DOI":"10.1371\/journal.pone.0091929","article-title":"PASTEC: an automatic transposable element classification tool","volume":"9","author":"Hoede","year":"2014","journal-title":"PLoS One"},{"key":"2023051609215639300_btab146-B23","doi-asserted-by":"crossref","first-page":"3289","DOI":"10.1093\/bioinformatics\/bty352","article-title":"PBRpredict-Suite: a suite of models to predict peptide-recognition domain residues from protein sequence","volume":"34","author":"Iqbal","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051609215639300_btab146-B24","first-page":"137","volume-title":"Text Categorization with Support Vector Machines: Learning with Many Relevant Features","author":"Joachims","year":"1998"},{"key":"2023051609215639300_btab146-B25","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1159\/000084979","article-title":"Repbase Update, a database of eukaryotic repetitive elements","volume":"110","author":"Jurka","year":"2005","journal-title":"Cytogenet Genome Res"},{"key":"2023051609215639300_btab146-B26","doi-asserted-by":"crossref","first-page":"226","DOI":"10.5808\/GI.2012.10.4.226","article-title":"Transposable elements: no more \u2018Junk DNA\u2019","volume":"10","author":"Kim","year":"2012","journal-title":"Genomics Inform"},{"key":"2023051609215639300_btab146-B27","doi-asserted-by":"crossref","first-page":"100012","DOI":"10.1016\/j.array.2019.100012","article-title":"Machine learning applications in detecting sand boils from 
images","volume":"3\u20134","author":"Kuchi","year":"2019","journal-title":"Array"},{"key":"2023051609215639300_btab146-B28","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.asoc.2019.02.017","article-title":"Machine learning applications in detecting rip channels from images","volume":"78","author":"Maryan","year":"2019","journal-title":"Appl. Soft Comput"},{"key":"2023051609215639300_btab146-B29","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1101\/SQB.1956.021.01.017","article-title":"Controlling elements and the gene","volume":"21","author":"McClintock","year":"1956","journal-title":"Cold Spring Harb. Symp. Quant. Biol"},{"key":"2023051609215639300_btab146-B30","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1186\/1471-2105-12-333","article-title":"Efficient counting of k-mers in DNA sequences using a bloom filter","volume":"12","author":"Melsted","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023051609215639300_btab146-B31","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1093\/bioinformatics\/bty653","article-title":"StackDPPred: a stacking based prediction of DNA-binding protein from sequence","volume":"35","author":"Mishra","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051609215639300_btab146-B32","doi-asserted-by":"crossref","first-page":"e1241050","DOI":"10.1080\/2159256X.2016.1241050","article-title":"LTRclassifier: a website for fast structural LTR retrotransposons classification in plants","volume":"6","author":"Monat","year":"2016","journal-title":"Mob. Genet. Elements"},{"key":"2023051609215639300_btab146-B33","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/s13721-013-0034-x","article-title":"Classification of microarray cancer data using ensemble approach","volume":"2","author":"Nagi","year":"2013","journal-title":"Netw. Model. Anal. Health Inform. 
Bioinform"},{"key":"2023051609215639300_btab146-B34","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1109\/ICMLA.2017.0-145","volume-title":"2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico","author":"Nakano","year":"2017"},{"key":"2023051609215639300_btab146-B35","volume-title":"IEEE, Anchorage, Alaska, USA.","author":"Nakano","year":"2017"},{"key":"2023051609215639300_btab146-B36","first-page":"1","volume-title":"2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil","author":"Nakano","year":"2018"},{"key":"2023051609215639300_btab146-B37","article-title":"Machine learning based prediction of hierarchical classification of transposable elements","author":"Panta","year":"2019","journal-title":"arXiv e-prints"},{"key":"2023051609215639300_btab146-B38","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023051609215639300_btab146-B39","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/s10577-017-9570-z","article-title":"Mammalian transposable elements and their impacts on genome evolution","volume":"26","author":"Platt","year":"2018","journal-title":"Chromosome Res"},{"key":"2023051609215639300_btab146-B40","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/j.gde.2010.01.004","article-title":"Genomic gems: SINE RNAs regulate mRNA production","volume":"20","author":"Ponicsan","year":"2010","journal-title":"Curr. Opin. Genet. Dev"},{"key":"2023051609215639300_btab146-B41","doi-asserted-by":"crossref","first-page":"e1006097","DOI":"10.1371\/journal.pcbi.1006097","article-title":"A machine learning based framework to identify and classify long terminal repeat retrotransposons","volume":"14","author":"Schietgat","year":"2018","journal-title":"PLoS Comput. 
Biol"},{"key":"2023051609215639300_btab146-B42","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1007\/s10618-010-0175-9","article-title":"A survey of hierarchical classification across different application domains","volume":"22","author":"Silla","year":"2011","journal-title":"Data Min. Knowl. Discov"},{"key":"2023051609215639300_btab146-B43","first-page":"521","volume-title":"Proceedings 2001 IEEE International Conference on Data Mining","author":"Sun","year":"2001"},{"key":"2023051609215639300_btab146-B44","first-page":"271","article-title":"Issues in stacked generalization","volume":"10","author":"Ting","year":"1999","journal-title":"J. Artif. Int. Res"},{"key":"2023051609215639300_btab146-B45","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1109\/CONFLUENCE.2017.7943141","volume-title":"2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, Noida, India","author":"Verma","year":"2017"},{"key":"2023051609215639300_btab146-B46","doi-asserted-by":"crossref","first-page":"973","DOI":"10.1038\/nrg2165","article-title":"A unified classification system for eukaryotic transposable elements","volume":"8","author":"Wicker","year":"2007","journal-title":"Nat. Rev. Genet"},{"key":"2023051609215639300_btab146-B47","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","article-title":"Stacked generalization","volume":"5","author":"Wolpert","year":"1992","journal-title":"Neural Netw"},{"key":"2023051609215639300_btab146-B48","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/4235.585893","article-title":"No free lunch theorems for optimization","volume":"1","author":"Wolpert","year":"1997","journal-title":"IEEE Trans. Evol. 
Comput"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab146\/36599133\/btab146.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/17\/2529\/50339398\/btab146.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/17\/2529\/50339398\/btab146.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,25]],"date-time":"2024-08-25T02:56:43Z","timestamp":1724554603000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/17\/2529\/6158037"}},"subtitle":[],"editor":[{"given":"Peter","family":"Robinson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,3,3]]},"references-count":48,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2021,9,9]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab146","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,9,1]]},"published":{"date-parts":[[2021,3,3]]}}}