{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T11:53:02Z","timestamp":1775476382239,"version":"3.50.1"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T00:00:00Z","timestamp":1769212800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T00:00:00Z","timestamp":1770249600000},"content-version":"vor","delay-in-days":12,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>DNA sequences are fundamental carriers of genetic information, and their accurate classification is essential for understanding gene regulation, disease mechanisms, and translational genomics. Existing encoding methods often fail to capture both local and long-range dependencies simultaneously.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>\n                      We introduce EDEN (Expected Density of Nucleotide Encoding), a unified multiscale encoding framework based on kernel density estimation. EDEN captures position-specific and context-dependent nucleotide patterns and integrates them into a hybrid deep learning architecture. Across sixteen benchmark datasets covering promoter detection, core promoter detection, and transcription factor binding prediction, EDEN achieves the best average performance while using orders of magnitude fewer parameters compared with state-of-the-art models. All source code, pretrained models, and datasets are publicly available at:\n                      <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/zabihis\/EDEN\" ext-link-type=\"uri\">https:\/\/github.com\/zabihis\/EDEN<\/jats:ext-link>\n                      .\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>EDEN provides an efficient, biologically informed, and interpretable multiscale representation for genomic sequence classification. Its favorable parameter-performance ratio and robust consistency across tasks underscore its practicality for large-scale genomic applications.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-026-06367-6","type":"journal-article","created":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T18:04:07Z","timestamp":1769277847000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["EDEN: multiscale expected density of nucleotide encoding for enhanced DNA sequence classification with hybrid deep learning"],"prefix":"10.1186","volume":"27","author":[{"given":"Saman","family":"Zabihi","sequence":"first","affiliation":[]},{"given":"Sattar","family":"Hashemi","sequence":"additional","affiliation":[]},{"given":"Eghbal","family":"Mansoori","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,1,24]]},"reference":[{"key":"6367_CR1","doi-asserted-by":"publisher","first-page":"691","DOI":"10.1016\/j.omtn.2023.02.019","volume":"31","author":"W Chen","year":"2023","unstructured":"Chen W, Liu X, Zhang S, Chen S. Artificial intelligence for drug discovery: resources, methods, and applications. Mol Ther Nucleic Acids. 2023;31:691\u2013702. https:\/\/doi.org\/10.1016\/j.omtn.2023.02.019.","journal-title":"Mol Ther Nucleic Acids"},{"key":"6367_CR2","doi-asserted-by":"publisher","unstructured":"Umer S, Rout RK, Khandelwal M, Pati S. Computational techniques for biological sequence analysis. In: Computational techniques for biological sequence analysis.\u00a02025. p. 1\u2013200. https:\/\/doi.org\/10.1201\/9781032660714","DOI":"10.1201\/9781032660714"},{"key":"6367_CR3","doi-asserted-by":"publisher","first-page":"564622","DOI":"10.3389\/FBIOE.2020.01032\/BIBTEX","volume":"8","author":"A Yang","year":"2020","unstructured":"Yang A, Zhang W, Wang J, Yang K, Han Y, Zhang L. Review on the application of machine learning algorithms in the sequence data mining of DNA. Front Bioeng Biotechnol. 2020;8:564622. https:\/\/doi.org\/10.3389\/FBIOE.2020.01032\/BIBTEX.","journal-title":"Front Bioeng Biotechnol"},{"key":"6367_CR4","doi-asserted-by":"publisher","first-page":"3198","DOI":"10.1016\/J.CSBJ.2021.05.039","volume":"19","author":"H Iuchi","year":"2021","unstructured":"Iuchi H, et al. Representation learning applications in biological sequence analysis. Comput Struct Biotechnol J. 2021;19:3198\u2013208. https:\/\/doi.org\/10.1016\/J.CSBJ.2021.05.039.","journal-title":"Comput Struct Biotechnol J"},{"key":"6367_CR5","doi-asserted-by":"publisher","DOI":"10.3390\/BIOM14111447","author":"T Li","year":"2024","unstructured":"Li T, Li M, Wu Y, Li Y. Visualization methods for DNA sequences: a review and prospects. Biomolecules. 2024. https:\/\/doi.org\/10.3390\/BIOM14111447.","journal-title":"Biomolecules"},{"issue":"2","key":"6367_CR6","doi-asserted-by":"publisher","DOI":"10.3390\/BIOMIMETICS8020218","volume":"8","author":"TB Alaku\u015f","year":"2023","unstructured":"Alaku\u015f TB. A novel repetition frequency-based DNA encoding scheme to predict human and mouse DNA enhancers with deep learning. Biomimetics. 2023;8(2):218. https:\/\/doi.org\/10.3390\/BIOMIMETICS8020218.","journal-title":"Biomimetics"},{"issue":"6","key":"6367_CR7","doi-asserted-by":"publisher","DOI":"10.1016\/J.ISCI.2024.110030","volume":"27","author":"W Hu","year":"2024","unstructured":"Hu W, Li Y, Wu Y, Guan L, Li M. A deep learning model for DNA enhancer prediction based on nucleotide position aware feature encoding. iScience. 2024;27(6):110030. https:\/\/doi.org\/10.1016\/J.ISCI.2024.110030.","journal-title":"iScience"},{"key":"6367_CR8","doi-asserted-by":"publisher","unstructured":"Shen Y, Kudla G, Oyarz\u00fan DA. DNA representations and generalization performance of sequence-to-expression models. bioRxiv. 2024:2024.02.06.579067. https:\/\/doi.org\/10.1101\/2024.02.06.579067.","DOI":"10.1101\/2024.02.06.579067"},{"key":"6367_CR9","unstructured":"Zhang X, Beinke B, Al Kindhi B, Wiering M. Comparing machine learning algorithms with or without feature extraction for DNA classification. 2020. https:\/\/arxiv.org\/pdf\/2011.00485. Accessed 31\u00a0Jul 2025."},{"issue":"14","key":"6367_CR10","doi-asserted-by":"publisher","first-page":"3887","DOI":"10.1093\/NAR\/19.14.3887","volume":"19","author":"R Gnffais","year":"1991","unstructured":"Gnffais R, Andr\u00e9 PM, Thibon M. K-tuple frequency in the human genome and polymerase chain reaction. Nucleic Acids Res. 1991;19(14):3887\u201391. https:\/\/doi.org\/10.1093\/NAR\/19.14.3887.","journal-title":"Nucleic Acids Res"},{"issue":"14","key":"6367_CR11","doi-asserted-by":"publisher","first-page":"5155","DOI":"10.1073\/PNAS.83.14.5155","volume":"83","author":"BE Blaisdell","year":"1986","unstructured":"Blaisdell BE. A measure of the similarity of sets of sequences not requiring sequence alignment. Proc Natl Acad Sci U S A. 1986;83(14):5155\u20139. https:\/\/doi.org\/10.1073\/PNAS.83.14.5155.","journal-title":"Proc Natl Acad Sci U S A"},{"key":"6367_CR12","unstructured":"Ozan \u015e. DNA sequence classification with compressors. 2024. https:\/\/arxiv.org\/pdf\/2401.14025. Accessed 31 Jul 2025."},{"issue":"2","key":"6367_CR13","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1101\/GR.279452.124","volume":"35","author":"KM Jenike","year":"2025","unstructured":"Jenike KM, et al. K-mer approaches for biodiversity genomics. Genome Res. 2025;35(2):219\u201330. https:\/\/doi.org\/10.1101\/GR.279452.124.","journal-title":"Genome Res"},{"key":"6367_CR14","doi-asserted-by":"publisher","unstructured":"Sun S, Fodor AA. Correction for spurious taxonomic assignments of k-mer classifiers in low microbial biomass samples using shuffled sequences. bioRxiv. 2025:2025.06.18.660363. https:\/\/doi.org\/10.1101\/2025.06.18.660363.","DOI":"10.1101\/2025.06.18.660363"},{"issue":"21","key":"6367_CR15","doi-asserted-by":"publisher","first-page":"e110","DOI":"10.1093\/NAR\/GKAD929","volume":"51","author":"Y Wang","year":"2023","unstructured":"Wang Y, et al. A task-specific encoding algorithm for RNAs and RNA-associated interactions based on convolutional autoencoder. Nucleic Acids Res. 2023;51(21):e110\u2013e110. https:\/\/doi.org\/10.1093\/NAR\/GKAD929.","journal-title":"Nucleic Acids Res"},{"key":"6367_CR16","doi-asserted-by":"crossref","unstructured":"Wang Z, Wang Z, Jiang J, Chen P, Shi X, Li Y. Large language models in bioinformatics: a survey. 2025. https:\/\/arxiv.org\/pdf\/2503.04490. Accessed 31 Jul 2025.","DOI":"10.18653\/v1\/2025.findings-acl.184"},{"issue":"4","key":"6367_CR17","doi-asserted-by":"publisher","first-page":"286","DOI":"10.1016\/j.tig.2024.11.013","volume":"41","author":"G Benegas","year":"2024","unstructured":"Benegas G, Ye C, Albors C, Li JC, Song YS. Genomic language models: opportunities and challenges. Trends Genet. 2024;41(4):286\u2013302. https:\/\/doi.org\/10.1016\/j.tig.2024.11.013.","journal-title":"Trends Genet"},{"issue":"7","key":"6367_CR18","doi-asserted-by":"publisher","first-page":"1394","DOI":"10.1101\/GR.2289704","volume":"14","author":"ACE Darling","year":"2004","unstructured":"Darling ACE, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14(7):1394\u2013403. https:\/\/doi.org\/10.1101\/GR.2289704.","journal-title":"Genome Res"},{"issue":"2","key":"6367_CR19","doi-asserted-by":"publisher","DOI":"10.1093\/BIB\/BBAF109","volume":"26","author":"S Li","year":"2025","unstructured":"Li S, Hua H, Chen S. Graph neural networks for single-cell omics data: a review of approaches and applications. Brief Bioinform. 2025;26(2):109. https:\/\/doi.org\/10.1093\/BIB\/BBAF109.","journal-title":"Brief Bioinform"},{"issue":"1","key":"6367_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-021-23774-w","volume":"12","author":"T Wang","year":"2021","unstructured":"Wang T, et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun. 2021;12(1):1\u201313. https:\/\/doi.org\/10.1038\/s41467-021-23774-w.","journal-title":"Nat Commun"},{"issue":"8","key":"6367_CR21","doi-asserted-by":"publisher","first-page":"2163","DOI":"10.1093\/NAR\/18.8.2163","volume":"18","author":"HJ Jeffrey","year":"1990","unstructured":"Jeffrey HJ. Chaos game representation of gene structure. Nucleic Acids Res. 1990;18(8):2163\u201370. https:\/\/doi.org\/10.1093\/NAR\/18.8.2163.","journal-title":"Nucleic Acids Res"},{"key":"6367_CR22","volume-title":"Deep Learning for the Life Sciences : Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More","author":"B Ramsundar","year":"2019","unstructured":"Ramsundar B, Eastman P, Walters P, Pande V, Leswing K, Wu Z. Deep Learning for the Life Sciences\u202f: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More. O\u2019Reilly Media; 2019."},{"issue":"1","key":"6367_CR23","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1109\/TITB.2009.2033052","volume":"14","author":"A Bucur","year":"2010","unstructured":"Bucur A, Van Leeuwen J, Dimitrova N, Mittal C. Alignment method for spectrograms of DNA sequences. IEEE Trans Inf Technol Biomed. 2010;14(1):3\u20139. https:\/\/doi.org\/10.1109\/TITB.2009.2033052.","journal-title":"IEEE Trans Inf Technol Biomed"},{"key":"6367_CR24","doi-asserted-by":"publisher","unstructured":"Dimitrova N, Cheung YH, Zhang M. Analysis and visualization of DNA spectrograms: open possibilities for the genome research. In:\u00a0Proceedings of the 14th ACM international conference on Multimedia MM. 2006. p. 1017\u201324. https:\/\/doi.org\/10.1145\/1180639.1180861.","DOI":"10.1145\/1180639.1180861"},{"issue":"5786","key":"6367_CR25","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1126\/SCIENCE.1127647","volume":"313","author":"GE Hinton","year":"2006","unstructured":"Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504\u20137. https:\/\/doi.org\/10.1126\/SCIENCE.1127647.","journal-title":"Science"},{"key":"6367_CR26","doi-asserted-by":"publisher","first-page":"3371","DOI":"10.5555\/1756006.1953039","volume":"11","author":"PV Ca","year":"2010","unstructured":"Ca PV, Edu LT, Lajoie I, Ca YB, Ca P-AM. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res. 2010;11:3371\u2013408. https:\/\/doi.org\/10.5555\/1756006.1953039.","journal-title":"J Mach Learn Res"},{"key":"6367_CR27","unstructured":"Thanh V, Nguyen D, Hy T-S. Advances in protein representation learning: methods, applications, and future directions.\u00a02025. https:\/\/arxiv.org\/pdf\/2503.16659. Accessed 01 Aug 2025."},{"issue":"15","key":"6367_CR28","doi-asserted-by":"publisher","first-page":"2112","DOI":"10.1093\/BIOINFORMATICS\/BTAB083","volume":"37","author":"Y Ji","year":"2021","unstructured":"Ji Y, Zhou Z, Liu H, Davuluri RV. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics. 2021;37(15):2112\u201320. https:\/\/doi.org\/10.1093\/BIOINFORMATICS\/BTAB083.","journal-title":"Bioinformatics"},{"key":"6367_CR29","unstructured":"Zhou Z, Ji Y, Li W, Dutta P, Davuluri RV, Liu H. DNABERT-2: efficient foundation model and benchmark for multi-species genome. In:\u00a012th international conference on learning representations, ICLR 2024. 2024. https:\/\/arxiv.org\/pdf\/2306.15006. Accessed 28 Jul 2025."},{"issue":"2","key":"6367_CR30","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1101\/2023.01.11.523679","volume":"22","author":"H Dalla-Torre","year":"2025","unstructured":"Dalla-Torre H, et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat Methods. 2025;22(2):287\u201397. https:\/\/doi.org\/10.1101\/2023.01.11.523679.","journal-title":"Nat Methods"},{"key":"6367_CR31","doi-asserted-by":"crossref","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In:\u00a0NAACL HLT 2019\u20132019 Conference of the north american chapter of the association for computational linguistics: human language technologies - Proceedings of the conference, vol. 1, pp. 4171\u20134186. 2018. https:\/\/arxiv.org\/pdf\/1810.04805. Accessed 30 Aug 2025.","DOI":"10.18653\/v1\/N19-1423"},{"key":"6367_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/J.DSP.2022.103430","volume":"123","author":"C Wei","year":"2022","unstructured":"Wei C, Zhang J, Yuan X. Enhancing the prediction of protein coding regions in biological sequence via a deep learning framework with hybrid encoding. Digit Signal Process. 2022;123:103430. https:\/\/doi.org\/10.1016\/J.DSP.2022.103430.","journal-title":"Digit Signal Process"},{"issue":"1","key":"6367_CR33","doi-asserted-by":"publisher","DOI":"10.1186\/S12864-023-09365-7","volume":"24","author":"C Wei","year":"2023","unstructured":"Wei C, Ye Z, Zhang J, Li A. CPPVec: an accurate coding potential predictor based on a distributed representation of protein sequence. BMC Genomics. 2023;24(1):264. https:\/\/doi.org\/10.1186\/S12864-023-09365-7.","journal-title":"BMC Genomics"},{"issue":"3","key":"6367_CR34","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/s41588-021-00782-6","volume":"53","author":"\u017d Avsec","year":"2021","unstructured":"Avsec \u017d, et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet. 2021;53(3):354\u201366. https:\/\/doi.org\/10.1038\/s41588-021-00782-6.","journal-title":"Nat Genet"},{"issue":"3","key":"6367_CR35","doi-asserted-by":"publisher","DOI":"10.1371\/JOURNAL.PCBI.1009941","volume":"18","author":"Q Zhang","year":"2022","unstructured":"Zhang Q, et al. Base-resolution prediction of transcription factor binding signals by a deep learning framework. PLoS Comput Biol. 2022;18(3):e1009941. https:\/\/doi.org\/10.1371\/JOURNAL.PCBI.1009941.","journal-title":"PLoS Comput Biol"},{"key":"6367_CR36","doi-asserted-by":"publisher","unstructured":"Liang J, Peng Z. Hybrid deep learning with protein language models and dual-path architecture for predicting IDP functions. bioRxiv. 2025:2025.05.25.655984. https:\/\/doi.org\/10.1101\/2025.05.25.655984.","DOI":"10.1101\/2025.05.25.655984"},{"key":"6367_CR37","doi-asserted-by":"publisher","unstructured":"Lv Z, Hou D, Hou M, Wang S, Zhuang J, Zhang G. MultiSAAl: sequence-informed antibody-antigen interactions prediction using multi-scale deep learning. bioRxiv. 2025:2025.05.29.656915. https:\/\/doi.org\/10.1101\/2025.05.29.656915.","DOI":"10.1101\/2025.05.29.656915"},{"issue":"16","key":"6367_CR38","doi-asserted-by":"publisher","first-page":"2299","DOI":"10.1093\/BIOINFORMATICS\/BTAB112","volume":"37","author":"J Charlier","year":"2021","unstructured":"Charlier J, Nadon R, Makarenkov V. Accurate deep learning off-target prediction with novel sgRNA-DNA sequence encoding in CRISPR-Cas9 gene editing. Bioinformatics. 2021;37(16):2299\u2013307. https:\/\/doi.org\/10.1093\/BIOINFORMATICS\/BTAB112.","journal-title":"Bioinformatics"},{"key":"6367_CR39","unstructured":"Zhou Z et al. Dec., XAI meets biology: a comprehensive review of explainable AI in bioinformatics applications. 2023. https:\/\/arxiv.org\/pdf\/2312.06082v1. Accessed 01 Aug 2025."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-026-06367-6","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-026-06367-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-026-06367-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T12:05:06Z","timestamp":1770293106000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1186\/s12859-026-06367-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,24]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,12]]}},"alternative-id":["6367"],"URL":"https:\/\/doi.org\/10.1186\/s12859-026-06367-6","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,24]]},"assertion":[{"value":"9 September 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 January 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 January 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"This study does not include gels, blots, or microscopy images, and therefore no raw image files are required.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Image integrity"}},{"value":"All research content, including data generation, analysis, interpretation, and conclusions, was conceived, designed, and validated solely by the authors. Generative AI tools were used exclusively for translation, grammar correction, and language refinement during manuscript preparation. After using these tools, the authors thoroughly reviewed and revised the content as necessary and take full responsibility for the final published work.","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declaration of generative AI and AI-assisted technologies in the writing process"}}],"article-number":"40"}}