{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T07:45:19Z","timestamp":1777794319844,"version":"3.51.4"},"reference-count":101,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T00:00:00Z","timestamp":1700092800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T00:00:00Z","timestamp":1700092800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100007631","name":"Canadian Institute for Advanced Research","doi-asserted-by":"publisher","award":["FL-000655"],"award-info":[{"award-number":["FL-000655"]}],"id":[{"id":"10.13039\/100007631","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"U.S. Department of Health & Human Services | National Institutes of Health","doi-asserted-by":"publisher","award":["1U01HG012059"],"award-info":[{"award-number":["1U01HG012059"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"U.S. Department of Health & Human Services | National Institutes of Health","doi-asserted-by":"publisher","award":["DP2HG010013"],"award-info":[{"award-number":["DP2HG010013"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"U.S. 
Department of Health & Human Services | National Institutes of Health","doi-asserted-by":"publisher","award":["DP2HG010013"],"award-info":[{"award-number":["DP2HG010013"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["441540116"],"award-info":[{"award-number":["441540116"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Comput Sci"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep learning has become a popular tool to study cis-regulatory function. Yet efforts to design software for deep-learning analyses in regulatory genomics that are findable, accessible, interoperable and reusable (FAIR) have fallen short of fully meeting these criteria. Here we present elucidating the utility of genomic elements with neural nets (EUGENe), a FAIR toolkit for the analysis of genomic sequences with deep learning. EUGENe consists of a set of modules and subpackages for executing the key functionality of a genomics deep learning workflow: (1) extracting, transforming and loading sequence data from many common file formats; (2) instantiating, initializing and training diverse model architectures; and (3) evaluating and interpreting model behavior. We designed EUGENe as a simple, flexible and extensible interface for streamlining and customizing end-to-end deep-learning sequence analyses, and illustrate these principles through application of the toolkit to three predictive modeling tasks. 
We hope that EUGENe represents a springboard towards a collaborative ecosystem for deep-learning applications in genomics research.<\/jats:p>","DOI":"10.1038\/s43588-023-00544-w","type":"journal-article","created":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T17:01:58Z","timestamp":1700154118000},"page":"946-956","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Predictive analyses of regulatory sequences with EUGENe"],"prefix":"10.1038","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7600-3086","authenticated-orcid":false,"given":"Adam","family":"Klie","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Laub","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"James V.","family":"Talwar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hayden","family":"Stites","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1804-7187","authenticated-orcid":false,"given":"Tobias","family":"Jores","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joe J.","family":"Solvason","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emma 
K.","family":"Farley","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1729-2463","authenticated-orcid":false,"given":"Hannah","family":"Carter","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,11,16]]},"reference":[{"key":"544_CR1","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1038\/nature11247","volume":"489","author":"ENCODE Project Consortium.","year":"2012","unstructured":"ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57\u201374 (2012).","journal-title":"Nature"},{"key":"544_CR2","doi-asserted-by":"publisher","first-page":"831","DOI":"10.1038\/nbt.3300","volume":"33","author":"B Alipanahi","year":"2015","unstructured":"Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831\u2013838 (2015).","journal-title":"Nat. Biotechnol."},{"key":"544_CR3","doi-asserted-by":"publisher","DOI":"10.1186\/s12864-018-4889-1","volume":"19","author":"X Pan","year":"2018","unstructured":"Pan, X., Rijnbeek, P., Yan, J. & Shen, H.-B. Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genomics 19, 511 (2018).","journal-title":"BMC Genomics"},{"key":"544_CR4","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1016\/j.ymeth.2019.03.020","volume":"166","author":"D Quang","year":"2019","unstructured":"Quang, D. & Xie, X. FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. 
Methods 166, 40\u201347 (2019).","journal-title":"Methods"},{"key":"544_CR5","doi-asserted-by":"publisher","first-page":"e1008925","DOI":"10.1371\/journal.pcbi.1008925","volume":"17","author":"PK Koo","year":"2021","unstructured":"Koo, P. K., Majdandzic, A., Ploenzke, M., Anand, P. & Paul, S. B. Global importance analysis: an interpretability method to quantify importance of genomic features in deep neural networks. PLoS Comput. Biol. 17, e1008925 (2021).","journal-title":"PLoS Comput. Biol."},{"key":"544_CR6","doi-asserted-by":"publisher","first-page":"e69","DOI":"10.1093\/nar\/gky215","volume":"46","author":"M Wang","year":"2018","unstructured":"Wang, M., Tai, C., E, W. & Wei, L. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. Nucleic Acids Res. 46, e69 (2018).","journal-title":"Nucleic Acids Res."},{"key":"544_CR7","doi-asserted-by":"publisher","first-page":"931","DOI":"10.1038\/nmeth.3547","volume":"12","author":"J Zhou","year":"2015","unstructured":"Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931\u2013934 (2015).","journal-title":"Nat. Methods"},{"key":"544_CR8","doi-asserted-by":"publisher","first-page":"e107","DOI":"10.1093\/nar\/gkw226","volume":"44","author":"D Quang","year":"2016","unstructured":"Quang, D. & Xie, X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 44, e107 (2016).","journal-title":"Nucleic Acids Res."},{"key":"544_CR9","doi-asserted-by":"publisher","first-page":"990","DOI":"10.1101\/gr.200535.115","volume":"26","author":"DR Kelley","year":"2016","unstructured":"Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. 
Genome Res 26, 990\u2013999 (2016).","journal-title":"Genome Res"},{"key":"544_CR10","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1101\/gr.227819.117","volume":"28","author":"DR Kelley","year":"2018","unstructured":"Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res 28, 739\u2013750 (2018).","journal-title":"Genome Res"},{"key":"544_CR11","doi-asserted-by":"publisher","first-page":"1815","DOI":"10.1101\/gr.260844.120","volume":"30","author":"L Minnoye","year":"2020","unstructured":"Minnoye, L. et al. Cross-species analysis of enhancer logic using deep learning. Genome Res. 30, 1815\u20131834 (2020).","journal-title":"Genome Res."},{"key":"544_CR12","doi-asserted-by":"publisher","first-page":"1082","DOI":"10.1101\/gr.260851.120","volume":"31","author":"ZK Atak","year":"2021","unstructured":"Atak, Z. K. et al. Interpretation of allele-specific chromatin accessibility using cell state-aware deep learning. Genome Res. 31, 1082\u20131096 (2021).","journal-title":"Genome Res."},{"key":"544_CR13","doi-asserted-by":"publisher","first-page":"bbaa159","DOI":"10.1093\/bib\/bbaa159","volume":"22","author":"J Li","year":"2021","unstructured":"Li, J., Pu, Y., Tang, J., Zou, Q. & Guo, F. DeepATT: a hybrid category attention neural network for identifying functional effects of DNA sequences. Brief. Bioinform. 22, bbaa159 (2021).","journal-title":"Brief. Bioinform."},{"key":"544_CR14","doi-asserted-by":"publisher","first-page":"1088","DOI":"10.1038\/s41592-022-01562-8","volume":"19","author":"H Yuan","year":"2022","unstructured":"Yuan, H. & Kelley, D. R. scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks. Nat. Methods 19, 1088\u20131096 (2022).","journal-title":"Nat. Methods"},{"key":"544_CR15","doi-asserted-by":"publisher","first-page":"940","DOI":"10.1038\/s41588-022-01102-2","volume":"54","author":"KM Chen","year":"2022","unstructured":"Chen, K. 
M., Wong, A. K., Troyanskaya, O. G. & Zhou, J. A sequence-based global map of regulatory activity for deciphering human genetics. Nat. Genet. 54, 940\u2013949 (2022).","journal-title":"Nat. Genet."},{"key":"544_CR16","doi-asserted-by":"publisher","first-page":"630","DOI":"10.1038\/s41586-021-04262-z","volume":"601","author":"J Janssens","year":"2022","unstructured":"Janssens, J. et al. Decoding gene regulation in the fly brain. Nature 601, 630\u2013636 (2022).","journal-title":"Nature"},{"key":"544_CR17","doi-asserted-by":"publisher","first-page":"i108","DOI":"10.1093\/bioinformatics\/btz352","volume":"35","author":"S Nair","year":"2019","unstructured":"Nair, S., Kim, D. S., Perricone, J. & Kundaje, A. Integrating regulatory DNA sequence and gene expression to predict genome-wide chromatin accessibility across cellular contexts. Bioinformatics 35, i108\u2013i116 (2019).","journal-title":"Bioinformatics"},{"key":"544_CR18","doi-asserted-by":"publisher","first-page":"e77","DOI":"10.1093\/nar\/gkab349","volume":"49","author":"F Ullah","year":"2021","unstructured":"Ullah, F. & Ben-Hur, A. A self-attention model for inferring cooperativity between regulatory features. Nucleic Acids Res. 49, e77 (2021).","journal-title":"Nucleic Acids Res."},{"key":"544_CR19","doi-asserted-by":"publisher","first-page":"1171","DOI":"10.1038\/s41588-018-0160-6","volume":"50","author":"J Zhou","year":"2018","unstructured":"Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171\u20131179 (2018).","journal-title":"Nat. Genet."},{"key":"544_CR20","doi-asserted-by":"publisher","first-page":"107663","DOI":"10.1016\/j.celrep.2020.107663","volume":"31","author":"V Agarwal","year":"2020","unstructured":"Agarwal, V. & Shendure, J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep. 
31, 107663 (2020).","journal-title":"Cell Rep."},{"key":"544_CR21","doi-asserted-by":"publisher","first-page":"1196","DOI":"10.1038\/s41592-021-01252-x","volume":"18","author":"\u017d Avsec","year":"2021","unstructured":"Avsec, \u017d. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196\u20131203 (2021).","journal-title":"Nat. Methods"},{"key":"544_CR22","first-page":"930","volume":"32","author":"A Karbalayghareh","year":"2022","unstructured":"Karbalayghareh, A., Sahin, M. & Leslie, C. S. Chromatin interaction-aware gene regulatory modeling with graph attention networks. Genome Res. 32, 930\u2013944 (2022).","journal-title":"Genome Res."},{"key":"544_CR23","doi-asserted-by":"publisher","first-page":"1111","DOI":"10.1038\/s41592-020-0958-x","volume":"17","author":"G Fudenberg","year":"2020","unstructured":"Fudenberg, G., Kelley, D. R. & Pollard, K. S. Predicting 3D genome folding from DNA sequence with Akita. Nat. Methods 17, 1111\u20131117 (2020).","journal-title":"Nat. Methods"},{"key":"544_CR24","doi-asserted-by":"publisher","first-page":"725","DOI":"10.1038\/s41588-022-01065-4","volume":"54","author":"J Zhou","year":"2022","unstructured":"Zhou, J. Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale. Nat. Genet. 54, 725\u2013734 (2022).","journal-title":"Nat. Genet."},{"key":"544_CR25","doi-asserted-by":"publisher","first-page":"134","DOI":"10.1186\/s13059-023-02934-9","volume":"24","author":"R Yang","year":"2023","unstructured":"Yang, R. et al. Epiphany: predicting Hi-C contact maps from 1D epigenomic signals. Genome Biol. 24, 134 (2023).","journal-title":"Genome Biol."},{"key":"544_CR26","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1038\/s41587-022-01612-8","volume":"41","author":"J Tan","year":"2023","unstructured":"Tan, J. et al. 
Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat. Biotechnol. 41, 1140\u20131150 (2023).","journal-title":"Nat. Biotechnol."},{"key":"544_CR27","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1038\/s41588-022-01048-5","volume":"54","author":"BP de Almeida","year":"2022","unstructured":"de Almeida, B. P., Reiter, F., Pagani, M. & Stark, A. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613\u2013624 (2022).","journal-title":"Nat. Genet."},{"key":"544_CR28","doi-asserted-by":"publisher","first-page":"e0218073","DOI":"10.1371\/journal.pone.0218073","volume":"14","author":"R Movva","year":"2019","unstructured":"Movva, R. et al. Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays. PLoS One 14, e0218073 (2019).","journal-title":"PLoS One"},{"key":"544_CR29","doi-asserted-by":"publisher","first-page":"842","DOI":"10.1038\/s41477-021-00932-y","volume":"7","author":"T Jores","year":"2021","unstructured":"Jores, T. et al. Synthetic promoter designs enabled by a comprehensive analysis of plant core promoters. Nat. Plants 7, 842\u2013855 (2021).","journal-title":"Nat. Plants"},{"key":"544_CR30","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/s41588-021-00782-6","volume":"53","author":"\u017d Avsec","year":"2021","unstructured":"Avsec, \u017d. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354\u2013366 (2021).","journal-title":"Nat. Genet."},{"key":"544_CR31","doi-asserted-by":"publisher","unstructured":"Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.1912.01703 (2019).","DOI":"10.48550\/arXiv.1912.01703"},{"key":"544_CR32","doi-asserted-by":"publisher","unstructured":"Abadi, M. 
et al. TensorFlow: a system for large-scale machine learning. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.1605.08695 (2016).","DOI":"10.48550\/arXiv.1605.08695"},{"key":"544_CR33","doi-asserted-by":"publisher","first-page":"3035","DOI":"10.1093\/bioinformatics\/bty222","volume":"34","author":"S Budach","year":"2018","unstructured":"Budach, S. & Marsico, A. pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks. Bioinformatics 34, 3035\u20133037 (2018).","journal-title":"Bioinformatics"},{"key":"544_CR34","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1038\/s41592-019-0360-8","volume":"16","author":"KM Chen","year":"2019","unstructured":"Chen, K. M., Cofer, E. M., Zhou, J. & Troyanskaya, O. G. Selene: a PyTorch-based deep learning library for sequence data. Nat. Methods 16, 315\u2013318 (2019).","journal-title":"Nat. Methods"},{"key":"544_CR35","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-020-17155-y","volume":"11","author":"W Kopp","year":"2020","unstructured":"Kopp, W., Monti, R., Tamburrini, A., Ohler, U. & Akalin, A. Deep learning for genomics using Janggu. Nat. Commun. 11, 3488 (2020).","journal-title":"Nat. Commun."},{"key":"544_CR36","doi-asserted-by":"publisher","first-page":"592","DOI":"10.1038\/s41587-019-0140-0","volume":"37","author":"\u017d Avsec","year":"2019","unstructured":"Avsec, \u017d. et al. The Kipoi repository accelerates community exchange and reuse of predictive models for genomics. Nat. Biotechnol. 37, 592\u2013600 (2019).","journal-title":"Nat. Biotechnol."},{"key":"544_CR37","doi-asserted-by":"publisher","DOI":"10.1186\/s12864-022-08414-x","volume":"23","author":"E Chalupov\u00e1","year":"2022","unstructured":"Chalupov\u00e1, E. et al. ENNGene: an easy neural network model building tool for genomics. 
BMC Genomics 23, 248 (2022).","journal-title":"BMC Genomics"},{"key":"544_CR38","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-022-01710-x","volume":"9","author":"M Barker","year":"2022","unstructured":"Barker, M. et al. Introducing the FAIR Principles for research software. Sci Data. 9, 622 (2022).","journal-title":"Sci Data."},{"key":"544_CR39","doi-asserted-by":"publisher","first-page":"2120","DOI":"10.1105\/tpc.20.00155","volume":"32","author":"T Jores","year":"2020","unstructured":"Jores, T. et al. Identification of plant enhancers and their constituent elements by STARR-seq in tobacco leaves. Plant Cell 32, 2120\u20132131 (2020).","journal-title":"Plant Cell"},{"key":"544_CR40","doi-asserted-by":"publisher","first-page":"e0235748","DOI":"10.1371\/journal.pone.0235748","volume":"15","author":"K Onimaru","year":"2020","unstructured":"Onimaru, K., Nishimura, O. & Kuraku, S. Predicting gene regulatory regions with a convolutional neural network for processing double-strand genome sequence information. PLoS One 15, e0235748 (2020).","journal-title":"PLoS One"},{"key":"544_CR41","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2007-8-2-r24","volume":"8","author":"S Gupta","year":"2007","unstructured":"Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).","journal-title":"Genome Biol."},{"key":"544_CR42","doi-asserted-by":"publisher","unstructured":"Shrikumar, A., Greenside, P., Shcherbina, A. & Kundaje, A. Not just a black box: learning important features through propagating activation differences. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.1605.01713 (2016).","DOI":"10.48550\/arXiv.1605.01713"},{"key":"544_CR43","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1038\/nature12311","volume":"499","author":"D Ray","year":"2013","unstructured":"Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. 
Nature 499, 172\u2013177 (2013).","journal-title":"Nature"},{"key":"544_CR44","doi-asserted-by":"publisher","first-page":"393","DOI":"10.1038\/nprot.2008.195","volume":"4","author":"MF Berger","year":"2009","unstructured":"Berger, M. F. & Bulyk, M. L. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat. Protoc. 4, 393\u2013411 (2009).","journal-title":"Nat. Protoc."},{"key":"544_CR45","unstructured":"Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (eds. Guyon, I. et al.) Vol. 30, 4765\u20134774 (Curran Associates, 2017)."},{"key":"544_CR46","doi-asserted-by":"publisher","first-page":"D252","DOI":"10.1093\/nar\/gkx1106","volume":"46","author":"IV Kulakovskiy","year":"2018","unstructured":"Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res. 46, D252\u2013D259 (2018).","journal-title":"Nucleic Acids Res."},{"key":"544_CR47","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1038\/s41592-019-0367-1","volume":"16","author":"C Bravo Gonz\u00e1lez-Blas","year":"2019","unstructured":"Bravo Gonz\u00e1lez-Blas, C. et al. cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data. Nat. Methods 16, 397\u2013400 (2019).","journal-title":"Nat. Methods"},{"key":"544_CR48","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1038\/s41576-021-00434-9","volume":"23","author":"S Whalen","year":"2021","unstructured":"Whalen, S., Schreiber, J., Noble, W. S. & Pollard, K. S. Navigating the pitfalls of applying machine learning in genomics. Nat. Rev. Genet. 23, 169\u2013181 (2021).","journal-title":"Nat. Rev. 
Genet."},{"key":"544_CR49","doi-asserted-by":"publisher","first-page":"2281","DOI":"10.1016\/j.csbj.2020.08.015","volume":"18","author":"G Urban","year":"2020","unstructured":"Urban, G., Torrisi, M., Magnan, C. N., Pollastri, G. & Baldi, P. Protein profiles: biases and protocols. Comput. Struct. Biotechnol. J. 18, 2281\u20132289 (2020).","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"544_CR50","unstructured":"Laub, D. & Klie, A. ML4GLand\/SeqData (GitHub, 2023); https:\/\/github.com\/ML4GLand\/SeqData"},{"key":"544_CR51","unstructured":"Klie, A. ML4GLand\/SeqDatasets (GitHub, 2023); https:\/\/github.com\/ML4GLand\/SeqDatasets"},{"key":"544_CR52","doi-asserted-by":"crossref","unstructured":"Hoyer, S. & Hamman, J. XArray: N-D labeled arrays and datasets in Python. J. Open. Res. Softw. 5, 10 (2017).","DOI":"10.5334\/jors.148"},{"key":"544_CR53","doi-asserted-by":"publisher","unstructured":"Miles, A. et al. Zarr-Developers\/Zarr-Python: v2.15.0 (Zenodo, 2023); https:\/\/doi.org\/10.5281\/zenodo.8039103","DOI":"10.5281\/zenodo.8039103"},{"key":"544_CR54","doi-asserted-by":"publisher","unstructured":"Baker, E. A. G. et al. emObject: domain specific data abstraction for spatial omics. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/2023.06.07.543950 (2023).","DOI":"10.1101\/2023.06.07.543950"},{"key":"544_CR55","doi-asserted-by":"publisher","unstructured":"Marconato, L. et al. SpatialData: an open and universal data framework for spatial omics. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/2023.05.05.539647 (2023).","DOI":"10.1101\/2023.05.05.539647"},{"key":"544_CR56","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1038\/s41586-020-03182-8","volume":"598","author":"H Liu","year":"2021","unstructured":"Liu, H. et al. DNA methylation atlas of the mouse brain at single-cell resolution. 
Nature 598, 120\u2013128 (2021).","journal-title":"Nature"},{"key":"544_CR57","unstructured":"Dask: Library for Dynamic Task Scheduling (Dask, 2016); https:\/\/dask.org"},{"key":"544_CR58","doi-asserted-by":"crossref","unstructured":"Teufel, F. et al. GraphPart: homology partitioning for biological sequence analysis. NAR Genom. Bioinform. 5, lqad088 (2023).","DOI":"10.1093\/nargab\/lqad088"},{"key":"544_CR59","unstructured":"Klie, A. & Laub, D. ML4GLand\/SeqPro (GitHub, 2023); https:\/\/github.com\/ML4GLand\/SeqPro"},{"key":"544_CR60","doi-asserted-by":"publisher","unstructured":"Lam, S. K., Pitrou, A. & Seibert, S. Numba: a LLVM-based Python JIT compiler. In Proc. 2nd Workshop on the LLVM Compiler Infrastructure in HPC 1\u20136 (Association for Computing Machinery, 2015); https:\/\/doi.org\/10.1145\/2833157.2833162","DOI":"10.1145\/2833157.2833162"},{"key":"544_CR61","doi-asserted-by":"publisher","first-page":"192","DOI":"10.1186\/1471-2105-9-192","volume":"9","author":"M Jiang","year":"2008","unstructured":"Jiang, M., Anderson, J., Gillespie, J. & Mayne, M. uShuffle: a useful tool for shuffling biological sequences while preserving the k-let counts. BMC Bioinf. 9, 192 (2008).","journal-title":"BMC Bioinf."},{"key":"544_CR62","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1109\/MCSE.2007.55","volume":"9","author":"JD Hunter","year":"2007","unstructured":"Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90\u201395 (2007).","journal-title":"Comput. Sci. Eng."},{"key":"544_CR63","doi-asserted-by":"publisher","first-page":"3021","DOI":"10.21105\/joss.03021","volume":"6","author":"M Waskom","year":"2021","unstructured":"Waskom, M. Seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).","journal-title":"J. Open Source Softw."},{"key":"544_CR64","unstructured":"Klie, A. 
Tutorials\/Eugene\/Models\/Instantiating_Models.ipynb (GitHub, 2023); https:\/\/github.com\/ML4GLand\/tutorials\/blob\/main\/eugene\/models\/instantiating_models.ipynb"},{"key":"544_CR65","doi-asserted-by":"publisher","unstructured":"Moritz, P. et al. Ray: a distributed framework for emerging AI applications. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.1712.05889 (2017).","DOI":"10.48550\/arXiv.1712.05889"},{"key":"544_CR66","doi-asserted-by":"publisher","unstructured":"Falcon, W. et al. PyTorchLightning\/Pytorch-Lightning: 0.7.6 Release (Zenodo, 2020); https:\/\/doi.org\/10.5281\/ZENODO.3828935","DOI":"10.5281\/ZENODO.3828935"},{"key":"544_CR67","unstructured":"Klie, A. Use_Cases\/BPNet\/Train_Eugene.ipynb (GitHub, 2023); https:\/\/github.com\/ML4GLand\/use_cases\/blob\/main\/BPNet\/train_eugene.ipynb"},{"key":"544_CR68","doi-asserted-by":"publisher","unstructured":"Koo, P. K., Qian, S., Kaplun, G., Volf, V. & Kalimeris, D. Robust neural networks are more interpretable for genomics. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/657437 (2019).","DOI":"10.1101\/657437"},{"key":"544_CR69","doi-asserted-by":"publisher","unstructured":"Taskiran, I. I., Spanier, K. I., Christiaens, V., Mauduit, D. & Aerts, S. Cell type directed design of synthetic enhancers. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/2022.07.26.501466 (2022).","DOI":"10.1101\/2022.07.26.501466"},{"key":"544_CR70","doi-asserted-by":"publisher","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","volume":"37","author":"Y Ji","year":"2021","unstructured":"Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 37, 2112\u20132120 (2021).","journal-title":"Bioinformatics"},{"key":"544_CR71","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1016\/j.coisb.2020.04.001","volume":"19","author":"PK Koo","year":"2020","unstructured":"Koo, P. K. & Ploenzke, M. 
Deep learning for inferring transcription factor binding sites. Curr Opin Syst Biol 19, 16\u201323 (2020).","journal-title":"Curr Opin Syst Biol"},{"key":"544_CR72","doi-asserted-by":"crossref","unstructured":"Novakovsky, G., Dexter, N., Libbrecht, M. W., Wasserman, W. W. & Mostafavi, S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat. Rev. Genet. 24, 125\u2013137 (2022).","DOI":"10.1038\/s41576-022-00532-2"},{"key":"544_CR73","doi-asserted-by":"publisher","first-page":"bbaa177","DOI":"10.1093\/bib\/bbaa177","volume":"22","author":"A Talukder","year":"2021","unstructured":"Talukder, A., Barham, C., Li, X. & Hu, H. Interpretation of deep learning in genomics and epigenomics. Brief. Bioinform. 22, bbaa177 (2021).","journal-title":"Brief. Bioinform."},{"key":"544_CR74","unstructured":"Klie, A. ML4GLand\/SeqExplainer (GitHub, 2023); https:\/\/github.com\/ML4GLand\/SeqExplainer"},{"key":"544_CR75","doi-asserted-by":"publisher","first-page":"D165","DOI":"10.1093\/nar\/gkab1113","volume":"50","author":"JA Castro-Mondragon","year":"2022","unstructured":"Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165\u2013D173 (2022).","journal-title":"Nucleic Acids Res."},{"key":"544_CR76","doi-asserted-by":"publisher","first-page":"e1007560","DOI":"10.1371\/journal.pcbi.1007560","volume":"15","author":"PK Koo","year":"2019","unstructured":"Koo, P. K. & Eddy, S. R. Representation learning of genomic sequence motifs with convolutional neural networks. PLoS Comput. Biol. 15, e1007560 (2019).","journal-title":"PLoS Comput. Biol."},{"key":"544_CR77","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1038\/s42256-020-00291-x","volume":"3","author":"PK Koo","year":"2021","unstructured":"Koo, P. K. & Ploenzke, M. Improving representations of genomic sequence motifs in convolutional networks with exponential activations. Nat. Mach. 
Intell. 3, 258\u2013266 (2021).","journal-title":"Nat. Mach. Intell."},{"key":"544_CR78","doi-asserted-by":"publisher","unstructured":"Ploenzke, M. S. & Irizarry, R. A. Interpretable convolution methods for learning genomic sequence motifs. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/411934 (2018).","DOI":"10.1101\/411934"},{"key":"544_CR79","doi-asserted-by":"publisher","unstructured":"Kokhlikyan, N. et al. Captum: a unified and generic model interpretability library for PyTorch. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2009.07896 (2020).","DOI":"10.48550\/arXiv.2009.07896"},{"key":"544_CR80","doi-asserted-by":"publisher","unstructured":"Han, T., Srinivas, S. & Lakkaraju, H. Which explanation should I choose? A function approximation perspective to characterizing post Hoc explanations. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2206.01254 (2022).","DOI":"10.48550\/arXiv.2206.01254"},{"key":"544_CR81","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1186\/s13059-023-02956-3","volume":"24","author":"A Majdandzic","year":"2023","unstructured":"Majdandzic, A., Rajesh, C. & Koo, P. K. Correcting gradient-based interpretations of deep neural networks for genomics. Genome Biol. 24, 109 (2023).","journal-title":"Genome Biol."},{"key":"544_CR82","unstructured":"Shrikumar, A. et al. Technical note on transcription factor motif discovery from importance scores (TF-MoDISco) version 0.5.6.5. Preprint at https:\/\/arxiv.org\/abs\/1811.00416 (2018)."},{"key":"544_CR83","doi-asserted-by":"crossref","unstructured":"Jores, T. Synthetic-Promoter-Designs-Enabled-by-a-Comprehensive-Analysis-of-Plant-Core-Promoters\/tree\/main\/CNN (GitHub, 2021); https:\/\/github.com\/tobjores\/Synthetic-Promoter-Designs-Enabled-by-a-Comprehensive-Analysis-of-Plant-Core-Promoters\/tree\/main\/CNN","DOI":"10.1101\/2021.01.07.425784"},{"key":"544_CR84","doi-asserted-by":"crossref","unstructured":"Jores, T. 
Synthetic-Promoter-Designs-Enabled-by-a-Comprehensive-Analysis-of-Plant-Core-Promoters (GitHub, 2021); https:\/\/github.com\/tobjores\/Synthetic-Promoter-Designs-Enabled-by-a-Comprehensive-Analysis-of-Plant-Core-Promoters\/tree\/main\/data\/misc","DOI":"10.1101\/2021.01.07.425784"},{"key":"544_CR85","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: surpassing human-level performance on imagenet classification. Preprint at https:\/\/arxiv.org\/abs\/1502.01852 (2015).","DOI":"10.1109\/ICCV.2015.123"},{"key":"544_CR86","doi-asserted-by":"crossref","unstructured":"Jores, T. Synthetic-Promoter-Designs-Enabled-by-a-Comprehensive-Analysis-of-Plant-Core-Promoters\/blob\/main\/analysis\/validation_sequences\/promoters_for_evolution.tsv (GitHub, 2021); https:\/\/github.com\/tobjores\/Synthetic-Promoter-Designs-Enabled-by-a-Comprehensive-Analysis-of-Plant-Core-Promoters\/blob\/main\/analysis\/validation_sequences\/promoters_for_evolution.tsv","DOI":"10.1101\/2021.01.07.425784"},{"key":"544_CR87","unstructured":"Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https:\/\/arxiv.org\/abs\/1412.6980 (2014)."},{"key":"544_CR88","unstructured":"DeepBind\/Homo_sapiens\/RBP\/ (Kipoi, 2023); https:\/\/kipoi.org\/models\/DeepBind\/Homo_sapiens\/RBP\/"},{"key":"544_CR89","unstructured":"Index of Kundaje\/Akundaje\/Release\/Blacklists\/hg38-human (Univ. Stanford, 2016); http:\/\/mitra.stanford.edu\/kundaje\/akundaje\/release\/blacklists\/hg38-human\/hg38.blacklist.bed.gz"},{"key":"544_CR90","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1093\/bioinformatics\/btq033","volume":"26","author":"AR Quinlan","year":"2010","unstructured":"Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841\u2013842 (2010).","journal-title":"Bioinformatics"},{"key":"544_CR91","unstructured":"Phuong, T. T. & Phong, L. T. 
On the convergence proof of AMSGrad and a new version. Preprint at https:\/\/arxiv.org\/abs\/1904.03590 (2019)."},{"key":"544_CR92","unstructured":"Detailed Information of Matrix Profile MA0491.1 (JASPAR, 2022); https:\/\/jaspar.genereg.net\/matrix\/MA0491.1"},{"key":"544_CR93","unstructured":"Shri, A. Kundajelab\/Vizsequence (GitHub, 2019); https:\/\/github.com\/kundajelab\/vizsequence"},{"key":"544_CR94","unstructured":"Kinney, J. B. Jbkinney\/Logomaker (GitHub, 2019); https:\/\/github.com\/jbkinney\/logomaker"},{"key":"544_CR95","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1214\/aoms\/1177730491","volume":"18","author":"HB Mann","year":"1947","unstructured":"Mann, H. B. & Whitney, D. R. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Statist. 18, 50\u201360 (1947).","journal-title":"Ann. Math. Statist."},{"key":"544_CR96","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","volume":"57","author":"Y Benjamini","year":"1995","unstructured":"Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289\u2013300 (1995).","journal-title":"J. R. Stat. Soc. B"},{"key":"544_CR97","unstructured":"TomTom: Motif Comparison Tool (MEME Suite, 2023); https:\/\/meme-suite.org\/meme\/tools\/tomtom"},{"key":"544_CR98","unstructured":"Hughes, T. R. et al. Web Supplement to \"A Compendium of RNA-Binding Motifs for Decoding Gene Regulation\" (Univ. Toronto, 2023); https:\/\/hugheslab.ccbr.utoronto.ca\/supplementary-data\/RNAcompete_eukarya\/"},{"key":"544_CR99","doi-asserted-by":"publisher","unstructured":"Klie, A. Data to reproduce results presented in: Predictive analyses of regulatory sequences with EUGENe (Zenodo, 2023); https:\/\/doi.org\/10.5281\/zenodo.8169774","DOI":"10.5281\/zenodo.8169774"},{"key":"544_CR100","doi-asserted-by":"publisher","unstructured":"Klie, A., Hayden & Laub, D. 
ML4GLand\/EUGENe: Revision Release for EUGENe Codebase (Zenodo, 2023); https:\/\/doi.org\/10.5281\/zenodo.8357440","DOI":"10.5281\/zenodo.8357440"},{"key":"544_CR101","doi-asserted-by":"publisher","unstructured":"Klie, A. & Laub, D. ML4GLand\/EUGENe_paper: Revision Release for EUGENe Paper Repository (Zenodo, 2023); https:\/\/doi.org\/10.5281\/zenodo.8357432","DOI":"10.5281\/zenodo.8357432"}],"container-title":["Nature Computational Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s43588-023-00544-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s43588-023-00544-w","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s43588-023-00544-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,2]],"date-time":"2024-11-02T03:17:23Z","timestamp":1730517443000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s43588-023-00544-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,16]]},"references-count":101,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["544"],"URL":"https:\/\/doi.org\/10.1038\/s43588-023-00544-w","relation":{},"ISSN":["2662-8457"],"issn-type":[{"value":"2662-8457","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,16]]},"assertion":[{"value":"12 January 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 November 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}