{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T10:52:51Z","timestamp":1768560771843,"version":"3.49.0"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T00:00:00Z","timestamp":1686268800000},"content-version":"vor","delay-in-days":8,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["1R01GM125736"],"award-info":[{"award-number":["1R01GM125736"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Motifs play a crucial role in computational biology, as they provide valuable information about the binding specificity of proteins. However, conventional motif discovery methods typically rely on simple combinatoric or probabilistic approaches, which can be biased by heuristics such as substring-masking for multiple motif discovery. In recent years, deep neural networks have become increasingly popular for motif discovery, as they are capable of capturing complex patterns in data. Nonetheless, inferring motifs from neural networks remains a challenging problem, both from a modeling and computational standpoint, despite the success of these networks in supervised learning tasks.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present a principled representation learning approach based on a hierarchical sparse representation for motif discovery. Our method effectively discovers gapped, long, or overlapping motifs that we show to commonly exist in next-generation sequencing datasets, in addition to the short and enriched primary binding sites. Our model is fully interpretable, fast, and capable of capturing motifs in a large number of DNA strings. A key concept emerged from our approach\u2014enumerating at the image level\u2014effectively overcomes the k-mers paradigm, enabling modest computational resources for capturing the long and varied but conserved patterns, in addition to capturing the primary binding sites.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Our method is available as a Julia package under the MIT license at https:\/\/github.com\/kchu25\/MOTIFs.jl, and the results on experimental data can be found at https:\/\/zenodo.org\/record\/7783033.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad378","type":"journal-article","created":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T18:05:37Z","timestamp":1686333937000},"source":"Crossref","is-referenced-by-count":3,"title":["Finding motifs using DNA images derived from sparse representations"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8723-3215","authenticated-orcid":false,"given":"Shane K","family":"Chu","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Washington University in St. Louis , St. Louis, MO 63130, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6896-1850","authenticated-orcid":false,"given":"Gary D","family":"Stormo","sequence":"additional","affiliation":[{"name":"Department of Genetics, Washington University School of Medicine , St. Louis, MO 63110, United States"}]}],"member":"286","published-online":{"date-parts":[[2023,6,9]]},"reference":[{"key":"2023062420432432200_btad378-B1","doi-asserted-by":"crossref","first-page":"ii62","DOI":"10.1093\/bioinformatics\/btac469","article-title":"Deepzf: improved DNA-binding prediction of c2h2-zinc-finger proteins by deep transfer learning","volume":"38","author":"Aizenshtein-Gazit","year":"2022","journal-title":"Bioinformatics"},{"key":"2023062420432432200_btad378-B2","first-page":"1","author":"Akutsu","year":"2000"},{"key":"2023062420432432200_btad378-B3","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2023062420432432200_btad378-B4","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/s41592-021-01252-x","article-title":"Effective gene expression prediction from sequence by integrating long-range interactions","volume":"18","author":"Avsec","year":"2021","journal-title":"Nat Methods"},{"key":"2023062420432432200_btad378-B5","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/s41588-021-00782-6","article-title":"Base-resolution models of transcription-factor binding reveal soft motif syntax","volume":"53","author":"Avsec","year":"2021","journal-title":"Nat Genet"},{"key":"2023062420432432200_btad378-B6","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1016\/S0304-3975(97)00023-6","article-title":"Approximation algorithms for multiple sequence alignment","volume":"182","author":"Bafna","year":"1997","journal-title":"Theor Comput Sci"},{"key":"2023062420432432200_btad378-B7","doi-asserted-by":"crossref","first-page":"2834","DOI":"10.1093\/bioinformatics\/btab203","article-title":"Streme: accurate and versatile sequence motif discovery","volume":"37","author":"Bailey","year":"2021","journal-title":"Bioinformatics"},{"key":"2023062420432432200_btad378-B8","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1007\/BF00993379","article-title":"Unsupervised learning of multiple motifs in biopolymers using expectation maximization","volume":"21","author":"Bailey","year":"1995","journal-title":"Mach Learn"},{"key":"2023062420432432200_btad378-B9","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/S0167-6377(02)00231-6","article-title":"Mirror descent and nonlinear projected subgradient methods for convex optimization","volume":"31","author":"Beck","year":"2003","journal-title":"Operat Res Lett"},{"key":"2023062420432432200_btad378-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000016","article-title":"Distributed optimization and statistical learning via the alternating direction method of multipliers","volume":"3","author":"Boyd","year":"2010","journal-title":"FNT Mach Learn"},{"key":"2023062420432432200_btad378-B11","first-page":"391","author":"Bristow","year":"2013"},{"key":"2023062420432432200_btad378-B12","doi-asserted-by":"crossref","first-page":"D165","DOI":"10.1093\/nar\/gkab1113","article-title":"JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles","volume":"50","author":"Castro-Mondragon","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023062420432432200_btad378-B13","doi-asserted-by":"crossref","article-title":"Deep unfolded convolutional dictionary learning for motif discovery","author":"Chu","DOI":"10.1101\/2022.11.06.515322"},{"key":"2023062420432432200_btad378-B14","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-78674-2","volume-title":"Dictionary Learning Algorithms and Applications","author":"Dumitrescu","year":"2018"},{"key":"2023062420432432200_btad378-B15","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1109\/TCI.2018.2840334","article-title":"Convolutional dictionary learning: a comparative review and new algorithms","volume":"4","author":"Garcia-Cardona","year":"2018","journal-title":"IEEE Trans Comput Imaging"},{"key":"2023062420432432200_btad378-B16","first-page":"399","author":"Gregor","year":"2010"},{"key":"2023062420432432200_btad378-B17","doi-asserted-by":"crossref","first-page":"4800","DOI":"10.1093\/nar\/gku132","article-title":"An improved predictive recognition model for cys2-his2 zinc finger proteins","volume":"42","author":"Gupta","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023062420432432200_btad378-B18","doi-asserted-by":"crossref","first-page":"D316","DOI":"10.1093\/nar\/gkab996","article-title":"Remap 2022: a database of human, mouse, drosophila and arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments","volume":"50","author":"Hammal","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023062420432432200_btad378-B19","first-page":"5135","author":"Heide","year":"2015"},{"key":"2023062420432432200_btad378-B20","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1016\/j.molcel.2010.05.004","article-title":"Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and b cell identities","volume":"38","author":"Heinz","year":"2010","journal-title":"Mol Cell"},{"key":"2023062420432432200_btad378-B21","first-page":"563","article-title":"Identifying DNA and protein patterns with statistically significant alignments of multiple sequences","volume":"15","author":"Hertz","year":"1999","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023062420432432200_btad378-B22","first-page":"12","author":"Hinton","year":"1986"},{"key":"2023062420432432200_btad378-B23","doi-asserted-by":"crossref","first-page":"D81","DOI":"10.1093\/nar\/gkv1272","article-title":"The dfam database of repetitive DNA families","volume":"44","author":"Hubley","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023062420432432200_btad378-B24","first-page":"1","article-title":"A universal deep-learning model for zinc finger design enables transcription factor reprogramming","author":"Ichikawa","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2023062420432432200_btad378-B25","doi-asserted-by":"crossref","first-page":"990","DOI":"10.1101\/gr.200535.115","article-title":"Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks","volume":"26","author":"Kelley","year":"2016","journal-title":"Genome Res"},{"key":"2023062420432432200_btad378-B26","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1016\/S0968-0004(01)01800-X","article-title":"Nuclear-receptor interactions on DNA-response elements","volume":"26","author":"Khorasanizadeh","year":"2001","journal-title":"Trends Biochem Sci"},{"key":"2023062420432432200_btad378-B27","first-page":"473","author":"Li","year":"1999"},{"key":"2023062420432432200_btad378-B28","doi-asserted-by":"crossref","first-page":"1156","DOI":"10.1080\/01621459.1995.10476622","article-title":"Bayesian models for multiple local sequence alignment and gibbs sampling strategies","volume":"90","author":"Liu","year":"1995","journal-title":"J Am Stat Assoc"},{"key":"2023062420432432200_btad378-B29","first-page":"4766","article-title":"A unified approach to interpreting model predictions","volume":"30","author":"Lundberg","year":"2017","journal-title":"Adv Neural Inform Process Syst"},{"key":"2023062420432432200_btad378-B30","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/MSP.2020.3016905","article-title":"Algorithm unrolling: interpretable, efficient deep learning for signal and image processing","volume":"38","author":"Monga","year":"2021","journal-title":"IEEE Signal Process Mag"},{"key":"2023062420432432200_btad378-B31","doi-asserted-by":"crossref","first-page":"2879","DOI":"10.1093\/bioinformatics\/btv284","article-title":"Identification of c2h2-zf binding preferences from chip-seq data using rcade","volume":"31","author":"Najafabadi","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062420432432200_btad378-B32","doi-asserted-by":"crossref","first-page":"D141","DOI":"10.1093\/nar\/gkab1039","article-title":"Factorbook: an updated catalog of transcription factor motifs and candidate regulatory motif sites","volume":"50","author":"Pratt","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023062420432432200_btad378-B33","first-page":"3145","author":"Shrikumar","year":"2017"},{"key":"2023062420432432200_btad378-B34","doi-asserted-by":"crossref","first-page":"2099","DOI":"10.1093\/nar\/gkt1112","article-title":"Protein\u2013DNA binding: complexities and multi-protein codes","volume":"42","author":"Siggers","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023062420432432200_btad378-B35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1748-7188-2-15","article-title":"Efficient and accurate p-value computation for position weight matrices","volume":"2","author":"Touzet","year":"2007","journal-title":"Algorithms Mol Biol"},{"key":"2023062420432432200_btad378-B36","doi-asserted-by":"crossref","first-page":"2369","DOI":"10.1093\/bioinformatics\/btg329","article-title":"Combining phylogenetic data with co-regulated genes to identify regulatory motifs","volume":"19","author":"Wang","year":"2003","journal-title":"Bioinformatics"},{"key":"2023062420432432200_btad378-B37","doi-asserted-by":"crossref","first-page":"1431","DOI":"10.1016\/j.cell.2014.08.009","article-title":"Determination and inference of eukaryotic transcription factor sequence specificity","volume":"158","author":"Weirauch","year":"2014","journal-title":"Cell"},{"key":"2023062420432432200_btad378-B38","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1109\/TIP.2015.2495260","article-title":"Efficient algorithms for convolutional sparse representations","volume":"25","author":"Wohlberg","year":"2016","journal-title":"IEEE Trans Image Process"},{"key":"2023062420432432200_btad378-B39","doi-asserted-by":"crossref","first-page":"1088","DOI":"10.1038\/s41592-022-01562-8","article-title":"Scbasset: sequence-based modeling of single-cell atac-seq using convolutional neural networks","volume":"19","author":"Yuan","year":"2022","journal-title":"Nat Methods"},{"key":"2023062420432432200_btad378-B40","first-page":"18795","article-title":"Adabelief optimizer: adapting stepsizes by the belief in observed gradients","volume":"33","author":"Zhuang","year":"2020","journal-title":"Adv Neural Inform Process Syst"},{"key":"2023062420432432200_btad378-B41","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkad207","article-title":"On the dependent recognition of some long zinc finger proteins","author":"Zuo","year":"2023","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad378\/50563076\/btad378.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/6\/btad378\/50698074\/btad378.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/6\/btad378\/50698074\/btad378.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,25]],"date-time":"2023-06-25T00:47:21Z","timestamp":1687654041000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad378\/7192989"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,6,1]]},"references-count":41,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad378","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,6,1]]},"published":{"date-parts":[[2023,6,1]]},"article-number":"btad378"}}