{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T06:45:11Z","timestamp":1775025911605,"version":"3.50.1"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2017,7,12]],"date-time":"2017-07-12T00:00:00Z","timestamp":1499817600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61573207"],"award-info":[{"award-number":["61573207"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61175002"],"award-info":[{"award-number":["61175002"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["71101010"],"award-info":[{"award-number":["71101010"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61673241"],"award-info":[{"award-number":["61673241"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61561146396"],"award-info":[{"award-number":["61561146396"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k-mer co-occurrence information with recent advances in deep learning.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors based on the co-occurrence matrix of k-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The source code can be downloaded from https:\/\/github.com\/minxueric\/ismb2017_lstm.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary materials are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx234","type":"journal-article","created":{"date-parts":[[2017,5,12]],"date-time":"2017-05-12T11:10:16Z","timestamp":1494587416000},"page":"i92-i101","source":"Crossref","is-referenced-by-count":105,"title":["Chromatin accessibility prediction via convolutional long short-term memory networks with<i>k<\/i>-mer embedding"],"prefix":"10.1093","volume":"33","author":[{"given":"Xu","family":"Min","sequence":"first","affiliation":[{"name":"MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China"},{"name":"Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wanwen","family":"Zeng","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China"},{"name":"Department of Automation, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ning","family":"Chen","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China"},{"name":"Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ting","family":"Chen","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China"},{"name":"Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China"},{"name":"Program in Computational Biology and Bioinformatics, University of Southern California, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rui","family":"Jiang","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China"},{"name":"Department of Automation, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2017,7,12]]},"reference":[{"issue":"8","key":"2023051506504587600_btx234-B1","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023051506504587600_btx234-B2","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1109\/72.279181","article-title":"Learning long-term dependencies with gradient descent is difficult","volume":"5","author":"Bengio","year":"1994","journal-title":"IEEE Trans. Neural Netw"},{"key":"2023051506504587600_btx234-B3","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/7503.003.0024","article-title":"Greedy layer-wise training of deep networks","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Bengio","year":"2007"},{"key":"2023051506504587600_btx234-B4","author":"Chollet","year":"2015"},{"key":"2023051506504587600_btx234-B5","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1126\/science.1105136","article-title":"The encode (encyclopedia of DNA elements) project","volume":"306","author":"Consortium","year":"2004","journal-title":"Science"},{"key":"2023051506504587600_btx234-B6","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1101\/gr.4074106","article-title":"Genome-wide mapping of dnase hypersensitive sites using massively parallel signature sequencing (mpss)","volume":"16","author":"Crawford","year":"2006","journal-title":"Genome Res"},{"key":"2023051506504587600_btx234-B7","first-page":"2121","article-title":"Adaptive subgradient methods for online learning and stochastic optimization","volume":"12","author":"Duchi","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023051506504587600_btx234-B8","doi-asserted-by":"crossref","first-page":"e1003711","DOI":"10.1371\/journal.pcbi.1003711","article-title":"Enhanced regulatory sequence prediction using gapped k-mer features","volume":"10","author":"Ghandi","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023051506504587600_btx234-B9","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1080\/00437956.1954.11659520","article-title":"Distributional structure","volume":"10","author":"Harris","year":"1954","journal-title":"Word"},{"key":"2023051506504587600_btx234-B10","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-10578-9_23","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume-title":"IEEE transactions on pattern analysis and machine intelligence (TPAMI)","author":"He","year":"2014"},{"key":"2023051506504587600_btx234-B11","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1126\/science.1127647","article-title":"Reducing the dimensionality of data with neural networks","volume":"313","author":"Hinton","year":"2006","journal-title":"Science"},{"key":"2023051506504587600_btx234-B12","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1142\/S0218488598000094","article-title":"The vanishing gradient problem during learning recurrent neural nets and problem solutions","volume":"6","author":"Hochreiter","year":"1998","journal-title":"Int. J. Uncertain. Fuzz. Knowledge-Based Syst"},{"key":"2023051506504587600_btx234-B13","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023051506504587600_btx234-B14","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1038\/ng.759","article-title":"Chromatin accessibility pre-determines glucocorticoid receptor binding patterns","volume":"43","author":"John","year":"2011","journal-title":"Nature Genet"},{"issue":"7","key":"2023051506504587600_btx234-B15","doi-asserted-by":"crossref","first-page":"990","DOI":"10.1101\/gr.200535.115","article-title":"Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks","volume":"26","author":"Kelley","year":"2016","journal-title":"Genome Res"},{"key":"2023051506504587600_btx234-B16","doi-asserted-by":"crossref","DOI":"10.3115\/v1\/D14-1181","article-title":"Convolutional neural networks for sentence classification","volume-title":"Conference on Empirical Methods on Natural Language Processing (EMNLP), Association for Computational Linguistics (ACL)","author":"Kim","year":"2014"},{"key":"2023051506504587600_btx234-B17","first-page":"1097","author":"Krizhevsky","year":"2012"},{"key":"2023051506504587600_btx234-B18","first-page":"1188","volume-title":"ICML","author":"Le","year":"2014"},{"key":"2023051506504587600_btx234-B19","doi-asserted-by":"crossref","first-page":"2167","DOI":"10.1101\/gr.121905.111","article-title":"Discriminative prediction of mammalian enhancers from dna sequence","volume":"21","author":"Lee","year":"2011","journal-title":"Genome Res"},{"key":"2023051506504587600_btx234-B20","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D15-1166","article-title":"Effective approaches to attention-based neural machine translation","volume-title":"Conference on Empirical Methods on Natural Language Processing (EMNLP), Association for Computational Linguistics (ACL)","author":"Luong","year":"2015"},{"key":"2023051506504587600_btx234-B21","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J. Mach. Learn. Res"},{"key":"2023051506504587600_btx234-B22","first-page":"3111","author":"Mikolov","year":"2013"},{"key":"2023051506504587600_btx234-B23","first-page":"637","author":"Min","year":"2016"},{"key":"2023051506504587600_btx234-B24","doi-asserted-by":"crossref","first-page":"2671","DOI":"10.1101\/gad.1615707","article-title":"Open conformation chromatin and pluripotency","volume":"21","author":"Niwa","year":"2007","journal-title":"Genes Dev"},{"key":"2023051506504587600_btx234-B25","first-page":"1532","author":"Pennington","year":"2014"},{"key":"2023051506504587600_btx234-B26","first-page":"68","author":"S\u00f8nderby","year":"2015"},{"key":"2023051506504587600_btx234-B27","doi-asserted-by":"crossref","DOI":"10.3115\/v1\/P15-1150","article-title":"Improved semantic representations from tree-structured long short-term memory networks","volume-title":"Annual Meeting of the Association for Computational Linguistics","author":"Tai","year":"2015"},{"key":"2023051506504587600_btx234-B28","author":"Tieleman","year":"2012"},{"key":"2023051506504587600_btx234-B29","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1038\/nmeth.2713","article-title":"Coupling transcription factor occupancy to nucleosome architecture with DNase-flash","volume":"11","author":"Vierstra","year":"2014","journal-title":"Nat. Methods"},{"issue":"2","key":"2023051506504587600_btx234-B30","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1093\/nsr\/nww025","article-title":"Modeling the causal regulatory network by integrating chromatin accessibility and transcriptome data","volume":"3","author":"Wang","year":"2016","journal-title":"Natl. Sci. Rev"},{"key":"2023051506504587600_btx234-B31","doi-asserted-by":"crossref","first-page":"1429","DOI":"10.1016\/S0893-6080(03)00138-2","article-title":"The general inefficiency of batch training for gradient descent learning","volume":"16","author":"Wilson","year":"2003","journal-title":"Neural Netw"},{"key":"2023051506504587600_btx234-B33","doi-asserted-by":"crossref","first-page":"i121","DOI":"10.1093\/bioinformatics\/btw255","article-title":"Convolutional neural network architectures for predicting DNA-protein binding","volume":"32","author":"Zeng","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051506504587600_btx234-B34","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning-based sequence model","volume":"12","author":"Zhou","year":"2015","journal-title":"Nat. Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i92\/50315275\/bioinformatics_33_14_i92.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i92\/50315275\/bioinformatics_33_14_i92.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,24]],"date-time":"2024-06-24T02:44:30Z","timestamp":1719197070000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/14\/i92\/3953949"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7,12]]},"references-count":33,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2017,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx234","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,7,15]]},"published":{"date-parts":[[2017,7,12]]}}}