{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T19:23:13Z","timestamp":1762197793718,"version":"build-2065373602"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T00:00:00Z","timestamp":1755907200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Young Collaborative Research","award":["C2004-23Y"],"award-info":[{"award-number":["C2004-23Y"]}]},{"DOI":"10.13039\/501100005847","name":"Health and Medical Research Fund","doi-asserted-by":"publisher","award":["11221026"],"award-info":[{"award-number":["11221026"]}],"id":[{"id":"10.13039\/501100005847","id-type":"DOI","asserted-by":"publisher"}]},{"name":"HKBU Start-up Grant Tier 2","award":["RC-SGT2\/19-20\/SCI\/007"],"award-info":[{"award-number":["RC-SGT2\/19-20\/SCI\/007"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>In silico transcription factor and DNA (TF\u2013DNA) binding affinity prediction plays a vital role in examining TF binding preferences and understanding gene regulation. The existing tools employ TF\u2013DNA binding profiles from in vitro high-throughput technologies to predict TF\u2013DNA binding affinity. 
However, TFs tend to bind to sequences in open chromatin regions in vivo, a binding preference that these existing tools seldom consider.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this study, we developed TRAFICA, an open chromatin language model to predict TF\u2013DNA binding affinity by integrating sequence characteristics of open chromatin regions from ATAC-seq experiments and in vitro TF\u2013DNA binding profiles from high-throughput technologies. We pretrained TRAFICA on over 2.8 million nucleotide sequences in open chromatin regions derived from 197 ATAC-seq experiments (115 cell lines) to learn in vivo TF binding preferences. We further fine-tuned TRAFICA using low-rank adaptation (LoRA) on PBM and HT-SELEX TF\u2013DNA binding profiles to learn intrinsic binding preferences for specific TFs. We systematically evaluated TRAFICA and compared its predictive performance with existing prediction tools and advanced DNA language models. The experimental results demonstrated that TRAFICA significantly outperformed the others in predicting in vitro and in vivo TF\u2013DNA binding affinity, achieving state-of-the-art performance. 
These findings indicate that considering the sequence characteristics from open chromatin regions could significantly improve TF\u2013DNA binding affinity prediction.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The source code of TRAFICA and detailed tutorials are available at https:\/\/github.com\/ericcombiolab\/TRAFICA.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf469","type":"journal-article","created":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T11:45:00Z","timestamp":1755863100000},"source":"Crossref","is-referenced-by-count":0,"title":["TRAFICA: an open chromatin language model to improve transcription factor binding affinity prediction"],"prefix":"10.1093","volume":"41","author":[{"given":"Yu","family":"Xu","sequence":"first","affiliation":[{"name":"Department of Computer Science, Hong Kong Baptist University , 999077 Hong Kong,","place":["China"]}]},{"given":"Chonghao","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hong Kong Baptist University , 999077 Hong Kong,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0121-0291","authenticated-orcid":false,"given":"Ke","family":"Xu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hong Kong Baptist University , 999077 Hong Kong,","place":["China"]}]},{"given":"Yi","family":"Ding","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hong Kong Baptist University , 999077 Hong Kong,","place":["China"]}]},{"given":"Aiping","family":"Lyu","sequence":"additional","affiliation":[{"name":"School of Chinese Medicine, Hong Kong Baptist University , 999077 Hong Kong,","place":["China"]},{"name":"Institute of Systems Medicine and Health Sciences, Hong Kong Baptist University, 999077 Hong Kong, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2794-7371","authenticated-orcid":false,"given":"Lu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hong Kong Baptist University , 999077 Hong Kong,","place":["China"]},{"name":"Institute of Systems Medicine and Health Sciences, Hong Kong Baptist University, 999077 Hong Kong, China"}]}],"member":"286","published-online":{"date-parts":[[2025,8,23]]},"reference":[{"key":"2025110313555497100_btaf469-B1","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2025110313555497100_btaf469-B2","doi-asserted-by":"crossref","first-page":"9105","DOI":"10.1093\/nar\/gkac708","article-title":"DNAffinity: a machine-learning approach to predict DNA binding affinities of transcription factors","volume":"50","author":"Barissi","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025110313555497100_btaf469-B3","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1038\/nprot.2008.195","article-title":"Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors","volume":"4","author":"Berger","year":"2009","journal-title":"Nat Protoc"},{"key":"2025110313555497100_btaf469-B4","doi-asserted-by":"crossref","first-page":"1429","DOI":"10.1038\/nbt1246","article-title":"Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities","volume":"24","author":"Berger","year":"2006","journal-title":"Nat Biotechnol"},{"key":"2025110313555497100_btaf469-B5","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1016\/j.cell.2007.12.014","article-title":"High-resolution mapping and characterization of open chromatin across the 
genome","volume":"132","author":"Boyle","year":"2008","journal-title":"Cell"},{"key":"2025110313555497100_btaf469-B6","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1002\/0471142727.mb2129s109","article-title":"ATAC-seq: a method for assaying chromatin accessibility genome-wide","volume":"109","author":"Buenrostro","year":"2015","journal-title":"Curr Protoc Mol Biol"},{"key":"2025110313555497100_btaf469-B7","doi-asserted-by":"crossref","first-page":"D165","DOI":"10.1093\/nar\/gkab1113","article-title":"Jaspar 2022: the 9th release of the open-access database of transcription factor binding profiles","volume":"50","author":"Castro-Mondragon","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025110313555497100_btaf469-B8","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"The ENCODE Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2025110313555497100_btaf469-B9","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1038\/s41592-024-02523-z","article-title":"Nucleotide transformer: building and evaluating robust foundation models for human genomics","volume":"22","author":"Dalla-Torre","year":"2025","journal-title":"Nat Methods"},{"author":"Devlin","key":"2025110313555497100_btaf469-B10","article-title":"Bert: pre-training of deep bidirectional transformers for language understanding"},{"key":"2025110313555497100_btaf469-B11","doi-asserted-by":"crossref","first-page":"D157","DOI":"10.1093\/nar\/gks1233","article-title":"EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era","volume":"41","author":"Dreos","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2025110313555497100_btaf469-B12","first-page":"D58","article-title":"Enhanceratlas 2.0: an updated resource with enhancer annotation in 586 tissue\/cell types across nine 
species","volume":"48","author":"Gao","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025110313555497100_btaf469-B13","doi-asserted-by":"crossref","first-page":"2153","DOI":"10.1101\/gr.135681.111","article-title":"Genistein and bisphenol a exposure cause estrogen receptor 1 to bind thousands of sites in a cell type-specific manner","volume":"22","author":"Gertz","year":"2012","journal-title":"Genome Res"},{"article-title":"Lora: low-rank adaptation of large language models","year":"2022","author":"Hu","key":"2025110313555497100_btaf469-B14"},{"key":"2025110313555497100_btaf469-B15","doi-asserted-by":"crossref","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome","volume":"37","author":"Ji","year":"2021","journal-title":"Bioinformatics"},{"key":"2025110313555497100_btaf469-B16","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1016\/0959-437X(95)80022-0","article-title":"Molecular mechanisms of cell-type determination in budding yeast","volume":"5","author":"Johnson","year":"1995","journal-title":"Curr Opin Genet Dev"},{"key":"2025110313555497100_btaf469-B17","doi-asserted-by":"crossref","first-page":"1497","DOI":"10.1126\/science.1141319","article-title":"Genome-wide mapping of in vivo protein-DNA interactions","volume":"316","author":"Johnson","year":"2007","journal-title":"Science"},{"key":"2025110313555497100_btaf469-B18","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1101\/gr.100552.109","article-title":"Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities","volume":"20","author":"Jolma","year":"2010","journal-title":"Genome Res"},{"key":"2025110313555497100_btaf469-B19","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/j.cell.2012.12.009","article-title":"DNA-binding specificities of human transcription 
factors","volume":"152","author":"Jolma","year":"2013","journal-title":"Cell"},{"key":"2025110313555497100_btaf469-B20","doi-asserted-by":"crossref","first-page":"962","DOI":"10.1101\/gr.258848.119","article-title":"Binding specificities of human RNA-binding proteins toward structured and linear RNA sequences","volume":"30","author":"Jolma","year":"2020","journal-title":"Genome Res"},{"key":"2025110313555497100_btaf469-B21","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D18-2012","article-title":"Sentencepiece: a simple and language independent subword tokenizer and detokenizer for neural text processing","author":"Kudo","year":"2018"},{"key":"2025110313555497100_btaf469-B22","doi-asserted-by":"crossref","first-page":"3504","DOI":"10.1093\/bioinformatics\/btw489","article-title":"Motif comparison based on similarity of binding affinity profiles","volume":"32","author":"Lambert","year":"2016","journal-title":"Bioinformatics"},{"key":"2025110313555497100_btaf469-B23","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1016\/j.cell.2018.01.029","article-title":"The human transcription factors","volume":"172","author":"Lambert","year":"2018","journal-title":"Cell"},{"key":"2025110313555497100_btaf469-B24","doi-asserted-by":"crossref","first-page":"e111","DOI":"10.1093\/nar\/gkac694","article-title":"Priesstess: interpretable, high-performing models of the sequence and structure preferences of RNA-binding proteins","volume":"50","author":"Laverty","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025110313555497100_btaf469-B25","doi-asserted-by":"crossref","first-page":"D28","DOI":"10.1093\/nar\/gkq967","article-title":"The European nucleotide archive","volume":"39","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2025110313555497100_btaf469-B26","doi-asserted-by":"crossref","first-page":"12877","DOI":"10.1093\/nar\/gkx1145","article-title":"Expanding the repertoire of DNA shape features for genome-scale studies of 
transcription factor binding","volume":"45","author":"Li","year":"2017","journal-title":"Nucleic Acids Res"},{"author":"Loshchilov","key":"2025110313555497100_btaf469-B27"},{"key":"2025110313555497100_btaf469-B28","doi-asserted-by":"crossref","first-page":"1387","DOI":"10.1038\/s41593-024-01658-8","article-title":"Multiomic profiling of transcription factor binding and function in human brain","volume":"27","author":"Loupe","year":"2024","journal-title":"Nat Neurosci"},{"key":"2025110313555497100_btaf469-B29","doi-asserted-by":"crossref","first-page":"D882","DOI":"10.1093\/nar\/gkz1062","article-title":"New developments on the encyclopedia of DNA elements (encode) data portal","volume":"48","author":"Luo","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025110313555497100_btaf469-B30","doi-asserted-by":"crossref","first-page":"102012","DOI":"10.1016\/j.jchemneu.2021.102012","article-title":"Foxo and related transcription factors binding elements in the regulation of neurodegenerative disorders","volume":"116","author":"Oli","year":"2021","journal-title":"J Chem Neuroanat"},{"key":"2025110313555497100_btaf469-B31","doi-asserted-by":"crossref","first-page":"e63","DOI":"10.1093\/nar\/gku117","article-title":"A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and CHIP data","volume":"42","author":"Orenstein","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2025110313555497100_btaf469-B32","doi-asserted-by":"crossref","first-page":"975","DOI":"10.1038\/nmeth.4401","article-title":"Chromvar: inferring transcription-factor-associated accessibility from single-cell epigenomic data","volume":"14","author":"Schep","year":"2017","journal-title":"Nat Methods"},{"key":"2025110313555497100_btaf469-B33","doi-asserted-by":"crossref","DOI":"10.1101\/2023.07.26.550634","article-title":"Single-cell gene expression prediction from DNA sequence at large 
contexts","author":"Schwessinger","year":"2023"},{"key":"2025110313555497100_btaf469-B34","doi-asserted-by":"crossref","first-page":"1715","DOI":"10.18653\/v1\/P16-1162","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Sennrich","year":"2016"},{"key":"2025110313555497100_btaf469-B35","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1016\/j.cell.2011.10.053","article-title":"Cofactor binding evokes latent differences in DNA binding specificity between hox proteins","volume":"147","author":"Slattery","year":"2011","journal-title":"Cell"},{"key":"2025110313555497100_btaf469-B36","doi-asserted-by":"crossref","first-page":"127063","DOI":"10.1016\/j.neucom.2023.127063","article-title":"Roformer: enhanced transformer with rotary position embedding","volume":"568","author":"Su","year":"2024","journal-title":"Neurocomputing (Amst)"},{"key":"2025110313555497100_btaf469-B37","doi-asserted-by":"crossref","first-page":"2272","DOI":"10.1093\/bioinformatics\/btz921","article-title":"Logomaker: beautiful sequence logos in python","volume":"36","author":"Tareen","year":"2020","journal-title":"Bioinformatics"},{"key":"2025110313555497100_btaf469-B38","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025110313555497100_btaf469-B39","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1016\/j.omtn.2021.02.014","article-title":"Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture","volume":"24","author":"Wang","year":"2021","journal-title":"Mol Ther Nucleic Acids"},{"key":"2025110313555497100_btaf469-B40","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1038\/nbt.2486","article-title":"Evaluation of methods for modeling transcription factor sequence 
specificity","volume":"31","author":"Weirauch","year":"2013","journal-title":"Nat Biotechnol"},{"key":"2025110313555497100_btaf469-B41","doi-asserted-by":"crossref","first-page":"910","DOI":"10.15252\/msb.20167238","article-title":"Transcription factor family-specific DNA shape readout revealed by quantitative specificity models","volume":"13","author":"Yang","year":"2017","journal-title":"Mol Syst Biol"},{"key":"2025110313555497100_btaf469-B42","doi-asserted-by":"crossref","first-page":"eaaj2239","DOI":"10.1126\/science.aaj2239","article-title":"Impact of cytosine methylation on DNA binding specificities of human transcription factors","volume":"356","author":"Yin","year":"2017","journal-title":"Science"},{"key":"2025110313555497100_btaf469-B43","doi-asserted-by":"crossref","first-page":"1088","DOI":"10.1038\/s41592-022-01562-8","article-title":"Scbasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks","volume":"19","author":"Yuan","year":"2022","journal-title":"Nat Methods"},{"key":"2025110313555497100_btaf469-B44","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1109\/TCBB.2019.2947461","article-title":"Predicting in-vitro transcription factor binding sites using DNA sequence+ shape","volume":"18","author":"Zhang","year":"2021","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2025110313555497100_btaf469-B45","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1038\/nrd.2016.199","article-title":"Aptamers as targeted therapeutics: current potential and challenges","volume":"16","author":"Zhou","year":"2017","journal-title":"Nat Rev Drug 
Discov"},{"author":"Zhou","key":"2025110313555497100_btaf469-B46"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf469\/64114792\/btaf469.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/11\/btaf469\/64114792\/btaf469.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/11\/btaf469\/64114792\/btaf469.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T18:56:09Z","timestamp":1762196169000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf469\/8240329"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,8,23]]},"references-count":46,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf469","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,11]]},"published":{"date-parts":[[2025,8,23]]},"article-number":"btaf469"}}