{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T19:24:35Z","timestamp":1779391475546,"version":"3.53.1"},"reference-count":61,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2022,1,22]],"date-time":"2022-01-22T00:00:00Z","timestamp":1642809600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2020YFB0204803"],"award-info":[{"award-number":["2020YFB0204803"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61772566"],"award-info":[{"award-number":["61772566"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangdong Key Field R&D Plan","award":["2019B020228001"],"award-info":[{"award-number":["2019B020228001"]}]},{"name":"Guangdong Key Field R&D Plan","award":["2018B010109006"],"award-info":[{"award-number":["2018B010109006"]}]},{"name":"Introducing Innovative and Entrepreneurial Teams","award":["2016ZT06D211"],"award-info":[{"award-number":["2016ZT06D211"]}]},{"name":"Guangzhou Science and Technology Research Plan","award":["202007030010"],"award-info":[{"award-number":["202007030010"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,3,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Enhancer-promoter interaction (EPI) is a key mechanism underlying gene regulation. EPI prediction has always been a challenging task because enhancers could regulate promoters of distant target genes. Although many machine learning models have been developed, they leverage only the features in enhancers and promoters, or simply add the average genomic signals in the regions between enhancers and promoters, without utilizing detailed features between or outside enhancers and promoters. Due to a lack of large-scale features, existing methods could achieve only moderate performance, especially for predicting EPIs in different cell types. Here, we present a Transformer-based model, TransEPI, for EPI prediction by capturing large genomic contexts. TransEPI was developed based on EPI datasets derived from Hi-C or ChIA-PET data in six cell lines. To avoid over-fitting, we evaluated the TransEPI model by testing it on independent test datasets where the cell line and chromosome are different from the training data. TransEPI not only achieved consistent performance across the cross-validation and test datasets from different cell types but also outperformed the state-of-the-art machine learning and deep learning models. In addition, we found that the improved performance of TransEPI was attributed to the integration of large genomic contexts. Lastly, TransEPI was extended to study the non-coding mutations associated with brain disorders or neural diseases, and we found that TransEPI was also useful for predicting the target genes of non-coding mutations.<\/jats:p>","DOI":"10.1093\/bib\/bbab577","type":"journal-article","created":{"date-parts":[[2021,12,16]],"date-time":"2021-12-16T07:08:01Z","timestamp":1639638481000},"source":"Crossref","is-referenced-by-count":31,"title":["Capturing large genomic contexts for accurately predicting enhancer-promoter interactions"],"prefix":"10.1093","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5701-1438","authenticated-orcid":false,"given":"Ken","family":"Chen","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Sun Yat-sen University, 510000, Guangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Huiying","family":"Zhao","sequence":"additional","affiliation":[{"name":"Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 510000, Guangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuedong","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Sun Yat-sen University, 510000, Guangzhou, China"},{"name":"Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Sun Yat-sen University, 510000, Guangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2022,1,22]]},"reference":[{"key":"2022031506275191500_ref1","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1146\/annurev.genom.7.080505.115623","article-title":"Transcriptional regulatory elements in the human genome","volume":"7","author":"Maston","year":"2006","journal-title":"Annu Rev Genomics Hum Genet"},{"key":"2022031506275191500_ref2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/j.molcel.2014.06.015","article-title":"Enhancer function: mechanistic and genome-wide insights come together","volume":"55","author":"Plank","year":"2014","journal-title":"Mol Cell"},{"key":"2022031506275191500_ref3","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1038\/nrm3949","article-title":"The selection and function of cell type-specific enhancers","volume":"16","author":"Heinz","year":"2015","journal-title":"Nat Rev Mol Cell Biol"},{"key":"2022031506275191500_ref4","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1038\/s41576-019-0128-0","article-title":"Long-range enhancer\u2013promoter contacts in gene expression control","volume":"20","author":"Schoenfelder","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2022031506275191500_ref5","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.1016\/j.cell.2015.04.004","article-title":"Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions","volume":"161","author":"Lupi\u00e1\u00f1ez","year":"2015","journal-title":"Cell"},{"key":"2022031506275191500_ref6","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1007\/s10565-018-9430-4","article-title":"3D genome and its disorganization in diseases","volume":"34","author":"Li","year":"2018","journal-title":"Cell Biol Toxicol"},{"key":"2022031506275191500_ref7","doi-asserted-by":"crossref","first-page":"1369","DOI":"10.1016\/j.cell.2016.09.037","article-title":"Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters","volume":"167","author":"Javierre","year":"2016","journal-title":"Cell"},{"key":"2022031506275191500_ref8","doi-asserted-by":"crossref","first-page":"8641","DOI":"10.1093\/nar\/gkw519","article-title":"Explaining the disease phenotype of intergenic SNP through predicted long range regulation","volume":"44","author":"Chen","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2022031506275191500_ref9","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41593-020-0603-0","article-title":"A computational tool (H-MAGMA) for improved prediction of brain-disorder risk genes by incorporating brain chromatin interaction profiles","volume":"23","author":"Sey","year":"2020","journal-title":"Nat Neurosci"},{"key":"2022031506275191500_ref10","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1038\/nature11279","article-title":"The long-range interaction landscape of gene promoters","volume":"489","author":"Sanyal","year":"2012","journal-title":"Nature"},{"key":"2022031506275191500_ref11","doi-asserted-by":"crossref","first-page":"1450","DOI":"10.1093\/nar\/gks1339","article-title":"Transcription factor and chromatin features predict genes associated with eQTLs","volume":"41","author":"Wang","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2022031506275191500_ref12","doi-asserted-by":"crossref","first-page":"4440","DOI":"10.1093\/bioinformatics\/btaa254","article-title":"Predicting target genes of non-coding regulatory variants with IRT","volume":"36","author":"Wu","year":"2020","journal-title":"Bioinformatics"},{"key":"2022031506275191500_ref13","doi-asserted-by":"crossref","first-page":"1300\u201310","DOI":"10.1038\/s41588-021-00913-z","article-title":"Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression","volume":"53","author":"V\u00f5sa","year":"2021","journal-title":"Nat Genet"},{"key":"2022031506275191500_ref14","doi-asserted-by":"crossref","first-page":"521\u2013534.e15","DOI":"10.1016\/j.molcel.2020.06.007","article-title":"Robust Hi-C maps of enhancer-promoter interactions reveal the function of non-coding genome in neural development and diseases","volume":"79","author":"Lu","year":"2020","journal-title":"Mol Cell"},{"key":"2022031506275191500_ref15","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1126\/science.1181369","article-title":"Comprehensive mapping of long-range interactions reveals folding principles of the human genome","volume":"326","author":"Lieberman-Aiden","year":"2009","journal-title":"Science"},{"key":"2022031506275191500_ref16","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1038\/nature08497","article-title":"An oestrogen-receptor-alpha-bound human chromatin interactome","volume":"462","author":"Fullwood","year":"2009","journal-title":"Nature"},{"key":"2022031506275191500_ref17","doi-asserted-by":"crossref","first-page":"1665","DOI":"10.1016\/j.cell.2014.11.021","article-title":"A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping","volume":"159","author":"Rao","year":"2014","journal-title":"Cell"},{"key":"2022031506275191500_ref18","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1038\/nature11232","article-title":"The accessible chromatin landscape of the human genome","volume":"489","author":"Thurman","year":"2012","journal-title":"Nature"},{"key":"2022031506275191500_ref19","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1101\/gr.152140.112","article-title":"Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions","volume":"23","author":"Sheffield","year":"2013","journal-title":"Genome Res"},{"key":"2022031506275191500_ref20","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bax028","article-title":"GeneHancer: genome-wide integration of enhancers and target genes in GeneCards","volume":"2017","author":"Fishilevich","year":"2017","journal-title":"Database"},{"key":"2022031506275191500_ref21","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1186\/s13059-019-1924-8","article-title":"A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods","volume":"21","author":"Moore","year":"2020","journal-title":"Genome Biol"},{"key":"2022031506275191500_ref22","doi-asserted-by":"crossref","first-page":"8694","DOI":"10.1093\/nar\/gkv865","article-title":"A predictive modeling approach for cell line-specific long-range regulatory interactions","volume":"43","author":"Roy","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2022031506275191500_ref23","doi-asserted-by":"crossref","first-page":"488","DOI":"10.1038\/ng.3539","article-title":"Enhancer\u2013promoter interactions are encoded by complex genomic signatures on looping chromatin","volume":"48","author":"Whalen","year":"2016","journal-title":"Nat Genet"},{"key":"2022031506275191500_ref24","doi-asserted-by":"crossref","first-page":"1428","DOI":"10.1038\/ng.3950","article-title":"Reconstruction of enhancer\u2013target networks in 935 samples of human primary cells, tissues and cell lines","volume":"49","author":"Cao","year":"2017","journal-title":"Nat Genet"},{"key":"2022031506275191500_ref25","doi-asserted-by":"crossref","first-page":"3877","DOI":"10.1093\/bioinformatics\/btz641","article-title":"EPIP: a novel approach for condition-specific enhancer\u2013promoter interaction prediction","volume":"35","author":"Talukder","year":"2019","journal-title":"Bioinformatics"},{"key":"2022031506275191500_ref26","doi-asserted-by":"crossref","first-page":"e1007436","DOI":"10.1371\/journal.pcbi.1007436","article-title":"EAGLE: an algorithm that utilizes a small number of genomic features to predict tissue\/cell type-specific enhancer-gene interactions","volume":"15","author":"Gao","year":"2019","journal-title":"PLoS Comput Biol"},{"key":"2022031506275191500_ref27","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1007\/s40484-019-0154-0","article-title":"Predicting enhancer-promoter interaction from genomic sequence with deep neural networks","volume":"7","author":"Singh","year":"2019","journal-title":"Quant Biol"},{"key":"2022031506275191500_ref28","doi-asserted-by":"crossref","first-page":"2899","DOI":"10.1093\/bioinformatics\/bty1050","article-title":"A simple convolutional neural network for prediction of enhancer\u2013promoter interactions with DNA sequence data","volume":"35","author":"Zhuang","year":"2019","journal-title":"Bioinformatics"},{"key":"2022031506275191500_ref29","doi-asserted-by":"crossref","first-page":"1037","DOI":"10.1093\/bioinformatics\/btz694","article-title":"Identifying enhancer\u2013promoter interactions with neural network based on pre-trained DNA vectors and attention mechanism","volume":"36","author":"Hong","year":"2020","journal-title":"Bioinformatics"},{"key":"2022031506275191500_ref30","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbaa254","article-title":"Predicting enhancer-promoter interactions by deep learning and matching heuristic","volume":"22","author":"Min","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022031506275191500_ref31","doi-asserted-by":"crossref","first-page":"e1006625","DOI":"10.1371\/journal.pcbi.1006625","article-title":"Local epigenomic state cannot discriminate interacting and non-interacting enhancer\u2013promoter pairs with high accuracy","volume":"14","author":"Xi","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2022031506275191500_ref32","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/s41588-019-0434-7","article-title":"Inflated performance measures in enhancer\u2013promoter interaction-prediction methods","volume":"51","author":"Cao","year":"2019","journal-title":"Nat Genet"},{"key":"2022031506275191500_ref33","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1186\/s13059-020-02177-y","article-title":"A pitfall for machine learning methods aiming to predict across cell types","volume":"21","author":"Schreiber","year":"2020","journal-title":"Genome Biol"},{"key":"2022031506275191500_ref34","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1101\/gr.249367.119","article-title":"Quantitative prediction of enhancer\u2013promoter interactions","volume":"30","author":"Belokopytova","year":"2020","journal-title":"Genome Res"},{"key":"2022031506275191500_ref35","doi-asserted-by":"crossref","DOI":"10.1038\/s41592-020-0960-3","article-title":"DeepC: predicting 3D genome folding using megabase-scale transfer learning","volume":"17","author":"Schwessinger","year":"2020","journal-title":"Nat Methods"},{"key":"2022031506275191500_ref36","doi-asserted-by":"crossref","first-page":"1111","DOI":"10.1038\/s41592-020-0958-x","article-title":"Predicting 3D genome folding from DNA sequence with Akita","volume":"17","author":"Fudenberg","year":"2020","journal-title":"Nat Methods"},{"key":"2022031506275191500_ref37","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1186\/s13059-021-02453-5","article-title":"Chromatin interaction neural network (ChINN): a machine learning-based method for predicting chromatin interactions from DNA sequences","volume":"22","author":"Cao","year":"2021","journal-title":"Genome Biol"},{"key":"2022031506275191500_ref38","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"key":"2022031506275191500_ref39","article-title":"Dilated recurrent neural networks","author":"Chang","year":"2017","journal-title":"31st Conference on Neural Information Processing Systems (NIPS 2017)"},{"key":"2022031506275191500_ref40","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2022031506275191500_ref41","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1126\/science.abj8754","article-title":"Accurate prediction of protein structures and interactions using a three-track neural network","volume":"373","author":"Baek","year":"2021","journal-title":"Science"},{"key":"2022031506275191500_ref42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41592-021-01252-x","article-title":"Effective gene expression prediction from sequence by integrating long-range interactions","volume":"18","author":"Avsec","year":"2021","journal-title":"Nat Methods"},{"key":"2022031506275191500_ref43","doi-asserted-by":"crossref","first-page":"D766","DOI":"10.1093\/nar\/gky955","article-title":"GENCODE reference annotation for the human and mouse genomes","volume":"47","author":"Frankish","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022031506275191500_ref44","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1038\/nature14248","article-title":"Integrative analysis of 111 reference human epigenomes","volume":"518","author":"Kundaje","year":"2015","journal-title":"Nature"},{"key":"2022031506275191500_ref45","article-title":"A structured self-attentive sentence embedding","author":"Lin","year":"2017","journal-title":"arXiv:1703.03130"},{"key":"2022031506275191500_ref46","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","article-title":"The meaning and use of the area under a receiver operating characteristic (ROC) curve","volume":"143","author":"Hanley","year":"1982","journal-title":"Radiology"},{"key":"2022031506275191500_ref47","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1038\/s41592-019-0690-6","article-title":"Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data","volume":"17","author":"Pratapa","year":"2020","journal-title":"Nat Methods"},{"key":"2022031506275191500_ref48","first-page":"8024","article-title":"PyTorch: an imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"key":"2022031506275191500_ref49","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2017","journal-title":"arXiv:1412.6980"},{"key":"2022031506275191500_ref50","first-page":"D58","article-title":"EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue\/cell types across nine species","volume":"48","author":"Gao","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022031506275191500_ref51","doi-asserted-by":"crossref","first-page":"e60","DOI":"10.1093\/nar\/gkz167","article-title":"DeepTACT: predicting 3D chromatin contacts via bootstrapping deep learning","volume":"47","author":"Li","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022031506275191500_ref52","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1016\/j.ajhg.2013.10.012","article-title":"Beyond GWASs: illuminating the dark road from association to function","volume":"93","author":"Edwards","year":"2013","journal-title":"Am J Hum Genet"},{"key":"2022031506275191500_ref53","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1017\/thg.2013.12","article-title":"Genome-wide association study of inattention and hyperactivity-impulsivity measured as quantitative traits","volume":"16","author":"Ebejer","year":"2013","journal-title":"Twin Res Hum Genet Off J Int Soc Twin Stud"},{"key":"2022031506275191500_ref54","doi-asserted-by":"crossref","first-page":"1031","DOI":"10.1038\/ng.3623","article-title":"Identification of 15 genetic loci associated with risk of major depression in individuals of European descent","volume":"48","author":"Hyde","year":"2016","journal-title":"Nat Genet"},{"key":"2022031506275191500_ref55","doi-asserted-by":"crossref","first-page":"W191","DOI":"10.1093\/nar\/gkz369","article-title":"g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update)","volume":"47","author":"Raudvere","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022031506275191500_ref56","article-title":"Informer: beyond efficient transformer for long sequence time-series forecasting","author":"Zhou","year":"2020","journal-title":"arXiv:2012.07436"},{"key":"2022031506275191500_ref57","article-title":"Rethinking attention with Performers","author":"Choromanski","year":"2020","journal-title":"arXiv:2009.14794"},{"key":"2022031506275191500_ref58","first-page":"5156","volume-title":"Proceedings of the 37th International Conference on Machine Learning","author":"Katharopoulos","year":"2020"},{"key":"2022031506275191500_ref59","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1016\/j.cell.2018.11.029","article-title":"A genome-wide framework for mapping gene regulation via cellular genetic screens","volume":"176","author":"Gasperini","year":"2019","journal-title":"Cell"},{"key":"2022031506275191500_ref60","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1038\/ng.3286","article-title":"Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C","volume":"47","author":"Mifsud","year":"2015","journal-title":"Nat Genet"},{"key":"2022031506275191500_ref61","doi-asserted-by":"crossref","first-page":"919","DOI":"10.1038\/nmeth.3999","article-title":"HiChIP: efficient and sensitive analysis of protein-directed genome architecture","volume":"13","author":"Mumbach","year":"2016","journal-title":"Nat Methods"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/2\/bbab577\/42805937\/bbab577.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/2\/bbab577\/42805937\/bbab577.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,14]],"date-time":"2023-11-14T04:13:00Z","timestamp":1699935180000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab577\/6513727"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,22]]},"references-count":61,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3,10]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab577","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.09.04.458817","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,3]]},"published":{"date-parts":[[2022,1,22]]},"article-number":"bbab577"}}