{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T11:45:21Z","timestamp":1769859921861,"version":"3.49.0"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2023,9,5]],"date-time":"2023-09-05T00:00:00Z","timestamp":1693872000000},"content-version":"vor","delay-in-days":4,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R35GM124952"],"award-info":[{"award-number":["R35GM124952"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,9,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>A growing amount of noncoding genetic variants, including single-nucleotide polymorphisms, are found to be associated with complex human traits and diseases. Their mechanistic interpretation is relatively limited and can use the help from computational prediction of their effects on epigenetic profiles. However, current models often focus on local, 1D genome sequence determinants and disregard global, 3D chromatin structure that critically affects epigenetic events.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We find that noncoding variants of unexpected high similarity in epigenetic profiles, with regards to their relatively low similarity in local sequences, can be largely attributed to their proximity in chromatin structure. Accordingly, we have developed a multimodal deep learning scheme that incorporates both data of 1D genome sequence and 3D chromatin structure for predicting noncoding variant effects. Specifically, we have integrated convolutional and recurrent neural networks for sequence embedding and graph neural networks for structure embedding despite the resolution gap between the two types of data, while utilizing recent DNA language models. Numerical results show that our models outperform competing sequence-only models in predicting epigenetic profiles and their use of long-range interactions complement sequence-only models in extracting regulatory motifs. They prove to be excellent predictors for noncoding variant effects in gene expression and pathogenicity, whether in unsupervised \u201czero-shot\u201d learning or supervised \u201cfew-shot\u201d learning.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Codes and data can be accessed at https:\/\/github.com\/Shen-Lab\/ncVarPred-1D3D and https:\/\/zenodo.org\/record\/7975777.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad541","type":"journal-article","created":{"date-parts":[[2023,9,5]],"date-time":"2023-09-05T16:22:15Z","timestamp":1693930935000},"source":"Crossref","is-referenced-by-count":4,"title":["Multimodal learning of noncoding variant effects using genome sequence and chromatin structure"],"prefix":"10.1093","volume":"39","author":[{"given":"Wuwei","family":"Tan","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Texas A&M University , College Station, TX 77843, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1703-7796","authenticated-orcid":false,"given":"Yang","family":"Shen","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Texas A&M University , College Station, TX 77843, United States"},{"name":"Department of Computer Science and Engineering, Texas A&M University , College Station, TX 77843, United States"},{"name":"Institute of Biosciences and Technology and Department of Translational Medical Sciences, College of Medicine, Texas A&M University , Houston, TX 77030, United States"}]}],"member":"286","published-online":{"date-parts":[[2023,9,5]]},"reference":[{"key":"2023091504322991600_btad541-B1","doi-asserted-by":"crossref","first-page":"W39","DOI":"10.1093\/nar\/gkv416","article-title":"The MEME suite","volume":"43","author":"Bailey","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023091504322991600_btad541-B2","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baaa105","article-title":"ncVarDB: a manually curated database for pathogenic non-coding variants and benign controls","volume":"2020","author":"Biggs","year":"2020","journal-title":"Database"},{"key":"2023091504322991600_btad541-B3","doi-asserted-by":"crossref","first-page":"24","DOI":"10.3389\/fgene.2016.00024","article-title":"Analysis of genomic sequence motifs for deciphering transcription factor binding and transcriptional regulation in eukaryotic cells","volume":"7","author":"Boeva","year":"2016","journal-title":"Front Genet"},{"key":"2023091504322991600_btad541-B4","doi-asserted-by":"crossref","first-page":"2472","DOI":"10.1038\/s41467-020-16106-x","article-title":"Determinants of transcription factor regulatory range","volume":"11","author":"Chen","year":"2020","journal-title":"Nat Commun"},{"key":"2023091504322991600_btad541-B5","doi-asserted-by":"crossref","first-page":"940","DOI":"10.1038\/s41588-022-01102-2","article-title":"A sequence-based global map of regulatory activity for deciphering human genetics","volume":"54","author":"Chen","year":"2022","journal-title":"Nat Genet"},{"key":"2023091504322991600_btad541-B6","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1186\/s12918-018-0643-1","article-title":"Multiple transcription factors contribute to inter-chromosomal interaction in yeast","volume":"12","author":"Dai","year":"2018","journal-title":"BMC Syst Biol"},{"key":"2023091504322991600_btad541-B7","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1038\/s41436-020-00974-1","article-title":"Interpreting the impact of noncoding structural variation in neurodevelopmental disorders","volume":"23","author":"D'haene","year":"2021","journal-title":"Genet Med"},{"key":"2023091504322991600_btad541-B8","doi-asserted-by":"crossref","first-page":"168180","DOI":"10.1016\/j.jmb.2023.168180","article-title":"PyMEGABASE: predicting cell-type-specific structural annotations of chromosomes using the epigenome","volume":"435","author":"Dodero-Rojas","year":"2023","journal-title":"J Mol Biol"},{"key":"2023091504322991600_btad541-B10","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1016\/j.tig.2021.08.010","article-title":"Uncovering the impact of noncoding variants in neurodegenerative brain diseases","volume":"38","author":"Frydas","year":"2022","journal-title":"Trends Genet"},{"key":"2023091504322991600_btad541-B11","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1186\/s13059-014-0480-5","article-title":"Funseq2: a framework for prioritizing noncoding regulatory variants in cancer","volume":"15","author":"Fu","year":"2014","journal-title":"Genome Biol"},{"key":"2023091504322991600_btad541-B12","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1038\/nature08497","article-title":"An oestrogen-receptor-\u03b1-bound human chromatin interactome","volume":"462","author":"Fullwood","year":"2009","journal-title":"Nature"},{"key":"2023091504322991600_btad541-B13","doi-asserted-by":"crossref","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome","volume":"37","author":"Ji","year":"2021","journal-title":"Bioinformatics"},{"key":"2023091504322991600_btad541-B14","first-page":"930","article-title":"Chromatin interaction\u2013aware gene regulatory modeling with graph attention networks","volume":"32","author":"Karbalayghareh","year":"2022","journal-title":"Genome Res"},{"key":"2023091504322991600_btad541-B15","doi-asserted-by":"crossref","first-page":"990","DOI":"10.1101\/gr.200535.115","article-title":"Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks","volume":"26","author":"Kelley","year":"2016","journal-title":"Genome Res"},{"key":"2023091504322991600_btad541-B16","author":"Kipf","year":"2016"},{"key":"2023091504322991600_btad541-B17","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/s41576-018-0089-8","article-title":"Chromatin accessibility and the regulatory epigenome","volume":"20","author":"Klemm","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2023091504322991600_btad541-B18","doi-asserted-by":"crossref","first-page":"e69853","DOI":"10.1371\/journal.pone.0069853","article-title":"Chromatin accessibility data sets show bias due to sequence specificity of the DNAse I enzyme","volume":"8","author":"Koohy","year":"2013","journal-title":"PLoS One"},{"key":"2023091504322991600_btad541-B19","doi-asserted-by":"crossref","first-page":"D252","DOI":"10.1093\/nar\/gkx1106","article-title":"HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale chip-seq analysis","volume":"46","author":"Kulakovskiy","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023091504322991600_btad541-B20","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1126\/science.1181369","article-title":"Comprehensive mapping of long-range interactions reveals folding principles of the human genome","volume":"326","author":"Lieberman-Aiden","year":"2009","journal-title":"Science"},{"key":"2023091504322991600_btad541-B21","doi-asserted-by":"crossref","first-page":"D882","DOI":"10.1093\/nar\/gkz1062","article-title":"New developments on the encyclopedia of DNA elements (ENCODE) data portal","volume":"48","author":"Luo","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023091504322991600_btad541-B22","doi-asserted-by":"crossref","first-page":"3668","DOI":"10.1073\/pnas.1813565116","article-title":"Epigenomic analysis reveals DNA motifs regulating histone modifications in human and mouse","volume":"116","author":"Ngo","year":"2019","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2023091504322991600_btad541-B23","doi-asserted-by":"crossref","first-page":"13413","DOI":"10.1038\/s41598-020-70218-4","article-title":"Enhancing the interpretability of transcription factor binding site prediction using attention mechanism","volume":"10","author":"Park","year":"2020","journal-title":"Sci Rep"},{"key":"2023091504322991600_btad541-B24","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1101\/gr.097857.109","article-title":"Detection of nonneutral substitution rates on mammalian phylogenies","volume":"20","author":"Pollard","year":"2010","journal-title":"Genome Res"},{"key":"2023091504322991600_btad541-B25","doi-asserted-by":"crossref","first-page":"e1007024","DOI":"10.1371\/journal.pcbi.1007024","article-title":"Predicting three-dimensional genome organization with chromatin states","volume":"15","author":"Qi","year":"2019","journal-title":"PLoS Comput Biol"},{"key":"2023091504322991600_btad541-B26","doi-asserted-by":"crossref","first-page":"e107","DOI":"10.1093\/nar\/gkw226","article-title":"DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences","volume":"44","author":"Quang","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023091504322991600_btad541-B27","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1093\/bioinformatics\/btu703","article-title":"DanN: a deep learning approach for annotating the pathogenicity of genetic variants","volume":"31","author":"Quang","year":"2015","journal-title":"Bioinformatics"},{"key":"2023091504322991600_btad541-B28","doi-asserted-by":"crossref","first-page":"D886","DOI":"10.1093\/nar\/gky1016","article-title":"CADD: predicting the deleteriousness of variants throughout the human genome","volume":"47","author":"Rentzsch","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023091504322991600_btad541-B29","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1038\/nmeth.2832","article-title":"Functional annotation of noncoding sequence variants","volume":"11","author":"Ritchie","year":"2014","journal-title":"Nat Methods"},{"key":"2023091504322991600_btad541-B30","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1038\/nature14248","article-title":"Integrative analysis of 111 reference human epigenomes","volume":"518","author":"Roadmap Epigenomics Consortium","year":"2015","journal-title":"Nature"},{"key":"2023091504322991600_btad541-B31","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1093\/bioinformatics\/btx536","article-title":"FATHMM-XF: accurate prediction of pathogenic point mutations via extended features","volume":"34","author":"Rogers","year":"2018","journal-title":"Bioinformatics"},{"key":"2023091504322991600_btad541-B32","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1038\/s41587-022-01612-8","article-title":"Cell-type-specific prediction of 3d chromatin organization enables high-throughput in silico genetic screening","volume":"41","author":"Tan","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2023091504322991600_btad541-B9","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"The ENCODE Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023091504322991600_btad541-B33","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1186\/s13059-020-01987-4","article-title":"DeepMILO: a deep learning approach to predict the impact of non-coding sequence variants on 3D chromatin structure","volume":"21","author":"Trieu","year":"2020","journal-title":"Genome Biol"},{"key":"2023091504322991600_btad541-B34","doi-asserted-by":"crossref","first-page":"4755","DOI":"10.1038\/s41467-019-12721-5","article-title":"Identification of atrial fibrillation associated genes and functional non-coding variants","volume":"10","author":"van Ouwerkerk","year":"2019","journal-title":"Nat Commun"},{"key":"2023091504322991600_btad541-B35","first-page":"17283","article-title":"Big bird: transformers for longer sequences","volume":"33","author":"Zaheer","year":"2020","journal-title":"Adv Neural Inf Process Sys"},{"key":"2023091504322991600_btad541-B36","doi-asserted-by":"crossref","first-page":"R102","DOI":"10.1093\/hmg\/ddv259","article-title":"Non-coding genetic variants in human disease","volume":"24","author":"Zhang","year":"2015","journal-title":"Hum Mol Genet"},{"key":"2023091504322991600_btad541-B37","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1038\/nature12716","article-title":"Chromatin connectivity maps reveal dynamic promoter\u2013enhancer long-range associations","volume":"504","author":"Zhang","year":"2013","journal-title":"Nature"},{"key":"2023091504322991600_btad541-B38","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1038\/s41588-022-01065-4","article-title":"Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale","volume":"54","author":"Zhou","year":"2022","journal-title":"Nat Genet"},{"key":"2023091504322991600_btad541-B39","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning\u2013based sequence model","volume":"12","author":"Zhou","year":"2015","journal-title":"Nat Methods"},{"key":"2023091504322991600_btad541-B40","doi-asserted-by":"crossref","first-page":"1171","DOI":"10.1038\/s41588-018-0160-6","article-title":"Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk","volume":"50","author":"Zhou","year":"2018","journal-title":"Nat Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad541\/51359172\/btad541.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/9\/btad541\/51556142\/btad541.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/9\/btad541\/51556142\/btad541.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,15]],"date-time":"2023-09-15T04:45:34Z","timestamp":1694753134000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad541\/7260506"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,9,1]]},"references-count":40,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2023,9,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad541","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,9,1]]},"published":{"date-parts":[[2023,9,1]]},"article-number":"btad541"}}