{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,29]],"date-time":"2026-06-29T13:25:48Z","timestamp":1782739548912,"version":"3.54.5"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T00:00:00Z","timestamp":1635379200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004385","name":"Ghent University","doi-asserted-by":"publisher","award":["BOFGOA2020000703"],"award-info":[{"award-number":["BOFGOA2020000703"]}],"id":[{"id":"10.13039\/501100004385","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Flemish Government under the \u2018Onderzoeksprogramma Artifici\u00eble Intelligentie (AI) Vlaanderen\u2019 Programme"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, outlining the need for effective imputation techniques. The task of imputing single-cell (methylation) data requires models to build an understanding of underlying biological processes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We adapt the transformer neural network architecture to operate on methylation matrices through combining axial attention with sliding window self-attention. The obtained CpG Transformer displays state-of-the-art performances on a wide range of scBS-seq and scRRBS-seq datasets. Furthermore, we demonstrate the interpretability of CpG Transformer and illustrate its rapid transfer learning properties, allowing practitioners to train models on new datasets with a limited computational and time budget.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>CpG Transformer is freely available at https:\/\/github.com\/gdewael\/cpg-transformer.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab746","type":"journal-article","created":{"date-parts":[[2021,10,25]],"date-time":"2021-10-25T15:13:43Z","timestamp":1635174823000},"page":"597-603","source":"Crossref","is-referenced-by-count":36,"title":["CpG Transformer for imputation of single-cell methylomes"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0367-9699","authenticated-orcid":false,"given":"Gaetan","family":"De Waele","sequence":"first","affiliation":[{"name":"Department of Data Analysis and Mathematical Modelling, Ghent University , Ghent 9000, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5876-1406","authenticated-orcid":false,"given":"Jim","family":"Clauwaert","sequence":"additional","affiliation":[{"name":"Department of Data Analysis and Mathematical Modelling, Ghent University , Ghent 9000, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gerben","family":"Menschaert","sequence":"additional","affiliation":[{"name":"Department of Data Analysis and Mathematical Modelling, Ghent University , Ghent 9000, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5950-3003","authenticated-orcid":false,"given":"Willem","family":"Waegeman","sequence":"additional","affiliation":[{"name":"Department of Data Analysis and Mathematical Modelling, Ghent University , Ghent 9000, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2021,10,28]]},"reference":[{"key":"2023020108505852500_btab746-B1","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1038\/nmeth.3728","article-title":"Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity","volume":"13","author":"Angermueller","year":"2016","journal-title":"Nat. Methods"},{"key":"2023020108505852500_btab746-B2","first-page":"1","article-title":"DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning","volume":"18","author":"Angermueller","year":"2017","journal-title":"Genome Biol"},{"key":"2023020108505852500_btab746-B3","article-title":"Layer normalization","author":"Ba","year":"2016"},{"key":"2023020108505852500_btab746-B4","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1038\/nrg1272","article-title":"Network biology: understanding the cell\u2019s functional organization","volume":"5","author":"Barabasi","year":"2004","journal-title":"Nat. Rev. Genet"},{"key":"2023020108505852500_btab746-B5","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2020.blackboxnlp-1.14","article-title":"The elephant in the interpretability room: why use attention as explanation when we have saliency methods?","author":"Bastings","year":"2020"},{"key":"2023020108505852500_btab746-B6","article-title":"Longformer: the long-document transformer","author":"Beltagy","year":"2020"},{"key":"2023020108505852500_btab746-B7","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1101\/gad.947102","article-title":"DNA methylation patterns and epigenetic memory","volume":"16","author":"Bird","year":"2002","journal-title":"Genes Devel"},{"key":"2023020108505852500_btab746-B8","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/0092-8674(88)90479-5","article-title":"DNA methylation and gene activity","volume":"53","author":"Cedar","year":"1988","journal-title":"Cell"},{"key":"2023020108505852500_btab746-B9","first-page":"1","article-title":"Novel transformer networks for improved sequence labeling in genomics","author":"Clauwaert","year":"2020"},{"key":"2023020108505852500_btab746-B10","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1080\/01621459.1979.10481038","article-title":"Robust locally weighted regression and smoothing scatterplots","volume":"74","author":"Cleveland","year":"1979","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020108505852500_btab746-B11","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1038\/nature06745","article-title":"Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning","volume":"452","author":"Cokus","year":"2008","journal-title":"Nature"},{"key":"2023020108505852500_btab746-B12","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/P19-1285","article-title":"Transformer-xl: attentive language models beyond a fixed-length context","author":"Dai","year":"2019"},{"key":"2023020108505852500_btab746-B13","article-title":"Bert: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2018"},{"key":"2023020108505852500_btab746-B14","doi-asserted-by":"crossref","first-page":"3786","DOI":"10.1093\/bioinformatics\/btz134","article-title":"Missing value estimation methods for DNA methylation data","volume":"35","author":"Di Lena","year":"2019","journal-title":"Bioinformatics"},{"key":"2023020108505852500_btab746-B15","article-title":"ProtTrans: towards cracking the language of life\u2019s code through self-supervised deep learning and high performance computing","author":"Elnaggar","year":"2020"},{"key":"2023020108505852500_btab746-B16","doi-asserted-by":"crossref","first-page":"808","DOI":"10.1016\/j.stem.2016.10.019","article-title":"DNA methylation dynamics of human hematopoietic stem cell differentiation","volume":"19","author":"Farlik","year":"2016","journal-title":"Cell Stem Cell"},{"key":"2023020108505852500_btab746-B17","doi-asserted-by":"crossref","first-page":"2126","DOI":"10.1101\/gr.161679.113","article-title":"Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing","volume":"23","author":"Guo","year":"2013","journal-title":"Genome Res"},{"key":"2023020108505852500_btab746-B18","first-page":"770","author":"He","year":"2016"},{"key":"2023020108505852500_btab746-B19","first-page":"173","author":"He","year":"2017"},{"key":"2023020108505852500_btab746-B20","article-title":"Axial attention in multidimensional transformers","author":"Ho","year":"2019"},{"key":"2023020108505852500_btab746-B21","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1038\/cr.2016.23","article-title":"Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas","volume":"26","author":"Hou","year":"2016","journal-title":"Cell Res"},{"key":"2023020108505852500_btab746-B22","first-page":"1","article-title":"LightCpG: a multi-view CpG sites detection on single-cell whole genome sequence data","volume":"20","author":"Jiang","year":"2019","journal-title":"BMC Genomics"},{"key":"2023020108505852500_btab746-B23","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with alphafold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023020108505852500_btab746-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1665-8","article-title":"Melissa: Bayesian clustering and imputation of single-cell methylomes","volume":"20","author":"Kapourani","year":"2019","journal-title":"Genome Biol"},{"key":"2023020108505852500_btab746-B25","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2014"},{"key":"2023020108505852500_btab746-B26","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1158\/2643-3230.BCD-19-0058","article-title":"Preneoplastic alterations define CLL DNA methylome and persist through disease progression and therapy","volume":"2","author":"Kretzmer","year":"2021","journal-title":"Blood Cancer Disc"},{"key":"2023020108505852500_btab746-B27","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1038\/nmeth.1828","article-title":"DNA methylome analysis using short bisulfite sequencing data","volume":"9","author":"Krueger","year":"2012","journal-title":"Nat. Methods"},{"key":"2023020108505852500_btab746-B28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-020-3443-8","article-title":"MethylNet: an automated and modular deep learning approach for DNA methylation analysis","volume":"21","author":"Levy","year":"2020","journal-title":"BMC Bioinform"},{"key":"2023020108505852500_btab746-B29","first-page":"406066","article-title":"A deep learning framework for imputing missing values in genomic data","author":"Qiu","year":"2018"},{"key":"2023020108505852500_btab746-B30","article-title":"Improving language understanding by generative pre-training","author":"Radford","year":"2018"},{"key":"2023020108505852500_btab746-B31","doi-asserted-by":"crossref","DOI":"10.1101\/2021.02.12.430858","article-title":"MSA transformer","author":"Rao","year":"2021"},{"key":"2023020108505852500_btab746-B32","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020108505852500_btab746-B33","first-page":"53","article-title":"Efficient content-based sparse attention with routing transformers","volume":"9","author":"Roy","year":"2021","journal-title":"Trans. Assoc. Comput. Ling"},{"key":"2023020108505852500_btab746-B34","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1038\/nmeth.3035","article-title":"Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity","volume":"11","author":"Smallwood","year":"2014","journal-title":"Nat. Methods"},{"key":"2023020108505852500_btab746-B35","doi-asserted-by":"crossref","first-page":"1750243","DOI":"10.1142\/S0217979217502435","article-title":"Collaborations between CpG sites in DNA methylation","volume":"31","author":"Song","year":"2017","journal-title":"Int. J. Mod. Phys. B"},{"key":"2023020108505852500_btab746-B36","first-page":"3319","author":"Sundararajan","year":"2017"},{"key":"2023020108505852500_btab746-B37","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1038\/nrg2341","article-title":"DNA methylation landscapes: provocative insights from epigenomics","volume":"9","author":"Suzuki","year":"2008","journal-title":"Nat. Rev. Genet"},{"key":"2023020108505852500_btab746-B38","doi-asserted-by":"crossref","first-page":"1814","DOI":"10.1093\/bioinformatics\/btab029","article-title":"Camelia: imputation in single-cell methylomes based on local similarities between cells","volume":"37","author":"Tang","year":"2021","journal-title":"Bioinformatics"},{"key":"2023020108505852500_btab746-B39","first-page":"5998","article-title":"Attention is all you need","author":"Vaswani","year":"2017"},{"key":"2023020108505852500_btab746-B40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-020-03865-z","article-title":"A novel computational strategy for DNA methylation imputation using mixture regression model (MRM)","volume":"21","author":"Yu","year":"2020","journal-title":"BMC Bioinform"},{"key":"2023020108505852500_btab746-B41","article-title":"Big bird: transformers for longer sequences","author":"Zaheer","year":"2020"},{"key":"2023020108505852500_btab746-B42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-015-0581-9","article-title":"Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements","volume":"16","author":"Zhang","year":"2015","journal-title":"Genome Biol"},{"key":"2023020108505852500_btab746-B43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-018-4766-y","article-title":"BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues","volume":"19","author":"Zou","year":"2018","journal-title":"BMC Genomics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab746\/40978227\/btab746.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/3\/597\/49008701\/btab746.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/3\/597\/49008701\/btab746.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T15:09:26Z","timestamp":1675264166000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/3\/597\/6413629"}},"subtitle":[],"editor":[{"given":"Peter","family":"Robinson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2021,10,28]]},"references-count":43,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,1,12]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab746","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.06.08.447547","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,2,1]]},"published":{"date-parts":[[2021,10,28]]}}}