{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:58:52Z","timestamp":1760162332529,"version":"3.37.3"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"Supplement_2","license":[{"start":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T00:00:00Z","timestamp":1725408000000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["101016775"],"award-info":[{"award-number":["101016775"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["LI 3333\/5\u20131","GR 3793\/6\u20131","RE3474\/8\u20131","OH 266\/6\u20131"],"award-info":[{"award-number":["LI 3333\/5\u20131","GR 3793\/6\u20131","RE3474\/8\u20131","OH 266\/6\u20131"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"name":"HPI Research School on Data Science and Engineering"},{"name":"Helmholtz Einstein International Berlin Research School in Data Science"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Summary: With the development of high-throughput technologies, genomics datasets rapidly grow in size, including functional genomics data. This has allowed the training of large Deep Learning (DL) models to predict epigenetic readouts, such as protein binding or histone modifications, from genome sequences. However, large dataset sizes come at a price of data consistency, often aggregating results from a large number of studies, conducted under varying experimental conditions. While data from large-scale consortia are useful as they allow studying the effects of different biological conditions, they can also contain unwanted biases from confounding experimental factors. Here, we introduce Metadata-guided Feature Disentanglement (MFD)\u2014an approach that allows disentangling biologically relevant features from potential technical biases. MFD incorporates target metadata into model training, by conditioning weights of the model output layer on different experimental factors. It then separates the factors into disjoint groups and enforces independence of the corresponding feature subspaces with an adversarially learned penalty. We show that the metadata-driven disentanglement approach allows for better model introspection, by connecting latent features to experimental factors, without compromising, or even improving performance in downstream tasks, such as enhancer prediction, or genetic variant discovery. The code will be made available at https:\/\/github.com\/HealthML\/MFD.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btae403","type":"journal-article","created":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T07:43:28Z","timestamp":1725522208000},"page":"ii4-ii10","source":"Crossref","is-referenced-by-count":1,"title":["Metadata-guided feature disentanglement for functional genomics"],"prefix":"10.1093","volume":"40","author":[{"given":"Alexander","family":"Rakowski","sequence":"first","affiliation":[{"name":"Digital Health Machine Learning, Hasso Plattner Institute for Digital Engineering, Digital Engineering, University of Potsdam , Campus III Building G2, Rudolf-Breitscheid-Strasse 187 , Potsdam, Brandenburg, 14482, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Remo","family":"Monti","sequence":"additional","affiliation":[{"name":"Digital Health Machine Learning, Hasso Plattner Institute for Digital Engineering, Digital Engineering, University of Potsdam , Campus III Building G2, Rudolf-Breitscheid-Strasse 187 , Potsdam, Brandenburg, 14482, Germany"},{"name":"Max-Delbr\u00fcck-Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Department of Biology, Humboldt Universit\u00e4t Berlin , Hannoversche Strasse 28, Building 101, Room 1.05 , Berlin, 10115, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Viktoriia","family":"Huryn","sequence":"additional","affiliation":[{"name":"Max-Delbr\u00fcck-Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Department of Biology, Humboldt Universit\u00e4t Berlin , Hannoversche Strasse 28, Building 101, Room 1.05 , Berlin, 10115, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marta","family":"Lemanczyk","sequence":"additional","affiliation":[{"name":"Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, Digital Engineering, University of Potsdam , Potsdam, Brandenburg, 14482, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Uwe","family":"Ohler","sequence":"additional","affiliation":[{"name":"Max-Delbr\u00fcck-Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Department of Biology, Humboldt Universit\u00e4t Berlin , Hannoversche Strasse 28, Building 101, Room 1.05 , Berlin, 10115, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christoph","family":"Lippert","sequence":"additional","affiliation":[{"name":"Digital Health Machine Learning, Hasso Plattner Institute for Digital Engineering, Digital Engineering, University of Potsdam , Campus III Building G2, Rudolf-Breitscheid-Strasse 187 , Potsdam, Brandenburg, 14482, Germany"},{"name":"Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai , New York, NY, 10029, United States of America"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,9,4]]},"reference":[{"first-page":"2513","year":"2021","author":"Adeli","key":"2024090413593114300_btae403-B1"},{"key":"2024090413593114300_btae403-B2","doi-asserted-by":"crossref","first-page":"9354","DOI":"10.1038\/s41598-019-45839-z","article-title":"The encode blacklist: identification of problematic regions of the genome","volume":"9","author":"Amemiya","year":"2019","journal-title":"Sci Rep"},{"key":"2024090413593114300_btae403-B3","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/s41592-021-01252-x","article-title":"Effective gene expression prediction from sequence by integrating long-range interactions","volume":"18","author":"Avsec","year":"2021","journal-title":"Nat Methods"},{"key":"2024090413593114300_btae403-B4","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/s41588-021-00782-6","article-title":"Base-resolution models of transcription-factor binding reveal soft motif syntax","volume":"53","author":"Avsec","year":"2021","journal-title":"Nat Genet"},{"author":"Belghazi","key":"2024090413593114300_btae403-B5"},{"year":"2023","author":"Benegas","key":"2024090413593114300_btae403-B6"},{"key":"2024090413593114300_btae403-B7","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: a review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2024090413593114300_btae403-B8","doi-asserted-by":"crossref","first-page":"4267","DOI":"10.1038\/s41467-020-18035-1","article-title":"ATAC-seq footprinting unravels kinetics of transcription factor binding during zygotic genome activation","volume":"11","author":"Bentsen","year":"2020","journal-title":"Nat Commun"},{"key":"2024090413593114300_btae403-B9","doi-asserted-by":"crossref","DOI":"10.1038\/s41586-023-06045-0","article-title":"A genomic mutational constraint map using variation in 76,156 human genomes","volume":"625","author":"Chen","year":"2024","journal-title":"Nature"},{"year":"2022","author":"Chormai","key":"2024090413593114300_btae403-B10"},{"year":"2017","author":"Dalby","key":"2024090413593114300_btae403-B11"},{"year":"2019","author":"W. Falcon and The PyTorch Lightning Team","key":"2024090413593114300_btae403-B12"},{"key":"2024090413593114300_btae403-B13","first-page":"2096","article-title":"Domain-adversarial training of neural networks","volume":"17","author":"Ganin","year":"2016","journal-title":"J Mach Learn Res"},{"key":"2024090413593114300_btae403-B14","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1101\/gr.247494.118","article-title":"Deep neural networks for interpreting RNA-binding protein target preferences","volume":"30","author":"Ghanbari","year":"2020","journal-title":"Genome Res"},{"year":"2016","author":"Ha","key":"2024090413593114300_btae403-B15"},{"year":"2021","author":"He","key":"2024090413593114300_btae403-B16"},{"key":"2024090413593114300_btae403-B17","doi-asserted-by":"crossref","first-page":"D590","DOI":"10.1093\/nar\/gkj144","article-title":"The UCSC genome browser database: update 2006","volume":"34","author":"Hinrichs","year":"2006","journal-title":"Nucleic Acids Res"},{"year":"2019","author":"Hooker","key":"2024090413593114300_btae403-B18"},{"key":"2024090413593114300_btae403-B19","doi-asserted-by":"crossref","first-page":"990","DOI":"10.1101\/gr.200535.115","article-title":"Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks","volume":"26","author":"Kelley","year":"2016","journal-title":"Genome Res"},{"key":"2024090413593114300_btae403-B20","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1101\/gr.227819.117","article-title":"Sequential regulatory activity prediction across chromosomes with convolutional neural networks","volume":"28","author":"Kelley","year":"2018","journal-title":"Genome Res"},{"key":"2024090413593114300_btae403-B21","doi-asserted-by":"crossref","first-page":"e1010932","DOI":"10.1371\/journal.pgen.1010932","article-title":"eQTL catalogue 2023: new datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs","volume":"19","author":"Kerimov","year":"2023","journal-title":"PLoS Genet"},{"first-page":"2207","year":"2020","author":"Khemakhem","key":"2024090413593114300_btae403-B22"},{"year":"2014","author":"Kingma","key":"2024090413593114300_btae403-B23"},{"year":"2020","author":"Kokhlikyan","key":"2024090413593114300_btae403-B24"},{"key":"2024090413593114300_btae403-B25","doi-asserted-by":"crossref","first-page":"e161","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet"},{"first-page":"6348","year":"2020","author":"Locatello","key":"2024090413593114300_btae403-B26"},{"key":"2024090413593114300_btae403-B27","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1038\/ng.2653","article-title":"The genotype-tissue expression (GTEx) project","volume":"45","author":"Lonsdale","year":"2013","journal-title":"Nat Genet"},{"key":"2024090413593114300_btae403-B28","first-page":"337","article-title":"Biologically informed deep learning to query gene programs in single-cell atlases","volume":"25","author":"Lotfollahi","year":"2023","journal-title":"Nat Cell Biol"},{"key":"2024090413593114300_btae403-B29","doi-asserted-by":"crossref","first-page":"D882","DOI":"10.1093\/nar\/gkz1062","article-title":"New developments on the encyclopedia of DNA elements (ENCODE) data portal","volume":"48","author":"Luo","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024090413593114300_btae403-B30","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1186\/s13059-023-02956-3","article-title":"Correcting gradient-based interpretations of deep neural networks for genomics","volume":"24","author":"Majdandzic","year":"2023","journal-title":"Genome Biol"},{"key":"2024090413593114300_btae403-B31","doi-asserted-by":"crossref","first-page":"699","DOI":"10.1038\/s41586-020-2493-4","article-title":"Expanded encyclopaedias of DNA elements in the human and mouse genomes","volume":"583","author":"Moore","year":"2020","journal-title":"Nature"},{"key":"2024090413593114300_btae403-B32","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1038\/s41576-022-00532-2","article-title":"Obtaining genetics insights from deep learning via explainable artificial intelligence","volume":"24","author":"Novakovsky","year":"2023","journal-title":"Nat Rev Genet"},{"journal-title":"Advances in Neural Information Processing Systems, 32.","article-title":"PyTorch: an imperative style, high-performance deep learning library","author":"Paszke","key":"2024090413593114300_btae403-B33"},{"key":"2024090413593114300_btae403-B34","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"year":"2019","author":"Reddi","key":"2024090413593114300_btae403-B35"},{"key":"2024090413593114300_btae403-B36","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1186\/s13059-020-01977-6","article-title":"Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome","volume":"21","author":"Schreiber","year":"2020","journal-title":"Genome Biol"},{"first-page":"3319","year":"2017","author":"Sundararajan","key":"2024090413593114300_btae403-B37"},{"key":"2024090413593114300_btae403-B38","doi-asserted-by":"crossref","first-page":"D88","DOI":"10.1093\/nar\/gkl822","article-title":"Vista enhancer browser\u2013a database of tissue-specific human enhancers","volume":"35","author":"Visel","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2024090413593114300_btae403-B39","doi-asserted-by":"crossref","first-page":"D88","DOI":"10.1093\/nar\/gkl822","article-title":"Vista enhancer browser\u2013a database of tissue-specific human enhancers","volume":"35","author":"Visel","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2024090413593114300_btae403-B40","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1186\/s12859-017-1766-x","article-title":"Correcting nucleotide-specific biases in high-throughput sequencing data","volume":"18","author":"Wang","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2024090413593114300_btae403-B41","doi-asserted-by":"crossref","first-page":"989","DOI":"10.1016\/j.gpb.2022.12.007","article-title":"Deepnoise: signal and noise disentanglement based on classifying fluorescent microscopy images via deep learning","volume":"20","author":"Yang","year":"2022","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2024090413593114300_btae403-B42","doi-asserted-by":"crossref","first-page":"6010","DOI":"10.1038\/s41467-020-19784-9","article-title":"Training confounder-free deep learning models for medical applications","volume":"11","author":"Zhao","year":"2020","journal-title":"Nat Commun"},{"key":"2024090413593114300_btae403-B43","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning\u2013based sequence model","volume":"12","author":"Zhou","year":"2015","journal-title":"Nat Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_2\/ii4\/59016988\/btae403.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_2\/ii4\/59016988\/btae403.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T07:43:49Z","timestamp":1725522229000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/Supplement_2\/ii4\/7749077"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,1]]},"references-count":43,"journal-issue":{"issue":"Supplement_2","published-print":{"date-parts":[[2024,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae403","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2024,9]]},"published":{"date-parts":[[2024,9,1]]}}}