{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T23:28:06Z","timestamp":1777591686234,"version":"3.51.4"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T00:00:00Z","timestamp":1763683200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100013278","name":"JPND","doi-asserted-by":"publisher","award":["01ED2407A"],"award-info":[{"award-number":["01ED2407A"]}],"id":[{"id":"10.13039\/100013278","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,1,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Protein language models (PLMs) have revolutionized computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. Bridging this gap requires approaches that maintain predictive performance while providing interpretable explanations of model behaviour.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present PLM-eXplain (PLM-X), an explainable adapter layer that bridges this gap by factoring PLM embeddings into two complementary components: an interpretable subspace based on established biochemical features, and a residual subspace that retains predictive, non-interpretable information. Using embeddings from ESM2 and ProtBert, PLM-X incorporates well-established properties, including secondary structure and hydropathy, while maintaining high predictive performance. We demonstrate the effectiveness of our approach across three biologically relevant classification tasks: extracellular vesicle association, transmembrane helix prediction, and aggregation propensity prediction. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalizable solution for enhancing PLM interpretability across various downstream applications.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code and models are available at https:\/\/github.com\/AIT4LIFE-UU\/PLM-eXplain\/.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf631","type":"journal-article","created":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T13:19:58Z","timestamp":1763731198000},"source":"Crossref","is-referenced-by-count":3,"title":["PLM-eXplain: divide and conquer the protein embedding space"],"prefix":"10.1093","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-3877-2965","authenticated-orcid":false,"given":"Jan","family":"van Eck","sequence":"first","affiliation":[{"name":"AI Technology for Life, Department of Computing and Information Sciences, Department of Biology, Utrecht University , Utrecht, 3584CC,","place":["The Netherlands"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8809-0861","authenticated-orcid":false,"given":"Dea","family":"Gogishvili","sequence":"additional","affiliation":[{"name":"AI Technology for Life, Department of Computing and Information Sciences, Department of Biology, Utrecht University , Utrecht, 3584CC,","place":["The Netherlands"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4080-9328","authenticated-orcid":false,"given":"Wilson","family":"Silva","sequence":"additional","affiliation":[{"name":"AI Technology for Life, Department of Computing and Information Sciences, Department of Biology, Utrecht University , Utrecht, 3584CC,","place":["The Netherlands"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2779-7174","authenticated-orcid":false,"given":"Sanne","family":"Abeln","sequence":"additional","affiliation":[{"name":"AI Technology for Life, Department of Computing and Information Sciences, Department of Biology, Utrecht University , Utrecht, 3584CC,","place":["The Netherlands"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,11,21]]},"reference":[{"key":"2026011114021217300_btaf631-B1","doi-asserted-by":"crossref","first-page":"654","DOI":"10.1016\/j.cels.2021.05.017","article-title":"Learning the protein language: evolution, structure, and function","volume":"12","author":"Bepler","year":"2021","journal-title":"Cell Syst"},{"key":"2026011114021217300_btaf631-B2","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1590\/1414-431X20132964","article-title":"Extracellular vesicles: structure, function, and potential clinical uses in renal diseases","volume":"46","author":"Borges","year":"2013","journal-title":"Braz J Med Biol Res"},{"key":"2026011114021217300_btaf631-B3","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1146\/annurev-biochem-061516-045115","article-title":"Protein misfolding, amyloid formation, and human disease: a summary of progress over the last decade","volume":"86","author":"Chiti","year":"2017","journal-title":"Annu Rev Biochem"},{"key":"2026011114021217300_btaf631-B4","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available Python tools for computational molecular biology and bioinformatics","volume":"25","author":"Cock","year":"2009","journal-title":"Bioinformatics"},{"key":"2026011114021217300_btaf631-B5","doi-asserted-by":"crossref","first-page":"1914","DOI":"10.1038\/s41467-022-29443-w","article-title":"Learning meaningful representations of protein sequences","volume":"13","author":"Detlefsen","year":"2022","journal-title":"Nat Commun"},{"key":"2026011114021217300_btaf631-B6","doi-asserted-by":"crossref","first-page":"e1009736","DOI":"10.1371\/journal.pcbi.1009736","article-title":"Positional SHAP (PoSHAP) for interpretation of machine learning models trained from biological sequences","volume":"18","author":"Dickinson","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2026011114021217300_btaf631-B7","doi-asserted-by":"crossref","first-page":"a033878","DOI":"10.1101\/cshperspect.a033878","article-title":"The amyloid phenomenon and its significance in biology and medicine","volume":"12","author":"Dobson","year":"2020","journal-title":"Cold Spring Harb Perspect Biol"},{"key":"2026011114021217300_btaf631-B8","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"Prottrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2026011114021217300_btaf631-B9","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1007\/s44163-024-00114-7","article-title":"Explainable and interpretable artificial intelligence in medicine: a systematic bibliometric review","volume":"4","author":"Frasca","year":"2024","journal-title":"Discover Artif Intell"},{"key":"2026011114021217300_btaf631-B10","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/j.neurobiolaging.2018.10.006","article-title":"Extracellular vesicles, new actors in the search for biomarkers of dementias","volume":"74","author":"G\u00e1mez-Valero","year":"2019","journal-title":"Neurobiol Aging"},{"key":"2026011114021217300_btaf631-B11","first-page":"1","article-title":"Domain-adversarial training of neural networks","volume":"17","author":"Ganin","year":"2016","journal-title":"J Mach Learn Res"},{"key":"2026011114021217300_btaf631-B12","doi-asserted-by":"crossref","first-page":"vbae154","DOI":"10.1093\/bioadv\/vbae154","article-title":"PatchProt: hydrophobic patch prediction using protein foundation models","volume":"4","author":"Gogishvili","year":"2024","journal-title":"Bioinform Adv"},{"key":"2026011114021217300_btaf631-B13","author":"Hallgren","year":"2022"},{"key":"2026011114021217300_btaf631-B14","doi-asserted-by":"crossref","first-page":"eads0018","DOI":"10.1126\/science.ads0018","article-title":"Simulating 500 million years of evolution with a language model","volume":"387","author":"Hayes","year":"2025","journal-title":"Science"},{"key":"2026011114021217300_btaf631-B15","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1038\/s41587-023-01763-2","article-title":"Efficient evolution of human antibodies from general protein language models","volume":"42.2","author":"Hie","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2026011114021217300_btaf631-B16","doi-asserted-by":"crossref","first-page":"W510","DOI":"10.1093\/nar\/gkac439","article-title":"NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning","volume":"50","author":"H\u00f8ie","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2026011114021217300_btaf631-B17","doi-asserted-by":"crossref","first-page":"e1010669","DOI":"10.1371\/journal.pcbi.1010669","article-title":"Ten quick tips for sequence-based prediction of protein properties using machine learning","volume":"18","author":"Hou","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2026011114021217300_btaf631-B18","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1038\/s42003-023-04462-5","article-title":"Learning the protein language of proteome-wide protein\u2013protein binding sites via explainable ensemble deep learning","volume":"6","author":"Hou","year":"2023","journal-title":"Commun Biol"},{"key":"2026011114021217300_btaf631-B19","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2026011114021217300_btaf631-B20","first-page":"2577","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolym Origin Res Biomol"},{"key":"2026011114021217300_btaf631-B21","doi-asserted-by":"crossref","first-page":"5473","DOI":"10.1039\/C9CS00199A","article-title":"Half a century of amyloids: past, present and future","volume":"49","author":"Ke","year":"2020","journal-title":"Chem Soc Rev"},{"key":"2026011114021217300_btaf631-B22","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/0022-2836(82)90515-0","article-title":"A simple method for displaying the hydropathic character of a protein","volume":"157.1","author":"Kyte","year":"1982","journal-title":"J Mol Biol"},{"key":"2026011114021217300_btaf631-B23","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2026011114021217300_btaf631-B24","doi-asserted-by":"crossref","first-page":"D389","DOI":"10.1093\/nar\/gkz758","article-title":"WALTZ-DB 2.0: an updated database containing structural information of experimentally determined amyloid-forming peptides","volume":"48","author":"Louros","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2026011114021217300_btaf631-B25","doi-asserted-by":"crossref","first-page":"6888","DOI":"10.1038\/s41598-019-43189-4","article-title":"PiPred\u2014a deep-learning method for prediction of \u03c0-helices in protein sequences","volume":"9","author":"Ludwiczak","year":"2019","journal-title":"Sci Rep"},{"key":"2026011114021217300_btaf631-B26"},{"key":"2026011114021217300_btaf631-B27","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1016\/0022-2836(87)90038-6","article-title":"Interior and surface of monomeric proteins","volume":"196","author":"Miller","year":"1987","journal-title":"J Mol Biol"},{"key":"2026011114021217300_btaf631-B28","doi-asserted-by":"crossref","first-page":"5727","DOI":"10.1021\/acs.jcim.3c00817","article-title":"AggBERT: best in class prediction of hexapeptide amyloidogenesis with a semi-supervised prot-BERT model","volume":"63","author":"Perez","year":"2023","journal-title":"J Chem Inf Model"},{"key":"2026011114021217300_btaf631-B29","doi-asserted-by":"crossref","first-page":"btae290","DOI":"10.1093\/bioinformatics\/btae290","article-title":"LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model","volume":"40","author":"Pratyush","year":"2024","journal-title":"Bioinformatics"},{"key":"2026011114021217300_btaf631-B30","doi-asserted-by":"publisher","first-page":"2107","DOI":"10.1038\/s41592-025-02836-7","article-title":"InterPLM: discovering interpretable features in protein language models via sparse autoencoders","volume":"22","author":"Simon","year":"2025","journal-title":"Nat Methods"},{"key":"2026011114021217300_btaf631-B31","doi-asserted-by":"publisher","first-page":"eadt5111","DOI":"10.1126\/sciadv.adt5111","article-title":"Massive experimental quantification allows interpretable deep learning of protein aggregation","volume":"11","author":"Thompson","year":"2025","journal-title":"Sci Adv"},{"key":"2026011114021217300_btaf631-B32","doi-asserted-by":"crossref","first-page":"W228","DOI":"10.1093\/nar\/gkac278","article-title":"DeepLoc 2.0: multi-label subcellular localization prediction using protein language models","volume":"50","author":"Thumuluri","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2026011114021217300_btaf631-B33","doi-asserted-by":"crossref","first-page":"e1007767","DOI":"10.1371\/journal.pcbi.1007767","article-title":"The hydrophobic effect characterises the thermodynamic signature of amyloid fibril growth","volume":"16","author":"van Gils","year":"2020","journal-title":"PLoS Comput Biol"},{"key":"2026011114021217300_btaf631-B34","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1038\/nrm.2017.125","article-title":"Shedding light on the cell biology of extracellular vesicles","volume":"19","author":"van Niel","year":"2018","journal-title":"Nat Rev Mol Cell Biol"},{"key":"2026011114021217300_btaf631-B35","doi-asserted-by":"crossref","first-page":"D368","DOI":"10.1093\/nar\/gkad1011","article-title":"AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences","volume":"52","author":"Varadi","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2026011114021217300_btaf631-B36","author":"Vaswani"},{"key":"2026011114021217300_btaf631-B37","doi-asserted-by":"crossref","first-page":"18069","DOI":"10.1007\/s00521-019-04051-w","article-title":"The importance of interpretability and visualization in machine learning for applications in medicine and health care","volume":"32","author":"Vellido","year":"2020","journal-title":"Neural Comput Appl"},{"key":"2026011114021217300_btaf631-B38","author":"Vig J, Madani A, Varshney LR"},{"key":"2026011114021217300_btaf631-B39","doi-asserted-by":"crossref","first-page":"e120","DOI":"10.1002\/jex2.120","article-title":"Proteome encoded determinants of protein sorting into extracellular vesicles","volume":"3","author":"Waury","year":"2024","journal-title":"J Extracell Biol"},{"key":"2026011114021217300_btaf631-B40","doi-asserted-by":"crossref","first-page":"27066","DOI":"10.3402\/jev.v4.27066","article-title":"Biological properties of extracellular vesicles and their physiological functions","volume":"4","author":"Y\u00e1\u00f1ez-M\u00f3","year":"2015","journal-title":"J Extracell Vesicles"},{"key":"2026011114021217300_btaf631-B41","doi-asserted-by":"crossref","first-page":"1358","DOI":"10.1126\/science.adf2465","article-title":"Enzyme function prediction using contrastive learning","volume":"379","author":"Yu","year":"2023","journal-title":"Science"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf631\/65450439\/btaf631.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/1\/btaf631\/65450439\/btaf631.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/1\/btaf631\/65450439\/btaf631.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T19:02:23Z","timestamp":1768158143000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf631\/8339745"}},"subtitle":[],"editor":[{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2025,11,21]]},"references-count":41,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf631","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,1]]},"published":{"date-parts":[[2025,11,21]]},"article-number":"btaf631"}}