{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T11:08:26Z","timestamp":1771931306302,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T00:00:00Z","timestamp":1763596800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002347","name":"German Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Detection of gene regulatory aberrations enhances our ability to interpret the impact of inherited and acquired genetic variation for rare disease diagnostics and tumor characterization. While numerous methods for calling RNA expression outliers from RNA-sequencing data have been proposed, the establishment of protein expression outliers from mass spectrometry data is lacking.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we propose and assess various modeling approaches to call protein expression outliers across three datasets from rare disease diagnostics and oncology. We use as independent evidence the enrichment for outlier calls in matched RNA-seq samples and the enrichment for rare variants likely disrupting protein expression. We show that controlling for hidden confounders and technical covariates, while simultaneously modeling the occurrence of missing values, is largely beneficial and can be achieved using conditional autoencoders. Moreover, we find that the differences between experimental and fitted log-transformed intensities by such models exhibit heavy tails that are poorly captured with the Gaussian distribution and report stronger statistical calibration when instead using the Student\u2019s t-distribution. Our resulting method, PROTRIDER, outperformed baseline approaches based on raw log-intensities Z-scores, PCA, and isolation-based anomaly detection with Isolation forests. The application of PROTRIDER reveals significant enrichments of AlphaMissense pathogenic variants in protein expression outliers. Overall, PROTRIDER provides a method to confidently identify aberrantly expressed proteins applicable to rare disease diagnostics and cancer proteomics.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>PROTRIDER is freely available at github.com\/gagneurlab\/PROTRIDER and also available on Zenodo under the DOI zenodo.15569781.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf628","type":"journal-article","created":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:50:31Z","timestamp":1763470231000},"source":"Crossref","is-referenced-by-count":0,"title":["PROTRIDER: protein abundance outlier detection from mass spectrometry-based proteomics data with a conditional autoencoder"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-4223-2199","authenticated-orcid":false,"given":"Daniela","family":"Klaproth-Andrade","sequence":"first","affiliation":[{"name":"School of Computation, Information and Technology, Technical University of Munich , Garching, 85748,","place":["Germany"]}]},{"given":"Ines F","family":"Scheller","sequence":"additional","affiliation":[{"name":"School of Computation, Information and Technology, Technical University of Munich , Garching, 85748,","place":["Germany"]},{"name":"Computational Health Center, Helmholtz Munich , Neuherberg, 85764,","place":["Germany"]}]},{"given":"Georgios","family":"Tsitsiridis","sequence":"additional","affiliation":[{"name":"School of Computation, Information and Technology, Technical University of Munich , Garching, 85748,","place":["Germany"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5571-0435","authenticated-orcid":false,"given":"Stefan","family":"Loipfinger","sequence":"additional","affiliation":[{"name":"School of Computation, Information and Technology, Technical University of Munich , Garching, 85748,","place":["Germany"]}]},{"given":"Christian","family":"Mertes","sequence":"additional","affiliation":[{"name":"School of Computation, Information and Technology, Technical University of Munich , Garching, 85748,","place":["Germany"]},{"name":"Institute of Human Genetics, School of Medicine and Health, Technical University of Munich , Munich, 81675,","place":["Germany"]}]},{"given":"Dmitrii","family":"Smirnov","sequence":"additional","affiliation":[{"name":"Computational Health Center, Helmholtz Munich , Neuherberg, 85764,","place":["Germany"]},{"name":"Institute of Human Genetics, School of Medicine and Health, Technical University of Munich , Munich, 81675,","place":["Germany"]}]},{"given":"Holger","family":"Prokisch","sequence":"additional","affiliation":[{"name":"Computational Health Center, Helmholtz Munich , Neuherberg, 85764,","place":["Germany"]},{"name":"Institute of Human Genetics, School of Medicine and Health, Technical University of Munich , Munich, 81675,","place":["Germany"]},{"name":"German Center for Child and Adolescent Health (DZKJ), Partner Site Munich , Munich, 80333,","place":["Germany"]}]},{"given":"Vicente A","family":"Y\u00e9pez","sequence":"additional","affiliation":[{"name":"School of Computation, Information and Technology, Technical University of Munich , Garching, 85748,","place":["Germany"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8924-8365","authenticated-orcid":false,"given":"Julien","family":"Gagneur","sequence":"additional","affiliation":[{"name":"School of Computation, Information and Technology, Technical University of Munich , Garching, 85748,","place":["Germany"]},{"name":"Computational Health Center, Helmholtz Munich , Neuherberg, 85764,","place":["Germany"]},{"name":"Institute of Human Genetics, School of Medicine and Health, Technical University of Munich , Munich, 81675,","place":["Germany"]}]}],"member":"286","published-online":{"date-parts":[[2025,11,20]]},"reference":[{"key":"2026022405164130400_btaf628-B1","doi-asserted-by":"crossref","first-page":"4372","DOI":"10.1158\/0008-5472.CAN-12-3342","article-title":"The exomes of the NCI-60 panel: a genomic resource for cancer biology and systems pharmacology","volume":"73","author":"Abaan","year":"2013","journal-title":"Cancer Res"},{"key":"2026022405164130400_btaf628-B2","doi-asserted-by":"publisher","author":"Ahlmann-Eltze","year":"2019","DOI":"10.1101\/661496,"},{"key":"2026022405164130400_btaf628-B3","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple hypothesis testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J R Stat Soc B"},{"key":"2026022405164130400_btaf628-B4","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1214\/aos\/1013699998","article-title":"The control of the false discovery rate in multiple testing under dependency","volume":"29","author":"Benjamini","year":"2001","journal-title":"Ann Statist"},{"key":"2026022405164130400_btaf628-B6","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1016\/j.ajhg.2018.10.025","article-title":"OUTRIDER: a statistical method for detecting aberrantly expressed genes in RNA sequencing data","volume":"103","author":"Brechtmann","year":"2018","journal-title":"Am J Hum Genet"},{"key":"2026022405164130400_btaf628-B7","doi-asserted-by":"crossref","first-page":"1967","DOI":"10.1074\/mcp.RA119.001472","article-title":"Multibatch TMT reveals false positives, batch effects and missing values","volume":"18","author":"Brenes","year":"2019","journal-title":"Mol Cell Proteomics"},{"key":"2026022405164130400_btaf628-B8","doi-asserted-by":"crossref","first-page":"100318","DOI":"10.1016\/j.xhgg.2024.100318","article-title":"Identifying dysregulated regions in amyotrophic lateral sclerosis through chromatin accessibility outliers","volume":"5","author":"\u00c7elik","year":"2024","journal-title":"Hum Genet Genomics Adv"},{"key":"2026022405164130400_btaf628-B9","doi-asserted-by":"crossref","first-page":"eadg7492","DOI":"10.1126\/science.adg7492","article-title":"Accurate proteome-wide missense variant effect prediction with AlphaMissense","volume":"381","author":"Cheng","year":"2023","journal-title":"Science"},{"key":"2026022405164130400_btaf628-B10","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1038\/s41525-025-00493-5","article-title":"An outlier approach: advancing diagnosis of neurological diseases through integrating proteomics into multi-omics guided exome reanalysis","volume":"10","author":"Chui","year":"2025","journal-title":"NPJ Genom Med"},{"key":"2026022405164130400_btaf628-B11","doi-asserted-by":"publisher","author":"Collier","year":"2020","DOI":"10.48550\/arXiv.2006.05301,"},{"key":"2026022405164130400_btaf628-B12","doi-asserted-by":"crossref","first-page":"eaal5209","DOI":"10.1126\/scitranslmed.aal5209","article-title":"Improving genetic diagnosis in mendelian disease with transcriptome sequencing","volume":"9","author":"Cummings","year":"2017","journal-title":"Sci Transl Med"},{"key":"2026022405164130400_btaf628-B13","doi-asserted-by":"crossref","first-page":"3639","DOI":"10.1038\/s41467-020-17336-9","article-title":"Proteome activity landscapes of tumor cell lines determine drug responses","volume":"11","author":"Frejno","year":"2020","journal-title":"Nat Commun"},{"key":"2026022405164130400_btaf628-B14","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1038\/s41591-019-0457-8","article-title":"Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts","volume":"25","author":"Fr\u00e9sard","year":"2019","journal-title":"Nat Med"},{"key":"2026022405164130400_btaf628-B15","doi-asserted-by":"crossref","first-page":"5040","DOI":"10.1109\/TIT.2014.2323359","article-title":"The optimal hard threshold for singular values is 4\/\\sqrt 3","volume":"60","author":"Gavish","year":"2014","journal-title":"IEEE Trans Inform Theory"},{"key":"2026022405164130400_btaf628-B16","first-page":"58","volume-title":"Genome Med","author":"Hock","year":"2025"},{"key":"2026022405164130400_btaf628-B17","doi-asserted-by":"crossref","first-page":"4609","DOI":"10.1093\/bioinformatics\/btaa259","article-title":"LeafCutterMD: an algorithm for outlier splicing detection in rare diseases","volume":"36","author":"Jenkinson","year":"2020","journal-title":"Bioinformatics"},{"key":"2026022405164130400_btaf628-B18","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1038\/s41586-020-2308-7","article-title":"The mutational constraint spectrum quantified from variation in 141,456 humans","volume":"581","author":"Karczewski","year":"2020","journal-title":"Nature"},{"key":"2026022405164130400_btaf628-B19","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/1471-2105-13-S16-S5","article-title":"Normalization and missing value imputation for label-free LC-MS analysis","volume":"13","author":"Karpievitch","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2026022405164130400_btaf628-B20","doi-asserted-by":"publisher","author":"Kingma","year":"2017","DOI":"10.48550\/arXiv.1412.6980,"},{"key":"2026022405164130400_btaf628-B21","doi-asserted-by":"crossref","first-page":"e2200092","DOI":"10.1002\/pmic.202200092","article-title":"Dealing with missing values in proteomics data","volume":"22","author":"Kong","year":"2022","journal-title":"Proteomics"},{"key":"2026022405164130400_btaf628-B22","doi-asserted-by":"publisher","author":"Kopajtich","year":"2021","DOI":"10.1101\/2021.03.09.21253187,"},{"key":"2026022405164130400_btaf628-B23","doi-asserted-by":"crossref","first-page":"15824","DOI":"10.1038\/ncomms15824","article-title":"Genetic diagnosis of mendelian disorders via RNA sequencing","volume":"8","author":"Kremer","year":"2017","journal-title":"Nat Commun"},{"key":"2026022405164130400_btaf628-B24","doi-asserted-by":"crossref","first-page":"119140","DOI":"10.1016\/j.bbamcr.2021.119140","article-title":"Regulation of gene expression via translational buffering","volume":"1869","author":"Kusnadi","year":"2022","journal-title":"Biochim Biophys Acta Mol Cell Res"},{"key":"2026022405164130400_btaf628-B25","doi-asserted-by":"crossref","first-page":"4754","DOI":"10.1093\/bioinformatics\/btac603","article-title":"ABEILLE: a novel method for ABerrant Expression Identification empLoying machine LEarning from RNA-sequencing data","volume":"38","author":"Labory","year":"2022","journal-title":"Bioinformatics"},{"key":"2026022405164130400_btaf628-B26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2133360.2133363","article-title":"Isolation-based anomaly detection","volume":"6","author":"Liu","year":"2012","journal-title":"ACM Trans Knowl Discov Data"},{"key":"2026022405164130400_btaf628-B27","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol"},{"key":"2026022405164130400_btaf628-B28","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1186\/s13059-016-0974-4","article-title":"The ensembl variant effect predictor","volume":"17","author":"McLaren","year":"2016","journal-title":"Genome Biol"},{"key":"2026022405164130400_btaf628-B29","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/s41467-020-20573-7","article-title":"Detection of aberrant splicing events in RNA-seq data using FRASER","volume":"12","author":"Mertes","year":"2021","journal-title":"Nat Commun"},{"key":"2026022405164130400_btaf628-B30","doi-asserted-by":"crossref","first-page":"107501","DOI":"10.1016\/j.patcog.2020.107501","article-title":"Handling incomplete heterogeneous data using VAEs","volume":"107","author":"Naz\u00e1bal","year":"2020","journal-title":"Pattern Recognit"},{"key":"2026022405164130400_btaf628-B31","doi-asserted-by":"crossref","first-page":"4369","DOI":"10.1016\/j.csbj.2022.08.022","article-title":"Perspectives for better batch effect correction in mass-spectrometry-based proteomics","volume":"20","author":"Phua","year":"2022","journal-title":"Comput Struct Biotechnol J"},{"key":"2026022405164130400_btaf628-B32","doi-asserted-by":"crossref","first-page":"D886","DOI":"10.1093\/nar\/gky1016","article-title":"CADD: predicting the deleteriousness of variants throughout the human genome","volume":"47","author":"Rentzsch","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2026022405164130400_btaf628-B33","doi-asserted-by":"crossref","first-page":"e47","DOI":"10.1093\/nar\/gkv007","article-title":"Limma powers differential expression analyses for RNA-sequencing and microarray studies","volume":"43","author":"Ritchie","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2026022405164130400_btaf628-B34","doi-asserted-by":"crossref","first-page":"2201","DOI":"10.1016\/j.celrep.2017.08.010","article-title":"Genomic determinants of protein abundance variation in colorectal cancer cells","volume":"20","author":"Roumeliotis","year":"2017","journal-title":"Cell Rep"},{"key":"2026022405164130400_btaf628-B35","doi-asserted-by":"crossref","first-page":"101245","DOI":"10.1016\/j.jocs.2020.101245","article-title":"OutPyR: Bayesian inference for RNA-seq outlier detection","volume":"47","author":"Salkovic","year":"2020","journal-title":"J Comput Sci"},{"key":"2026022405164130400_btaf628-B36","doi-asserted-by":"crossref","first-page":"btad142","DOI":"10.1093\/bioinformatics\/btad142","article-title":"OutSingle: a novel method of detecting and injecting outliers in RNA-seq count data using the optimal hard threshold for singular values","volume":"39","author":"Salkovic","year":"2023","journal-title":"Bioinformatics"},{"key":"2026022405164130400_btaf628-B37","doi-asserted-by":"crossref","first-page":"2056","DOI":"10.1016\/j.ajhg.2023.10.014","article-title":"Improved detection of aberrant splicing with FRASER 2.0 and the intron Jaccard index","volume":"110","author":"Scheller","year":"2023","journal-title":"Am J Hum Genet"},{"key":"2026022405164130400_btaf628-B38","first-page":"146","author":"Schlegl","year":"2017"},{"key":"2026022405164130400_btaf628-B39","doi-asserted-by":"publisher","author":"Segers","year":"2023","DOI":"10.1101\/2023.06.29.547014,"},{"key":"2026022405164130400_btaf628-B40","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1146\/annurev-genom-021623-121812","article-title":"RNA sequencing in disease diagnosis","volume":"25","author":"Smail","year":"2024","journal-title":"Annu Rev Genomics Hum Genet"},{"key":"2026022405164130400_btaf628-B41","doi-asserted-by":"crossref","first-page":"2301","DOI":"10.1038\/nprot.2016.136","article-title":"The MaxQuant computational platform for mass spectrometry-based shotgun proteomics","volume":"11","author":"Tyanova","year":"2016","journal-title":"Nat Protoc"},{"key":"2026022405164130400_btaf628-B42","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1038\/s41593-022-01031-7","article-title":"Integrating whole-genome sequencing with multi-omic data reveals the impact of structural variants on gene regulation in the human brain","volume":"25","author":"Vialle","year":"2022","journal-title":"Nat Neurosci"},{"key":"2026022405164130400_btaf628-B43","doi-asserted-by":"crossref","first-page":"e164","DOI":"10.1093\/nar\/gkq603","article-title":"ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data","volume":"38","author":"Wang","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2026022405164130400_btaf628-B44","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1186\/s13059-018-1451-z","article-title":"Post-translational buffering leads to convergent protein expression levels between primates","volume":"19","author":"Wang","year":"2018","journal-title":"Genome Biol"},{"key":"2026022405164130400_btaf628-B45","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1186\/s13073-022-01019-9","article-title":"Clinical implementation of RNA sequencing for mendelian disease diagnostics","volume":"14","author":"Y\u00e9pez","year":"2022","journal-title":"Genome Med"},{"key":"2026022405164130400_btaf628-B46","doi-asserted-by":"crossref","first-page":"1276","DOI":"10.1038\/s41596-020-00462-5","article-title":"Detection of aberrant gene expression events in RNA sequencing data","volume":"16","author":"Y\u00e9pez","year":"2021","journal-title":"Nat Protoc"},{"key":"2026022405164130400_btaf628-B47","doi-asserted-by":"crossref","first-page":"1468","DOI":"10.1074\/mcp.TIR119.001385","article-title":"TMT labeling for the masses: a robust and cost-efficient, in-solution labeling approach","volume":"18","author":"Zecha","year":"2019","journal-title":"Mol Cell Proteomics"},{"key":"2026022405164130400_btaf628-B5","first-page":"585","author":"Zhao","year":"2019"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf628\/65416092\/btaf628.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/12\/btaf628\/65416092\/btaf628.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/12\/btaf628\/65416092\/btaf628.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T10:16:55Z","timestamp":1771928215000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf628\/8332248"}},"subtitle":[],"editor":[{"given":"Macha","family":"Nikolski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,11,20]]},"references-count":47,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf628","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,12]]},"published":{"date-parts":[[2025,11,20]]},"article-number":"btaf628"}}