{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T18:45:00Z","timestamp":1770144300845,"version":"3.49.0"},"reference-count":57,"publisher":"Oxford University Press (OUP)","issue":"18","license":[{"start":{"date-parts":[[2022,7,25]],"date-time":"2022-07-25T00:00:00Z","timestamp":1658707200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Holland Proton Therapy Center","award":["2019020"],"award-info":[{"award-number":["2019020"]}]},{"name":"United States National Institutes of Health","award":["U54EY032442"],"award-info":[{"award-number":["U54EY032442"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Synthetic lethality (SL) between two genes occurs when simultaneous loss of function leads to cell death. This holds great promise for developing anti-cancer therapeutics that target synthetic lethal pairs of endogenously disrupted genes. Identifying novel SL relationships through exhaustive experimental screens is challenging, due to the vast number of candidate pairs. Computational SL prediction is therefore sought to identify promising SL gene pairs for further experimentation. However, current SL prediction methods lack consideration for generalizability in the presence of selection bias in SL data.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We show that SL data exhibit considerable gene selection bias. Our experiments designed to assess the robustness of SL prediction reveal that models driven by the topology of known SL interactions (e.g. graph, matrix factorization) are especially sensitive to selection bias. We introduce selection bias-resilient synthetic lethality (SBSL) prediction using regularized logistic regression or random forests. Each gene pair is described by 27 molecular features derived from cancer cell line, cancer patient tissue and healthy donor tissue samples. SBSL models are built and tested using approximately 8000 experimentally derived SL pairs across breast, colon, lung and ovarian cancers. Compared to other SL prediction methods, SBSL showed higher predictive performance, better generalizability and robustness to selection bias. Gene dependency, quantifying the essentiality of a gene for cell survival, contributed most to SBSL predictions. Random forests were superior to linear models in the absence of dependency features, highlighting the relevance of mutual exclusivity of somatic mutations, co-expression in healthy tissue and differential expression in tumour samples.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>https:\/\/github.com\/joanagoncalveslab\/sbsl<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac523","type":"journal-article","created":{"date-parts":[[2022,7,25]],"date-time":"2022-07-25T15:11:13Z","timestamp":1658761873000},"page":"4360-4368","source":"Crossref","is-referenced-by-count":7,"title":["Overcoming selection bias in synthetic lethality prediction"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1716-8419","authenticated-orcid":false,"given":"Colm","family":"Seale","sequence":"first","affiliation":[{"name":"Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands"},{"name":"Holland Proton Therapy Center (HollandPTC), Delft 2600 AC, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3375-6678","authenticated-orcid":false,"given":"Yasin","family":"Tepeli","sequence":"additional","affiliation":[{"name":"Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6072-9627","authenticated-orcid":false,"given":"Joana P","family":"Gon\u00e7alves","sequence":"additional","affiliation":[{"name":"Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands"}]}],"member":"286","published-online":{"date-parts":[[2022,7,25]]},"reference":[{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene Ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1186\/s13059-015-0612-6","article-title":"Systematic identification of cancer driving signaling pathways based on mutual exclusivity of genomic alterations","volume":"16","author":"Babur","year":"2015","journal-title":"Genome Biol"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1002\/bimj.4710310209","article-title":"The Wald statistic in proportional hazards hypothesis testing","volume":"31","author":"Bangdiwala","year":"1989","journal-title":"Biom. J"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/s41586-019-1103-9","article-title":"Prioritization of cancer therapeutic targets using CRISPR\u2013Cas9 screens","volume":"568","author":"Behan","year":"2019","journal-title":"Nature"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"e1006888","DOI":"10.1371\/journal.pcbi.1006888","article-title":"Predicting synthetic lethal interactions using conserved patterns in protein interaction networks","volume":"15","author":"Benstead-Hume","year":"2019","journal-title":"PLoS Comput. Biol"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1186\/cc2955","article-title":"Statistics review 12: survival analysis","volume":"8","author":"Bewick","year":"2004","journal-title":"Crit. Care"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"4458","DOI":"10.1093\/bioinformatics\/btaa211","article-title":"Dual-dropout graph convolutional network for predicting synthetic lethality in human cancers","volume":"36","author":"Cai","year":"2020","journal-title":"Bioinformatics"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-016-1114-x","article-title":"A novel independence test for somatic alterations in cancer shows that biology drives mutual exclusivity but chance explains most co-occurrence","volume":"17","author":"Canisius","year":"2016","journal-title":"Genome Biol"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1186\/1752-0509-3-116","article-title":"Human synthetic lethal inference as potential anti-cancer target gene detection","volume":"3","author":"Conde-Pueyo","year":"2009","journal-title":"BMC Syst. Biol"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1093\/bioinformatics\/bty673","article-title":"DiscoverSL: an R package for multi-omic data driven prediction of synthetic lethality in cancers","volume":"35","author":"Das","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1016\/j.cels.2021.08.006","article-title":"Comprehensive prediction of robust synthetic lethality between paralog pairs in cancer cell lines","volume":"12","author":"De Kegel","year":"2021","journal-title":"Cell Syst"},{"key":"2023041408365165200_","article-title":"Extracting biological insights from the Project Achilles genome-scale CRISPR screens in cancer cell lines","author":"Dempster","year":"2019","journal-title":"bioRxiv"},{"key":"2023041408365165200_","author":"Deng","year":"2012"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1016\/j.ccell.2019.01.009","article-title":"A platform of synthetic lethal gene interaction networks reveals that the GNAQ uveal melanoma oncogene controls the Hippo pathway through FAK","volume":"35","author":"Feng","year":"2019","journal-title":"Cancer Cell"},{"key":"2023041408365165200_","first-page":"1","article-title":"All models are wrong, but many are useful: learning a variable\u2019s importance by studying an entire class of prediction models simultaneously","volume":"20","author":"Fisher","year":"2019","journal-title":"J. Mach. Learn. Res"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1038\/msb.2011.35","article-title":"Predicting selective drug targets in cancer through metabolic networks","volume":"7","author":"Folger","year":"2011","journal-title":"Mol. Syst. Biol"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J. Stat. Softw"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1038\/s41586-019-1186-3","article-title":"Next-generation characterization of the Cancer Cell Line Encyclopedia","volume":"569","author":"Ghandi","year":"2019","journal-title":"Nature"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"1517","DOI":"10.1287\/opre.2019.1919","article-title":"Fast best subset selection: coordinate descent and local combinatorial optimization algorithms","volume":"68","author":"Hazimeh","year":"2020","journal-title":"Oper. Res"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1186\/s12859-019-3197-3","article-title":"Predicting synthetic lethal interactions in human cancers using graph regularized self-representative matrix factorization","volume":"20","author":"Huang","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"e1004506","DOI":"10.1371\/journal.pcbi.1004506","article-title":"Connectivity homology enables inter-species network models of synthetic lethality","volume":"11","author":"Jacunski","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4614-7138-7","volume-title":"An Introduction to Statistical Learning","author":"James","year":"2013"},{"key":"2023041408365165200_","first-page":"D498","article-title":"The Reactome pathway knowledgebase","volume":"48","author":"Jassal","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"1199","DOI":"10.1016\/j.cell.2014.07.027","article-title":"Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality","volume":"158","author":"Jerby-Arnon","year":"2014","journal-title":"Cell"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: Kyoto Encyclopedia of Genes and Genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1016\/B978-0-12-800173-8.00004-0","article-title":"Dasatinib","volume":"39","author":"Korashy","year":"2014","journal-title":"Profiles of Drug Substances, Excipients and Related Methodology"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"2163","DOI":"10.1039\/c3mb25589a","article-title":"Identification of synthetic lethal pairs in biological systems through network information centrality","volume":"9","author":"Kranthi","year":"2013","journal-title":"Mol. Biosyst"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"2546","DOI":"10.1038\/s41467-018-04647-1","article-title":"Harnessing synthetic lethality to predict the response to cancer treatment","volume":"9","author":"Lee","year":"2018","journal-title":"Nat. Commun"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"2209","DOI":"10.1093\/bioinformatics\/btz893","article-title":"Predicting synthetic lethal interactions using heterogeneous data sources","volume":"36","author":"Liany","year":"2020","journal-title":"Bioinformatics"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"1739","DOI":"10.1093\/bioinformatics\/btr260","article-title":"Molecular signatures database (MSigDB) 3.0","volume":"27","author":"Liberzon","year":"2011","journal-title":"Bioinformatics"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"748","DOI":"10.1109\/TCBB.2019.2909908","article-title":"SL2MF: predicting synthetic lethality in human cancers via logistic matrix factorization","volume":"17","author":"Liu","year":"2020","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"2432","DOI":"10.1093\/bioinformatics\/btab110","article-title":"Graph contextualized attention network for predicting synthetic lethality in human cancers","volume":"37","author":"Long","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1038\/ng.2653","article-title":"The Genotype-Tissue Expression (GTEx) Project","volume":"45","author":"Lonsdale","year":"2013","journal-title":"Nat. Genet"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1146\/annurev-med-050913-022545","article-title":"Synthetic lethality and cancer therapy: lessons learned from the development of PARP inhibitors","volume":"66","author":"Lord","year":"2015","journal-title":"Annu. Rev. Med"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"e0125795","DOI":"10.1371\/journal.pone.0125795","article-title":"Predicting human genetic interactions from cancer genome evolution","volume":"10","author":"Lu","year":"2015","journal-title":"PLoS One"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1214\/aoms\/1177730491","article-title":"On a test of whether one of two random variables is stochastically larger than the other","volume":"18","author":"Mann","year":"1947","journal-title":"Ann. Math. Statist"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"4610","DOI":"10.1038\/s41467-018-06916-5","article-title":"Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration","volume":"9","author":"McFarland","year":"2018","journal-title":"Nat. Commun"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"R41","DOI":"10.1186\/gb-2011-12-4-r41","article-title":"GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers","volume":"12","author":"Mermel","year":"2011","journal-title":"Genome Biol"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"1779","DOI":"10.1038\/ng.3984","article-title":"Computational correction of copy number effect improves specificity of CRISPR\u2013Cas9 essentiality screens in cancer cells","volume":"49","author":"Meyers","year":"2017","journal-title":"Nat. Genet"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.febslet.2010.11.024","article-title":"Synthetic lethality: general principles, utility and detection using genetic screens in human cells","volume":"585","author":"Nijman","year":"2011","journal-title":"FEBS Lett"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"3666","DOI":"10.1093\/bioinformatics\/btv377","article-title":"Alternative preprocessing of RNA-sequencing data in The Cancer Genome Atlas leads to improved analysis results","volume":"31","author":"Rahman","year":"2015","journal-title":"Bioinformatics"},{"key":"2023041408365165200_","first-page":"315","author":"Raman","year":"2018"},{"key":"2023041408365165200_","author":"Richoux","year":"2019"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"D674","DOI":"10.1093\/nar\/gkn653","article-title":"PID: the Pathway Interaction Database","volume":"37","author":"Schaefer","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"972","DOI":"10.1093\/bioinformatics\/bty710","article-title":"Variable selection and validation in multivariate modelling","volume":"35","author":"Shi","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1186\/s13062-015-0086-1","article-title":"Inferring synthetic lethal interactions from mutual exclusivity of genetic events in cancer","volume":"10","author":"Srihari","year":"2015","journal-title":"Biol. Direct"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"e2006643","DOI":"10.1371\/journal.pbio.2006643","article-title":"Large-scale investigation of the reasons why potentially important genes are ignored","volume":"16","author":"Stoeger","year":"2018","journal-title":"PLoS Biol"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl. Acad. Sci. U S A"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"D605","DOI":"10.1093\/nar\/gkaa1074","article-title":"The STRING database in 2021: customizable protein\u2013protein networks, and functional characterization of user-uploaded gene\/measurement sets","volume":"49","author":"Szklarczyk","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041408365165200_","volume-title":"Firehose stddata__2016_01_28 Run","author":"TCGA GDAC","year":"2016"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"D325","DOI":"10.1093\/nar\/gkaa1113","article-title":"The Gene Ontology resource: enriching a gold mine","volume":"49","author":"The Gene Ontology Consortium","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041408365165200_","first-page":"1","article-title":"High and low mutational burden tumors versus immunologically hot and cold tumors and response to immune checkpoint inhibitors","volume":"6","author":"Vareki","year":"2018","journal-title":"J. Immunother. Cancer"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"112","DOI":"10.3389\/fphar.2020.00112","article-title":"EXP2SL: a machine learning framework for cell-line-specific synthetic lethality prediction","volume":"11","author":"Wan","year":"2020","journal-title":"Front. Pharmacol"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1186\/s12864-016-2375-1","article-title":"Multi-omic measurement of mutually exclusive loss-of-function enriches for candidate synthetic lethal gene pairs","volume":"17","author":"Wappett","year":"2016","journal-title":"BMC Genomics"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"CIN.S14026","DOI":"10.4137\/CIN.S14026","article-title":"In silico prediction of synthetic lethality by meta-analysis of genetic interactions, functions, and pathways in yeast and human cancer","volume":"13s3","author":"Wu","year":"2014","journal-title":"Cancer Inform"},{"key":"2023041408365165200_","doi-asserted-by":"crossref","first-page":"1541002","DOI":"10.1142\/S0219720015410024","article-title":"Predicting essential genes and synthetic lethality via influence propagation in signaling pathways of cancer cell fates","volume":"13","author":"Zhang","year":"2015","journal-title":"J. Bioinform. Comput. Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac523\/45213918\/btac523.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/18\/4360\/49884644\/btac523.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/18\/4360\/49884644\/btac523.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T23:11:49Z","timestamp":1700867509000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/18\/4360\/6649727"}},"subtitle":[],"editor":[{"given":"Karsten","family":"Borgwardt","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,7,25]]},"references-count":57,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2022,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac523","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9,15]]},"published":{"date-parts":[[2022,7,25]]}}}