{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T16:22:14Z","timestamp":1776097334788,"version":"3.50.1"},"reference-count":71,"publisher":"Oxford University Press (OUP)","issue":"16","license":[{"start":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T00:00:00Z","timestamp":1656547200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"crossref","award":["RGPIN-2019-04460"],"award-info":[{"award-number":["RGPIN-2019-04460"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"crossref"}]},{"name":"McGill Initiative in Computational Medicine"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Computational methods for the prediction of protein\u2013protein interactions (PPIs), while important tools for researchers, are plagued by challenges in generalizing to unseen proteins. Datasets used for modelling protein\u2013protein predictions are particularly predisposed to information leakage and sampling biases.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this study, we introduce RAPPPID, a method for the Regularized Automatic Prediction of Protein\u2013Protein Interactions using Deep Learning. RAPPPID is a twin Averaged Weight-Dropped Long Short-Term memory network which employs multiple regularization methods during training time to learn generalized weights. Testing on stringent interaction datasets composed of proteins not seen during training, RAPPPID outperforms state-of-the-art methods. Further experiments show that RAPPPID\u2019s performance holds regardless of the particular proteins in the testing set and its performance is higher for experimentally supported edges. This study serves to demonstrate that appropriate regularization is an important component of overcoming the challenges of creating models for PPI prediction that generalize to unseen proteins. Additionally, as part of this study, we provide datasets corresponding to several data splits of various strictness, in order to facilitate assessment of PPI reconstruction methods by others in the future.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Code and datasets are freely available at https:\/\/github.com\/jszym\/rapppid and Zenodo.org.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac429","type":"journal-article","created":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T11:47:27Z","timestamp":1656589647000},"page":"3958-3967","source":"Crossref","is-referenced-by-count":33,"title":["RAPPPID: towards generalizable protein interaction prediction with AWD-LSTM twin networks"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1559-6225","authenticated-orcid":false,"given":"Joseph","family":"Szymborski","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, McGill University , Montr\u00e9al, QC H3A 0G4, Canada"},{"name":"Mila, Qu\u00e9bec AI Institute , Montr\u00e9al, QC H2S 3H1, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5108-4887","authenticated-orcid":false,"given":"Amin","family":"Emad","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, McGill University , Montr\u00e9al, QC H3A 0G4, Canada"},{"name":"Mila, Qu\u00e9bec AI Institute , Montr\u00e9al, QC H2S 3H1, Canada"},{"name":"The Rosalind and Morris Goodman Cancer Institute , Montr\u00e9al, QC H3A 1A3, Canada"}]}],"member":"286","published-online":{"date-parts":[[2022,6,30]]},"reference":[{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"D408","DOI":"10.1093\/nar\/gkw985","article-title":"Hippie v2.0: enhancing meaningfulness and reliability of protein\u2013protein interaction networks","volume":"45","author":"Alanis-Lobato","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1126\/science.181.4096.223","article-title":"Principles that govern the folding of protein chains","volume":"181","author":"Anfinsen","year":"1973","journal-title":"Science"},{"key":"2023041408491898300_","author":"Athiwaratkun","year":"2019"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"i38","DOI":"10.1093\/bioinformatics\/bti1016","article-title":"Kernel methods for predicting protein-protein interactions","volume":"21","author":"Ben-Hur","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-7-S1-S2","article-title":"Choosing negative examples for the prediction of protein-protein interactions","volume":"7","author":"Ben-Hur","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"D396","DOI":"10.1093\/nar\/gkt1079","article-title":"Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis","volume":"42","author":"Blohm","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"800","DOI":"10.1634\/theoncologist.2010-0035","article-title":"Trastuzumab","volume":"16","author":"Boekhout","year":"2011","journal-title":"Oncologist"},{"key":"2023041408491898300_","article-title":"High-performance large-scale image recognition without normalization","author":"Brock","year":"2021","journal-title":"arXiv"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"669","DOI":"10.1142\/S0218001493000339","article-title":"Signature verification using a \u201cSiamese\u201d time delay neural network","volume":"07","author":"Bromley","year":"1993","journal-title":"Int. J. Pattern Recognit. Artif. Intell"},{"key":"2023041408491898300_","first-page":"1365","author":"Browne","year":"2007"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1016\/j.sbi.2008.07.001","article-title":"Overcoming the challenges of membrane protein crystallography","volume":"18","author":"Carpenter","year":"2008","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"2305","DOI":"10.1016\/j.jacc.2012.07.056","article-title":"Novel protein therapeutics for systolic heart failure: chronic subcutaneous b-type natriuretic peptide","volume":"60","author":"Chen","year":"2012","journal-title":"J. Am. Coll. Cardiol"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"i305","DOI":"10.1093\/bioinformatics\/btz328","article-title":"Multifaceted protein\u2013protein interaction prediction based on Siamese residual RCNN","volume":"35","author":"Chen","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1126\/science.aaw6718","article-title":"Protein interaction networks revealed by proteome coevolution","volume":"365","author":"Cong","year":"2019","journal-title":"Science"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"1071","DOI":"10.1016\/j.str.2020.06.006","article-title":"Performance and its limits in rigid body protein-protein docking","volume":"28","author":"Desta","year":"2020","journal-title":"Structure"},{"key":"2023041408491898300_","article-title":"Bert: Pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2019","journal-title":"arXiv"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"1390","DOI":"10.1038\/s41598-019-56895-w","article-title":"Pipe4: Fast PPI predictor for comprehensive inter- and cross-species interactomes","volume":"10","author":"Dick","year":"2020","journal-title":"Sci. Rep"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1186\/s12859-016-1253-9","article-title":"Predicting protein-protein interactions via multivariate mutual information of protein sequences","volume":"17","author":"Ding","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023041408491898300_","author":"Elnaggar","year":"2021"},{"key":"2023041408491898300_","author":"Evans","year":"2021"},{"key":"2023041408491898300_","volume-title":"PyTorchLightning\/Pytorch-Lightning: 0.7.6 Release","author":"Falcon","year":"2020"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1016\/j.jprot.2014.01.020","article-title":"Bias tradeoffs in the creation and analysis of protein\u2013protein interaction networks","volume":"100","author":"Gillis","year":"2014","journal-title":"J. Proteomics"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1038\/nature22366","article-title":"Architecture of the human interactome defines protein communities and disease networks","volume":"545","author":"Huttlin","year":"2017","journal-title":"Nature"},{"key":"2023041408491898300_","article-title":"Averaging weights leads to wider optima and better generalization","author":"Izmailov","year":"2019","journal-title":"arXiv"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1126\/science.1087361","article-title":"A Bayesian networks approach for predicting protein-protein interactions from genomic data","volume":"302","author":"Jansen","year":"2003","journal-title":"Science"},{"key":"2023041408491898300_","first-page":"D498","article-title":"The Reactome pathway knowledgebase","volume":"48","author":"Jassal","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: Kyoto Encyclopedia of genes and genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","article-title":"Subword regularization: Improving neural network translation models with multiple subword candidates","author":"Kudo","year":"2018","journal-title":"arXiv"},{"key":"2023041408491898300_","first-page":"66","author":"Kudo","year":"2018"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1186\/s12859-017-1871-x","article-title":"Sprint: ultrafast protein-protein interaction prediction of the entire human interactome","volume":"18","author":"Li","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023041408491898300_","article-title":"A critical review of recurrent neural networks for sequence learning","author":"Lipton","year":"2015","journal-title":"arXiv"},{"key":"2023041408491898300_","author":"Loshchilov","year":"2019"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1002\/phar.1338","article-title":"Pertuzumab: a new targeted therapy for her2-positive metastatic breast cancer","volume":"34","author":"Malenfant","year":"2014","journal-title":"Pharmacotherapy"},{"key":"2023041408491898300_","article-title":"Regularizing and optimizing LSTM language models","author":"Merity","year":"2017","journal-title":"arXiv"},{"key":"2023041408491898300_","article-title":"Mish: a self-regularized non-monotonic activation function","author":"Misra","year":"2020","journal-title":"arXiv"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1038\/nmeth.1931","article-title":"Protein interaction data curation: the international molecular exchange (IMEx) consortium","volume":"9","author":"Orchard","year":"2012","journal-title":"Nat. Methods"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"D358","DOI":"10.1093\/nar\/gkt1115","article-title":"The MIntAct project\u2014IntAct as a common curation platform for 11 molecular interaction databases","volume":"42","author":"Orchard","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1002\/pro.3978","article-title":"The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions","volume":"30","author":"Oughtred","year":"2021","journal-title":"Protein Sci"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"1134","DOI":"10.1038\/nmeth.2259","article-title":"Flaws in evaluation schemes for pair-input computational predictions","volume":"9","author":"Park","year":"2012","journal-title":"Nat. Methods"},{"key":"2023041408491898300_","first-page":"8024","author":"Paszke","year":"2019"},{"key":"2023041408491898300_","article-title":"Transformer protein language models are unsupervised structure learners","volume":"2020","author":"Rao","year":"2020","journal-title":"bioRxiv"},{"key":"2023041408491898300_","article-title":"Comparing two deep learning sequence-based models for protein-protein interaction prediction","author":"Richoux","year":"2019","journal-title":"arXiv"},{"key":"2023041408491898300_","first-page":"622803","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","author":"Rives","year":"2019","journal-title":"bioRxiv"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1083\/jcb.201112098","article-title":"A promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells","volume":"196","author":"Roux","year":"2012","journal-title":"J. Cell Biol"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"D449","DOI":"10.1093\/nar\/gkh086","article-title":"The database of interacting proteins: 2004 update","volume":"32","author":"Salwinski","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","first-page":"5149","author":"Schuster","year":"2012"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1080\/17425247.2019.1662785","article-title":"Long-term delivery of protein and peptide therapeutics for cancer therapies","volume":"16","author":"Sikder","year":"2019","journal-title":"Exp. Opin. Drug Deliv"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"848","DOI":"10.15252\/msb.20156351","article-title":"Fundamentals of protein interaction network mapping","volume":"11","author":"Snider","year":"2015","journal-title":"Mol. Syst. Biol"},{"key":"2023041408491898300_","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"D607","DOI":"10.1093\/nar\/gky1131","article-title":"String v11: protein\u2013protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets","volume":"47","author":"Szklarczyk","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","author":"Szymborski","year":"2022"},{"key":"2023041408491898300_","author":"Szymborski","year":"2022"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"6620","DOI":"10.1038\/s41598-018-24937-4","article-title":"A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models","volume":"8","author":"Tabe-Bordbar","year":"2018","journal-title":"Sci. Rep"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"459","DOI":"10.1016\/j.ejmech.2015.01.014","article-title":"Peptide therapeutics: targeting the undruggable space","volume":"94","author":"Tsomaia","year":"2015","journal-title":"Eur. J. Med. Chem"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"baq023","DOI":"10.1093\/database\/baq023","article-title":"iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence","volume":"2010","author":"Turner","year":"2010","journal-title":"Database (Oxford)"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"e0181748","DOI":"10.1371\/journal.pone.0181748","article-title":"THPdb: database of FDA-approved peptide and protein therapeutics","volume":"12","author":"Usmani","year":"2017","journal-title":"PLoS One"},{"key":"2023041408491898300_"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"1203","DOI":"10.1038\/nmeth.3182","article-title":"The yeast two-hybrid assay: still finding connections after 25 years","volume":"11","author":"Vidal","year":"2014","journal-title":"Nat. Methods"},{"key":"2023041408491898300_","first-page":"1058","author":"Wan","year":"2013"},{"key":"2023041408491898300_","article-title":"Ranger21: a synergistic deep learning optimizer","author":"Wright","year":"2021","journal-title":"arXiv"},{"key":"2023041408491898300_","doi-asserted-by":"crossref","first-page":"D1096","DOI":"10.1093\/nar\/gks966","article-title":"BioLip: a semi-manually curated database for biologically relevant ligand\u2013protein interactions","volume":"41","author":"Yang","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023041408491898300_","first-page":"635","author":"Yong","year":"2020"},{"key":"2023041408491898300_","first-page":"3320","volume-title":"How Transferable Are Features in Deep Neural Networks","author":"Yosinski","year":"2014"},{"key":"2023041408491898300_","article-title":"Recurrent neural network regularization","author":"Zaremba","year":"2015","journal-title":"arXiv"},{"key":"2023041408491898300_","first-page":"9593","author":"Zhang","year":"2019"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac429\/44834296\/btac429.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/16\/3958\/49889973\/btac429.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/16\/3958\/49889973\/btac429.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,23]],"date-time":"2023-11-23T18:49:02Z","timestamp":1700765342000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/16\/3958\/6623405"}},"subtitle":[],"editor":[{"given":"Teresa","family":"Przytycka","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,6,30]]},"references-count":71,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2022,8,10]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac429","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.08.13.456309","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,8,15]]},"published":{"date-parts":[[2022,6,30]]}}}