{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T07:47:31Z","timestamp":1780645651821,"version":"3.54.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2022,2,12]],"date-time":"2022-02-12T00:00:00Z","timestamp":1644624000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,4,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein\u2013protein, protein\u2013nucleotide and protein\u2013small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We constructed a large dataset dubbed BioDL, comprising protein\u2013protein interactions from the PDB, and DNA\/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein\u2013protein, 0.823 for protein\u2013nucleotide and 0.842 for protein\u2013small molecule.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code and datasets are available at https:\/\/github.com\/ibivu\/pipenn\/.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac071","type":"journal-article","created":{"date-parts":[[2022,2,4]],"date-time":"2022-02-04T15:13:14Z","timestamp":1643987594000},"page":"2111-2118","source":"Crossref","is-referenced-by-count":22,"title":["PIPENN: protein interface prediction from sequence with an ensemble of neural nets"],"prefix":"10.1093","volume":"38","author":[{"given":"Bas","family":"Stringer","sequence":"first","affiliation":[{"name":"Department of Computer Science, IBIVU\u2014Center for Integrative Bioinformatics, Vrije Universiteit , 1081HV Amsterdam, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hans","family":"de Ferrante","sequence":"additional","affiliation":[{"name":"Department of Computer Science, IBIVU\u2014Center for Integrative Bioinformatics, Vrije Universiteit , 1081HV Amsterdam, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2779-7174","authenticated-orcid":false,"given":"Sanne","family":"Abeln","sequence":"additional","affiliation":[{"name":"Department of Computer Science, IBIVU\u2014Center for Integrative Bioinformatics, Vrije Universiteit , 1081HV Amsterdam, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jaap","family":"Heringa","sequence":"additional","affiliation":[{"name":"Department of Computer Science, IBIVU\u2014Center for Integrative Bioinformatics, Vrije Universiteit , 1081HV Amsterdam, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6755-9667","authenticated-orcid":false,"given":"K Anton","family":"Feenstra","sequence":"additional","affiliation":[{"name":"Department of Computer Science, IBIVU\u2014Center for Integrative Bioinformatics, Vrije Universiteit , 1081HV Amsterdam, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4138-7179","authenticated-orcid":false,"given":"Reza","family":"Haydarlou","sequence":"additional","affiliation":[{"name":"Department of Computer Science, IBIVU\u2014Center for Integrative Bioinformatics, Vrije Universiteit , 1081HV Amsterdam, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2022,2,12]]},"reference":[{"key":"2023020109020521900_btac071-B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023020109020521900_btac071-B2","doi-asserted-by":"crossref","first-page":"e0141287","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PLoS One"},{"key":"2023020109020521900_btac071-B3","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023020109020521900_btac071-B4","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1186\/1471-2105-9-S12-S6","article-title":"Predicting RNA-binding sites of proteins using support vector machines and evolutionary information","volume":"9","author":"Cheng","year":"2008","journal-title":"BMC Bioinform"},{"key":"2023020109020521900_btac071-B5","first-page":"103","author":"Cho","year":"2014"},{"key":"2023020109020521900_btac071-B6","author":"Chung","year":"2014"},{"key":"2023020109020521900_btac071-B7","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1093\/bfgp\/elaa030","article-title":"Sequence representation approaches for sequence-based protein prediction tasks that use deep learning","volume":"20","author":"Cui","year":"2021","journal-title":"Brief. Funct. Genomics"},{"key":"2023020109020521900_btac071-B8","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1186\/s12859-019-2672-1","article-title":"Predicting protein-ligand binding residues with deep convolutional neural networks","volume":"20","author":"Cui","year":"2019","journal-title":"BMC Bioinform"},{"key":"2023020109020521900_btac071-B9","doi-asserted-by":"crossref","first-page":"2580","DOI":"10.1093\/bioinformatics\/btab154","article-title":"Protein interaction interface region prediction by geometric deep learning","volume":"37","author":"Dai","year":"2021","journal-title":"Bioinformatics"},{"key":"2023020109020521900_btac071-B10","article-title":"A guide to convolution arithmetic for deep learning","author":"Dumoulin","year":"2016"},{"key":"2023020109020521900_btac071-B11","first-page":"249","article-title":"Understanding the difficulty of training deep feedforward neural networks","volume":"9","author":"Glorot","year":"2010","journal-title":"J. Mach. Learn. Res"},{"key":"2023020109020521900_btac071-B12","doi-asserted-by":"publisher","DOI":"10.1101\/200857","article-title":"Dilated convolutions for modeling long-distance genomic dependencies","author":"Gupta","year":"2017"},{"key":"2023020109020521900_btac071-B13","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","article-title":"The meaning and use of the area under a receiver operating characteristic (ROC) curve","volume":"143","author":"Hanley","year":"1982","journal-title":"Radiology"},{"key":"2023020109020521900_btac071-B14","doi-asserted-by":"crossref","first-page":"4039","DOI":"10.1093\/bioinformatics\/bty481","article-title":"Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks","volume":"34","author":"Hanson","year":"2018","journal-title":"Bioinformatics"},{"key":"2023020109020521900_btac071-B15","author":"He","year":"2015"},{"key":"2023020109020521900_btac071-B16","first-page":"770","author":"He","year":"2016"},{"key":"2023020109020521900_btac071-B17","doi-asserted-by":"crossref","first-page":"630","DOI":"10.1007\/978-3-319-46493-0_38","volume-title":"Computer Vision\u2014ECCV 2016","author":"He","year":"2016"},{"key":"2023020109020521900_btac071-B18","first-page":"455","article-title":"Person segmentation using convolutional neural networks with dilated convolutions","volume":"2018","author":"Ho","year":"2018","journal-title":"Electron. Imaging"},{"key":"2023020109020521900_btac071-B19","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023020109020521900_btac071-B20","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1186\/s12859-015-0758-y","article-title":"Sequence specificity between interacting and non-interacting homologs identifies interface residues\u2014a homodimer and monomer use case","volume":"16","author":"Hou","year":"2015","journal-title":"BMC Bioinform"},{"key":"2023020109020521900_btac071-B21","doi-asserted-by":"crossref","first-page":"1479","DOI":"10.1093\/bioinformatics\/btx005","article-title":"Seeing the trees through the forest: sequence-based homo- and heteromeric protein-protein interaction sites prediction using random forest","volume":"33","author":"Hou","year":"2017","journal-title":"Bioinformatics"},{"key":"2023020109020521900_btac071-B22","doi-asserted-by":"crossref","first-page":"4794","DOI":"10.1093\/bioinformatics\/btz428","article-title":"SeRenDIP: Sequential RemasteriNg to DerIve profiles for fast and accurate predictions of PPI interface positions","volume":"35","author":"Hou","year":"2019","journal-title":"Bioinformatics"},{"key":"2023020109020521900_btac071-B23","doi-asserted-by":"crossref","first-page":"3421","DOI":"10.1093\/bioinformatics\/btab321","article-title":"SeRenDIP-CE: sequence-based interface prediction for conformational epitopes","volume":"37","author":"Hou","year":"2021","journal-title":"Bioinformatics"},{"key":"2023020109020521900_btac071-B24","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1073\/pnas.93.1.13","article-title":"Principles of protein-protein interactions","volume":"93","author":"Jones","year":"1996","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020109020521900_btac071-B25","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1101\/gr.227819.117","article-title":"Sequential regulatory activity prediction across chromosomes with convolutional neural networks","volume":"28","author":"Kelley","year":"2018","journal-title":"Genome Res"},{"key":"2023020109020521900_btac071-B26","first-page":"4765","volume-title":"Advances in Neural Information Processing Systems","author":"Lundberg","year":"2017"},{"key":"2023020109020521900_btac071-B27","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023020109020521900_btac071-B28","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1186\/1472-6807-9-51","article-title":"A generic method for assignment of reliability scores applied to solvent accessibility predictions","volume":"9","author":"Petersen","year":"2009","journal-title":"BMC Struct. Biol"},{"key":"2023020109020521900_btac071-B29","doi-asserted-by":"crossref","first-page":"e107","DOI":"10.1093\/nar\/gkw226","article-title":"DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences","volume":"44","author":"Quang","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023020109020521900_btac071-B30","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1007\/978-3-319-24574-4_28","article-title":"U-net: convolutional networks for biomedical image segmentation","volume":"9351","author":"Ronneberger","year":"2015","journal-title":"Lecture Notes Comput. Sci"},{"key":"2023020109020521900_btac071-B31","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1093\/bib\/bbz156","article-title":"Deep learning for mining protein data","volume":"22","author":"Shi","year":"2021","journal-title":"Brief. Bioinform"},{"key":"2023020109020521900_btac071-B32","doi-asserted-by":"crossref","first-page":"4585","DOI":"10.2174\/138161212802651661","article-title":"Editorial: toward the design of drugs on protein-protein interactions","volume":"18","author":"Sperandio","year":"2012","journal-title":"Curr. Pharm. Des"},{"key":"2023020109020521900_btac071-B33","doi-asserted-by":"crossref","first-page":"2102592","DOI":"10.1002\/advs.202102592","article-title":"Improved protein structure prediction using a new multi-scale network and homologous templates","volume":"8","author":"Su","year":"2021","journal-title":"Adv. Sci"},{"key":"2023020109020521900_btac071-B34","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1038\/s41586-021-03828-1","article-title":"Highly accurate protein structure prediction for the human proteome","volume":"596","author":"Tunyasuvunakool","year":"2021","journal-title":"Nature"},{"key":"2023020109020521900_btac071-B35","doi-asserted-by":"crossref","first-page":"D483","DOI":"10.1093\/nar\/gks1258","article-title":"SIFTS: Structure Integration with Function, Taxonomy and Sequences resource","volume":"41","author":"Velankar","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020109020521900_btac071-B36","doi-asserted-by":"crossref","first-page":"e1005324","DOI":"10.1371\/journal.pcbi.1005324","article-title":"Accurate de novo prediction of protein contact map by ultra-deep learning model","volume":"13","author":"Wang","year":"2017","journal-title":"PLoS Comput. Biol"},{"key":"2023020109020521900_btac071-B37","doi-asserted-by":"crossref","first-page":"1926156","DOI":"10.1155\/2019\/1926156","article-title":"SmoPSI: analysis and prediction of small molecule binding sites based on protein sequence information","volume":"2019","author":"Wang","year":"2019","journal-title":"Comput. Math. Methods Med"},{"key":"2023020109020521900_btac071-B38","first-page":"947","article-title":"Deep graph learning of inter-protein contacts","volume-title":"Bioinformatics","author":"Xie","year":"2021"},{"key":"2023020109020521900_btac071-B39","doi-asserted-by":"crossref","first-page":"D1096","DOI":"10.1093\/nar\/gks966","article-title":"BioLiP: a semi-manually curated database for biologically relevant ligand\u2013protein interactions","volume":"41","author":"Yang","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020109020521900_btac071-B40","article-title":"Multi-scale context aggregation by dilated convolutions","author":"Yu","year":"2016"},{"key":"2023020109020521900_btac071-B41","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1093\/bib\/bbx022","article-title":"Review and comparative assessment of sequence-based predictors of protein-binding residues","volume":"19","author":"Zhang","year":"2018","journal-title":"Brief. Bioinform"},{"key":"2023020109020521900_btac071-B42","doi-asserted-by":"crossref","first-page":"i343","DOI":"10.1093\/bioinformatics\/btz324","article-title":"SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences","volume":"35","author":"Zhang","year":"2019","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac071\/42616008\/btac071.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2111\/49009113\/btac071.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2111\/49009113\/btac071.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T18:59:51Z","timestamp":1700161191000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/8\/2111\/6527621"}},"subtitle":[],"editor":[{"given":"Jinbo","family":"Xu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2022,2,12]]},"references-count":42,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,4,12]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac071","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.09.03.458832","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,4,15]]},"published":{"date-parts":[[2022,2,12]]}}}