{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T15:16:23Z","timestamp":1774624583873,"version":"3.50.1"},"reference-count":65,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2024,1,12]],"date-time":"2024-01-12T00:00:00Z","timestamp":1705017600000},"content-version":"vor","delay-in-days":11,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NSERC Discovery","award":["RGPIN 2021-03978"],"award-info":[{"award-number":["RGPIN 2021-03978"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,1,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Proteins accomplish cellular functions by interacting with each other, which makes the prediction of interaction sites a fundamental problem. As experimental methods are expensive and time consuming, computational prediction of the interaction sites has been studied extensively. Structure-based programs are the most accurate, while the sequence-based ones are much more widely applicable, as the sequences available outnumber the structures by two orders of magnitude. Ideally, we would like a tool that has the quality of the former and the applicability of the latter.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We provide here the first solution that achieves these two goals. Our new sequence-based program, Seq-InSite, greatly surpasses the performance of sequence-based models, matching the quality of state-of-the-art structure-based predictors, thus effectively superseding the need for models requiring structure. The predictive power of Seq-InSite is illustrated using an analysis of evolutionary conservation for four protein sequences.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Seq-InSite is freely available as a web server at http:\/\/seq-insite.csd.uwo.ca\/ and as free source code, including trained models and all datasets used for training and testing, at https:\/\/github.com\/lucian-ilie\/Seq-InSite.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad738","type":"journal-article","created":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T01:59:50Z","timestamp":1704851990000},"source":"Crossref","is-referenced-by-count":22,"title":["Seq-InSite: sequence supersedes structure for protein interaction site prediction"],"prefix":"10.1093","volume":"40","author":[{"given":"SeyedMohsen","family":"Hosseini","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Western Ontario , London, ON N6A 5B7, Canada"}]},{"given":"G Brian","family":"Golding","sequence":"additional","affiliation":[{"name":"Department of Biology, McMaster University , Hamilton, ON L8S 4K1, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1856-3509","authenticated-orcid":false,"given":"Lucian","family":"Ilie","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Western Ontario , London, ON N6A 5B7, Canada"}]}],"member":"286","published-online":{"date-parts":[[2024,1,11]]},"reference":[{"key":"2024011822461626100_btad738-B1","author":"Abadi","year":"2016"},{"key":"2024011822461626100_btad738-B2","doi-asserted-by":"crossref","first-page":"e0141287","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PLoS One"},{"key":"2024011822461626100_btad738-B3","first-page":"189","volume-title":"Bacterial Protein Secretion Systems. Methods in Molecular Biology, Vol. 1615","author":"Atmakuri","year":"2017"},{"key":"2024011822461626100_btad738-B4","doi-asserted-by":"crossref","first-page":"e1618","DOI":"10.1002\/wcms.1618","article-title":"Machine learning solutions for predicting protein\u2013protein interactions","volume":"12","author":"Casadio","year":"2022","journal-title":"Wiley Interdiscipl Rev Comput Mol Sci"},{"key":"2024011822461626100_btad738-B5","first-page":"233","author":"Davis","year":"2006"},{"key":"2024011822461626100_btad738-B6","author":"Devlin","year":"2018"},{"key":"2024011822461626100_btad738-B7","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.jtbi.2014.01.028","article-title":"Sequence-based prediction of protein\u2013protein interaction sites with L1-logreg classifier","volume":"348","author":"Dhole","year":"2014","journal-title":"J Theor Biol"},{"key":"2024011822461626100_btad738-B8","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/978-1-4939-7033-9_21","volume-title":"Bacterial Protein Secretion Systems","author":"Douzi","year":"2017"},{"key":"2024011822461626100_btad738-B9","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2024011822461626100_btad738-B10","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"Prottrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2024011822461626100_btad738-B11","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2024011822461626100_btad738-B12","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1038\/s41592-019-0666-6","article-title":"Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning","volume":"17","author":"Gainza","year":"2020","journal-title":"Nat Methods"},{"key":"2024011822461626100_btad738-B13","doi-asserted-by":"crossref","first-page":"3168","DOI":"10.1038\/s41467-021-23303-9","article-title":"Structure-based protein function prediction using graph convolutional networks","volume":"12","author":"Gligorijevi\u0107","year":"2021","journal-title":"Nat Commun"},{"key":"2024011822461626100_btad738-B14","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2024011822461626100_btad738-B15","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1186\/s12859-019-3220-8","article-title":"Modeling aspects of the language of life through transfer-learning protein sequences","volume":"20","author":"Heinzinger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2024011822461626100_btad738-B16","doi-asserted-by":"crossref","first-page":"D360","DOI":"10.1093\/nar\/gkn659","article-title":"PiSite: a database of protein interaction sites using multiple binding states in the PDB","volume":"37","author":"Higurashi","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2024011822461626100_btad738-B17","doi-asserted-by":"crossref","first-page":"12814","DOI":"10.3390\/ijms232112814","article-title":"PITHIA: protein interaction site prediction using multiple sequence alignments and attention","volume":"23","author":"Hosseini","year":"2022","journal-title":"Int J Mol Sci"},{"key":"2024011822461626100_btad738-B18","doi-asserted-by":"crossref","first-page":"115132","DOI":"10.1016\/j.ab.2023.115132","article-title":"Improving protein\u2013protein interaction site prediction using deep residual neural network","volume":"670","author":"Hu","year":"2023","journal-title":"Anal Biochem"},{"key":"2024011822461626100_btad738-B19","doi-asserted-by":"crossref","first-page":"3198","DOI":"10.1016\/j.csbj.2021.05.039","article-title":"Representation learning applications in biological sequence analysis","volume":"19","author":"Iuchi","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2024011822461626100_btad738-B20","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1126\/science.1087361","article-title":"A Bayesian networks approach for predicting protein\u2013protein interactions from genomic data","volume":"302","author":"Jansen","year":"2003","journal-title":"Science"},{"key":"2024011822461626100_btad738-B21","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1073\/pnas.93.1.13","article-title":"Principles of protein\u2013protein interactions","volume":"93","author":"Jones","year":"1996","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024011822461626100_btad738-B22","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with alphafold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2024011822461626100_btad738-B23","doi-asserted-by":"crossref","first-page":"bbac480","DOI":"10.1093\/bib\/bbac480","article-title":"HN-PPISP: a hybrid network based on MLP-Mixer for protein\u2013protein interaction site prediction","volume":"24","author":"Kang","year":"2023","journal-title":"Brief Bioinform"},{"key":"2024011822461626100_btad738-B24","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/978-1-4939-7033-9_13","volume-title":"Bacterial Protein Secretion Systems","author":"Karimova","year":"2017"},{"key":"2024011822461626100_btad738-B25","doi-asserted-by":"crossref","first-page":"2117","DOI":"10.3390\/cells11132117","article-title":"Prob-site: protein binding site prediction using local features","volume":"11","author":"Khan","year":"2022","journal-title":"Cells"},{"key":"2024011822461626100_btad738-B26","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1093\/bioinformatics\/btz595","article-title":"DeepGOPlus: improved protein function prediction from sequence","volume":"36","author":"Kulmanov","year":"2020","journal-title":"Bioinformatics"},{"key":"2024011822461626100_btad738-B27","doi-asserted-by":"crossref","first-page":"bbab502","DOI":"10.1093\/bib\/bbab502","article-title":"Accurate protein function prediction via graph attention networks with predicted structure information","volume":"23","author":"Lai","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024011822461626100_btad738-B28","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1093\/bioinformatics\/btaa750","article-title":"DELPHI: accurate deep ensemble model for protein interaction sites prediction","volume":"37","author":"Li","year":"2021","journal-title":"Bioinformatics"},{"key":"2024011822461626100_btad738-B29","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/978-1-4939-7033-9_17","volume-title":"Bacterial Protein Secretion Systems","author":"Lin","year":"2017"},{"key":"2024011822461626100_btad738-B30","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1007\/978-1-4939-7033-9_20","volume-title":"Bacterial Protein Secretion Systems","author":"Louche","year":"2017"},{"key":"2024011822461626100_btad738-B31","first-page":"141","author":"Lu","year":"2021"},{"key":"2024011822461626100_btad738-B32","doi-asserted-by":"crossref","first-page":"bbab578","DOI":"10.1093\/bib\/bbab578","article-title":"EGRET: edge aggregated graph attention networks and transfer learning improve protein\u2013protein interaction site prediction","volume":"23","author":"Mahbub","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024011822461626100_btad738-B33","doi-asserted-by":"crossref","first-page":"167963","DOI":"10.1016\/j.jmb.2023.167963","article-title":"Ispred-seq: deep neural networks and embeddings for predicting interaction sites in protein sequences","volume":"435","author":"Manfredi","year":"2023","journal-title":"J Mol Biol"},{"key":"2024011822461626100_btad738-B34","author":"Mikolov","year":"2013"},{"key":"2024011822461626100_btad738-B35","doi-asserted-by":"crossref","first-page":"1841","DOI":"10.1093\/bioinformatics\/btq302","article-title":"Applying the Na\u00efve Bayes classifier with kernel density estimation to the prediction of protein\u2013protein interaction sites","volume":"26","author":"Murakami","year":"2010","journal-title":"Bioinformatics"},{"key":"2024011822461626100_btad738-B36","first-page":"1","author":"Nambiar","year":"2020"},{"key":"2024011822461626100_btad738-B37","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1093\/molbev\/msu300","article-title":"IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies","volume":"32","author":"Nguyen","year":"2015","journal-title":"Mol Biol Evol"},{"key":"2024011822461626100_btad738-B38","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1016\/j.csbj.2021.03.022","article-title":"The language of proteins: NLP, machine learning & protein sequences","volume":"19","author":"Ofer","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2024011822461626100_btad738-B39","first-page":"1532","author":"Pennington","year":"2014"},{"key":"2024011822461626100_btad738-B40","author":"Peters","year":"2018"},{"key":"2024011822461626100_btad738-B41","doi-asserted-by":"crossref","first-page":"630","DOI":"10.1002\/prot.21248","article-title":"Prediction-based fingerprints of protein\u2013protein interactions","volume":"66","author":"Porollo","year":"2007","journal-title":"Proteins Struct Funct Bioinf"},{"key":"2024011822461626100_btad738-B42","doi-asserted-by":"crossref","first-page":"2428","DOI":"10.1016\/j.jmb.2020.02.026","article-title":"Prona2020 predicts protein\u2013DNA, protein\u2013RNA, and protein\u2013protein binding proteins and residues from sequence","volume":"432","author":"Qiu","year":"2020","journal-title":"J Mol Biol"},{"key":"2024011822461626100_btad738-B43","first-page":"5485","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2024011822461626100_btad738-B44","first-page":"8844","author":"Rao","year":"2021"},{"key":"2024011822461626100_btad738-B45","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024011822461626100_btad738-B46","doi-asserted-by":"crossref","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision\u2013recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLoS One"},{"key":"2024011822461626100_btad738-B47","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1038\/s41586-019-1923-7","article-title":"Improved protein structure prediction using potentials from deep learning","volume":"577","author":"Senior","year":"2020","journal-title":"Nature"},{"key":"2024011822461626100_btad738-B48","author":"Singh","year":"2014"},{"key":"2024011822461626100_btad738-B49","doi-asserted-by":"crossref","first-page":"5316","DOI":"10.1016\/j.csbj.2022.08.070","article-title":"Protein\u2013protein interaction prediction with deep learning: a comprehensive review","volume":"20","author":"Soleymani","year":"2022","journal-title":"Comput Struct Biotechnol J"},{"key":"2024011822461626100_btad738-B50","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2024011822461626100_btad738-B51","doi-asserted-by":"crossref","first-page":"2542","DOI":"10.1038\/s41467-018-04964-5","article-title":"Clustering huge protein sequence sets in linear time","volume":"9","author":"Steinegger","year":"2018","journal-title":"Nat Commun"},{"key":"2024011822461626100_btad738-B52","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1038\/s41592-019-0437-4","article-title":"Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold","volume":"16","author":"Steinegger","year":"2019","journal-title":"Nat Methods"},{"key":"2024011822461626100_btad738-B53","doi-asserted-by":"crossref","first-page":"2111","DOI":"10.1093\/bioinformatics\/btac071","article-title":"PIPENN: protein interface prediction from sequence with an ensemble of neural nets","volume":"38","author":"Stringer","year":"2022","journal-title":"Bioinformatics"},{"key":"2024011822461626100_btad738-B54","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","article-title":"UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches","volume":"31","author":"Suzek","year":"2015","journal-title":"Bioinformatics"},{"key":"2024011822461626100_btad738-B55","doi-asserted-by":"crossref","first-page":"1223","DOI":"10.1002\/jcc.24314","article-title":"Sequence-based prediction of protein\u2013peptide binding sites using support vector machine","volume":"37","author":"Taherzadeh","year":"2016","journal-title":"J Comput Chem"},{"key":"2024011822461626100_btad738-B56","doi-asserted-by":"crossref","first-page":"D506","DOI":"10.1093\/nar\/gky1049","article-title":"Uniprot: a worldwide hub of protein knowledge","volume":"47","author":"UniProt","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024011822461626100_btad738-B57","doi-asserted-by":"crossref","first-page":"5961","DOI":"10.1021\/acs.jcim.2c01092","article-title":"RGN: residue-based graph attention and convolutional network for protein\u2013protein interaction site prediction","volume":"62","author":"Wang","year":"2022","journal-title":"J Chem Inf Model"},{"key":"2024011822461626100_btad738-B58","doi-asserted-by":"crossref","first-page":"746","DOI":"10.1109\/TNB.2015.2475359","article-title":"A Cascade random forests algorithm for predicting protein\u2013protein interaction sites","volume":"14","author":"Wei","year":"2015","journal-title":"IEEE Trans Nanobiosci"},{"key":"2024011822461626100_btad738-B59","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1016\/j.neucom.2016.02.022","article-title":"Protein\u2013protein interaction sites prediction by ensembling svm and sample-weighted random forests","volume":"193","author":"Wei","year":"2016","journal-title":"Neurocomputing"},{"key":"2024011822461626100_btad738-B60","doi-asserted-by":"crossref","first-page":"1496","DOI":"10.1073\/pnas.1914677117","article-title":"Improved protein structure prediction using predicted interresidue orientations","volume":"117","author":"Yang","year":"2020","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024011822461626100_btad738-B61","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1093\/bioinformatics\/btab643","article-title":"Structure-aware protein\u2013protein interaction site prediction using deep graph convolutional network","volume":"38","author":"Yuan","year":"2022","journal-title":"Bioinformatics"},{"key":"2024011822461626100_btad738-B62","doi-asserted-by":"crossref","first-page":"1114","DOI":"10.1093\/bioinformatics\/btz699","article-title":"Protein\u2013protein interaction site prediction through combining local and global features with deep neural networks","volume":"36","author":"Zeng","year":"2020","journal-title":"Bioinformatics"},{"key":"2024011822461626100_btad738-B63","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.neucom.2019.05.013","article-title":"Sequence-based prediction of protein\u2013protein interaction sites by simplified long short-term memory network","volume":"357","author":"Zhang","year":"2019","journal-title":"Neurocomputing"},{"key":"2024011822461626100_btad738-B64","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1093\/bib\/bbx022","article-title":"Review and comparative assessment of sequence-based predictors of protein-binding residues","volume":"19","author":"Zhang","year":"2018","journal-title":"Brief Bioinform"},{"key":"2024011822461626100_btad738-B65","doi-asserted-by":"crossref","first-page":"i343","DOI":"10.1093\/bioinformatics\/btz324","article-title":"SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences","volume":"35","author":"Zhang","year":"2019","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/1\/btad738\/56207082\/btad738.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad738\/55520224\/btad738.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/1\/btad738\/56207082\/btad738.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,18]],"date-time":"2024-01-18T17:48:59Z","timestamp":1705600139000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad738\/7517105"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,1,1]]},"references-count":65,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad738","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.06.19.545575","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,1,1]]},"published":{"date-parts":[[2024,1,1]]},"article-number":"btad738"}}