{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T22:34:08Z","timestamp":1781044448985,"version":"3.54.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"20","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: A number of methods have been reported that predict protein\u2013protein interactions (PPIs) with high accuracy using only simple sequence-based features such as amino acid 3mer content. This is surprising, given that many protein interactions have high specificity that depends on detailed atomic recognition between physiochemically complementary surfaces. Are the reported high accuracies realistic?<\/jats:p>\n               <jats:p>Results: We find that the reported accuracies of the predictions are significantly over-estimated, and strongly dependent on the structure of the training and testing datasets used. The choice of which protein pairs are deemed as non-interactions in the training data has a variable impact on the accuracy estimates, and the accuracies can be artificially inflated by a bias towards dominant samples in the positive data which result from the presence of hub proteins in the protein interaction network. To address this bias, we propose a positive set-specific method to create a \u2018balanced\u2019 negative set maintaining the degree distribution for each protein, leading to the conclusion that simple sequence-based features contain insufficient information to be useful for predicting PPIs, but that protein domain-based features have some predictive value.<\/jats:p>\n               <jats:p>Availability: Our method, named \u2018BRS-nonint\u2019, is available at http:\/\/www.bioinformatics.leeds.ac.uk\/BRS-nonint\/. 
All the datasets used in this study are derived from publicly available data, and are available at http:\/\/www.bioinformatics.leeds.ac.uk\/BRS-nonint\/PPI_RandomBalance.html<\/jats:p>\n               <jats:p>Contact: \u00a0maozuguo@hit.edu.cn; d.r.westhead@leeds.ac.uk<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq483","type":"journal-article","created":{"date-parts":[[2010,8,28]],"date-time":"2010-08-28T00:40:38Z","timestamp":1282956038000},"page":"2610-2614","source":"Crossref","is-referenced-by-count":98,"title":["Simple sequence-based kernels do not predict protein\u2013protein interactions"],"prefix":"10.1093","volume":"26","author":[{"given":"Jiantao","family":"Yu","sequence":"first","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2Institute of Molecular and Cellular Biology, 3School of Computing, University of Leeds, Leeds, LS2 9JT, UK and 4School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China"},{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2Institute of Molecular and Cellular Biology, 3School of Computing, University of Leeds, Leeds, LS2 9JT, UK and 4School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Maozu","family":"Guo","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2Institute of Molecular and Cellular Biology, 3School of Computing, University of Leeds, Leeds, LS2 9JT, UK and 4School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chris J.","family":"Needham","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2Institute of Molecular and Cellular Biology, 3School of Computing, University of Leeds, Leeds, LS2 9JT, UK and 4School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yangchao","family":"Huang","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2Institute of Molecular and Cellular Biology, 3School of Computing, University of Leeds, Leeds, LS2 9JT, UK and 4School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lu","family":"Cai","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2Institute of Molecular and Cellular Biology, 3School of Computing, University of Leeds, Leeds, LS2 9JT, UK and 4School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David R.","family":"Westhead","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2Institute of Molecular and Cellular Biology, 3School of Computing, University of Leeds, Leeds, LS2 9JT, UK and 4School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 
014010, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2010,8,27]]},"reference":[{"key":"2023012507554452900_B1","doi-asserted-by":"crossref","first-page":"e154","DOI":"10.1371\/journal.pbio.0050154","article-title":"Still stratus not altocumulus: further evidence against the date\/party hub distinction","volume":"5","author":"Batada","year":"2007","journal-title":"PLoS Biol."},{"key":"2023012507554452900_B2","doi-asserted-by":"crossref","first-page":"i38","DOI":"10.1093\/bioinformatics\/bti1016","article-title":"Kernel methods for predicting protein-protein interactions","volume":"21","author":"Ben-Hur","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012507554452900_B3","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1093\/bioinformatics\/17.5.455","article-title":"Predicting protein-protein interactions from primary structure","volume":"17","author":"Bock","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012507554452900_B4","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1021\/pr050331g","article-title":"Predicting protein-protein interactions from sequences in a hybridization space","volume":"5","author":"Chou","year":"2006","journal-title":"J. Proteome Res."},{"key":"2023012507554452900_B5","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1126\/science.1180823","article-title":"The genetic landscape of a cell","volume":"327","author":"Costanzo","year":"2010","journal-title":"Science"},{"key":"2023012507554452900_B6","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1038\/nmeth0110-10b","article-title":"The importance of being negative","volume":"7","author":"Doerr","year":"2010","journal-title":"Nat. 
Methods"},{"key":"2023012507554452900_B7","doi-asserted-by":"crossref","first-page":"D211","DOI":"10.1093\/nar\/gkp985","article-title":"The Pfam protein families database","volume":"38","author":"Finn","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012507554452900_B8","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1038\/nature04532","article-title":"Proteome survey reveals modularity of the yeast cell machinery","volume":"440","author":"Gavin","year":"2006","journal-title":"Nature"},{"key":"2023012507554452900_B9","doi-asserted-by":"crossref","first-page":"1875","DOI":"10.1093\/bioinformatics\/btg352","article-title":"Learning to predict protein-protein interactions from protein sequences","volume":"19","author":"Gomez","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012507554452900_B10","doi-asserted-by":"crossref","first-page":"3025","DOI":"10.1093\/nar\/gkn159","article-title":"Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences","volume":"36","author":"Guo","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012507554452900_B11","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1186\/1756-0500-3-145","article-title":"PRED_PPI: a server for predicting protein-protein interactions based on sequence data with probability assignment","volume":"3","author":"Guo","year":"2010","journal-title":"BMC Res. Notes"},{"key":"2023012507554452900_B12","doi-asserted-by":"crossref","first-page":"4569","DOI":"10.1073\/pnas.061034498","article-title":"A comprehensive two-hybrid analysis to explore the yeast protein interactome","volume":"98","author":"Ito","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507554452900_B13","first-page":"564","article-title":"The spectrum kernel: a string kernel for SVM protein classification","author":"Leslie","year":"2002","journal-title":"Proc. Pac. Symp. 
Biocomput."},{"key":"2023012507554452900_B14","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1093\/bioinformatics\/bth483","article-title":"Predicting protein-protein interactions using signature products","volume":"21","author":"Martin","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012507554452900_B15","doi-asserted-by":"crossref","first-page":"1207","DOI":"10.1093\/bioinformatics\/btl055","article-title":"An ensemble of K-local hyperplanes for predicting protein-protein interactions","volume":"22","author":"Nanni","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012507554452900_B16","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1186\/1471-2105-10-419","article-title":"Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences","volume":"10","author":"Park","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012507554452900_B17","doi-asserted-by":"crossref","first-page":"D497","DOI":"10.1093\/nar\/gkh070","article-title":"Human protein reference database as a discovery resource for proteomics","volume":"32","author":"Peri","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012507554452900_B18","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1186\/1471-2105-7-365","article-title":"PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs","volume":"7","author":"Pitre","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012507554452900_B19","doi-asserted-by":"crossref","first-page":"4286","DOI":"10.1093\/nar\/gkn390","article-title":"Global investigation of protein-protein interactions in yeast Saccharomyces cerevisiae using re-occuring short polypeptide sequences","volume":"36","author":"Pitre","year":"2008","journal-title":"Nucleic Acids 
Res."},{"key":"2023012507554452900_B20","doi-asserted-by":"crossref","first-page":"e7813","DOI":"10.1371\/journal.pone.0007813","article-title":"Exploiting amino acid composition for predicting protein-protein interactions","volume":"4","author":"Roy","year":"2009","journal-title":"PLoS ONE"},{"key":"2023012507554452900_B21","doi-asserted-by":"crossref","first-page":"2498","DOI":"10.1101\/gr.1239303","article-title":"Cytoscape: a software environment for integrated models of biomolecular interaction networks","volume":"13","author":"Shannon","year":"2003","journal-title":"Genome Res."},{"key":"2023012507554452900_B22","doi-asserted-by":"crossref","first-page":"4337","DOI":"10.1073\/pnas.0607879104","article-title":"Predicting protein-protein interactions based only on sequences information","volume":"104","author":"Shen","year":"2007","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507554452900_B23","doi-asserted-by":"crossref","first-page":"D540","DOI":"10.1093\/nar\/gkp1026","article-title":"The negatome database: a reference set of non-interacting protein pairs","volume":"38","author":"Smialowski","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012507554452900_B24","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1006\/jmbi.2001.4920","article-title":"Correlated sequence-signatures as markers of protein-protein interaction","volume":"311","author":"Sprinzak","year":"2001","journal-title":"J. Mol. 
Biol."},{"key":"2023012507554452900_B25","doi-asserted-by":"crossref","first-page":"D535","DOI":"10.1093\/nar\/gkj109","article-title":"BioGRID: a general repository for interaction datasets","volume":"34","author":"Stark","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012507554452900_B26","doi-asserted-by":"crossref","first-page":"957","DOI":"10.1016\/j.cell.2005.08.029","article-title":"A human protein-protein interaction network: a resource for annotating the proteome","volume":"122","author":"Stelzl","year":"2005","journal-title":"Cell"},{"key":"2023012507554452900_B27","doi-asserted-by":"crossref","first-page":"D142","DOI":"10.1093\/nar\/gkp846","article-title":"The Universal Protein Resource (UniProt) in 2010","volume":"38","author":"The UniProt Consortium","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012507554452900_B28","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1186\/1471-2105-11-167","article-title":"Predicting protein-protein interactions in unbalanced data using the primary structure of proteins","volume":"11","author":"Yu","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012507554452900_B29","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1126\/science.1158684","article-title":"High-quality binary protein interaction map of the yeast interactome 
network","volume":"322","author":"Yu","year":"2008","journal-title":"Science"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/20\/2610\/48852018\/bioinformatics_26_20_2610.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/20\/2610\/48852018\/bioinformatics_26_20_2610.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T07:56:13Z","timestamp":1674633373000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/20\/2610\/194618"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,8,27]]},"references-count":29,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2010,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq483","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,10,15]]},"published":{"date-parts":[[2010,8,27]]}}}