{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T14:32:21Z","timestamp":1780583541552,"version":"3.54.1"},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Machine learning may be the most popular computational tool in molecular biology. Providing sustained performance estimates is challenging. The standard cross-validation protocols usually fail in biology. Park and Marcotte found that even refined protocols fail for protein\u2013protein interactions (PPIs).<\/jats:p>\n               <jats:p>Results: Here, we sketch additional problems for the prediction of PPIs from sequence alone. First, it not only matters whether proteins A or B of a target interaction A\u2013B are similar to proteins of training interactions (positives), but also whether A or B are similar to proteins of non-interactions (negatives). Second, training on multiple interaction partners per protein did not improve performance for new proteins (not used to train). In contrary, a strictly non-redundant training that ignored good data slightly improved the prediction of difficult cases. Third, which prediction method appears to be best crucially depends on the sequence similarity between the test and the training set, how many true interactions should be found and the expected ratio of negatives to positives. The correct assessment of performance is the most complicated task in the development of prediction methods. Our analyses suggest that PPIs square the challenge for this task.<\/jats:p>\n               <jats:p>Availability and implementation: Datasets used in our analyses are available at https:\/\/rostlab.org\/owiki\/index.php\/PPI_challenges<\/jats:p>\n               <jats:p>Contact: \u00a0rost@in.tum.de<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu857","type":"journal-article","created":{"date-parts":[[2015,1,14]],"date-time":"2015-01-14T02:18:45Z","timestamp":1421201925000},"page":"1521-1525","source":"Crossref","is-referenced-by-count":68,"title":["More challenges for machine-learning protein interactions"],"prefix":"10.1093","volume":"31","author":[{"given":"Tobias","family":"Hamp","sequence":"first","affiliation":[{"name":"Department of Informatics, Bioinformatics and Computational Biology I12, Technische Universit\u00e4t M\u00fcnchen, 85748 Garching\/Munich, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Burkhard","family":"Rost","sequence":"additional","affiliation":[{"name":"Department of Informatics, Bioinformatics and Computational Biology I12, Technische Universit\u00e4t M\u00fcnchen, 85748 Garching\/Munich, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2015,1,12]]},"reference":[{"key":"2023020115413097700_btu857-B1","doi-asserted-by":"crossref","first-page":"900","DOI":"10.1093\/bioinformatics\/bts050","article-title":"Toward community standards in the quest for orthologs","volume":"28","author":"Dessimoz","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020115413097700_btu857-B2","doi-asserted-by":"crossref","first-page":"3025","DOI":"10.1093\/nar\/gkn159","article-title":"Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences","volume":"36","author":"Guo","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023020115413097700_btu857-B3","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/1656274.1656278","article-title":"The WEKA data mining software: an update","volume":"11","author":"Hall","year":"2009","journal-title":"SIGKDD Explor. Newsl."},{"key":"2023020115413097700_btu857-B4","first-page":"1137","article-title":"A study of cross-validation and bootstrap for accuracy estimation and model selection","volume-title":"IJCAI'95 Proceedings","author":"Kohavi","year":"1995"},{"key":"2023020115413097700_btu857-B5","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1093\/bioinformatics\/bth483","article-title":"Predicting protein-protein interactions using signature products","volume":"21","author":"Martin","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020115413097700_btu857-B6","doi-asserted-by":"crossref","first-page":"3789","DOI":"10.1093\/nar\/gkg620","article-title":"UniqueProt: creating representative protein sequence sets","volume":"31","author":"Mika","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023020115413097700_btu857-B7","doi-asserted-by":"crossref","first-page":"e79","DOI":"10.1371\/journal.pcbi.0020079","article-title":"Protein\u2013protein interactions more conserved within species than across species","volume":"2","author":"Mika","year":"2006","journal-title":"PLoS Comput. Biol."},{"key":"2023020115413097700_btu857-B8","doi-asserted-by":"crossref","first-page":"1134","DOI":"10.1038\/nmeth.2259","article-title":"Flaws in evaluation schemes for pair-input computational predictions","volume":"9","author":"Park","year":"2012","journal-title":"Nat. Methods"},{"key":"2023020115413097700_btu857-B9","doi-asserted-by":"crossref","first-page":"4286","DOI":"10.1093\/nar\/gkn390","article-title":"Global investigation of protein-protein interactions in yeast Saccharomyces cerevisiae using re-occurring short polypeptide sequences","volume":"36","author":"Pitre","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023020115413097700_btu857-B10","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1038\/nbt.2831","article-title":"The binary protein-protein interaction landscape of Escherichia coli","volume":"32","author":"Rajagopala","year":"2014","journal-title":"Nat. Biotechnol."},{"key":"2023020115413097700_btu857-B11","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1093\/protein\/12.2.85","article-title":"Twilight zone of protein sequence alignments","volume":"12","author":"Rost","year":"1999","journal-title":"Protein Eng."},{"key":"2023020115413097700_btu857-B12","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1016\/S0022-2836(02)00016-5","article-title":"Enzyme function less conserved than anticipated","volume":"318","author":"Rost","year":"2002","journal-title":"J. Mol. Biol."},{"key":"2023020115413097700_btu857-B13","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1006\/jmbi.1993.1413","article-title":"Prediction of protein secondary structure at better than 70% accuracy","volume":"232","author":"Rost","year":"1993","journal-title":"J Mol. Biol."},{"key":"2023020115413097700_btu857-B14","doi-asserted-by":"crossref","first-page":"D449","DOI":"10.1093\/nar\/gkh086","article-title":"The database of interacting proteins: 2004 update","volume":"32","author":"Salwinski","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023020115413097700_btu857-B15","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1186\/1471-2105-11-605","article-title":"New insights into protein-protein interaction data lead to increased estimates of the S. cerevisiae interactome size","volume":"11","author":"Sambourg","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020115413097700_btu857-B16","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1002\/prot.340090107","article-title":"Database of homology-derived structures and the structural meaning of sequence alignment","volume":"9","author":"Sander","year":"1991","journal-title":"Proteins"},{"key":"2023020115413097700_btu857-B17","doi-asserted-by":"crossref","first-page":"e31826","DOI":"10.1371\/journal.pone.0031826","article-title":"HIPPIE: Integrating protein interaction networks with experiment based quality scores","volume":"7","author":"Schaefer","year":"2012","journal-title":"PLoS One"},{"key":"2023020115413097700_btu857-B18","doi-asserted-by":"crossref","first-page":"6959","DOI":"10.1073\/pnas.0708078105","article-title":"Estimating the size of the human interactome","volume":"105","author":"Stumpf","year":"2008","journal-title":"PNAS"},{"key":"2023020115413097700_btu857-B19","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1038\/nmeth.1280","article-title":"An empirical framework for binary interactome mapping","volume":"6","author":"Venkatesan","year":"2009","journal-title":"Nat Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/10\/1521\/49012547\/bioinformatics_31_10_1521.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/10\/1521\/49012547\/bioinformatics_31_10_1521.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:06:11Z","timestamp":1675296371000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/10\/1521\/176646"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,12]]},"references-count":19,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2015,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu857","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,5,15]]},"published":{"date-parts":[[2015,1,12]]}}}