{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T17:50:30Z","timestamp":1775325030078,"version":"3.50.1"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"S1","license":[{"start":{"date-parts":[[2006,3,1]],"date-time":"2006-03-01T00:00:00Z","timestamp":1141171200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"},{"start":{"date-parts":[[2006,3,20]],"date-time":"2006-03-20T00:00:00Z","timestamp":1142812800000},"content-version":"vor","delay-in-days":19,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples makes the task easier. 
The effects of this bias are demonstrated in the context of both sequence-based and non-sequence based features used for predicting protein-protein interactions.<\/jats:p>","DOI":"10.1186\/1471-2105-7-s1-s2","type":"journal-article","created":{"date-parts":[[2006,4,20]],"date-time":"2006-04-20T15:42:44Z","timestamp":1145547764000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":168,"title":["Choosing negative examples for the prediction of protein-protein interactions"],"prefix":"10.1186","volume":"7","author":[{"given":"Asa","family":"Ben-Hur","sequence":"first","affiliation":[]},{"given":"William Stafford","family":"Noble","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2006,3,20]]},"reference":[{"key":"1285_CR1","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1038\/nature750","volume":"417","author":"C von Mering","year":"2002","unstructured":"von Mering C, Krause R, Snel B, Cornell M, Olivier SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417: 399\u2013403. 10.1038\/nature750","journal-title":"Nature"},{"key":"1285_CR2","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1006\/jmbi.2001.4920","volume":"311","author":"E Sprinzak","year":"2001","unstructured":"Sprinzak E, Margalit H: Correlated sequence-signatures as markers of protein-protein interaction. Journal of Molecular Biology 2001, 311: 681\u2013692. 10.1006\/jmbi.2001.4920","journal-title":"Journal of Molecular Biology"},{"issue":"10","key":"1285_CR3","doi-asserted-by":"publisher","first-page":"1540","DOI":"10.1101\/gr.153002","volume":"12","author":"M Deng","year":"2002","unstructured":"Deng M, Mehta S, Sun F, Chen T: Inferring domain-domain interactions from protein-protein interactions. Genome Research 2002, 12(10):1540\u20131548. 
10.1101\/gr.153002","journal-title":"Genome Research"},{"key":"1285_CR4","doi-asserted-by":"publisher","first-page":"1875","DOI":"10.1093\/bioinformatics\/btg352","volume":"19","author":"SM Gomez","year":"2003","unstructured":"Gomez SM, Noble WS, Rzhetsky A: Learning to predict protein-protein interactions. Bioinformatics 2003, 19: 1875\u20131881. 10.1093\/bioinformatics\/btg352","journal-title":"Bioinformatics"},{"key":"1285_CR5","first-page":"1465","volume-title":"Advances in Neural Information Processing Systems 17","author":"H Wang","year":"2005","unstructured":"Wang H, Segal E, Ben-Hur A, Koller D, Brutlag DL: Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale. In Advances in Neural Information Processing Systems 17. Edited by: Saul LK, Weiss Y, Bottou L. Cambridge, MA: MIT Press; 2005:1465\u20131472."},{"issue":"2","key":"1285_CR6","doi-asserted-by":"publisher","first-page":"218","DOI":"10.1093\/bioinformatics\/bth483","volume":"21","author":"S Martin","year":"2005","unstructured":"Martin S, Roe D, Faulon JL: Predicting protein-protein interactions using signature products. Bioinformatics 2005, 21(2):218\u2013226. 10.1093\/bioinformatics\/bth483","journal-title":"Bioinformatics"},{"issue":"suppl 1","key":"1285_CR7","doi-asserted-by":"publisher","first-page":"i38","DOI":"10.1093\/bioinformatics\/bti1016","volume":"21","author":"A Ben-Hur","year":"2005","unstructured":"Ben-Hur A, Noble WS: Kernel methods for predicting protein-protein interactions. Bioinformatics 2005, 21(suppl 1):i38-i46. 10.1093\/bioinformatics\/bti1016","journal-title":"Bioinformatics"},{"key":"1285_CR8","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/S0022-2836(03)00114-1","volume":"327","author":"A Ramani","year":"2003","unstructured":"Ramani A, Marcotte E: Exploiting the co-evolution of interacting proteins to discover interaction specificity. Journal of Molecular Biology 2003, 327: 273\u2013284. 
10.1016\/S0022-2836(03)00114-1","journal-title":"Journal of Molecular Biology"},{"issue":"2","key":"1285_CR9","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1002\/prot.10074","volume":"47","author":"F Pazos","year":"2002","unstructured":"Pazos F, Valencia A: In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins: Structure, Function and Genetics 2002, 47(2):219\u2013227. 10.1002\/prot.10074","journal-title":"Proteins: Structure, Function and Genetics"},{"key":"1285_CR10","doi-asserted-by":"publisher","first-page":"751","DOI":"10.1126\/science.285.5428.751","volume":"285","author":"EM Marcotte","year":"1999","unstructured":"Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285: 751\u2013753. 10.1126\/science.285.5428.751","journal-title":"Science"},{"key":"1285_CR11","doi-asserted-by":"publisher","first-page":"449","DOI":"10.1126\/science.1087361","volume":"302","author":"R Jansen","year":"2003","unstructured":"Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 2003, 302: 449\u2013453. 10.1126\/science.1087361","journal-title":"Science"},{"key":"1285_CR12","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1186\/1471-2105-5-38","volume":"5","author":"LV Zhang","year":"2004","unstructured":"Zhang LV, Wong S, King O, Roth F: Predicting co-complexed protein pairs using genomic and proteomic data integration. BMC Bioinformatics 2004, 5: 38\u201353. 
10.1186\/1471-2105-5-38","journal-title":"BMC Bioinformatics"},{"key":"1285_CR13","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1186\/1471-2105-5-154","volume":"5","author":"N Lin","year":"2004","unstructured":"Lin N, Wu B, Jansen R, Gerstein M, Zhao H: Information assessment on predicting protein-protein interactions. BMC Bioinformatics 2004, 5: 154. 10.1186\/1471-2105-5-154","journal-title":"BMC Bioinformatics"},{"issue":"5","key":"1285_CR14","doi-asserted-by":"publisher","first-page":"919","DOI":"10.1016\/S0022-2836(03)00239-0","volume":"327","author":"E Sprinzak","year":"2003","unstructured":"Sprinzak E, Sattath S, Margalit H: How Reliable are Experimental Protein-Protein Interaction Data? Journal of Molecular Biology 2003, 327(5):919\u2013923. 10.1016\/S0022-2836(03)00239-0","journal-title":"Journal of Molecular Biology"},{"key":"1285_CR15","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1074\/mcp.M100037-MCP200","volume":"1","author":"C Deane","year":"2002","unstructured":"Deane C, Salwinski L, Xenarios I, Eisenberg D: Two Methods for Assessment of the Reliability of High Throughput Observations. Molecular & Cellular Proteomics 2002, 1: 349\u2013356. 10.1074\/mcp.M100037-MCP200","journal-title":"Molecular & Cellular Proteomics"},{"key":"1285_CR16","doi-asserted-by":"publisher","first-page":"535","DOI":"10.1016\/j.mib.2004.08.012","volume":"7","author":"R Jansen","year":"2004","unstructured":"Jansen R, Gerstein M: Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction. Current Opinion in Microbiology 2004, 7: 535\u2013545. 10.1016\/j.mib.2004.08.012","journal-title":"Current Opinion in Microbiology"},{"key":"1285_CR17","volume-title":"Proceedings of the Pacific Symposium on Biocomputing","author":"Y Qi","year":"2005","unstructured":"Qi Y, Klein-Seetharaman J, Bar-Joseph Z: Random Forest Similarity for Protein-Protein Interaction Prediction from Multiple Sources. 
Proceedings of the Pacific Symposium on Biocomputing 2005."},{"issue":"14","key":"1285_CR18","doi-asserted-by":"publisher","first-page":"4157","DOI":"10.1093\/nar\/gkg466","volume":"31","author":"A Grigoriev","year":"2003","unstructured":"Grigoriev A: On the number of protein-protein interactions in the yeast proteome. Nucleic Acids Research 2003, 31(14):4157\u20134161. 10.1093\/nar\/gkg466","journal-title":"Nucleic Acids Research"},{"key":"1285_CR19","doi-asserted-by":"publisher","first-page":"4241","DOI":"10.1091\/mbc.11.12.4241","volume":"11","author":"A Gasch","year":"2000","unstructured":"Gasch A, Spellman P, Kao C, Carmel-Harel O, Eisen M, Storz G, Botstein D, Brown P: Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Molecular Biology of the Cell 2000, 11: 4241\u20134257.","journal-title":"Molecular Biology of the Cell"},{"key":"1285_CR20","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"Consortium Gene Ontology","year":"2000","unstructured":"Gene Ontology Consortium: Gene ontology: tool for the unification of biology. Nat Genet 2000, 25: 25\u20139. 10.1038\/75556","journal-title":"Nat Genet"},{"key":"1285_CR21","first-page":"448","volume-title":"IJCAI","author":"P Resnik","year":"1995","unstructured":"Resnik P: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. IJCAI 1995, 448\u2013453. [citeseer.ist.psu.edu\/resnik95using.html]"},{"issue":"10","key":"1285_CR22","doi-asserted-by":"publisher","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","volume":"19","author":"P Lord","year":"2003","unstructured":"Lord P, Stevens R, Brass A, Goble C: Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics 2003, 19(10):1275\u20131283. 
10.1093\/bioinformatics\/btg153","journal-title":"Bioinformatics"},{"issue":"10","key":"1285_CR23","doi-asserted-by":"publisher","first-page":"6562","DOI":"10.1073\/pnas.102102699","volume":"99","author":"C Ambroise","year":"2002","unstructured":"Ambroise C, McLachlan GJ: Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences of the United States of America 2002, 99(10):6562\u20136566. 10.1073\/pnas.102102699","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"key":"1285_CR24","doi-asserted-by":"publisher","first-page":"242","DOI":"10.1093\/nar\/29.1.242","volume":"29","author":"GD Bader","year":"2001","unstructured":"Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW: BIND-The Biomolecular Interaction Network Database. Nucleic Acids Res 2001, 29: 242\u2013245. 10.1093\/nar\/29.1.242","journal-title":"Nucleic Acids Res"},{"key":"1285_CR25","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1093\/nar\/28.1.37","volume":"28","author":"HW Mewes","year":"2000","unstructured":"Mewes HW, Frishman D, Gruber C, Geier B, Haase D, Kaps A, Lemcke K, Mannhaupt G, Pfeiffer F, Sch\u00fcller C, Stocker S, Weil B: MIPS: a database for genomes and protein sequences. Nucleic Acids Research 2000, 28: 37\u201340. 10.1093\/nar\/28.1.37","journal-title":"Nucleic Acids Research"},{"key":"1285_CR26","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1093\/nar\/30.1.303","volume":"30","author":"I Xenarios","year":"2002","unstructured":"Xenarios I, Salwinski L, Duan XQJ, Higney P, Kim SM, Eisenberg D: DIP: the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Research 2002, 30: 303\u2013305. 
10.1093\/nar\/30.1.303","journal-title":"Nucleic Acids Research"},{"key":"1285_CR27","first-page":"144","volume-title":"5th Annual ACM Workshop on COLT","author":"BE Boser","year":"1992","unstructured":"Boser BE, Guyon IM, Vapnik VN: A Training Algorithm for Optimal Margin Classifiers. In 5th Annual ACM Workshop on COLT. Edited by: Haussler D. Pittsburgh, PA: ACM Press; 1992, 144\u2013152. [http:\/\/www.clopinet.com\/isabelle\/Papers\/]"},{"key":"1285_CR28","volume-title":"Learning with Kernels","author":"B Sch\u00f6lkopf","year":"2002","unstructured":"Sch\u00f6lkopf B, Smola A: Learning with Kernels. Cambridge, MA: MIT Press; 2002."},{"key":"1285_CR29","doi-asserted-by":"crossref","first-page":"71","DOI":"10.7551\/mitpress\/4057.003.0005","volume-title":"Kernel methods in computational biology, chap. Support vector machine applications in computational biology","author":"WS Noble","year":"2004","unstructured":"Noble WS: Kernel methods in computational biology, chap. Support vector machine applications in computational biology. Cambridge, MA: MIT Press; 2004:71\u201392."},{"key":"1285_CR30","first-page":"564","volume-title":"Proceedings of the Pacific Symposium on Biocomputing","author":"C Leslie","year":"2002","unstructured":"Leslie C, Eskin E, Noble WS: The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing. Edited by: Altman RB, Dunker AK, Hunter L, Lauderdale K, Klein TE. New Jersey: World Scientific; 2002:564\u2013575."},{"issue":"suppl 1","key":"1285_CR31","first-page":"i26","volume":"19","author":"A Ben-Hur","year":"2003","unstructured":"Ben-Hur A, Brutlag D: Remote homology detection: a motif based approach. 
Proceedings of the Eleventh International Conference on Intelligent Systems for Molecular Biology 2003, 19(suppl 1):i26-i33.","journal-title":"Proceedings of the Eleventh International Conference on Intelligent Systems for Molecular Biology"},{"key":"1285_CR32","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1093\/nar\/gki060","volume":"33","author":"Q Su","year":"2005","unstructured":"Su Q, Liu L, Saxonov S, Brutlag D: eBLOCKS: enumerating conserved protein blocks to achieve maximal sensitivity and specificity. Nucleic Acids Research 2005, 33: 178\u2013182. 10.1093\/nar\/gki060","journal-title":"Nucleic Acids Research"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-S1-S2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-7-S1-S2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-S1-S2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,3]],"date-time":"2024-02-03T21:54:35Z","timestamp":1706997275000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-S1-S2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,3]]},"references-count":32,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2006,3]]}},"alternative-id":["1285"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-s1-s2","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,3]]},"assertion":[{"value":"20 March 2006","order":1,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S2"}}