{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T07:30:28Z","timestamp":1778311828755,"version":"3.51.4"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Unravelling the rules underlying protein\u2013protein and protein\u2013ligand interactions is a crucial step in understanding cell machinery. Peptide recognition modules (PRMs) are globular protein domains which focus their binding targets on short protein sequences and play a key role in the frame of protein\u2013protein interactions. High-throughput techniques permit the whole proteome scanning of each domain, but they are characterized by a high incidence of false positives. In this context, there is a pressing need for the development of in silico experiments to validate experimental results and of computational tools for the inference of domain\u2013peptide interactions.<\/jats:p><jats:p>Results: We focused on the SH3 domain family and developed a machine-learning approach for inferring interaction specificity. SH3 domains are well-studied PRMs which typically bind proline-rich short sequences characterized by the PxxP consensus. The binding information is known to be held in the conformation of the domain surface and in the short sequence of the peptide. Our method relies on interaction data from high-throughput techniques and benefits from the integration of sequence and structure data of the interacting partners. Here, we propose a novel encoding technique aimed at representing binding information on the basis of the domain\u2013peptide contact residues in complexes of known structure. 
Remarkably, the new encoding requires few variables to represent an interaction, thus avoiding the \u2018curse of dimension\u2019. Our results display an accuracy &gt;90% in detecting new binders of known SH3 domains, thus outperforming neural models on standard binary encodings, profile methods and recent statistical predictors. The method, moreover, shows a generalization capability, inferring specificity of unknown SH3 domains displaying some degree of similarity with the known data.<\/jats:p><jats:p>Contacts: \u00a0enrico@cbm.bio.uniroma2.it<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btl403","type":"journal-article","created":{"date-parts":[[2006,7,27]],"date-time":"2006-07-27T00:58:50Z","timestamp":1153961930000},"page":"2333-2339","source":"Crossref","is-referenced-by-count":22,"title":["A novel structure-based encoding for machine-learning applied to the inference of SH3 domain specificity"],"prefix":"10.1093","volume":"22","author":[{"given":"E.","family":"Ferraro","sequence":"first","affiliation":[{"name":"Centre of Molecular Bioinformatics, Department of Biology, University of Tor Vergata \u00a0 Rome, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"A.","family":"Via","sequence":"additional","affiliation":[{"name":"Centre of Molecular Bioinformatics, Department of Biology, University of Tor Vergata \u00a0 Rome, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"G.","family":"Ausiello","sequence":"additional","affiliation":[{"name":"Centre of Molecular Bioinformatics, Department of Biology, University of Tor Vergata \u00a0 Rome, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"M.","family":"Helmer-Citterich","sequence":"additional","affiliation":[{"name":"Centre of Molecular Bioinformatics, Department of Biology, University of Tor Vergata \u00a0 Rome, 
Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2006,7,26]]},"reference":[{"key":"2023012409240161500_b1","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1038\/nature01511","article-title":"Mass spectrometry-based proteomics","volume":"422","author":"Aebersold","year":"2003","journal-title":"Nature"},{"key":"2023012409240161500_b2","volume-title":"Bioinformatics: The Machine Learning Approach,","author":"Baldi","year":"1998"},{"key":"2023012409240161500_b3","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/BF00993164","article-title":"Approximation and estimation bounds for artificial neural networks","volume":"14","author":"Barron","year":"1994","journal-title":"Mach. Learn."},{"key":"2023012409240161500_b4","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1162\/neco.1989.1.1.151","article-title":"What size net gives valid generalization?","volume":"1","author":"Baum","year":"1990","journal-title":"Neural comput."},{"key":"2023012409240161500_b5","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198538493.001.0001","volume-title":"Neural networks for Pattern Recognition","author":"Bishop","year":"1995"},{"key":"2023012409240161500_b6","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1093\/bioinformatics\/17.5.455","article-title":"Predicting protein\u2013protein interactions from primary structure","volume":"17","author":"Bock","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012409240161500_b7","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1016\/j.sbi.2004.05.003","article-title":"Protein interaction networks from yeast to human","volume":"14","author":"Bork","year":"2004","journal-title":"Curr. Opin. Struct. 
Biol."},{"key":"2023012409240161500_b8","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1016\/S0031-3203(96)00142-2","article-title":"The use of the area under the ROC curve in the evaluation of the machine learning algorithms","volume":"30","author":"Bradley","year":"1997","journal-title":"Pattern Recogn."},{"key":"2023012409240161500_b9","doi-asserted-by":"crossref","first-page":"3709","DOI":"10.1093\/nar\/gkg592","article-title":"iSPOT: A web tool to infer the interaction specificity of families of protein modules","volume":"31","author":"Brannetti","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023012409240161500_b10","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1006\/jmbi.2000.3670","article-title":"SH3-SPOT: an algorithm to predict preferred ligands to different members of the SH3 gene family","volume":"298","author":"Brannetti","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023012409240161500_b11","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1016\/S0014-5793(01)03307-5","article-title":"Can we infer peptide recognition specificity mediated by SH3 domains?","volume":"513","author":"Cesareni","year":"2001","journal-title":"FEBS Lett."},{"key":"2023012409240161500_b12","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1038\/47056","article-title":"Protein interaction maps for complete genomes based on gene fusion events","volume":"402","author":"Enright","year":"1999","journal-title":"Nature"},{"key":"2023012409240161500_b13","doi-asserted-by":"crossref","first-page":"1241","DOI":"10.1126\/science.7526465","article-title":"Two binding orientations for peptides to the Src SH3 domains: development of a general model for SH3-ligand interactions","volume":"266","author":"Feng","year":"1994","journal-title":"Science"},{"key":"2023012409240161500_b14","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1089\/omi.1.1998.3.199","article-title":"Microbial genescapes: phyletic and functional patterns 
of ORF distribution among prokaryotes","volume":"3","author":"Gaasterland","year":"1998","journal-title":"Microb. Comp. Genomics"},{"key":"2023012409240161500_b15","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1038\/415141a","article-title":"Functional organization of the yeast proteome by systematic analysis of protein complexes","volume":"415","author":"Gavin","year":"2002","journal-title":"Nature"},{"key":"2023012409240161500_b16","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1006\/jmbi.2000.3732","article-title":"Co-evolution of proteins with their interaction partners","volume":"299","author":"Goh","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023012409240161500_b17","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1002\/pro.5560060319","article-title":"Embedding strategies for effective use of information from multiple sequence alignments","volume":"6","author":"Henikoff","year":"1997","journal-title":"Protein Sci."},{"key":"2023012409240161500_b18","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1038\/415180a","article-title":"Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry","volume":"415","author":"Ho","year":"2002","journal-title":"Nature"},{"key":"2023012409240161500_b19","doi-asserted-by":"crossref","first-page":"1143","DOI":"10.1073\/pnas.97.3.1143","article-title":"Toward a protein\u2013protein interaction map of the budding yeast: a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins","volume":"97","author":"Ito","year":"2000","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012409240161500_b20","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1096\/fasebj.14.2.231","article-title":"The importance of being proline: The interaction of proline-rich motifs in signaling proteins with their cognate domains","volume":"14","author":"Kay","year":"2000","journal-title":"FASEB J."},{"key":"2023012409240161500_b21","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1371\/journal.pbio.0020014","article-title":"Protein interaction networks by proteome peptide scanning","volume":"2","author":"Landgraf","year":"2004","journal-title":"PLOS Biol."},{"key":"2023012409240161500_b22","doi-asserted-by":"crossref","first-page":"532","DOI":"10.1093\/bioinformatics\/bti804","article-title":"A regularized discriminative model for the prediction of protein\u2013protein interactions","volume":"22","author":"Lehrach","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012409240161500_b23","first-page":"23","article-title":"Integrated access to sequence and structural data","volume-title":"Biosequences: Perspectives and User Services in Europe.","author":"Lesk","year":"1986"},{"key":"2023012409240161500_b24","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1042\/BJ20050411","article-title":"Specificity and versatility of SH3 and other proline-recognition domains: structural basis and implications for cellular signal transduction","volume":"390","author":"Li","year":"2005","journal-title":"Biochem. 
J."},{"key":"2023012409240161500_b25","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1038\/372375a0","article-title":"Structural determinants of peptide-binding orientation and of sequence specificity in SH3 domains","volume":"372","author":"Lim","year":"1994","journal-title":"Nature"},{"key":"2023012409240161500_b26","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1126\/science.285.5428.751","article-title":"Detecting protein function and protein\u2013protein interactions from genome sequences","volume":"285","author":"Marcotte","year":"1999","journal-title":"Science"},{"key":"2023012409240161500_b27","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1093\/bioinformatics\/bth483","article-title":"Predicting protein\u2013protein interactions using signature products","volume":"21","author":"Martin","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012409240161500_b28","doi-asserted-by":"crossref","first-page":"1253","DOI":"10.1242\/jcs.114.7.1253","article-title":"SH3 domains: Complexity in moderation","volume":"114","author":"Mayer","year":"2001","journal-title":"J. Cell Sci."},{"key":"2023012409240161500_b29","first-page":"753","article-title":"Application of an artificial neural network to predict specific class I MHC binding peptide sequences","volume":"16","author":"Milik","year":"1998","journal-title":"Nature"},{"key":"2023012409240161500_b30","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/S0065-3233(02)61006-X","article-title":"How SH3 domains recognize proline","volume":"61","author":"Musacchio","year":"2002","journal-title":"Adv. 
Protein Chem."},{"key":"2023012409240161500_b31","doi-asserted-by":"crossref","first-page":"1207","DOI":"10.1093\/bioinformatics\/btl055","article-title":"An ensemble of K-local hyperplanes for predicting protein\u2013protein interactions","volume":"22","author":"Nanni","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012409240161500_b32","first-page":"93","article-title":"Use of contiguity on the chromosome to predict functional coupling","volume":"1","author":"Overbeek","year":"1999","journal-title":"In Silico Biol."},{"key":"2023012409240161500_b33","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1093\/protein\/14.9.609","article-title":"Similarity of phylogenetic trees as indicator of protein\u2013protein interaction","volume":"14","author":"Pazos","year":"2001","journal-title":"Protein Eng."},{"key":"2023012409240161500_b34","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1002\/prot.10074","article-title":"In silico two-hybrid system for the selection of physically interacting protein pairs","volume":"47","author":"Pazos","year":"2002","journal-title":"Proteins"},{"key":"2023012409240161500_b35","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1006\/jmbi.1997.1198","article-title":"Correlated mutations contain information about protein\u2013protein interaction","volume":"271","author":"Pazos","year":"1997","journal-title":"J. Mol. Biol."},{"key":"2023012409240161500_b36","doi-asserted-by":"crossref","first-page":"4285","DOI":"10.1073\/pnas.96.8.4285","article-title":"Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles","volume":"96","author":"Pellegrini","year":"1999","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012409240161500_b37","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1093\/bioinformatics\/bth922","article-title":"Predicting protein-peptide interactions via a network-based motif sampler","volume":"20","author":"Reiss","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012409240161500_b38","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1016\/S0168-9525(00)02024-2","article-title":"EMBOSS: The European Molecular Biology Open Software Suite","volume":"16","author":"Rice","year":"2000","journal-title":"Trends in genetics"},{"key":"2023012409240161500_b39","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1016\/j.sbi.2004.04.006","article-title":"A structural perspective of protein\u2013protein interactions","volume":"14","author":"Russell","year":"2004","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012409240161500_b40","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1016\/0958-1669(95)80074-3","article-title":"Modeling mutations and homologous proteins","volume":"4","author":"Sali","year":"1995","journal-title":"Curr. Opin. Biotechnol"},{"key":"2023012409240161500_b41","doi-asserted-by":"crossref","first-page":"1540","DOI":"10.1073\/pnas.93.4.1540","article-title":"Distinct ligand preferences of Src homology 3 domains from Src, Yes, Abl, Cortactin, p53bp2, PLC\u03b3, Crk, and Grb2","volume":"93","author":"Sparks","year":"1996","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012409240161500_b42","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780199634972.003.0006","article-title":"Comparative modelling of proteins","volume-title":"Protein Structure Prediction, A practical approach","author":"Srinivasan","year":"1996"},{"key":"2023012409240161500_b43","doi-asserted-by":"crossref","first-page":"1469","DOI":"10.1038\/sj.onc.1202182","article-title":"From Src homology domains to other signalling modules: proposal of the \u2018protein recognition code\u2019","volume":"17","author":"Sudol","year":"1998","journal-title":"Oncogene"},{"key":"2023012409240161500_b44","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1126\/science.1064987","article-title":"A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules","volume":"295","author":"Tong","year":"2002","journal-title":"Science"},{"key":"2023012409240161500_b45","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/35001009","article-title":"A comprehensive analysis of protein\u2013protein interactions in Saccharomyces cerevisiae","volume":"403","author":"Uetz","year":"2000","journal-title":"Nature"},{"key":"2023012409240161500_b46","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1016\/S0959-440X(02)00333-0","article-title":"Computational methods for the prediction of protein interactions","volume":"12","author":"Valencia","year":"2002","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012409240161500_b47","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1016\/S0097-8485(96)00038-1","article-title":"Artificial neural networks for molecular sequence analysis","volume":"21","author":"Wu","year":"1997","journal-title":"Comp. 
Chem."},{"key":"2023012409240161500_b48","doi-asserted-by":"crossref","first-page":"1978","DOI":"10.1093\/bioinformatics\/btg255","article-title":"Application of support vector machines for T-cell epitopes prediction","volume":"19","author":"Zhao","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012409240161500_b49","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/S1367-5931(02)00005-4","article-title":"Protein chip technology","volume":"7","author":"Zhu","year":"2003","journal-title":"Curr. Opin Chem. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/19\/2333\/48841463\/bioinformatics_22_19_2333.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/19\/2333\/48841463\/bioinformatics_22_19_2333.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,5]],"date-time":"2024-02-05T15:44:55Z","timestamp":1707147895000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/22\/19\/2333\/241294"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,7,26]]},"references-count":49,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2006,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl403","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2006,10,1]]},"published":{"date-parts":[[2006,7,26]]}}}