{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,20]],"date-time":"2025-09-20T19:02:10Z","timestamp":1758394930716},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"22","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,11,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: The large-scale comparison of protein\u2013ligand binding sites is problematic, in that measures of structural similarity are difficult to quantify and are not easily understood in terms of statistical similarity that can ultimately be related to structure and function. We present a binding site matching score, the Poisson Index (PI), based upon a well-defined statistical model. PI requires only the number of matching atoms between two sites and the size of the two sites\u2014the same information used by the Tanimoto Index (TI), a comparable and widely used measure for molecular similarity. We apply PI and TI to a previously automatically extracted set of binding sites to determine the robustness and usefulness of both scores.<\/jats:p><jats:p>Results: We found that PI outperforms TI; moreover, site similarity is poorly defined for TI at values around the 99.5% confidence level for which PI is well defined. A difference map at this confidence level shows that PI gives much more meaningful information than TI. We show individual examples where TI fails to distinguish either a false or a true site pairing in contrast to PI, which performs much better. TI cannot handle large or small sites very well, or the comparison of large and small sites, in contrast to PI, which is shown to be much more robust. 
Despite the difficulty of determining a biological \u2018ground truth\u2019 for binding site similarity we conclude that PI is a suitable measure of binding site similarity and could form the basis for a binding site classification scheme comparable to existing protein domain classification schema.<\/jats:p><jats:p>Availability: PI is implemented in SitesBase www.modelling.leeds.ac.uk\/sb\/<\/jats:p><jats:p>Contact: \u00a0r.m.jackson@leeds.ac.uk<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm470","type":"journal-article","created":{"date-parts":[[2007,9,25]],"date-time":"2007-09-25T00:13:08Z","timestamp":1190679188000},"page":"3001-3008","source":"Crossref","is-referenced-by-count":15,"title":["The Poisson Index: a new probabilistic model for protein\u2013ligand binding site similarity"],"prefix":"10.1093","volume":"23","author":[{"given":"J.R.","family":"Davies","sequence":"first","affiliation":[{"name":"1 School of Mathematics and 2Institute of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"R.M.","family":"Jackson","sequence":"additional","affiliation":[{"name":"1 School of Mathematics and 2Institute of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"K.V.","family":"Mardia","sequence":"additional","affiliation":[{"name":"1 School of Mathematics and 2Institute of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"C.C.","family":"Taylor","sequence":"additional","affiliation":[{"name":"1 School of Mathematics and 2Institute of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, 
UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2007,9,24]]},"reference":[{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1177\/1087057104274091","article-title":"Development of CYP3A4 inhibition models: comparisons of machine-learning techniques and molecular descriptors","volume":"10","author":"Arimoto","year":"2005","journal-title":"J. Biomol. Screen."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1016\/0022-2836(87)90521-3","article-title":"Determinants of a protein fold. Unique features of the globin amino acid sequences","volume":"196","author":"Bashford","year":"1987","journal-title":"J. Mol. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1006\/jmbi.1999.2800","article-title":"Determination of the MurD mechanism through crystallographic analysis of enzyme complexes","volume":"289","author":"Bertrand","year":"1999","journal-title":"J. Mol. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1016\/S0022-2836(03)00882-9","article-title":"Inferring functional relationships of proteins from local sequence and spatial surface patterns","volume":"332","author":"Binkowski","year":"2003","journal-title":"J. Mol. 
Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1002\/prot.20123","article-title":"Towards a structural classification of phosphate binding sites in protein-nucleotide complexes: an automated all-against-all structural comparison using geometric matching","volume":"56","author":"Brakoulias","year":"2004","journal-title":"Proteins"},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1021\/ci960373c","article-title":"The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding","volume":"37","author":"Brown","year":"1997","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"D189","DOI":"10.1093\/nar\/gkh034","article-title":"The ASTRAL compendium in 2004","volume":"32","author":"Chandonia","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"1712","DOI":"10.1110\/ps.12801","article-title":"Sequence-structure analysis of FAD-containing proteins","volume":"10","author":"Dym","year":"2001","journal-title":"Protein Sci."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1021\/bi00052a004","article-title":"Folding of subtilisin BPN: characterization of a folding intermediate","volume":"32","author":"Eder","year":"1993","journal-title":"Biochemistry"},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"D231","DOI":"10.1093\/nar\/gkj062","article-title":"Sitesbase: a database for structure-based protein ligand binding site comparisons","volume":"34","author":"Gold","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"1112","DOI":"10.1016\/j.jmb.2005.11.044","article-title":"Fold independent structural comparisons of protein-ligand binding sites for exploring functional 
relationships","volume":"355","author":"Gold","year":"2006","journal-title":"J. Mol. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/biomet\/93.2.235","article-title":"Bayesian alignment using hierarchical models with applications in protein bioinformatics","volume":"93","author":"Green","year":"2006","journal-title":"Biometrika"},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"1691","DOI":"10.1002\/pro.5560011217","article-title":"A database of protein structure families with common folding motifs","volume":"1","author":"Holm","year":"1992","journal-title":"Protein Sci."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"922","DOI":"10.1016\/j.ygeno.2004.08.005","article-title":"Learnability-based further prediction of gene functions in gene ontology","volume":"84","author":"Kang","year":"2004","journal-title":"Genomics"},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1016\/j.jmb.2006.04.024","article-title":"From the similarity analysis of protein cavities to the functional classification of protein families using Cavbase","volume":"359","author":"Kuhn","year":"2006","journal-title":"J. Mol. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1023\/A:1011318527094","article-title":"Identification of protein functions from a molecular surface database, eF-site","volume":"2","author":"Kinoshita","year":"2002","journal-title":"J. Struct. Funct. Genomics"},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"614","DOI":"10.1016\/j.jmb.2005.05.067","article-title":"Protein function prediction using local 3D templates","volume":"351","author":"Laskowski","year":"2005","journal-title":"J. Mol. 
Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"395","DOI":"10.2174\/138920306778559386","article-title":"Methods for the prediction of protein-ligand binding sites for structure-based drug design and virtual ligand screening","volume":"7","author":"Laurie","year":"2006","journal-title":"Curr. Protein Pept. Sci."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1006\/jmbi.1996.0072","article-title":"Crystal structure of Escherichia coli phosphoenolpyruvate carboxykinase: A new structural family with the p-loop nucleoside triphosphate hydrolase fold","volume":"256","author":"Matte","year":"1996","journal-title":"J. Mol. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"e104","DOI":"10.1093\/bioinformatics\/btl292","article-title":"Analysis of binding site similarity, small molecule similarity and experimental binding profiles in the human cytosolic sulfotransferase family","volume":"23","author":"Najmanovich","year":"2007","journal-title":"Bioinformatics"},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","article-title":"CATH a hierarchic classification of protein domain structures","volume":"5","author":"Orengo","year":"1997","journal-title":"Structure"},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/0022-2836(73)90388-4","article-title":"Comparison of super-secondary structures in proteins","volume":"76","author":"Rao","year":"1973","journal-title":"J. Mol. 
Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1016\/S0022-2836(02)00811-2","article-title":"A new method to detect related function among proteins independent of sequence and fold homology","volume":"323","author":"Schmitt","year":"2002","journal-title":"J. Mol. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/S0968-0004(03)00090-2","article-title":"Many paths to methyltransfer: a chronicle of convergence","volume":"28","author":"Schubert","year":"2003","journal-title":"Trends Biochem. Sci."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1016\/j.jmb.2004.04.012","article-title":"Recognition of functional sites in protein structures","volume":"339","author":"Shulman-Peleg","year":"2004","journal-title":"J. Mol. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1016\/S0959-440X(01)00274-3","article-title":"The PRT protein family","volume":"11","author":"Sinha","year":"2001","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.1016\/S0022-2836(03)00045-7","article-title":"A model for statistical significance of local similarities in structure","volume":"326","author":"Stark","year":"2003","journal-title":"J. Mol. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1038\/nsb0196-74","article-title":"The crystal structure of GMP synthetase reveals a novel catalytic triad and is a structural paradigm for two enzyme families","volume":"3","author":"Tesmer","year":"1996","journal-title":"Nat. Struct. 
Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1016\/j.sbi.2005.04.003","article-title":"Predicting protein function from sequence and structural data","volume":"15","author":"Watson","year":"2005","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1021\/ci00049a008","article-title":"Implementation of nearest-neighbor searching in an online chemical structure search system","volume":"26","author":"Willett","year":"1986","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"2023041208262506300_","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1002\/prot.20752","article-title":"Similarity networks of protein binding sites","volume":"62","author":"Zhang","year":"2006","journal-title":"Proteins"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/22\/3001\/49857736\/bioinformatics_23_22_3001.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/22\/3001\/49857736\/bioinformatics_23_22_3001.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,14]],"date-time":"2023-05-14T06:49:14Z","timestamp":1684046954000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/22\/3001\/209060"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,9,24]]},"references-count":32,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2007,11,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm470","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,
11,15]]},"published":{"date-parts":[[2007,9,24]]}}}