{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T18:44:17Z","timestamp":1773859457744,"version":"3.50.1"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The ability to rank proteins by their likely success in crystallization is useful in current Structural Biology efforts and in particular in high-throughput Structural Genomics initiatives. We present ParCrys, a Parzen Window approach to estimate a protein's propensity to produce diffraction-quality crystals. The Protein Data Bank (PDB) provided training data whilst the databases TargetDB and PepcDB were used to define feature selection data as well as test data independent of feature selection and training. ParCrys outperforms the OB-Score, SECRET and CRYSTALP on the data examined, with accuracy and Matthews correlation coefficient values of 79.1% and 0.582, respectively (74.0% and 0.227, respectively, on data with a \u2018real-world\u2019 ratio of positive:negative examples). 
ParCrys predictions and associated data are available from www.compbio.dundee.ac.uk\/parcrys.<\/jats:p>\n               <jats:p>Contact: \u00a0geoff@compbio.dundee.ac.uk<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn055","type":"journal-article","created":{"date-parts":[[2008,2,20]],"date-time":"2008-02-20T03:58:37Z","timestamp":1203479917000},"page":"901-907","source":"Crossref","is-referenced-by-count":57,"title":["ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction"],"prefix":"10.1093","volume":"24","author":[{"given":"Ian M.","family":"Overton","sequence":"first","affiliation":[{"name":"1 School of Life Sciences Research, University of Dundee, Dow Street, Dundee, DD1 5EH and 2Department of Computing Science, University of Glasgow, Glasgow, GL12 8QQ, UK"}]},{"given":"Gianandrea","family":"Padovani","sequence":"additional","affiliation":[{"name":"1 School of Life Sciences Research, University of Dundee, Dow Street, Dundee, DD1 5EH and 2Department of Computing Science, University of Glasgow, Glasgow, GL12 8QQ, UK"}]},{"given":"Mark A.","family":"Girolami","sequence":"additional","affiliation":[{"name":"1 School of Life Sciences Research, University of Dundee, Dow Street, Dundee, DD1 5EH and 2Department of Computing Science, University of Glasgow, Glasgow, GL12 8QQ, UK"}]},{"given":"Geoffrey J.","family":"Barton","sequence":"additional","affiliation":[{"name":"1 School of Life Sciences Research, University of Dundee, Dow Street, Dundee, DD1 5EH and 2Department of Computing Science, University of Glasgow, Glasgow, GL12 8QQ, UK"}]}],"member":"286","published-online":{"date-parts":[[2008,2,19]]},"reference":[{"key":"2023020209531938300_B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database 
search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023020209531938300_B2","doi-asserted-by":"crossref","first-page":"D115","DOI":"10.1093\/nar\/gkh131","article-title":"UniProt: the universal protein knowledgebase","volume":"32","author":"Apweiler","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023020209531938300_B3","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/0022-2836(87)90316-0","article-title":"A strategy for the rapid multiple alignment of protein sequences: confidence levels from tertiary structure comparisons","volume":"198","author":"Barton","year":"1987","journal-title":"J. Mol. Biol"},{"key":"2023020209531938300_B4","doi-asserted-by":"crossref","first-page":"D301","DOI":"10.1093\/nar\/gkl971","article-title":"The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data","volume":"35","author":"Berman","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023020209531938300_B5","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1107\/S0021889805008277","article-title":"Practical implementations for improving the throughput in a manual crystallization setup","volume":"38","author":"Biertumpfel","year":"2005","journal-title":"J. Appl. Cryst"},{"key":"2023020209531938300_B6","doi-asserted-by":"crossref","first-page":"967","DOI":"10.1038\/80747","article-title":"Target selection for structural genomics","volume":"7","author":"Brenner","year":"2000","journal-title":"Nat. Struct. Biol"},{"key":"2023020209531938300_B7","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1038\/13783","article-title":"Structural genomics: beyond the human genome project","volume":"23","author":"Burley","year":"1999","journal-title":"Nat. 
Genet"},{"key":"2023020209531938300_B8","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1016\/j.jmb.2004.09.076","article-title":"Protein biophysical properties that correlate with crystallisation success in Thermotoga maritima: maximum clustering strategy for structural genomics","volume":"344","author":"Canaves","year":"2004","journal-title":"J. Mol. Biol"},{"key":"2023020209531938300_B9","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1002\/prot.20298","article-title":"Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches","volume":"58","author":"Chandonia","year":"2005","journal-title":"Proteins"},{"key":"2023020209531938300_B10","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1126\/science.1121018","article-title":"The impact of structural genomics: expectations and outcomes","volume":"311","author":"Chandonia","year":"2006","journal-title":"Science"},{"key":"2023020209531938300_B11","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1002\/prot.20674","article-title":"Target selection and deselection at the berkeley structural genomics centre","volume":"62","author":"Chandonia","year":"2006","journal-title":"Proteins"},{"key":"2023020209531938300_B12","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1016\/j.sbi.2004.08.002","article-title":"Turning protein crystallisation from an art into a science","volume":"14","author":"Chayen","year":"2004","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023020209531938300_B13","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1016\/j.bbrc.2007.02.040","article-title":"Prediction of protein crystallization using collocation of amino acid pairs","volume":"355","author":"Chen","year":"2007","journal-title":"Biochem. Biophys. Res. 
Commun"},{"key":"2023020209531938300_B14","doi-asserted-by":"crossref","first-page":"2860","DOI":"10.1093\/bioinformatics\/bth300","article-title":"TargetDB: a target registration database for structural genomics projects","volume":"20","author":"Chen","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020209531938300_B15","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1038\/nsb842","article-title":"Structure-based design of a potent purine-based cyclin-dependent kinase inhibitor","volume":"9","author":"Davies","year":"2002","journal-title":"Nat. Struct. Mol. Biol"},{"key":"2023020209531938300_B16","doi-asserted-by":"crossref","first-page":"7229","DOI":"10.1093\/emboj\/20.24.7229","article-title":"Translocation portals for the substrates and products of a viral transcription complex: the bluetongue virus core","volume":"20","author":"Diprose","year":"2001","journal-title":"EMBO J"},{"key":"2023020209531938300_B17","volume-title":"Pattern Classification and Scene Analysis","author":"Duda","year":"1973"},{"key":"2023020209531938300_B18","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023020209531938300_B19","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1146\/annurev.bb.15.060186.001541","article-title":"Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins","volume":"15","author":"Engelman","year":"1986","journal-title":"Ann. Rev. Biophys. Biophys. 
Chem"},{"key":"2023020209531938300_B20","doi-asserted-by":"crossref","first-page":"D247","DOI":"10.1093\/nar\/gkj149","article-title":"Pfam: clans, web tools and services","volume":"34","author":"Finn","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023020209531938300_B21","article-title":"GNU Scientific Library Reference Manual \u2013 Revised","author":"Galassi","year":"2006","edition":"2nd edn"},{"key":"2023020209531938300_B22","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/j.jmb.2003.11.053","article-title":"Mining the Structural genomics pipeline: identification of protein properties that affect high-throughput experimental analyses","volume":"336","author":"Goh","year":"2004","journal-title":"J. Mol. Biol"},{"key":"2023020209531938300_B23","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res"},{"key":"2023020209531938300_B24","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1016\/S1047-8477(03)00046-7","article-title":"High-throughput protein crystallisation","volume":"142","author":"Hiu","year":"2003","journal-title":"J. Struct. Biol"},{"key":"2023020209531938300_B25","doi-asserted-by":"crossref","first-page":"964","DOI":"10.1038\/80744","article-title":"Structural genomics for science and society","volume":"7","author":"Hol","year":"2000","journal-title":"Nat. Struct. 
Biol"},{"key":"2023020209531938300_B26","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1002\/prot.20012","article-title":"Automatic target selection for structural genomics on eukaryotes","volume":"56","author":"Liu","year":"2004","journal-title":"Proteins"},{"key":"2023020209531938300_B27","doi-asserted-by":"crossref","first-page":"4005","DOI":"10.1016\/j.febslet.2006.06.015","article-title":"A normalised scale for structural genomics target ranking: the OB-Score","volume":"580","author":"Overton","year":"2006","journal-title":"FEBS Lett"},{"key":"2023020209531938300_B28","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1214\/aoms\/1177704472","article-title":"On estimation of a probability density function and mode","volume":"33","author":"Parzen","year":"1962","journal-title":"Ann. Math. Stat"},{"key":"2023020209531938300_B29","doi-asserted-by":"crossref","first-page":"1058","DOI":"10.1128\/AAC.41.5.1058","article-title":"Antiviral activity of the Dihydropyrone PNU-140690, a new nonpeptide human immunodeficiency virus protease inhibitor","volume":"41","author":"Poppe","year":"1997","journal-title":"Antimicrob. Agents Chemother"},{"key":"2023020209531938300_B30","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1016\/j.pbiomolbio.2004.07.011","article-title":"Life in the fast lane for protein crystallization and X-ray crystallography","volume":"88","author":"Puesy","year":"2005","journal-title":"Prog. Biophys. Mol. 
Biol"},{"key":"2023020209531938300_B31","article-title":"R: A language and environment for statistical computing","author":"R Development Core Team","year":"2004"},{"key":"2023020209531938300_B32","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1016\/S0168-9525(00)02024-2","article-title":"EMBOSS: the european molecular biology open software suite","volume":"16","author":"Rice","year":"2000","journal-title":"Trends Genet"},{"key":"2023020209531938300_B33","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1093\/protein\/12.2.85","article-title":"Twilight zone of protein sequence alignments","volume":"12","author":"Rost","year":"1999","journal-title":"Protein Eng"},{"key":"2023020209531938300_B34","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1002\/prot.10282","article-title":"Strategies for structural proteomics of prokaryotes: quantifying the advantages of studying orthologous proteins and of using both NMR and x-ray crystallography approaches","volume":"50","author":"Savchenko","year":"2003","journal-title":"Proteins"},{"key":"2023020209531938300_B35","doi-asserted-by":"crossref","first-page":"27278","DOI":"10.1074\/jbc.M604048200","article-title":"Screening-based discovery and structural dissection of a novel family 18 chitinase Inhibitor","volume":"281","author":"Schuttelkopf","year":"2006","journal-title":"J. Biol. 
Chem"},{"key":"2023020209531938300_B36","doi-asserted-by":"crossref","first-page":"948","DOI":"10.1126\/science.298.5595.948","article-title":"Tapping DNA for structures produces a trickle","volume":"298","author":"Service","year":"2002","journal-title":"Science"},{"key":"2023020209531938300_B37","doi-asserted-by":"crossref","first-page":"1554","DOI":"10.1126\/science.307.5715.1554","article-title":"Structural genomics, round 2","volume":"307","author":"Service","year":"2005","journal-title":"Science"},{"key":"2023020209531938300_B38","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/S0958-1669(99)00064-6","article-title":"Finding function through structural genomics","volume":"11","author":"Shapiro","year":"2000","journal-title":"Curr. Opin. Biotechnol"},{"key":"2023020209531938300_B39","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1038\/nature04443","article-title":"Structural basis for duffy recognition by the malaria parasite duffy-binding-like domain","volume":"439","author":"Singh","year":"2006","journal-title":"Nature"},{"key":"2023020209531938300_B40","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1002\/prot.20789","article-title":"Will my protein crystallize? A sequence-based predictor","volume":"62","author":"Smialowski","year":"2006","journal-title":"Proteins: Struct., Funct. 
Bioinformatics"},{"key":"2023020209531938300_B41","doi-asserted-by":"crossref","first-page":"1611","DOI":"10.1101\/gr.361602","article-title":"The bioperl toolkit: perl modules for the life sciences","volume":"12","author":"Stajich","year":"2004","journal-title":"Genome Res"},{"key":"2023020209531938300_B42","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1126\/science.1066011","article-title":"Global efforts in structural genomics","volume":"294","author":"Stevens","year":"2001","journal-title":"Science"},{"key":"2023020209531938300_B43","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1038\/80700","article-title":"Structural genomics in North America","volume":"7","author":"Terwillinger","year":"2000","journal-title":"Nat. Struct. Biol"},{"key":"2023020209531938300_B44","doi-asserted-by":"crossref","first-page":"1235","DOI":"10.1016\/j.jmb.2005.03.037","article-title":"Progress of structural genomics initiatives: an analysis of solved target structures","volume":"348","author":"Todd","year":"2005","journal-title":"J. Mol. Biol"},{"key":"2023020209531938300_B45","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1038\/363418a0","article-title":"Rational design of potent sialidase-based inhibitors of influenza virus replication","volume":"363","author":"von Itzstein","year":"1993","journal-title":"Nature"},{"key":"2023020209531938300_B46","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/S0097-8485(00)80008-X","article-title":"A global compositional complexity measure for biological sequences: AT-rich and GC-rich genomes encode less complex proteins","volume":"24","author":"Wan","year":"2000","journal-title":"Comput. 
Chem"},{"key":"2023020209531938300_B47","doi-asserted-by":"crossref","first-page":"1589","DOI":"10.1093\/bioinformatics\/btg224","article-title":"PISCES: a protein sequence culling server","volume":"19","author":"Wang","year":"2003","journal-title":"Bioinformatics"},{"key":"2023020209531938300_B48","doi-asserted-by":"crossref","first-page":"870","DOI":"10.1016\/j.jmb.2007.04.086","article-title":"The structure of serine palmitoyltransferase; gateway to sphingolipid biosynthesis","volume":"370","author":"Yard","year":"2007","journal-title":"J. Mol. Biol"},{"key":"2023020209531938300_B49","doi-asserted-by":"crossref","first-page":"15189","DOI":"10.1073\/pnas.95.26.15189","article-title":"Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics","volume":"95","author":"Zarembinski","year":"1998","journal-title":"PNAS"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/7\/901\/49046585\/bioinformatics_24_7_901.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/7\/901\/49046585\/bioinformatics_24_7_901.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T10:59:33Z","timestamp":1675335573000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/7\/901\/296521"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,2,19]]},"references-count":49,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2008,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn055","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"publis
hed-other":{"date-parts":[[2008,4,1]]},"published":{"date-parts":[[2008,2,19]]}}}