{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,3,11]],"date-time":"2024-03-11T22:10:02Z","timestamp":1710195002459},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2756,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies.<\/jats:p><jats:p>Results: We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank\/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on type of bias and evaluation context, sampling biases may lead to both over- or under-estimation of the quality of scoring terms, functions or methods.<\/jats:p><jats:p>Availability: Links to the software and data used in this study are available at http:\/\/dbkgroup.org\/handl\/decoy_sets.<\/jats:p><jats:p>Contact: \u00a0simon.lovell@manchester.ac.uk<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp150","type":"journal-article","created":{"date-parts":[[2009,3,19]],"date-time":"2009-03-19T00:24:17Z","timestamp":1237422257000},"page":"1271-1279","source":"Crossref","is-referenced-by-count":21,"title":["Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction"],"prefix":"10.1093","volume":"25","author":[{"given":"Julia","family":"Handl","sequence":"first","affiliation":[{"name":"1 Faculty of Life Sciences and 2School of Computer Science, University of Manchester, Manchester, UK"}]},{"given":"Joshua","family":"Knowles","sequence":"additional","affiliation":[{"name":"1 Faculty of Life Sciences and 2School of Computer Science, University of Manchester, Manchester, UK"}]},{"given":"Simon C.","family":"Lovell","sequence":"additional","affiliation":[{"name":"1 Faculty of Life Sciences and 2School of Computer Science, University of Manchester, Manchester, UK"}]}],"member":"286","published-online":{"date-parts":[[2009,3,17]]},"reference":[{"key":"2023013110284885400_B1","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1002\/prot.1170","article-title":"Rosetta in CASP4: progress in ab initio protein structure prediction","volume":"S5","author":"Bonneau","year":"2001","journal-title":"Proteins"},{"key":"2023013110284885400_B2","doi-asserted-by":"crossref","first-page":"1653","DOI":"10.1110\/ps.062095806","article-title":"A composite score for predicting errors in protein structure models","volume":"15","author":"Eramian","year":"2006","journal-title":"Protein Sci."},{"key":"2023013110284885400_B3","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1186\/1471-2105-6-301","article-title":"A decoy set for the thermostable subdomain from chicken villin headpiece. Comparison of different free energy estimators","volume":"6","author":"Fogolari","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023013110284885400_B4","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1002\/prot.10429","article-title":"Optimizing physical energy functions for protein folding","volume":"55","author":"Fujitsuka","year":"2004","journal-title":"Proteins"},{"key":"2023013110284885400_B5","doi-asserted-by":"crossref","first-page":"1015","DOI":"10.1093\/bioinformatics\/btg124","article-title":"3D-Jury: a simple approach to improve protein structure prediction","volume":"19","author":"Ginalski","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013110284885400_B6","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1002\/prot.21308","article-title":"Convergence of molecular dynamics simulations of membrane proteins","volume":"67","author":"Grossfield","year":"2007","journal-title":"Proteins"},{"key":"2023013110284885400_B7","doi-asserted-by":"crossref","first-page":"031910","DOI":"10.1103\/PhysRevE.65.031910","article-title":"Convergence and sampling in protein simulations","volume":"65","author":"Hess","year":"2002","journal-title":"Phys. Rev. E"},{"key":"2023013110284885400_B8","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1002\/prot.20133","article-title":"Physical scoring function based on AMBER force field and Poisson-Boltzmann implicit solvent for protein structure prediction","volume":"56","author":"Hsieh","year":"2004","journal-title":"Proteins"},{"key":"2023013110284885400_B9","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/bth369","article-title":"Developing optimal non-linear scoring function for protein design","volume":"20","author":"Hu","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013110284885400_B10","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1002\/prot.10613","article-title":"A hierarchical approach to all-atom protein loop prediction","volume":"55","author":"Jacobson","year":"2004","journal-title":"Proteins"},{"key":"2023013110284885400_B11","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1002\/prot.10444","article-title":"How well can we predict native contacts in proteins based on decoy structures and their energies?","volume":"52","author":"Jiang","year":"2003","journal-title":"Proteins"},{"key":"2023013110284885400_B12","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1016\/S0022-2836(03)00323-1","article-title":"A novel approach to decoy set generation: designing a physical energy function having local minima with native structure characteristics","volume":"329","author":"Keasar","year":"2003","journal-title":"J. Mol. Biol."},{"key":"2023013110284885400_B13","doi-asserted-by":"crossref","first-page":"1540","DOI":"10.1093\/bioinformatics\/btg186","article-title":"Development of a four-body statistical pseudo-potential to discriminate native from non-native protein conformations","volume":"19","author":"Krishnamoorthy","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013110284885400_B14","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1107\/S0021889892009944","article-title":"Procheck: a program to check the stereochemical quality of protein structures","volume":"26","author":"Laskowski","year":"1993","journal-title":"J. Appl. Cryst."},{"key":"2023013110284885400_B15","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1006\/jmbi.1999.2685","article-title":"Discrimination of the native from misfolded protein models with an energy function including implicit solvation","volume":"288","author":"Lazaridis","year":"1999","journal-title":"J. Mol. Biol."},{"key":"2023013110284885400_B16","doi-asserted-by":"crossref","first-page":"620","DOI":"10.1002\/prot.10470","article-title":"Distinguish protein decoys by using a scoring function based on a new AMBER force field, short molecular dynamics simulations, and the generalized born solvent model","volume":"55","author":"Lee","year":"2004","journal-title":"Proteins"},{"key":"2023013110284885400_B17","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-68372-0_3","article-title":"Knowledge-based energy functions for computational studies of proteins","volume-title":"Computational Methods for Protein Structure Prediction and Modeling, Volume 1: Basic Characterization.","author":"Li","year":"2007"},{"key":"2023013110284885400_B18","doi-asserted-by":"crossref","first-page":"2955","DOI":"10.1110\/ps.051681605","article-title":"A statistical approach to the interpretation of molecular dynamics simulations of calmodulin equilibrium dynamics","volume":"14","author":"Likic","year":"2005","journal-title":"Protein Sci."},{"key":"2023013110284885400_B19","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1002\/prot.1087","article-title":"A distance-dependent atomic knowledge-based potential for improved protein structure selection","volume":"44","author":"Lu","year":"2001","journal-title":"Proteins"},{"key":"2023013110284885400_B20","doi-asserted-by":"crossref","first-page":"2354","DOI":"10.1110\/ps.08501","article-title":"Pcons: a neural-network-based consensus predictor that improves fold recognition","volume":"10","author":"Lundstrom","year":"2001","journal-title":"Protein Sci."},{"key":"2023013110284885400_B21","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1038\/356083a0","article-title":"Assessment of protein models with three-dimensional profiles","volume":"356","author":"Luthy","year":"1992","journal-title":"Nature"},{"key":"2023013110284885400_B22","doi-asserted-by":"crossref","first-page":"12876","DOI":"10.1021\/jp073061t","article-title":"On the structural convergence of biomolecular simulations by determination of effective sample size","volume":"111","author":"Lyman","year":"2007","journal-title":"J. Phys. Chem. B"},{"key":"2023013110284885400_B23","doi-asserted-by":"crossref","first-page":"3215","DOI":"10.1073\/pnas.0535768100","article-title":"Discrimination of native protein structures using atom-atom contact scoring","volume":"100","author":"McConkey","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013110284885400_B24","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1186\/1471-2105-8-345","article-title":"Benchmarking consensus model quality assessment for protein fold recognition","volume":"8","author":"McGuffin","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013110284885400_B25","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1002\/prot.21767","article-title":"Critical assessment of methods of protein structure prediction \u2014 round VII","volume":"69","author":"Moult","year":"2007","journal-title":"Proteins"},{"key":"2023013110284885400_B26","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1002\/prot.22262","article-title":"Model quality assessment using distance constraints from alignments","volume":"75","author":"Paluszewski","year":"2008","journal-title":"Proteins"},{"key":"2023013110284885400_B27","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1006\/jmbi.1996.0256","article-title":"Energy functions that discriminate X-ray and near native folds from well-constructed decoys","volume":"258","author":"Park","year":"1996","journal-title":"J. Mol. Biol."},{"key":"2023013110284885400_B28","doi-asserted-by":"crossref","first-page":"3509","DOI":"10.1093\/bioinformatics\/bti540","article-title":"Improving sequence-based fold recognition by using 3D model quality assessment","volume":"21","author":"Pettitt","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110284885400_B29","author":"Ponder","year":"2004","journal-title":"TINKER: Software tools for molecular design 4.2."},{"key":"2023013110284885400_B30","doi-asserted-by":"crossref","first-page":"1399","DOI":"10.1110\/ps.9.7.1399","article-title":"Decoys \u2018R\u2019 Us: a database of incorrect protein conformations to improve protein structure prediction","volume":"9","author":"Samudrala","year":"2000","journal-title":"Protein Sci."},{"key":"2023013110284885400_B31","doi-asserted-by":"crossref","first-page":"11158","DOI":"10.1073\/pnas.95.19.11158","article-title":"Clustering of low-energy conformations near the native structures of small proteins","volume":"95","author":"Shortle","year":"1998","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013110284885400_B32","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1006\/jmbi.1997.0959","article-title":"Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions","volume":"268","author":"Simons","year":"1997","journal-title":"J. Mol. Biol."},{"key":"2023013110284885400_B33","volume-title":"Multiple Criteria Optimization. Theory, Computation, and Application.","author":"Steuer","year":"1986"},{"key":"2023013110284885400_B34","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/S0022-2836(03)00622-3","article-title":"Predicting reliable regions in protein alignments from sequence profiles","volume":"330","author":"Tress","year":"2003","journal-title":"J. Mol. Biol."},{"key":"2023013110284885400_B35","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1002\/prot.10454","article-title":"An improved protein decoy set for testing energy functions for protein structure prediction","volume":"53","author":"Tsai","year":"2003","journal-title":"Proteins"},{"key":"2023013110284885400_B36","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/1472-6807-7-12","article-title":"Protein structure prediction by all-atom free-energy refinement","volume":"7","author":"Verma","year":"2007","journal-title":"BMC Struct. Biol."},{"key":"2023013110284885400_B37","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1186\/1472-6807-4-8","article-title":"Improved protein structure selection using decoy-dependent discriminatory functions","volume":"4","author":"Wang","year":"2004","journal-title":"BMC Struct. Biol."},{"key":"2023013110284885400_B38","doi-asserted-by":"crossref","first-page":"709","DOI":"10.1073\/pnas.92.3.709","article-title":"Discriminating compact nonnative structures from the native structure of globular proteins","volume":"92","author":"Wang","year":"1995","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013110284885400_B39","doi-asserted-by":"crossref","first-page":"2059","DOI":"10.1002\/jcc.20720","article-title":"Can a physics-based, all-atom potential find a protein's native structure among misfolded structures? - large scale AMBER benchmarking","volume":"28","author":"Wroblewska","year":"2007","journal-title":"J. Comp. Chem."},{"key":"2023013110284885400_B40","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1002\/prot.20035","article-title":"GEMDOCK: a generic evolutionary method for molecular docking","volume":"55","author":"Yang","year":"2004","journal-title":"Proteins"},{"key":"2023013110284885400_B41","doi-asserted-by":"crossref","first-page":"3370","DOI":"10.1093\/nar\/gkg571","article-title":"LGA: a method for finding 3D similarities in protein structure prediction","volume":"31","author":"Zemla","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023013110284885400_B42","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1110\/ps.03348304","article-title":"An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state","volume":"13","author":"Zhang","year":"2004","journal-title":"Protein Sci."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/10\/1271\/48989289\/bioinformatics_25_10_1271.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/10\/1271\/48989289\/bioinformatics_25_10_1271.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,11]],"date-time":"2024-03-11T21:47:47Z","timestamp":1710193667000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/10\/1271\/270281"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,3,17]]},"references-count":42,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2009,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp150","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,5,15]]},"published":{"date-parts":[[2009,3,17]]}}}