{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,16]],"date-time":"2025-06-16T11:48:51Z","timestamp":1750074531647},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: \u00a0In silico methods to classify compounds as potential drugs that bind to a specific target become increasingly important for drug design. To build classification devices training sets of drugs with known activities are needed. For many such classification problems, not only qualitative but also quantitative information of a specific property (e.g. binding affinity) is available. The latter can be used to build a regression scheme to predict this property for new compounds. Predicting a compound property explicitly is generally more difficult than classifying that the property lies below or above a given threshold value. Hence, an indirect classification that is based on regression may lead to poorer results than a direct classification scheme. In fact, initially researchers are only interested to classify compounds as potential drugs. The activities of these compounds are subsequently measured in wet lab.<\/jats:p>\n               <jats:p>Results: We propose a novel approach that uses available quantitative information directly for classification rather than first using a regression scheme. It uses a new type of loss function called weighted biased regression. Application of this method to four widely studied datasets of the CoEPrA contest (Comparative Evaluation of Prediction Algorithms, http:\/\/coepra.org) shows that it can outperform simple classification methods that do not make use of this additional quantitative information.<\/jats:p>\n               <jats:p>Availability: A stand alone application is available at the webpage http:\/\/agknapp.chemie.fu-berlin.de\/agknapp\/index.php?menu=software&amp;page=PeptideClassifier that can be used to build a model for a peptide training set to be submitted.<\/jats:p>\n               <jats:p>Contact: \u00a0odemir@chemie.fu-berlin.de<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq021","type":"journal-article","created":{"date-parts":[[2010,1,24]],"date-time":"2010-01-24T01:24:07Z","timestamp":1264296247000},"page":"603-609","source":"Crossref","is-referenced-by-count":8,"title":["Exploring classification strategies with the CoEPrA 2006 contest"],"prefix":"10.1093","volume":"26","author":[{"given":"Ozgur","family":"Demir-Kavuk","sequence":"first","affiliation":[{"name":"Institute of Chemistry and Biochemistry, Free University of Berlin, Fabeckstrasse 36A, 14195 Berlin, Germany"}]},{"given":"Henning","family":"Riedesel","sequence":"additional","affiliation":[{"name":"Institute of Chemistry and Biochemistry, Free University of Berlin, Fabeckstrasse 36A, 14195 Berlin, Germany"}]},{"given":"Ernst-Walter","family":"Knapp","sequence":"additional","affiliation":[{"name":"Institute of Chemistry and Biochemistry, Free University of Berlin, Fabeckstrasse 36A, 14195 Berlin, Germany"}]}],"member":"286","published-online":{"date-parts":[[2010,1,22]]},"reference":[{"key":"2023012511004888600_B1","first-page":"35","article-title":"Ant colony optimization for feature subset selection","volume":"4","author":"Al-Ani","year":"2005","journal-title":"World Acad. Sci. Eng. Technol."},{"key":"2023012511004888600_B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012511004888600_B3","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1093\/bioinformatics\/16.5.412","article-title":"Assessing the accuracy of prediction algorithms for classification: an overview","volume":"16","author":"Baldi","year":"2000","journal-title":"Bioinformatics"},{"key":"2023012511004888600_B4","volume-title":"Numerical Linear Algebra.","author":"Bau","year":"1997"},{"key":"2023012511004888600_B5","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1145\/130385.130401","article-title":"A training algorithm for optimal margin classifiers","volume-title":"Proceedings of the fifth annual workshop on Computational learning theory.","author":"Boser","year":"1992"},{"key":"2023012511004888600_B6","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"2023012511004888600_B7","volume-title":"Classification and Regression Trees.","author":"Breiman","year":"1984"},{"key":"2023012511004888600_B8","doi-asserted-by":"crossref","first-page":"2597","DOI":"10.1093\/bioinformatics\/btl458","article-title":"Weighted quality estimates in machine learning","volume":"22","author":"Budagyan","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012511004888600_B9","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1023\/A:1009715923555","article-title":"A tutorial on support vector machines for pattern recognition","volume":"2","author":"Burges","year":"1998","journal-title":"Data Min. Knowl. Disc."},{"key":"2023012511004888600_B10","doi-asserted-by":"crossref","first-page":"3572","DOI":"10.1021\/jm010021j","article-title":"Toward the quantitative prediction of T-cell epitopes: coMFA and coMSIA studies of peptides with affinity for the class I MHC molecule HLA-A*0201","volume":"44","author":"Doytchinova","year":"2001","journal-title":"J. Med. Chem."},{"key":"2023012511004888600_B11","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1002\/prot.10154","article-title":"Physicochemical explanation of peptide binding to HLA-A*0201 major histocompatibility complex: a three-dimensional quantitative structure-activity relationship study","volume":"48","author":"Doytchinova","year":"2002","journal-title":"Prot. Struct. Funct. Genet."},{"key":"2023012511004888600_B12","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1007\/s10822-005-3993-x","article-title":"Towards the chemometric dissection of peptide-HLA-A*0201 binding affinity: comparison of local and global QSAR models","volume":"19","author":"Doytchinova","year":"2005","journal-title":"J. Comput. Aided Mol. Des."},{"key":"2023012511004888600_B13","volume-title":"Pattern Classification","author":"Duda","year":"2005","edition":"2nd"},{"key":"2023012511004888600_B14","first-page":"R27","volume-title":"Meeting review: the Second meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP2)","author":"Dunbrack","year":"1997"},{"key":"2023012511004888600_B15","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1111\/j.1469-1809.1936.tb02137.x","article-title":"The use of multiple measurements in taxonomic problems","volume":"7","author":"Fisher","year":"1936","journal-title":"Ann. Eugenics"},{"key":"2023012511004888600_B16","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1145\/1007730.1007736","article-title":"Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach","volume":"6","author":"Guo","year":"2004","journal-title":"ACM SIGKDD Explorations Newsletter"},{"key":"2023012511004888600_B17","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res. Arch."},{"key":"2023012511004888600_B18","doi-asserted-by":"crossref","first-page":"3274","DOI":"10.1039\/b409656h","article-title":"New horizons in mouse immunoinformatics: reliable in silico prediction of mouse class I histocompatibility major complex peptide binding affinity","volume":"2","author":"Hattotuwagama","year":"2004","journal-title":"Org. Biomol. Chem."},{"key":"2023012511004888600_B19","volume-title":"Neural Networks. A Comprehensive Foundation.","author":"Haykin","year":"1998"},{"key":"2023012511004888600_B20","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511004888600_B21","doi-asserted-by":"crossref","DOI":"10.1002\/0471725331","volume-title":"A User's Guide to Principal Components.","author":"Jackson","year":"1991"},{"key":"2023012511004888600_B22","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1093\/nar\/28.1.374","article-title":"AAindex: amino acid index database","volume":"28","author":"Kawashima","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012511004888600_B23","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1093\/nar\/27.1.368","article-title":"AAindex: amino acid index database","volume":"27","author":"Kawashima","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"2023012511004888600_B24","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1038\/ni1173","article-title":"How T cells \u2018see\u2019 antigen","volume":"6","author":"Krogsgaard","year":"2005","journal-title":"Nat. Immunol."},{"key":"2023012511004888600_B25","doi-asserted-by":"crossref","first-page":"36422","DOI":"10.1074\/jbc.274.51.36422","article-title":"Poor binding of a HER-2\/neu epitope (GP2) to HLA-A2.1 is due to a lack of interactions with the center of the peptide","volume":"274","author":"Kuhns","year":"1999","journal-title":"J. Biol. Chem."},{"key":"2023012511004888600_B26","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/JRPROC.1961.287775","article-title":"Steps toward artificial intelligence","volume":"49","author":"Minsky","year":"1961","journal-title":"Proc. IRE"},{"key":"2023012511004888600_B27","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1007\/s10822-007-9108-0","article-title":"kScore: a novel machine learning approach that is not dependent on the data structure of the training set","volume":"21","author":"Oloff","year":"2007","journal-title":"J. Comput. Aided Mol. Des."},{"key":"2023012511004888600_B28","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1080\/14786440109462720","article-title":"On lines and planes of closest fit to systems of points in space","volume":"2","author":"Pearson","year":"1901","journal-title":"Philos. Mag."},{"key":"2023012511004888600_B29","first-page":"198","article-title":"Peptide binding at class I major histocompatibility complex scored with linear functions and support vector machines","volume":"15","author":"Riedesel","year":"2004","journal-title":"Genome Inform."},{"key":"2023012511004888600_B30","first-page":"200","article-title":"Genetic Algorithms as a Tool for Feature Selection in Machine Learning","volume-title":"Proceedings of the 1992 IEEE Int. Conf. on Tools with AI.","author":"Vafaie","year":"1992"},{"key":"2023012511004888600_B31","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The Nature of Statistical Learning Theory.","author":"Vapnik","year":"1995"},{"key":"2023012511004888600_B32","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1093\/bioinformatics\/18.2.325","article-title":"Tclass: tumor classification system based on gene expression profile","volume":"18","author":"Wuju","year":"2002","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/5\/603\/48860779\/bioinformatics_26_5_603.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/5\/603\/48860779\/bioinformatics_26_5_603.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T11:04:19Z","timestamp":1674644659000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/5\/603\/213641"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,22]]},"references-count":32,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2010,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq021","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,3,1]]},"published":{"date-parts":[[2010,1,22]]}}}