{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,10]],"date-time":"2024-08-10T00:19:32Z","timestamp":1723249172179},"reference-count":18,"publisher":"Oxford University Press (OUP)","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: In the genomics setting, an increasingly common data configuration consists of a small set of sequences possessing a targeted property (positive instances) amongst a large set of sequences for which class membership is unknown (unlabeled instances). Traditional two-class classification methods do not effectively handle such data.<\/jats:p>\n               <jats:p>Results: Here, we develop a novel method, likely positive-iterative classification (LP-IC) for this problem, and contrast its performance with the few existing methods, most of which were devised and utilized in the text classification context. LP-IC employs an iterative classification scheme and introduces a class dispersion measure, adopted from unsupervised clustering approaches, to monitor the model selection process. 
Using two case studies\u2014prediction of HLA binding, and alternative splicing conservation between human and mouse\u2014we show that LP-IC provides superior performance to existing methodologies in terms of: (i) combined accuracy and precision in positive identification from the unlabeled set; and (ii) predictive performance of the resultant classifiers on independent test data.<\/jats:p>\n               <jats:p>Contact: \u00a0mark@biostat.ucsf.edu<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn089","type":"journal-article","created":{"date-parts":[[2008,3,15]],"date-time":"2008-03-15T00:45:23Z","timestamp":1205541923000},"page":"1198-1205","source":"Crossref","is-referenced-by-count":11,"title":["Biological sequence classification utilizing positive and unlabeled data"],"prefix":"10.1093","volume":"24","author":[{"given":"Yuanyuan","family":"Xiao","sequence":"first","affiliation":[{"name":"Department of Epidemiology and Biostatistics, Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, CA 94107, USA"}]},{"given":"Mark R.","family":"Segal","sequence":"additional","affiliation":[{"name":"Department of Epidemiology and Biostatistics, Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, CA 94107, USA"}]}],"member":"286","published-online":{"date-parts":[[2008,3,14]]},"reference":[{"key":"2023020210021396800_B1","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1093\/biomet\/58.3.525","article-title":"Estimating the transition between two intersecting straight lines","volume":"58","author":"Bacon","year":"1971","journal-title":"Biometrika"},{"key":"2023020210021396800_B2","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1093\/nar\/24.1.242","article-title":"MHCPEP \u2013 a database of MHC-binding peptides: update 
1995","volume":"24","author":"Brusic","year":"1996","journal-title":"Nucleic Acids Res"},{"key":"2023020210021396800_B3","volume-title":"Support Vector Machines.","author":"Christianini","year":"2000"},{"key":"2023020210021396800_B4","doi-asserted-by":"crossref","first-page":"3445","DOI":"10.1093\/nar\/22.17.3445","article-title":"The European Bioinformatics Institute (EBI) databases","volume":"22","author":"Emmert","year":"1994","journal-title":"Nucleic Acids Res"},{"key":"2023020210021396800_B5","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1214\/aos\/1016218223","article-title":"Additive logistic regression: a statistical view of boosting","volume":"28","author":"Friedman","year":"2000","journal-title":"Ann. Stat"},{"key":"2023020210021396800_B6","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1093\/bioinformatics\/btl620","article-title":"Genomic sweeping for hypermethylated genes","volume":"23","author":"Goh","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020210021396800_B7","doi-asserted-by":"crossref","first-page":"3621","DOI":"10.1093\/nar\/gkg510","article-title":"MHCPred: a server for quantitative prediction of peptide-MHC binding","volume":"31","author":"Guan","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023020210021396800_B8","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/BF01025492","article-title":"Statistical analysis of the physical properties of the 20 naturally occurring amino acids","volume":"4","author":"Kidera","year":"1985","journal-title":"J. 
Protein Chem"},{"key":"2023020210021396800_B9","article-title":"Learning to classify text using positive and unlabeled data","volume-title":"Proceedings of Eighteenth International Joint Conference on Artificial Intelligence.","author":"Li","year":"2003"},{"key":"2023020210021396800_B10","doi-asserted-by":"crossref","DOI":"10.1109\/ICDM.2003.1250918","article-title":"Building text classifiers using positive and unlabeled examples","volume-title":"Proceedings of the Third IEEE International Conference on Data Mining.","author":"Liu","year":"2003"},{"key":"2023020210021396800_B11","article-title":"Partially supervised classification of text documents","volume-title":"Proceedings of the Nineteenth International Conference on Machine Learning.","author":"Liu","year":"2002"},{"key":"2023020210021396800_B12","article-title":"A comparison of event models for naive bayes text classification","author":"McCallum","year":"1998","journal-title":"AAAI-98 Workshop on Learning for Text Categorization"},{"key":"2023020210021396800_B13","doi-asserted-by":"crossref","first-page":"163","DOI":"10.4049\/jimmunol.152.1.163","article-title":"Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains","volume":"152","author":"Parker","year":"1994","journal-title":"J. Immunol"},{"key":"2023020210021396800_B14","article-title":"Estimating the support of a high-dimensional distribution. Technical report","author":"Scholkopf","year":"1999"},{"key":"2023020210021396800_B15","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a dataset via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J. Roy. Stat. Soc. Ser. B\u2013Stat. 
Method"},{"key":"2023020210021396800_B16","article-title":"Prediction of genomewide conserved epitope profiles of HIV-1: Classifier choice and peptide representation","volume":"4","author":"Xiao","year":"2005","journal-title":"Stat. Appl. Genetics Mol. Biol"},{"key":"2023020210021396800_B17","doi-asserted-by":"crossref","first-page":"2850","DOI":"10.1073\/pnas.0409742102","article-title":"Identification and analysis of alternative splicing events conserved in human and mouse","volume":"102","author":"Yeo","year":"2005","journal-title":"PNAS"},{"key":"2023020210021396800_B18","doi-asserted-by":"crossref","DOI":"10.1145\/775047.775083","article-title":"PEBL: positive example based learning for web page classification using SVM","volume-title":"Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.","author":"Yu","year":"2002"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/9\/1198\/49046088\/bioinformatics_24_9_1198.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/9\/1198\/49046088\/bioinformatics_24_9_1198.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T11:24:56Z","timestamp":1675337096000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/9\/1198\/206902"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,3,14]]},"references-count":18,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2008,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn089","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-othe
r":{"date-parts":[[2008,5,1]]},"published":{"date-parts":[[2008,3,14]]}}}