{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T01:16:43Z","timestamp":1775697403695,"version":"3.50.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2009,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Most machine-learning classifiers output label predictions for new instances without indicating how reliable the predictions are. The applicability of these classifiers is limited in critical domains where incorrect predictions have serious consequences, like medical diagnosis. Further, the default assumption of equal misclassification costs is most likely violated in medical diagnosis.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>In this paper, we present a modified random forest classifier which is incorporated into the conformal predictor scheme. A conformal predictor is a transductive learning scheme, using Kolmogorov complexity to test the randomness of a particular sample with respect to the training sets. Our method show well-calibrated property that the performance can be set prior to classification and the accurate rate is exactly equal to the predefined confidence level. Further, to address the cost sensitive problem, we extend our method to a label-conditional predictor which takes into account different costs for misclassifications in different class and allows different confidence level to be specified for each class. 
Intensive experiments on benchmark datasets and real world applications show the resultant classifier is well-calibrated and able to control the specific risk of different class.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The method of using RF outlier measure to design a nonconformity measure benefits the resultant predictor. Further, a label-conditional classifier is developed and turn to be an alternative approach to the cost sensitive learning problem that relies on label-wise predefined confidence level. The target of minimizing the risk of misclassification is achieved by specifying the different confidence level for different class.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-10-s1-s22","type":"journal-article","created":{"date-parts":[[2009,1,30]],"date-time":"2009-01-30T20:04:59Z","timestamp":1233345899000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":93,"title":["Using random forest for reliable classification and cost-sensitive learning for medical diagnosis"],"prefix":"10.1186","volume":"10","author":[{"given":"Fan","family":"Yang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hua-zhen","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hong","family":"Mi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cheng-de","family":"Lin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei-wen","family":"Cai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2009,1,30]]},"reference":[{"issue":"Suppl 
1","key":"3205_CR1","doi-asserted-by":"publisher","first-page":"S13","DOI":"10.1186\/1471-2164-9-S1-S13","volume":"9","author":"M Pirooznia","year":"2008","unstructured":"Pirooznia M, Yang JY, Yang MQ, Deng YP: A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics. 2008, 9 (Suppl 1): S13-","journal-title":"BMC Genomics"},{"key":"3205_CR2","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1016\/S0304-3975(02)00100-7","volume":"287","author":"A Gammerman","year":"2002","unstructured":"Gammerman A, Vovk V: Prediction algorithms and confidence measures based on algorithmic randomness theory. Theoretical Computer Science. 2002, 287: 209-217.","journal-title":"Theoretical Computer Science"},{"key":"3205_CR3","volume-title":"Algorithmic learning in a random world","author":"V Vovk","year":"2005","unstructured":"Vovk V, Gammerman A, Shafer G: Algorithmic learning in a random world. 2005, Springer, New York"},{"key":"3205_CR4","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1093\/comjnl\/bxl065","volume":"50","author":"A Gammerman","year":"2007","unstructured":"Gammerman A, Vovk V: Hedging predictions in machine learning. Computer Journal. 2007, 50: 151-177.","journal-title":"Computer Journal"},{"key":"3205_CR5","first-page":"371","volume":"9","author":"G Shafer","year":"2007","unstructured":"Shafer G, Vovk V: A tutorial on conformal prediction. J Mach Learn Res. 2007, 9: 371-421.","journal-title":"J Mach Learn Res"},{"key":"3205_CR6","first-page":"973","volume-title":"Proceedings of the Seventeenth International Joint Conference of Artificial Intelligence","author":"C Elkan","year":"2001","unstructured":"Elkan C: The foundations of cost-sensitive learning. Proceedings of the Seventeenth International Joint Conference of Artificial Intelligence. 
2001, Morgan Kaufmann, Seattle, Washington, 973-978."},{"key":"3205_CR7","first-page":"575","volume":"5","author":"V Vovk","year":"2004","unstructured":"Vovk V: A Universal Well-Calibrated Algorithm for On-line Classification. J Mach Learn Res. 2004, 5: 575-604.","journal-title":"J Mach Learn Res"},{"key":"3205_CR8","doi-asserted-by":"publisher","first-page":"310","DOI":"10.1007\/978-3-540-73499-4_24","volume-title":"Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition","author":"V Stijn","year":"2007","unstructured":"Stijn V, Laurens VDM, Ida SK: Off-line learning with transductive confidence machines: an empirical evaluation. Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition. Edited by: Petra Perner, LNAI. 2007, Leipzig, Germany. Springer Press, 4571: 310-323."},{"issue":"4","key":"3205_CR9","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1142\/S012906570500027X","volume":"15","author":"B Tony","year":"2005","unstructured":"Tony B, Zhiyuan L, Gammerman A, Frederick VD, Vaskar S: Qualified predictions for microarray and proteomics pattern diagnostics with confidence machines. International Journal of Neural Systems. 2005, 15 (4): 247-258.","journal-title":"International Journal of Neural Systems"},{"key":"3205_CR10","first-page":"148","volume-title":"Proceedings of IEEE International Conference on Granular Computing, Atlanta, USA","author":"T Bellotti","year":"2006","unstructured":"Bellotti T, Zhiyuan L, Gammerman A: Reliable classification of childhood acute leukaemia from gene expression data using Confidence Machines. Proceedings of IEEE International Conference on Granular Computing, Atlanta, USA. 
2006, 148-153."},{"key":"3205_CR11","first-page":"381","volume-title":"Proceedings of the 13th European Conference on Machine Learning","author":"K Proedrou","year":"2002","unstructured":"Proedrou K, Nouretdinov I, Vovk V, Gammerman A: Transductive confidence machines for pattern recognition. Proceedings of the 13th European Conference on Machine Learning. 2002, 381-390."},{"issue":"2","key":"3205_CR12","first-page":"123","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L: Bagging Predictors. Mach Learn. 1996, 24 (2): 123-140.","journal-title":"Mach Learn"},{"issue":"1","key":"3205_CR13","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L: Random forests. Mach Learn. 2001, 45 (1): 5-32.","journal-title":"Mach Learn"},{"key":"3205_CR14","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/1471-2105-7-3","volume":"7","author":"UR Diaz","year":"2006","unstructured":"Diaz UR, Alvarez AS: Gene Selection and Classification of Microarray Data Using Random Forest. BMC Bioinformatics. 2006, 7: 3-","journal-title":"BMC Bioinformatics"},{"key":"3205_CR15","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1186\/1471-2105-9-307","volume":"9","author":"C Strobl","year":"2008","unstructured":"Strobl C, Boulesteix AL, Kneib T, Augustin T, Zeileis A: Conditional variable importance for random forests. BMC Bioinformatics. 2008, 9: 307-","journal-title":"BMC Bioinformatics"},{"key":"3205_CR16","first-page":"15","volume-title":"Workshop on Cost-Sensitive Learning at ICML","author":"P Turney","year":"2000","unstructured":"Turney P: Types of cost in inductive concept learning. Workshop on Cost-Sensitive Learning at ICML. 
2000, Stanford University, California, 15-21."},{"key":"3205_CR17","first-page":"567","volume-title":"Proceedings of the 21st National Conference on Artificial Intelligence, Boston, MA","author":"ZH Zhou","year":"2006","unstructured":"Zhou ZH, Liu XY: On multi-class cost-sensitive learning. Proceedings of the 21st National Conference on Artificial Intelligence, Boston, MA. 2006, 567-572."},{"key":"3205_CR18","doi-asserted-by":"publisher","first-page":"204","DOI":"10.1145\/502512.502540","volume-title":"Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining","author":"B Zadrozny","year":"2001","unstructured":"Zadrozny B, Elkan C: Learning and making decisions when costs and probabilities are both unknown. Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining. 2001, ACM Press, 204-213."},{"key":"3205_CR19","unstructured":"UCI Machine Learning Repository. [http:\/\/archive.ics.uci.edu\/ml\/]"},{"issue":"2","key":"3205_CR20","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1016\/S1535-6108(02)00032-6","volume":"1","author":"EJ Yeoh","year":"2002","unstructured":"Yeoh EJ, Ross ME, Shurtleff SA: Classification subtype discovery and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002, 1 (2): 133-143.","journal-title":"Cancer Cell"},{"key":"3205_CR21","volume-title":"Data Analysis Tools for DNA Microarrays","author":"D Sorin","year":"2003","unstructured":"Sorin D: Data Analysis Tools for DNA Microarrays. 2003, Chapman&Hall\/CRC, London"},{"key":"3205_CR22","unstructured":"Thyroid Disease Database. [ftp:\/\/ftp.ics.uci.edu\/pub\/machine-learning-databases\/thyroid-disease\/]"},{"key":"3205_CR23","unstructured":"Chronic Gastritis Dataset. 
[http:\/\/59.77.15.238\/APBC_paper]"},{"issue":"3","key":"3205_CR24","first-page":"70","volume":"20","author":"HZ Niu","year":"2001","unstructured":"Niu HZ, Wang RX, Lan SM, Xu WL: Thinking and approaches on treatment of chronic gastritis with integration of traditional Chinese and western medicine. Shandong Journal of Traditional Chinese Medicine. 2001, 20 (3): 70-72.","journal-title":"Shandong Journal of Traditional Chinese Medicine"},{"key":"3205_CR25","doi-asserted-by":"crossref","first-page":"77","DOI":"10.4137\/CIN.S408","volume":"6","author":"AL Boulesteix","year":"2008","unstructured":"Boulesteix AL, Strobl C, Augustin T, Daumer M: Evaluating microarray-based classifiers: an overview. Cancer Informatics. 2008, 6: 77-97.","journal-title":"Cancer Informatics"},{"key":"3205_CR26","first-page":"531","volume":"10","author":"Y Qi","year":"2005","unstructured":"Qi Y, Klein SJ, Bar JZ: Random forest similarity for protein-protein interaction prediction from multiple sources. Pacific Symposium on Biocomputing. 2005, 10: 531-542.","journal-title":"Pacific Symposium on Biocomputing"},{"key":"3205_CR27","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1145\/312129.312220","volume-title":"Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining","author":"P Domingos","year":"1999","unstructured":"Domingos P: MetaCost: A general method for making classifiers cost-sensitive. Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining. 1999, New York. ACM Press, 155-164."},{"issue":"1","key":"3205_CR28","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1007\/s10994-006-8199-5","volume":"65","author":"D Chris","year":"2006","unstructured":"Chris D, Robert CH: Cost curves: An improved method for visualizing classifier performance. Machine Learning. 
2006, 65 (1): 95-130.","journal-title":"Machine Learning"},{"key":"3205_CR29","unstructured":"Vovk V, Lindsay D, Nouretdinov I, Gammerman A: Mondrian Confidence Machine. Technical Report. Computer Learning Research Centre, Royal Holloway, University of London"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-10-S1-S22.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T02:54:55Z","timestamp":1630464895000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-10-S1-S22"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,1]]},"references-count":29,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2009,1]]}},"alternative-id":["3205"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-10-s1-s22","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,1]]},"assertion":[{"value":"30 January 2009","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S22"}}