{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:52:23Z","timestamp":1777456343839,"version":"3.51.4"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2020,5,13]],"date-time":"2020-05-13T00:00:00Z","timestamp":1589328000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2020,6,30]]},"abstract":"<jats:p>Classifier calibration does not always go hand in hand with the classifier\u2019s ability to separate the classes. There are applications where good classifier calibration, i.e., the ability to produce accurate probability estimates, is more important than class separation. When the amount of data for training is limited, the traditional approach to improve calibration starts to crumble. In this article, we show how generating more data for calibration is able to improve calibration algorithm performance in many cases where a classifier is not naturally producing well-calibrated outputs and the traditional approach fails. The proposed approach adds computational cost but considering that the main use case is with small datasets this extra computational cost stays insignificant and is comparable to other methods in prediction time. From the tested classifiers, the largest improvement was detected with the random forest and naive Bayes classifiers. 
Therefore, the proposed approach can be recommended at least for those classifiers when the amount of data available for training is limited and good calibration is essential.<\/jats:p>","DOI":"10.1145\/3385656","type":"journal-article","created":{"date-parts":[[2020,5,19]],"date-time":"2020-05-19T10:42:16Z","timestamp":1589884936000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Better Classifier Calibration for Small Datasets"],"prefix":"10.1145","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9330-9995","authenticated-orcid":false,"given":"Tuomo","family":"Alasalmi","sequence":"first","affiliation":[{"name":"University of Oulu, Finland"}]},{"given":"Jaakko","family":"Suutala","sequence":"additional","affiliation":[{"name":"University of Oulu, Finland"}]},{"given":"Juha","family":"R\u00f6ning","sequence":"additional","affiliation":[{"name":"University of Oulu, Finland"}]},{"given":"Heli","family":"Koskim\u00e4ki","sequence":"additional","affiliation":[{"name":"Oura Health Ltd., Oulu, Finland"}]}],"member":"320","published-online":{"date-parts":[[2020,5,13]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5220\/0006576003790386"},
{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976699300016007"},
{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA.2008.107"},
{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision, Whistler.","author":"Caputo Barbara","year":"2002","unstructured":"Barbara Caputo, K. Sim, F. Furesjo, and Alex Smola. 2002. Appearance-based object recognition using SVMs: Which kernel should I use? In Proceedings of the NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision, Whistler."},
{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2697065"},
{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-017-1736-3"},
{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1982.10477856"},
{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976698300017197"},
{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007413511361"},
{"key":"e_1_2_1_10_1","unstructured":"Dheeru Dua and Casey Graff. 2019. UCI Machine Learning Repository. Retrieved from http:\/\/archive.ics.uci.edu\/ml."},
{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1118\/1.2786864"},
{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1214\/08-AOAS191"},
{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236009"},
{"key":"e_1_2_1_14_1","first-page":"16","article-title":"Smooth isotonic regression: A new method to calibrate predictive models","volume":"2011","author":"Jiang Xiaoqian","year":"2011","unstructured":"Xiaoqian Jiang, Melanie Osl, Jihoon Kim, and Lucila Ohno-Machado. 2011. Smooth isotonic regression: A new method to calibrate predictive models. AMIA Joint Summits on Translational Science Proceedings 2011 (2011), 16--20.","journal-title":"AMIA Joint Summits on Translational Science Proceedings"},
{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1136\/amiajnl-2011-000291"},
{"key":"e_1_2_1_16_1","volume-title":"Applied Predictive Modeling","author":"Kuhn Max","unstructured":"Max Kuhn and Kjell Johnson. 2013. Applied Predictive Modeling. Vol. 26. Springer."},
{"key":"#cr-split#-e_1_2_1_17_1.1","doi-asserted-by":"crossref","unstructured":"Meelis Kull and Peter Flach. 2015. Novel decompositions of proper scoring rules for classification: Score adjustment as precursor to calibration. In Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science), Annalisa Appice, Pedro Pereira Rodrigues, V\u00edtor Santos Costa, Carlos Soares, Jo\u00e3o Gama, and Al\u00edpio Jorge (Eds.), Vol. 9284. Springer International Publishing, 1--16. DOI:https:\/\/doi.org\/10.1007\/978-3-319-23528-8","DOI":"10.1007\/978-3-319-23528-8"},
{"key":"#cr-split#-e_1_2_1_17_1.2","doi-asserted-by":"crossref","unstructured":"Meelis Kull and Peter Flach. 2015. Novel decompositions of proper scoring rules for classification: Score adjustment as precursor to calibration. In Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science), Annalisa Appice, Pedro Pereira Rodrigues, V\u00edtor Santos Costa, Carlos Soares, Jo\u00e3o Gama, and Al\u00edpio Jorge (Eds.), Vol. 9284. Springer International Publishing, 1--16. DOI:https:\/\/doi.org\/10.1007\/978-3-319-23528-8","DOI":"10.1007\/978-3-319-23528-8_5"},
{"key":"e_1_2_1_18_1","first-page":"1679","article-title":"Assessing approximate inference for binary Gaussian process classification","author":"Kuss Malte","year":"2005","unstructured":"Malte Kuss and Carl Edward Rasmussen. 2005. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research 6, Oct (2005), 1679--1704.","journal-title":"Journal of Machine Learning Research"},
{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1021\/ci4000213"},
{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-017-1133-2"},
{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611974010.24"},
{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence. 2901--2907","author":"Naeini Mahdi Pakdaman","year":"2015","unstructured":"Mahdi Pakdaman Naeini, Gregory F. Cooper, and Milos Hauskrecht. 2015. Obtaining well calibrated probabilities using Bayesian binning. In Proceedings of the AAAI Conference on Artificial Intelligence. 2901--2907."},
{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102430"},
{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence. 413--420","author":"Niculescu-Mizil Alexandru","unstructured":"Alexandru Niculescu-Mizil and Richard A. Caruana. 2005. Obtaining calibrated probabilities from boosting. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence. 413--420."},
{"key":"e_1_2_1_25_1","volume-title":"Advances in Large Margin Classifiers","author":"Platt John C.","year":"1999","unstructured":"John C. Platt. 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers (1999)."},
{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1198\/TECH.2010.10111"},
{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/34.1-2.1"},
{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.735807"},
{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2008.07.018"},
{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/502512.502540"},
{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 18th International Conference on Machine Learning. Morgan Kaufmann Publishers Inc.","author":"Zadrozny Bianca","year":"2001","unstructured":"Bianca Zadrozny and Charles Elkan. 2001. Obtaining calibrated probability estimates from decision trees and Naive Bayesian classifiers. In Proceedings of the 18th International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, 609--616. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=645530.655658."},
{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775151"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3385656","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3385656","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:32:49Z","timestamp":1750199569000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3385656"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,13]]},"references-count":33,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,6,30]]}},"alternative-id":["10.1145\/3385656"],"URL":"https:\/\/doi.org\/10.1145\/3385656","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,13]]},"assertion":[{"value":"2019-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}