{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T20:19:37Z","timestamp":1774124377491,"version":"3.50.1"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2025,9,28]],"date-time":"2025-09-28T00:00:00Z","timestamp":1759017600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,9,28]],"date-time":"2025-09-28T00:00:00Z","timestamp":1759017600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>In supervised machine learning, models are typically trained using data with hard labels, i.e., definite assignments of class membership. This traditional approach, however, does not take the inherent uncertainty in these labels into account. We investigate whether incorporating label uncertainty, represented for each instance as a discrete probability distribution over the class labels, known as a soft label, improves the predictive performance of classification models, focusing on tabular data. We first demonstrate the potential value of soft label learning (SLL) for estimating model parameters in a simulation experiment, particularly for limited sample sizes and imbalanced data. Subsequently, we compare the performance of various wrapper methods for learning from both hard and soft labels using identical base classifiers. On real-world-inspired synthetic data with clean labels, the SLL methods consistently outperform the hard label methods. 
Since real-world data is often noisy and precise soft labels are challenging to obtain, we study the effect that noisy probability estimates have on model performance. Alongside conventional noise models, our study examines four types of miscalibration that are known to affect human annotators. The results show that SLL methods outperform the hard label methods in the majority of settings. Finally, we evaluate the methods on a real-world dataset with confidence scores, where the SLL methods are shown to match the traditional methods for predicting the (noisy) hard labels while providing more accurate confidence estimates.<\/jats:p>","DOI":"10.1007\/s10994-025-06860-8","type":"journal-article","created":{"date-parts":[[2025,9,28]],"date-time":"2025-09-28T22:11:16Z","timestamp":1759097476000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Learning with confidence: training better classifiers from soft labels"],"prefix":"10.1007","volume":"114","author":[{"given":"Sjoerd","family":"de Vries","sequence":"first","affiliation":[]},{"given":"Dirk","family":"Thierens","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,9,28]]},"reference":[{"issue":"2\u20133","key":"6860_CR1","first-page":"255","volume":"17","author":"J Alcal\u00e1-Fdez","year":"2011","unstructured":"Alcal\u00e1-Fdez, J., Fern\u00e1ndez, A., Luengo, J., Derrac, J., Garc\u00eda, S., S\u00e1nchez, L., & Herrera, F. (2011). Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing, 17(2\u20133), 255\u2013287.","journal-title":"Journal of Multiple-Valued Logic and Soft Computing"},{"key":"6860_CR2","unstructured":"Berthon, A., Han, B., Niu, G., Liu, T., & Sugiyama, M. (2021). Confidence scores make instance-dependent label-noise learning possible. 
International conference on machine learning, 825\u2013836."},{"key":"6860_CR3","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1023\/A:1018054314350","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123\u2013140.","journal-title":"Machine Learning"},{"issue":"1","key":"6860_CR4","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1016\/j.inffus.2004.04.004","volume":"6","author":"G Brown","year":"2005","unstructured":"Brown, G., Wyatt, J., Harris, R., & Yao, X. (2005). Diversity creation methods: A survey and categorisation. Information Fusion, 6(1), 5\u201320.","journal-title":"Information Fusion"},{"issue":"3","key":"6860_CR5","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1016\/j.patcog.2008.07.014","volume":"42","author":"E C\u00f4me","year":"2009","unstructured":"C\u00f4me, E., Oukhellou, L., Denoeux, T., & Aknin, P. (2009). Learning from partially supervised data using mixture models and belief functions. Pattern Recognition, 42(3), 334\u2013348.","journal-title":"Pattern Recognition"},{"issue":"4","key":"6860_CR6","doi-asserted-by":"publisher","first-page":"547","DOI":"10.1016\/j.dss.2009.05.016","volume":"47","author":"P Cortez","year":"2009","unstructured":"Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547\u2013553.","journal-title":"Decision Support Systems"},{"issue":"1","key":"6860_CR7","first-page":"20","volume":"28","author":"AP Dawid","year":"1979","unstructured":"Dawid, A. P., & Skene, A. M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. 
Journal of the Royal Statistical Society: Series C (Applied Statistics), 28(1), 20\u201328.","journal-title":"Journal of the Royal Statistical Society: Series C (Applied Statistics)"},{"issue":"3","key":"6860_CR8","doi-asserted-by":"publisher","first-page":"409","DOI":"10.1016\/S0165-0114(00)00086-5","volume":"122","author":"T Den\u0153ux","year":"2001","unstructured":"Den\u0153ux, T., & Zouhal, L. M. (2001). Handling possibilistic labels in pattern classification using evidential reasoning. Fuzzy Sets and Systems, 122(3), 409\u2013424.","journal-title":"Fuzzy Sets and Systems"},{"key":"6860_CR9","doi-asserted-by":"publisher","first-page":"105621","DOI":"10.1016\/j.compbiomed.2022.105621","volume":"146","author":"S de Vries","year":"2022","unstructured":"de Vries, S., Ten\u00a0Doesschate, T., Tott\u00e9, J. E., Heutz, J. W., Loeffen, Y. G., Oosterheert, J. J., Thierens, D., & Boel, E. (2022). A semi-supervised decision support system to facilitate antibiotic stewardship for urinary tract infections. Computers in Biology and Medicine, 146, 105621.","journal-title":"Computers in Biology and Medicine"},{"key":"6860_CR10","doi-asserted-by":"publisher","first-page":"106738","DOI":"10.1016\/j.knosys.2021.106738","volume":"215","author":"S de Vries","year":"2021","unstructured":"de Vries, S., & Thierens, D. (2021). A reliable ensemble based approach to semisupervised learning. Knowledge-Based Systems, 215, 106738.","journal-title":"Knowledge-Based Systems"},{"key":"6860_CR11","doi-asserted-by":"crossref","unstructured":"de Vries, S., & Thierens, D. (2024). Generating the ground truth: Synthetic data for soft label and label noise research. arXiv preprint arXiv:2309.04318.","DOI":"10.1007\/s41060-025-00786-z"},{"key":"6860_CR12","unstructured":"Dua, D., & Graff, C. (2017). UCI machine learning repository. 
Retrieved from http:\/\/archive.ics.uci.edu\/ml."},{"key":"6860_CR13","doi-asserted-by":"crossref","unstructured":"Fornaciari, T., Uma, A., Paun, S., Plank, B., Hovy, D., & Poesio, M. (2021). Beyond black & white: Leveraging annotator disagreement via soft-label multi-task learning. In 2021 conference of the north american chapter of the association for computational linguistics: Human language technologies.","DOI":"10.18653\/v1\/2021.naacl-main.204"},{"issue":"5","key":"6860_CR14","doi-asserted-by":"publisher","first-page":"845","DOI":"10.1109\/TNNLS.2013.2292894","volume":"25","author":"B Fr\u00e9nay","year":"2013","unstructured":"Fr\u00e9nay, B., & Verleysen, M. (2013). Classification in the presence of label noise: A survey. IEEE Transactions on Neural Networks and Learning Systems, 25(5), 845\u2013869.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"6","key":"6860_CR15","doi-asserted-by":"publisher","first-page":"2825","DOI":"10.1109\/TIP.2017.2689998","volume":"26","author":"B-B Gao","year":"2017","unstructured":"Gao, B.-B., Xing, C., Xie, C.-W., Wu, J., & Geng, X. (2017). Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing, 26(6), 2825\u20132838.","journal-title":"IEEE Transactions on Image Processing"},{"issue":"10","key":"6860_CR16","doi-asserted-by":"publisher","first-page":"2044","DOI":"10.1016\/j.ins.2009.12.010","volume":"180","author":"S Garc\u00eda","year":"2010","unstructured":"Garc\u00eda, S., Fern\u00e1ndez, A., Luengo, J., & Herrera, F. (2010). Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. 
Information Sciences, 180(10), 2044\u20132064.","journal-title":"Information Sciences"},{"issue":"7","key":"6860_CR17","doi-asserted-by":"publisher","first-page":"1734","DOI":"10.1109\/TKDE.2016.2545658","volume":"28","author":"X Geng","year":"2016","unstructured":"Geng, X. (2016). Label distribution learning. IEEE Transactions on Knowledge and Data Engineering, 28(7), 1734\u20131748.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"6860_CR18","first-page":"158","volume":"199","author":"D Griffin","year":"2004","unstructured":"Griffin, D., & Brenner, L. (2004). Perspectives on probability judgment calibration. Blackwell Handbook of Judgment and Decision Making, 199, 158\u2013177.","journal-title":"Blackwell Handbook of Judgment and Decision Making"},{"issue":"3","key":"6860_CR19","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1016\/0010-0285(92)90013-R","volume":"24","author":"D Griffin","year":"1992","unstructured":"Griffin, D., & Tversky, A. (1992). The weighing of evidence and the determinants of confidence. Cognitive Psychology, 24(3), 411\u2013435.","journal-title":"Cognitive Psychology"},{"key":"6860_CR20","doi-asserted-by":"crossref","unstructured":"Gui, L., Lu, Q., Xu, R., Li, M., & Wei, Q. (2015). A novel class noise estimation method and application in classification. In Proceedings of the 24th acm international on conference on information and knowledge management (pp. 1081\u20131090).","DOI":"10.1145\/2806416.2806554"},{"key":"6860_CR21","unstructured":"Jin, R., & Ghahramani, Z. (2002). Learning with multiple labels. Advances in neural information processing systems, 15."},{"key":"6860_CR22","doi-asserted-by":"crossref","unstructured":"Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1977). Calibration of probabilities: The state of the art. 
Decision making and change in human affairs: Proceedings of the fifth research conference on subjective probability, utility, and decision making, Darmstadt, 1\u20134 September, 1975 (pp. 275\u2013324).","DOI":"10.1007\/978-94-010-1276-8_19"},{"key":"6860_CR23","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1016\/j.ins.2013.08.059","volume":"261","author":"N Maci\u00e0","year":"2014","unstructured":"Maci\u00e0, N., & Bernad\u00f3-Mansilla, E. (2014). Towards UCI+: A mindful repository design. Information Sciences, 261, 237\u2013262. https:\/\/doi.org\/10.1016\/j.ins.2013.08.059","journal-title":"Information Sciences"},{"key":"6860_CR24","doi-asserted-by":"crossref","unstructured":"Nguyen, Q., Valizadegan, H., & Hauskrecht, M. (2011). Learning classification with auxiliary probabilistic information. In 2011 IEEE 11th international conference on data mining (pp. 477\u2013486).","DOI":"10.1109\/ICDM.2011.84"},{"issue":"3","key":"6860_CR25","doi-asserted-by":"publisher","first-page":"501","DOI":"10.1136\/amiajnl-2013-001964","volume":"21","author":"Q Nguyen","year":"2014","unstructured":"Nguyen, Q., Valizadegan, H., & Hauskrecht, M. (2014). Learning classification models with soft-label information. Journal of the American Medical Informatics Association, 21(3), 501\u2013508.","journal-title":"Journal of the American Medical Informatics Association"},{"key":"6860_CR44","unstructured":"Online Supplementaries: https:\/\/github.com\/sjoerd-de-vries\/Soft_Label_Learning"},{"key":"6860_CR26","unstructured":"Oyama, S., Baba, Y., Sakurai, Y., & Kashima, H. (2013). Accurate integration of crowdsourced labels using workers\u2019 self-reported confidence scores. 
Twenty-third international joint conference on artificial intelligence."},{"key":"6860_CR27","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825\u20132830.","journal-title":"Journal of Machine Learning Research"},{"key":"6860_CR28","doi-asserted-by":"crossref","unstructured":"Peng, P., Wong, R. C.-W., & Yu, P. S. (2014). Learning on probabilistic labels. In Proceedings of the 2014 SIAM international conference on data mining (pp. 307\u2013315).","DOI":"10.1137\/1.9781611973440.35"},{"key":"6860_CR29","doi-asserted-by":"crossref","unstructured":"Peterson, J. C., Battleday, R. M., Griffiths, T. L., & Russakovsky, O. (2019). Human uncertainty makes classification more robust. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 9617\u20139626).","DOI":"10.1109\/ICCV.2019.00971"},{"key":"6860_CR30","unstructured":"Raykar, V. C., Yu, S., Zhao, L. H., Valadez, G. H., Florin, C., Bogoni, L., & Moy, L. (2010). Learning from crowds. Journal of machine learning research, 11(4)."},{"issue":"1","key":"6860_CR31","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1109\/JBHI.2018.2810820","volume":"23","author":"N Reamaroon","year":"2018","unstructured":"Reamaroon, N., Sjoding, M. W., Lin, K., Iwashyna, T. J., & Najarian, K. (2018). Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE Journal of Biomedical and Health Informatics, 23(1), 407\u2013415.","journal-title":"IEEE Journal of Biomedical and Health Informatics"},{"key":"6860_CR32","doi-asserted-by":"crossref","unstructured":"Sheng, V. S. (2011). 
Simple multiple noisy label utilization strategies. In 2011 IEEE 11th international conference on data mining (pp. 635\u2013644).","DOI":"10.1109\/ICDM.2011.133"},{"key":"6860_CR33","doi-asserted-by":"crossref","unstructured":"Sheng, V. S., Provost, F., & Ipeirotis, P. G. (2008). Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 614\u2013622).","DOI":"10.1145\/1401890.1401965"},{"key":"6860_CR34","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1016\/j.knosys.2018.07.010","volume":"159","author":"J Song","year":"2018","unstructured":"Song, J., Wang, H., Gao, Y., & An, B. (2018). Active learning with confidence-based answers for crowdsourcing labeling tasks. Knowledge-Based Systems, 159, 244\u2013258.","journal-title":"Knowledge-Based Systems"},{"issue":"4157","key":"6860_CR35","doi-asserted-by":"publisher","first-page":"1124","DOI":"10.1126\/science.185.4157.1124","volume":"185","author":"A Tversky","year":"1974","unstructured":"Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases: Biases in judgments reveal some heuristics of thinking under uncertainty. Science, 185(4157), 1124\u20131131.","journal-title":"Science"},{"key":"6860_CR36","unstructured":"Xue, Y., & Hauskrecht, M. (2017). Efficient learning of classification models from soft-label information by binning and ranking. The thirtieth international FLAIRS conference."},{"issue":"12","key":"6860_CR37","doi-asserted-by":"publisher","first-page":"2751","DOI":"10.1111\/j.1365-2648.2010.05437.x","volume":"66","author":"H Yang","year":"2010","unstructured":"Yang, H., & Thompson, C. (2010). Nurses\u2019 risk assessment judgements: A confidence calibration study. 
Journal of Advanced Nursing, 66(12), 2751\u20132760.","journal-title":"Journal of Advanced Nursing"},{"key":"6860_CR38","doi-asserted-by":"crossref","unstructured":"Zadrozny, B., Langford, J., & Abe, N. (2003). Cost-sensitive learning by cost-proportionate example weighting. Third IEEE international conference on data mining (pp. 435\u2013442).","DOI":"10.1109\/ICDM.2003.1250950"},{"issue":"5","key":"6860_CR39","doi-asserted-by":"publisher","first-page":"749","DOI":"10.1109\/JAS.2022.105434","volume":"9","author":"J Zhang","year":"2022","unstructured":"Zhang, J. (2022). Knowledge learning with crowdsourcing: A brief review and systematic perspective. IEEE\/CAA Journal of Automatica Sinica, 9(5), 749\u2013762.","journal-title":"IEEE\/CAA Journal of Automatica Sinica"},{"issue":"8","key":"6860_CR40","doi-asserted-by":"publisher","first-page":"1506","DOI":"10.1109\/TKDE.2018.2860992","volume":"31","author":"J Zhang","year":"2018","unstructured":"Zhang, J., Wu, M., & Sheng, V. S. (2018). Ensemble learning from crowds. IEEE Transactions on Knowledge and Data Engineering, 31(8), 1506\u20131519.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"4","key":"6860_CR41","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1007\/s10462-016-9491-9","volume":"46","author":"J Zhang","year":"2016","unstructured":"Zhang, J., Wu, X., & Sheng, V. S. (2016). Learning from crowdsourced labeled data: A survey. Artificial Intelligence Review, 46(4), 543\u2013576.","journal-title":"Artificial Intelligence Review"},{"key":"6860_CR42","doi-asserted-by":"crossref","unstructured":"Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the twenty-first international conference on machine learning (p. 
116).","DOI":"10.1145\/1015330.1015332"},{"key":"6860_CR43","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1007\/s10462-004-0751-8","volume":"22","author":"X Zhu","year":"2004","unstructured":"Zhu, X., & Wu, X. (2004). Class noise vs. attribute noise: A quantitative study. Artificial Intelligence Review, 22, 177\u2013210.","journal-title":"Artificial Intelligence Review"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06860-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-025-06860-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06860-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T14:29:56Z","timestamp":1764685796000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-025-06860-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,28]]},"references-count":44,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["6860"],"URL":"https:\/\/doi.org\/10.1007\/s10994-025-06860-8","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,28]]},"assertion":[{"value":"24 September 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 July 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 July 
2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 September 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"All code that was used in this study has been made publicly available on GitHub:\n                      \n                      .","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability"}}],"article-number":"238"}}