{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T01:39:54Z","timestamp":1772847594193,"version":"3.50.1"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T00:00:00Z","timestamp":1737936000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T00:00:00Z","timestamp":1737936000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Ministerio de Ciencia e Innovaci\u00f3n, Gobierno de Espa\u00f1a","award":["PID2021-123733NB-I00"],"award-info":[{"award-number":["PID2021-123733NB-I00"]}]},{"name":"Ministerio de Ciencia e Innovaci\u00f3n, Gobierno de Espa\u00f1a","award":["PID2021-123733NB-I00"],"award-info":[{"award-number":["PID2021-123733NB-I00"]}]},{"DOI":"10.13039\/501100011104","name":"Universitat Aut\u00f2noma de Barcelona","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100011104","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In the field of supervised machine learning, accurate evaluation of classification models is a critical factor for assessing their performance and guiding model selection. This paper delves into the domain of ordinal classification and raises the question of adapting ordinal metrics to the interval scale. In scenarios where measurements are recorded at intervals, not only the order but also their length assume significance, and this promotes the adoption of novel performance metrics. 
Initially, we revisit two existing confusion matrix-based ordinal metrics and introduce a normalization technique to render them comparable and enhance their practical utility. We extend our focus to classification by intervals, proposing a robust framework for adapting ordinal metrics to the interval scale, and applying it to the aforementioned ordinal metrics. We address the challenge of unbounded rightmost intervals, a common issue in practical applications, from both theoretical and simulation perspectives, by providing a solution that enhances the applicability of the proposed metrics. To further explore practical implications, we conducted experiments on real-world datasets. The results reveal a promising trend in the use of interval-scale metrics to guide hyper-parameter tuning for improving model performance.<\/jats:p>","DOI":"10.1007\/s10994-024-06654-4","type":"journal-article","created":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T21:24:02Z","timestamp":1738013042000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Adapting performance metrics for ordinal classification to interval scale: length matters"],"prefix":"10.1007","volume":"114","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0789-9069","authenticated-orcid":false,"given":"Giulia","family":"Binotto","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1208-9236","authenticated-orcid":false,"given":"Rosario","family":"Delgado","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,1,27]]},"reference":[{"key":"6654_CR1","doi-asserted-by":"publisher","unstructured":"Amig\u00f3, E., Gonzalo, J., Mizzaro, S., & Carrillo-de-Albornoz, J. (2020). An effectiveness metric for ordinal classification: Formal properties and experimental results. 
In Proceedings of the 58th annual meeting of the association for computational linguistics (pp. 3938\u20133949). https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.363","DOI":"10.18653\/v1\/2020.acl-main.363"},{"issue":"2","key":"6654_CR2","doi-asserted-by":"publisher","first-page":"825","DOI":"10.1016\/j.eswa.2006.10.022","volume":"34","author":"A Ben-David","year":"2008","unstructured":"Ben-David, A. (2008). Comparison of classification accuracy using Cohen\u2019s Weighted Kappa. Expert Systems with Applications, 34(2), 825\u2013832. https:\/\/doi.org\/10.1016\/j.eswa.2006.10.022","journal-title":"Expert Systems with Applications"},{"key":"6654_CR3","doi-asserted-by":"publisher","unstructured":"Baccianella, S., Esuli, A., & Sebastiani, F. (2009). Evaluation measures for ordinal regression. In 2009 ninth international conference on intelligent systems design and applications (pp. 283\u2013287). https:\/\/doi.org\/10.1109\/ISDA.2009.230","DOI":"10.1109\/ISDA.2009.230"},{"issue":"1","key":"6654_CR4","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1007\/s11747-021-00790-2","volume":"50","author":"S Baehre","year":"2022","unstructured":"Baehre, S., O\u2019Dwyer, M., O\u2019Malley, L., & Lee, N. (2022). The use of Net Promoter Score (NPS) to predict sales growth: insights from an empirical investigation. Journal of the Academy of Marketing Science, 50(1), 67\u201384. https:\/\/doi.org\/10.1007\/s11747-021-00790-2","journal-title":"Journal of the Academy of Marketing Science"},{"key":"6654_CR5","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1016\/j.datak.2017.10.003","volume":"112","author":"JR Cano","year":"2017","unstructured":"Cano, J. R., & Garc\u00eda, S. (2017). Training set selection for monotonic ordinal classification. Data & Knowledge Engineering, 112, 94\u2013105. 
https:\/\/doi.org\/10.1016\/j.datak.2017.10.003","journal-title":"Data & Knowledge Engineering"},{"key":"6654_CR6","doi-asserted-by":"publisher","unstructured":"Cruz-Ram\u00edrez, M., Herv\u00e1s-Mart\u00ednez, C., S\u00e1nchez-Monedero, J., & Guti\u00e9rrez, P. A. (2011). A preliminary study of ordinal metrics to guide a multi-objective evolutionary algorithm. 2011 11th International Conference on Intelligent Systems Design and Applications, 1176\u20131181 https:\/\/doi.org\/10.1109\/ISDA.2011.6121818","DOI":"10.1109\/ISDA.2011.6121818"},{"key":"6654_CR7","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1016\/j.neucom.2013.05.058","volume":"135","author":"M Cruz-Ram\u00edrez","year":"2014","unstructured":"Cruz-Ram\u00edrez, M., Herv\u00e1s-Mart\u00ednez, C., S\u00e1nchez-Monedero, J., & Guti\u00e9rrez, P. A. (2014). Metrics to guide a multi-objective evolutionary algorithm for ordinal classification. Neurocomputing, 135, 21\u201331. https:\/\/doi.org\/10.1016\/j.neucom.2013.05.058","journal-title":"Neurocomputing"},{"issue":"8","key":"6654_CR8","doi-asserted-by":"publisher","first-page":"1173","DOI":"10.1142\/S0218001411009093","volume":"25","author":"JS Cardoso","year":"2011","unstructured":"Cardoso, J. S., & Sousa, R. (2011). Measuring the performance of ordinal classification. International Journal of Pattern Recognition and Artificial Intelligence, 25(8), 1173\u20131195. https:\/\/doi.org\/10.1142\/S0218001411009093","journal-title":"International Journal of Pattern Recognition and Artificial Intelligence"},{"key":"6654_CR9","doi-asserted-by":"publisher","unstructured":"Erbilek, M., Fairhurst, M., & Costa-Abreu, M.C.D. (2013). Age Prediction from Iris Biometrics. 
5th International Conference on Imaging for Crime Detection and Prevention, ICDP 2013 https:\/\/doi.org\/10.1049\/ic.2013.0258","DOI":"10.1049\/ic.2013.0258"},{"key":"6654_CR10","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1007\/978-3-642-01818-3_25","volume":"5549","author":"L Gaudette","year":"2009","unstructured":"Gaudette, L., & Japkowicz, N. (2009). Evaluation methods for ordinal classification. Lecture Notes in Computer Science, 5549, 207\u2013210. https:\/\/doi.org\/10.1007\/978-3-642-01818-3_25","journal-title":"Lecture Notes in Computer Science"},{"key":"6654_CR11","doi-asserted-by":"publisher","unstructured":"Gowroju, S., Kumar, S., Aarti, & Ghimire, A. (2022). Deep Neural Network for accurate age group prediction through Pupils using the Optimised UNet model. Mathematical Problems in Engineering Article ID 7813701 24 pages https:\/\/doi.org\/10.1155\/2022\/7813701","DOI":"10.1155\/2022\/7813701"},{"issue":"1","key":"6654_CR12","doi-asserted-by":"publisher","first-page":"135","DOI":"10.5430\/air.v5n1p135","volume":"5","author":"NI George","year":"2016","unstructured":"George, N. I., Lu, T.-P., & Chang, C.-W. (2016). Cost-sensitive performance metric for comparing multiple ordinal classifiers. Artificial Intelligence Research, 5(1), 135\u2013143. https:\/\/doi.org\/10.5430\/air.v5n1p135","journal-title":"Artificial Intelligence Research"},{"key":"6654_CR13","doi-asserted-by":"publisher","unstructured":"Jeske, D. R., Callanan, T. P., & Guo, L. (2011). Identification of key drivers of net promoter score using a statistical classification model. Efficient Decision Support Systems, IntechOpen, Chapter 8 https:\/\/doi.org\/10.5772\/16954","DOI":"10.5772\/16954"},{"key":"6654_CR14","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1016\/j.neucom.2020.01.025","volume":"388","author":"X Liu","year":"2020","unstructured":"Liu, X., Fan, F., Kong, L., Diao, Z., Xie, W., Lu, J., & You, J. (2020). 
Unimodal regularized neuron stick-breaking for ordinal classification. Neurocomputing, 388, 34\u201344. https:\/\/doi.org\/10.1016\/j.neucom.2020.01.025","journal-title":"Neurocomputing"},{"issue":"140","key":"6654_CR15","first-page":"5","volume":"22","author":"R Likert","year":"1932","unstructured":"Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 5\u201355.","journal-title":"Archives of Psychology"},{"issue":"8","key":"6654_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0183537","volume":"12","author":"AA Morgan-L\u00f3pez","year":"2017","unstructured":"Morgan-L\u00f3pez, A. A., Kim, A. E., Chew, R. F., & Ruddle, P. (2017). Predicting age groups of Twitter users based on language and metadata features. PLoS ONE, 12(8), 1\u201312. https:\/\/doi.org\/10.1371\/journal.pone.0183537","journal-title":"PLoS ONE"},{"key":"6654_CR17","doi-asserted-by":"publisher","unstructured":"Markoulidakis, J., Rallis, I., Georgoulas, I., Kopsiaftis, G., Doulamis, A., & Doulamis, N. (2021). Multiclass confusion matrix reduction method and its application on net promoter score classification problem. In Proceedings of the 14th peripheral technologies related to assistive environments conference (PETRA \u201921)(Vol. 9(4), 81) https:\/\/doi.org\/10.3390\/technologies9040081","DOI":"10.3390\/technologies9040081"},{"key":"6654_CR18","doi-asserted-by":"publisher","unstructured":"Peersman, C., Daelemans, W., & Vaerenbergh, L. V. (2011). Predicting age and gender in online social networks. In Proceedings of the 3rd international CIKM workshop on search and mining user-generated contents (pp. 37\u201344). https:\/\/doi.org\/10.1145\/2065023.2065035","DOI":"10.1145\/2065023.2065035"},{"issue":"1\u20132","key":"6654_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/1500000011","volume":"2","author":"B Pang","year":"2008","unstructured":"Pang, B., & Lee, L. (2008). 
Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1\u20132), 1\u2013135. https:\/\/doi.org\/10.1561\/1500000011","journal-title":"Foundations and Trends in Information Retrieval"},{"key":"6654_CR20","doi-asserted-by":"publisher","unstructured":"Ravishankar, S., Kumar, P., Patage, V. V., Tiwari, S., & Goyal, S. (2020). Prediction of age from speech features using a multi-layer perceptron model. In 11th international conference on computing, communication and networking technologies (ICCCNT). https:\/\/doi.org\/10.1109\/ICCCNT49239.2020.9225390","DOI":"10.1109\/ICCCNT49239.2020.9225390"},{"key":"6654_CR21","doi-asserted-by":"publisher","unstructured":"Sebastiani, F. (2015). An axiomatically derived measure for the evaluation of classification algorithms. In Proceedings of the 2015 international conference on the theory of information retrieval (pp. 11\u201320). https:\/\/doi.org\/10.1145\/2808194.2809449","DOI":"10.1145\/2808194.2809449"},{"issue":"25","key":"6654_CR22","doi-asserted-by":"publisher","first-page":"33911","DOI":"10.1007\/s11042-021-11252-w","volume":"80","author":"N Sharma","year":"2021","unstructured":"Sharma, N., Sharma, R., & Jindal, N. (2021). Prediction of face age progression with generative adversarial networks. Multimedia Tools and Applications, 80(25), 33911\u201333935. https:\/\/doi.org\/10.1007\/s11042-021-11252-w","journal-title":"Multimedia Tools and Applications"},{"issue":"2684","key":"6654_CR23","first-page":"677","volume":"103","author":"SS Stevens","year":"1946","unstructured":"Stevens, S. S. (1946). On the Theory of Scales of Measurement. Science, New Series, 103(2684), 677\u2013680.","journal-title":"Science, New Series"},{"key":"6654_CR24","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1016\/j.neucom.2020.03.034","volume":"401","author":"VM Vargas","year":"2020","unstructured":"Vargas, V. M., Guti\u00e9rrez, P. A., & Herv\u00e1s-Mart\u00ednez, C. (2020). 
Cumulative link models for deep ordinal classification. Neurocomputing, 401, 48\u201358. https:\/\/doi.org\/10.1016\/j.neucom.2020.03.034","journal-title":"Neurocomputing"},{"key":"6654_CR25","unstructured":"Waegeman, W., De Baets, B., & Boullart, L. (2006). A comparison of different ROC measures for ordinal regression. In Proceedings of the ICML 2006 workshop on ROC analysis in machine learning"},{"key":"6654_CR26","doi-asserted-by":"publisher","first-page":"110020","DOI":"10.1016\/j.asoc.2023.110020","volume":"134","author":"AE Yilmaz","year":"2023","unstructured":"Yilmaz, A. E., & Demirhan, H. (2023). Weighted kappa measures for ordinal multi-class classification performance. Applied Soft Computing, 134, 110020. https:\/\/doi.org\/10.1016\/j.asoc.2023.110020","journal-title":"Applied Soft Computing"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06654-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-024-06654-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06654-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T02:01:11Z","timestamp":1739844071000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-024-06654-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,27]]},"references-count":26,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2]]}},"alternative-id":["6654"],"URL":"https:\/\/doi.org\/10.1007\/s10994-024-06654-4","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value"
:"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,27]]},"assertion":[{"value":"16 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 November 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 December 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 January 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest nor conflict of interest in relation to the research reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"None.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Financial and non-financial interests."}},{"value":"The R code for calculating the metrics as well as for replicating all examples in Sect.\u00a0 is available at . Additionally, the R scripts used to implement the experimental phase detailed in Sect.\u00a0 can be found at .","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability."}}],"article-number":"41"}}