{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T11:20:41Z","timestamp":1779880841895,"version":"3.53.1"},"reference-count":7,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2010,11,9]],"date-time":"2010-11-09T00:00:00Z","timestamp":1289260800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGKDD Explor. Newsl."],"published-print":{"date-parts":[[2010,11,9]]},
"abstract":"<jats:p>Cross-validation is a mainstay for measuring performance and progress in machine learning. There are subtle differences in how exactly to compute accuracy, F-measure and Area Under the ROC Curve (AUC) in cross-validation studies. However, these details are not discussed in the literature, and incompatible methods are used by various papers and software packages. This leads to inconsistency across the research literature. Anomalies in performance calculations for particular folds and situations go undiscovered when they are buried in aggregated results over many folds and datasets, without ever a person looking at the intermediate performance measurements. This research note clarifies and illustrates the differences, and it provides guidance for how best to measure classification performance under cross-validation. In particular, there are several divergent methods used for computing F-measure, which is often recommended as a performance measure under class imbalance, e.g., for text classification domains and in one-vs.-all reductions of datasets having many classes. We show by experiment that all but one of these computation methods leads to biased measurements, especially under high class imbalance. This paper is of particular interest to those designing machine learning software libraries and researchers focused on high class imbalance.<\/jats:p>",
"DOI":"10.1145\/1882471.1882479","type":"journal-article","created":{"date-parts":[[2010,11,12]],"date-time":"2010-11-12T13:36:08Z","timestamp":1289568968000},"page":"49-57","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":298,"title":["Apples-to-apples in cross-validation studies"],"prefix":"10.1145","volume":"12","author":[{"given":"George","family":"Forman","sequence":"first","affiliation":[{"name":"Hewlett-Packard Labs, Palo Alto, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Martin","family":"Scholz","sequence":"additional","affiliation":[{"name":"Hewlett-Packard Labs, Palo Alto, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2010,11,9]]},
"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458119"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.286.5439.531"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"e_1_2_1_4_1","first-page":"81","volume-title":"Symposium on Document Analysis and Information Retrieval","author":"Lewis D. D.","year":"1994","unstructured":"D. D. Lewis and M. Ringuette. A comparison of two learning algorithms for text categorization. In Symposium on Document Analysis and Information Retrieval, pages 81--93, Las Vegas, NV, Apr. 1994. ISRI; Univ. of Nevada, Las Vegas."},{"key":"e_1_2_1_5_1","first-page":"361","volume-title":"RCV1: A new benchmark collection for text categorization research","author":"Lewis D. D.","year":"2004","unstructured":"D. D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. volume 5, pages 361--397, 2004. http:\/\/www.jmlr.org\/papers\/volume5\/lewis04a\/lewis04a.pdf."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150531"},{"key":"e_1_2_1_7_1","volume-title":"Data Mining: Foundations and Intelligent Paradigms","author":"Raeder T.","year":"2010","unstructured":"T. Raeder, G. Forman, and N. V. Chawla. Data Mining: Foundations and Intelligent Paradigms, chapter Learning with Imbalanced Data: Evaluation Matters. Intelligent Systems Reference Library. Springer Verlag, 2010."}],
"container-title":["ACM SIGKDD Explorations Newsletter"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1882471.1882479","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1882471.1882479","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:59:34Z","timestamp":1750244374000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1882471.1882479"}},"subtitle":["pitfalls in classifier performance measurement"],"short-title":[],"issued":{"date-parts":[[2010,11,9]]},"references-count":7,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,11,9]]}},"alternative-id":["10.1145\/1882471.1882479"],"URL":"https:\/\/doi.org\/10.1145\/1882471.1882479","relation":{},"ISSN":["1931-0145","1931-0153"],"issn-type":[{"value":"1931-0145","type":"print"},{"value":"1931-0153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,11,9]]},"assertion":[{"value":"2010-11-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}