{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T00:00:39Z","timestamp":1775692839494,"version":"3.50.1"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"11","license":[{"start":{"date-parts":[[2024,10,25]],"date-time":"2024-10-25T00:00:00Z","timestamp":1729814400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2024,11]]},"abstract":"<jats:p>With the growing processing power of computing systems and the increasing availability of massive datasets, machine learning algorithms have led to major breakthroughs in many different areas. This development has influenced computer security, spawning a series of work on learning-based security systems, such as for malware detection, vulnerability discovery, and binary code analysis. Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance and render learning-based systems potentially unsuitable for security tasks and practical deployment.<\/jats:p>\n          <jats:p>In this paper, we look at this problem with critical eyes. First, we identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We conduct a study of 30 papers from top-tier security conferences within the past 10 years, confirming that these pitfalls are widespread in the current security literature. In an empirical analysis, we further demonstrate how individual pitfalls can lead to unrealistic performance and interpretations, obstructing the understanding of the security problem at hand. As a remedy, we propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible. 
Furthermore, we identify open problems when applying machine learning in security and provide directions for further research.<\/jats:p>","DOI":"10.1145\/3643456","type":"journal-article","created":{"date-parts":[[2024,10,25]],"date-time":"2024-10-25T13:52:01Z","timestamp":1729864321000},"page":"104-112","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Pitfalls in Machine Learning for Computer Security"],"prefix":"10.1145","volume":"67","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3628-794X","authenticated-orcid":false,"given":"Daniel","family":"Arp","sequence":"first","affiliation":[{"name":"Technische Universit\u00e4t Berlin, Berlin, Germany"},{"name":"The Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-7170-1274","authenticated-orcid":false,"given":"Erwin","family":"Quiring","sequence":"additional","affiliation":[{"name":"International Computer Science Institute (ICSI), Berkeley, USA"},{"name":"Ruhr University Bochum, Bochum, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1140-322X","authenticated-orcid":false,"given":"Feargus","family":"Pendlebury","sequence":"additional","affiliation":[{"name":"University College London, London, England, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-3617-3968","authenticated-orcid":false,"given":"Alexander","family":"Warnecke","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Berlin, Berlin, Germany"},{"name":"The Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1254-1758","authenticated-orcid":false,"given":"Fabio","family":"Pierazzi","sequence":"additional","affiliation":[{"name":"King\u2019s College London, London, England, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1493-9552","authenticated-orcid":false,"given":"Christian","family":"Wressnegger","sequence":"additional","affiliation":[{"name":"Karlsruhe Institute of Technology, Karlsruhe, Germany"},{"name":"KASTEL Security Research Labs, Karlsruhe, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3878-2680","authenticated-orcid":false,"given":"Lorenzo","family":"Cavallaro","sequence":"additional","affiliation":[{"name":"University College London, London, England, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5054-8758","authenticated-orcid":false,"given":"Konrad","family":"Rieck","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Berlin, Berlin, Germany"},{"name":"The Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,10,25]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"crossref","unstructured":"Abuhamad M. AbuHmed T. Mohaisen A. and Nyang D. Large-scale and language-oblivious code authorship identification. In Proc. of ACM Conf. on Computer and Communications Security (CCS) 2018.","DOI":"10.1145\/3243734.3243738"},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","unstructured":"Allix K. Bissyand\u00e9 T.F. Klein J. and Le Traon Y. Androzoo: Collecting millions of android apps for the research community. In Proc. of the Int. Conf. 
on Mining Software Repositories (MSR) 2016.","DOI":"10.1145\/2901739.2903508"},{"key":"e_1_3_1_4_2","first-page":"3971","volume-title":"Proc. of USENIX Security Symp.","author":"Arp D.","year":"2022","unstructured":"Arp, D. et al. Dos and don\u2019ts of machine learning in computer security. In Proc. of USENIX Security Symp. Boston, MA, 2022, 3971\u20133988."},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Arp D. et al. Drebin: Efficient and explainable detection of android malware in your pocket. In Proc. of Network and Distributed System Security Symp. (NDSS) 2014.","DOI":"10.14722\/ndss.2014.23247"},{"key":"e_1_3_1_6_2","doi-asserted-by":"crossref","unstructured":"Axelsson S. The base-rate fallacy and the difficulty of intrusion detection. ACM Transactions on Information and System Security (TISSEC) Aug. 2000.","DOI":"10.1145\/357830.357849"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Bach S. et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE July 2015.","DOI":"10.1371\/journal.pone.0130140"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40994-3_25"},{"key":"e_1_3_1_9_2","unstructured":"Caliskan A. et al. De-anonymizing programmers via code stylometry. In Proc. of USENIX Security Symp. 2015."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2005.10.010"},{"key":"e_1_3_1_11_2","unstructured":"Feurer M. et al. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems (NIPS) 2015."},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","unstructured":"Jang J. Brumley D. and Venkataraman S. Bitshred: Feature hashing malware for scalable triage and semantic analysis. In Proc. of ACM Conf. on Computer and Communications Security (CCS) 2011.","DOI":"10.1145\/2046707.2046742"},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Juarez M. et al. 
A critical evaluation of website fingerprinting attacks. In Proc. of ACM Conf. on Computer and Communications Security (CCS) 2014.","DOI":"10.1145\/2660267.2660368"},{"key":"e_1_3_1_14_2","volume-title":"Content analysis: An introduction to its methodology","author":"Krippendorff K.","year":"2018","unstructured":"Krippendorff, K. Content analysis: An introduction to its methodology. Sage publications, 2018."},{"key":"e_1_3_1_15_2","unstructured":"Krizhevsky A. Sutskever I. and Hinton G.E. Imagenet: Classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS) 2012."},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","unstructured":"Li Z. et al. Vuldeepecker: A deep learning-based system for vulnerability detection. In Proc. of Network and Distributed System Security Symp. (NDSS) 2018.","DOI":"10.14722\/ndss.2018.23158"},{"key":"e_1_3_1_17_2","doi-asserted-by":"crossref","unstructured":"McLaughlin N. et al. Deep android malware detection. In Proc. of ACM Conf. on Data and Applications Security and Privacy (CODASPY) 2017.","DOI":"10.1145\/3029806.3029823"},{"key":"e_1_3_1_18_2","doi-asserted-by":"crossref","unstructured":"Mirsky Y. Doitshman T. Elovici Y. and Shabtai A. Kitsune: An ensemble of autoencoders for online network intrusion detection. In Proc. of Network and Distributed System Security Symp. (NDSS) 2018.","DOI":"10.14722\/ndss.2018.23204"},{"key":"e_1_3_1_19_2","unstructured":"Pendlebury F. et al. TESSERACT: Eliminating experimental bias in malware classification across space and time. In Proc. of USENIX Security Symp. 2019."},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Pierazzi F. Pendlebury F. Cortellazzi J. and Cavallaro L. Intriguing properties of adversarial ml attacks in the problem space. In Proc. of IEEE Symp. on Security and Privacy (S&P) 2020.","DOI":"10.1109\/SP40000.2020.00073"},{"key":"e_1_3_1_21_2","unstructured":"Shin E.C.R. Song D. and Moazzezi R. 
Recognizing functions in binaries with neural networks. In Proc. of USENIX Security Symp. 2015."},{"key":"e_1_3_1_22_2","doi-asserted-by":"crossref","unstructured":"Sommer R. and Paxson V. Outside the closed world: On using machine learning for network intrusion detection. In Proc. of IEEE Symp. on Security and Privacy (S&P) 2010.","DOI":"10.1109\/SP.2010.25"},{"key":"e_1_3_1_23_2","unstructured":"Sutskever I. Vinyals O. and Le Q.V. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (NIPS) 2014."},{"key":"e_1_3_1_24_2","volume-title":"Exploratory data analysis","author":"Tukey","year":"1977","unstructured":"Tukey, J.W. Addison-Wesley series in behavioral science: Quantitative methods. Exploratory data analysis. Addison-Wesley, 1977."},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Wolpert D.H. The lack of a priori distinctions between learning algorithms. Neural Computation 1996.","DOI":"10.1162\/neco.1996.8.7.1341"},{"key":"e_1_3_1_26_2","doi-asserted-by":"crossref","unstructured":"Yamaguchi F. Maier A. Gascon H. and Rieck K. Automatic inference of search patterns for taint-style vulnerabilities. In Proc. of IEEE Symp. 
on Security and Privacy (S&P) 2015.","DOI":"10.1109\/SP.2015.54"}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643456","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643456","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:31:22Z","timestamp":1750264282000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643456"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,25]]},"references-count":25,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,11]]}},"alternative-id":["10.1145\/3643456"],"URL":"https:\/\/doi.org\/10.1145\/3643456","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"value":"0001-0782","type":"print"},{"value":"1557-7317","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,25]]},"assertion":[{"value":"2024-10-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}