{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T23:07:40Z","timestamp":1768432060995,"version":"3.49.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2019,1,22]],"date-time":"2019-01-22T00:00:00Z","timestamp":1548115200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100007601","name":"Horizon 2020","doi-asserted-by":"publisher","award":["771844 - Bitcrumbs"],"award-info":[{"award-number":["771844 - Bitcrumbs"]}],"id":[{"id":"10.13039\/501100007601","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Priv. Secur."],"published-print":{"date-parts":[[2019,2,28]]},"abstract":"<jats:p>The number of unique malware samples is growing out of control. Over the years, security companies have designed and deployed complex infrastructures to collect and analyze this overwhelming number of samples. As a result, a security company can collect more than 1M unique files per day only from its different feeds. These are automatically stored and processed to extract actionable information derived from static and dynamic analysis. However, only a tiny amount of this data is interesting for security researchers and attracts the interest of a human expert.<\/jats:p>\n          <jats:p>To the best of our knowledge, nobody has systematically dissected these datasets to precisely understand what they really contain. The security community generally discards the problem because of the alleged prevalence of uninteresting samples.<\/jats:p>\n          <jats:p>\n            In this article, we guide the reader through a step-by-step analysis of the hundreds of thousands Windows executables collected in one day from these feeds. Our goal is to show how a company can employ existing state-of-the-art techniques to automatically process these samples and then perform manual experiments to understand and document what is the real content of this gigantic dataset. We present the filtering steps, and we discuss in detail how samples can be grouped together according to their behavior to support manual verification. Finally, we use the results of this measurement experiment to provide a rough estimate of both the human and computer resources that are required to get to the bottom of the\n            <jats:italic>catch of the day<\/jats:italic>\n            .\n          <\/jats:p>","DOI":"10.1145\/3291061","type":"journal-article","created":{"date-parts":[[2019,1,23]],"date-time":"2019-01-23T13:02:14Z","timestamp":1548248534000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":47,"title":["A Close Look at a Daily Dataset of Malware Samples"],"prefix":"10.1145","volume":"22","author":[{"given":"Xabier","family":"Ugarte-Pedrero","sequence":"first","affiliation":[{"name":"Cisco Systems, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mariano","family":"Graziano","sequence":"additional","affiliation":[{"name":"Cisco Systems, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Davide","family":"Balzarotti","sequence":"additional","affiliation":[{"name":"Eurecom, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,1,22]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Jose Morales. 2014. A New Approach to Prioritizing Malware Analysis. Retrieved from https:\/\/insights.sei.cmu.edu\/sei_blog\/2014\/04\/a-new-approach-to-prioritizing-malware-analysis.html.  Jose Morales. 2014. A New Approach to Prioritizing Malware Analysis. Retrieved from https:\/\/insights.sei.cmu.edu\/sei_blog\/2014\/04\/a-new-approach-to-prioritizing-malware-analysis.html."},{"key":"e_1_2_1_2_1","volume-title":"Symantec Global Internet Security Threat Report Trends for","year":"2008","unstructured":"Symantec. 2008. Symantec Global Internet Security Threat Report Trends for 2008 . Retrieved from http:\/\/eval.symantec.com\/mktginfo\/enterprise\/white_papers\/b-whitepaper_internet_security_threat_report_xiv_04-2009.en-us.pdf. Symantec. 2008. Symantec Global Internet Security Threat Report Trends for 2008. Retrieved from http:\/\/eval.symantec.com\/mktginfo\/enterprise\/white_papers\/b-whitepaper_internet_security_threat_report_xiv_04-2009.en-us.pdf."},{"key":"e_1_2_1_3_1","unstructured":"Symantec. 2015. Symantec\u2019s 2015 internet security threat report. Retrieved from https:\/\/www.symantec.com\/security_response\/publications\/threatreport.jsp.  Symantec. 2015. Symantec\u2019s 2015 internet security threat report. Retrieved from https:\/\/www.symantec.com\/security_response\/publications\/threatreport.jsp."},{"key":"e_1_2_1_4_1","unstructured":"Francisco Santos. 2016. Putting the spotlight on firmware malware. Retrieved from http:\/\/blog.virustotal.com\/2016\/01\/putting-spotlight-on-firmware-malware_27.html.  Francisco Santos. 2016. Putting the spotlight on firmware malware. Retrieved from http:\/\/blog.virustotal.com\/2016\/01\/putting-spotlight-on-firmware-malware_27.html."},{"key":"e_1_2_1_5_1","volume-title":"Threat Report","year":"2016","unstructured":"Symantec. 2016. Symantec\u2019s Internet Security Threat Report 2016 . Retrieved from https:\/\/www.symantec.com\/content\/dam\/symantec\/docs\/reports\/istr-21-2016-en.pdf. Symantec. 2016. Symantec\u2019s Internet Security Threat Report 2016. Retrieved from https:\/\/www.symantec.com\/content\/dam\/symantec\/docs\/reports\/istr-21-2016-en.pdf."},{"key":"e_1_2_1_6_1","unstructured":"Herman Slatman. 2017. Awesome Threat Intelligence. Retrieved from https:\/\/github.com\/hslatman\/awesome-threat-intelligence.  Herman Slatman. 2017. Awesome Threat Intelligence. Retrieved from https:\/\/github.com\/hslatman\/awesome-threat-intelligence."},{"key":"e_1_2_1_7_1","unstructured":"VirusTotal. 2017. VirusTotal File Statistics during the last 7 days. Retrieved from https:\/\/www.virustotal.com\/en\/statistics\/.  VirusTotal. 2017. VirusTotal File Statistics during the last 7 days. Retrieved from https:\/\/www.virustotal.com\/en\/statistics\/."},{"key":"e_1_2_1_8_1","unstructured":"Alberto Ortega. 2018. Pafish\u2014Paranoid Fish. Retrieved from https:\/\/github.com\/a0rtega\/pafish.  Alberto Ortega. 2018. Pafish\u2014Paranoid Fish. Retrieved from https:\/\/github.com\/a0rtega\/pafish."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3029806.3029815"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the USENIX Workshop on Large-scale Exploits and Emergent Threats (LEET 09)","author":"Bayer Ulrich","year":"2009","unstructured":"Ulrich Bayer , Imam Habibi , Davide Balzarotti , Engin Kirda , and Christopher Kruegel . 2009 . A view on current malware behaviors . In Proceedings of the USENIX Workshop on Large-scale Exploits and Emergent Threats (LEET 09) . Ulrich Bayer, Imam Habibi, Davide Balzarotti, Engin Kirda, and Christopher Kruegel. 2009. A view on current malware behaviors. In Proceedings of the USENIX Workshop on Large-scale Exploits and Emergent Threats (LEET 09)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-45719-2_15"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 27th International Symposium on Reliable Distributed Systems (SRDS\u201908)","author":"Canto Julio","year":"2008","unstructured":"Julio Canto , Marc Dacier , Engin Kirda , and Corrado Leita . 2008 . Large-scale malware collection: Lessons learned . In Proceedings of the 27th International Symposium on Reliable Distributed Systems (SRDS\u201908) . Retrieved from http:\/\/www.eurecom.fr\/publication\/2648. Julio Canto, Marc Dacier, Engin Kirda, and Corrado Leita. 2008. Large-scale malware collection: Lessons learned. In Proceedings of the 27th International Symposium on Reliable Distributed Systems (SRDS\u201908). Retrieved from http:\/\/www.eurecom.fr\/publication\/2648."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2018.00054"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.4236\/jis.2014.52006"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 24th USENIX Security Symposium (USENIXSecurity\u201915)","author":"Graziano Mariano","year":"2015","unstructured":"Mariano Graziano , Davide Canali , Leyla Bilge , Andrea Lanzi , and Davide Balzarotti . 2015 . Needles in a haystack: Mining information from public dynamic analysis sandboxes for malware intelligence . In Proceedings of the 24th USENIX Security Symposium (USENIXSecurity\u201915) . Mariano Graziano, Davide Canali, Leyla Bilge, Andrea Lanzi, and Davide Balzarotti. 2015. Needles in a haystack: Mining information from public dynamic analysis sandboxes for malware intelligence. In Proceedings of the 24th USENIX Security Symposium (USENIXSecurity\u201915)."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (USENIXATC\u201913)","author":"Hu Xin","year":"2013","unstructured":"Xin Hu , Kang G. Shin , Sandeep Bhatkar , and Kent Griffin . 2013 . MutantX-S: Scalable malware clustering based on static features . In Proceedings of the USENIX Annual Technical Conference (USENIXATC\u201913) . USENIX, San Jose, CA, 187--198. Xin Hu, Kang G. Shin, Sandeep Bhatkar, and Kent Griffin. 2013. MutantX-S: Scalable malware clustering based on static features. In Proceedings of the USENIX Annual Technical Conference (USENIXATC\u201913). USENIX, San Jose, CA, 187--198."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2016.7840712"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-37300-8_6"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2046707.2046742"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 22nd USENIX Security Symposium (USENIXSecurity\u201913)","author":"Jang Jiyong","year":"2013","unstructured":"Jiyong Jang , Maverick Woo , and David Brumley . 2013 . Towards automatic software lineage inference . In Proceedings of the 22nd USENIX Security Symposium (USENIXSecurity\u201913) . USENIX, Washington, D.C., 81--96. Jiyong Jang, Maverick Woo, and David Brumley. 2013. Towards automatic software lineage inference. In Proceedings of the 22nd USENIX Security Symposium (USENIXSecurity\u201913). USENIX, Washington, D.C., 81--96."},{"key":"e_1_2_1_21_1","volume-title":"Pearu Peterson et al","author":"Jones Eric","year":"2016","unstructured":"Eric Jones , Travis Oliphant , Pearu Peterson et al . 2016 . SciPy: Open source scientific tools for Python 2001--2012. Retrieved from http:\/\/www.scipy.org. Eric Jones, Travis Oliphant, Pearu Peterson et al. 2016. SciPy: Open source scientific tools for Python 2001--2012. Retrieved from http:\/\/www.scipy.org."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2046684.2046690"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3133958"},{"key":"e_1_2_1_24_1","volume-title":"Doowon Kim, Christopher Gates, and Tudor Dumitra\u015f.","author":"Koz\u00e1k Kristi\u00e1n","year":"2018","unstructured":"Kristi\u00e1n Koz\u00e1k , Bum Jun Kwon , Doowon Kim, Christopher Gates, and Tudor Dumitra\u015f. 2018 . Issued for abuse: Measuring the underground trade in code signing certificate. arXiv preprint arXiv:1803.02931. Kristi\u00e1n Koz\u00e1k, Bum Jun Kwon, Doowon Kim, Christopher Gates, and Tudor Dumitra\u015f. 2018. Issued for abuse: Measuring the underground trade in code signing certificate. arXiv preprint arXiv:1803.02931."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2810103.2813724"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2017.59"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2420950.2421001"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23644-0_18"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/BADGERS.2014.7"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CTC.2013.9"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-70542-0_6"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/2011216.2011217"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-45719-2_11"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2015.46"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-60876-1_6"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 2nd USENIX Conference on Large-scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More (LEET\u201909)","author":"Wicherski Georg","year":"2009","unstructured":"Georg Wicherski . 2009 . peHash: A novel approach to fast malware clustering . In Proceedings of the 2nd USENIX Conference on Large-scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More (LEET\u201909) . USENIX Association. Georg Wicherski. 2009. peHash: A novel approach to fast malware clustering. In Proceedings of the 2nd USENIX Conference on Large-scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More (LEET\u201909). USENIX Association."}],"container-title":["ACM Transactions on Privacy and Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291061","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3291061","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:01:53Z","timestamp":1750208513000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291061"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,22]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,2,28]]}},"alternative-id":["10.1145\/3291061"],"URL":"https:\/\/doi.org\/10.1145\/3291061","relation":{},"ISSN":["2471-2566","2471-2574"],"issn-type":[{"value":"2471-2566","type":"print"},{"value":"2471-2574","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,1,22]]},"assertion":[{"value":"2018-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-01-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}