{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T16:06:49Z","timestamp":1780762009642,"version":"3.54.1"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,9,30]],"date-time":"2023-09-30T00:00:00Z","timestamp":1696032000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Digital Threats"],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>Digital forensics depends on data sets for various purposes like concept evaluation, educational training, and tool validation. Researchers have gathered such data sets into repositories and created data simulation frameworks for producing large amounts of data. Synthetic data often face skepticism due to its perceived deviation from real-world data, raising doubts about its realism. This paper addresses this concern, arguing that there is no definitive answer. We focus on four common digital forensic use cases that rely on data. Through these, we elucidate the specifications and prerequisites of data sets within their respective contexts. Our discourse uncovers that both real-world and synthetic data are indispensable for advancing digital forensic science, software, tools, and the competence of practitioners. Additionally, we provide an overview of available data set repositories and data generation frameworks, contributing to the ongoing dialogue on digital forensic data sets\u2019 utility.<\/jats:p>","DOI":"10.1145\/3609863","type":"journal-article","created":{"date-parts":[[2023,7,20]],"date-time":"2023-07-20T12:02:24Z","timestamp":1689854544000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Data for Digital Forensics: Why a Discussion on \u201cHow Realistic is Synthetic Data\u201d is Dispensable"],"prefix":"10.1145","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-5670-8150","authenticated-orcid":false,"given":"Thomas","family":"G\u00f6bel","sequence":"first","affiliation":[{"name":"Research Institute CODE, University of the Bundeswehr Munich, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9254-6398","authenticated-orcid":false,"given":"Harald","family":"Baier","sequence":"additional","affiliation":[{"name":"Research Institute CODE, University of the Bundeswehr Munich, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5261-4600","authenticated-orcid":false,"given":"Frank","family":"Breitinger","sequence":"additional","affiliation":[{"name":"School of Criminal Justice, University of Lausanne, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,10,6]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1604.03850"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2017.06.004"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2009.06.016"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/BADGERS.2014.11"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.fsidi.2022.301344"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.fsidi.2021.301133"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1002\/wfs2.1432"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1007\/978-3-030-56223-6_5","volume-title":"Advances in Digital Forensics XVI","author":"G\u00f6bel Thomas","year":"2020","unstructured":"Thomas G\u00f6bel, Thomas Sch\u00e4fer, Julien Hachenberger, Jan T\u00fcrr, and Harald Baier. 2020. A novel approach for generating synthetic datasets for digital forensics. In Advances in Digital Forensics XVI, Gilbert Peterson and Sujeet Shenoi (Eds.). Springer International Publishing, Cham, 73\u201393."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2017.01.010"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.fsidi.2023.301562"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.5555\/1051914"},{"key":"e_1_3_2_13_2","unstructured":"Jon Berryhill. 2019. What is Metadata? (2019). https:\/\/www.computerforensics.com\/news\/what-is-metadata"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2011.05.005"},{"key":"e_1_3_2_15_2","first-page":"123","volume-title":"Proceedings of ADFSL Conference on Digital Forensics, Security and Law","author":"Woods Kam","year":"2011","unstructured":"Kam Woods, Christopher A. Lee, Simson Garfinkel, David Dittrich, Adam Russell, and Kris Kearton. 2011. Creating realistic corpora for security and forensic education. In Proceedings of ADFSL Conference on Digital Forensics, Security and Law. 123\u2013134."},{"key":"e_1_3_2_16_2","first-page":"1","article-title":"Forensic corpora: A challenge for forensic research","author":"Garfinkel Simson","year":"2007","unstructured":"Simson Garfinkel. 2007. Forensic corpora: A challenge for forensic research. Electronic Evidence Information Center (2007), 1\u201310.","journal-title":"Electronic Evidence Information Center"},{"key":"e_1_3_2_17_2","volume-title":"2015 AAAI Spring Symposium Series","author":"Baggili Ibrahim","year":"2015","unstructured":"Ibrahim Baggili and Frank Breitinger. 2015. Data sources for advancing cyber forensics: What the social world has to offer. In 2015 AAAI Spring Symposium Series."},{"key":"e_1_3_2_18_2","unstructured":"Brian Carrier. 2010. Digital Forensics Tool Testing Images. URL: http:\/\/dftt.sourceforge.net. (2010). Accessed: 2021-10-12."},{"key":"e_1_3_2_19_2","first-page":"309","volume-title":"Advances in Digital Forensics X","author":"Yannikos York","year":"2014","unstructured":"York Yannikos, Martin Steinebach, Lukas Graner, and Christian Winter. 2014. Data corpora for digital forensics education and research. In Advances in Digital Forensics X, Gilbert Peterson and Sujeet Shenoi (Eds.). Springer Berlin, Berlin, 309\u2013325."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1002\/wfs2.1367"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.6028\/NIST.IR.7490"},{"key":"e_1_3_2_22_2","article-title":"Introduction to CFTT and CFReDS projects at NIST","author":"Park Jungheum","year":"2016","unstructured":"Jungheum Park, James R. Lyle, and Barbara Guttman. 2016. Introduction to CFTT and CFReDS projects at NIST. Journal of the Korea Institute of Information Security & Cryptography (2016).","journal-title":"Journal of the Korea Institute of Information Security & Cryptography"},{"key":"e_1_3_2_23_2","article-title":"The CFReDS Project","year":"2022","unstructured":"NIST. 2022. The CFReDS Project. https:\/\/www.cfreds.nist.gov\/. (2022). Online; accessed: 2nd December 2022.","journal-title":"https:\/\/www.cfreds.nist.gov\/"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2012.05.002"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2018.04.021"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1007\/978-3-319-20125-2_14","volume-title":"Computational Forensics","author":"Visti Hannu","year":"2015","unstructured":"Hannu Visti, Sean Tohill, and Paul Douglas. 2015. Automatic creation of computer forensic test images. In Computational Forensics, Utpal Garain and Faisal Shafait (Eds.). Springer International Publishing, Cham, 163\u2013175."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.fsidi.2021.301264"},{"key":"e_1_3_2_28_2","first-page":"2183","article-title":"Daubert v. Merrell Dow Pharmaceuticals, Inc.: Epistemilogy and legal process","volume":"15","author":"Farrell Margaret G.","year":"1993","unstructured":"Margaret G. Farrell. 1993. Daubert v. Merrell Dow Pharmaceuticals, Inc.: Epistemilogy and legal process. Cardozo L. Rev. 15 (1993), 2183.","journal-title":"Cardozo L. Rev."},{"key":"e_1_3_2_29_2","unstructured":"Brian Carrier. 2002. Open source digital forensics tools: The legal argument. @stake Inc. http:\/\/dl.packetstormsecurity.net\/papers\/IDS\/atstake_opensource_forensics.pdf"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.scijus.2018.04.001"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2019.01.009"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2016.04.006"},{"key":"e_1_3_2_33_2","unstructured":"Brian Cusack and Alain Homewood. 2013. Identifying bugs in digital forensic tools. (2013)."},{"key":"e_1_3_2_34_2","unstructured":"James Lyle. 2002. NIST CFTT: Testing disk imaging tools. Digital Forensic Research Workshop International Journal of Digital Evidence (online at www.ijde.org) Syracuse . https:\/\/tsapps.nist.gov\/publication\/get_pdf.cfm?pub_id=51081"},{"key":"e_1_3_2_35_2","unstructured":"James Lyle Barbara Guttman and Richard Ayers. 2011. Ten years of computer forensic tool testing. 8 (2011-10-12 00:10:002011). https:\/\/tsapps.nist.gov\/publication\/get_pdf.cfm?pub_id=909329"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.fsidi.2022.301407"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-88381-2_5"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-73697-6_11"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.fsidi.2021.301330"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2018.01.015"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/IMF.2018.00014"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2010.05.009"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2018.04.024"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2018.01.007"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.fsidi.2021.301109"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2019.04.005"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2018.05.004"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3339252.3339281"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/2666652.2666663"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/CNSM.2015.7367341"},{"key":"e_1_3_2_51_2","unstructured":"Josh Brunthy. 2021. Validation of Forensic Tools- A Quick Guide for the DFIR Examiner. (2021). https:\/\/joshbrunty.github.io\/2021\/11\/01\/validation.html"}],"container-title":["Digital Threats: Research and Practice"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609863","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3609863","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:38:01Z","timestamp":1750178281000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609863"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,30]]},"references-count":50,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3609863"],"URL":"https:\/\/doi.org\/10.1145\/3609863","relation":{},"ISSN":["2692-1626","2576-5337"],"issn-type":[{"value":"2692-1626","type":"print"},{"value":"2576-5337","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,30]]},"assertion":[{"value":"2023-06-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-06-26","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}