{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T08:19:11Z","timestamp":1767860351000,"version":"3.49.0"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"PLDI","license":[{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["HR001119C0076"],"award-info":[{"award-number":["HR001119C0076"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,6,20]]},"abstract":"<jats:p>Despite decades of contributions to the theoretical foundations of parsing and the many tools available to aid in parser development, many security attacks in the wild still exploit parsers. The issues are myriad\u2014flaws in memory management in contexts lacking memory safety, flaws in syntactic or semantic validation of input, and misinterpretation of hundred-page-plus standards documents. It remains challenging to build and maintain parsers for common, mature data formats.<\/jats:p>\n          <jats:p>In response to these challenges, we present Daedalus, a new domain-specific language (DSL) and toolchain for writing safe parsers. Daedalus is built around functional-style parser combinators, which suit the rich data dependencies often found in complex data formats. It adds domain-specific constructs for stream manipulation, allowing the natural expression of parsing noncontiguous formats. 
Balancing between expressivity and domain-specific constructs lends Daedalus specifications simplicity and leaves them amenable to analysis. As a stand-alone DSL, Daedalus is able to generate safe parsers in multiple languages, currently C++ and Haskell.<\/jats:p>\n          <jats:p>We have implemented 20 data formats with Daedalus, including two large, complex formats\u2014PDF and NITF\u2014and our evaluation shows that Daedalus parsers are concise and performant. Our experience with PDF forms our largest case study. We worked with the PDF Association to build a reference implementation, which was subject to a red-teaming exercise along with a number of other PDF parsers and was the only parser to be found free of defects.<\/jats:p>","DOI":"10.1145\/3656410","type":"journal-article","created":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T16:27:20Z","timestamp":1718900840000},"page":"816-840","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Daedalus: Safer Document Parsing"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-7795-4708","authenticated-orcid":false,"given":"Iavor S.","family":"Diatchki","sequence":"first","affiliation":[{"name":"Galois, Portland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4439-0130","authenticated-orcid":false,"given":"Mike","family":"Dodds","sequence":"additional","affiliation":[{"name":"Galois, Portland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9631-1169","authenticated-orcid":false,"given":"Harrison","family":"Goldstein","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, Portland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1762-2039","authenticated-orcid":false,"given":"Bill","family":"Harris","sequence":"additional","affiliation":[{"name":"Galois, Portland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9328-1686","authenticated-orcid":false,"given":"David 
A.","family":"Holland","sequence":"additional","affiliation":[{"name":"Galois, Portland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5698-9841","authenticated-orcid":false,"given":"Benoit","family":"Razet","sequence":"additional","affiliation":[{"name":"Galois, Portland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-9350-3041","authenticated-orcid":false,"given":"Cole","family":"Schlesinger","sequence":"additional","affiliation":[{"name":"Galois, Portland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-6133-0147","authenticated-orcid":false,"given":"Simon","family":"Winwood","sequence":"additional","affiliation":[{"name":"Galois, Portland, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,6,20]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.5555\/6448"},{"key":"e_1_3_1_3_2","volume-title":"The Theory of Parsing, Translation, and Compiling","author":"Aho Alfred V.","year":"1972","unstructured":"Alfred V. Aho and Jeffrey D. Ullman. 1972. The Theory of Parsing, Translation, and Compiling. Prentice-Hall, Inc., USA."},{"key":"e_1_3_1_4_2","unstructured":"Ange Albertini Thai P. Duong Shay Gueron Stefan K\u00f6lbl Atul Luykx and Sophie Schmieg. 2020. How to Abuse and Fix Authenticated Encryption Without Key Commitment. In IACR Cryptology ePrint Archive. https:\/\/api.semanticscholar.org\/CorpusID:227231944"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.5555\/AAI29211293"},{"key":"e_1_3_1_6_2","unstructured":"Andy Gill and Simon Marlow. Accessed: March 2024. Happy. https:\/\/www.haskell.org\/happy\/"},{"key":"e_1_3_1_7_2","unstructured":"ASN. 1 Accessed: March 2024. ITU reference material for ASN.1. http:\/\/www.itu.int\/en\/ITU-T\/asn1\/."},{"key":"e_1_3_1_8_2","first-page":"615","volume-title":"Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation","author":"Bangert Julian","year":"2014","unstructured":"Julian Bangert and Nickolai Zeldovich. 2014. 
Nail: a practical tool for parsing and generating data formats. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (Broomfield, CO) (OSDI\u201914). USENIX Association, USA, 615\u2013628."},{"key":"e_1_3_1_9_2","unstructured":"Karen Barrett-Wilt. 2021. The trials and tribulations of academic publishing \u2013 and Fuzz Testing. https:\/\/www.cs.wisc.edu\/2021\/01\/14\/the-trials-and-tribulations-of-academic-publishing-and-fuzz-testing\/ Publication Title: UW. Madison Department of Computer Sciences."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3372885.3373836"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/964001.964011"},{"key":"e_1_3_1_12_2","unstructured":"Bryan O\u2019Sullivan. Accessed: March 2024. attoparsec: Fast combinator parsing for bytestrings and text. https:\/\/hackage.haskell.org\/package\/attoparsec"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/321239.321249"},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/978-3-319-17524-9_1","volume-title":"NASA Formal Methods","author":"Calcagno Cristiano","year":"2015","unstructured":"Cristiano Calcagno, Dino Distefano, Jeremy Dubreil, Dominik Gabi, Pieter Hooimeijer, Martino Luca, Peter O\u2019Hearn, Irene Papakonstantinou, Jim Purbrick, and Dulma Rodriguez. 2015. Moving Fast with Software Verification. In NASA Formal Methods, Klaus Havelund, Gerard Holzmann, and Rajeev Joshi (Eds.). Springer International Publishing, Cham, 3\u201311."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1956.1056813"},{"key":"e_1_3_1_16_2","unstructured":"Chris Dornan and Simon Marlow. Accessed: March 2024. Alex User Guide. https:\/\/www.haskell.org\/alex\/"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/351240.351266"},{"key":"e_1_3_1_18_2","unstructured":"Common Crawl Accessed: March 2024. Common Crawl. 
https:\/\/commoncrawl.org"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0956796807006326"},{"key":"e_1_3_1_20_2","unstructured":"CVE-2022-27337 2022. CVE-2022-27337. https:\/\/nvd.nist.gov\/vuln\/detail\/CVE-2022-27337"},{"key":"e_1_3_1_21_2","unstructured":"CVE-2022-38784 2022. CVE-2022-38784. https:\/\/nvd.nist.gov\/vuln\/detail\/CVE-2022-38784"},{"key":"e_1_3_1_22_2","unstructured":"Daan Leijen and Erik Meijer. 2001. Parsec: direct style monadic parser combinators for the real world. https:\/\/api.semanticscholar.org\/CorpusID:14373505"},{"key":"e_1_3_1_23_2","unstructured":"David Peter. Accessed: March 2024. hyperfine: a command-line benchmarking tool. https:\/\/github.com\/sharkdp\/hyperfine"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3341686"},{"key":"e_1_3_1_25_2","unstructured":"DFDL Accessed: March 2024. DFDL (Open Grid Forum). https:\/\/www.ogf.org\/ogf\/doku.php\/standards\/dfdl\/dfdl"},{"key":"e_1_3_1_26_2","volume-title":"High-Level Abstractions for Low-Level Programming","author":"Diatchki Iavor Sotirov","year":"2007","unstructured":"Iavor Sotirov Diatchki. 2007. High-Level Abstractions for Low-Level Programming. Ph. D. Dissertation. OGI School of Science & Engineering at Oregon Health & Science University."},{"key":"e_1_3_1_27_2","volume-title":"ASN.1: Communication Between Heterogeneous Systems","author":"Dubuisson Olivier","year":"2001","unstructured":"Olivier Dubuisson and Philippe Fouquart. 2001. ASN.1: Communication Between Heterogeneous Systems. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0963548304006315"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385992"},{"key":"e_1_3_1_30_2","volume-title":"14th USENIX Workshop on Offensive Technologies (WOOT 20)","author":"Fioraldi Andrea","year":"2020","unstructured":"Andrea Fioraldi, Dominik Maier, Heiko Ei\u00dffeldt, and Marc Heuse. 
2020. AFL++ : Combining Incremental Steps of Fuzzing Research. In 14th USENIX Workshop on Offensive Technologies (WOOT 20). USENIX Association. https:\/\/www.usenix.org\/conference\/woot20\/presentation\/fioraldi"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/982962.964011"},{"key":"e_1_3_1_32_2","first-page":"66","volume-title":"Proceedings of the ACM Conference on Generative Programming and Component Engineering Proceedings (GPCE 2002), published as LNCS 2487","author":"Back Godmar","year":"2002","unstructured":"Godmar Back. 2002. DataScript\u2014A Specification and Scripting Language for Binary Data. In Proceedings of the ACM Conference on Generative Programming and Component Engineering Proceedings (GPCE 2002), published as LNCS 2487. Association for Computing Machinery, Pittsburgh, PA, USA, 66\u201377."},{"key":"e_1_3_1_33_2","unstructured":"Google. Accessed: March 2024. Protocol Buffers - Google\u2019s data interchange format. https:\/\/github.com\/protocolbuffers\/protobuf"},{"key":"e_1_3_1_34_2","unstructured":"Google. Accessed: March 2024. Wuffs. Wrangling Untrusted File Formats Safely. https:\/\/github.com\/google\/wuffs"},{"key":"e_1_3_1_35_2","volume-title":"Monadic Parser Combinators","author":"Hutton Graham","year":"1996","unstructured":"Graham Hutton and Erik Meijer. 1996. Monadic Parser Combinators. Technical Report. University of Nottingham."},{"key":"e_1_3_1_36_2","unstructured":"Hammer 2019. Hammer: Parser Combinators for Binary Formats in C. Yes in C. What? Don\u2019t Look at Me like That. UpstandingHackers."},{"key":"e_1_3_1_37_2","unstructured":"Mike Hodge. 2020. Exploiting PHP Phar Deserialization Vulnerabilities: Part 1. Keysight (June 2020). 
https:\/\/www.keysight.com\/blogs\/tech\/nwvs\/2020\/07\/23\/exploiting-php-phar-deserialization-vulnerabilities-part-1"},{"key":"e_1_3_1_38_2","volume-title":"Introduction to Automata Theory, Languages, and Computation","author":"Hopcroft John E.","year":"1979","unstructured":"John E. Hopcroft and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, Mass."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/1086365.1086387"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","unstructured":"Iavor S. Diatchki Mike Dodds Harrison Goldstein Bill Harris David A. Holland Benoit Razet Cole Schlesinger and Simon Winwood. Accessed: May 2024. Daedalus PLDI Artifact. https:\/\/doi.org\/10.5281\/zenodo.10966813 10.5281\/zenodo.10966813","DOI":"10.5281\/zenodo.10966813"},{"key":"e_1_3_1_41_2","unstructured":"Iavor S. Diatchki Mike Dodds Harrison Goldstein Bill Harris David A. Holland Benoit Razet Cole Schlesinger and Simon Winwood. Accessed: May 2024. Daedalus Repository. https:\/\/github.com\/GaloisInc\/daedalus"},{"key":"e_1_3_1_42_2","volume-title":"Abstract Syntax Notation One (ASN.1): Specification of base notation","author":"International Telecommunication Union","year":"1994","unstructured":"International Telecommunication Union. 1994. Abstract Syntax Notation One (ASN.1): Specification of base notation. Standard. ITU. https:\/\/handle.itu.int\/11.1002\/1000\/14468"},{"key":"e_1_3_1_43_2","volume-title":"ISO 32000-2:2020 (PDF 2.0)","author":"ISO\/TC 171\/SC2","year":"2020","unstructured":"ISO\/TC 171\/SC2. 2020. ISO 32000-2:2020 (PDF 2.0). Standard. International Organization for Standardization. https:\/\/www.iso.org\/standard\/75839.html"},{"key":"e_1_3_1_44_2","unstructured":"Kaitai Accessed: March 2024. Kaitai Struct: declarative binary format parsing language. 
https:\/\/kaitai.io\/"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/1667053.1667059"},{"key":"e_1_3_1_46_2","unstructured":"Ravie Lakshmanan. 2023. Cybercriminals Using Polyglot Files in Malware Distribution to Fly Under the Radar. The Hacker News (January 2023). https:\/\/thehackernews.com\/2023\/01\/cybercriminals-using-polyglot-files-in.html"},{"key":"e_1_3_1_47_2","volume-title":"ASN.1 Complete","author":"Larmouth John","year":"2000","unstructured":"John Larmouth. 2000. ASN.1 Complete. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/2714064.2660241"},{"key":"e_1_3_1_49_2","unstructured":"Daniel Marjam\u00e4ki. 2007. CppCheck: Static Analysis of C\/C++ code. https:\/\/github.com\/danmar\/cppcheck"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.5555\/145055.145097"},{"key":"e_1_3_1_51_2","volume-title":"Data Format Description Language: Lessons Learned, Concepts and Experience","author":"McGrath Robert E","year":"2011","unstructured":"Robert E McGrath. 2011. Data Format Description Language: Lessons Learned, Concepts and Experience. Technical Report. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign. https:\/\/core.ac.uk\/display\/4835576?utm_source=pdf"},{"key":"e_1_3_1_52_2","unstructured":"Microsoft. Accessed: March 2024. Language Server Protocol. https:\/\/microsoft.github.io\/language-server-protocol\/"},{"key":"e_1_3_1_53_2","doi-asserted-by":"crossref","unstructured":"Prashanth Mundkur Linda Briesemeister Natarajan Shankar Prashant Anantharaman Sameed Ali Zephyr Lucas and Sean W. Smith. 2020. Research Report: The Parsley Data Format Definition Language. 2020 IEEE Security and Privacy Workshops (SPW) (2020) 300\u2013307. 
https:\/\/api.semanticscholar.org\/CorpusID:219184197","DOI":"10.1109\/SPW50608.2020.00064"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093333.3009867"},{"key":"e_1_3_1_55_2","unstructured":"nom Accessed: March 2024. nom eating data byte by byte. https:\/\/github.com\/rust-bakery\/nom"},{"key":"e_1_3_1_56_2","unstructured":"PicoHTTPParser Accessed: March 2024. PicoHTTPParser. https:\/\/github.com\/h2o\/picohttpparser"},{"key":"e_1_3_1_57_2","unstructured":"Poppler Accessed: August 2023. Poppler Releases. https:\/\/poppler.freedesktop.org\/releases.html"},{"key":"e_1_3_1_58_2","first-page":"1465","volume-title":"28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, August 14-16, 2019","author":"Ramananandro Tahina","year":"2019","unstructured":"Tahina Ramananandro, Antoine Delignat-Lavaud, C\u00e9dric Fournet, Nikhil Swamy, Tej Chajed, Nadim Kobeissi, and Jonathan Protzenko. 2019. EverParse: Verified Secure Zero-Copy Parsers for Authenticated Message Formats. In 28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, August 14-16, 2019, Nadia Heninger and Patrick Traynor (Eds.). USENIX Association, 1465\u20131482. https:\/\/www.usenix.org\/conference\/usenixsecurity19\/presentation\/delignat-lavaud"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/0022-0000(78)90014-4"},{"key":"e_1_3_1_60_2","first-page":"12","volume-title":"Proceedings of the 31st Symposium on Implementation and Application of Functional Languages (Singapore, Singapore)","author":"Ullrich Sebastian","year":"2021","unstructured":"Sebastian Ullrich and Leonardo de Moura. 2021. Counting Immutable Beans: Reference Counting Optimized for Purely Functional Programming. In Proceedings of the 31st Symposium on Implementation and Application of Functional Languages (Singapore, Singapore). 
Association for Computing Machinery, New York, NY, USA, Article 3, 12 pages."},{"issue":"8","key":"e_1_3_1_61_2","first-page":"127","article-title":"Thrift: Scalable cross-language services implementation","volume":"5","author":"Slee Mark","year":"2007","unstructured":"Mark Slee, Aditya Agarwal, and Marc Kwiatkowski. 2007. Thrift: Scalable cross-language services implementation. Facebook white paper 5, 8 (2007), 127.","journal-title":"Facebook white paper"},{"key":"e_1_3_1_62_2","first-page":"PS1:15-1","volume-title":"Yacc: Yet Another Compiler-Compiler","author":"Johnson Stephen C.","year":"1975","unstructured":"Stephen C. Johnson. 1975. Yacc: Yet Another Compiler-Compiler. Technical Report. AT&T Bell Laboratories, Murray Hill, New Jersey 07974. PS1:15-1 \u2013 PS1:15-32 pages."},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523708"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.5555\/2501720"},{"key":"e_1_3_1_65_2","unstructured":"Tony Hannan. Accessed: March 2024. Haskell bson library. https:\/\/hackage.haskell.org\/package\/bson"},{"key":"e_1_3_1_66_2","doi-asserted-by":"crossref","unstructured":"Mark Tullsen William Harris and Peter Wyatt. 2022. Strengthening Weak Links in the PDF Trust Chain. LangSec Work-shop (2022). https:\/\/github.com\/gangtan\/LangSec-papers-and-slides\/raw\/main\/langsec22\/papers\/Tullsen_LangSec22.pdf","DOI":"10.1109\/SPW54247.2022.9833889"},{"key":"e_1_3_1_67_2","unstructured":"United States Department of Defense. 1993. National Imagery Transmission Format (Version 2.0). Standard. US DoD. https:\/\/earth-info.nga.mil\/php\/download.php?file=cib-2500a"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1984.5010248"},{"key":"e_1_3_1_69_2","unstructured":"David Wheeler. 2001. Flawfinder. https:\/\/security.web.cern.ch\/recommendations\/en\/codetools\/flawfinder.shtml"},{"key":"e_1_3_1_70_2","unstructured":"XpdfReader Accessed: March 2024. XpdfReader. 
https:\/\/www.xpdfreader.com"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/1190216.1190231"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656410","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3656410","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T20:42:09Z","timestamp":1751661729000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656410"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,20]]},"references-count":70,"journal-issue":{"issue":"PLDI","published-print":{"date-parts":[[2024,6,20]]}},"alternative-id":["10.1145\/3656410"],"URL":"https:\/\/doi.org\/10.1145\/3656410","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,20]]},"assertion":[{"value":"2024-06-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}