{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T21:10:10Z","timestamp":1751663410429,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"PLDI","license":[{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["FW-HTF 2129008, CA-HDR 2033558"],"award-info":[{"award-number":["FW-HTF 2129008, CA-HDR 2033558"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,6,20]]},"abstract":"<jats:p>Lightweight syntactic analysis tools like Semgrep and Comby leverage the tree structure of code, making them more expressive than string and regex search. Unlike traditional language frameworks (e.g., ESLint) that analyze codebases via explicit syntax tree manipulations, these tools use query languages that closely resemble the source language. However, state-of-the-art matching techniques for these tools require queries to be complete and parsable snippets, which makes in-progress query specifications useless.<\/jats:p>\n          <jats:p>\n            We propose a new search architecture that relies only on tokenizing (not parsing) a query. We introduce a novel language and matching algorithm to support tree-aware wildcards on this architecture by building on tree automata. 
We also present\n            <jats:monospace>stsearch<\/jats:monospace>\n            , a syntactic search tool leveraging our approach.\n          <\/jats:p>\n          <jats:p>\n            In contrast to past work, our approach supports syntactic search\n            <jats:italic toggle=\"yes\">even for previously unparsable queries.<\/jats:italic>\n            We show empirically that stsearch can support all tokenizable queries, while still providing results comparable to Semgrep for existing queries. Our work offers evidence that lightweight syntactic code search can accept in-progress specifications, potentially improving support for interactive settings.\n          <\/jats:p>\n          <jats:p>\n            CCS Concepts: \u2022\n            <jats:bold>Software and its engineering<\/jats:bold>\n            \u2192\n            <jats:italic toggle=\"yes\">Formal language definitions<\/jats:italic>\n            ;\n            <jats:italic toggle=\"yes\">Software maintenance tools;<\/jats:italic>\n            \u2022\n            <jats:bold>Information systems<\/jats:bold>\n            \u2192\n            <jats:italic toggle=\"yes\">Query representation;<\/jats:italic>\n            \u2022\n            <jats:bold>Theory of computation<\/jats:bold>\n            \u2192 Tree languages.\n          <\/jats:p>","DOI":"10.1145\/3656460","type":"journal-article","created":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T16:27:20Z","timestamp":1718900840000},"page":"2051-2072","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Syntactic Code Search with Sequence-to-Tree Matching: Supporting Syntactic Search with Incomplete Code Fragments"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7785-1231","authenticated-orcid":false,"given":"Gabriel","family":"Matute","sequence":"first","affiliation":[{"name":"University of California at Berkeley, Berkeley, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5341-4958","authenticated-orcid":false,"given":"Wode","family":"Ni","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4877-0739","authenticated-orcid":false,"given":"Titus","family":"Barik","sequence":"additional","affiliation":[{"name":"Apple, Pittsburgh, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6261-6263","authenticated-orcid":false,"given":"Alvin","family":"Cheung","sequence":"additional","affiliation":[{"name":"University of California at Berkeley, Berkeley, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0557-3580","authenticated-orcid":false,"given":"Sarah E.","family":"Chasins","sequence":"additional","affiliation":[{"name":"University of California at Berkeley, Berkeley, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,6,20]]},"reference":[{"key":"e_1_3_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1882291.1882316"},{"volume-title":"Pygments.","author":"Georg Brandl","key":"e_1_3_1_3_1","unstructured":"Georg Brandl. Pygments. Version 2.14.0. Jan. 1, 2023. url: https:\/\/pygments.org."},{"volume-title":"Tree-sitter.","author":"Max Brunsfeld","key":"e_1_3_1_4_1","unstructured":"Max Brunsfeld. Tree-sitter. Version 0.20.9. Sept. 2, 2022. url: https:\/\/tree-sitter.github.io."},{"volume-title":"Tree Automata Techniques and Applications.","author":"Comon-Lundh Hubert","key":"e_1_3_1_5_1","unstructured":"Hubert Comon-Lundh et al. Tree Automata Techniques and Applications. Oct. 12, 2007. url: http:\/\/tata.gforge.inria.fr\/."},{"volume-title":"Txl.","author":"Cordy James R.","key":"e_1_3_1_6_1","unstructured":"James R. Cordy. Txl. url: https:\/\/txl.ca."},{"key":"e_1_3_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/SCAM.2007.31"},{"volume-title":"ESLint.","key":"e_1_3_1_8_1","unstructured":"ESLint. OpenJS Foundation. 
url: https:\/\/eslint.org."},{"volume-title":"Express.","key":"e_1_3_1_9_1","unstructured":"Express. OpenJS Foundation. url: https:\/\/expressjs.com."},{"key":"e_1_3_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174154"},{"key":"e_1_3_1_11_1","unstructured":"Glean. url: https:\/\/glean.software\/."},{"key":"e_1_3_1_12_1","unstructured":"gofmt. specifically the -r rule flag. Google. url: https:\/\/pkg.go.dev\/cmd\/gofmt."},{"volume-title":"Fixing Session Fixation.","key":"e_1_3_1_13_1","unstructured":"Jared Hanson. Fixing Session Fixation. May 20, 2022. url: https:\/\/medium.com\/passportjs\/fixing-session-fixation-b2b68619c51d."},{"volume-title":"Passport.","author":"Hanson Jared","key":"e_1_3_1_14_1","unstructured":"Jared Hanson. Passport. May 20, 2022. url: https:\/\/passportjs.org\/."},{"key":"e_1_3_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/322290.322295"},{"key":"e_1_3_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095430.1081744"},{"key":"e_1_3_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11334-016-0282-x"},{"volume-title":"Rosie Pattern Language.","key":"e_1_3_1_18_1","unstructured":"Jamie A. Jennings. Rosie Pattern Language. url: https:\/\/rosie-lang.org\/."},{"volume-title":"jscodeshift.","key":"e_1_3_1_19_1","unstructured":"jscodeshift. Meta. url: https:\/\/github.com\/facebook\/jscodeshift."},{"volume-title":"Cubix Framework.","key":"e_1_3_1_20_1","unstructured":"James Koppel. Cubix Framework. url: http:\/\/www.cubix-framework.com."},{"key":"e_1_3_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276492"},{"volume-title":"Kythe.","key":"e_1_3_1_22_1","unstructured":"Kythe. url: https:\/\/kythe.io\/."},{"key":"e_1_3_1_23_1","doi-asserted-by":"publisher","unstructured":"D.A. Ladd and J.C. Ramming. \u201cA*: a language for implementing language processors\u201d. In: 1994. 
url: https:\/\/doi.org\/10.1109\/ICCL.1994.288398 10.1109\/ICCL.1994.288398.","DOI":"10.1109\/ICCL.1994.288398"},{"volume-title":"2018 USENIX Annual Technical Conference.","author":"Lawall Julia L.","key":"e_1_3_1_24_1","unstructured":"Julia L. Lawall and Gilles Muller. \u201cCoccinelle: 10 Years of Automated Evolution in the Linux Kernel\u201d. In: 2018 USENIX Annual Technical Conference. USENIX ATC \u201818. July 2018. url: https:\/\/www.usenix.org\/conference\/atc18\/presentation\/lawall."},{"volume-title":"Coccinelle.","author":"Lawall Julia L.","key":"e_1_3_1_25_1","unstructured":"Julia L. Lawall et al. Coccinelle. url: https:\/\/coccinelle.lip6.fr\/."},{"key":"e_1_3_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3480027"},{"key":"e_1_3_1_27_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.10937816"},{"key":"e_1_3_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2013.6606596"},{"key":"e_1_3_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00047"},{"key":"e_1_3_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3360569"},{"key":"e_1_3_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/WCRE.2001.957806"},{"key":"e_1_3_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/WPC.2002.1021343"},{"key":"e_1_3_1_33_1","unstructured":"Gail C. Murphy. \u201cLightweight structural summarization as an aid to software evolution\u201d. University of Washington 1996. url: http:\/\/hdl.handle.net\/1773\/6976."},{"key":"e_1_3_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/234426.234441"},{"volume-title":"recast.","key":"e_1_3_1_35_1","unstructured":"Ben Newman. recast. url: https:\/\/github.com\/benjamn\/recast."},{"key":"e_1_3_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472749.3474748"},{"key":"e_1_3_1_37_1","unstructured":"npm Registry npm. Mar. 15 2023. 
url: https:\/\/www.npmjs.com\/."},{"key":"e_1_3_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3386001"},{"key":"e_1_3_1_39_1","unstructured":"Retrie. Meta. url: https:\/\/github.com\/facebookincubator\/retrie."},{"key":"e_1_3_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2017.44"},{"key":"e_1_3_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786805.2786855"},{"key":"e_1_3_1_42_1","unstructured":"Semgrep. Version 1.15. r2c Mar. 15 2023. url: https:\/\/semgrep.dev."},{"key":"e_1_3_1_43_1","unstructured":"semgrep-rules. Semgrep. Mar. 15 2023. url: https:\/\/github.com\/semgrep\/semgrep-rules."},{"key":"e_1_3_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00044"},{"volume-title":"Source Graph: Batch Changes.","key":"e_1_3_1_45_1","unstructured":"Source Graph: Batch Changes. url: https:\/\/docs.sourcegraph.com\/batch_changes."},{"volume-title":"Source Graph: Code Search.","key":"e_1_3_1_46_1","unstructured":"Source Graph: Code Search. url: https:\/\/docs.sourcegraph.com\/code_search."},{"volume-title":"Comby.","key":"e_1_3_1_47_1","unstructured":"Rijnard van Tonder. Comby. 
url: https:\/\/comby.dev."},{"key":"e_1_3_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3563302"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656460","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3656460","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T20:44:48Z","timestamp":1751661888000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656460"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,20]]},"references-count":47,"journal-issue":{"issue":"PLDI","published-print":{"date-parts":[[2024,6,20]]}},"alternative-id":["10.1145\/3656460"],"URL":"https:\/\/doi.org\/10.1145\/3656460","relation":{},"ISSN":["2475-1421"],"issn-type":[{"type":"electronic","value":"2475-1421"}],"subject":[],"published":{"date-parts":[[2024,6,20]]},"assertion":[{"value":"2024-06-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}