{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,24]],"date-time":"2025-08-24T01:40:08Z","timestamp":1755999608171,"version":"3.41.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,12,12]],"date-time":"2024-12-12T00:00:00Z","timestamp":1733961600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"ANID Fondecyt Regular","award":["1230935"],"award-info":[{"award-number":["1230935"]}]},{"name":"ANID\u2014Millennium Science Initiative Program\u2014Code","award":["ICN17_002"],"award-info":[{"award-number":["ICN17_002"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Database Syst."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In the past decade, the task of extracting data from nested documents over streams has become especially relevant. We focus on the streaming evaluation of queries with outputs of varied sizes over nested documents. We model queries of this kind as Visibly Pushdown Annotators (VPAnn), a computational model that extends visibly pushdown automata with outputs and has the same expressive power as monadic second-order logic over nested documents. Since processing a document through a VPAnn can generate a massive number of results, we are interested in reading the input in a streaming fashion and enumerating the outputs one after another as efficiently as possible, namely, with constant delay. This article presents an algorithm that enumerates these elements with constant delay after processing the document stream in a single pass. Furthermore, we show that this algorithm is worst-case optimal in terms of update-time per symbol and memory usage.<\/jats:p>","DOI":"10.1145\/3701557","type":"journal-article","created":{"date-parts":[[2024,10,25]],"date-time":"2024-10-25T10:05:13Z","timestamp":1729850713000},"page":"1-39","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Streaming Enumeration on Nested Documents"],"prefix":"10.1145","volume":"49","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-3294-6159","authenticated-orcid":false,"given":"Mart\u00edn","family":"Mu\u00f1oz","sequence":"first","affiliation":[{"name":"Pontificia Universidad Cat\u00f3lica de Chile, Santiago, Chile and Millennium Institute Foundational Research on Data, Santiago, Chile"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0832-116X","authenticated-orcid":false,"given":"Cristian","family":"Riveros","sequence":"additional","affiliation":[{"name":"School of Engineering, Department of Computer Science, Pontificia Universidad Catolica de Chile, Santiago, Chile and Millennium Institute Foundational Research on Data, Santiago, Chile"}]}],"member":"320","published-online":{"date-parts":[[2024,12,12]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.5555\/578775"},{"key":"e_1_3_3_3_2","article-title":"Compilers, Principles, Techniques","author":"Aho Alfred V.","year":"1986","unstructured":"Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers, Principles, Techniques. Addison-Wesley, Boston, MA.","journal-title":"Addison-Wesley, Boston, MA"},{"key":"e_1_3_3_4_2","first-page":"53","volume-title":"Proceedings of the VLDB","author":"Alt\u0131nel Mehmet","year":"2000","unstructured":"Mehmet Alt\u0131nel and Michael J. Franklin. 2000. Efficient filtering of XML documents for selective dissemination of information. In Proceedings of the VLDB. 53\u201364."},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2019.11.018"},{"key":"e_1_3_3_6_2","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1145\/1007352.1007390","volume-title":"Proceedings of the STOC","author":"Alur Rajeev","year":"2004","unstructured":"Rajeev Alur and P. Madhusudan. 2004. Visibly pushdown languages. In Proceedings of the STOC. 202\u2013211."},{"key":"e_1_3_3_7_2","first-page":"111:1\u2013111:15","volume-title":"Proceedings of the ICALP","author":"Amarilli Antoine","year":"2017","unstructured":"Antoine Amarilli, Pierre Bourhis, Louis Jachiet, and Stefan Mengel. 2017. A circuit-based approach to efficient enumeration. In Proceedings of the ICALP. 111:1\u2013111:15."},{"key":"e_1_3_3_8_2","first-page":"22:1\u201322:19","volume-title":"Proceedings of the ICDT","author":"Amarilli Antoine","year":"2019","unstructured":"Antoine Amarilli, Pierre Bourhis, Stefan Mengel, and Matthias Niewerth. 2019. Constant-delay enumeration for nondeterministic document spanners. In Proceedings of the ICDT. 22:1\u201322:19."},{"key":"e_1_3_3_9_2","first-page":"89","volume-title":"Proceedings of the PODS","author":"Amarilli Antoine","year":"2019","unstructured":"Antoine Amarilli, Pierre Bourhis, Stefan Mengel, and Matthias Niewerth. 2019. Enumeration on trees with tractable combined complexity and efficient updates. In Proceedings of the PODS. 89\u2013103."},{"key":"e_1_3_3_10_2","first-page":"291","volume-title":"Proceedings of the PODS","author":"Amarilli Antoine","year":"2022","unstructured":"Antoine Amarilli, Louis Jachiet, Mart\u00edn Mu\u00f1oz, and Cristian Riveros. 2022. Efficient enumeration for annotated grammars. In Proceedings of the PODS. 291\u2013300."},{"key":"e_1_3_3_11_2","first-page":"59","volume-title":"Proceedings of the PODS","author":"Arenas Marcelo","year":"2019","unstructured":"Marcelo Arenas, Luis Alberto Croquevielle, Rajesh Jayaram, and Cristian Riveros. 2019. Efficient logspace classes for enumeration, counting, and uniform generation. In Proceedings of the PODS. 59\u201373."},{"key":"e_1_3_3_12_2","first-page":"1","volume-title":"Proceedings of the SIGMOD","author":"Babcock Brian","year":"2002","unstructured":"Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. 2002. Models and issues in data stream systems. In Proceedings of the SIGMOD. 1\u201316."},{"key":"e_1_3_3_13_2","first-page":"167","volume-title":"Proceedings of the CSL","author":"Bagan Guillaume","year":"2006","unstructured":"Guillaume Bagan. 2006. MSO queries on tree decomposable structures are computable with linear delay. In Proceedings of the CSL. 167\u2013181."},{"key":"e_1_3_3_14_2","first-page":"208","volume-title":"Proceedings of the CSL","author":"Bagan Guillaume","year":"2007","unstructured":"Guillaume Bagan, Arnaud Durand, and Etienne Grandjean. 2007. On acyclic conjunctive queries and constant delay enumeration. In Proceedings of the CSL. 208\u2013222."},{"key":"e_1_3_3_15_2","first-page":"216","volume-title":"Proceedings of the PODS","author":"Bar-Yossef Ziv","year":"2005","unstructured":"Ziv Bar-Yossef, Marcus Fontoura, and Vanja Josifovski. 2005. Buffering in query evaluation over XML streams. In Proceedings of the PODS. 216\u2013227."},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2006.10.002"},{"key":"e_1_3_3_17_2","volume-title":"Proceedings of the PODS","author":"Barloy Corentin","year":"2021","unstructured":"Corentin Barloy, Filip Murlak, and Charles Paperman. 2021. Stackless processing of streamed trees. In Proceedings of the PODS."},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3385634.3385636"},{"key":"e_1_3_3_19_2","first-page":"303","volume-title":"Proceedings of the PODS","author":"Berkholz Christoph","year":"2017","unstructured":"Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. 2017. Answering conjunctive queries under updates. In Proceedings of the PODS. 303\u2013318."},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2019.101478"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2015.01.018"},{"key":"e_1_3_3_22_2","first-page":"79","volume-title":"Proceedings of the ICDE","author":"Chen Yi","year":"2006","unstructured":"Yi Chen, Susan B. Davidson, and Yifeng Zheng. 2006. An efficient XPath query processor for XML streams. In Proceedings of the ICDE. 79."},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1561\/1900000020"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.dam.2008.08.021"},{"key":"e_1_3_3_25_2","first-page":"109","volume-title":"Proceedings of the STOC","author":"Driscoll James R.","year":"1986","unstructured":"James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, and Robert Endre Tarjan. 1986. Making data structures persistent. In Proceedings of the STOC. 109\u2013121."},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/1276920.1276923"},{"issue":"2","key":"e_1_3_3_27_2","first-page":"12:1\u201312:51","article-title":"Document spanners: A formal approach to information extraction","volume":"62","author":"Fagin Ronald","year":"2015","unstructured":"Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. 2015. Document spanners: A formal approach to information extraction. J. ACM 62, 2 (2015), 12:1\u201312:51.","journal-title":"J. ACM"},{"issue":"2","key":"e_1_3_3_28_2","article-title":"Streamability of nested word transductions","volume":"15","author":"Filiot Emmanuel","year":"2019","unstructured":"Emmanuel Filiot, Olivier Gauwin, Pierre-Alain Reynier, and Fr\u00e9d\u00e9ric Servais. 2019. Streamability of nested word transductions. LMCS 15, 2 (2019).","journal-title":"LMCS"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2018.05.002"},{"issue":"1","key":"e_1_3_3_30_2","first-page":"3:1\u20133:42","article-title":"Efficient enumeration algorithms for regular document spanners","volume":"45","author":"Florenzano Fernando","year":"2020","unstructured":"Fernando Florenzano, Cristian Riveros, Mart\u00edn Ugarte, Stijn Vansummeren, and Domagoj Vrgoc. 2020. Efficient enumeration algorithms for regular document spanners. TODS 45, 1 (2020), 3:1\u20133:42.","journal-title":"TODS"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-018-9874-1"},{"key":"e_1_3_3_32_2","first-page":"137","volume-title":"Proceedings of the PODS","author":"Freydenberger Dominik D.","year":"2018","unstructured":"Dominik D. Freydenberger, Benny Kimelfeld, and Liat Peterfreund. 2018. Joining extractions of regular expressions. In Proceedings of the PODS. 137\u2013149."},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2008.08.002"},{"key":"e_1_3_3_34_2","first-page":"350","volume-title":"Proceedings of the LATA","volume":"5457","author":"Gauwin Olivier","year":"2009","unstructured":"Olivier Gauwin, Joachim Niehren, and Sophie Tison. 2009. Bounded delay and concurrency for earliest query answering. In Proceedings of the LATA, Vol. 5457. 350\u2013361."},{"key":"e_1_3_3_35_2","first-page":"121","volume-title":"Proceedings of the FCT","volume":"5699","author":"Gauwin Olivier","year":"2009","unstructured":"Olivier Gauwin, Joachim Niehren, and Sophie Tison. 2009. Earliest query answering for deterministic nested word automata. In Proceedings of the FCT, Vol. 5699. 121\u2013132."},{"key":"e_1_3_3_36_2","first-page":"269","volume-title":"Proceedings of the SIGMOD","author":"Gou Gang","year":"2007","unstructured":"Gang Gou and Rada Chirkova. 2007. Efficient algorithms for evaluating XPath over streams. In Proceedings of the SIGMOD. ACM, 269\u2013280."},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/1042046.1042051"},{"key":"e_1_3_3_38_2","first-page":"14:1\u201314:17","volume-title":"Proceedings of the ICDT","author":"Grez Alejandro","year":"2020","unstructured":"Alejandro Grez and Cristian Riveros. 2020. Towards streaming evaluation of queries with correlation in complex event processing. In Proceedings of the ICDT. 14:1\u201314:17."},{"key":"e_1_3_3_39_2","first-page":"5:1\u20135:18","volume-title":"Proceedings of the ICDT","author":"Grez Alejandro","year":"2019","unstructured":"Alejandro Grez, Cristian Riveros, and Mart\u00edn Ugarte. 2019. A formal framework for complex event processing. In Proceedings of the ICDT. 5:1\u20135:18."},{"key":"e_1_3_3_40_2","first-page":"1259","volume-title":"Proceedings of the SIGMOD","author":"Idris Muhammad","year":"2017","unstructured":"Muhammad Idris, Mart\u00edn Ugarte, and Stijn Vansummeren. 2017. The dynamic yannakakis algorithm: Compact and efficient query processing under updates. In Proceedings of the SIGMOD. 1259\u20131274."},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(86)90174-X"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-004-0123-7"},{"key":"e_1_3_3_43_2","first-page":"375","volume-title":"Proceedings of the PODS","author":"Kara Ahmet","year":"2020","unstructured":"Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. 2020. Trade-offs in static and dynamic evaluation of hierarchical queries. In Proceedings of the PODS. 375\u2013392."},{"key":"e_1_3_3_44_2","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1145\/1242572.1242714","volume-title":"Proceedings of the WWW","author":"Kumar Viraj","year":"2007","unstructured":"Viraj Kumar, P. Madhusudan, and Mahesh Viswanathan. 2007. Visibly pushdown automata for streaming XML. In Proceedings of the WWW. 1053\u20131062."},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-07003-1"},{"key":"e_1_3_3_46_2","first-page":"125","volume-title":"Proceedings of the PODS","author":"Maturana Francisco","year":"2018","unstructured":"Francisco Maturana, Cristian Riveros, and Domagoj Vrgoc. 2018. Document spanners for extracting incomplete information: Expressiveness and complexity. In Proceedings of the PODS. 125\u2013136."},{"key":"e_1_3_3_47_2","volume-title":"Proceedings of the ICDT","author":"Mu\u00f1oz Martin","year":"2023","unstructured":"Martin Mu\u00f1oz and Cristian Riveros. 2023. Constant-delay enumeration for SLP-compressed documents. In Proceedings of the ICDT."},{"key":"e_1_3_3_48_2","first-page":"365","volume-title":"Proceedings of the SIGMOD","author":"Nikolic Milos","year":"2018","unstructured":"Milos Nikolic and Dan Olteanu. 2018. Incremental view maintenance with triple lock factorization benefits. In Proceedings of the SIGMOD. 365\u2013380."},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.1063"},{"key":"e_1_3_3_50_2","first-page":"627","volume-title":"Proceedings of the SAC","author":"Olteanu Dan","year":"2004","unstructured":"Dan Olteanu, Tim Furche, and Fran\u00e7ois Bry. 2004. An efficient single-pass query evaluator for XML data streams. In Proceedings of the SAC. 627\u2013631."},{"issue":"1","key":"e_1_3_3_51_2","first-page":"2:1\u20132:44","article-title":"Size bounds for factorised representations of query results","volume":"40","author":"Olteanu Dan","year":"2015","unstructured":"Dan Olteanu and Jakub Z\u00e1vodn\u00fd. 2015. Size bounds for factorised representations of query results. ACM TODS 40, 1 (2015), 2:1\u20132:44.","journal-title":"ACM TODS"},{"key":"e_1_3_3_52_2","first-page":"7:1\u20137:18","volume-title":"Proceedings of the ICDT","author":"Peterfreund Liat","year":"2021","unstructured":"Liat Peterfreund. 2021. Grammars for document spanners. In Proceedings of the ICDT. 7:1\u20137:18."},{"key":"e_1_3_3_53_2","first-page":"10","volume-title":"Proceedings of the ICDT","author":"Segoufin Luc","year":"2013","unstructured":"Luc Segoufin. 2013. Enumerating with constant delay the answers to a query. In Proceedings of the ICDT. 10\u201320."},{"key":"e_1_3_3_54_2","first-page":"53","volume-title":"Proceedings of the PODS","author":"Segoufin Luc","year":"2002","unstructured":"Luc Segoufin and Victor Vianu. 2002. Validating streaming XML documents. In Proceedings of the PODS. 53\u201364."},{"key":"e_1_3_3_55_2","first-page":"824","volume-title":"Proceedings of the ICDE","author":"Shalem Mirit","year":"2008","unstructured":"Mirit Shalem and Ziv Bar-Yossef. 2008. The space complexity of processing XML twig queries over indexed documents. In Proceedings of the ICDE. 824\u2013832."},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/1328854.1328858"},{"key":"e_1_3_3_57_2","first-page":"427","volume-title":"Proceedings of the PODS","author":"Torunczyk Szymon","year":"2020","unstructured":"Szymon Torunczyk. 2020. Aggregate queries on sparse databases. In Proceedings of the PODS. 427\u2013443."}],"container-title":["ACM Transactions on Database Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701557","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3701557","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:00Z","timestamp":1750295880000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701557"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,12]]},"references-count":56,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3701557"],"URL":"https:\/\/doi.org\/10.1145\/3701557","relation":{},"ISSN":["0362-5915","1557-4644"],"issn-type":[{"type":"print","value":"0362-5915"},{"type":"electronic","value":"1557-4644"}],"subject":[],"published":{"date-parts":[[2024,12,12]]},"assertion":[{"value":"2023-08-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-20","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}