{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T07:46:31Z","timestamp":1759131991746,"version":"3.41.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2004,12,12]],"date-time":"2004-12-12T00:00:00Z","timestamp":1102809600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Database Syst."],"published-print":{"date-parts":[[2004,12,12]]},"abstract":"<jats:p>\n            We consider the problem of evaluating a large number of XPath expressions on a stream of XML packets. We contribute two novel techniques. The first is to use a single Deterministic Finite Automaton (DFA). The contribution here is to show that the DFA can be used effectively for this problem: in our experiments we achieve a constant throughput, independently of the number of XPath expressions. The major issue is the size of the DFA, which, in theory, can be exponential in the number of XPath expressions. We provide a series of theoretical results and experimental evaluations that show that the\n            <jats:italic>lazy<\/jats:italic>\n            DFA has a small number of states, for all practical purposes. These results are of general interest in XPath processing, beyond stream processing. The second technique is the Streaming IndeX (SIX), which consists of adding a small amount of binary data to each XML packet that allows the query processor to achieve significant speedups. As an application of these techniques we describe the XML Toolkit (XMLTK), a collection of command-line tools providing highly scalable XML data processing.\n          <\/jats:p>","DOI":"10.1145\/1042046.1042051","type":"journal-article","created":{"date-parts":[[2005,1,26]],"date-time":"2005-01-26T16:35:53Z","timestamp":1106757353000},"page":"752-788","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":106,"title":["Processing XML streams with deterministic automata and stream indexes"],"prefix":"10.1145","volume":"29","author":[{"given":"Todd J.","family":"Green","sequence":"first","affiliation":[{"name":"University of Pennsylvania, Philadelphia, PA"}]},{"given":"Ashish","family":"Gupta","sequence":"additional","affiliation":[{"name":"University of Washington, Seattle, WA"}]},{"given":"Gerome","family":"Miklau","sequence":"additional","affiliation":[{"name":"University of Washington, Seattle, WA"}]},{"given":"Makoto","family":"Onizuka","sequence":"additional","affiliation":[{"name":"NTT Cyber Space Laboratories, NTT Corporation, Kanagawa, Japan"}]},{"given":"Dan","family":"Suciu","sequence":"additional","affiliation":[{"name":"University of Washington, Seattle, WA"}]}],"member":"320","published-online":{"date-parts":[[2004,12,12]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"Web: From Relations to Semistructured Data and XML. Morgan Kaufmann","author":"Abiteboul S.","year":"1999","unstructured":"Abiteboul , S. , Buneman , P. , and Suciu , D . 1999 . Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann , San Fransisco, CA . Abiteboul, S., Buneman, P., and Suciu, D. 1999. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, San Fransisco, CA."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/360825.360855"},{"volume-title":"Proceedings of VLDB","author":"Altinel M.","key":"e_1_2_2_3_1","unstructured":"Altinel , M. and Franklin , M . 2000. Efficient filtering of XML documents for selective dissemination . In Proceedings of VLDB ( Cairo, Egipt). 53--64. Altinel, M. and Franklin, M. 2000. Efficient filtering of XML documents for selective dissemination. In Proceedings of VLDB (Cairo, Egipt). 53--64."},{"key":"e_1_2_2_4_1","unstructured":"ANDIS\/ISO. 1998. C&plus;&plus; Standard. ANDIS\/ISO Geneva Switzerland.  ANDIS\/ISO. 1998. C&plus;&plus; Standard. ANDIS\/ISO Geneva Switzerland."},{"volume-title":"Proceedings of PLANX.","author":"Avila-Campillo I.","key":"e_1_2_2_5_1","unstructured":"Avila-Campillo , I. , Green , T. J. , Gupta , A. , Onizuka , M. , Raven , D. , and Suciu , D . 2002. XMLTK: An XML toolkit for scalable XML stream processing . In Proceedings of PLANX. Avila-Campillo, I., Green, T. J., Gupta, A., Onizuka, M., Raven, D., and Suciu, D. 2002. XMLTK: An XML toolkit for scalable XML stream processing. In Proceedings of PLANX."},{"key":"e_1_2_2_6_1","unstructured":"Borne K. D. n.d. NASA's astronomical data center. ADC XML resource page. Available online at http:\/\/xml.gsfc.nasa.gov\/.  Borne K. D. n.d. NASA's astronomical data center. ADC XML resource page. Available online at http:\/\/xml.gsfc.nasa.gov\/."},{"volume-title":"Proceedings of the International Conference on Database Theory","author":"Buneman P.","key":"e_1_2_2_7_1","unstructured":"Buneman , P. , Davidson , S. , Fernandez , M. , and Suciu , D . 1997. Adding structure to unstructured data . In Proceedings of the International Conference on Database Theory ( Delphi, Greece). Springer Verlag, Berlin, Germany, 336--350. Buneman, P., Davidson, S., Fernandez, M., and Suciu, D. 1997. Adding structure to unstructured data. In Proceedings of the International Conference on Database Theory (Delphi, Greece). Springer Verlag, Berlin, Germany, 336--350."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(95)00024-Q"},{"volume-title":"Proceedings of the International Conference on Data Engineering.","author":"Chan C.","key":"e_1_2_2_9_1","unstructured":"Chan , C. , Felber , P. , Garofalakis , M. , and Rastogi , R . 2002. Efficient filtering of XML documents with XPath expressions . In Proceedings of the International Conference on Data Engineering. Chan, C., Felber, P., Garofalakis, M., and Rastogi, R. 2002. Efficient filtering of XML documents with XPath expressions. In Proceedings of the International Conference on Data Engineering."},{"volume-title":"Proceedings of the ACM\/SIGMOD Conference on Management of Data. 379--390","author":"Chen J.","key":"e_1_2_2_10_1","unstructured":"Chen , J. , DeWitt , D. , Tian , F. , and Wang , Y . 2000. NiagaraCQ: A scalable continuous query system for internet databases . In Proceedings of the ACM\/SIGMOD Conference on Management of Data. 379--390 . Chen, J., DeWitt, D., Tian, F., and Wang, Y. 2000. NiagaraCQ: A scalable continuous query system for internet databases. In Proceedings of the ACM\/SIGMOD Conference on Management of Data. 379--390."},{"key":"e_1_2_2_11_1","unstructured":"Cormen T. H. Leiserson C. E. and Rivest R. L. 1990. Introduction to Algorithms. MIT Press Cambridge MA.   Cormen T. H. Leiserson C. E. and Rivest R. L. 1990. Introduction to Algorithms. MIT Press Cambridge MA."},{"key":"e_1_2_2_12_1","unstructured":"Corp. M. n.d. DIME---direct Internet message encapsulation specification index page. IETF Internet draft. Available online at http:\/\/msdn.microsoft.com\/webservices\/understanding\/gxa\/default.aspx.  Corp. M. n.d. DIME---direct Internet message encapsulation specification index page. IETF Internet draft. Available online at http:\/\/msdn.microsoft.com\/webservices\/understanding\/gxa\/default.aspx."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/958942.958947"},{"volume-title":"Proceedings of VLDB","author":"Diao Y.","key":"e_1_2_2_14_1","unstructured":"Diao , Y. and Franklin , M . 2003. Query processing for high-volume XML message brokering . In Proceedings of VLDB ( Berlin, Germany). Diao, Y. and Franklin, M. 2003. Query processing for high-volume XML message brokering. In Proceedings of VLDB (Berlin, Germany)."},{"volume-title":"Proceedings of the International Conference on Data Engineering. 14--23","author":"Fernandez M.","key":"e_1_2_2_15_1","unstructured":"Fernandez , M. and Suciu , D . 1998. Optimizing regular path expressions using graph schemas . In Proceedings of the International Conference on Data Engineering. 14--23 . Fernandez, M. and Suciu, D. 1998. Optimizing regular path expressions using graph schemas. In Proceedings of the International Conference on Data Engineering. 14--23."},{"volume-title":"Proceedings of VLDB","author":"Florescu D.","key":"e_1_2_2_16_1","unstructured":"Florescu , D. , Hillary , C. , Kossmann , D., P. Lucas , Riccardi, F., Westmann , T. , Carey , M. , Sundararajan , A. , and Agrawal , G . 2003. The bea\/xqrl streaming xquery processor . In Proceedings of VLDB ( Berlin, Germany). 997--1008. Florescu, D., Hillary, C., Kossmann, D., P.Lucas, Riccardi, F., Westmann, T., Carey, M., Sundararajan, A., and Agrawal, G. 2003. The bea\/xqrl streaming xquery processor. In Proceedings of VLDB (Berlin, Germany). 997--1008."},{"key":"e_1_2_2_17_1","unstructured":"Garcia-Molina H. Ullman J. D. and Widom J. 2000. Database System Implementation. Prentice Hall Upper Saddle River NJ.   Garcia-Molina H. Ullman J. D. and Widom J. 2000. Database System Implementation. Prentice Hall Upper Saddle River NJ."},{"volume-title":"Proceedings of Very Large Data Bases. 436--445","author":"Goldman R.","key":"e_1_2_2_18_1","unstructured":"Goldman , R. and Widom , J . 1997. DataGuides: Enabling query formulation and optimization in semistructured databases . In Proceedings of Very Large Data Bases. 436--445 . Goldman, R. and Widom, J. 1997. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proceedings of Very Large Data Bases. 436--445."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/152610.152611"},{"volume-title":"Proceedings of ICDT. 173--189","author":"Green T. J.","key":"e_1_2_2_20_1","unstructured":"Green , T. J. , Miklau , G. , Onizuka , M. , and Suciu , D . 2003. Processing XML streams with deterministic automata . In Proceedings of ICDT. 173--189 . Green, T. J., Miklau, G., Onizuka, M., and Suciu, D. 2003. Processing XML streams with deterministic automata. In Proceedings of ICDT. 173--189."},{"volume-title":"Proceedings of the International Workshop on the Web and Database (Web DB). 83--88","author":"Gupta A. K.","key":"e_1_2_2_21_1","unstructured":"Gupta , A. K. , Halevy , A. Y. , and Suciu , D . 2002. View selection for XML stream processing . In Proceedings of the International Workshop on the Web and Database (Web DB). 83--88 . Gupta, A. K., Halevy, A. Y., and Suciu, D. 2002. View selection for XML stream processing. In Proceedings of the International Workshop on the Web and Database (Web DB). 83--88."},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872809"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/773153.773161"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/20.suppl.2071"},{"key":"e_1_2_2_25_1","unstructured":"Hopcroft J. and Ullman J. 1979. Introduction to Automata Theory Languages and Computation. Addison-Wesley Reading MA.   Hopcroft J. and Ullman J. 1979. Introduction to Automata Theory Languages and Computation. Addison-Wesley Reading MA."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-002-0078-5"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/SPIRE.2000.878194"},{"key":"e_1_2_2_28_1","unstructured":"Ley M. n.d. Computer science bibliography (dblp). Available online at http:\/\/dblp.uni- trier.de.  Ley M. n.d. Computer science bibliography (dblp). Available online at http:\/\/dblp.uni- trier.de."},{"volume-title":"Proceedings of SIGMOD","author":"Liefke H.","key":"e_1_2_2_29_1","unstructured":"Liefke , H. and Suciu , D . 2000. XMill: An efficent compressor for XML data . In Proceedings of SIGMOD ( Dallas, TX). 153--164. Liefke, H. and Suciu, D. 2000. XMill: An efficent compressor for XML data. In Proceedings of SIGMOD (Dallas, TX). 153--164."},{"volume-title":"Proceedings of VLDB. 227--238","author":"Ludaescher B.","key":"e_1_2_2_30_1","unstructured":"Ludaescher , B. , Mukhopadhyay , P. , and Papakonstantinou , Y . 2002. A transducer-based XML query processor . In Proceedings of VLDB. 227--238 . Ludaescher, B., Mukhopadhyay, P., and Papakonstantinou, Y. 2002. A transducer-based XML query processor. In Proceedings of VLDB. 227--238."},{"key":"e_1_2_2_31_1","doi-asserted-by":"crossref","unstructured":"Marcus M. Santorini B. and Marcinkiewicz M. A. 1993. Building a large annotated corpus of English: The Penn Treenbak. Computat. Ling. 19.   Marcus M. Santorini B. and Marcinkiewicz M. A. 1993. Building a large annotated corpus of English: The Penn Treenbak. Computat. Ling. 19.","DOI":"10.21236\/ADA273556"},{"volume-title":"Proceedings of VLDB","author":"McHugh J.","key":"e_1_2_2_32_1","unstructured":"McHugh , J. and Widom , J . 1999. Query optimization for XML . In Proceedings of VLDB ( Edinburgh, U.K.). 315--326. McHugh, J. and Widom, J. 1999. Query optimization for XML. In Proceedings of VLDB (Edinburgh, U.K.). 315--326."},{"volume-title":"Proceedings of the ACM SIGMOD Conference on Management of Data","author":"Nguyen B.","key":"e_1_2_2_33_1","unstructured":"Nguyen , B. , Abiteboul , S. , Cobena , G. , and Preda , M . 2001. Monitoring XML data on the Web . In Proceedings of the ACM SIGMOD Conference on Management of Data ( Santa Barbara, CA). 437--448. Nguyen, B., Abiteboul, S., Cobena, G., and Preda, M. 2001. Monitoring XML data on the Web. In Proceedings of the ACM SIGMOD Conference on Management of Data (Santa Barbara, CA). 437--448."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956928"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872810"},{"key":"e_1_2_2_36_1","doi-asserted-by":"crossref","unstructured":"Rozenberg G. and Salomaa A. 1997. Handbook of Formal Languages. Springer Verlag Berlin Germany.   Rozenberg G. and Salomaa A. 1997. Handbook of Formal Languages. Springer Verlag Berlin Germany.","DOI":"10.1007\/978-3-642-59126-6"},{"key":"e_1_2_2_37_1","volume-title":"Proceedings of WebDB, D. Suciu and G. Vossen, Eds. Sringer Verlag","author":"Sahuguet A.","year":"2000","unstructured":"Sahuguet , A. 2000 . Everything you ever wanted to know about DTDs, but were afraid to ask . In Proceedings of WebDB, D. Suciu and G. Vossen, Eds. Sringer Verlag , Berlin, Germany, 171--183. Sahuguet, A. 2000. Everything you ever wanted to know about DTDs, but were afraid to ask. In Proceedings of WebDB, D. Suciu and G. Vossen, Eds. Sringer Verlag, Berlin, Germany, 171--183."},{"volume-title":"Proceedings of the 18th Symposium on Operating Systems Principles.","author":"Snoeren A.","key":"e_1_2_2_38_1","unstructured":"Snoeren , A. , Conley , K. , and Gifford , D . 2001. Mesh-based content routing using XML . In Proceedings of the 18th Symposium on Operating Systems Principles. Snoeren, A., Conley, K., and Gifford, D. 2001. Mesh-based content routing using XML. In Proceedings of the 18th Symposium on Operating Systems Principles."},{"key":"e_1_2_2_39_1","first-page":"92","article-title":"Syntactic Definitions for the ACEDB Data Base","author":"Thierry-Mieg J.","year":"1992","unstructured":"Thierry-Mieg , J. and Durbin , R. 1992 . Syntactic Definitions for the ACEDB Data Base Manager. Tech. rep. MRC-LMB xx . 92 . MRC Laboratory for Molecular Biology, Cambridge, U.K. Thierry-Mieg, J. and Durbin, R. 1992. Syntactic Definitions for the ACEDB Data Base Manager. Tech. rep. MRC-LMB xx.92. MRC Laboratory for Molecular Biology, Cambridge, U.K.","journal-title":"MRC-LMB"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/363347.363387"},{"volume-title":"A taxonomy of finite automata construction algorithms. Computing Science report 93\/43","author":"Watson B. W.","key":"e_1_2_2_41_1","unstructured":"Watson , B. W. 1993. A taxonomy of finite automata construction algorithms. Computing Science report 93\/43 . University of Technology Eindhoven , Eindhoven, The Netherlands. Watson, B. W. 1993. A taxonomy of finite automata construction algorithms. Computing Science report 93\/43. University of Technology Eindhoven, Eindhoven, The Netherlands."},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1017\/S135132499700154X"}],"container-title":["ACM Transactions on Database Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1042046.1042051","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1042046.1042051","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:43:15Z","timestamp":1750286595000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1042046.1042051"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,12,12]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2004,12,12]]}},"alternative-id":["10.1145\/1042046.1042051"],"URL":"https:\/\/doi.org\/10.1145\/1042046.1042051","relation":{},"ISSN":["0362-5915","1557-4644"],"issn-type":[{"type":"print","value":"0362-5915"},{"type":"electronic","value":"1557-4644"}],"subject":[],"published":{"date-parts":[[2004,12,12]]},"assertion":[{"value":"2004-12-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}