{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,2,19]],"date-time":"2023-02-19T15:51:06Z","timestamp":1676821866695},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2008,8]]},"abstract":"<jats:p>\n            We study the problem of extracting flattened tuple data from streaming, hierarchical XML data. Tuple-extraction queries are essentially XML pattern queries with\n            <jats:italic>multiple<\/jats:italic>\n            extraction nodes. Their typical applications include mapping-based XML transformation and integrated (set-based) processing of XML and relational data. Holistic twig joins are known for the optimal matching of XML pattern queries on parsed\/indexed XML data. Na\u00efve application of the holistic twig joins to streaming XML data incurs unnecessary disk I\/Os. We adapt the holistic twig joins for tuple-extraction queries on streaming XML with two novel features: first, we use the\n            <jats:italic>block-and-trigger<\/jats:italic>\n            technique to consume streaming XML data in a best-effort fashion without compromising the optimality of holistic matching; second, to reduce peak buffer sizes and overall running times, we apply\n            <jats:italic>query-path pruning<\/jats:italic>\n            and\n            <jats:italic>existential-match pruning<\/jats:italic>\n            techniques to aggressively filter irrelevant incoming data. We compare our solution with the direct competitor TurboXPath and other alternative approaches that use full-fledged query engines such as XQuery or XSLT engines for tuple extraction. 
The experiments using real-world XML data and queries demonstrated that our approach 1) outperformed its competitors by up to orders of magnitude, and 2) exhibited almost linear scalability. Our solution has been demonstrated extensively to IBM customers and will be included in customer engagement applications in healthcare.\n          <\/jats:p>","DOI":"10.14778\/1453856.1453891","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"289-300","source":"Crossref","is-referenced-by-count":10,"title":["StreamTX"],"prefix":"10.14778","volume":"1","author":[{"given":"Wook-Shin","family":"Han","sequence":"first","affiliation":[{"name":"Kyungpook National University, Republic of Korea"}]},{"given":"Haifeng","family":"Jiang","sequence":"additional","affiliation":[{"name":"Google Inc., Mountain View, California"}]},{"given":"Howard","family":"Ho","sequence":"additional","affiliation":[{"name":"IBM Almaden Research Center, San Jose, California"}]},{"given":"Quanzhong","family":"Li","sequence":"additional","affiliation":[{"name":"IBM Almaden Research Center, San Jose, California"}]}],"member":"320","published-online":{"date-parts":[[2008,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"VLDB","author":"Altinel M.","year":"2000"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.452.0299"},{"key":"e_1_2_1_3_1","volume-title":"W3C Working Draft","author":"Berglund A.","year":"2003"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564727"},{"key":"e_1_2_1_5_1","first-page":"283","volume-title":"Proc. of the 32nd Int'l Conference on Very Large Data Bases","author":"Chen S.","year":"2006"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066209"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.18"},{"key":"e_1_2_1_8_1","unstructured":"DBLP. 
http:\/\/dblp.uni-trier.de\/xml\/."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/800070.802184"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/1315451.1315537"},{"key":"e_1_2_1_11_1","unstructured":"Galax (version 0.5.0). http:\/\/www.galaxquery.org\/."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066252"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242715"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/1315451.1315476"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-004-0123-7"},{"key":"e_1_2_1_16_1","first-page":"1378","volume-title":"VLDB","author":"Koch C.","year":"2007"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/1316689.1316711"},{"key":"e_1_2_1_18_1","volume-title":"VLDB","author":"Li X.","year":"2005"},{"key":"e_1_2_1_19_1","unstructured":"MonetDB\/XQuery (version 0.10.2). http:\/\/monetdb.cwi.nl\/XQuery."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1071610.1071617"},{"key":"e_1_2_1_21_1","unstructured":"Quip (version 2.2.1.1). http:\/\/www.softwareag.com."},{"key":"e_1_2_1_22_1","unstructured":"Saxon for .NET. http:\/\/saxon.sourceforge.net."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367869"},{"key":"e_1_2_1_24_1","volume-title":"ICDE","author":"Srivastava D.","year":"2002"},{"key":"e_1_2_1_25_1","unstructured":"Treebank. http:\/\/www.cis.upenn.edu\/ treebank\/."},{"key":"e_1_2_1_26_1","unstructured":"Xerces-C++ (version 2.6.0). 
http:\/\/xml.apache.org\/xerces-c\/."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/376284.375722"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1453856.1453891","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:10:33Z","timestamp":1672225833000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1453856.1453891"}},"subtitle":["extracting tuples from streaming XML data"],"short-title":[],"issued":{"date-parts":[[2008,8]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,8]]}},"alternative-id":["10.14778\/1453856.1453891"],"URL":"https:\/\/doi.org\/10.14778\/1453856.1453891","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2008,8]]}}}