{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T04:31:10Z","timestamp":1759638670741,"version":"3.41.0"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2005,3,1]],"date-time":"2005-03-01T00:00:00Z","timestamp":1109635200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Database Syst."],"published-print":{"date-parts":[[2005,3]]},"abstract":"<jats:p>\n            We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an\n            <jats:italic>L<\/jats:italic>\n            <jats:sub>1<\/jats:sub>\n            vector space while guaranteeing a (worst-case) upper bound of\n            <jats:italic>O<\/jats:italic>\n            (log\n            <jats:sup>2<\/jats:sup>\n            <jats:italic>n<\/jats:italic>\n            log*\n            <jats:italic>n<\/jats:italic>\n            ) on the distance distortion between any data trees with at most\n            <jats:italic>n<\/jats:italic>\n            nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. Experimental results from an empirical study with both synthetic and real-life XML data trees validate our approach, demonstrating that the average-case behavior of our embedding techniques is much better than what would be predicted from our theoretical worst-case distortion bounds. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model.\n          <\/jats:p>","DOI":"10.1145\/1061318.1061326","type":"journal-article","created":{"date-parts":[[2005,8,3]],"date-time":"2005-08-03T08:30:55Z","timestamp":1123057855000},"page":"279-332","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":36,"title":["XML stream processing using tree-edit distance embeddings"],"prefix":"10.1145","volume":"30","author":[{"given":"Minos","family":"Garofalakis","sequence":"first","affiliation":[{"name":"Bell Labs, Lucent Technologies, Murray Hill, NJ"}]},{"given":"Amit","family":"Kumar","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology, New Delhi, India"}]}],"member":"320","published-online":{"date-parts":[[2005,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304207"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/303976.303978"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/237814.237823"},{"volume-title":"Proceedings of the 26th International Conference on Very Large Data Bases","author":"Altinel M.","key":"e_1_2_1_4_1"},{"volume-title":"Eds","year":"1997","author":"Apostolico A.","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/543613.543642"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/543613.543615"},{"volume-title":"Proceedings of the 6th International Workshop on Randomization and Approximation Techniques in Computer Science (RANDOM'02)","author":"Bar-Yossef Z.","key":"e_1_2_1_8_1"},{"volume-title":"Proceedings of the 26th International Conference on Very Large Data Bases","author":"Chakrabarti K.","key":"e_1_2_1_9_1"},{"volume-title":"Proceedings of the Eighteenth International Conference on Data Engineering","author":"Chan C.-Y.","key":"e_1_2_1_10_1"},{"volume-title":"Proceedings of the International Colloquium on Automata, Languages, and Programming","author":"Charikar M.","key":"e_1_2_1_11_1"},{"volume-title":"Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science","author":"Charikar M.","key":"e_1_2_1_12_1"},{"volume-title":"Proceedings of the 28th International Conference on Very Large Data Bases","author":"Cormode G.","key":"e_1_2_1_13_1"},{"volume-title":"Proceedings of the Eighteenth International Conference on Data Engineering","author":"Cormode G.","key":"e_1_2_1_14_1"},{"volume-title":"Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms","author":"Cormode G.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Dasu T. and Johnson T. 2003. Exploratory Data Mining and Data Cleaning. Wiley Series in Probability and Statistics. John Wiley & Sons Inc. New York NY.   Dasu T. and Johnson T. 2003. Exploratory Data Mining and Data Cleaning. Wiley Series in Probability and Statistics. John Wiley & Sons Inc. New York NY.","DOI":"10.1002\/0471448354"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/958942.958947"},{"volume-title":"Proceedings of the 29th International Conference on Very Large Data Bases","author":"Diao Y.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564699"},{"volume-title":"Proceedings of the 9th International Conference on Extending Database Technology (EDBT'2004","author":"Dobra A.","key":"e_1_2_1_20_1"},{"volume-title":"Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science","author":"Feigenbaum J.","key":"e_1_2_1_21_1"},{"volume-title":"Proceedings of the 23rd International Conference on Very Large Data Bases","author":"Florescu D.","key":"e_1_2_1_22_1"},{"volume-title":"Proceedings of the 9th International Conference on Extending Database Technology (EDBT'2004","author":"Ganguly S.","key":"e_1_2_1_23_1"},{"volume-title":"Proceedings of the 28th International Conference on Very Large Data Bases","author":"Garofalakis M.","key":"e_1_2_1_24_1"},{"volume-title":"Proceedings of the 27th International Conference on Very Large Data Bases","author":"Garofalakis M.","key":"e_1_2_1_25_1"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/773153.773168"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/509907.509966"},{"volume-title":"Proceedings of the 27th International Conference on Very Large Data Bases","author":"Gilbert A. C.","key":"e_1_2_1_28_1"},{"volume-title":"Proceedings of the 28th International Conference on Very Large Data Bases","author":"Gilbert A. C.","key":"e_1_2_1_29_1"},{"volume-title":"Proceedings of the 27th International Conference on Very Large Data Bases","author":"Gravano L.","key":"e_1_2_1_30_1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375670"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564725"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872809"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/795666.796606"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/874063.875596"},{"volume-title":"Proceedings of the 26th International Conference on Very Large Data Bases","author":"Indyk P.","key":"e_1_2_1_36_1"},{"volume-title":"Proceedings of the 25th International Conference on Very Large Data Bases","author":"Ioannidis Y. E.","key":"e_1_2_1_37_1"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1090\/conm\/026\/737400"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.312.0249"},{"volume-title":"The Art of Computer Programming (Vol. 1\/Fundamental Algorithms)","author":"Knuth D. E.","key":"e_1_2_1_40_1"},{"volume-title":"Proceedings of the 8th International Conference on Extending Database Technology (EDBT'2002","author":"Lakshmanan L. V. S.","key":"e_1_2_1_41_1"},{"volume-title":"Proceedings of the 28th International Conference on Very Large Data Bases","author":"Manku G. S.","key":"e_1_2_1_42_1"},{"key":"e_1_2_1_43_1","doi-asserted-by":"crossref","unstructured":"Motwani R. and Raghavan P. 1995. Randomized Algorithms. Cambridge University Press Cambridge U.K.   Motwani R. and Raghavan P. 1995. Randomized Algorithms. Cambridge University Press Cambridge U.K.","DOI":"10.1017\/CBO9780511814075"},{"key":"e_1_2_1_44_1","unstructured":"Nolan J. P. 2004. Stable distributions: Models for heavy-tailed data. Available online at http:\/\/academic2.american.edu\/~jpnolan\/stable\/stable.html.  Nolan J. P. 2004. Stable distributions: Models for heavy-tailed data. Available online at http:\/\/academic2.american.edu\/~jpnolan\/stable\/stable.html."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564733"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007599"},{"volume-title":"Proceedings of the 28th International Conference on Very Large Data Bases","author":"Schmidt A.","key":"e_1_2_1_47_1"},{"volume-title":"Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching (CPM'2002)","author":"Shapira D.","key":"e_1_2_1_48_1"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564741"},{"key":"e_1_2_1_50_1","doi-asserted-by":"crossref","unstructured":"Uchaikin V. V. and Zolotarev V. M. 1999. Chance and Stability : Stable Distributions and their Applications. VSP Utrecht The Netherland.  Uchaikin V. V. and Zolotarev V. M. 1999. Chance and Stability : Stable Distributions and their Applications. VSP Utrecht The Netherland.","DOI":"10.1515\/9783110935974"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(92)90143-4"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304199"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1137\/0218082"}],"container-title":["ACM Transactions on Database Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1061318.1061326","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1061318.1061326","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:36:56Z","timestamp":1750282616000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1061318.1061326"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,3]]},"references-count":53,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2005,3]]}},"alternative-id":["10.1145\/1061318.1061326"],"URL":"https:\/\/doi.org\/10.1145\/1061318.1061326","relation":{},"ISSN":["0362-5915","1557-4644"],"issn-type":[{"type":"print","value":"0362-5915"},{"type":"electronic","value":"1557-4644"}],"subject":[],"published":{"date-parts":[[2005,3]]},"assertion":[{"value":"2005-03-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}