{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:37:10Z","timestamp":1750307830016,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2008,7,1]],"date-time":"2008-07-01T00:00:00Z","timestamp":1214870400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2008,7]]},"abstract":"<jats:p>\n            Due to the inherent flexibilities in both structure and semantics, XML association rules mining faces few challenges, such as: a more complicated hierarchical data structure and ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique\n            <jats:italic>embedding list<\/jats:italic>\n            representation of the tree structure, which enables efficient implementation of our\n            <jats:italic>Tree Model Guided<\/jats:italic>\n            (\n            <jats:italic>TMG<\/jats:italic>\n            ) candidate generation.\n            <jats:italic>TMG<\/jats:italic>\n            is an optimal, nonredundant enumeration strategy that enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that\n            <jats:italic>TMG<\/jats:italic>\n            has better complexity compared to the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and\/or embedded subtrees by using the\n            <jats:italic>maximum level of embedding constraint<\/jats:italic>\n            . Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees, demonstrate the effectiveness and the efficiency of the proposed techniques.\n          <\/jats:p>","DOI":"10.1145\/1376815.1376818","type":"journal-article","created":{"date-parts":[[2008,7,22]],"date-time":"2008-07-22T13:04:05Z","timestamp":1216731845000},"page":"1-43","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":30,"title":["Tree model guided candidate generation for mining frequent subtrees from XML documents"],"prefix":"10.1145","volume":"2","author":[{"given":"Henry","family":"Tan","sequence":"first","affiliation":[{"name":"Univ. of Technology Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fedja","family":"Hadzic","sequence":"additional","affiliation":[{"name":"Univ. of Technology Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tharam S.","family":"Dillon","sequence":"additional","affiliation":[{"name":"Univ. of Technology Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Elizabeth","family":"Chang","sequence":"additional","affiliation":[{"name":"Curtin Univ. of Technology, Perth, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ling","family":"Feng","sequence":"additional","affiliation":[{"name":"Tsinghua Univ., China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2008,7,24]]},"reference":[{"volume-title":"Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2002)","author":"Abe K.","key":"e_1_2_1_1_1","unstructured":"Abe , K. , Kawasoe , S. , Asai , T. , Arimura , H. , and Arikawa , S . 2002. Optimized substructure discovery for semistructured data . In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2002) (Helsinki, Finland). 1--14 Abe, K., Kawasoe, S., Asai, T., Arimura, H., and Arikawa, S. 2002. Optimized substructure discovery for semistructured data. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2002) (Helsinki, Finland). 1--14"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/170035.170072"},{"key":"e_1_2_1_3_1","unstructured":"Agrawal R. Mannila H. Srikant R. Toivonen H. and Verkamo A. I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining Usama M. Fayyad Gregory Piatetsky-Shapiro Padhraic Smyth Ramasamy Uthurusamy Eds. American Association for Artificial Intelligence CA 307--328.   Agrawal R. Mannila H. Srikant R. Toivonen H. and Verkamo A. I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining Usama M. Fayyad Gregory Piatetsky-Shapiro Padhraic Smyth Ramasamy Uthurusamy Eds. American Association for Artificial Intelligence CA 307--328."},{"volume-title":"Proceedings of the 20th Very Large Data Bases (VLDB 1994)","author":"Agrawal R.","key":"e_1_2_1_4_1","unstructured":"Agrawal , R. and Srikant , R . 1994. Fast algorithm for mining association rules . In Proceedings of the 20th Very Large Data Bases (VLDB 1994) (Santiago de Chile, Chile). 487--499. Agrawal, R. and Srikant, R. 1994. Fast algorithm for mining association rules. In Proceedings of the 20th Very Large Data Bases (VLDB 1994) (Santiago de Chile, Chile). 487--499."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276313"},{"key":"e_1_2_1_6_1","first-page":"1","article-title":"Frequent subtree mining an overview","volume":"65","author":"Chi Y.","year":"2005","unstructured":"Chi , Y. , Nijssen , S. , Muntz , R. R. , and Kok . J. N. 2005 . Frequent subtree mining an overview . Fundamenta Informaticae, Special Issue on Graph and Tree Mining 65 , 1 -- 2 , 161--198. Chi, Y., Nijssen, S., Muntz, R. R., and Kok. J. N. 2005. Frequent subtree mining an overview. Fundamenta Informaticae, Special Issue on Graph and Tree Mining 65, 1--2, 161--198.","journal-title":"Fundamenta Informaticae, Special Issue on Graph and Tree Mining"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/998688.1007091"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-31841-5_5"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJBIDM.2005.007316"},{"volume-title":"Proceedings of the 14th Database and Expert Systems Applications (DEXA 2003)","author":"Feng L.","key":"e_1_2_1_10_1","unstructured":"Feng , L. , Dillon , T. S. , Weigand , H. , and Chang , E . 2003. An XML-Enabled association rule framework . In Proceedings of the 14th Database and Expert Systems Applications (DEXA 2003) (Prague, Czech Republic). 88--97. Feng, L., Dillon, T. S., Weigand, H., and Chang, E. 2003. An XML-Enabled association rule framework. In Proceedings of the 14th Database and Expert Systems Applications (DEXA 2003) (Prague, Czech Republic). 88--97."},{"volume-title":"Proceedings of the 31st International Conference on Very Large Database (VLDB)","author":"Ghoting A.","key":"e_1_2_1_11_1","unstructured":"Ghoting , A. , Buehrer , G. , Parthasarathy , S. , Kim , D. , Nguyen , A. , Chen , Y.-K. , and Dubey , P . 2005. Cache-conscious frequent pattern mining on a modern processor . In Proceedings of the 31st International Conference on Very Large Database (VLDB) ( Trondheim, Norway). 577--588. Ghoting, A., Buehrer, G., Parthasarathy, S., Kim, D., Nguyen, A., Chen, Y.-K., and Dubey, P. 2005. Cache-conscious frequent pattern mining on a modern processor. In Proceedings of the 31st International Conference on Very Large Database (VLDB) (Trondheim, Norway). 577--588."},{"key":"e_1_2_1_12_1","unstructured":"Jenkins B. 1997. Hash functions. Dr. Dobb's J. Sept.  Jenkins B. 1997. Hash functions. Dr. Dobb's J. Sept."},{"key":"e_1_2_1_13_1","unstructured":"Kudo T. 2003. An implementation of FREQT. http:\/\/www.chasen.org\/~taku\/software\/freqt\/. (Last accessed 1 Jan 2006).  Kudo T. 2003. An implementation of FREQT. http:\/\/www.chasen.org\/~taku\/software\/freqt\/. (Last accessed 1 Jan 2006)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2004.33"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10056"},{"volume-title":"Proceedings of the 1st International Workshop Mining Graphs, Trees, and Sequences (MGTS-2003)","author":"Nijssen S.","key":"e_1_2_1_16_1","unstructured":"Nijssen , S. and Kok , J. N . 2003. Efficient discovery of frequent unordered trees . In Proceedings of the 1st International Workshop Mining Graphs, Trees, and Sequences (MGTS-2003) (Dubrovnik, Croatia), 55--64. Nijssen, S. and Kok, J. N. 2003. Efficient discovery of frequent unordered trees. In Proceedings of the 1st International Workshop Mining Graphs, Trees, and Sequences (MGTS-2003) (Dubrovnik, Croatia), 55--64."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/335168.335173"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/967900.968018"},{"key":"e_1_2_1_19_1","volume-title":"Biology: Practices and Challenges","author":"Sidhu A. S.","year":"2006","unstructured":"Sidhu , A. S. , Dillon T. S. , and Chang , E . 2006 . Protein ontology. In Database Modeling in Biology: Practices and Challenges , Z. Ma and J. Y. Chen, Eds. Springer-Verlag, New York , 39--60. Sidhu, A. S., Dillon T. S., and Chang, E. 2006. Protein ontology. In Database Modeling in Biology: Practices and Challenges, Z. Ma and J. Y. Chen, Eds. Springer-Verlag, New York, 39--60."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICITA.2005.223"},{"volume-title":"Semistructured data and XML","author":"Suciu D.","key":"e_1_2_1_21_1","unstructured":"Suciu , D. 2000. Semistructured data and XML . In Information Organization and Databases: Foundations of Data Organization, K. Tanaka, S. Ghandeharizadeh, and Y. Kambayashi, Eds. Kluwer International Series in Engineering and Computer Science Series, vol. 579 . Kluwer Academic Publishers , Norwell, MA, 9--30. Suciu, D. 2000. Semistructured data and XML. In Information Organization and Databases: Foundations of Data Organization, K. Tanaka, S. Ghandeharizadeh, and Y. Kambayashi, Eds. Kluwer International Series in Engineering and Computer Science Series, vol. 579. Kluwer Academic Publishers, Norwell, MA, 9--30."},{"volume-title":"Proceedings of the 6th International Data Mining 2005","author":"Tan H.","key":"e_1_2_1_23_1","unstructured":"Tan , H. , Dillon , T. S. , Feng , L. , Chang , E. , and Hadzic , F . 2005a. X3-Miner: mining patterns from XML database . In Proceedings of the 6th International Data Mining 2005 ( Skiathos, Greece). 287--297. Tan, H., Dillon, T. S., Feng, L., Chang, E., and Hadzic, F. 2005a. X3-Miner: mining patterns from XML database. In Proceedings of the 6th International Data Mining 2005 (Skiathos, Greece). 287--297."},{"volume-title":"Proceedings of the 1st International Workshop on Mining Complex Data 2005 in conjunction with ICDM 2005","author":"Tan H.","key":"e_1_2_1_24_1","unstructured":"Tan , H. , Dillon , T. S. , Hadzic , F. , Feng , L. , and Chang , E . 2005b. MB3-Miner: mining eMBedded subTREEs using tree model guided candidate generation . In Proceedings of the 1st International Workshop on Mining Complex Data 2005 in conjunction with ICDM 2005 ( Houston, TX). 103--110. Tan, H., Dillon, T. S., Hadzic, F., Feng, L., and Chang, E. 2005b. MB3-Miner: mining eMBedded subTREEs using tree model guided candidate generation. In Proceedings of the 1st International Workshop on Mining Complex Data 2005 in conjunction with ICDM 2005 (Houston, TX). 103--110."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/11731139_52"},{"volume-title":"Proceedings of Data Mining and Information Engineering '06 (Prague, Czech Republic, 11--13 July). 315--328","author":"Tan H.","key":"e_1_2_1_26_1","unstructured":"Tan , H. , Dillon , T. S. , Hadzic , F. , Feng , L. , and Chang , E . 2006b. SEQUEST: mining frequent subsequences using DMA Strips . In Proceedings of Data Mining and Information Engineering '06 (Prague, Czech Republic, 11--13 July). 315--328 . Tan, H., Dillon, T. S., Hadzic, F., Feng, L., and Chang, E. 2006b. SEQUEST: mining frequent subsequences using DMA Strips. In Proceedings of Data Mining and Information Engineering '06 (Prague, Czech Republic, 11--13 July). 315--328."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183614.1183680"},{"key":"e_1_2_1_28_1","volume-title":"Treefinder: A first step towards XML data mining. In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM 2002) (Maebashi City, Japan)","author":"Termier A.","year":"2002","unstructured":"Termier , A. , Rousset , M.-C. , and Sebag , M . 2002 . Treefinder: A first step towards XML data mining. In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM 2002) (Maebashi City, Japan) . IEEE Computer Society Press , Los Alamitos, CA , 450--458. Termier, A., Rousset, M.-C., and Sebag, M. 2002. Treefinder: A first step towards XML data mining. In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM 2002) (Maebashi City, Japan). IEEE Computer Society Press, Los Alamitos, CA, 450--458."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/956699.956720"},{"volume-title":"Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004)","author":"Wang C.","key":"e_1_2_1_30_1","unstructured":"Wang , C. , Hong , M. , Pei , J. , Zhou , H. , Wang , W. , and Shi , B . 2004. Efficient pattern-growth methods for frequent tree pattern mining . In Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004) (Sydney, Australia). 441--451. Wang, C., Hong, M., Pei, J., Zhou, H., Wang, W., and Shi, B. 2004. Efficient pattern-growth methods for frequent tree pattern mining. In Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004) (Sydney, Australia). 441--451."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290982"},{"volume-title":"Proceedings of the 3rd Annual IEEE International Conference on Data Mining (ICDM 2003)","author":"Xiao Y.","key":"e_1_2_1_32_1","unstructured":"Xiao , Y. , Yao , J.-F. , Li , Z. , and Dunham , M. H . 2003. Efficient data mining for maximal frequent subtrees . In Proceedings of the 3rd Annual IEEE International Conference on Data Mining (ICDM 2003) (Melbourne, FL). IEEE Computer Society Press, Los Alamitos, CA, 379--386. Xiao, Y., Yao, J.-F., Li, Z., and Dunham, M. H. 2003. Efficient data mining for maximal frequent subtrees. In Proceedings of the 3rd Annual IEEE International Conference on Data Mining (ICDM 2003) (Melbourne, FL). IEEE Computer Society Press, Los Alamitos, CA, 379--386."},{"volume-title":"Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM 2002)","author":"Yan X.","key":"e_1_2_1_33_1","unstructured":"Yan , X. and Han , J . 2002. gSpan: Graph-based substructure pattern mining . In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM 2002) (Maebashi City, Japan). IEEE Computer Society Press, Los Alamitos, CA. 721--724. Yan, X. and Han, J. 2002. gSpan: Graph-based substructure pattern mining. In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM 2002) (Maebashi City, Japan). IEEE Computer Society Press, Los Alamitos, CA. 721--724."},{"volume-title":"Proceedings of the 29th International Very Large Data Bases (VLDB)Conference","author":"Yang L. H.","key":"e_1_2_1_34_1","unstructured":"Yang , L. H. , Lee , M. L. , and Hsu , W . 2003. Efficient mining of XML query patterns for caching . In Proceedings of the 29th International Very Large Data Bases (VLDB)Conference ( Berlin, Germany). 69--80. Yang, L. H., Lee, M. L., and Hsu, W. 2003. Efficient mining of XML query patterns for caching. In Proceedings of the 29th International Very Large Data Bases (VLDB)Conference (Berlin, Germany). 69--80."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956788"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2005.125"},{"volume-title":"Proceedings of the 15th International Conference Database and Expert Systems Applications (DEXA 2004)","author":"Zhang J.","key":"e_1_2_1_37_1","unstructured":"Zhang , J. , Ling , T. W. , Bruckner , R. M. , Tjoa , A. M. , and Liu , H . 2004. On efficient and effective association rule mining from XML data . In Proceedings of the 15th International Conference Database and Expert Systems Applications (DEXA 2004) (Zaragoza, Spain). 497--507. Zhang, J., Ling, T. W., Bruckner, R. M., Tjoa, A. M., and Liu, H. 2004. On efficient and effective association rule mining from XML data. In Proceedings of the 15th International Conference Database and Expert Systems Applications (DEXA 2004) (Zaragoza, Spain). 497--507."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1062745.1062785"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1376815.1376818","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1376815.1376818","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:57:55Z","timestamp":1750255075000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1376815.1376818"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,7]]},"references-count":37,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2008,7]]}},"alternative-id":["10.1145\/1376815.1376818"],"URL":"https:\/\/doi.org\/10.1145\/1376815.1376818","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2008,7]]},"assertion":[{"value":"2007-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-07-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}