{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T05:21:31Z","timestamp":1672291291363},"reference-count":23,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>\n            Mining for association rules and frequent patterns is a central activity in data mining. However, most existing algorithms are only moderately suitable for real-world scenarios. Most strategies use parameters like minimum support, for which it can be very difficult to define a suitable value for unknown datasets. Since most untrained users are unable or unwilling to set such technical parameters, we address the problem of replacing the minimum-support parameter with top-\n            <jats:italic>n<\/jats:italic>\n            strategies. In our paper, we start by extending a top-\n            <jats:italic>n<\/jats:italic>\n            implementation of the ECLAT algorithm to improve its performance by using heuristic search strategy optimizations. Also, real-world datasets are often distributed and modern database architectures are switching from expensive SMPs to cheaper shared-nothing blade servers. Thus, most mining queries require distribution handling. Since partitioning can be forced by user-defined semantics, it is often forbidden to transform the data. Therefore, we developed an adaptive top-\n            <jats:italic>n<\/jats:italic>\n            frequent-pattern mining algorithm that simplifies the mining process on real distributions by relaxing some requirements on the results. We first combine the PARTITION and the TPUT algorithms to handle distributed top-\n            <jats:italic>n<\/jats:italic>\n            frequent-pattern mining. 
Then, we extend this new algorithm for distributions with real-world data characteristics. For frequent-pattern mining algorithms, equal distributions are important conditions, and tiny partitions can cause performance bottlenecks. Hence, we implemented an approach called MAST that defines a minimum absolute-support threshold. MAST prunes patterns with low chances of reaching the global top-\n            <jats:italic>n<\/jats:italic>\n            result set and high computing costs. In total, our approach simplifies the process of frequent-pattern mining for real customer scenarios and data sets. This may make frequent-pattern mining accessible for very new user groups. Finally, we present results of our algorithms when run on the SAP NetWeaver BW Accelerator with standard and real business datasets.\n          <\/jats:p>","DOI":"10.14778\/1687553.1687571","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1438-1449","source":"Crossref","is-referenced-by-count":1,"title":["Robust and distributed top-n frequent-pattern mining with SAP BW accelerator"],"prefix":"10.14778","volume":"2","author":[{"given":"Thomas","family":"Legler","sequence":"first","affiliation":[{"name":"SAP AG, Walldorf, Germany"}]},{"given":"Wolfgang","family":"Lehner","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Dresden, Dresden, Germany"}]},{"given":"Jan","family":"Schaffner","sequence":"additional","affiliation":[{"name":"Hasso-Plattner-Institut, Potsdam, Germany"}]},{"given":"Jens","family":"Kr\u00fcger","sequence":"additional","affiliation":[{"name":"Hasso-Plattner-Institut, Potsdam, Germany"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"487","volume-title":"Proc. of the 20th Intl. Conf. On Very Large Data Bases","author":"Agrawal R.","year":"1994","unstructured":"R. Agrawal and R. Srikant . 
Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors , Proc. of the 20th Intl. Conf. On Very Large Data Bases , pages 487 -- 499 . Morgan Kaufmann , 1994 . R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proc. of the 20th Intl. Conf. On Very Large Data Bases, pages 487--499. Morgan Kaufmann, 1994."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/312129.312241"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1011767.1011798"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/645563.660338"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/646416.692863"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-007-0078-6"},{"key":"e_1_2_1_7_1","volume-title":"Mining the top-k frequent itemset with minimum length m","author":"Cong S.","year":"2001","unstructured":"S. Cong . Mining the top-k frequent itemset with minimum length m , 2001 . S. Cong. Mining the top-k frequent itemset with minimum length m, 2001."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335372"},{"key":"e_1_2_1_9_1","volume-title":"Proc. of the ICDM'02","author":"Han J.","year":"2002","unstructured":"J. Han , J. Wang , Y. Lu , and P. Tzvetkov . Mining top-k frequent closed patterns without minimum support . In Proc. of the ICDM'02 , December 2002 . J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining top-k frequent closed patterns without minimum support. In Proc. of the ICDM'02, December 2002."},{"key":"e_1_2_1_11_1","unstructured":"The IlliMine Project http:\/\/illimine.cs.uiuc.edu.  The IlliMine Project http:\/\/illimine.cs.uiuc.edu."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.2307\/1910129"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/1182635.1164218"},{"key":"e_1_2_1_14_1","first-page":"637","volume-title":"Proc. 
of the 31st Intl. Conf. On Very Large Data Bases","author":"Michel S.","year":"2005","unstructured":"S. Michel , P. Triantafillou , and G. Weikum . KLEE: a framework for distributed top-k query algorithms . In Proc. of the 31st Intl. Conf. On Very Large Data Bases , pages 637 -- 648 . VLDB Endowment , 2005 . S. Michel, P. Triantafillou, and G. Weikum. KLEE: a framework for distributed top-k query algorithms. In Proc. of the 31st Intl. Conf. On Very Large Data Bases, pages 637--648. VLDB Endowment, 2005."},{"key":"e_1_2_1_15_1","first-page":"21","volume-title":"ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery","author":"Pei J.","year":"2000","unstructured":"J. Pei , J. Han , and R. Mao . CLOSET: An efficient algorithm for mining frequent closed itemsets . In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery , pages 21 -- 30 , 2000 . J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 21--30, 2000."},{"key":"e_1_2_1_16_1","volume-title":"Galileo Press","author":"Ross A.","year":"2009","unstructured":"A. Ross . SAP NetWeaver BI Accelerator . Galileo Press , 2009 . A. Ross. SAP NetWeaver BI Accelerator. Galileo Press, 2009."},{"key":"e_1_2_1_17_1","first-page":"432","volume-title":"The VLDB Journal","author":"Savasere A.","year":"1995","unstructured":"A. Savasere , E. Omiecinski , and S. B. Navathe . An efficient algorithm for mining association rules in large databases . In The VLDB Journal , pages 432 -- 444 , 1995 . A. Savasere, E. Omiecinski, and S. B. Navathe. An efficient algorithm for mining association rules in large databases. 
In The VLDB Journal, pages 432--444, 1995."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(02)00096-0"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1316689.1316746"},{"key":"e_1_2_1_20_1","first-page":"56","volume-title":"Proc. of the PKDD 2001 Workshop on Ubiquitous Data Mining for Mobile and Distributed Environments","author":"Wirth R.","year":"2001","unstructured":"R. Wirth , M. Borth , and J. Hipp . When distribution is part of the semantics: A new problem class for distributed knowledge discovery . In Proc. of the PKDD 2001 Workshop on Ubiquitous Data Mining for Mobile and Distributed Environments , pages 56 -- 64 , Freiburg, Germany , 2001 . R. Wirth, M. Borth, and J. Hipp. When distribution is part of the semantics: A new problem class for distributed knowledge discovery. In Proc. of the PKDD 2001 Workshop on Ubiquitous Data Mining for Mobile and Distributed Environments, pages 56--64, Freiburg, Germany, 2001."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/4434.806975"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956788"},{"key":"e_1_2_1_23_1","volume-title":"Charm: an efficient algorithm for closed association rule mining. Technical report","author":"Zaki M.","year":"1999","unstructured":"M. Zaki and C. Hsiao . Charm: an efficient algorithm for closed association rule mining. Technical report , 1999 . M. Zaki and C. Hsiao. Charm: an efficient algorithm for closed association rule mining. 
Technical report, 1999."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1288552.1288555"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687553.1687571","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:55:28Z","timestamp":1672224928000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687553.1687571"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687553.1687571"],"URL":"https:\/\/doi.org\/10.14778\/1687553.1687571","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}