{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:07:46Z","timestamp":1760242066873,"version":"build-2065373602"},"reference-count":38,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2018,11,28]],"date-time":"2018-11-28T00:00:00Z","timestamp":1543363200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The Map-Reduce (MR) framework has become a popular framework for developing new parallel algorithms for Big Data. Designing efficient algorithms for data mining of big data and distributed databases has become an important problem. In this paper we focus on algorithms producing association rules and frequent itemsets. After reviewing the most recent algorithms that perform this task within the MR framework, we present two new algorithms: one algorithm for producing closed frequent itemsets, and the second one for producing frequent itemsets when the database is updated and new data is added to the old database. Both algorithms include novel optimizations which are suitable to the MR framework, as well as to other parallel architectures. 
A detailed experimental evaluation shows the effectiveness and advantages of the algorithms over existing methods when it comes to large distributed databases.<\/jats:p>","DOI":"10.3390\/a11120194","type":"journal-article","created":{"date-parts":[[2018,11,28]],"date-time":"2018-11-28T11:43:44Z","timestamp":1543405424000},"page":"194","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["New and Efficient Algorithms for Producing Frequent Itemsets with the Map-Reduce Framework"],"prefix":"10.3390","volume":"11","author":[{"given":"Yaron","family":"Gonen","sequence":"first","affiliation":[{"name":"Department of Computer Science, Ben-Gurion University, Beer-Sheva 8410501, Israel"}]},{"given":"Ehud","family":"Gudes","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Ben-Gurion University, Beer-Sheva 8410501, Israel"},{"name":"Department of Computer Science, Open University, Ra\u2019anana 4353701, Israel"}]},{"given":"Kirill","family":"Kandalov","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Open University, Ra\u2019anana 4353701, Israel"}]}],"member":"1968","published-online":{"date-parts":[[2018,11,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Dean, J., and Ghemawat, S. (2008). MapReduce: Simplified Data Processing on Large Clusters, ACM.","DOI":"10.1145\/1327452.1327492"},{"key":"ref_2","unstructured":"(2016, January 01). Apache: Hadoop. Available online: http:\/\/hadoop.apache.org\/."},{"key":"ref_3","unstructured":"Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M., Shenker, S., and Stoica, I. (2012, January 25\u201327). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. 
Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, San Jose, CA, USA."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1007\/s00778-013-0319-9","article-title":"A survey of large-scale analytical query processing in MapReduce","volume":"23","author":"Doulkeridis","year":"2014","journal-title":"VLDB J. Int. J. Very Large Data Bases"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Agrawal, R., Imieli\u0144ski, T., and Swami, A. (1993, January 25\u201328). Mining association rules between sets of items in large databases. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA.","DOI":"10.1145\/170035.170072"},{"key":"ref_6","unstructured":"Agrawal, R., and Srikant, R. (1994). Fast Algorithms for Mining Association Rules, Morgan Kaufmann Publishers Inc."},{"key":"ref_7","first-page":"346","article-title":"Association rules mining for incremental database","volume":"2","author":"Duaimi","year":"2014","journal-title":"Int. J. Adv. Res. Comput. Sci. Technol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1007\/s10618-006-0059-1","article-title":"Frequent pattern mining: Current status and future directions","volume":"15","author":"Han","year":"2007","journal-title":"Data Min. Knowl. Discovery"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10115-007-0092-4","article-title":"A survey on algorithms for mining frequent","volume":"16","author":"Cheng","year":"2008","journal-title":"Knowl. Inf. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Farzanyar, Z., and Cercone, N. (2013, January 25\u201328). Efficient mining of frequent itemsets in social network data based on MapReduce framework. 
Proceedings of the 2013 IEEE\/ACM International Conference on Advances in Social Networks Analysis and Mining, Niagara Falls, ON, Canada.","DOI":"10.1145\/2492517.2500301"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Li, N., Zeng, L., He, Q., and Shi, Z. (2012, January 8\u201310). Parallel implementation of apriori algorithm based on MapReduce. Proceedings of the 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing, Kyoto, Japan.","DOI":"10.1109\/SNPD.2012.31"},{"key":"ref_12","unstructured":"Woo, J. (2012, January 16\u201319). Apriori-map\/reduce algorithm. Proceedings of the 2012 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2012), Las Vegas, NV, USA."},{"key":"ref_13","first-page":"59","article-title":"An efficient implementation of Apriori algorithm based on Hadoop-Mapreduce model","volume":"12","author":"Yahya","year":"2012","journal-title":"Int. J. Rev. Comput."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. (1999, January 10\u201312). Discovering frequent closed itemsets for association rules. Proceedings of the Database Theory ICDT 99, Jerusalem, Israel.","DOI":"10.1007\/3-540-49257-7_25"},{"key":"ref_15","unstructured":"Cheung, D.W., Han, J., and Wong, C.Y. (March, January 26). Maintenance of discovered association rules in large databases: An incremental updating technique. Proceedings of the Twelfth International Conference on Data Engineering, New Orleans, LA, USA."},{"key":"ref_16","unstructured":"Thomas, S., Bodagala, S., Alsabti, K., and Ranka, S. (1997, January 14\u201317). An efficient algorithm for the incremental updation of association rules in large databases. 
Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Das, A., and Bhattacharyya, D.K. (2004). Rule Mining for Dynamic Databases, Springer.","DOI":"10.1007\/978-3-540-30536-1_6"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gonen, Y., and Gudes, E. (2016, January 23\u201324). An improved mapreduce algorithm for mining closed frequent itemsets. Proceedings of the IEEE International Conference on Software Science, Technology and Engineering (SWSTE), Beer-Sheva, Israel.","DOI":"10.1109\/SWSTE.2016.19"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Kandalov, K., and Gudes, E. (2017). Incremental Frequent Itemsets Mining with MapReduce, Springer.","DOI":"10.1007\/978-3-319-66917-5_17"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"962","DOI":"10.1109\/69.553164","article-title":"Parallel mining of association rules","volume":"8","author":"Agrawal","year":"1996","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zaki, M.J., Parthasarathy, S., Ogihara, M., and Li, W. (1997). New Algorithms for Fast Discovery of Association Rules, University of Rochester.","DOI":"10.1007\/978-1-4615-5669-5_1"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lucchese, C., Orlando, S., and Perego, R. (2007, January 28\u201331). Parallel mining of frequent closed patterns: Harnessing modern computer architectures. Proceedings of the Seventh IEEE International Conference on Data Mining, Omaha, NE, USA.","DOI":"10.1109\/ICDM.2007.13"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"658","DOI":"10.1002\/cpe.1545","article-title":"Mining@home: Toward a public-resource computing framework for distributed data mining","volume":"22","author":"Lucchese","year":"2009","journal-title":"Concurrency Comput. Pract. 
Exp."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liang, Y.-H., and Wu, S.-Y. (July, January 27). Sequence-growth: A scalable and effective frequent itemset mining algorithm for big data based on mapreduce framework. Proceedings of the 2015 IEEE International Congress on Big Data, New York, NY, USA.","DOI":"10.1109\/BigDataCongress.2015.65"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wang, S.-Q., Yang, Y.-B., Chen, G.-P., Gao, Y., and Zhang, Y. (2012, January 10). Mapreduce based closed frequent itemset mining with efficient redundancy filtering. Proceedings of the 2012 IEEE 12th International Conference on Data Mining Workshops, Brussels, Belgium.","DOI":"10.1109\/ICDMW.2012.24"},{"key":"ref_26","unstructured":"Liu, G., Lu, H., Yu, J., Wang, W., and Xiao, X. (2003, January 19\u201322). Afopt: An efficient implementation of pattern growth approach. Proceedings of the Third IEEE International Conference on Data Mining, Melbourne, FL, USA."},{"key":"ref_27","unstructured":"Borthakur, D. (2016, January 01). The Hadoop Distributed File System: Architecture and Design. In: Hadoop Project Website. Available online: https:\/\/hadoop.apache.org\/docs\/r1.2.1\/hdfs_design.pdf."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Bhatotia, P.W., Rodrigues, R., Acar, U.A., and Pasquin, R. (2011, January 26\u201328). Incoop: MapReduce for incremental computations. Proceedings of the 2nd ACM Symposium on Cloud Computing, Cascais, Portugal.","DOI":"10.1145\/2038916.2038923"},{"key":"ref_29","unstructured":"Popa, L., Budiu, M., Yu, Y., and Isard, M. (2009, January 15). DryadInc: Reusing work in large-scale computations. Proceedings of the USENIX Workshop on Hot Topics in Cloud Computing, San Diego, CA, USA."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Afrati, F.N., and Ullman, J.D. (2010, January 22\u201326). Optimizing joins in a map-reduce environment. 
Proceedings of the 13th International Conference on Extending Database Technology, Lausanne, Switzerland.","DOI":"10.1145\/1739041.1739056"},{"key":"ref_31","unstructured":"(2015, June 01). Amazon: Elastic Mapreduce (EMR). Available online: https:\/\/aws.amazon.com\/elasticmapreduce\/."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Gunarathne, T., Wu, T.-L., Qiu, J., and Fox, G. (December, January 30). MapReduce in the Clouds for Science. Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science, Indianapolis, IN, USA.","DOI":"10.1109\/CloudCom.2010.107"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Blanas, S., Patel, J.M., Ercegovac, V., Rao, J., Shekita, E.J., and Tian, Y. (2010, January 6\u201310). A comparison of join algorithms for log processing in mapreduce. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, IN, USA.","DOI":"10.1145\/1807167.1807273"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1109\/TKDE.2011.47","article-title":"Optimizing multiway joins in a map-reduce environment","volume":"23","author":"Afrati","year":"2011","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_35","unstructured":"Goethals, B. (2015, June 01). Frequent Itemset Mining Dataset. Available online: http:\/\/fimi.ua.ac.be\/data."},{"key":"ref_36","unstructured":"Lucchese, C., Orlando, S., Perego, R., and Silvestri, F. (2004, January 1). Webdocs: A real-life huge transactional dataset. Proceedings of the ICDM Workshop on Frequent Itemset Mining Implementations, Brighton, UK."},{"key":"ref_37","unstructured":"Agrawal, R., and Srikant, R. (2016, January 01). Quest Synthetic Data Generator IBM Almaden Research Center, San Jose, California. In: Mirror: http:\/\/sourceforge.net\/projects\/ibmquestdatagen\/. 
Available online: http:\/\/www.almaden.ibm.com\/cs\/quest\/syndata.html."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ekanayake, J., Li, H., Zhang, B., Gunarathne, T., Bae, S., Qiu, J., and Fox, G. (2010, January 21\u201325). Twister: A runtime for iterative mapreduce. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, IL, USA.","DOI":"10.1145\/1851476.1851593"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/11\/12\/194\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:33:07Z","timestamp":1760196787000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/11\/12\/194"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,11,28]]},"references-count":38,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2018,12]]}},"alternative-id":["a11120194"],"URL":"https:\/\/doi.org\/10.3390\/a11120194","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2018,11,28]]}}}