{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T12:33:26Z","timestamp":1723034006815},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,1,9]],"date-time":"2020-01-09T00:00:00Z","timestamp":1578528000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,1,9]],"date-time":"2020-01-09T00:00:00Z","timestamp":1578528000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>MapReduce is used within the Hadoop framework, which handles two important tasks: mapping and reducing. Data clustering in mappers and reducers can decrease the execution time, as similar data can be assigned to the same reducer with one key. Our proposed method decreases the overall execution time by clustering and lowering the number of reducers. Our proposed algorithm is composed of five phases. In the first phase, data are stored in the Hadoop structure. In the second phase, we cluster data using the MR-DBSCAN-KD method in order to determine all of the outliers and clusters. Then, the outliers are assigned to the existing clusters using the futuristic greedy method. At the end of the second phase, similar clusters are merged together. In the third phase, clusters are assigned to the reducers. Note that fewer reducers are required for this task by applying approximated load balancing between the reducers. In the fourth phase, the reducers execute their jobs in each cluster. Eventually, in the final phase, reducers return the output. 
Decreasing the number of reducers and revising the clustering helped reducers to perform their jobs almost\u00a0simultaneously. Our research results indicate that the proposed algorithm achieves an execution time about 3.9% lower than that of the fastest algorithm in our experiments.<\/jats:p>","DOI":"10.1186\/s40537-019-0279-z","type":"journal-article","created":{"date-parts":[[2020,1,9]],"date-time":"2020-01-09T08:02:57Z","timestamp":1578556977000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Decreasing the execution time of reducers by revising clustering based on the futuristic greedy approach"],"prefix":"10.1186","volume":"7","author":[{"given":"Ali","family":"Bakhthemmat","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammad","family":"Izadi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,1,9]]},"reference":[{"issue":"1","key":"279_CR1","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1186\/s40537-015-0030-3","volume":"2","author":"C-W Tsai","year":"2015","unstructured":"Tsai C-W, Lai C-F, Chao H-C, Vasilakos AV. Big data analytics: a survey. J Big data. 2015;2(1):21.","journal-title":"J Big data."},{"issue":"3","key":"279_CR2","first-page":"642","volume":"4","author":"K Sanse","year":"2015","unstructured":"Sanse K, Sharma M. Clustering methods for Big data analysis. Int J Adv Res Comput Eng Technol. 2015;4(3):642\u20138.","journal-title":"Int J Adv Res Comput Eng Technol."},{"key":"279_CR3","first-page":"674","volume-title":"Lecture Notes in Computer Science","author":"Weizhong Zhao","year":"2009","unstructured":"Zhao W, Ma H, He Q. Parallel k-means clustering based on mapreduce. In: IEEE international conference on cloud computing. 2009. p. 
674\u20139."},{"key":"279_CR4","doi-asserted-by":"crossref","unstructured":"Srivastava DK, Yadav R, Agrwal G. Map reduce programming model for parallel K-mediod algorithm on hadoop cluster. In: 2017 7th international conference on communication systems and network technologies (CSNT). 2017. p. 74\u20138.","DOI":"10.1109\/CSNT.2017.8418514"},{"key":"279_CR5","doi-asserted-by":"crossref","unstructured":"Dai B-R, Lin I-C. Efficient map\/reduce-based dbscan algorithm with optimized data partition. In: 2012 IEEE Fifth international conference on cloud computing. 2012. p. 59\u201366.","DOI":"10.1109\/CLOUD.2012.42"},{"issue":"1","key":"279_CR6","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1007\/s11704-013-3158-3","volume":"8","author":"Y He","year":"2014","unstructured":"He Y, Tan H, Luo W, Feng S, Fan J. MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data. Front Comput Sci. 2014;8(1):83\u201399.","journal-title":"Front Comput Sci."},{"key":"279_CR7","doi-asserted-by":"crossref","unstructured":"Verma A, Cherkasova L, Campbell RH. Two sides of a coin: Optimizing the schedule of mapreduce jobs to minimize their makespan and improve cluster performance. In: 2012 IEEE 20th international symposium on modeling, analysis and simulation of computer and telecommunication systems. 2012. p. 11\u20138.","DOI":"10.1109\/MASCOTS.2012.12"},{"key":"279_CR8","doi-asserted-by":"crossref","unstructured":"Ramakrishnan SR, Swart G, Urmanov A. Balancing reducer skew in MapReduce workloads using progressive sampling. In: Proceedings of the Third ACM symposium on cloud computing. 2012. p. 16.","DOI":"10.1145\/2391229.2391245"},{"key":"279_CR9","unstructured":"Fan L, Gao B, Zhang F, Liu Z. OS4M: Achieving Global Load Balance of MapReduce workload by scheduling at the operation level. arXiv Prepr arXiv14063901. 2014."},{"key":"279_CR10","doi-asserted-by":"crossref","unstructured":"Xia H. Load balancing greedy algorithm for reduce on Hadoop platform. 
In: 2018 IEEE 3rd international conference on big data analysis (ICBDA). 2018. p. 212\u20136.","DOI":"10.1109\/ICBDA.2018.8367679"},{"key":"279_CR11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2015\/793010","volume":"2015","author":"Dawen Xia","year":"2015","unstructured":"Xia D, Wang B, Li Y, Rong Z, Zhang Z. An efficient MapReduce-based parallel clustering algorithm for distributed traffic subarea division. Discret Dyn Nat Soc. 2015;2015.","journal-title":"Discrete Dynamics in Nature and Society"},{"issue":"3","key":"279_CR12","doi-asserted-by":"publisher","first-page":"818","DOI":"10.1109\/TPDS.2015.2419671","volume":"27","author":"H Ke","year":"2015","unstructured":"Ke H, Li P, Guo S, Guo M. On traffic-aware partition and aggregation in mapreduce for big data applications. IEEE Trans Parallel Distrib Syst. 2015;27(3):818\u201328.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"10","key":"279_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.17485\/ijst\/2016\/v9i10\/88981","volume":"9","author":"YD Reddy","year":"2016","unstructured":"Reddy YD, Sajin AP. An efficient traffic-aware partition and aggregation for big data applications using map-reduce. Indian J Sci Technol. 2016;9(10):1\u20137.","journal-title":"Indian J Sci Technol."},{"key":"279_CR14","doi-asserted-by":"crossref","unstructured":"Venkatesh G, Arunesh K. Map Reduce for big data processing based on traffic aware partition and aggregation. Cluster Comput. 2018. p. 1\u20137.","DOI":"10.1007\/s10586-018-1799-6"},{"issue":"3","key":"279_CR15","doi-asserted-by":"publisher","first-page":"619","DOI":"10.1007\/s10844-017-0472-5","volume":"52","author":"MA HajKacem","year":"2019","unstructured":"HajKacem MA, N\u2019cir C-E, Essoussi N. One-pass MapReduce-based clustering method for mixed large scale data. J Intell Inf Syst. 
2019;52(3):619\u201336.","journal-title":"J Intell Inf Syst."},{"key":"279_CR16","doi-asserted-by":"crossref","unstructured":"Ilango SS, Vimal S, Kaliappan M, Subbulakshmi P. Optimization using artificial bee colony based clustering approach for big data. Cluster Comput. 2018. p. 1\u20139.","DOI":"10.1007\/s10586-017-1571-3"},{"issue":"8","key":"279_CR17","doi-asserted-by":"publisher","first-page":"10017","DOI":"10.1007\/s11042-017-4825-4","volume":"77","author":"T Fan","year":"2018","unstructured":"Fan T. Research and implementation of user clustering based on MapReduce in multimedia big data. Multimed Tools Appl. 2018;77(8):10017\u201331.","journal-title":"Multimed Tools Appl."},{"issue":"1","key":"279_CR18","first-page":"1","volume":"28","author":"EM Jane","year":"2018","unstructured":"Jane EM, Raj E. SBKMMA: sorting based K means and median based clustering algorithm using multi machine technique for big data. Int J Comput. 2018;28(1):1\u20137.","journal-title":"Int J Comput."},{"issue":"1","key":"279_CR19","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1186\/s40537-015-0027-y","volume":"2","author":"A Kaur","year":"2015","unstructured":"Kaur A, Datta A. A novel algorithm for fast and scalable subspace clustering of high-dimensional data. J Big Data. 2015;2(1):17.","journal-title":"J Big Data."},{"key":"279_CR20","first-page":"427","volume-title":"Advances in Intelligent Systems and Computing","author":"K. V. Kanimozhi","year":"2017","unstructured":"Kanimozhi K\u00a0V, Venkatesan M. A novel map-reduce based augmented clustering algorithm for big text datasets. In: Data Engineering and Intelligent Computing. New York: Springer; 2018. p. 427\u201336."},{"key":"279_CR21","first-page":"291","volume-title":"Advances in Computing Systems and Applications","author":"Soumeya Zerabi","year":"2018","unstructured":"Zerabi S, Meshoul S, Khantoul B. Parallel clustering validation based on MapReduce. 
In: International conference on computer science and its applications. 2018. p. 291\u20139."},{"key":"279_CR22","doi-asserted-by":"publisher","first-page":"198","DOI":"10.1016\/j.eswa.2017.08.051","volume":"91","author":"B Hosseini","year":"2018","unstructured":"Hosseini B, Kiani K. FWCMR: a scalable and robust fuzzy weighted clustering based on MapReduce with application to microarray gene expression. Expert Syst Appl. 2018;91:198\u2013210.","journal-title":"Expert Syst Appl"},{"issue":"1","key":"279_CR23","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1504\/IJBDI.2019.097395","volume":"6","author":"KHK Reddy","year":"2019","unstructured":"Reddy KHK, Pandey V, Roy DS. A novel entropy-based dynamic data placement strategy for data intensive applications in Hadoop clusters. Int J Big Data Intell. 2019;6(1):20\u201337.","journal-title":"Int J Big Data Intell."},{"key":"279_CR24","doi-asserted-by":"crossref","unstructured":"Beck G, Duong T, Lebbah M, Azzag H, C\u00e9rin C. A Distributed and approximated nearest neighbors algorithm for an efficient large scale mean shift clustering. arXiv Prepr arXiv190203833. 2019.","DOI":"10.1016\/j.jpdc.2019.07.015"},{"issue":"1","key":"279_CR25","first-page":"3049","volume":"18","author":"AJ Gates","year":"2017","unstructured":"Gates AJ, Ahn Y-Y. The impact of random models on clustering similarity. J Mach Learn Res. 2017;18(1):3049\u201376.","journal-title":"J Mach Learn Res."},{"issue":"1","key":"279_CR26","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1186\/s40537-019-0236-x","volume":"6","author":"S Heidari","year":"2019","unstructured":"Heidari S, Alborzi M, Radfar R, Afsharkazemi MA, Ghatari AR. Big data clustering with varied density based on MapReduce. J Big Data. 2019;6(1):77.","journal-title":"J Big Data."},{"key":"279_CR27","unstructured":"Kenyon C, others. Best-Fit Bin-Packing with Random Order. In: SODA. 1996. p. 359\u201364."},{"key":"279_CR28","unstructured":"Data set. 
https:\/\/archive.ics.uci.edu\/ml\/. Accessed 9 Feb 2018."},{"key":"279_CR29","unstructured":"Data set. ftp:\/\/ftp.ncdc.noaa.gov\/pub\/data\/uscrn\/products\/subhourly01. Accessed 11 Feb 2019."},{"key":"279_CR30","volume-title":"Encyclopedia of machine learning","author":"C Sammut","year":"2011","unstructured":"Sammut C, Webb GI. Encyclopedia of machine learning. New York: Springer; 2011."},{"issue":"336","key":"279_CR31","doi-asserted-by":"publisher","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","volume":"66","author":"WM Rand","year":"1971","unstructured":"Rand WM. Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66(336):846\u201350.","journal-title":"J Am Stat Assoc"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-019-0279-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s40537-019-0279-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-019-0279-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,8]],"date-time":"2021-01-08T00:30:08Z","timestamp":1610065808000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-019-0279-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,9]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["279"],"URL":"https:\/\/doi.org\/10.1186\/s40537-019-0279-z","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,9]]},"assertion":[{"value":"2 
August 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 December 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 January 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors ethics approval and consent to participate.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"The authors consent for publication.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"6"}}