{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T10:22:01Z","timestamp":1768645321558,"version":"3.49.0"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,12,21]],"date-time":"2022-12-21T00:00:00Z","timestamp":1671580800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,21]],"date-time":"2022-12-21T00:00:00Z","timestamp":1671580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Fuzzy clustering is an invaluable data mining technique that allows each data point to belong to more than one cluster with some degree of membership. It is widely employed in exploratory data mining to discover overlapping communities in social networks, find structure in spectral data, and capture user interests in recommendation systems. Nowadays, the variety and volume of data are increasing at a tremendous rate. Data is power; the massive data, along with an effective technique, can unravel valuable information. The existing fuzzy clustering algorithms do not perform well on massive heterogeneous datasets. Processing an enormous amount of data is beyond the capacity of a single processor. The need of the hour is to develop fuzzy clustering techniques that can work on a distributed framework for Big Data processing and can handle heterogeneous data. In this research, we evaluate the performance of the recently proposed algorithm for the Fuzzy clustering of mixed-mode data FCMD-MD (D\u2019Urso and Massari in Inf Sci 505:513\u2013534, 2019) with different real-world datasets. We develop a distributed FCMD-MD, a fuzzy clustering algorithm for mixed-mode data in Apache SPARK. The experimental results show that the algorithm is scalable, performs well in a distributed environment, and clusters enormous heterogeneous data with high accuracy. We also compared the performance of distributed FCMD-MD and the distributed k-medoid algorithm.<\/jats:p>","DOI":"10.1186\/s40537-022-00671-7","type":"journal-article","created":{"date-parts":[[2022,12,21]],"date-time":"2022-12-21T12:02:57Z","timestamp":1671624177000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Distributed fuzzy clustering algorithm for mixed-mode data in Apache SPARK"],"prefix":"10.1186","volume":"9","author":[{"given":"Abdul Wahab","family":"Akram","sequence":"first","affiliation":[]},{"given":"Zareen","family":"Alamgir","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,12,21]]},"reference":[{"issue":"4","key":"671_CR1","doi-asserted-by":"publisher","first-page":"698","DOI":"10.1109\/21.286391","volume":"24","author":"A Ahmad","year":"1994","unstructured":"Ahmad A, Hasmi S. Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Trans Syst Man Cybern. 1994;24(4):698\u2013708.","journal-title":"IEEE Trans Syst Man Cybern"},{"key":"671_CR2","doi-asserted-by":"publisher","first-page":"857","DOI":"10.2307\/2528823","volume":"27","author":"JC Gower","year":"1971","unstructured":"Gower JC. A general coefficient of similarity and some of its properties. Biometrics. 1971;27:857\u201371.","journal-title":"Biometrics"},{"key":"671_CR3","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1016\/j.ins.2019.07.100","volume":"505","author":"P D\u2019Urso","year":"2019","unstructured":"D\u2019Urso P, Massari R. Fuzzy clustering of mixed data. Inf Sci. 2019;505:513\u201334.","journal-title":"Inf Sci"},{"key":"671_CR4","unstructured":"Huang Z. Clustering large data sets with mixed numeric and categorical values. In: Proceedings of the 1st Pacific-Asia conference on knowledge discovery and data mining, (PAKDD); 1997. p. 21\u201334."},{"issue":"7","key":"671_CR5","doi-asserted-by":"publisher","first-page":"707","DOI":"10.1109\/TNB.2015.2477407","volume":"14","author":"F Sa\u00e2daoui","year":"2015","unstructured":"Sa\u00e2daoui F, Bertrand PR, Boudet G, Rouffiac K, Chamoux A. A dimensionally reduced clustering methodology for heterogeneous occupational medicine data mining. IEEE Trans NanoBiosci. 2015;14(7):707\u201315.","journal-title":"IEEE Trans NanoBiosci"},{"key":"671_CR6","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1016\/j.asoc.2016.06.019","volume":"48","author":"A Ahmad","year":"2016","unstructured":"Ahmad A, Hasmi S. K-harmonic means type clustering algorithm for mixed datasets. Appl Soft Comput. 2016;48:39\u201349.","journal-title":"Appl Soft Comput"},{"key":"671_CR7","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1007\/s10994-016-5575-7","volume":"105","author":"A Foss","year":"2016","unstructured":"Foss A, Markatou M, Ray A.H. Bonnie. A semiparametric method for clustering mixed data. Mach Learn. 2016;105:419\u201358.","journal-title":"Mach Learn"},{"key":"671_CR8","doi-asserted-by":"publisher","first-page":"988","DOI":"10.1016\/j.procs.2017.05.083","volume":"108","author":"A Skabar","year":"2017","unstructured":"Skabar A. Clustering mixed-attribute data using random walk. Procedia Comput Sci. 2017;108:988\u201397.","journal-title":"Procedia Comput Sci"},{"key":"671_CR9","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1016\/0098-3004(84)90020-7","volume":"10","author":"J Bezdek","year":"1984","unstructured":"Bezdek J, Ehrlich R, Full W. FCM: the fuzzy c-means clustering algorithm. Comput Geosci. 1984;10:191\u2013203.","journal-title":"Comput Geosci"},{"issue":"4","key":"671_CR10","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1109\/91.940971","volume":"9","author":"J Bezdek","year":"2001","unstructured":"Bezdek J, Ehrlich R, Full W. Low-complexity fuzzy relational clustering algorithms for web mining. IEEE Trans Fuzzy Syst. 2001;9(4):595\u2013607.","journal-title":"IEEE Trans Fuzzy Syst"},{"issue":"10","key":"671_CR11","first-page":"3319","volume":"6","author":"X Su","year":"2010","unstructured":"Su X, Wang X, Wang Z, Xiao Y. An new fuzzy clustering algorithm based on entropy weighting. J Comput Inf Syst. 2010;6(10):3319\u201326.","journal-title":"J Comput Inf Syst"},{"issue":"4","key":"671_CR12","doi-asserted-by":"publisher","first-page":"517","DOI":"10.1109\/TFUZZ.2004.840099","volume":"13","author":"NR Pal","year":"2005","unstructured":"Pal NR, Pal K, Keller JM, Bezdek JC. A possibilistic fuzzy c-means clustering algorithm. IEEE Trans Fuzzy Syst. 2005;13(4):517\u201330.","journal-title":"IEEE Trans Fuzzy Syst"},{"key":"671_CR13","unstructured":"Ulutagay G, Nasibov E. Fn-dbscan: a novel density-based clustering method with fuzzy neighborhood relations. In: 8th international conference on application of fuzzy systems and soft computing (ICAFS-2008); 2008. p. 101\u201310."},{"key":"671_CR14","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1016\/j.spasta.2019.03.002","volume":"30","author":"P D\u2019Urso","year":"2019","unstructured":"D\u2019Urso P, De Giovanni L, Disegna M, Massari R. Fuzzy clustering with spatial\u2013temporal information. Spat Stat. 2019;30:71\u2013102. https:\/\/doi.org\/10.1016\/j.spasta.2019.03.002.","journal-title":"Spat Stat"},{"key":"671_CR15","doi-asserted-by":"crossref","unstructured":"Mau TN, Huynh V-N. Kernel-based k-representatives algorithm for fuzzy clustering of categorical data. In: 2021 IEEE international conference on fuzzy systems (FUZZ-IEEE); 2021.","DOI":"10.1109\/FUZZ45933.2021.9494597"},{"key":"671_CR16","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1016\/j.fss.2021.01.002","volume":"421","author":"L Wang","year":"2021","unstructured":"Wang L, Xu P, Ma Q. Incremental fuzzy clustering of time series. Fuzzy Sets Syst. 2021;421:62\u201376.","journal-title":"Fuzzy Sets Syst"},{"key":"671_CR17","doi-asserted-by":"crossref","unstructured":"Doring C, Borgelt C, Kruse R. Fuzzy clustering of quantitative and qualitative data. In: IEEE annual meeting of the fuzzy information, Vol. 1. IEEE; 2004. p. 84\u20139.","DOI":"10.1109\/NAFIPS.2004.1336254"},{"key":"671_CR18","doi-asserted-by":"publisher","first-page":"107454","DOI":"10.1016\/j.compbiolchem.2021.107454","volume":"92","author":"P Jha","year":"2021","unstructured":"Jha P, Tiwari A, Bharill N, Ratnaparkhe M, Mounika M, Nagendra N. Apache spark based kernelized fuzzy clustering framework for single nucleotide polymorphism sequence analysis. Comput Biol Chem. 2021;92:107454.","journal-title":"Comput Biol Chem"},{"key":"671_CR19","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1145\/2934664","volume":"59","author":"M Zaharia","year":"2016","unstructured":"Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ, Ghodsi A, Gonzalez J, Shenker S, Stoica I. Apache spark: a unified engine for big data processing. Commun ACM. 2016;59:56\u201365. https:\/\/doi.org\/10.1145\/2934664.","journal-title":"Commun ACM"},{"key":"671_CR20","unstructured":"Dua D, Graff C. UCI machine learning repository; 2017. http:\/\/archive.ics.uci.edu\/ml."},{"key":"671_CR21","unstructured":"Kaggle. https:\/\/www.kaggle.com."},{"key":"671_CR22","unstructured":"Australian credit dataset. http:\/\/archive.ics.uci.edu\/ml\/datasets\/statlog+(australian+credit+approval)."},{"key":"671_CR23","unstructured":"Evans B. Cylinder bands dataset; 1995. https:\/\/archive.ics.uci.edu\/ml\/datasets\/Cylinder+Bands."},{"key":"671_CR24","unstructured":"Saka CO, Kastro Y. Online shoppers purchasing intention dataset; 2018. http:\/\/archive.ics.uci.edu\/ml\/datasets\/Online+Shoppers+Purchasing+Intention+Dataset."},{"key":"671_CR25","unstructured":"Dhakar R. Airbnb dataset; 2018. https:\/\/www.kaggle.com\/ronikdhakar\/airbnb-dataset#Airbnb-Dataset."},{"key":"671_CR26","doi-asserted-by":"publisher","first-page":"546","DOI":"10.1109\/TFUZZ.2011.2179303","volume":"20","author":"E Hullermeier","year":"2012","unstructured":"Hullermeier E, Rifqi M, Henzgen S, Senge R. Comparing fuzzy partitions: a generalization of the rand index and related measures. IEEE Trans Fuzzy Syst. 2012;20:546\u201356. https:\/\/doi.org\/10.1109\/TFUZZ.2011.2179303.","journal-title":"IEEE Trans Fuzzy Syst"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-022-00671-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-022-00671-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-022-00671-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,26]],"date-time":"2023-03-26T21:15:30Z","timestamp":1679865330000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-022-00671-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,21]]},"references-count":26,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["671"],"URL":"https:\/\/doi.org\/10.1186\/s40537-022-00671-7","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,21]]},"assertion":[{"value":"7 April 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 March 2023","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The typo in affiliation has been corrected.","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"121"}}