{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T14:14:14Z","timestamp":1753884854973,"version":"3.41.2"},"reference-count":19,"publisher":"World Scientific Pub Co Pte Ltd","issue":"16","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Patt. Recogn. Artif. Intell."],"published-print":{"date-parts":[[2022,12,30]]},"abstract":"<jats:p> Data clustering is a thoroughly studied data mining issue. As the amount of information being analyzed grows exponentially, there are several problems with clustering diagnostic large datasets like the monitoring, microbiology, and end results (SEER) carcinoma feature sets. These traditional clustering methods are severely constrained in terms of speed, productivity, and adaptability. This paper summarizes the most modern distributed clustering algorithms, organized according to the computing platforms used to process vast volumes of data. The purpose of this work was to offer an optimized distributed clustering strategy for reducing the algorithm\u2019s total execution time. We obtained, preprocessed, and analyzed clinical SEER data on liver cancer, respiratory cancer, human immunodeficiency virus (HIV)-related lymphoma, and lung cancer for large-scale data clustering analysis. Three major contributions and their effects were covered in this paper: To begin, three current Pyspark distributed clustering algorithms were evaluated on SEER clinical data using a simulated New York cancer dataset. Second, systemic inflammatory response syndrome (SIRS) model inference was done and described using three SEER cancer datasets. Third, employing lung cancer data, we suggested an optimized distributed bisecting [Formula: see text]-means method. We have shown the outcomes of our suggested optimized distributed clustering technique, demonstrating the performance enhancement. <\/jats:p>","DOI":"10.1142\/s0218001422400067","type":"journal-article","created":{"date-parts":[[2022,9,21]],"date-time":"2022-09-21T09:13:38Z","timestamp":1663751618000},"source":"Crossref","is-referenced-by-count":0,"title":["Distributed Clustering Approach by Apache Pyspark Based on SEER for Clinical Data"],"prefix":"10.1142","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2225-6536","authenticated-orcid":false,"given":"R.","family":"Ramesh","sequence":"first","affiliation":[{"name":"Department of Computer Applications, Cochin University of Science and Technology (CUSAT) Cochin, Kerala 682022, India"}]},{"given":"M. V.","family":"Judy","sequence":"additional","affiliation":[{"name":"Department of Computer Applications, Cochin University of Science and Technology (CUSAT) Cochin, Kerala 682022, India"}]}],"member":"219","published-online":{"date-parts":[[2022,12,5]]},"reference":[{"key":"S0218001422400067BIB001","doi-asserted-by":"crossref","first-page":"304","DOI":"10.4081\/gh.2016.304","volume":"11","author":"Boscoe F.","year":"2016","journal-title":"Geospatial Health"},{"key":"S0218001422400067BIB002","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/CHASE.2016.35","volume-title":"2016 IEEE First Int. Conf. Connected Health: Applications, Systems and Engineering Technologies (CHASE)","author":"Chen D.","year":"2016"},{"key":"S0218001422400067BIB003","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"S0218001422400067BIB004","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1109\/TBME.1972.324107","volume":"19","author":"Dionne P.","year":"1972","journal-title":"IEEE Trans. Biomed. Eng."},{"issue":"1","key":"S0218001422400067BIB006","first-page":"169","volume":"68","author":"Ga\u0144czak M.","year":"2014","journal-title":"Przegl Epidemiol."},{"key":"S0218001422400067BIB008","first-page":"542","volume-title":"2018 IEEE\/ACM 40th Int. Conf. Software Engineering: Companion (ICSE-Companion)","author":"Gousios G.","year":"2018"},{"key":"S0218001422400067BIB009","doi-asserted-by":"crossref","first-page":"1808","DOI":"10.3201\/eid2210.160097","volume":"22","author":"Greene S.","year":"2016","journal-title":"Emerg. Infect. Diseases"},{"key":"S0218001422400067BIB010","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1475-2875-8-185","volume":"8","author":"Haque U.","year":"2009","journal-title":"Malaria J."},{"issue":"2","key":"S0218001422400067BIB011","doi-asserted-by":"crossref","first-page":"125817","DOI":"10.1016\/j.jmaa.2021.125817","volume":"507","author":"Skakauskas V.","year":"2022","journal-title":"J. Math. Anal. Appl."},{"key":"S0218001422400067BIB012","doi-asserted-by":"crossref","first-page":"858","DOI":"10.3201\/eid1005.030646","volume":"10","author":"Heffernan R.","year":"2004","journal-title":"Emerg. Infect. Diseases"},{"key":"S0218001422400067BIB013","doi-asserted-by":"crossref","first-page":"1377","DOI":"10.2105\/AJPH.88.9.1377","volume":"88","author":"Kulldorff M.","year":"1998","journal-title":"Am. J. Public Health"},{"key":"S0218001422400067BIB014","doi-asserted-by":"crossref","first-page":"544","DOI":"10.1111\/j.0006-341X.1999.00544.x","volume":"55","author":"Kulldorff M.","year":"1999","journal-title":"Biometrics"},{"key":"S0218001422400067BIB015","doi-asserted-by":"publisher","DOI":"10.23876\/j.krcp.2017.36.1.3"},{"key":"S0218001422400067BIB016","doi-asserted-by":"crossref","first-page":"46328","DOI":"10.1038\/srep46328","volume":"7","author":"Li R.","year":"2017","journal-title":"Sci. Rep."},{"key":"S0218001422400067BIB018","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1476-072X-6-30","volume":"6","author":"Nunes C.","year":"2007","journal-title":"Int. J. Health Geogr."},{"key":"S0218001422400067BIB020","first-page":"63","volume-title":"IEEE Int. Conf. ICIAF","author":"Silva D. D.","year":"2007"},{"key":"S0218001422400067BIB022","first-page":"1","volume":"2","author":"Sweeney C.","year":"2011","journal-title":"Univ. Virginia"},{"key":"S0218001422400067BIB023","first-page":"754","volume-title":"2012 IEEE Int. Conf. Bioinformatics and Biomedicine Workshops","author":"Wu D.","year":"2012"},{"key":"S0218001422400067BIB024","first-page":"381","volume-title":"2010 Int. Conf. Digital Image Computing: Techniques and Applications","author":"Zhang J.","year":"2010"}],"container-title":["International Journal of Pattern Recognition and Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218001422400067","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,5]],"date-time":"2023-03-05T11:58:37Z","timestamp":1678017517000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218001422400067"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,5]]},"references-count":19,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2022,12,30]]}},"alternative-id":["10.1142\/S0218001422400067"],"URL":"https:\/\/doi.org\/10.1142\/s0218001422400067","relation":{},"ISSN":["0218-0014","1793-6381"],"issn-type":[{"type":"print","value":"0218-0014"},{"type":"electronic","value":"1793-6381"}],"subject":[],"published":{"date-parts":[[2022,12,5]]},"article-number":"2240006"}}