{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T21:37:24Z","timestamp":1740173844001,"version":"3.37.3"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,8,13]],"date-time":"2020-08-13T00:00:00Z","timestamp":1597276800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,8,13]],"date-time":"2020-08-13T00:00:00Z","timestamp":1597276800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001858","name":"VINNOVA","doi-asserted-by":"publisher","award":["DNR 2016-02207"],"award-info":[{"award-number":["DNR 2016-02207"]}],"id":[{"id":"10.13039\/501100001858","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Contraction Clustering (RASTER) is a single-pass algorithm for density-based clustering of 2D data. It can process arbitrary amounts of data in linear time and in constant memory, quickly identifying approximate clusters. It also exhibits good scalability in the presence of multiple CPU cores. RASTER exhibits very competitive performance compared to standard clustering algorithms, but at the cost of decreased precision. Yet, RASTER is limited to batch processing and unable to identify clusters that only exist temporarily. In contrast, S-RASTER is an adaptation of RASTER to the stream processing paradigm that is able to identify clusters in evolving data streams. This algorithm retains the main benefits of its parent algorithm, i.e.\u00a0single-pass linear time cost and constant memory requirements for each discrete time step within a sliding window. The sliding window is efficiently pruned, and clustering is still performed in linear time. Like RASTER, S-RASTER trades off an often negligible amount of precision for speed. Our evaluation shows that competing algorithms are at least 50% slower. Furthermore, S-RASTER shows good qualitative results, based on standard metrics. It is very well suited to real-world scenarios where clustering does not happen continually but only periodically.<\/jats:p>","DOI":"10.1186\/s40537-020-00336-3","type":"journal-article","created":{"date-parts":[[2020,8,13]],"date-time":"2020-08-13T13:03:48Z","timestamp":1597323828000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["S-RASTER: contraction clustering for evolving data streams"],"prefix":"10.1186","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7848-4883","authenticated-orcid":false,"given":"Gregor","family":"Ulm","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8525-2474","authenticated-orcid":false,"given":"Simon","family":"Smith","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8927-845X","authenticated-orcid":false,"given":"Adrian","family":"Nilsson","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1290-9989","authenticated-orcid":false,"given":"Emil","family":"Gustavsson","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6612-8037","authenticated-orcid":false,"given":"Mats","family":"Jirstrand","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,8,13]]},"reference":[{"key":"336_CR1","doi-asserted-by":"crossref","unstructured":"Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data. SIGMOD \u201998. New York: ACM; 1998. p. 94\u2013105.","DOI":"10.1145\/276304.276314"},{"issue":"1","key":"336_CR2","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1007\/s10618-005-1396-1","volume":"11","author":"R Agrawal","year":"2005","unstructured":"Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data. Data Min Knowl Discov. 2005;11(1):5\u201333.","journal-title":"Data Min Knowl Discov"},{"key":"336_CR3","doi-asserted-by":"crossref","unstructured":"Ba\u00e7\u00e3o F, Lobo V, Painho M. Self-organizing maps as substitutes for k-means clustering. In: International conference on computational science. Berlin: Springer; 2005. p. 476\u201383.","DOI":"10.1007\/11428862_65"},{"key":"336_CR4","doi-asserted-by":"crossref","unstructured":"B\u00e4r A, Finamore A, Casas P, Golab L, Mellia M. Large-scale network traffic monitoring with dbstream, a system for rolling big data analysis. In: 2014 IEEE international conference on big data (big data). New York: IEEE; 2014. p. 165\u201370.","DOI":"10.1109\/BigData.2014.7004227"},{"issue":"May","key":"336_CR5","first-page":"1601","volume":"11","author":"A Bifet","year":"2010","unstructured":"Bifet A, Holmes G, Kirkby R, Pfahringer B. Moa: massive online analysis. J Mach Learn Res. 2010;11(May):1601\u20134.","journal-title":"J Mach Learn Res"},{"issue":"4","key":"336_CR6","first-page":"28","volume":"36","author":"P Carbone","year":"2015","unstructured":"Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K. Apache flink: stream and batch processing in a single engine. Bull IEEE Comput Soc Tech Comm Data Eng. 2015;36(4):28\u201338.","journal-title":"Bull IEEE Comput Soc Tech Comm Data Eng."},{"key":"336_CR7","doi-asserted-by":"crossref","unstructured":"Chen Y, Tu L. Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. New York: ACM; 2007. p. 133\u201342.","DOI":"10.1145\/1281192.1281210"},{"key":"336_CR8","unstructured":"van Diggelen F, Enge P. The world\u2019s first gps mooc and worldwide laboratory using smartphones. In: Proceedings of the 28th international technical meeting of the satellite division of the institute of navigation (ION GNSS+ 2015). ION 2015. p. 361\u20139."},{"key":"336_CR9","first-page":"226","volume":"96","author":"M Ester","year":"1996","unstructured":"Ester M, Kriegel HP, Sander J, Xu X, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. SIGKDD Conf Knowl Discov Data Min. 1996;96:226\u201331.","journal-title":"SIGKDD Conf Knowl Discov Data Min"},{"issue":"8","key":"336_CR10","doi-asserted-by":"publisher","first-page":"893","DOI":"10.1016\/j.neunet.2007.07.008","volume":"20","author":"S Furao","year":"2007","unstructured":"Furao S, Ogura T, Hasegawa O. An enhanced self-organizing incremental neural network for online unsupervised learning. Neural Netw. 2007;20(8):893\u2013903.","journal-title":"Neural Netw"},{"key":"336_CR11","doi-asserted-by":"crossref","unstructured":"Gao J, Li J, Zhang Z, Tan PN. An incremental data stream clustering algorithm based on dense units detection. In: Pacific-Asia conference on knowledge discovery and data mining. Berlin: Springer; 2005. p. 420\u20135.","DOI":"10.1007\/11430919_49"},{"key":"336_CR12","doi-asserted-by":"crossref","unstructured":"Garc\u0131\u0301a HL, Gonz\u00e1lez IM. Self-organizing map and clustering for wastewater treatment monitoring. Eng Appl Artif Intell 17(3):215\u2013225","DOI":"10.1016\/j.engappai.2004.03.004"},{"issue":"6","key":"336_CR13","doi-asserted-by":"publisher","first-page":"1449","DOI":"10.1109\/TKDE.2016.2522412","volume":"28","author":"M Hahsler","year":"2016","unstructured":"Hahsler M, Bola\u00f1os M. Clustering data streams based on shared density between micro-clusters. IEEE Trans Knowl Data Eng. 2016;28(6):1449\u201361.","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"14","key":"336_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v076.i14","volume":"76","author":"M Hahsler","year":"2017","unstructured":"Hahsler M, Bolanos M, Forrest J, et al. Introduction to stream: An extensible framework for data stream clustering research with r. J Stat Softw. 2017;76(14):1\u201350.","journal-title":"J Stat Softw"},{"issue":"1","key":"336_CR15","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert L, Arabie P. Comparing partitions. J Classif. 1985;2(1):193\u2013218.","journal-title":"J Classif"},{"key":"336_CR16","doi-asserted-by":"crossref","unstructured":"Jia C, Tan C, Yong A. A grid and density-based clustering algorithm for processing data stream. In: 2008 second international conference on genetic and evolutionary computing. New York: IEEE; 2008. p. 517\u201321.","DOI":"10.1109\/WGEC.2008.32"},{"key":"336_CR17","doi-asserted-by":"crossref","unstructured":"Kaisler S, Armour F, Espinosa JA, Money W. Big data: Issues and challenges moving forward. In: 2013 46th Hawaii international conference on system sciences. 2013. p. 995\u20131004.","DOI":"10.1109\/HICSS.2013.645"},{"issue":"2","key":"336_CR18","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1016\/S0167-9473(01)00040-8","volume":"38","author":"MY Kiang","year":"2001","unstructured":"Kiang MY. Extending the kohonen self-organizing map networks for clustering analysis. Comput Stat Data Anal. 2001;38(2):161\u201380.","journal-title":"Comput Stat Data Anal"},{"issue":"1","key":"336_CR19","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1007\/BF00337288","volume":"43","author":"T Kohonen","year":"1982","unstructured":"Kohonen T. Self-organized formation of topologically correct feature maps. Biol Cybern. 1982;43(1):59\u201369.","journal-title":"Biol Cybern"},{"key":"336_CR20","doi-asserted-by":"crossref","unstructured":"Kohonen T. Applications. In: Self-organizing maps. Berlin: Springer; 2001. p. 263\u2013310.","DOI":"10.1007\/978-3-642-56927-2_7"},{"key":"336_CR21","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1016\/j.neunet.2012.09.018","volume":"37","author":"T Kohonen","year":"2013","unstructured":"Kohonen T. Essentials of the self-organizing map. Neural Netw. 2013;37:52\u201365 twenty-fifth Anniversay Commemorative Issue.","journal-title":"Neural Netw."},{"issue":"24","key":"336_CR22","doi-asserted-by":"publisher","first-page":"4817","DOI":"10.1016\/j.ijleo.2015.09.127","volume":"126","author":"H Li","year":"2015","unstructured":"Li H, He H, Wen Y. Dynamic particle swarm optimization and k-means clustering algorithm for image segmentation. Optik. 2015;126(24):4817\u201322.","journal-title":"Optik"},{"key":"336_CR23","unstructured":"MacQueen J et\u00a0al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, CA, USA, vol.\u00a01. 1967. p. 281\u201397."},{"issue":"4","key":"336_CR24","doi-asserted-by":"publisher","first-page":"2145","DOI":"10.1109\/TCE.2009.5373781","volume":"55","author":"NA Mat Isa","year":"2009","unstructured":"Mat Isa NA, Salamah SA, Ngah UK. Adaptive fuzzy moving k-means clustering algorithm for image segmentation. IEEE Trans Consumer Electron. 2009;55(4):2145\u201353.","journal-title":"IEEE Trans Consumer Electron"},{"issue":"1","key":"336_CR25","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/s40537-019-0178-3","volume":"6","author":"S Mazumdar","year":"2019","unstructured":"Mazumdar S, Seybold D, Kritikos K, Verginadis Y. A survey on data storage and placement methodologies for cloud-big data ecosystem. J Big Data. 2019;6(1):15.","journal-title":"J Big Data"},{"key":"336_CR26","unstructured":"Ng HP, Ong SH, Foong KWC, Goh PS, Nowinski WL. Medical image segmentation using k-means clustering and improved watershed algorithm. In: 2006 IEEE southwest symposium on image analysis and interpretation. 2006. p. 61\u20135."},{"issue":"Oct","key":"336_CR27","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12(Oct):2825\u201330.","journal-title":"J Mach Learn Res"},{"key":"336_CR28","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","volume":"20","author":"PJ Rousseeuw","year":"1987","unstructured":"Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53\u201365.","journal-title":"J Comput Appl Math"},{"issue":"3","key":"336_CR29","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1109\/TNN.2005.845141","volume":"16","author":"Rui Xu","year":"2005","unstructured":"Xu Rui, Wunsch D. Survey of clustering algorithms. IEEE Trans Neural Netw. 2005;16(3):645\u201378. https:\/\/doi.org\/10.1109\/TNN.2005.845141.","journal-title":"IEEE Trans Neural Netw"},{"issue":"6","key":"336_CR30","doi-asserted-by":"publisher","first-page":"757","DOI":"10.1016\/j.compenvurbsys.2006.02.006","volume":"30","author":"G Taylor","year":"2006","unstructured":"Taylor G, Brunsdon C, Li J, Olden A, Steup D, Winter M. Gps accuracy estimation using map matching techniques: applied to vehicle positioning and odometer calibration. Comput Environ Urban Syst. 2006;30(6):757\u201372. https:\/\/doi.org\/10.1016\/j.compenvurbsys.2006.02.006.","journal-title":"Comput Environ Urban Syst"},{"issue":"1\u20132","key":"336_CR31","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/j.chemolab.2004.07.011","volume":"77","author":"TN Tran","year":"2005","unstructured":"Tran TN, Wehrens R, Buydens LM. Clustering multispectral images: a tutorial. Chemom Intell Lab Syst. 2005;77(1\u20132):3\u201317.","journal-title":"Chemom Intell Lab Syst"},{"key":"336_CR32","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1007\/978-3-319-72926-8_6","volume-title":"Machine learning, optimization, and big data","author":"G Ulm","year":"2018","unstructured":"Ulm G, Gustavsson E, Jirstrand M. Contraction clustering (RASTER). In: Nicosia G, Pardalos P, Giuffrida G, Umeton R, editors. Machine learning, optimization, and big data. Cham: Springer International Publishing; 2018. p. 63\u201375."},{"key":"336_CR33","unstructured":"Ulm G, Smith S, Nilsson A, Gustavsson E, Jirstrand M. Contraction clustering (RASTER): a very fast big data algorithm for sequential and parallel density-based clustering in linear time, constant memory, and a single pass. 2019.\u00a0arXiv preprint arXiv:1907.03620."},{"key":"336_CR34","doi-asserted-by":"crossref","unstructured":"Venkatasubramanian S. Clustering on streams. New York: Springer; 2018. p. 488\u201394.","DOI":"10.1007\/978-1-4614-8265-9_68"},{"issue":"3","key":"336_CR35","doi-asserted-by":"publisher","first-page":"586","DOI":"10.1109\/72.846731","volume":"11","author":"J Vesanto","year":"2000","unstructured":"Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Trans Neural Netw. 2000;11(3):586\u2013600.","journal-title":"IEEE Trans Neural Netw"},{"issue":"4","key":"336_CR36","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1093\/jof\/103.4.169","volume":"103","author":"MG Wing","year":"2005","unstructured":"Wing MG, Eklund A, Kellogg LD. Consumer-grade global positioning system (GPS) accuracy and reliability. J. For. 2005;103(4):169\u201373. https:\/\/doi.org\/10.1093\/jof\/103.4.169.","journal-title":"J. For."},{"issue":"10\u201310","key":"336_CR37","first-page":"95","volume":"10","author":"M Zaharia","year":"2010","unstructured":"Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I. Spark: cluster computing with working sets. HotCloud. 2010;10(10\u201310):95.","journal-title":"HotCloud"},{"issue":"11","key":"336_CR38","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1145\/2934664","volume":"59","author":"M Zaharia","year":"2016","unstructured":"Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ, et al. Apache spark: a unified engine for big data processing. Commun ACM. 2016;59(11):56\u201365.","journal-title":"Commun ACM"},{"issue":"3","key":"336_CR39","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1109\/LGRS.2013.2275738","volume":"11","author":"Y Zheng","year":"2013","unstructured":"Zheng Y, Zhang X, Hou B, Liu G. Using combined difference image and $$k$$-means clustering for sar image change detection. IEEE Geosci Remote Sens Lett. 2013;11(3):691\u20135.","journal-title":"IEEE Geosci Remote Sens Lett"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00336-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-020-00336-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00336-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,6]],"date-time":"2022-11-06T20:57:15Z","timestamp":1667768235000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-020-00336-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,13]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["336"],"URL":"https:\/\/doi.org\/10.1186\/s40537-020-00336-3","relation":{},"ISSN":["2196-1115"],"issn-type":[{"type":"electronic","value":"2196-1115"}],"subject":[],"published":{"date-parts":[[2020,8,13]]},"assertion":[{"value":"26 April 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 July 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 August 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"62"}}