{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T13:46:12Z","timestamp":1776087972472,"version":"3.50.1"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T00:00:00Z","timestamp":1738195200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T00:00:00Z","timestamp":1738195200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Strategic Research Council at the Academy of Finland","award":["336325"],"award-info":[{"award-number":["336325"]}]},{"name":"Strategic Research Council at the Academy of Finland","award":["336330"],"award-info":[{"award-number":["336330"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Agglomerative clustering, and Ward\u2019s method, in particular, provide good clustering accuracy for most applications. However, its adoption has been limited by its quadratic time complexity, which makes it slow for large datasets. It also consumes O(<jats:italic>N<\/jats:italic>\n            <jats:sup>2<\/jats:sup>) memory for non-vectorial data. In this work, we propose a fast variant of Ward\u2019s method that reduces the number of distance calculations and only needs O(<jats:italic>N<\/jats:italic>) memory. It works by constraining the merges to neighboring nodes on a novel fully connected graph that consists of multiple approximate traveling salesman problem (TSP) solutions. This avoids the problems caused by disconnected graph components that can occur with a <jats:italic>k<\/jats:italic>-nearest neighbors graph. The method is general and works for all types of data for which a distance function can be defined. For a dataset of size 1.28 million, the proposed method achieves a speedup factor of 25:1 compared with the best exact implementation of Ward\u2019s method and obtains identical results. The algorithm allows a flexible compromise between clustering quality and running time. For a moderate degradation in quality (NMI 0.90 to 0.80), the speedup factor improves further to 625:1. The proposed TSP-graph can also be used in other applications that require a connected neighborhood graph.<\/jats:p>","DOI":"10.1186\/s40537-024-01053-x","type":"journal-article","created":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T16:23:36Z","timestamp":1738254216000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Fast agglomerative clustering using approximate traveling salesman solutions"],"prefix":"10.1186","volume":"12","author":[{"given":"Sami","family":"Sieranoja","sequence":"first","affiliation":[]},{"given":"Pasi","family":"Fr\u00e4nti","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,30]]},"reference":[{"key":"1053_CR1","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1080\/01621459.1963.10500845","volume":"58","author":"JH Ward Jr","year":"1963","unstructured":"Ward JH Jr. Hierarchical grouping to optimize an objective function. J Am Stat Assoc. 1963;58:236\u201344.","journal-title":"J Am Stat Assoc"},{"key":"1053_CR2","doi-asserted-by":"publisher","first-page":"4545","DOI":"10.1109\/JSYST.2019.2960076","volume":"14","author":"LH Park","year":"2020","unstructured":"Park LH, Yu J, Kang H-K, Lee T, Kwon T. Birds of a feature: intrafamily clustering for version identification of packed malware. IEEE Syst J. 2020;14:4545\u201356.","journal-title":"IEEE Syst J"},{"key":"1053_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-021-86718-w","volume":"11","author":"S Sadhukhan","year":"2021","unstructured":"Sadhukhan S, Root-Gutteridge H, Habib B. Identifying unknown indian wolves by their distinctive howls: its potential as a non-invasive survey method. Sci Rep. 2021;11:1\u201313.","journal-title":"Sci Rep"},{"key":"1053_CR4","doi-asserted-by":"publisher","first-page":"2495","DOI":"10.1117\/1.1412423","volume":"40","author":"O Virmajoki","year":"2001","unstructured":"Virmajoki O, Fr\u00e4nti P, Kaukoranta T. Practical methods for speeding-up the PNN method. Opt Eng. 2001;40:2495\u2013504.","journal-title":"Opt Eng"},{"key":"1053_CR5","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0168288","volume":"12","author":"T Strauss","year":"2017","unstructured":"Strauss T, von Maltitz MJ. Generalising ward\u2019s method for use with Manhattan distances. PLoS ONE. 2017;12: e0168288.","journal-title":"PLoS ONE"},{"key":"1053_CR6","doi-asserted-by":"publisher","first-page":"165","DOI":"10.2307\/2528688","volume":"25","author":"D Wishart","year":"1969","unstructured":"Wishart D. An algorithm for hierarchical classifications. Biometrics. 1969;25:165\u201370.","journal-title":"Biometrics"},{"key":"1053_CR7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13015-019-0138-7","volume":"14","author":"C Ambroise","year":"2019","unstructured":"Ambroise C, Dehman A, Neuvial P, Rigaill G, Vialaneix N. Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics. Algor Molr Biol. 2019;14:1\u201314.","journal-title":"Algor Molr Biol"},{"key":"1053_CR8","unstructured":"M\u00fcllner D. Modern hierarchical, agglomerative clustering algorithms. arXiv preprint. 2011. https:\/\/arxiv.org\/abs\/1109.2378."},{"key":"1053_CR9","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1109\/83.841516","volume":"9","author":"P Fr\u00e4nti","year":"2000","unstructured":"Fr\u00e4nti P, Kaukoranta T, Shen D-F, Chang K-S. Fast and memory efficient implementation of the exact pnn. IEEE Trans Image Process. 2000;9:773\u20137.","journal-title":"IEEE Trans Image Process"},{"key":"1053_CR10","doi-asserted-by":"publisher","first-page":"1862","DOI":"10.1117\/1.602251","volume":"38","author":"T Kaukoranta","year":"1999","unstructured":"Kaukoranta T, Fr\u00e4nti P, Nevalainen O. Vector quantization by lazy pairwise nearest neighbor method. Opt Eng. 1999;38:1862\u20138.","journal-title":"Opt Eng"},{"key":"1053_CR11","doi-asserted-by":"publisher","first-page":"1875","DOI":"10.1109\/TPAMI.2006.227","volume":"28","author":"P Fr\u00e4nti","year":"2006","unstructured":"Fr\u00e4nti P, Virmajoki O, Hautamaki V. Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Trans Pattern Anal Mach Intell. 2006;28:1875\u201381.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1053_CR12","doi-asserted-by":"publisher","first-page":"2785","DOI":"10.1016\/j.eswa.2014.09.054","volume":"42","author":"A Bouguettaya","year":"2015","unstructured":"Bouguettaya A, Yu Q, Liu X, Zhou X, Song A. Efficient agglomerative hierarchical clustering. Exp Syst Appl. 2015;42:2785\u201397.","journal-title":"Exp Syst Appl"},{"key":"1053_CR13","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1093\/comjnl\/9.4.373","volume":"9","author":"GN Lance","year":"1967","unstructured":"Lance GN, Williams WT. A general theory of classificatory sorting strategies: 1. Hierarchical systems. Comp J. 1967;9:373\u201380.","journal-title":"Comp J"},{"key":"1053_CR14","first-page":"67","volume-title":"Generalized ward and related clustering problems, Classification and related methods of data analysis","author":"V Batagelj","year":"1988","unstructured":"Batagelj V. Generalized ward and related clustering problems, Classification and related methods of data analysis. Amsterdam: Academia Education; 1988. p. 67\u201374."},{"key":"1053_CR15","doi-asserted-by":"publisher","first-page":"614","DOI":"10.1109\/83.563327","volume":"6","author":"J Shanbehzadeh","year":"1997","unstructured":"Shanbehzadeh J, Ogunbona PO. On the computational complexity of the lbg and pnn algorithms. IEEE Trans Image Process. 1997;6:614\u20136.","journal-title":"IEEE Trans Image Process"},{"key":"1053_CR16","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1016\/0031-3203(91)90062-A","volume":"24","author":"T Kurita","year":"1991","unstructured":"Kurita T. An efficient agglomerative clustering algorithm using a heap. Pattern Recogn. 1991;24:205\u20139.","journal-title":"Pattern Recogn"},{"key":"1053_CR17","volume-title":"Multidimensional clustering algorithms","author":"F Murtagh","year":"1985","unstructured":"Murtagh F. Multidimensional clustering algorithms. Cham: Compstat lectures; 1985."},{"key":"1053_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v053.i09","volume":"53","author":"D M\u00fcllner","year":"2013","unstructured":"M\u00fcllner D. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. J Stat Softw. 2013;53:1\u201318.","journal-title":"J Stat Softw"},{"key":"1053_CR19","doi-asserted-by":"publisher","first-page":"1568","DOI":"10.1109\/29.35395","volume":"37","author":"WH Equitz","year":"1989","unstructured":"Equitz WH. A new vector quantization clustering algorithm. IEEE Trans Acoust Speech Signal Process. 1989;37:1568\u201375.","journal-title":"IEEE Trans Acoust Speech Signal Process"},{"key":"1053_CR20","doi-asserted-by":"publisher","first-page":"4743","DOI":"10.1007\/s10489-018-1238-7","volume":"48","author":"P Fr\u00e4nti","year":"2018","unstructured":"Fr\u00e4nti P, Sieranoja S. K-means properties on six clustering benchmark datasets. Appl Intell. 2018;48:4743\u201359.","journal-title":"Appl Intell"},{"key":"1053_CR21","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1016\/j.ins.2016.05.003","volume":"363","author":"M Gagolewski","year":"2016","unstructured":"Gagolewski M, Bartoszuk M, Cena A. Genie: a new, fast, and outlier-resistant hierarchical clustering algorithm. Inf Sci. 2016;363:8\u201323.","journal-title":"Inf Sci"},{"key":"1053_CR22","doi-asserted-by":"publisher","first-page":"P09008","DOI":"10.1088\/1742-5468\/2005\/09\/P09008","volume":"2005","author":"L Danon","year":"2005","unstructured":"Danon L, Diaz-Guilera A, Duch J, Arenas A. Comparing community structure identification. J Stat Mech Theory Exp. 2005;2005:P09008.","journal-title":"J Stat Mech Theory Exp"},{"key":"1053_CR23","first-page":"285","volume-title":"Joint IAPR international workshops on statistical techniques in pattern recognition (SPR) and structural and syntactic pattern recognition (SSPR)","author":"P Fr\u00e4nti","year":"2016","unstructured":"Fr\u00e4nti P, Rezaei M. Generalizing centroid index to different clustering models. In: Robles-Kelly A, Loog M, Biggio B, Escolano F, Wilson R, editors. Joint IAPR international workshops on statistical techniques in pattern recognition (SPR) and structural and syntactic pattern recognition (SSPR). Cham: Springer; 2016. p. 285\u201396."},{"key":"1053_CR24","first-page":"1312","volume":"22","author":"K Hajebi","year":"2011","unstructured":"Hajebi K, Abbasi-Yadkori Y, Shahbazi H, Zhang H. Fast approximate nearest-neighbor search with k-nearest neighbor graph. IJCAI Proc Int Joint Conf Artif Intell. 2011;22:1312.","journal-title":"IJCAI Proc Int Joint Conf Artif Intell."},{"issue":"3","key":"1053_CR25","doi-asserted-by":"publisher","first-page":"909","DOI":"10.1090\/S0894-0347-07-00589-9","volume":"21","author":"C Bachoc","year":"2008","unstructured":"Bachoc C, Vallentin F. New upper bounds for kissing numbers from semidefinite programming. J Am Math Soc. 2008;21(3):909\u201324.","journal-title":"J Am Math Soc"},{"issue":"1","key":"1053_CR26","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1080\/09298215.2017.1354891","volume":"47","author":"A Flexer","year":"2018","unstructured":"Flexer A, Stevens J. Mutual proximity graphs for improved reachability in music recommendation. J New Music Res. 2018;47(1):17\u201328.","journal-title":"J New Music Res"},{"issue":"9","key":"1053_CR27","doi-asserted-by":"publisher","first-page":"1722","DOI":"10.1016\/j.ins.2011.01.011","volume":"181","author":"JZC Lai","year":"2011","unstructured":"Lai JZC, Huang T-J. An agglomerative clustering algorithm using a dynamic k-nearest-neighbor list. Inf Sci. 2011;181(9):1722\u201334.","journal-title":"Inf Sci"},{"issue":"1","key":"1053_CR28","doi-asserted-by":"publisher","first-page":"93","DOI":"10.3934\/aci.2023006","volume":"3","author":"MI Malinen","year":"2023","unstructured":"Malinen MI, Fr\u00e4nti P. All-pairwise squared distances lead to more balanced clustering. Appl Comp Intell. 2023;3(1):93\u2013115.","journal-title":"Appl Comp Intell"},{"key":"1053_CR29","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1007\/978-3-319-07692-8_9","volume-title":"Recent advances on soft computing and data mining (SCDM-2014)","author":"HH Chieng","year":"2014","unstructured":"Chieng HH, Wahid N. A performance comparison of genetic algorithm\u2019s mutation operators in n-cities open loop travelling salesman problem. In: Herawan T, Ghazali R, Deris MM, editors. Recent advances on soft computing and data mining (SCDM-2014). Cham: Springer International Publishing; 2014. p. 89\u201397."},{"issue":"8","key":"1053_CR30","first-page":"707","volume":"10","author":"VI Levenshtein","year":"1966","unstructured":"Levenshtein VI. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys Doklady. 1966;10(8):707\u201310.","journal-title":"Soviet Phys Doklady"},{"key":"1053_CR31","volume-title":"Introduction to algorithms","author":"TH Cormen","year":"2022","unstructured":"Cormen TH, et al. Introduction to algorithms. Cambridge: MIT press; 2022."},{"key":"1053_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107295","volume":"228","author":"C Wu","year":"2021","unstructured":"Wu C, Peng Q, Lee J, Leibnitz K, Xia Y. Effective hierarchical clustering based on structural similarities in nearest neighbor graphs. Knowl-Based Syst. 2021;228: 107295.","journal-title":"Knowl-Based Syst"},{"issue":"2","key":"1053_CR33","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1007\/s00357-020-09377-y","volume":"38","author":"N Randriamihamison","year":"2021","unstructured":"Randriamihamison N, Vialaneix N, Neuvial P. Applicability and interpretability of ward\u2019s hierarchical agglomerative clustering with or without contiguity constraints. J Classif. 2021;38(2):363\u201389.","journal-title":"J Classif"},{"issue":"15","key":"1053_CR34","doi-asserted-by":"publisher","first-page":"11459","DOI":"10.1007\/s00521-019-04636-5","volume":"32","author":"N Nidheesh","year":"2020","unstructured":"Nidheesh N, Nazeer KA, Ameer P. A hierarchical clustering algorithm based on silhouette index for cancer subtype discovery from genomic data. Neural Comput Appl. 2020;32(15):11459\u201376.","journal-title":"Neural Comput Appl"},{"key":"1053_CR35","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.109230","volume":"136","author":"W-B Xie","year":"2023","unstructured":"Xie W-B, Liu Z, Das D, Chen B, Srivastava J. Scalable clustering by aggregating representatives in hierarchical groups. Pattern Recogn. 2023;136: 109230.","journal-title":"Pattern Recogn"},{"issue":"1","key":"1053_CR36","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1186\/s13638-024-02333-z","volume":"2024","author":"SM Hamedoon","year":"2024","unstructured":"Hamedoon SM, Chattha JN, Bilal M. Towards intelligent user clustering techniques for non-orthogonal multiple access: a survey. EURASIP J Wirel Commun Netw. 2024;2024(1):7.","journal-title":"EURASIP J Wirel Commun Netw"},{"key":"1053_CR37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-016-0046-3","volume":"3","author":"S Kumar","year":"2016","unstructured":"Kumar S, Toshniwal D. Analysis of hourly road accident counts using hierarchical clustering and cophenetic correlation coefficient (cpcc). J Big Data. 2016;3:1\u201311.","journal-title":"J Big Data"},{"issue":"9","key":"1053_CR38","doi-asserted-by":"publisher","first-page":"2534","DOI":"10.1109\/TPDS.2014.2355205","volume":"26","author":"Y Jeon","year":"2014","unstructured":"Jeon Y, Yoon S. Multi-threaded hierarchical clustering by parallel nearest-neighbor chaining. IEEE Trans Parallel Distrib Syst. 2014;26(9):2534\u201348.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"4","key":"1053_CR39","doi-asserted-by":"publisher","first-page":"542","DOI":"10.1109\/TC.2018.2879332","volume":"68","author":"N Pang","year":"2018","unstructured":"Pang N, Zhang J, Zhang C, Qin X. Parallel hierarchical subspace clustering of categorical data. IEEE Trans Comput. 2018;68(4):542\u201355.","journal-title":"IEEE Trans Comput"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-01053-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-024-01053-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-01053-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T16:23:43Z","timestamp":1738254223000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-024-01053-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,30]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1053"],"URL":"https:\/\/doi.org\/10.1186\/s40537-024-01053-x","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,30]]},"assertion":[{"value":"11 December 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 December 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"21"}}