{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T04:13:53Z","timestamp":1772424833204,"version":"3.50.1"},"reference-count":49,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,12,4]],"date-time":"2021-12-04T00:00:00Z","timestamp":1638576000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,12,4]],"date-time":"2021-12-04T00:00:00Z","timestamp":1638576000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Strategic Research Council (SRC) at the Academy of Finland","award":["312760-1"],"award-info":[{"award-number":["312760-1"]}]},{"name":"University of Eastern Finland (UEF) including Kuopio University Hospital"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2022,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We propose two new algorithms for clustering graphs and networks. The first, called <jats:italic>K\u2011algorithm<\/jats:italic>, is derived directly from the <jats:italic>k<\/jats:italic>-means algorithm. It applies similar iterative local optimization but without the need to calculate the means. It inherits the properties of <jats:italic>k<\/jats:italic>-means clustering in terms of both good local optimization capability and the tendency to get stuck at a local optimum. The second algorithm, called the <jats:italic>M-algorithm<\/jats:italic>, gradually improves on the results of the <jats:italic>K<\/jats:italic>-algorithm to find new and potentially better local optima. It repeatedly merges and splits random clusters and tunes the results with the <jats:italic>K<\/jats:italic>-algorithm. Both algorithms are general in the sense that they can be used with different cost functions. We consider the conductance cost function and also introduce two new cost functions, called <jats:italic>inverse internal weight<\/jats:italic> and <jats:italic>mean internal weight<\/jats:italic>. According to our experiments, the <jats:italic>M<\/jats:italic>-algorithm outperforms eight other state-of-the-art methods. We also perform a case study by analyzing clustering results of a disease co-occurrence network, which demonstrate the usefulness of the algorithms in an important real-life application.<\/jats:p>","DOI":"10.1007\/s10115-021-01623-y","type":"journal-article","created":{"date-parts":[[2021,12,4]],"date-time":"2021-12-04T16:02:31Z","timestamp":1638633751000},"page":"115-142","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":43,"title":["Adapting k-means for graph clustering"],"prefix":"10.1007","volume":"64","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9110-2421","authenticated-orcid":false,"given":"Sami","family":"Sieranoja","sequence":"first","affiliation":[]},{"given":"Pasi","family":"Fr\u00e4nti","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,12,4]]},"reference":[{"issue":"23","key":"1623_CR1","doi-asserted-by":"publisher","first-page":"8577","DOI":"10.1073\/pnas.0601602103","volume":"103","author":"ME Newman","year":"2006","unstructured":"Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577\u20138582","journal-title":"Proc Natl Acad Sci"},{"issue":"5","key":"1623_CR2","doi-asserted-by":"publisher","first-page":"056131","DOI":"10.1103\/PhysRevE.70.056131","volume":"70","author":"ME Newman","year":"2004","unstructured":"Newman ME (2004) Analysis of weighted networks. Phys Rev E 70(5):056131","journal-title":"Phys Rev E"},{"issue":"2","key":"1623_CR3","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1002\/j.1538-7305.1970.tb01770.x","volume":"49","author":"BW Kernighan","year":"1970","unstructured":"Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49(2):291\u2013307","journal-title":"Bell Syst Tech J"},{"issue":"8","key":"1623_CR4","doi-asserted-by":"publisher","first-page":"888","DOI":"10.1109\/34.868688","volume":"22","author":"J Shi","year":"2000","unstructured":"Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888\u2013905","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1623_CR5","first-page":"113","volume":"22","author":"MJ Divo","year":"2015","unstructured":"Divo MJ, Casanova C, Marin JM et al (2015) Chronic obstructive pulmonary disease comorbidities network. Eur Respir J 22:113\u2013118","journal-title":"Eur Respir J"},{"key":"1623_CR6","doi-asserted-by":"crossref","unstructured":"Hromic H, Prangnawarat N, Hulpu\u015f I, Karnstedt M, Hayes C (2015) Graph-based methods for clustering topics of interest in twitter. In: International conference on web engineering, pp 701\u2013704","DOI":"10.1007\/978-3-319-19890-3_61"},{"issue":"3\u20135","key":"1623_CR7","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1016\/j.physrep.2009.11.002","volume":"486","author":"S Fortunato","year":"2010","unstructured":"Fortunato S (2010) Community detection in graphs. Phys Rep 486(3\u20135):75\u2013174","journal-title":"Phys Rep"},{"key":"1623_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.physrep.2016.09.002","volume":"659","author":"S Fortunato","year":"2016","unstructured":"Fortunato S, Hric D (2016) Community detection in networks: a user guide. Phys Rep 659:1\u201344","journal-title":"Phys Rep"},{"issue":"10","key":"1623_CR9","doi-asserted-by":"publisher","first-page":"10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","volume":"2008","author":"VD Blondel","year":"2008","unstructured":"Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):10008","journal-title":"J Stat Mech Theory Exp"},{"issue":"1","key":"1623_CR10","doi-asserted-by":"publisher","first-page":"016118","DOI":"10.1103\/PhysRevE.80.016118","volume":"80","author":"A Lancichinetti","year":"2009","unstructured":"Lancichinetti A, Fortunato S (2009) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E 80(1):016118","journal-title":"Phys Rev E"},{"issue":"5","key":"1623_CR11","doi-asserted-by":"publisher","first-page":"1272","DOI":"10.1109\/TKDE.2016.2518687","volume":"28","author":"JJ Whang","year":"2016","unstructured":"Whang JJ, Gleich DF, Dhillon IS (2016) Overlapping community detection using neighborhood-inflated seed expansion. IEEE Trans Knowl Data Eng 28(5):1272\u20131284","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1623_CR12","unstructured":"Lu Z, Wen Y, Cao G (2013) Community detection in weighted networks: Algorithms and applications. In: 2013 IEEE international conference on pervasive computing and communications (PerCom), pp 179\u2013184"},{"issue":"5","key":"1623_CR13","doi-asserted-by":"publisher","first-page":"056117","DOI":"10.1103\/PhysRevE.80.056117","volume":"80","author":"A Lancichinetti","year":"2009","unstructured":"Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E 80(5):056117","journal-title":"Phys Rev E"},{"key":"1623_CR14","doi-asserted-by":"crossref","unstructured":"Zhang W, Wang X, Zhao D, Tang X (2012) Graph degree linkage: agglomerative clustering on a directed graph. In: European conference on computer vision, pp 428\u2013441","DOI":"10.1007\/978-3-642-33718-5_31"},{"key":"1623_CR15","doi-asserted-by":"crossref","unstructured":"LaSalle D, Karypis G (2016) A parallel hill-climbing refinement algorithm for graph partitioning. In: 45th International conference on parallel processing (ICPP), pp 236\u2013241","DOI":"10.1109\/ICPP.2016.34"},{"issue":"2","key":"1623_CR16","doi-asserted-by":"publisher","first-page":"831","DOI":"10.1016\/j.patcog.2011.06.018","volume":"45","author":"SS Tabatabaei","year":"2012","unstructured":"Tabatabaei SS, Coates M, Rabbat M (2012) Ganc: greedy agglomerative normalized cut for graph clustering. Pattern Recogn 45(2):831\u2013843","journal-title":"Pattern Recogn"},{"key":"1623_CR17","doi-asserted-by":"crossref","unstructured":"Sch\u00e4fer T, Mutzel P (2017) Struclus: scalable structural graph set clustering with representative sampling. In: International conference on advanced data mining and applications, pp 343\u2013359","DOI":"10.1007\/978-3-319-69179-4_24"},{"key":"1623_CR18","doi-asserted-by":"crossref","unstructured":"Riesen K, Bunke H (2008) Kernel k-means clustering applied to vector space embeddings of graphs. In: IAPR workshop on artificial neural networks in pattern recognition, pp 24\u201335","DOI":"10.1007\/978-3-540-69939-2_3"},{"key":"1623_CR19","doi-asserted-by":"crossref","unstructured":"Ferrer M, Valveny E, Serratosa F, Bardaj\u00ed I, Bunke H (2009) Graph-based k-means clustering: a comparison of the set median versus the generalized median graph. In: International conference on computer analysis of images and patterns, pp 342\u2013350","DOI":"10.1007\/978-3-642-03767-2_42"},{"issue":"12","key":"1623_CR20","doi-asserted-by":"publisher","first-page":"7821","DOI":"10.1073\/pnas.122653799","volume":"99","author":"M Girvan","year":"2002","unstructured":"Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821\u20137826","journal-title":"Proc Natl Acad Sci"},{"issue":"3","key":"1623_CR21","doi-asserted-by":"publisher","first-page":"497","DOI":"10.1145\/990308.990313","volume":"51","author":"R Kannan","year":"2004","unstructured":"Kannan R, Vempala S, Vetta A (2004) On clusterings: good, bad and spectral. J ACM 51(3):497\u2013515","journal-title":"J ACM"},{"key":"1623_CR22","doi-asserted-by":"crossref","unstructured":"Pons P, Latapy M (2005) Computing communities in large networks using random walks. In: International symposium on computer and information sciences, pp 284\u2013293","DOI":"10.1007\/11569596_31"},{"issue":"6","key":"1623_CR23","doi-asserted-by":"publisher","first-page":"066111","DOI":"10.1103\/PhysRevE.70.066111","volume":"70","author":"A Clauset","year":"2004","unstructured":"Clauset A, Newman ME, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111","journal-title":"Phys Rev E"},{"issue":"1","key":"1623_CR24","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1137\/S1064827595287997","volume":"20","author":"G Karypis","year":"1998","unstructured":"Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359\u2013392","journal-title":"SIAM J Sci Comput"},{"issue":"2","key":"1623_CR25","doi-asserted-by":"publisher","first-page":"026113","DOI":"10.1103\/PhysRevE.69.026113","volume":"69","author":"ME Newman","year":"2004","unstructured":"Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113","journal-title":"Phys Rev E"},{"issue":"1","key":"1623_CR26","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1073\/pnas.0605965104","volume":"104","author":"S Fortunato","year":"2007","unstructured":"Fortunato S, Barthelemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci 104(1):36\u201341","journal-title":"Proc Natl Acad Sci"},{"issue":"4","key":"1623_CR27","doi-asserted-by":"publisher","first-page":"e1000353","DOI":"10.1371\/journal.pcbi.1000353","volume":"5","author":"CA Hidalgo","year":"2009","unstructured":"Hidalgo CA, Blumm N, Barab\u00e1si A, Christakis NA (2009) A dynamic network approach for the study of human phenotypes. PLoS Comput Biol 5(4):e1000353","journal-title":"PLoS Comput Biol"},{"key":"1623_CR28","doi-asserted-by":"crossref","unstructured":"Okuda M, Satoh S, Sato Y, Kidawara Y (2019) Community detection using restrained random-walk similarity. IEEE Trans Pattern Anal Mach Intell","DOI":"10.1109\/TPAMI.2019.2926033"},{"issue":"1","key":"1623_CR29","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1016\/0890-5401(89)90067-9","volume":"82","author":"A Sinclair","year":"1989","unstructured":"Sinclair A, Jerrum M (1989) Approximate counting, uniform generation and rapidly mixing markov chains. Inf Comput 82(1):93\u2013133","journal-title":"Inf Comput"},{"key":"1623_CR30","doi-asserted-by":"crossref","unstructured":"Leskovec J, Lang KJ, Mahoney M (2010) Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th international conference on world wide web, pp 631\u2013640","DOI":"10.1145\/1772690.1772755"},{"issue":"1","key":"1623_CR31","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1007\/s10115-013-0693-z","volume":"42","author":"J Yang","year":"2015","unstructured":"Yang J, Leskovec J (2015) Defining and evaluating network communities based on ground-truth. Knowl Inf Syst 42(1):181\u2013213","journal-title":"Knowl Inf Syst"},{"issue":"1","key":"1623_CR32","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/j.cosrev.2007.05.001","volume":"1","author":"SE Schaeffer","year":"2007","unstructured":"Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27\u201364","journal-title":"Comput Sci Rev"},{"issue":"4","key":"1623_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3091106","volume":"50","author":"T Chakraborty","year":"2017","unstructured":"Chakraborty T, Dalmia A, Mukherjee A, Ganguly N (2017) Metrics for community analysis: a survey. ACM Comput Surv 50(4):1\u201337","journal-title":"ACM Comput Surv"},{"key":"1623_CR34","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1016\/j.jnca.2018.02.011","volume":"108","author":"MA Javed","year":"2018","unstructured":"Javed MA, Younis MS, Latif S, Qadir J, Baig A (2018) Community detection in networks: a multidisciplinary review. J Netw Comput Appl 108:87\u2013111","journal-title":"J Netw Comput Appl"},{"key":"1623_CR35","unstructured":"Biedermann S, Henzinger M, Schulz C, Schuster B (2018) Memetic graph clustering. In: Proceedings of the 17th international symposium on experimental algorithms (SEA'18), LIPIcs. Dagstuhl. Technical report arXiv: 1802.07034"},{"issue":"1","key":"1623_CR36","doi-asserted-by":"publisher","first-page":"012804","DOI":"10.1103\/PhysRevE.89.012804","volume":"89","author":"TP Peixoto","year":"2014","unstructured":"Peixoto TP (2014) Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Phys Rev E 89(1):012804","journal-title":"Phys Rev E"},{"key":"1623_CR37","doi-asserted-by":"crossref","unstructured":"Rozemberczki B, Davies R, Sarkar R, Sutton C (2019) Gemsec: graph embedding with self clustering. In: Proceedings of the 2019 IEEE\/ACM international conference on advances in social networks analysis and mining, pp 65\u201372","DOI":"10.1145\/3341161.3342890"},{"key":"1623_CR38","doi-asserted-by":"crossref","unstructured":"Bulu\u00e7 A, Meyerhenke H, Safro I, Sanders P, Schulz C (2016) Recent advances in graph partitioning. In: Algorithm engineering, pp 117\u2013158","DOI":"10.1007\/978-3-319-49487-6_4"},{"key":"1623_CR39","doi-asserted-by":"crossref","unstructured":"Sanders P, Schulz C (2012) Distributed evolutionary graph partitioning. In: 2012 Proceedings of the fourteenth workshop on algorithm engineering and experiments (ALENEX), pp 16\u201329","DOI":"10.1137\/1.9781611972924.2"},{"key":"1623_CR40","doi-asserted-by":"crossref","unstructured":"Benlic U, Hao JK (2010) An effective multilevel memetic algorithm for balanced graph partitioning. In: 2010 22nd IEEE international conference on tools with artificial intelligence, vol 1, pp 121\u2013128","DOI":"10.1109\/ICTAI.2010.25"},{"key":"1623_CR41","doi-asserted-by":"crossref","unstructured":"Par\u00e9s F, Gasulla DG, Vilalta A, Moreno J, Ayguad\u00e9 E, Labarta J, Cort\u00e9s U, Suzumura T (2017) Fluid communities: a competitive, scalable and diverse community detection algorithm. In: International conference on complex networks and their applications, pp 229\u2013240","DOI":"10.1007\/978-3-319-72150-7_19"},{"issue":"12","key":"1623_CR42","doi-asserted-by":"publisher","first-page":"4743","DOI":"10.1007\/s10489-018-1238-7","volume":"48","author":"P Fr\u00e4nti","year":"2018","unstructured":"Fr\u00e4nti P, Sieranoja S (2018) K-means properties on six clustering benchmark datasets. Appl Intell 48(12):4743\u20134759","journal-title":"Appl Intell"},{"key":"1623_CR43","doi-asserted-by":"crossref","unstructured":"Sieranoja S, Fr\u00e4nti P (2018) Fast random pair divisive construction of knn graph using generic distance measures. In: Proceedings of the 2018 international conference on big data and computing, pp 95\u201398","DOI":"10.1145\/3220199.3220215"},{"key":"1623_CR44","volume-title":"International statistical classification of diseases and related health problems, 10th revision","author":"WHO","year":"2016","unstructured":"WHO (2016) International statistical classification of diseases and related health problems, 10th revision. World Health Organization, Geneva"},{"issue":"09","key":"1623_CR45","doi-asserted-by":"publisher","first-page":"P09008","DOI":"10.1088\/1742-5468\/2005\/09\/P09008","volume":"2005","author":"L Danon","year":"2005","unstructured":"Danon L, Diaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp 2005(09):P09008","journal-title":"J Stat Mech Theory Exp"},{"issue":"9","key":"1623_CR46","doi-asserted-by":"publisher","first-page":"3034","DOI":"10.1016\/j.patcog.2014.03.017","volume":"47","author":"P Fr\u00e4nti","year":"2014","unstructured":"Fr\u00e4nti P, Rezaei M, Zhao Q (2014) Centroid index: cluster level similarity measure. Pattern Recogn 47(9):3034\u20133045","journal-title":"Pattern Recogn"},{"issue":"1","key":"1623_CR47","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s41109-019-0165-9","volume":"4","author":"G Rossetti","year":"2019","unstructured":"Rossetti G, Milli L, Cazabet R (2019) Cdlib: a python library to extract, compare and evaluate communities from complex networks. Appl Netw Sci 4(1):1\u201326","journal-title":"Appl Netw Sci"},{"key":"1623_CR48","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1016\/j.patcog.2019.04.014","volume":"93","author":"P Fr\u00e4nti","year":"2019","unstructured":"Fr\u00e4nti P, Sieranoja S (2019) How much can k-means be improved by using better initialization and repeats? Pattern Recogn 93:95\u2013112","journal-title":"Pattern Recogn"},{"issue":"1","key":"1623_CR49","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1186\/s40537-018-0122-y","volume":"5","author":"P Fr\u00e4nti","year":"2018","unstructured":"Fr\u00e4nti P (2018) Efficiency of random swap clustering. J Big Data 5(1):13","journal-title":"J Big Data"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-021-01623-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-021-01623-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-021-01623-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T16:32:07Z","timestamp":1643301127000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-021-01623-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,4]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1]]}},"alternative-id":["1623"],"URL":"https:\/\/doi.org\/10.1007\/s10115-021-01623-y","relation":{},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"value":"0219-1377","type":"print"},{"value":"0219-3116","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,4]]},"assertion":[{"value":"13 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 November 2021","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 November 2021","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 December 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declared that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}