{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,8]],"date-time":"2025-07-08T11:33:44Z","timestamp":1751974424466},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"14","license":[{"start":{"date-parts":[[2023,5,23]],"date-time":"2023-05-23T00:00:00Z","timestamp":1684800000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,23]],"date-time":"2023-05-23T00:00:00Z","timestamp":1684800000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Soft Comput"],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper presents a new way of selecting an initialisation for the <jats:inline-formula><jats:alternatives><jats:tex-math>$$k$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>k<\/mml:mi>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>-modes algorithm that allows for a notion of game theoretic fairness that classic initialisations, namely those by Huang and Cao, do not. Our new method utilises the hospital-resident assignment problem to find the set of initial cluster centroids which we compare with two classical initialisation methods for <jats:inline-formula><jats:alternatives><jats:tex-math>$$k$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>k<\/mml:mi>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>-modes: the original presented by Huang and the next most popular method of Cao and co-authors. To highlight the merits of our proposed method, two stages of analysis are presented. It is demonstrated that the proposed method is often able to offer computational speed-up of the order of <jats:inline-formula><jats:alternatives><jats:tex-math>$$50\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mn>50<\/mml:mn>\n                    <mml:mo>%<\/mml:mo>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>. Improved clustering, in terms of a commonly used cost-function, was witnessed in several cases and can be of the order of <jats:inline-formula><jats:alternatives><jats:tex-math>$$10\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mn>10<\/mml:mn>\n                    <mml:mo>%<\/mml:mo>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>, particularly for more complex datasets.<\/jats:p>","DOI":"10.1007\/s00500-023-08407-2","type":"journal-article","created":{"date-parts":[[2023,5,23]],"date-time":"2023-05-23T15:11:38Z","timestamp":1684854698000},"page":"9441-9457","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["A novel initialisation based on hospital-resident assignment for the $$k$$-modes algorithm"],"prefix":"10.1007","volume":"27","author":[{"given":"Jonathan","family":"Gillard","sequence":"first","affiliation":[]},{"given":"Vincent","family":"Knight","sequence":"additional","affiliation":[]},{"given":"Henry","family":"Wilde","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,23]]},"reference":[{"key":"8407_CR1","unstructured":"Arthur D, Vassilvitskii S (2007) $$k$$-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA \u201907, pp 1027\u20131035"},{"key":"8407_CR2","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1016\/S0167-9473(00)00046-3","volume":"36","author":"DM Bashtannyk","year":"2001","unstructured":"Bashtannyk DM, Hyndman RJ (2001) Bandwidth selection for kernel conditional density estimation. Comput Stat Data Anal 36:279\u2013298","journal-title":"Comput Stat Data Anal"},{"key":"8407_CR3","doi-asserted-by":"publisher","first-page":"10223","DOI":"10.1016\/j.eswa.2009.01.060","volume":"36","author":"F Cao","year":"2009","unstructured":"Cao F, Liang J, Bai L (2009) A new initialization method for categorical data clustering. Expert Syst Appl 36:10223\u201310228","journal-title":"Expert Syst Appl"},{"key":"8407_CR4","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1016\/j.knosys.2011.07.011","volume":"26","author":"F Cao","year":"2012","unstructured":"Cao F, Liang J, Li D, Bai L, Dang C (2012) A dissimilarity measure for the $$k$$-modes clustering algorithm. Knowl Based Syst 26:120\u2013127","journal-title":"Knowl Based Syst"},{"key":"8407_CR5","unstructured":"Dua D, Graff C (2017) UCI Machine Learning Repository"},{"key":"8407_CR6","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1016\/j.jet.2017.07.002","volume":"171","author":"A Erdil","year":"2017","unstructured":"Erdil A, Ergin H (2017) Two-sided matching with indifferences. J Econ Theory 171:268\u2013292","journal-title":"J Econ Theory"},{"key":"8407_CR7","doi-asserted-by":"crossref","unstructured":"Fuku T, Namatame A, Kaizoji T (2006) Collective efficiency in two-sided matching, pp 115\u2013126","DOI":"10.1007\/3-540-28547-4_10"},{"issue":"1","key":"8407_CR8","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1080\/00029890.1962.11989827","volume":"69","author":"D Gale","year":"1962","unstructured":"Gale D, Shapley L (1962) College admissions and the stability of marriage. Am Math Mon 69(1):9\u201315","journal-title":"Am Math Mon"},{"key":"8407_CR9","unstructured":"Huang Z (1997a) Clustering large data sets with mixed numeric and categorical values. In: The first Pacific-Asia conference on knowledge discovery and data mining, pp 21\u201334"},{"key":"8407_CR10","unstructured":"Huang Z (1997b) A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proceedings of the SIGMOD workshop on research issues on data mining and knowledge discovery, pp 1\u20138"},{"issue":"3","key":"8407_CR11","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1023\/A:1009769707641","volume":"2","author":"Z Huang","year":"1998","unstructured":"Huang Z (1998) Extensions to the $$k$$-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283\u2013304","journal-title":"Data Min Knowl Discov"},{"key":"8407_CR12","first-page":"2071","volume-title":"Stable marriage with ties and incomplete lists","author":"K Iwama","year":"2016","unstructured":"Iwama K, Miyazaki S (2016) Stable marriage with ties and incomplete lists. Springer, New York, pp 2071\u20132075"},{"key":"8407_CR13","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1016\/j.ins.2015.11.005","volume":"332","author":"F Jiang","year":"2016","unstructured":"Jiang F, Liu G, Junwei D, Sui Y (2016) Initialization of $$k$$-modes clustering using outlier detection techniques. Inf Sci 332:167\u2013183","journal-title":"Inf Sci"},{"key":"8407_CR14","first-page":"115","volume-title":"Principal component analysis and factor analysis","author":"IT Jolliffe","year":"1986","unstructured":"Jolliffe IT (1986) Principal component analysis and factor analysis. Springer, New York, pp 115\u2013128"},{"key":"8407_CR15","doi-asserted-by":"crossref","unstructured":"Kwanashie A, Irving RW, Manlove DF, Sng CTS (2015) Profile-based optimal matchings in the student\/project allocation problem. In: Combinatorial algorithms, pp 213\u2013225","DOI":"10.1007\/978-3-319-19315-1_19"},{"issue":"1","key":"8407_CR16","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1016\/S0304-3975(01)00206-7","volume":"276","author":"DF Manlove","year":"2002","unstructured":"Manlove DF, Irving RW, Iwama K, Miyazaki S, Morita Y (2002) Hard variants of stable marriage. Theor Comput Sci 276(1):261\u2013279","journal-title":"Theor Comput Sci"},{"key":"8407_CR17","doi-asserted-by":"crossref","unstructured":"M\u00e9moli F (2011) Metric structures on datasets: stability and classification of algorithms. In: Computer analysis of images and patterns. Springer, Berlin, pp 1\u201333","DOI":"10.1007\/978-3-642-23678-5_1"},{"issue":"3","key":"8407_CR18","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1109\/TPAMI.2007.53","volume":"29","author":"K Ng Michael","year":"2007","unstructured":"Ng Michael K, Junjie LM, Zhexue HJ, Zengyou H (2007) On the impact of dissimilarity measure in $$k$$-modes clustering algorithm. IEEE Trans Pattern Anal Mach Intell 29(3):503\u2013507","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"8407_CR19","doi-asserted-by":"crossref","unstructured":"Olaode A, Naghdy G, Todd C (2014) Unsupervised image classification by probabilistic latent semantic analysis for the annotation of images. In: International conference on digital image computing: techniques and applications","DOI":"10.1109\/DICTA.2014.7008133"},{"issue":"6","key":"8407_CR20","doi-asserted-by":"publisher","first-page":"991","DOI":"10.1086\/261272","volume":"92","author":"A Roth","year":"1984","unstructured":"Roth A (1984) The evolution of the labor market for medical interns and residents: a case study in game theory. J Polit Econ 92(6):991\u20131016","journal-title":"J Polit Econ"},{"key":"8407_CR21","doi-asserted-by":"crossref","unstructured":"Satopaa V, Albrecht J, Irwin D, Raghavan B (2011) Finding a \u2018kneedle\u2019 in a haystack: detecting knee points in system behavior. In: Proceedings of the 2011 31st international conference on distributed computing systems workshops, pp 166\u2013171, 07","DOI":"10.1109\/ICDCSW.2011.20"},{"key":"8407_CR22","doi-asserted-by":"crossref","unstructured":"Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27\u201364","DOI":"10.1016\/j.cosrev.2007.05.001"},{"issue":"17","key":"8407_CR23","first-page":"1","volume":"127","author":"N Sharma","year":"2015","unstructured":"Sharma N, Gaud N (2015) $$k$$-modes clustering algorithm for categorical data. Int J Comput Appl 127(17):1\u20136","journal-title":"Int J Comput Appl"},{"key":"8407_CR24","doi-asserted-by":"crossref","unstructured":"The matching library developers (2019) Matching: v1.1","DOI":"10.1155\/2019\/8398356"},{"key":"8407_CR25","doi-asserted-by":"publisher","first-page":"1172","DOI":"10.1007\/s10489-019-01592-4","volume":"50","author":"H Wilde","year":"2019","unstructured":"Wilde H, Knight V, Gillard J (2019) Evolutionary dataset optimisation: learning algorithm quality through evolution. Appl Intell 50:1172\u20131191","journal-title":"Appl Intell"}],"container-title":["Soft Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00500-023-08407-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00500-023-08407-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00500-023-08407-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,6]],"date-time":"2023-06-06T17:54:09Z","timestamp":1686074049000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00500-023-08407-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,23]]},"references-count":25,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["8407"],"URL":"https:\/\/doi.org\/10.1007\/s00500-023-08407-2","relation":{},"ISSN":["1432-7643","1433-7479"],"issn-type":[{"value":"1432-7643","type":"print"},{"value":"1433-7479","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,23]]},"assertion":[{"value":"1 May 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 May 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This article does not contain any studies with human participants or animals performed by any of the authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}}]}}