{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T19:09:38Z","timestamp":1776280178014,"version":"3.50.1"},"reference-count":0,"publisher":"Sciedu Press","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AIR"],"abstract":"<jats:p>It is an important and challenging problem in unsupervised learning to estimate the number of clusters in a dataset. Knowing the number of clusters is a prerequisite for many commonly used clustering algorithms such as k-means. In this paper, we propose a novel diversity based approach to this problem. Specifically, we show that the difference between the global diversity of clusters and the sum of each cluster\u2019s local diversity of their members can be used as an effective indicator of the optimality of the number of clusters, where the diversity is measured by Rao\u2019s quadratic entropy. A notable advantage of our proposed method is that it encourages balanced clustering by taking into account both the sizes of clusters and the distances between clusters. In other words, it is less prone to very small \u201coutlier\u201d clusters than existing methods. 
Our extensive experiments on both synthetic and real-world datasets (with known ground-truth clustering) have demonstrated that our proposed method is robust for clusters of different sizes, variances, and shapes, and it is more accurate than existing methods (including elbow, Cali\u0144ski-Harabasz, silhouette, and gap-statistic) in terms of finding out the optimal number of clusters.<\/jats:p>","DOI":"10.5430\/air.v7n1p15","type":"journal-article","created":{"date-parts":[[2017,12,18]],"date-time":"2017-12-18T04:13:13Z","timestamp":1513570393000},"page":"15","source":"Crossref","is-referenced-by-count":42,"title":["Estimating the number of clusters using diversity"],"prefix":"10.5430","volume":"7","author":[{"given":"Suneel Kumar","family":"Kingrani","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mark","family":"Levene","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dell","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"3394","published-online":{"date-parts":[[2017,12,18]]},"container-title":["Artificial Intelligence 
Research"],"original-title":[],"link":[{"URL":"http:\/\/www.sciedupress.com\/journal\/index.php\/air\/article\/viewFile\/12360\/7881","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/www.sciedupress.com\/journal\/index.php\/air\/article\/viewFile\/12360\/7881","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2017,12,18]],"date-time":"2017-12-18T04:13:14Z","timestamp":1513570394000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.sciedupress.com\/journal\/index.php\/air\/article\/view\/12360"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,18]]},"references-count":0,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2017,12,15]]}},"URL":"https:\/\/doi.org\/10.5430\/air.v7n1p15","relation":{},"ISSN":["1927-6982","1927-6974"],"issn-type":[{"value":"1927-6982","type":"electronic"},{"value":"1927-6974","type":"print"}],"subject":[],"published":{"date-parts":[[2017,12,18]]}}}