{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T20:42:01Z","timestamp":1769632921997,"version":"3.49.0"},"reference-count":30,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,3,6]],"date-time":"2019-03-06T00:00:00Z","timestamp":1551830400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["Grant No. 61402205"],"award-info":[{"award-number":["Grant No. 61402205"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["Grant No. 2015M571688"],"award-info":[{"award-number":["Grant No. 2015M571688"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Postgraduate Research &amp; Practice Innovation Program of Jiangsu Province","award":["Grant No. KYCX18_2258"],"award-info":[{"award-number":["Grant No. KYCX18_2258"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Clustering is to group data so that the observations in the same group are more similar to each other than to those in other groups. k-means is a popular clustering algorithm in data mining. Its objective is to optimize the mean squared error (MSE). The traditional k-means algorithm is not suitable for applications where the sizes of clusters need to be balanced. Given n observations, our objective is to optimize the MSE under the constraint that the observations need to be evenly divided into k clusters. In this paper, we propose an iterative method for the task of clustering with balanced size constraints. Each iteration can be split into two steps, namely an assignment step and an update step. In the assignment step, the data are evenly assigned to each cluster. The balanced assignment task here is formulated as an integer linear program (ILP), and we prove that the constraint matrix of this ILP is totally unimodular. Thus the ILP is relaxed as a linear program (LP) which can be efficiently solved with the simplex algorithm. In the update step, the new centers are updated as the centroids of the observations in the clusters. Assuming that there are n observations and the algorithm needs m iterations to converge, we show that the average time complexity of the proposed algorithm is     O ( m  n  1.65   )    \u2013    O ( m  n  1.70   )    . Experimental results indicate that, comparing with state-of-the-art methods, the proposed algorithm is efficient in deriving more accurate clustering.<\/jats:p>","DOI":"10.3390\/sym11030338","type":"journal-article","created":{"date-parts":[[2019,3,7]],"date-time":"2019-03-07T10:52:22Z","timestamp":1551955942000},"page":"338","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Optimizing MSE for Clustering with Balanced Size Constraints"],"prefix":"10.3390","volume":"11","author":[{"given":"Wei","family":"Tang","sequence":"first","affiliation":[{"name":"School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lanling","family":"Zeng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongzhao","family":"Zhan","sequence":"additional","affiliation":[{"name":"School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1109\/TNN.2005.845141","article-title":"Survey of clustering algorithms","volume":"16","author":"Xu","year":"2005","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_2","unstructured":"Yang, Y., and Padmanabhan, B. (2003, January 19\u201322). Segmenting customer transactions using a pattern-based clustering approach. Proceedings of the International Conference on Data Mining, Melbourne, FL, USA."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1498","DOI":"10.1109\/JSEN.2012.2227704","article-title":"Load-Balanced Clustering Algorithm with Distributed Self-Organization for Wireless Sensor Networks","volume":"13","author":"Liao","year":"2013","journal-title":"IEEE Sens. J."},{"key":"ref_4","unstructured":"Hagen, L., and Kahng, A. (1991, January 11\u201314). Fast spectral methods for ratio cut partitioning and clustering. Proceedings of the IEEE International Conference on Computer-Aided Design, Santa Clara, CA, USA."},{"key":"ref_5","first-page":"185","article-title":"Document Clustering","volume":"38","author":"Issal","year":"2010","journal-title":"IEEE Swarm Intel. Symp."},{"key":"ref_6","unstructured":"Dengel, A., Althoff, T., and Ulges, A. (2008). Balanced Clustering for Content-Based Image Browsing. Gi-Informatiktage, 27\u201330."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1109\/TNN.2004.824416","article-title":"Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres","volume":"15","author":"Banerjee","year":"2004","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/s10589-008-9207-4","article-title":"Progress in the dual simplex algorithm for solving large scale LP problems: techniques for a fast and stable implementation","volume":"41","author":"Koberstein","year":"2008","journal-title":"Comput. Optim. Appl."},{"key":"ref_9","unstructured":"Malinen, M.I., and Fr\u00e4nti, P. (2014, January 20\u201322). Balanced k-means for Clustering. Proceedings of the Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), Joensuu, Finland."},{"key":"ref_10","first-page":"123","article-title":"Multivariate analysis","volume":"37","author":"Mardia","year":"1979","journal-title":"Math. Gazette"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"424","DOI":"10.1007\/s10618-016-0480-z","article-title":"Survey on using constraints in data mining","volume":"31","author":"Grossi","year":"2017","journal-title":"Data Mining Knowl. Discov."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1007\/s10618-006-0040-z","article-title":"Scalable Clustering Algorithms with Balancing Constraints","volume":"13","author":"Banerjee","year":"2006","journal-title":"Data Mining Knowl. Discov."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1007\/s11222-007-9033-z","article-title":"A tutorial on spectral clustering","volume":"17","author":"Luxburg","year":"2007","journal-title":"Stat. Comput."},{"key":"ref_14","unstructured":"Chen, Y., Zhang, Y., and Ji, X. (2005, January 5\u20138). Size Regularized Cut for Data Clustering. Proceedings of the Advances in Neural Information Processing Systems 18, Vancouver, BC, Canada."},{"key":"ref_15","first-page":"888","article-title":"Normalized cuts and image segmentation","volume":"22","author":"Shi","year":"1997","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1016\/j.patrec.2010.08.008","article-title":"Submodular fractional programming for balanced clustering","volume":"32","author":"Kawahara","year":"2011","journal-title":"Pattern Recognit. Lett."},{"key":"ref_17","unstructured":"Chang, X., Nie, F., Ma, Z., and Yang, Y. (2019, March 05). Balanced k-means and Min-Cut Clustering. Available online: https:\/\/arxiv.org\/abs\/1411.6235."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"883","DOI":"10.1016\/j.knosys.2010.06.003","article-title":"Data clustering with size constraints","volume":"23","author":"Zhu","year":"2010","journal-title":"Knowl.-Based Syst."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"He, R., Xu, W., Sun, J., and Zu, B. (2009, January 21\u201322). Balanced k-means Algorithm for Partitioning Areas in Large-Scale Vehicle Routing Problem. Proceedings of the International Symposium on Intelligent Information Technology Application, Nanchang, China.","DOI":"10.1109\/IITA.2009.307"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Nguyen, N.T., Tojo, S., Nguyen, L.M., and Trawi\u0144ski, B. (2017). Balanced k-means. Intelligent Information and Database Systems, Springer International Publishing.","DOI":"10.3233\/JIFS-169115"},{"key":"ref_21","unstructured":"Bennett, K., Bradley, P., and Demiriz, A. (2000). Constrained k-Means Clustering, Microsoft Research. Technical Report."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yuepeng, S., Min, L., and Cheng, W. (2011, January 6\u20137). A Modified k-means Algorithm for Clustering Problem with Balancing Constraints. Proceedings of the International Conference on Measuring Technology and Mechatronics Automation, Shanghai, China.","DOI":"10.1109\/ICMTMA.2011.37"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ganganath, N., Cheng, C.T., and Chi, K.T. (2014, January 13\u201315). Data Clustering with Cluster Size Constraints Using a Modified k-means Algorithm. Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, Shanghai, China.","DOI":"10.1109\/CyberC.2014.36"},{"key":"ref_24","unstructured":"Arthur, D., and Vassilvitskii, S. (2007, January 7\u20139). k-means++: The advantages of careful seeding. Proceedings of the Eighteenth ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA."},{"key":"ref_25","unstructured":"Papadimitriou, C.H., and Steiglitz, K. (1998). Combinatorial Optimization: Algorithms and Complexity, Prentice Hall."},{"key":"ref_26","unstructured":"Schrijver, A. (1986). Theory of Linear and Integer Programming, John Wiley & Sons, Inc."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1145\/990308.990310","article-title":"Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time","volume":"51","author":"Spielman","year":"2004","journal-title":"J. ACM"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Borgwardt, K.H. (1987). The Simplex Method: A Probabilistic Analysis, Springer Science & Business Media.","DOI":"10.1007\/978-3-642-61578-8"},{"key":"ref_29","unstructured":"Fang, S.C., and Puthenpura, S. (1993). Linear Optimization and Extensions: Theory and Algorithms, Prentice-Hall."},{"key":"ref_30","unstructured":"Dheeru, D., and Taniskidou, E.K. (2019). UCI Machine Learning Repository, University of California."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/3\/338\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:36:49Z","timestamp":1760186209000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/3\/338"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,6]]},"references-count":30,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["sym11030338"],"URL":"https:\/\/doi.org\/10.3390\/sym11030338","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,3,6]]}}}