{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T20:24:22Z","timestamp":1773951862027,"version":"3.50.1"},"reference-count":33,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2020,2,13]],"date-time":"2020-02-13T00:00:00Z","timestamp":1581552000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>In this paper, we develop an unsupervised generative clustering framework that combines the variational information bottleneck and the Gaussian mixture model. Specifically, in our approach, we use the variational information bottleneck method and model the latent space as a mixture of Gaussians. We derive a bound on the cost function of our model that generalizes the Evidence Lower Bound (ELBO) and provide a variational inference type algorithm that allows computing it. In the algorithm, the coders\u2019 mappings are parametrized using neural networks, and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on real datasets are provided to support the efficiency of our method.<\/jats:p>","DOI":"10.3390\/e22020213","type":"journal-article","created":{"date-parts":[[2020,2,18]],"date-time":"2020-02-18T10:10:25Z","timestamp":1582020625000},"page":"213","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Variational Information Bottleneck for Unsupervised Clustering: Deep Gaussian Mixture Embedding"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1835-6964","authenticated-orcid":false,"given":"Yi\u011fit","family":"U\u011fur","sequence":"first","affiliation":[{"name":"Laboratoire d\u2019informatique Gaspard-Monge, Universit\u00e9 Paris-Est, 77454 Champs-sur-Marne, France"},{"name":"Mathematical and Algorithmic Sciences Lab, Paris Research Center, Huawei Technologies, 92100 Boulogne-Billancourt, France"}]},{"given":"George","family":"Arvanitakis","sequence":"additional","affiliation":[{"name":"Mathematical and Algorithmic Sciences Lab, Paris Research Center, Huawei Technologies, 92100 Boulogne-Billancourt, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2023-9476","authenticated-orcid":false,"given":"Abdellatif","family":"Zaidi","sequence":"additional","affiliation":[{"name":"Laboratoire d\u2019informatique Gaspard-Monge, Universit\u00e9 Paris-Est, 77454 Champs-sur-Marne, France"}]}],"member":"1968","published-online":{"date-parts":[[2020,2,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Sculley, D. (2010, January 26\u201330). Web-scale K-means clustering. Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA.","DOI":"10.1145\/1772690.1772862"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1023\/A:1009769707641","article-title":"Extensions to the k-means algorithm for clustering large datasets with categorical values","volume":"2","author":"Huang","year":"1998","journal-title":"Data Min. Knowl. Disc."},{"key":"ref_3","first-page":"100","article-title":"Algorithm AS 136: A K-means clustering algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"J. R. Stat. 
Soc."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ding, C., and He, X. (2004, January 4\u20138). K-means clustering via principal component analysis. Proceedings of the 21st International Conference on Machine Learning, Banff, AB, Canada.","DOI":"10.1145\/1015330.1015408"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1080\/14786440109462720","article-title":"On lines and planes of closest fit to systems of points in space","volume":"2","author":"Pearson","year":"1901","journal-title":"Philos. Mag."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/0169-7439(87)80084-9","article-title":"Principal component analysis","volume":"2","author":"Wold","year":"1987","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_8","unstructured":"Roweis, S. (1997, January 1\u20136). EM algorithms for PCA and SPCA. Proceedings of the Advances in Neural Information Processing Systems 10, Denver, CO, USA."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1171","DOI":"10.1214\/009053607000000677","article-title":"Kernel methods in machine learning","volume":"36","author":"Hofmann","year":"2008","journal-title":"Ann. Stat."},{"key":"ref_10","unstructured":"Tishby, N., Pereira, F.C., and Bialek, W. (1999, January 22\u201324). The information bottleneck method. Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Slonim, N., and Tishby, N. (2000, January 24\u201328). Document clustering using word clusters via the information bottleneck method. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece.","DOI":"10.1145\/345508.345578"},{"key":"ref_12","unstructured":"Slonim, N. (2002). The Information Bottleneck: Theory and Applications. [Ph.D. Thesis, Hebrew University]."},{"key":"ref_13","unstructured":"Kingma, D.P., and Welling, M. (2014, January 14\u201316). Auto-encoding variational bayes. Proceedings of the 2nd International Conference on Learning Representations, Banff, AB, Canada."},{"key":"ref_14","unstructured":"Rezende, D.J., Mohamed, S., and Wierstra, D. (2014, January 21\u201326). Stochastic backpropagation and approximate inference in deep generative models. Proceedings of the 31st International Conference on Machine Learning, Beijing, China."},{"key":"ref_15","unstructured":"Alemi, A.A., Fischer, I., Dillon, J.V., and Murphy, K. (2017, January 24\u201326). Deep variational information bottleneck. Proceedings of the 5th International Conference on Learning Representations, Toulon, France."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_17","first-page":"361","article-title":"A new benchmark collection for text categorization research","volume":"5","author":"Lewis","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"ref_18","unstructured":"Coates, A., Ng, A., and Lee, H. (2011, January 11\u201313). 
An analysis of single-layer networks in unsupervised feature learning. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Jiang, Z., Zheng, Y., Tan, H., Tang, B., and Zhou, H. (2017, January 19\u201325). Variational deep embedding: An unsupervised and generative approach to clustering. Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia.","DOI":"10.24963\/ijcai.2017\/273"},{"key":"ref_20","unstructured":"Xie, J., Girshick, R., and Farhadi, A. (2016, January 19\u201324). Unsupervised deep embedding for clustering analysis. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Guo, X., Gao, L., Liu, X., and Yin, J. (2017, January 19\u201325). Improved deep embedded clustering with local structure preservation. Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia.","DOI":"10.24963\/ijcai.2017\/243"},{"key":"ref_22","unstructured":"Dilokthanakul, N., Mediano, P.A.M., Garnelo, M., Lee, M.C.H., Salimbeni, H., Arulkumaran, K., and Shanahani, M. (2017). Deep unsupervised clustering with Gaussian mixture variational autoencoders. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"39501","DOI":"10.1109\/ACCESS.2018.2855437","article-title":"A survey of clustering with deep learning: From the perspective of network architecture","volume":"6","author":"Min","year":"2018","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Hershey, J.R., and Olsen, P.A. (2007, January 15\u201320). Approximating the Kullback Leibler divergence between Gaussian mixture models. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Honolulu, HI, USA.","DOI":"10.1109\/ICASSP.2007.366913"},{"key":"ref_25","unstructured":"Kingma, D.P., and Ba, J. (2015, January 7\u20139). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1109\/TPAMI.2017.2784440","article-title":"Information dropout: Learning optimal representations through noisy computation","volume":"40","author":"Achille","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_28","first-page":"3371","article-title":"Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion","volume":"11","author":"Vincent","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_29","unstructured":"Schwartz-Ziv, R., and Tishby, N. (2017). Opening the black box of deep neural networks via information. arXiv."},{"key":"ref_30","first-page":"2579","article-title":"Visualizing Data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Estella-Aguerri, I., and Zaidi, A. (2020). 
Distributed variational representation learning. IEEE Trans. Pattern Anal. Mach. Intell., in press.","DOI":"10.1109\/TPAMI.2019.2928806"},{"key":"ref_32","unstructured":"Estella-Aguerri, I., and Zaidi, A. (2018, January 21\u201323). Distributed information bottleneck method for discrete and Gaussian sources. Proceedings of the International Zurich Seminar on Information and Communication, Z\u00fcrich, Switzerland."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zaidi, A., Estella-Aguerri, I., and Shamai (Shitz), S. (2020). On the information bottleneck problems: Models, connections, applications and information theoretic views. Entropy, 22.","DOI":"10.3390\/e22020151"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/2\/213\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T08:57:39Z","timestamp":1760173059000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/2\/213"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,13]]},"references-count":33,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2020,2]]}},"alternative-id":["e22020213"],"URL":"https:\/\/doi.org\/10.3390\/e22020213","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,13]]}}}
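
For readers who want a concrete picture of the method the abstract describes, below is a minimal, illustrative PyTorch sketch of a variational-information-bottleneck objective with a mixture-of-Gaussians prior on the latent space. It is not the authors' implementation: the class name VIBGMM, the layer sizes, the single-sample Monte Carlo estimate of the rate term, and the trade-off weight beta are all assumptions chosen for brevity; the actual bound, architectures, and training details are in the article itself (DOI: 10.3390/e22020213).

import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class VIBGMM(nn.Module):
    """Minimal VIB-style clustering sketch with a learnable GMM prior on the latent space."""

    def __init__(self, x_dim=784, z_dim=10, n_clusters=10, hidden=256):
        super().__init__()
        # Encoder q(z|x): a diagonal Gaussian whose mean/log-variance are NN outputs.
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        # Decoder p(x|z), used for the reconstruction (distortion) term.
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, x_dim))
        # Learnable mixture-of-Gaussians prior: weights, means, log-variances per cluster.
        self.pi_logits = nn.Parameter(torch.zeros(n_clusters))
        self.c_mu = nn.Parameter(torch.randn(n_clusters, z_dim) * 0.5)
        self.c_logvar = nn.Parameter(torch.zeros(n_clusters, z_dim))

    @staticmethod
    def log_gauss(z, mu, logvar):
        # log N(z; mu, diag(exp(logvar))), summed over latent dimensions.
        return -0.5 * (math.log(2 * math.pi) + logvar + (z - mu) ** 2 / logvar.exp()).sum(-1)

    def log_gmm_prior(self, z):
        # log sum_c pi_c N(z; mu_c, Sigma_c), computed stably with logsumexp.
        log_pi = F.log_softmax(self.pi_logits, dim=0)                    # (C,)
        comp = self.log_gauss(z.unsqueeze(1), self.c_mu, self.c_logvar)  # (B, C)
        return torch.logsumexp(log_pi + comp, dim=-1)                    # (B,)

    def loss(self, x, beta=1.0):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparametrization trick keeps the Monte Carlo sample differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Distortion: Bernoulli reconstruction loss (assumes inputs scaled to [0, 1]).
        distortion = F.binary_cross_entropy_with_logits(self.dec(z), x, reduction="none").sum(-1)
        # Rate: single-sample Monte Carlo estimate of KL(q(z|x) || GMM prior);
        # the KL to a mixture has no closed form, hence the sampling.
        rate = self.log_gauss(z, mu, logvar) - self.log_gmm_prior(z)
        return (distortion + beta * rate).mean()

    @torch.no_grad()
    def assign(self, x):
        # Hard cluster assignment: most responsible mixture component for E[z|x].
        z = self.mu(self.enc(x))
        log_pi = F.log_softmax(self.pi_logits, dim=0)
        comp = self.log_gauss(z.unsqueeze(1), self.c_mu, self.c_logvar)
        return (log_pi + comp).argmax(dim=-1)


if __name__ == "__main__":
    model = VIBGMM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # SGD-family optimizer, as in the abstract
    x = torch.rand(32, 784)  # stand-in batch; a real run would use e.g. flattened MNIST digits
    for _ in range(10):
        opt.zero_grad()
        model.loss(x, beta=1.0).backward()
        opt.step()
    print(model.assign(x)[:8])

The sketch mirrors the ingredients the abstract names: neural-network encoder mappings, a latent space modeled as a mixture of Gaussians, a sampled rather than closed-form rate term, and stochastic-gradient optimization. Cluster labels then fall out of the learned mixture as the argmax component responsibility, with no supervision used anywhere.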