{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T20:10:43Z","timestamp":1767211843415,"version":"build-2065373602"},"reference-count":37,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2017,9,6]],"date-time":"2017-09-06T00:00:00Z","timestamp":1504656000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>In this work we study the distributed representations learnt by generative neural network models. In particular, we investigate the properties of redundant and synergistic information that groups of hidden neurons contain about the target variable. To this end, we use an emerging branch of information theory called partial information decomposition (PID) and track the informational properties of the neurons through training. We find two differentiated phases during the training process: a first short phase in which the neurons learn redundant information about the target, and a second phase in which neurons start specialising and each of them learns unique information about the target. 
We also find that in smaller networks individual neurons learn more specific information about certain features of the input, suggesting that learning pressure can encourage disentangled representations.<\/jats:p>","DOI":"10.3390\/e19090474","type":"journal-article","created":{"date-parts":[[2017,9,6]],"date-time":"2017-09-06T11:23:34Z","timestamp":1504697014000},"page":"474","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":54,"title":["The Partial Information Decomposition of Generative Neural Network Models"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1450-3077","authenticated-orcid":false,"given":"Tycho","family":"Tax","sequence":"first","affiliation":[{"name":"Corti, N\u00f8rrebrogade 45E 2, 2200 Copenhagen N, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1789-5894","authenticated-orcid":false,"given":"Pedro","family":"Mediano","sequence":"additional","affiliation":[{"name":"Department of Computing, Imperial College London, London SW7 2RH, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5984-2964","authenticated-orcid":false,"given":"Murray","family":"Shanahan","sequence":"additional","affiliation":[{"name":"Department of Computing, Imperial College London, London SW7 2RH, UK"}]}],"member":"1968","published-online":{"date-parts":[[2017,9,6]]},"reference":[{"key":"ref_1","first-page":"1929","article-title":"Dropout: A Simple Way to Prevent Neural Networks from Overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_2","unstructured":"Gal, Y., and Ghahramani, Z. (2015). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. arXiv."},{"key":"ref_3","unstructured":"Bengio, Y., Courville, A., and Vincent, P. (2012). Representation Learning: A Review and New Perspectives. 
arXiv."},{"key":"ref_4","unstructured":"Higgins, I., Matthey, L., Glorot, X., Pal, A., Uria, B., Blundell, C., Mohamed, S., and Lerchner, A. (2016). Early Visual Concept Learning with Unsupervised Deep Learning. arXiv."},{"key":"ref_5","unstructured":"Mathieu, M., Zhao, J., Sprechmann, P., Ramesh, A., and LeCun, Y. (2016). Disentangling Factors of Variation in Deep Representations Using Adversarial Training. arXiv."},{"key":"ref_6","unstructured":"Siddharth, N., Paige, B., Van de Meent, J.W., Desmaison, A., Wood, F., Goodman, N.D., Kohli, P., and Torr, P.H.S. (2017). Learning Disentangled Representations with Semi-Supervised Deep Generative Models. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lake, B.M., Ullman, T.D., Tenenbaum, J.B., and Gershman, S.J. (2016). Building Machines That Learn and Think Like People. arXiv.","DOI":"10.1017\/S0140525X16001837"},{"key":"ref_8","unstructured":"Garnelo, M., Arulkumaran, K., and Shanahan, M. (2016). Towards Deep Symbolic Reinforcement Learning. arXiv."},{"key":"ref_9","unstructured":"Williams, P.L., and Beer, R.D. (2010). Nonnegative Decomposition of Multivariate Information. arXiv."},{"key":"ref_10","unstructured":"Rieke, F., Bialek, W., Warland, D., and de Ruyter van Steveninck, R. (1997). Spikes: Exploring the Neural Code, MIT Press."},{"key":"ref_11","unstructured":"Le, Q.V., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G.S., Dean, J., and Ng, A.Y. (2011). Building High-Level Features Using Large Scale Unsupervised Learning. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zeiler, M.D., and Fergus, R. (2014, January 6\u201312). Visualizing and Understanding Convolutional Networks. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"ref_13","unstructured":"Choromanska, A., Henaff, M., Mathieu, M., Arous, G.B., and LeCun, Y. (2014). The Loss Surfaces of Multilayer Networks. 
arXiv."},{"key":"ref_14","unstructured":"Kawaguchi, K. (2016). Deep Learning Without Poor Local Minima. arXiv."},{"key":"ref_15","unstructured":"S\u00f8rng\u00e5rd, B. (2014). Information Theory for Analyzing Neural Networks. [Master\u2019s Thesis, Norwegian University of Science and Technology]."},{"key":"ref_16","unstructured":"Schwartz-Ziv, R., and Tishby, N. (2017). Opening the Black Box of Deep Neural Networks via Information. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Achille, A., and Soatto, S. (2017). On the Emergence of Invariance and Disentangling in Deep Representations. arXiv.","DOI":"10.1109\/ITA.2018.8503149"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tishby, N., and Zaslavsky, N. (2015). Deep Learning and the Information Bottleneck Principle. arXiv.","DOI":"10.1109\/ITW.2015.7133169"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.neunet.2014.09.004","article-title":"Measuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information","volume":"64","author":"Berglund","year":"2015","journal-title":"Neural Netw."},{"key":"ref_20","unstructured":"Balduzzi, D., Frean, M., Leary, L., Lewis, J., Ma, K.W.D., and McWilliams, B. (2017). The Shattered Gradients Problem: If Resnets are the Answer, Then What is the Question?. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Hinton, G.E., and van Camp, D. (1993, January 26\u201328). Keeping the Neural Networks Simple by Minimizing the Description Length of the Weights. Proceedings of the Sixth Annual Conference on Computational Learning Theory (COLT), Santa Cruz, CA, USA.","DOI":"10.1145\/168304.168306"},{"key":"ref_22","unstructured":"Smolensky, P. (1986). Information Processing in Dynamical Systems: Foundations of Harmony Theory, MIT Press. Technical Report, DTIC Document."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Larochelle, H., and Bengio, Y. (2008, January 5\u20139). 
Classification Using Discriminative Restricted Boltzmann Machines. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.","DOI":"10.1145\/1390156.1390224"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1527","DOI":"10.1162\/neco.2006.18.7.1527","article-title":"A Fast Learning Algorithm for Deep Belief Nets","volume":"18","author":"Hinton","year":"2006","journal-title":"Neural Comput."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tieleman, T. (2008, January 5\u20139). Training Restricted Boltzmann Machines Using Approximations to the Likelihood Gradient. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.","DOI":"10.1145\/1390156.1390290"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Cover, T.M., and Thomas, J.A. (2006). Elements of Information Theory, Wiley.","DOI":"10.1002\/047174882X"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1088\/0954-898X_10_4_303","article-title":"How to Measure the Information Gained from one Symbol","volume":"12","author":"DeWeese","year":"1999","journal-title":"Netw. Comput. Neural Syst."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ince, R.A.A. (2017). Measuring Multivariate Redundant Information with Pointwise Common Change in Surprisal. Entropy, 19.","DOI":"10.3390\/e19070318"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"4644","DOI":"10.3390\/e17074644","article-title":"Quantifying Redundant Information in Predicting a Target Random Variable","volume":"17","author":"Griffith","year":"2015","journal-title":"Entropy"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Harder, M., Salge, C., and Polani, D. (2013). Bivariate Measure of Redundant Information. Phys. Rev. E, 87.","DOI":"10.1103\/PhysRevE.87.012130"},{"key":"ref_31","unstructured":"Gilbert, T., Kirkilionis, M., and Nicolis, G. 
Shared Information\u2014New Insights and Problems in Decomposing Information in Complex Systems. Proceedings of the European Conference on Complex Systems 2012."},{"key":"ref_32","unstructured":"Williams, P.L. (2011). Information Dynamics: Its Theory and Application to Embodied Cognitive Systems. [Ph.D. Thesis, Indiana University]."},{"key":"ref_33","unstructured":"Lizier, J.T. (2010). The Local Information Dynamics of Distributed Computation in Complex Systems, Springer."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1007\/s10827-013-0458-4","article-title":"Synergy, Redundancy, and Multivariate Information Measures: An Experimentalist\u2019s Perspective","volume":"36","author":"Timme","year":"2014","journal-title":"J. Comput. Neurosci."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2161","DOI":"10.3390\/e16042161","article-title":"Quantifying Unique Information","volume":"16","author":"Bertschinger","year":"2014","journal-title":"Entropy"},{"key":"ref_36","first-page":"2405","article-title":"Geometry and Expressive Power of Conditional Restricted Boltzmann Machines","volume":"16","author":"Ay","year":"2015","journal-title":"J. Mach. Learn. Res."},{"key":"ref_37","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-Encoding Variational Bayes. 
arXiv."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/19\/9\/474\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:44:14Z","timestamp":1760208254000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/19\/9\/474"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,9,6]]},"references-count":37,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2017,9]]}},"alternative-id":["e19090474"],"URL":"https:\/\/doi.org\/10.3390\/e19090474","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2017,9,6]]}}}