{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:53:41Z","timestamp":1760241221862,"version":"build-2065373602"},"reference-count":32,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,12,19]],"date-time":"2019-12-19T00:00:00Z","timestamp":1576713600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100011730","name":"Templeton World Charity Foundation","doi-asserted-by":"publisher","award":["0322"],"award-info":[{"award-number":["0322"]}],"id":[{"id":"10.13039\/501100011730","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping     X \u2192 Z \u2261 f ( X )     that maximizes the mutual information     I ( Z , Y )     while the entropy     H ( Z )     is kept below some fixed threshold. We present a new method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class     Y \u2208 { 1 , \u2026 , n }     can be distilled into a vector     W = f  ( X )  \u2208  R  n \u2212 1       losslessly, so that     I ( W , Y ) = I ( X , Y )    ; for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the     n = 2     case of binary classification, we then show how W can be further compressed into a discrete variable     Z =  g \u03b2   ( W )  \u2208  { 1 , \u2026 ,  m \u03b2  }      by binning W into     m \u03b2     bins, in such a way that varying the parameter    \u03b2    sweeps out the full Pareto frontier, solving a generalization of the discrete information bottleneck (DIB) problem. We argue that the most interesting points on this frontier are \u201ccorners\u201d maximizing     I ( Z , Y )     for a fixed number of bins     m = 2 , 3 , \u2026     which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. 
We find that these Pareto frontiers are not concave, and that recently reported DIB phase transitions correspond to transitions between these corners, changing the number of clusters.<\/jats:p>","DOI":"10.3390\/e22010007","type":"journal-article","created":{"date-parts":[[2019,12,20]],"date-time":"2019-12-20T09:50:33Z","timestamp":1576835433000},"page":"7","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Pareto-Optimal Data Compression for Binary Classification Tasks"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7670-7190","authenticated-orcid":false,"given":"Max","family":"Tegmark","sequence":"first","affiliation":[{"name":"Department of Physics, MIT Kavli Institute &amp; Center for Brains, Minds &amp; Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]},{"given":"Tailin","family":"Wu","sequence":"additional","affiliation":[{"name":"Department of Physics, MIT Kavli Institute &amp; Center for Brains, Minds &amp; Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,12,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1080\/14786440109462720","article-title":"LIII. On lines and planes of closest fit to systems of points in space","volume":"2","author":"Pearson","year":"1901","journal-title":"Lond. Edinb. Dublin Philos. Mag. J. Sci."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.A. (2008, January 5\u20139). Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.","DOI":"10.1145\/1390156.1390294"},{"key":"ref_3","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014, January 8\u201313). Generative adversarial nets. Proceedings of the Neural Information Processing Systems 2014, Montreal, QC, Canada."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1093\/biomet\/28.3-4.321","article-title":"Relations between two sets of variates","volume":"28","author":"Hotelling","year":"1936","journal-title":"Biometrika"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/BF02288367","article-title":"The approximation of one matrix by another of lower rank","volume":"1","author":"Eckart","year":"1936","journal-title":"Psychometrika"},{"key":"ref_6","unstructured":"van den Oord, A., Li, Y., and Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv."},{"key":"ref_7","unstructured":"Clark, D.G., Livezey, J.A., and Bouchard, K.E. (2019). Unsupervised Discovery of Temporal Structure in Noisy Data with Dynamical Components Analysis. arXiv."},{"key":"ref_8","unstructured":"Tegmark, M. (2019). Optimal Latent Representations: Distilling Mutual Information into Principal Pairs. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"4544","DOI":"10.1109\/TIT.2014.2327016","article-title":"Quantization of binary-input discrete memoryless channels","volume":"60","author":"Kurkoski","year":"2014","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_10","unstructured":"Tishby, N., Pereira, F.C., and Bialek, W. (2000). The information bottleneck method.
arXiv."},{"key":"ref_11","unstructured":"Tan, A., Meshulam, L., Bialek, W., and Schwab, D. (2019, January 4\u20138). The renormalization group and information bottleneck: A unified framework. Proceedings of the APS Meeting Abstracts, Boston, MA, USA."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1611","DOI":"10.1162\/NECO_a_00961","article-title":"The deterministic information bottleneck","volume":"29","author":"Strouse","year":"2017","journal-title":"Neural Comput."},{"key":"ref_13","unstructured":"Alemi, A.A., Fischer, I., Dillon, J.V., and Murphy, K. (2016). Deep variational information bottleneck. arXiv."},{"key":"ref_14","unstructured":"Chalk, M., Marre, O., and Tkacik, G. (2016, January 5\u201310). Relevant sparse codes with variational information bottleneck. Proceedings of the Neural Information Processing Systems 2016, Barcelona, Spain."},{"key":"ref_15","unstructured":"Fischer, I. (2019, December 11). The Conditional Entropy Bottleneck. Available online: https:\/\/openreview.net\/forum?id=rkVOXhAqY7."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Amjad, R.A., and Geiger, B.C. (2019). Learning representations for neural network-based classification using the information bottleneck principle. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2019.2909031"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1007\/s00158-004-0465-1","article-title":"Adaptive weighted-sum method for bi-objective optimization: Pareto front generation","volume":"29","author":"Kim","year":"2005","journal-title":"Struct. Multidiscip. Optim."},{"key":"ref_18","unstructured":"Krizhevsky, A., Nair, V., and Hinton, G. (2019, December 11). The CIFAR-10 Dataset. Available online: https:\/\/www.cs.toronto.edu\/~kriz\/cifar.html."},{"key":"ref_19","unstructured":"LeCun, Y., Cortes, C., and Burges, C. (2019, December 11). MNIST Handwritten Digit Database. Available online: http:\/\/yann.lecun.com\/exdb\/mnist."},{"key":"ref_20","unstructured":"Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv."},{"key":"ref_21","unstructured":"Krizhevsky, A., and Hinton, G. (2009). Learning Multiple Layers of Features from Tiny Images, University of Toronto. Technical Report TR-2009."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"596","DOI":"10.1162\/neco_a_01136","article-title":"The information bottleneck and geometric clustering","volume":"31","author":"Strouse","year":"2019","journal-title":"Neural Comput."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wu, T., Fischer, I., Chuang, I., and Tegmark, M. (2019). Learnability for the information bottleneck. arXiv.","DOI":"10.3390\/e21100924"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wu, T., Fischer, I., Chuang, I., and Tegmark, M. (2019). Learnability for the information bottleneck. Entropy, 21.","DOI":"10.3390\/e21100924"},{"key":"ref_26","first-page":"031013","article-title":"Causal asymmetry in a quantum world","volume":"8","author":"Thompson","year":"2018","journal-title":"Phys. Rev. 
X"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"460","DOI":"10.1109\/TIT.1972.1054855","article-title":"Computation of channel capacity and rate-distortion functions","volume":"18","author":"Blahut","year":"1972","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/TIT.1972.1054753","article-title":"An algorithm for computing the capacity of arbitrary discrete memoryless channels","volume":"18","author":"Arimoto","year":"1972","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_29","first-page":"165","article-title":"Information bottleneck for Gaussian variables","volume":"6","author":"Chechik","year":"2005","journal-title":"J. Mach. Learn. Res."},{"key":"ref_30","unstructured":"Rezende, D.J., and Viola, F. (2018). Taming VAEs. arXiv."},{"key":"ref_31","first-page":"1947","article-title":"Emergence of invariance and disentanglement in deep representations","volume":"19","author":"Achille","year":"2018","journal-title":"J. Mach. Learn. Res."},{"key":"ref_32","unstructured":"Still, S. (2017). Thermodynamic cost and benefit of data representations. arXiv."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/1\/7\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:43:41Z","timestamp":1760190221000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/1\/7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,19]]},"references-count":32,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,1]]}},"alternative-id":["e22010007"],"URL":"https:\/\/doi.org\/10.3390\/e22010007","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2019,12,19]]}}}