{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:02:14Z","timestamp":1750309334385,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,8,24]],"date-time":"2024-08-24T00:00:00Z","timestamp":1724457600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,8,25]]},"DOI":"10.1145\/3637528.3671904","type":"proceedings-article","created":{"date-parts":[[2024,8,25]],"date-time":"2024-08-25T04:54:55Z","timestamp":1724561695000},"page":"2212-2223","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Scaling Training Data with Lossy Image Compression"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7710-1841","authenticated-orcid":false,"given":"Katherine L","family":"Mentzer","sequence":"first","affiliation":[{"name":"Granica, Mountain View, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0267-8574","authenticated-orcid":false,"given":"Andrea","family":"Montanari","sequence":"additional","affiliation":[{"name":"Granica, Mountain View, CA, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,8,24]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"[n. d.]. JPEG An Overview of JPEG 1. https:\/\/jpeg.org\/jpeg\/index.html. Accessed: 2024-06-04."},{"volume-title":"Portable Network Graphics","key":"e_1_3_2_2_2_1","unstructured":"[n. d.]. PNG, Portable Network Graphics. http:\/\/libpng.org\/pub\/png\/. 
Accessed: 2024-06-04."},{"key":"e_1_3_2_2_3_1","first-page":"22300","article-title":"Revisiting neural scaling laws in language and vision","volume":"35","author":"Alabdulmohsin Ibrahim M","year":"2022","unstructured":"Ibrahim M Alabdulmohsin, Behnam Neyshabur, and Xiaohua Zhai. 2022. Revisiting neural scaling laws in language and vision. Advances in Neural Information Processing Systems 35 (2022), 22300--22312.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_4_1","volume-title":"Sami Boukortt, Martin Bruse, Iulia-Maria Com\u0219a, Moritz Firsching, Thomas Fischbacher, Evgenii Kliuchnikov, Sebastian Gomez, Robert Obryk, et al.","author":"Alakuijala Jyrki","year":"2019","unstructured":"Jyrki Alakuijala, Ruud Van Asseldonk, Sami Boukortt, Martin Bruse, Iulia-Maria Com\u0219a, Moritz Firsching, Thomas Fischbacher, Evgenii Kliuchnikov, Sebastian Gomez, Robert Obryk, et al. 2019. JPEG XL next-generation image compression architecture and coding tools. In Applications of Digital Image Processing XLII, Vol. 11137. SPIE, 112--124."},{"key":"e_1_3_2_2_5_1","volume-title":"Explaining neural scaling laws. arXiv preprint arXiv:2102.06701","author":"Bahri Yasaman","year":"2021","unstructured":"Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, and Utkarsh Sharma. 2021. Explaining neural scaling laws. arXiv preprint arXiv:2102.06701 (2021)."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10599-4_29"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.21037\/atm.2020.02.44"},{"key":"e_1_3_2_2_8_1","volume-title":"Dimension free ridge regression. arXiv:2210.08571","author":"Cheng Chen","year":"2022","unstructured":"Chen Cheng and Andrea Montanari. 2022. Dimension free ridge regression. 
arXiv:2210.08571 (2022)."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.350"},{"key":"e_1_3_2_2_10_1","volume-title":"Keeping the bad guys out: Protecting and vaccinating deep learning with jpeg compression. arXiv preprint arXiv:1705.02900","author":"Das Nilaksh","year":"2017","unstructured":"Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Li Chen, Michael E Kounavis, and Duen Horng Chau. 2017. Keeping the bad guys out: Protecting and vaccinating deep learning with jpeg compression. arXiv preprint arXiv:1705.02900 (2017)."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Ingrid Daubechies. 1992. Ten lectures on wavelets. SIAM.","DOI":"10.1137\/1.9781611970104"},{"volume-title":"Understanding how image quality affects deep neural networks. In 2016 eighth international conference on quality of multimedia experience (QoMEX)","author":"Dodge Samuel","key":"e_1_3_2_2_12_1","unstructured":"Samuel Dodge and Lina Karam. 2016. Understanding how image quality affects deep neural networks. In 2016 eighth international conference on quality of multimedia experience (QoMEX). IEEE, 1--6."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.720544"},{"key":"e_1_3_2_2_14_1","volume-title":"A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv:1608.00853","author":"Dziugaite Gintare Karolina","year":"2016","unstructured":"Gintare Karolina Dziugaite, Zoubin Ghahramani, and Daniel M Roy. 2016. A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv:1608.00853 (2016)."},{"key":"e_1_3_2_2_15_1","volume-title":"Mask R-CNN. 2017 IEEE International Conference on Computer Vision (ICCV) (Oct","author":"He Kaiming","year":"2017","unstructured":"Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. 2017. Mask R-CNN. 
2017 IEEE International Conference on Computer Vision (ICCV) (Oct 2017)."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_17_1","unstructured":"Tom Henighan Jared Kaplan Mor Katz Mark Chen Christopher Hesse Jacob Jackson Heewoo Jun Tom B Brown Prafulla Dhariwal Scott Gray et al. 2020. Scaling laws for autoregressive generative modeling. arXiv preprint arXiv:2010.14701 (2020)."},{"key":"e_1_3_2_2_18_1","volume-title":"Scaling laws for transfer. arXiv preprint arXiv:2102.01293","author":"Hernandez Danny","year":"2021","unstructured":"Danny Hernandez, Jared Kaplan, Tom Henighan, and Sam McCandlish. 2021. Scaling laws for transfer. arXiv preprint arXiv:2102.01293 (2021)."},{"key":"e_1_3_2_2_19_1","volume-title":"Yang Yang, and Yanqi Zhou.","author":"Hestness Joel","year":"2017","unstructured":"Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md Mostofa Ali Patwary, Yang Yang, and Yanqi Zhou. 2017. Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409 (2017)."},{"key":"e_1_3_2_2_20_1","volume-title":"Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al.","author":"Hoffmann Jordan","year":"2022","unstructured":"Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. 2022. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556 (2022)."},{"key":"e_1_3_2_2_21_1","volume-title":"Scaling laws for neural language models. arXiv preprint arXiv:2001.08361","author":"Kaplan Jared","year":"2020","unstructured":"Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. 
arXiv preprint arXiv:2001.08361 (2020)."},{"key":"e_1_3_2_2_22_1","volume-title":"Proceedings, Part V 13","author":"Lin Tsung-Yi","year":"2014","unstructured":"Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In Computer Vision--ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6--12, 2014, Proceedings, Part V 13. Springer, 740--755."},{"volume-title":"A wavelet tour of signal processing","author":"Mallat St\u00e9phane","key":"e_1_3_2_2_23_1","unstructured":"St\u00e9phane Mallat. 1999. A wavelet tour of signal processing. Elsevier."},{"key":"e_1_3_2_2_24_1","volume-title":"Aleksandra Piktus, Nouamane Tazi, Sampo Pyysalo, Thomas Wolf, and Colin Raffel.","author":"Muennighoff Niklas","year":"2023","unstructured":"Niklas Muennighoff, Alexander M Rush, Boaz Barak, Teven Le Scao, Aleksandra Piktus, Nouamane Tazi, Sampo Pyysalo, Thomas Wolf, and Colin Raffel. 2023. Scaling Data-Constrained Language Models. arXiv preprint arXiv:2305.16264 (2023)."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_2_26_1","volume-title":"International Conference on Learning Representations.","author":"Rosenfeld Jonathan S","year":"2019","unstructured":"Jonathan S Rosenfeld, Amir Rosenfeld, Yonatan Belinkov, and Nir Shavit. 2019. A Constructive Prediction of the Generalization Error Across Scales. In International Conference on Learning Representations."},{"key":"e_1_3_2_2_27_1","first-page":"25278","article-title":"Laion-5b: An open large-scale dataset for training next generation image-text models","volume":"35","author":"Schuhmann Christoph","year":"2022","unstructured":"Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. 2022. 
Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems 35 (2022), 25278--25294.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_28_1","volume-title":"International Conference on Learning Representations.","author":"Tay Yi","year":"2021","unstructured":"Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, and Donald Metzler. 2021. Scale Efficiently: Insights from Pretraining and Finetuning Transformers. In International Conference on Learning Representations."},{"key":"e_1_3_2_2_29_1","volume-title":"International conference on learning representations.","author":"Theis Lucas","year":"2022","unstructured":"Lucas Theis, Wenzhe Shi, Andrew Cunningham, and Ferenc Husz\u00e1r. 2022. Lossy image compression with compressive autoencoders. In International conference on learning representations."},{"key":"e_1_3_2_2_30_1","first-page":"1","article-title":"Benign overfitting in ridge regression","volume":"24","author":"Tsigler Alexander","year":"2023","unstructured":"Alexander Tsigler and Peter L Bartlett. 2023. Benign overfitting in ridge regression. Journal of Machine Learning Research 24, 123 (2023), 1--76.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_2_31_1","unstructured":"Vasilis Vryniotis. 2021. How to Train State-Of-The-Art Models Using TorchVision's Latest Primitives. https:\/\/pytorch.org\/blog\/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives\/. Accessed: 2024-01-29."},{"key":"e_1_3_2_2_32_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 28--37","author":"Zamir Syed Waqas","year":"2019","unstructured":"Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, and Xiang Bai. 2019. 
iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 28--37."},{"key":"e_1_3_2_2_33_1","volume-title":"SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. CoRR abs\/2105.15203","author":"Xie Enze","year":"2021","unstructured":"Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo. 2021. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. CoRR abs\/2105.15203 (2021). arXiv:2105.15203 https:\/\/arxiv.org\/abs\/2105.15203"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.3390\/e23070881"}],"event":{"name":"KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"],"location":"Barcelona Spain","acronym":"KDD '24"},"container-title":["Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data 
Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3637528.3671904","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3637528.3671904","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:04:15Z","timestamp":1750291455000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3637528.3671904"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,24]]},"references-count":34,"alternative-id":["10.1145\/3637528.3671904","10.1145\/3637528"],"URL":"https:\/\/doi.org\/10.1145\/3637528.3671904","relation":{},"subject":[],"published":{"date-parts":[[2024,8,24]]},"assertion":[{"value":"2024-08-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}