{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T10:52:57Z","timestamp":1772103177108,"version":"3.50.1"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,11,12]],"date-time":"2021-11-12T00:00:00Z","timestamp":1636675200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2020AAA0106300"],"award-info":[{"award-number":["2020AAA0106300"]}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["61872215 and 61771273"],"award-info":[{"award-number":["61872215 and 61771273"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Shenzhen Science and Technology Program","award":["RCYX20200714114523079"],"award-info":[{"award-number":["RCYX20200714114523079"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2021,11,30]]},"abstract":"<jats:p>\n            With the growth of computer vision-based applications, an explosive amount of images have been uploaded to cloud servers that host such online computer vision algorithms, usually in the form of deep learning models. JPEG has been used as the\n            <jats:italic>de facto<\/jats:italic>\n            compression and encapsulation method for images. 
However, standard JPEG configuration does not always perform well for compressing images that are to be processed by a deep learning model\u2014for example, the standard quality level of JPEG leads to 50% of size overhead (compared with the best quality level selection) on ImageNet under the same inference accuracy in popular computer vision models (e.g., InceptionNet and ResNet). Knowing this, designing a better JPEG configuration for online computer vision-based services is still extremely challenging. First, cloud-based computer vision models are usually a black box to end-users; thus, it is challenging to design JPEG configuration without knowing their model structures. Second, the \u201coptimal\u201d JPEG configuration is not fixed; instead, it is determined by confounding factors, including the characteristics of the input images and the model, the expected accuracy and image size, and so forth. In this article, we propose a reinforcement learning (RL)-based adaptive JPEG configuration framework, AdaCompress. In particular, we design an edge (i.e., user-side) RL agent that learns the optimal compression quality level to achieve an expected inference accuracy and upload image size, only from the online inference results, without knowing details of the model structures. Furthermore, we design an\n            <jats:italic>explore-exploit<\/jats:italic>\n            mechanism to let the framework fast switch an agent when it detects a performance degradation, mainly due to the input change (e.g., images captured across daytime and night). Our evaluation experiments using real-world online computer vision-based APIs from Amazon Rekognition, Face++, and Baidu Vision show that our approach outperforms existing baselines by reducing the size of images by one-half to one-third while the overall classification accuracy only decreases slightly. 
Meanwhile, AdaCompress adaptively re-trains or re-loads the RL agent promptly to maintain the performance.\n          <\/jats:p>","DOI":"10.1145\/3447878","type":"journal-article","created":{"date-parts":[[2021,11,12]],"date-time":"2021-11-12T21:16:06Z","timestamp":1636751766000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Adaptive Compression for Online Computer Vision: An Edge Reinforcement Learning Approach"],"prefix":"10.1145","volume":"17","author":[{"given":"Zhaoliang","family":"He","sequence":"first","affiliation":[{"name":"Tsinghua University and Peng Cheng Laboratory, Shenzhen, China"}]},{"given":"Hongshan","family":"Li","sequence":"additional","affiliation":[{"name":"Tsinghua University, Shenzhen, China"}]},{"given":"Zhi","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University and Peng Cheng Laboratory, Shenzhen, China"}]},{"given":"Shutao","family":"Xia","sequence":"additional","affiliation":[{"name":"Tsinghua University, Shenzhen, China"}]},{"given":"Wenwu","family":"Zhu","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2021,11,12]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24702-1_11"},{"key":"e_1_3_2_3_2","unstructured":"Amazon. 2019. Amazon Rekognition. Retrieved September 15 2021 from https:\/\/aws.amazon.com\/rekognition\/."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178146"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3005348"},{"key":"e_1_3_2_6_2","unstructured":"Baidu. 2019. Baidu AI Open Platform. Retrieved September 15 2021 from https:\/\/ai.baidu.com\/."},{"key":"e_1_3_2_7_2","unstructured":"Johannes Ball\u00e9 David Minnen Saurabh Singh Sung Jin Hwang and Nick Johnston. 2018. 
Variational image compression with a scale hyperprior. arXiv:1802.01436."},{"key":"e_1_3_2_8_2","unstructured":"Shumeet Baluja David Marwood and Nicholas Johnston. 2019. Task-specific color spaces and compression for machine-based object recognition. Technical Disclosure Commons March 21 2019."},{"key":"e_1_3_2_9_2","unstructured":"M. Calore. 2010. Meet WebP Google\u2019s New Image Format. Wired October 1 2010."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2019.00066"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2013.6738345"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2011.6116299"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00796"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/11552499_16"},{"key":"e_1_3_2_15_2","first-page":"6","article-title":"10918-1. Digital compression and coding of continuous-tone still images (JPEG)","volume":"81","author":"DIS ISO","year":"1991","unstructured":"ISO DIS. 1991. 10918-1. Digital compression and coding of continuous-tone still images (JPEG). CCITT Recommendation T 81 (1991), 6.","journal-title":"CCITT Recommendation T"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/QoMEX.2016.7498955"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3194554.3194565"},{"key":"e_1_3_2_18_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Evtimov Ivan","year":"2018","unstructured":"Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. 2018. Robust physical-world attacks on deep learning models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_19_2","unstructured":"Face++. 2019. Face++ Cognitive Services. 
Retrieved September 15 2021 from https:\/\/www.faceplusplus.com\/."},{"key":"e_1_3_2_20_2","unstructured":"FLIR. 2018. FLIR Thermal Dataset. Retrieved September 15 2021 from https:\/\/www.flir.com\/oem\/adas\/adas-dataset-form\/."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.9"},{"key":"e_1_3_2_22_2","unstructured":"Yunchao Gong Liu Liu Ming Yang and Lubomir Bourdev. 2014. Compressing deep convolutional networks using vector quantization. arXiv:1412.6115."},{"key":"e_1_3_2_23_2","unstructured":"Google Inc. 2019. Google Edge TPU. Retrieved September 15 2021 from https:\/\/cloud.google.com\/edge-tpu\/."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.5555\/3327144.3327308"},{"key":"e_1_3_2_25_2","unstructured":"Song Han Huizi Mao and William J. Dally. 2015. A deep neural network compression pipeline: Pruning quantization Huffman encoding. arXiv:1510.00149."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969366"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/2906388.2906396"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2016.2607178"},{"key":"e_1_3_2_30_2","unstructured":"Yun Chao Hu Milan Patel Dario Sabella Nurit Sprecher and Valerie Young. 2015. Mobile Edge Computing\u2014A Key Technology Towards 5G. White Paper No. 11. ETSI."},{"key":"e_1_3_2_31_2","article-title":"Huawei Atlas 500 Edge Station","year":"2019","unstructured":"Huawei. 2019. Huawei Atlas 500 Edge Station. 
Retrieved September 15, 2021 from https:\/\/e.huawei.com\/en\/products\/cloud-computing-dc\/servers\/g-series\/atlas-500.","journal-title":"https:\/\/e.huawei.com\/en\/products\/cloud-computing-dc\/servers\/g-series\/atlas-500"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3081333.3081360"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/SiPS.2014.6986082"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093337.3037698"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999257"},{"key":"e_1_3_2_36_2","unstructured":"Jooyoung Lee Seunghyun Cho and Seung-Kwon Beack. 2018. Context-adaptive entropy model for end-to-end optimized image compression. arXiv:1809.10452."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350874"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/PADSW.2018.8645013"},{"key":"e_1_3_2_39_2","unstructured":"Python Imaging Library. 2019. Image File Formats. Retrieved September 15 2021 from https:\/\/pillow.readthedocs.io\/en\/stable\/."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3195970.3196022"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.5555\/3327546.3327736"},{"key":"e_1_3_2_42_2","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra and Martin Riedmiller. 2013. Playing Atari with deep reinforcement learning. arXiv:1312.5602."},{"key":"e_1_3_2_43_2","volume-title":"Proceedings of the Picture Coding Symposium","volume":"2018","author":"Ohm Jens-Rainer","year":"2018","unstructured":"Jens-Rainer Ohm and Gary J. Sullivan. 2018. Versatile video coding\u2013towards the next generation of video compression. In Proceedings of the Picture Coding Symposium, Vol. 2018."},{"key":"e_1_3_2_44_2","unstructured":"Image-Net.org. 2012. ImageNet Large Scale Visual Recognition Challenge 2012. 
Retrieved September 15 2021 from http:\/\/image-net.org\/challenges\/LSVRC\/2012\/."},{"key":"e_1_3_2_45_2","article-title":"Raspberry Pi 4 Model B","author":"Pi Raspberry","year":"2019","unstructured":"Raspberry Pi. 2019. Raspberry Pi 4 Model B. Retrieved September 15, 2021 from https:\/\/www.raspberrypi.org\/products\/raspberry-pi-4-model-b\/.","journal-title":"https:\/\/www.raspberrypi.org\/products\/raspberry-pi-4-model-b\/"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1117\/1.1469618"},{"key":"e_1_3_2_47_2","article-title":"Improving Language Understanding by Generative Pre-training","author":"Radford Alec","year":"2018","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-training. Retrieved September 15, 2021 from https:\/\/www.cs.ubc.ca\/~amuham01\/LING530\/papers\/radford2018improving.pdf.","journal-title":"https:\/\/www.cs.ubc.ca\/~amuham01\/LING530\/papers\/radford2018improving.pdf"},{"key":"e_1_3_2_48_2","unstructured":"R. Flynn. 2019. Lossy Image Optimization. Retrieved September 15 2021 from https:\/\/github.com\/rflynn\/imgmin."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.5555\/3305890.3305983"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCCN.2019.2936193"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2017.9"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"e_1_3_2_55_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556."},{"key":"e_1_3_2_56_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. 
arXiv:1409.1556."},{"key":"e_1_3_2_57_2","article-title":"Speedtest Global Index","year":"2019","unstructured":"SpeedTest. 2019. Speedtest Global Index. Retrieved September 15, 2021 from https:\/\/www.speedtest.net\/global-index.","journal-title":"https:\/\/www.speedtest.net\/global-index"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2012.2221191"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.5555\/3312046"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_61_2","unstructured":"Lucas Theis Wenzhe Shi Andrew Cunningham and Ferenc Husz\u00e1r. 2017. Lossy image compression with compressive autoencoders. arXiv:1703.00395."},{"key":"e_1_3_2_62_2","unstructured":"George Toderici Sean M. O\u2019Malley Sung Jin Hwang Damien Vincent David Minnen Shumeet Baluja Michele Covell and Rahul Sukthankar. 2015. Variable rate image compression with recurrent neural networks. arXiv:1511.06085."},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.577"},{"key":"e_1_3_2_64_2","unstructured":"Robert Torfason Fabian Mentzer Eirikur Agustsson Michael Tschannen Radu Timofte and Luc Van Gool. 2018. Towards image understanding from deep compression without decoding. 
arXiv:1803.06131."},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/30.125072"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155373"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413968"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2886017"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CISS.2018.8362276"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447878","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447878","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:28Z","timestamp":1750268968000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447878"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,12]]},"references-count":68,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,11,30]]}},"alternative-id":["10.1145\/3447878"],"URL":"https:\/\/doi.org\/10.1145\/3447878","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,12]]},"assertion":[{"value":"2020-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}