{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:14:30Z","timestamp":1750220070956,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"2s","license":[{"start":{"date-parts":[[2023,2,17]],"date-time":"2023-02-17T00:00:00Z","timestamp":1676592000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:p>Image enhancement has stimulated significant research works over the past years for its great application potential in video conferencing scenarios. Nevertheless, most existing image enhancement approaches are still struggling to find a good tradeoff that reduces the computational cost as much as possible while maintaining plausible result quality. Recently, curve-based mapping methods are proposed and have shown great potential for real-time and high-quality image enhancement of arbitrary resolutions. In this article, we take advantage of the curve-based mapping representation and focus on further improving the enhancement quality and robustness, while minimizing additional computational costs. Specifically, we (1) carefully re-formulate the curve function to improve learning stability, and (2) aggregate different semantic attention into the curve regression process, which can overcome the major problems of curve-based methods that generate moderate results with low contrast. The semantic attention is jointly learned with the supervision from class activation mapping of pre-trained feature extractors, thus reducing the manual annotation cost of semantic labels. Experiments have shown that our proposed method significantly improves curve-based methods both qualitatively and quantitatively, achieving visually plausible results compared with other deep neural network-based enhancement methods, and maintains a very low computational cost, i.e., taking 18.7 ms for a 360p image on a single P40 GPU. Extensive experiments demonstrate that our method is also capable of video enhancement tasks.<\/jats:p>","DOI":"10.1145\/3564607","type":"journal-article","created":{"date-parts":[[2022,9,26]],"date-time":"2022-09-26T12:54:37Z","timestamp":1664196877000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Real-time Image Enhancement with Attention Aggregation"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4743-9657","authenticated-orcid":false,"given":"Qiqi","family":"Gao","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, Harbin, Heilongjiang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9359-3586","authenticated-orcid":false,"given":"Jie","family":"Li","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, Heilongjiang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4659-4935","authenticated-orcid":false,"given":"Tiejun","family":"Zhao","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, Heilongjiang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9627-4492","authenticated-orcid":false,"given":"Yadong","family":"Wang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, Heilongjiang, China"}]}],"member":"320","published-online":{"date-parts":[[2023,2,17]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"265","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916). 265\u2013283."},{"key":"e_1_3_2_3_2","volume-title":"The Theory of Splines and Their Applications: Mathematics in Science and Engineering: A Series of Monographs and Textbooks","author":"Ahlberg J. Harold","year":"2016","unstructured":"J. Harold Ahlberg, Edwin Norman Nilson, and Joseph Leonard Walsh. 2016. The Theory of Splines and Their Applications: Mathematics in Science and Engineering: A Series of Monographs and Textbooks. Elsevier."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2591009"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2002.1008390"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818107"},{"key":"e_1_3_2_7_2","first-page":"97","volume-title":"Proceedings of the CVPR 2011","author":"Bychkovsky Vladimir","year":"2011","unstructured":"Vladimir Bychkovsky, Sylvain Paris, Eric Chan, and Fr\u00e9do Durand. 2011. Learning photographic global tonal adjustment with a database of input\/output image pairs. In Proceedings of the CVPR 2011. IEEE, 97\u2013104."},{"key":"e_1_3_2_8_2","first-page":"4778","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Caballero Jose","year":"2017","unstructured":"Jose Caballero, Christian Ledig, Andrew Aitken, Alejandro Acosta, Johannes Totz, Zehan Wang, and Wenzhe Shi. 2017. Real-time video super-resolution with spatio-temporal networks and motion compensation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4778\u20134787."},{"key":"e_1_3_2_9_2","first-page":"1105","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Chen Dongdong","year":"2017","unstructured":"Dongdong Chen, Jing Liao, Lu Yuan, Nenghai Yu, and Gang Hua. 2017. Coherent online video style transfer. In Proceedings of the IEEE International Conference on Computer Vision. 1105\u20131114."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073592"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2461912.2461997"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3181974"},{"key":"e_1_3_2_15_2","first-page":"235","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Huang Yan","year":"2015","unstructured":"Yan Huang, Wei Wang, and Liang Wang. 2015. Bidirectional recurrent convolutional networks for multi-frame super-resolution. In Proceedings of the Advances in Neural Information Processing Systems. 235\u2013243."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350994"},{"key":"e_1_3_2_17_2","unstructured":"Forrest N. Iandola Song Han Matthew W. Moskewicz Khalid Ashraf William J. Dally and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size. arXiv:1602.07360. Retrieved from https:\/\/arxiv.org\/abs\/1602.07360."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.632"},{"key":"e_1_3_2_19_2","first-page":"694","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Johnson Justin","year":"2016","unstructured":"Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision. Springer, 694\u2013711."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/3305381.3305564"},{"key":"e_1_3_2_21_2","first-page":"374","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Kim Han-Ul","year":"2020","unstructured":"Han-Ul Kim, Young Jun Koh, and Chang-Su Kim. 2020. PieNet: Personalized image enhancement network. In Proceedings of the European Conference on Computer Vision. Springer, 374\u2013390."},{"key":"e_1_3_2_22_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915) San Diego CA USA May 7-9 2015 Conference Track Proceedings."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/1275808.1276497"},{"key":"e_1_3_2_24_2","first-page":"170","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Lai Wei-Sheng","year":"2018","unstructured":"Wei-Sheng Lai, Jia-Bin Huang, Oliver Wang, Eli Shechtman, Ersin Yumer, and Ming-Hsuan Yang. 2018. Learning blind video temporal consistency. In Proceedings of the European Conference on Computer Vision. 170\u2013185."},{"key":"e_1_3_2_25_2","first-page":"560","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Liu Sifei","year":"2016","unstructured":"Sifei Liu, Jinshan Pan, and Ming-Hsuan Yang. 2016. Learning recursive filters for low-level vision via a hybrid neural network. In Proceedings of the European Conference on Computer Vision. Springer, 560\u2013576."},{"issue":"3","key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3446619","article-title":"Video decolorization based on the CNN and LSTM neural network","volume":"17","author":"Liu Shiguang","year":"2021","unstructured":"Shiguang Liu, Huixin Wang, and Xiaoli Zhang. 2021. Video decolorization based on the CNN and LSTM neural network. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 3 (2021), 1\u201318.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_2_27_2","first-page":"12826","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Moran Sean","year":"2020","unstructured":"Sean Moran, Pierre Marza, Steven McDonagh, Sarah Parisot, and Gregory Slabaugh. 2020. DeepLPF: Deep local parametric filters for image enhancement. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 12826\u201312835."},{"key":"e_1_3_2_28_2","first-page":"5928","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Park Jongchan","year":"2018","unstructured":"Jongchan Park, Joon-Young Lee, Donggeun Yoo, and In So Kweon. 2018. Distort-and-recover: Color enhancement using deep reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5928\u20135936."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.262"},{"key":"e_1_3_2_30_2","unstructured":"Manuel Rebol and Patrick Kn\u00f6belreiter. 2020. Frame-to-frame consistent semantic segmentation. In Proceedings of the Joint Austrian Computer Vision And Robotics Workshop (ACVRW\u201920) ."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_33_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs\/1409.1556 (2015)."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.33"},{"key":"e_1_3_2_35_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT press."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.479"},{"issue":"4","key":"e_1_3_2_37_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2010324.1964959","article-title":"Example-based image color and tone style enhancement","volume":"30","author":"Wang Baoyuan","year":"2011","unstructured":"Baoyuan Wang, Yizhou Yu, and Ying-Qing Xu. 2011. Example-based image color and tone style enhancement. ACM Transactions on Graphics 30, 4 (2011), 1\u201312.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00701"},{"key":"e_1_3_2_39_2","first-page":"4111","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Wang Wei","year":"2019","unstructured":"Wei Wang, Xin Chen, Cheng Yang, Xiang Li, Xuemei Hu, and Tao Yue. 2019. Enhancing low light videos by exploring high sensitivity camera noise. In Proceedings of the IEEE International Conference on Computer Vision. 4111\u20134119."},{"key":"e_1_3_2_40_2","article-title":"CIELAB color space-Wikipedia, The Free Encyclopedia","author":"contributors Wikipedia","year":"2021","unstructured":"Wikipedia contributors. 2021. CIELAB color space-Wikipedia, The Free Encyclopedia. Retrieved February 27, 2021 from https:\/\/en.wikipedia.org\/w\/index.php?title=CIELAB_color _space&oldid=1008944203","journal-title":"Retrieved February 27, 2021 from https:\/\/en.wikipedia.org\/w\/index.php?title=CIELAB_color _space&oldid=1008944203"},{"key":"e_1_3_2_41_2","first-page":"327","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Xia Xide","year":"2020","unstructured":"Xide Xia, Meng Zhang, Tianfan Xue, Zheng Sun, Hui Fang, Brian Kulis, and Jiawen Chen. 2020. Joint bilateral learning for real-time universal photorealistic style transfer. In Proceedings of the European Conference on Computer Vision. Springer, 327\u2013342."},{"key":"e_1_3_2_42_2","first-page":"802","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Xingjian SHI","year":"2015","unstructured":"SHI Xingjian, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the Advances in Neural Information Processing Systems. 802\u2013810."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3424341"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2790296"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413951"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.319"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.244"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3564607","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3564607","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:11Z","timestamp":1750183751000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3564607"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,17]]},"references-count":46,"journal-issue":{"issue":"2s","published-print":{"date-parts":[[2023,6,30]]}},"alternative-id":["10.1145\/3564607"],"URL":"https:\/\/doi.org\/10.1145\/3564607","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2023,2,17]]},"assertion":[{"value":"2021-09-05","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-10","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}