{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T03:19:56Z","timestamp":1768274396689,"version":"3.49.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,12,25]],"date-time":"2024-12-25T00:00:00Z","timestamp":1735084800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62371310 and 62172400"],"award-info":[{"award-number":["62371310 and 62172400"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100021171","name":"Guangdong Basic and Applied Basic Research Foundation","doi-asserted-by":"crossref","award":["2023A1515011236"],"award-info":[{"award-number":["2023A1515011236"]}],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Stable Support Project of Shenzhen","award":["20231122122722001"],"award-info":[{"award-number":["20231122122722001"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,2,28]]},"abstract":"<jats:p>\n            RGB-D data, being homogeneous cross-modal data, demonstrates significant correlations among data elements. However, current research focuses only on a uni-directional pattern of cross-modal contextual information, neglecting the exploration of bi-directional relationships in the compression field. Thus, we propose a joint RGB-D compression scheme, which is combined with Bi-Directional Cross-Modal Prior Transfer (Bi-CPT) modules and a Bi-Directional Cross-Modal Enhanced Entropy (Bi-CEE) model. The Bi-CPT module is designed for compact representations of cross-modal features, effectively eliminating spatial and modality redundancies at different granularity levels. In contrast to the traditional entropy models, our proposed Bi-CEE model not only achieves spatial-channel contextual adaptation through partitioning RGB and depth features but also incorporates information from other modalities as prior to enhance the accuracy of probability estimation for latent variables. Furthermore, this model enables parallel multi-stage processing to accelerate coding. Experimental results demonstrate the superiority of our proposed framework over the current compression scheme, outperforming both rate-distortion performance and downstream tasks, including surface reconstruction and semantic segmentation. The source code will be available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/xyy7\/Learning-based-RGB-D-Image-Compression\">https:\/\/github.com\/xyy7\/Learning-based-RGB-D-Image-Compression<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3702997","type":"journal-article","created":{"date-parts":[[2024,11,5]],"date-time":"2024-11-05T16:38:18Z","timestamp":1730824698000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["RGB-D Data Compression via Bi-Directional Cross-Modal Prior Transfer and Enhanced Entropy Modeling"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-0060-1558","authenticated-orcid":false,"given":"Yuyu","family":"Xu","sequence":"first","affiliation":[{"name":"Shenzhen University, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4188-1572","authenticated-orcid":false,"given":"Pingping","family":"Zhang","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8478-1802","authenticated-orcid":false,"given":"Minghui","family":"Chen","sequence":"additional","affiliation":[{"name":"Shenzhen University, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6067-8188","authenticated-orcid":false,"given":"Qiudan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shenzhen University, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0416-7719","authenticated-orcid":false,"given":"Wenhui","family":"Wu","sequence":"additional","affiliation":[{"name":"Shenzhen University, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9457-7801","authenticated-orcid":false,"given":"Yun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Sun Yat-Sen University, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2948-6468","authenticated-orcid":false,"given":"Xu","family":"Wang","sequence":"additional","affiliation":[{"name":"Shenzhen University, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,12,25]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"1","volume-title":"2016 Picture Coding Symposium","author":"Ball\u00e9 Johannes","year":"2016","unstructured":"Johannes Ball\u00e9, Valero Laparra, and Eero P. Simoncelli. 2016. End-to-End Optimization of Nonlinear Transform Codes for Perceptual Quality. In 2016 Picture Coding Symposium, 1\u20135."},{"key":"e_1_3_1_3_2","first-page":"1","volume-title":"5th International Conference on Learning Representations","author":"Ball\u00e9 Johannes","year":"2017","unstructured":"Johannes Ball\u00e9, Valero Laparra, and Eero P. Simoncelli. 2017. End-to-End Optimized Image Compression. In 5th International Conference on Learning Representations, 1\u201327."},{"key":"e_1_3_1_4_2","first-page":"1","volume-title":"6th International Conference on Learning Representations","author":"Ball\u00e9 Johannes","year":"2018","unstructured":"Johannes Ball\u00e9, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. 2018. Variational Image Compression with a Scale Hyperprior. In 6th International Conference on Learning Representations, 1\u201310."},{"key":"e_1_3_1_5_2","unstructured":"Fabrice Bellard. 2015. Better Portable Graphics. Retrieved from https:\/\/bellard.org\/bpg\/"},{"key":"e_1_3_1_6_2","first-page":"3206","volume-title":"2022 IEEE International Conference on Image Processing","author":"Chen Minghui","year":"2022","unstructured":"Minghui Chen, Pingping Zhang, Zhuo Chen, Yun Zhang, Xu Wang, and Sam Kwong. 2022. End-to-End Depth Map Compression Framework via RGB-to-Depth Structure Priors Learning. In 2022 IEEE International Conference on Image Processing, 3206\u20133210."},{"key":"e_1_3_1_7_2","first-page":"7936","volume-title":"2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Cheng Zhengxue","year":"2020","unstructured":"Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto. 2020. Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 7936\u20137945."},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1145\/357744.357757","volume-title":"the ACM Multimedia 2000 Workshops","author":"Christopoulos Charilaos A.","year":"2000","unstructured":"Charilaos A. Christopoulos, Touradj Ebrahimi, and Athanassios N. Skodras. 2000. JPEG2000: The New Still Picture Compression Standard. In the ACM Multimedia 2000 Workshops, 45\u201349."},{"key":"e_1_3_1_9_2","first-page":"29","volume-title":"8th International Conference on Signal Image Technology and Internet Based Systems","author":"Farrugia Reuben A.","year":"2012","unstructured":"Reuben A. Farrugia. 2012. Efficient Depth Image Compression Using Accurate Depth Discontinuity Detection and Prediction. In 8th International Conference on Signal Image Technology and Internet Based Systems, 29\u201335."},{"key":"e_1_3_1_10_2","first-page":"1","volume-title":"2015 Picture Coding Symposium","author":"F\u00f6rster Emmy-Charlotte","year":"2015","unstructured":"Emmy-Charlotte F\u00f6rster, Thomas L\u00f6we, Stephan Wenger, and Marcus A. Magnor. 2015. RGB-Guided Depth Map Compression via Compressed Sensing and Sparse Coding. In 2015 Picture Coding Symposium, 1\u20134."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2247584"},{"key":"e_1_3_1_12_2","first-page":"1916","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition Workshops","author":"Gao Yixin","year":"2021","unstructured":"Yixin Gao, Yaojun Wu, Zongyu Guo, Zhizheng Zhang, and Zhibo Chen. 2021. Perceptual Friendly Variable Rate Image Compression. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 1916\u20131920."},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1109\/PCS.2012.6213291","article-title":"Efficient Depth Map Compression Based on Lossless Edge Coding and Diffusion","author":"Gautier Josselin","year":"2012","unstructured":"Josselin Gautier, Olivier Le Meur, and Christine Guillemot. 2012. Efficient Depth Map Compression Based on Lossless Edge Coding and Diffusion. In 2012 Picture Coding Symposium, 81\u201384.","journal-title":"2012 Picture Coding Symposium"},{"key":"e_1_3_1_14_2","first-page":"5718","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"He Dailan","year":"2022","unstructured":"Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang. 2022. ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 5718\u20135727."},{"key":"e_1_3_1_15_2","first-page":"14771","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"He Dailan","year":"2021","unstructured":"Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, and Hongwei Qin. 2021. Checkerboard Context Model for Efficient Learned Image Compression. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 14771\u201314780."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_1_17_2","first-page":"11013","volume-title":"the 34th AAAI Conference on Artificial Intelligence","author":"Hu Yueyu","year":"2020","unstructured":"Yueyu Hu, Wenhan Yang, and Jiaying Liu. 2020. Coarse-to-Fine Hyper-Prior Modeling for Learned Image Compression. In the 34th AAAI Conference on Artificial Intelligence, 11013\u201311020."},{"key":"e_1_3_1_18_2","unstructured":"Wei Jiang and Ronggang Wang. 2023. MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression. In ICML 2023 Workshop Neural Compression: From Information Theory to Applications 1\u20136. Retrieved from https:\/\/openreview.net\/forum?id=hxIpcSoz2t"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611694"},{"key":"e_1_3_1_20_2","unstructured":"Joint Video Experts Team (JVET). 2021. VVC Official Test Model VTM. Retrieved from https:\/\/vcgit.hhi.fraunhofer.de\/jvet\/VVCSoftware_VTM\/-\/tree\/VTM-12.1"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3124978"},{"key":"e_1_3_1_22_2","first-page":"4230","volume-title":"29th ACM International Conference on Multimedia","author":"Li Jiguo","year":"2021","unstructured":"Jiguo Li, Chuanmin Jia, Xinfeng Zhang, Siwei Ma, and Wen Gao. 2021. Cross Modal Compression: Towards Human-Comprehensible Semantic Compression. In 29th ACM International Conference on Multimedia, 4230\u20134238."},{"key":"e_1_3_1_23_2","first-page":"2356","volume-title":"2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Liu Jie","year":"2020","unstructured":"Jie Liu, Wenjie Zhang, Yuting Tang, Jie Tang, and Gangshan Wu. 2020. Residual Feature Aggregation Network for Image Super-Resolution. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2356\u20132365."},{"key":"e_1_3_1_24_2","first-page":"6073","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Lu Guo","year":"2022","unstructured":"Guo Lu, Tianxiong Zhong, Jing Geng, Qiang Hu, and Dong Xu. 2022. Learning Based Multi-Modality Image and Video Compression. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 6073\u20136082."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2910119"},{"key":"e_1_3_1_26_2","first-page":"10794","article-title":"Joint Autoregressive and Hierarchical Priors for Learned Image Compression","author":"Minnen David","year":"2018","unstructured":"David Minnen, Johannes Ball\u00e9, and George Toderici. 2018. Joint Autoregressive and Hierarchical Priors for Learned Image Compression. In Advances in Neural Information Processing Systems, 10794\u201310803.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_27_2","first-page":"59","volume-title":"Joint Virtual Reality Conference of EGVE (JVRC \u201911)","author":"Pece Fabrizio","year":"2011","unstructured":"Fabrizio Pece, Jan Kautz, and Tim Weyrich. 2011. Adapting Standard Video Codecs for Depth Streaming. In Joint Virtual Reality Conference of EGVE (JVRC \u201911), 59\u201366."},{"key":"e_1_3_1_28_2","first-page":"2386","volume-title":"2022 IEEE International Conference on Image Processing","author":"Peng Bo","year":"2022","unstructured":"Bo Peng, Yuying Jing, Dengchao Jin, Xiangrui Liu, Zhaoqing Pan, and Jianjun Lei. 2022. Texture-Guided End-to-End Depth Map Compression. In 2022 IEEE International Conference on Image Processing, 2386\u20132390."},{"key":"e_1_3_1_29_2","first-page":"7132","volume-title":"2021 IEEE International Conference on Robotics and Automation","author":"Seichter Daniel","year":"2021","unstructured":"Daniel Seichter, Mona K\u00f6hler, Benjamin Lewandowski, Tim Wengefeld, and Horst-Michael Gross. 2021. Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis. In 2021 IEEE International Conference on Robotics and Automation, 7132\u20137141."},{"key":"e_1_3_1_30_2","first-page":"746","article-title":"Indoor Segmentation and Support Inference from RGBD Images","author":"Silberman Nathan","year":"2012","unstructured":"Nathan Silberman, Pushmeet Kohli, Derek Hoiem, and Rob Fergus. 2012. Indoor Segmentation and Support Inference from RGBD Images. In European Conference on Computer Vision, 746\u2013760.","journal-title":"European Conference on Computer Vision"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298655"},{"key":"e_1_3_1_32_2","first-page":"1","volume-title":"International Conference on Learning Representations","author":"Theis Lucas","year":"2017","unstructured":"Lucas Theis, Wenzhe Shi, Andrew Cunningham, and Ferenc Husz\u00e1r. 2017. Lossy Image Compression with Compressive Autoencoders. In International Conference on Learning Representations, 1\u201319."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/30.125072"},{"key":"e_1_3_1_34_2","first-page":"651","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"W\u00f6dlinger Matthias","year":"2022","unstructured":"Matthias W\u00f6dlinger, Jan Kotera, Jan Xu, and Robert Sablatnig. 2022. SASIC: Stereo Image Compression with Latent Shifts and Stereo Attention. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 651\u2013660."},{"key":"e_1_3_1_35_2","first-page":"489","volume-title":"2022 Data Compression Conference","author":"Wu Yuyang","year":"2022","unstructured":"Yuyang Wu and Wei Gao. 2022. End-to-End Lossless Compression of High Precision Depth Maps Guided by Pseudo-Residual. In 2022 Data Compression Conference, 489\u2013489."},{"key":"e_1_3_1_36_2","first-page":"1926","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wu Yuyang","year":"2021","unstructured":"Yuyang Wu, Zhiyang Qi, Huiming Zheng, Lvfang Tao, and Wei Gao. 2021. Deep Image Compression with Latent Optimization and Piece-Wise Quantization Approximation. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 1926\u20131930."},{"issue":"198","key":"e_1_3_1_37_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3650034","article-title":"Perceptual Quality-Oriented Rate Allocation via Distillation from End-to-End Image Compression","volume":"20","author":"Yang Runyu","year":"2024","unstructured":"Runyu Yang, Dong Liu, Siwei Ma, Feng Wu, and Wen Gao. 2024. Perceptual Quality-Oriented Rate Allocation via Distillation from End-to-End Image Compression. ACM Transactions on Multimedia Computing, Communications and Applications 20, 198 (2024), 1\u201322.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3241225"},{"issue":"1","key":"e_1_3_1_39_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3580499","article-title":"A Universal Optimization Framework for Learning-Based Image Codec","volume":"20","author":"Zhao Jing","year":"2023","unstructured":"Jing Zhao, Bin Li, Jiahao Li, Ruiqin Xiong, and Yan Lu. 2023. A Universal Optimization Framework for Learning-Based Image Codec. ACM Transactions on Multimedia Computing, Communications and Applications 20, 1 (2023), 1\u201319.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548314"},{"key":"e_1_3_1_41_2","first-page":"7562","volume-title":"AAAI Conference on Artificial Intelligence","author":"Zheng Huiming","year":"2024","unstructured":"Huiming Zheng and Wei Gao. 2024. End-to-End RGB-D Image Compression via Exploiting Channel-Modality Redundancy. In AAAI Conference on Artificial Intelligence, 7562\u20137570."},{"key":"e_1_3_1_42_2","first-page":"17492","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zou Renjie","year":"2022","unstructured":"Renjie Zou, Chunfeng Song, and Zhaoxiang Zhang. 2022. The Devil Is in the Details: Window-Based Attention for Image Compression. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 17492\u201317501."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3702997","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3702997","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:10:18Z","timestamp":1750295418000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3702997"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,25]]},"references-count":41,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2,28]]}},"alternative-id":["10.1145\/3702997"],"URL":"https:\/\/doi.org\/10.1145\/3702997","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,25]]},"assertion":[{"value":"2024-05-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}