{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T11:29:56Z","timestamp":1762342196687,"version":"build-2065373602"},"reference-count":55,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T00:00:00Z","timestamp":1700092800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Natural Science Foundation of Jilin Provincial Department of Science and Technology","award":["20210101412JC","U19A2063"],"award-info":[{"award-number":["20210101412JC","U19A2063"]}]},{"name":"National Natural Science Foundation","award":["20210101412JC","U19A2063"],"award-info":[{"award-number":["20210101412JC","U19A2063"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Monocular panoramic depth estimation has various applications in robotics and autonomous driving due to its ability to perceive the entire field of view. However, panoramic depth estimation faces two significant challenges: global context capturing and distortion awareness. In this paper, we propose a new framework for panoramic depth estimation that can simultaneously address panoramic distortion and extract global context information, thereby improving the performance of panoramic depth estimation. Specifically, we introduce an attention mechanism into the multi-scale dilated convolution and adaptively adjust the receptive field size between different spatial positions, designing the adaptive attention dilated convolution module, which effectively perceives distortion. At the same time, we design the global scene understanding module to integrate global context information into the feature maps generated using the feature extractor. Finally, we trained and evaluated our model on three benchmark datasets which contains the virtual and real-world RGB-D panorama datasets. The experimental results show that the proposed method achieves competitive performance, comparable to existing techniques in both quantitative and qualitative evaluations. Furthermore, our method has fewer parameters and more flexibility, making it a scalable solution in mobile AR.<\/jats:p>","DOI":"10.3390\/s23229218","type":"journal-article","created":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T08:19:43Z","timestamp":1700122783000},"page":"9218","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["E2LNet: An Efficient and Effective Lightweight Network for Panoramic Depth Estimation"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-1825-3345","authenticated-orcid":false,"given":"Jiayue","family":"Xu","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0824-538X","authenticated-orcid":false,"given":"Jianping","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8566-1558","authenticated-orcid":false,"given":"Hua","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3735-0162","authenticated-orcid":false,"given":"Cheng","family":"Han","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0873-7192","authenticated-orcid":false,"given":"Chao","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,11,16]]},"reference":[{"key":"ref_1","first-page":"1","article-title":"3d scene geometry estimation from 360 imagery: A survey","volume":"55","author":"Pinto","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Ai, H., Cao, Z., Zhu, J., Bai, H., Chen, Y., and Wang, L. (2022). Deep Learning for Omnidirectional Vision: A Survey and New Perspectives. arXiv.","DOI":"10.36227\/techrxiv.19807699"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Zioulis, N., Karakottas, A., Zarpalas, D., and Daras, P. (2018, January 8\u201314). Omnidepth: Dense depth estimation for indoors spherical panoramas. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01231-1_28"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Tateno, K., Navab, N., and Tombari, F. (2018, January 8\u201314). Distortion-aware convolutional filters for dense prediction in panoramic images. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01270-0_43"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1109\/LSP.2021.3050712","article-title":"Distortion-aware monocular depth estimation for omnidirectional images","volume":"28","author":"Chen","year":"2021","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Li, M., Meng, M., and Zhou, Z. (2022, January 4\u20138). RepF-Net: Distortion-aware Re-projection Fusion Network for Object Detection in Panorama Image. Proceedings of the Asian Conference on Computer Vision, Macau, China.","DOI":"10.1007\/978-3-031-26313-2_31"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Wang, F.E., Yeh, Y.H., Sun, M., Chiu, W.C., and Tsai, Y.H. (2020, January 13\u201319). Bifuse: Monocular 360 depth estimation via bi-projection fusion. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00054"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1519","DOI":"10.1109\/LRA.2021.3058957","article-title":"Unifuse: Unidirectional fusion for 360 panorama depth estimation","volume":"6","author":"Jiang","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_9","first-page":"5448","article-title":"BiFuse++: Self-Supervised and Efficient Bi-Projection Fusion for 360\u00b0 Depth Estimation","volume":"45","author":"Wang","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_10","unstructured":"Bai, J., Lai, S., Qin, H., Guo, J., and Guo, Y. (2022). GLPanoDepth: Global-to-local panoramic depth estimation. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Pintore, G., Agus, M., Almansa, E., Schneider, J., and Gobbetti, E. (2021, January 20\u201325). Slicenet: Deep dense depth estimation from a single indoor panorama using a slice-based representation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01137"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1583","DOI":"10.1007\/s13042-020-01251-y","article-title":"Attention-based context aggregation network for monocular depth estimation","volume":"12","author":"Chen","year":"2021","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"678","DOI":"10.1109\/LSP.2021.3067498","article-title":"Monocular depth estimation with multi-scale feature fusion","volume":"28","author":"Xu","year":"2021","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Jiao, J., Cao, Y., Song, Y., and Lau, R. (2018, January 8\u201314). Look deeper into depth: Monocular depth estimation with semantic booster and attention-driven loss. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01267-0_4"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Johnston, A., and Carneiro, G. (2020, January 13\u201319). Self-supervised monocular trained depth estimation using self-attention and discrete disparity volume. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00481"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhuang, C., Lu, Z., Wang, Y., Xiao, J., and Wang, Y. (2022, January 7\u201314). ACDNet: Adaptively combined dilated convolution for monocular panorama depth estimation. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA.","DOI":"10.1609\/aaai.v36i3.20278"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2013MICCAI 2015: 18th International Conference, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1612","DOI":"10.1007\/s11431-020-1582-8","article-title":"Monocular depth estimation based on deep learning: An overview","volume":"63","author":"Zhao","year":"2020","journal-title":"Sci. China Technol. Sci."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Cheng, X., Wang, P., Zhou, Y., Guan, C., and Yang, R. (August, January 31). Omnidirectional depth extension networks. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9197123"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Shen, Z., Lin, C., Liao, K., Nie, L., Zheng, Z., and Zhao, Y. (2022, January 23\u201327). PanoFormer: Panorama Transformer for Indoor 360\u2014Depth Estimation. Proceedings of the Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19769-7_12"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1049\/cvi2.12144","article-title":"PCformer: A parallel convolutional transformer network for 360\u00b0 depth estimation","volume":"17","author":"Xu","year":"2023","journal-title":"IET Comput. Vis."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zioulis, N., Karakottas, A., Zarpalas, D., Alvarez, F., and Daras, P. (2019, January 16\u201319). Spherical view synthesis for self-supervised 360 depth estimation. Proceedings of the 2019 International Conference on 3D Vision (3DV), Quebec City, QC, Canada.","DOI":"10.1109\/3DV.2019.00081"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zeng, W., Karaoglu, S., and Gevers, T. (2020, January 23\u201328). Joint 3d layout and depth prediction from a single indoor panorama image. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58517-4_39"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Jin, L., Xu, Y., Zheng, J., Zhang, J., Tang, R., Xu, S., Yu, J., and Gao, S. (2020, January 13\u201319). Geometric structure based and regularized depth estimation from 360 indoor imagery. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00097"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhou, K., Wang, K., and Yang, K. (2020, January 20\u201323). PADENet: An efficient and robust panoramic monocular depth estimation network for outdoor scenes. Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece.","DOI":"10.1109\/ITSC45102.2020.9294206"},{"key":"ref_26","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Fu, H., Gong, M., Wang, C., Batmanghelich, K., and Tao, D. (2018, January 18\u201323). Deep ordinal regression network for monocular depth estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00214"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"185179","DOI":"10.1109\/ACCESS.2019.2960520","article-title":"Multi-scale dilated convolution network based depth estimation in intelligent transportation systems","volume":"7","author":"Tian","year":"2019","journal-title":"IEEE Access"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"20134","DOI":"10.1109\/TITS.2022.3179365","article-title":"Mobilexnet: An efficient convolutional neural network for monocular depth estimation","volume":"23","author":"Dong","year":"2022","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Sagar, A. (2022, January 3\u20138). Monocular depth estimation using multi scale neural network and feature fusion. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACVW54805.2022.00072"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Gur, S., and Wolf, L. (2019, January 15\u201320). Single image depth estimation trained via depth from defocus cues. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00787"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wang, N.H., Solarte, B., Tsai, Y.H., Chiu, W.C., and Sun, M. (August, January 31). 360sd-net: 360 stereo depth estimation with learnable cost volume. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9196975"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Yang, Y., Zhu, Y., Gao, Z., and Zhai, G. (2021, January 5\u20138). Salgfcn: Graph based fully convolutional network for panoramic saliency prediction. Proceedings of the 2021 International Conference on Visual Communications and Image Processing (VCIP), Munich, Germany.","DOI":"10.1109\/VCIP53242.2021.9675373"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Dai, F., Zhang, Y., Ma, Y., Li, H., and Zhao, Q. (2020, January 4\u20138). Dilated convolutional neural networks for panoramic image saliency prediction. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053888"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"6309","DOI":"10.1109\/TGRS.2020.2976658","article-title":"Dense dilated convolutions\u2019 merging network for land cover classification","volume":"58","author":"Liu","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_37","unstructured":"Wu, H., Zhang, J., Huang, K., Liang, K., and Yu, Y. (2019). Fastfcn: Rethinking dilated convolution in the backbone for semantic segmentation. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Yuan, Y., Chen, X., and Wang, J. (2020, January 23\u201328). Object-contextual representations for semantic segmentation. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58539-6_11"},{"key":"ref_39","unstructured":"Ashish, V. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Kong, S., Shin, D., and Fowlkes, C. (2020, January 13\u201319). Domain decluttering: Simplifying images to mitigate synthetic-real domain shift and improve depth estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00339"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Xu, D., Wang, W., Tang, H., Liu, H., Sebe, N., and Ricci, E. (2018, January 18\u201323). Structured attention guided convolutional neural fields for monocular depth estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00412"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1016\/j.neucom.2019.10.107","article-title":"Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention","volume":"379","author":"Zhang","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Yang, Y., Wang, J., Xu, W., and Yuille, A.L. (2016, January 27\u201330). Attention to scale: Scale-aware semantic image segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.396"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Shi, W., Caballero, J., Husz\u00e1r, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., and Wang, Z. (2016, January 27\u201330). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proceedings of the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.207"},{"key":"ref_48","unstructured":"Eigen, D., Puhrsch, C., and Fergus, R. (2014, January 4\u20139). Depth map prediction from a single image using a multi-scale deep network. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., and Navab, N. (2016, January 25\u201328). Deeper depth prediction with fully convolutional residual networks. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.32"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., and Zhang, Y. (2017). Matterport3d: Learning from rgb-d data in indoor environments. arXiv.","DOI":"10.1109\/3DV.2017.00081"},{"key":"ref_51","unstructured":"Armeni, I., Sax, S., Zamir, A.R., and Savarese, S. (2017). Joint 2d-3d-semantic data for indoor scene understanding. arXiv."},{"key":"ref_52","unstructured":"Wang, F.E., Hu, H.N., Cheng, H.T., Lin, J.T., Yang, S.T., Shih, M.L., Chu, H.K., and Sun, M. (2018). Self-Supervised Learning of Depth and Camera Motion from 360\u2218 Videos. arXiv."},{"key":"ref_53","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). Pytorch: An imperative style, high-performance deep learning library. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_55","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/22\/9218\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:24:06Z","timestamp":1760131446000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/22\/9218"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,16]]},"references-count":55,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["s23229218"],"URL":"https:\/\/doi.org\/10.3390\/s23229218","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,11,16]]}}}