{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T11:27:24Z","timestamp":1762342044094,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,2,26]],"date-time":"2023-02-26T00:00:00Z","timestamp":1677369600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Monocular depth estimation techniques are used to recover the distance from the target to the camera plane in an image scene. However, there are still several problems, such as insufficient estimation accuracy, the inaccurate localization of details, and depth discontinuity in planes parallel to the camera plane. To solve these problems, we propose the Global Feature Interaction Network (GFI-Net), which aims to utilize geometric features, such as object locations and vanishing points, on a global scale. In order to capture the interactive information of the width, height, and channel of the feature graph and expand the global information in the network, we designed a global interactive attention mechanism. The global interactive attention mechanism reduces the loss of pixel information and improves the performance of depth estimation. Furthermore, the encoder uses the Transformer to reduce coding losses and improve the accuracy of depth estimation. Finally, a local\u2013global feature fusion module is designed to improve the depth map\u2019s representation of detailed areas. 
The experimental results on the NYU-Depth-v2 dataset and the KITTI dataset showed that our model achieved state-of-the-art performance with full detail recovery and depth continuation on the same plane.<\/jats:p>","DOI":"10.3390\/e25030421","type":"journal-article","created":{"date-parts":[[2023,2,27]],"date-time":"2023-02-27T03:23:36Z","timestamp":1677468216000},"page":"421","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["GFI-Net: Global Feature Interaction Network for Monocular Depth Estimation"],"prefix":"10.3390","volume":"25","author":[{"given":"Cong","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ke","family":"Xu","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanxin","family":"Ma","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianwei","family":"Wan","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,26]]},"reference":[{"key":"ref_1","unstructured":"Lee, J.H., Han, M.K., Ko, D.W., and Suh, I.H. (2019). From big to small: Multi-scale local planar guidance for monocular depth estimation. 
arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"147808","DOI":"10.1109\/ACCESS.2020.3016008","article-title":"Leveraging contextual information for monocular depth estimation","volume":"8","author":"Kim","year":"2020","journal-title":"IEEE Access"},{"key":"ref_3","unstructured":"Kim, D., Ga, W., Ahn, P., Joo, D., Chun, S., and Kim, J. (2022). Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ranftl, R., Bochkovskiy, A., and Koltun, V. (2021, January 10\u201317). Vision transformers for dense prediction. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01196"},{"key":"ref_5","unstructured":"Bhat, S.F., Alhashim, I., and Wonka, P. (2021, January 20\u201325). Adabins: Depth estimation using adaptive bins. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1237","DOI":"10.1109\/TPAMI.2016.2578333","article-title":"Planar structure-from-motion with affine camera models: Closed-form solutions, ambiguities and degeneracy analysis","volume":"39","author":"Collins","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","unstructured":"Eigen, D., Puhrsch, C., and Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inf. Process. Syst., 27."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1623","DOI":"10.1109\/TPAMI.2020.3019967","article-title":"Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer","volume":"44","author":"Ranftl","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ning, J., Li, C., Zhang, Z., Geng, Z., Dai, Q., He, K., and Hu, H. 
(2023). All in Tokens: Unifying Output Space of Visual Tasks via Soft Token. arXiv.","DOI":"10.1109\/ICCV51070.2023.01822"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Fu, H., Gong, M., Wang, C., Batmanghelich, K., and Tao, D. (2018, January 18\u201323). Deep ordinal regression network for monocular depth estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00214"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Agarwal, A., and Arora, C. (2023, January 2\u20137). Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00581"},{"key":"ref_12","unstructured":"Yin, W., Liu, Y., Shen, C., and Yan, Y. (2019, October 27\u2013November 2). Enforcing geometric constraints of virtual normal for depth prediction. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_13","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30."},{"key":"ref_14","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Qin, Z., Zhang, P., Wu, F., and Li, X. (2021, January 10\u201317). 
Fcanet: Frequency channel attention networks. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00082"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_18","unstructured":"Park, J., Woo, S., Lee, J.Y., and Kweon, I.S. (2018). Bam: Bottleneck attention module. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhou, X., Lin, M., and Sun, J. (2018, January 18\u201323). Shufflenet: An extremely efficient convolutional neural network for mobile devices. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00716"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012, January 7\u201313). Indoor segmentation and support inference from rgbd images. Proceedings of the European Conference on Computer Vision, Florence, Italy.","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The kitti vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_22","first-page":"12077","article-title":"SegFormer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","unstructured":"Alhashim, I., and Wonka, P. (2018). High quality monocular depth estimation via transfer learning. 
arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Huynh, L., Nguyen-Ha, P., Matas, J., Rahtu, E., and Heikkil\u00e4, J. (2020, January 23\u201328). Guiding monocular depth estimation using depth-attention volume. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58574-7_35"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/3\/421\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:43:05Z","timestamp":1760121785000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/3\/421"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,26]]},"references-count":24,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["e25030421"],"URL":"https:\/\/doi.org\/10.3390\/e25030421","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2023,2,26]]}}}