{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T19:00:30Z","timestamp":1780513230888,"version":"3.54.1"},"reference-count":66,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2024,12,12]],"date-time":"2024-12-12T00:00:00Z","timestamp":1733961600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Integrated Computer-Aided Engineering"],"published-print":{"date-parts":[[2025,5]]},"abstract":"<jats:p>\n                    \n                    \n                    Depth estimation from images is fundamental for autonomous navigation of robots, vehicles, drones, and integrating navigation aid systems for people with visual impairments. Despite the challenges of obtaining depth information from complex scenes, advancements in Deep Learning have opened new possibilities. Thus, this work introduces an approach based on recent Convolutional Neural Network architectures and attention mechanisms to enhance monocular image depth estimation, with potential applications in navigation aid systems for the visually impaired. The proposal focuses on implementing a Convolutional Neural Network model with an attention mechanism configuration that has not yet been tested in the literature, primarily integrating the Convolutional Block Attention Module and the Modified Global Context Network in the encoder and decoder, respectively. Unlike stereo camera-based systems, which require complex setups and image pairs, this model simplifies data collection and processing, although it still faces the challenge of requiring large datasets and significant computational capacity. 
However, it is experimentally possible to demonstrate that these limitations can be overcome by using reduced-resolution images and resizing techniques. The evaluation of the proposed model indicated satisfactory performance compared to state-of-the-art works that use images with resolutions identical to those in this work, validating the comparative tests. It presented an improvement in the Absolute Relative Error of 25.22% and 6.28% in relation to the Root Mean Squared Error. The results highlight the feasibility of conducting Deep Learning research, even with limited hardware resources.\n                  <\/jats:p>","DOI":"10.1177\/10692509241301587","type":"journal-article","created":{"date-parts":[[2024,12,13]],"date-time":"2024-12-13T01:32:56Z","timestamp":1734053576000},"page":"158-175","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["A new methodology for monocular depth estimation with attention mechanisms"],"prefix":"10.1177","volume":"32","author":[{"given":"Ricardo S","family":"Casado","sequence":"first","affiliation":[{"name":"Department of Computer Science, Federal University of S\u00e3o Carlos, Rodovia Washington Luis, Brazil"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Emerson C","family":"Pedrino","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Federal University of S\u00e3o Carlos, Rodovia Washington Luis, Brazil"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"179","published-online":{"date-parts":[[2024,12,12]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.3233\/ICA-230727"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Lore KG Reddy K Giering M et\u00a0al. Generative adversarial networks for depth map estimation from RGB video. 
In: 2018 IEEE\/CVF conference on computer vision and pattern recognition workshops (CVPRW) IEEE 2018 pp.1258\u201312588.","DOI":"10.1109\/CVPRW.2018.00163"},{"key":"e_1_3_2_4_2","first-page":"371","article-title":"Contributions to the physiology of vision","volume":"128","author":"Wheatstone C","year":"1838","unstructured":"Wheatstone C. Contributions to the physiology of vision. Philosoph Trans R Soc 1838; 128: 371\u2013394.","journal-title":"Philosoph Trans R Soc"},{"key":"e_1_3_2_5_2","volume-title":"Treatise on Physiological Optics, Volume III","author":"Von Helmholtz H","year":"2013","unstructured":"Von Helmholtz H. Treatise on Physiological Optics, Volume III. New York: Courier Corporation 2013."},{"key":"e_1_3_2_6_2","volume-title":"Multiple view geometry in computer vision","author":"Hartley R","year":"2003","unstructured":"Hartley R, Zisserman A. Multiple view geometry in computer vision. Cambridge: Cambridge university press, 2003."},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.3390\/app13106319"},{"key":"e_1_3_2_8_2","unstructured":"Zaman SRNPN. Single-image stereo depth estimation using gans. [Online] Available at: https:\/\/sharanramjee.github.io\/files\/projects\/cs231a.pdf (2013 accessed 09 August 2023)."},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","unstructured":"Hoiem D Efros AA Hebert M. Automatic photo pop-up. In: ACM SIGGRAPH 2005 Papers 2005 pp.577\u2013584.","DOI":"10.1145\/1186822.1073232"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2316835"},{"key":"e_1_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Ladicky L Shi J Pollefeys M. Pulling things out of perspective. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2014 pp.89\u201396.","DOI":"10.1109\/CVPR.2014.19"},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","unstructured":"Chang Q Maruyama T. Real-time high-quality stereo matching system on a GPU. 
In: 2018 IEEE 29th international conference on application-specific systems architectures and processors (ASAP) 2018 pp.1\u20138. IEEE.","DOI":"10.1109\/ASAP.2018.8445111"},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","unstructured":"Goesele M Snavely N Curless B et\u00a0al. Multi-view stereo for community photo collections. In: 2007 IEEE 11th international conference on computer vision IEEE 2007 pp.1\u20138.","DOI":"10.1109\/ICCV.2007.4408933"},{"key":"e_1_3_2_14_2","unstructured":"Eigen D Puhrsch C Fergus R. Depth map prediction from a single image using a multi-scale deep network. In: Ghahramani Z Welling M Cortes C et\u00a0al. (eds) Advances in neural information processing systems volume 27. Curran Associates Inc 2014a."},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"Godard C Mac Aodha O Brostow GJ. Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2017 pp.270\u2013279.","DOI":"10.1109\/CVPR.2017.699"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2019.102848"},{"key":"e_1_3_2_17_2","first-page":"1","article-title":"Learning depth from single monocular images","volume":"18","author":"Saxena A","year":"2005","unstructured":"Saxena A, Chung S, Ng A. Learning depth from single monocular images. Adv Neural Inf Process Syst 2005; 18: 1\u20138.","journal-title":"Adv Neural Inf Process Syst"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1111\/exsy.12494"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1111\/exsy.12647"},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","unstructured":"Jiang H Larsson G Maire M et\u00a0al. Self-supervised relative depth learning for urban scene understanding. 
In: Proceedings of the European conference on computer vision (ECCV) 2018 pp.19\u201335.","DOI":"10.1007\/978-3-030-01252-6_2"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.132"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2023.111084"},{"key":"e_1_3_2_23_2","first-page":"1","article-title":"Depth map prediction from a single image using a multi-scale deep network","volume":"27","author":"Eigen D","unstructured":"Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network. Adv Neural Inf Process Syst 2014b; 27: 1\u20139.","journal-title":"Adv Neural Inf Process Syst"},{"key":"e_1_3_2_24_2","first-page":"1","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky A","year":"2012","unstructured":"Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 2012; 25: 1\u20139.","journal-title":"Adv Neural Inf Process Syst"},{"key":"e_1_3_2_25_2","doi-asserted-by":"crossref","unstructured":"Kumari S Jha RR Bhavsar A et\u00a0al. Autodepth: single image depth map estimation via residual CNN encoder-decoder and stacked hourglass. In: 2019 IEEE International Conference on Image Processing (ICIP) IEEE 2019 pp.340\u2013344.","DOI":"10.1109\/ICIP.2019.8803006"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","unstructured":"Newell A Yang K Deng J. Stacked hourglass networks for human pose estimation. In: Computer Vision\u2013ECCV 2016: 14th European conference Amsterdam The Netherlands October 11-14 2016 proceedings part VIII 14 Springer 2016 pp.483\u2013499.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"e_1_3_2_27_2","unstructured":"Alhashim I Wonka P. High quality monocular depth estimation via transfer learning. 
arXiv preprint arXiv:1812.11941 2018."},{"key":"e_1_3_2_28_2","doi-asserted-by":"crossref","unstructured":"Huang G Liu Z Van Der Maaten L et\u00a0al. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2017 pp.4700\u20134708.","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_29_2","doi-asserted-by":"crossref","unstructured":"Eigen D Fergus R. Predicting depth surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE international conference on computer vision 2015 pp.2650\u20132658.","DOI":"10.1109\/ICCV.2015.304"},{"key":"e_1_3_2_30_2","unstructured":"Simonyan K Vedaldi A Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 2013."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.3390\/app112110088"},{"key":"e_1_3_2_32_2","doi-asserted-by":"crossref","unstructured":"Guariglia E Silvestrov S. Fractional-wavelet analysis of positive definite distributions and wavelets on d(c) d(c). In: Engineering mathematics II: Algebraic stochastic and analysis structures for networks data classification and optimization Springer 2016 pp.337\u2013353.","DOI":"10.1007\/978-3-319-42105-6_16"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0219691319500504"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2019.2896246"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.3390\/app13179940"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2017.2682102"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3465055"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","unstructured":"Zhou B Khosla A Lapedriza A et\u00a0al. Learning deep features for discriminative localization. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition 2016 pp.2921\u20132929.","DOI":"10.1109\/CVPR.2016.319"},{"key":"e_1_3_2_39_2","first-page":"1","article-title":"Recurrent models of visual attention","volume":"27","author":"Mnih V","year":"2014","unstructured":"Mnih V, Heess N, Graves A, et al. Recurrent models of visual attention. Adv Neural Inf Process Syst 2014;\u00a027: 1\u20139.","journal-title":"Adv Neural Inf Process Syst"},{"key":"e_1_3_2_40_2","doi-asserted-by":"crossref","unstructured":"Anderson P He X Buehler C et\u00a0al. Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2018 pp.6077\u20136086.","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","unstructured":"Wang F Jiang M Qian C et\u00a0al. Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2017 pp.3156\u20133164.","DOI":"10.1109\/CVPR.2017.683"},{"key":"e_1_3_2_42_2","unstructured":"Bahdanau D Cho K Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 2014."},{"key":"e_1_3_2_43_2","unstructured":"Xu K Ba J Kiros R et\u00a0al. Show attend and tell: neural image caption generation with visual attention. In: International conference on machine learning PMLR 2015 pp.2048\u20132057."},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","unstructured":"Cao Y Xu J Lin S et\u00a0al. Gcnet: non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of the IEEE\/CVF international conference on computer vision workshops 2019 pp.0\u20130.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"e_1_3_2_45_2","doi-asserted-by":"crossref","unstructured":"Wang X Girshick R Gupta A et\u00a0al. Non-local neural networks. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition 2018 pp.7794\u20137803.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"e_1_3_2_46_2","unstructured":"Zhang H Goodfellow I Metaxas D et\u00a0al. Self-attention generative adversarial networks. In: International conference on machine learning PMLR 2019 pp.7354\u20137363."},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","unstructured":"Hu J Shen L Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2018 pp.7132\u20137141.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","unstructured":"Woo S Park J Lee J-Y et\u00a0al. Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV) 2018 pp.3\u201319.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_2_49_2","doi-asserted-by":"crossref","unstructured":"Jiang M Song L Wang Y et\u00a0al. Fusion of the YOLOv4 network model and visual attention mechanism to detect low-quality young apples in a complex environment. Precision Agriculture 2022 pp.1\u201319.","DOI":"10.1007\/s11119-021-09849-0"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compag.2022.107404"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.3390\/s23177466"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/s41095-022-0279-3"},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","unstructured":"Wang Y Lai Z Huang G et\u00a0al. Anytime stereo image depth estimation on mobile devices. In: 2019 international conference on robotics and automation (ICRA) IEEE 2019a pp.5893\u20135900.","DOI":"10.1109\/ICRA.2019.8794003"},{"key":"e_1_3_2_54_2","doi-asserted-by":"crossref","unstructured":"Wang Y Wang P Yang Z et\u00a0al. Unos: unified unsupervised optical-flow and stereo-depth estimation by watching videos. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition 2019b pp.8071\u20138081.","DOI":"10.1109\/CVPR.2019.00826"},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","unstructured":"Agarwal A Arora C. Attention attention everywhere: monocular depth prediction with skip attention. In: Proceedings of the IEEE\/CVF Winter conference on applications of computer vision 2023 pp.5861\u20135870.","DOI":"10.1109\/WACV56688.2023.00581"},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","unstructured":"Silberman N Hoiem D Kohli P et\u00a0al. Indoor segmentation and support inference from RGB-D images. In: Computer vision\u2013ECCV 2012: 12th European conference on computer vision Florence Italy October 7-13 2012 Proceedings Part V 12 Springer 2012 pp.746\u2013760.","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"e_1_3_2_57_2","doi-asserted-by":"crossref","unstructured":"Levin A Lischinski D Weiss Y. Colorization using optimization. In: ACM SIGGRAPH 2004 Papers 2004 pp.689\u2013694.","DOI":"10.1145\/1186562.1015780"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913491297"},{"key":"e_1_3_2_59_2","doi-asserted-by":"crossref","unstructured":"Fu H Gong M Wang C et\u00a0al. Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2018 pp.2002\u20132011.","DOI":"10.1109\/CVPR.2018.00214"},{"key":"e_1_3_2_60_2","doi-asserted-by":"crossref","unstructured":"Laina I Rupprecht C Belagiannis V et\u00a0al. Deeper depth prediction with fully convolutional residual networks. In: 2016 Fourth international conference on 3D vision (3DV) IEEE 2016 pp.239\u2013248.","DOI":"10.1109\/3DV.2016.32"},{"key":"e_1_3_2_61_2","doi-asserted-by":"crossref","unstructured":"Ronneberger O Fischer P Brox T. U-net: convolutional networks for biomedical image segmentation. 
In: Medical image computing and computer-assisted intervention\u2013MICCAI 2015: 18th international conference Munich Germany October 5-9 2015 Proceedings Part III 18 Springer 2015 pp.234\u2013241.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_2_63_2","unstructured":"Abadi M Agarwal A Barham P et\u00a0al. Tensorflow: large-scale machine learning on heterogeneous systems 2015."},{"key":"e_1_3_2_64_2","unstructured":"Kingma DP Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014."},{"key":"e_1_3_2_65_2","unstructured":"Bhat SF Alhashim I Wonka P. Adabins: depth estimation using adaptive bins. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition 2021 pp.4009\u20134018."},{"key":"e_1_3_2_66_2","doi-asserted-by":"crossref","unstructured":"Yuan W Gu X Dai Z et\u00a0al. Neural window fully-connected crfs for monocular depth estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition 2022 pp.3916\u20133925.","DOI":"10.1109\/CVPR52688.2022.00389"},{"key":"e_1_3_2_67_2","unstructured":"Kim D Ka W Ahn P et\u00a0al. Global-local path networks for monocular depth estimation with vertical cutdepth. 
arXiv preprint arXiv:2201.07436 2022."}],"container-title":["Integrated Computer-Aided Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10692509241301587","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10692509241301587","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10692509241301587","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:14:56Z","timestamp":1777454096000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10692509241301587"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,12]]},"references-count":66,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,5]]}},"alternative-id":["10.1177\/10692509241301587"],"URL":"https:\/\/doi.org\/10.1177\/10692509241301587","relation":{},"ISSN":["1069-2509","1875-8835"],"issn-type":[{"value":"1069-2509","type":"print"},{"value":"1875-8835","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,12]]}}}