{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T17:13:21Z","timestamp":1778346801586,"version":"3.51.4"},"reference-count":37,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,2,16]],"date-time":"2023-02-16T00:00:00Z","timestamp":1676505600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004271","name":"Sapienza University of Rome","doi-asserted-by":"publisher","award":["RM12117A56C08D64"],"award-info":[{"award-number":["RM12117A56C08D64"]}],"id":[{"id":"10.13039\/501100004271","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The knowledge of environmental depth is essential in multiple robotics and computer vision tasks for both terrestrial and underwater scenarios. Moreover, the hardware on which this technology runs, generally IoT and embedded devices, are limited in terms of power consumption, and therefore, models with a low-energy footprint are required to be designed. Recent works aim at enabling depth perception using single RGB images on deep architectures, such as convolutional neural networks and vision transformers, which are generally unsuitable for real-time inferences on low-power embedded hardware. Moreover, such architectures are trained to estimate depth maps mainly on terrestrial scenarios due to the scarcity of underwater depth data. Purposely, we present two lightweight architectures based on optimized MobileNetV3 encoders and a specifically designed decoder to achieve fast inferences and accurate estimations over embedded devices, a feasibility study to predict depth maps over underwater scenarios, and an energy assessment to understand which is the effective energy consumption during the inference. Precisely, we propose the MobileNetV3S75 configuration to infer on the 32-bit ARM CPU and the MobileNetV3LMin for the 8-bit Edge TPU hardware. In underwater settings, the proposed design achieves comparable estimations with fast inference performances compared to state-of-the-art methods. Moreover, we statistically proved that the architecture of the models has an impact on the energy footprint in terms of Watts required by the device during the inference. Then, the proposed architectures would be considered to be a promising approach for real-time monocular depth estimation by offering the best trade-off between inference performances, estimation error and energy consumption, with the aim of improving the environment perception for underwater drones, lightweight robots and Internet of things.<\/jats:p>","DOI":"10.3390\/s23042223","type":"journal-article","created":{"date-parts":[[2023,2,16]],"date-time":"2023-02-16T04:01:52Z","timestamp":1676520112000},"page":"2223","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Lightweight and Energy-Aware Monocular Depth Estimation Models for IoT Embedded Devices: Challenges and Performances in Terrestrial and Underwater Scenarios"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9393-5248","authenticated-orcid":false,"given":"Lorenzo","family":"Papa","sequence":"first","affiliation":[{"name":"Department of Computer, Control and Management Engineering, Sapienza University of Rome, Via Ariosto 25, 00185 Rome, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4551-7567","authenticated-orcid":false,"given":"Gabriele","family":"Proietti Mattia","sequence":"additional","affiliation":[{"name":"Department of Computer, Control and Management Engineering, Sapienza University of Rome, Via Ariosto 25, 00185 Rome, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1886-3491","authenticated-orcid":false,"given":"Paolo","family":"Russo","sequence":"additional","affiliation":[{"name":"Department of Computer, Control and Management Engineering, Sapienza University of Rome, Via Ariosto 25, 00185 Rome, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6461-1391","authenticated-orcid":false,"given":"Irene","family":"Amerini","sequence":"additional","affiliation":[{"name":"Department of Computer, Control and Management Engineering, Sapienza University of Rome, Via Ariosto 25, 00185 Rome, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9731-6321","authenticated-orcid":false,"given":"Roberto","family":"Beraldi","sequence":"additional","affiliation":[{"name":"Department of Computer, Control and Management Engineering, Sapienza University of Rome, Via Ariosto 25, 00185 Rome, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Papa, L., Russo, P., and Amerini, I. (2022, January 3\u20135). Real-time monocular depth estimation on embedded devices: Challenges and performances in terrestrial and underwater scenarios. Proceedings of the 2022 IEEE International Workshop on Metrology for the Sea, Milazzo, Italy. Learning to Measure Sea Health Parameters (MetroSea).","DOI":"10.1109\/MetroSea55331.2022.9950812"},{"key":"ref_2","unstructured":"Li, Z., Chen, Z., Liu, X., and Jiang, J. (2022). DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ranftl, R., Bochkovskiy, A., and Koltun, V. (2021). Vision Transformers for Dense Prediction. arXiv.","DOI":"10.1109\/ICCV48922.2021.01196"},{"key":"ref_4","unstructured":"Bhat, S.F., Alhashim, I., and Wonka, P. (2020). AdaBins: Depth Estimation using Adaptive Bins. arXiv."},{"key":"ref_5","unstructured":"Alhashim, I., and Wonka, P. (2019). High Quality Monocular Depth Estimation via Transfer Learning. arXiv."},{"key":"ref_6","unstructured":"Kist, A.M. (2021). Deep Learning on Edge TPUs. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yazdanbakhsh, A., Seshadri, K., Akin, B., Laudon, J., and Narayanaswami, R. (2021). An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks. arXiv.","DOI":"10.1109\/IISWC55918.2022.00017"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Peluso, V., Cipolletta, A., Calimera, A., Poggi, M., Tosi, F., and Mattoccia, S. (2019, January 25\u201329). Enabling Energy-Efficient Unsupervised Monocular Depth Estimation on ARMv7-Based Platforms. Proceedings of the 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy.","DOI":"10.23919\/DATE.2019.8714893"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Poggi, M., Aleotti, F., Tosi, F., and Mattoccia, S. (2018). Towards real-time unsupervised monocular depth estimation on CPU. arXiv.","DOI":"10.1109\/IROS.2018.8593814"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1524","DOI":"10.1109\/TCSVT.2021.3077395","article-title":"Monocular Depth Perception on Microcontrollers for Edge Applications","volume":"32","author":"Peluso","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"44881","DOI":"10.1109\/ACCESS.2022.3170425","article-title":"SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings","volume":"10","author":"Papa","year":"2022","journal-title":"IEEE Access"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wofk, D., Ma, F., Yang, T.J., Karaman, S., and Sze, V. (2019). FastDepth: Fast Monocular Depth Estimation on Embedded Systems. arXiv.","DOI":"10.1109\/ICRA.2019.8794182"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Spek, A., Dharmasiri, T., and Drummond, T. (2018). CReaM: Condensed Real-time Models for Depth Prediction using Convolutional Neural Networks. arXiv.","DOI":"10.1109\/IROS.2018.8594243"},{"key":"ref_14","first-page":"746","article-title":"Indoor Segmentation and Support Inference from RGBD Images","volume":"7576","author":"Silberman","year":"2012","journal-title":"ECCV"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"3995","DOI":"10.1109\/TCSVT.2019.2958950","article-title":"Deep Joint Depth Estimation and Color Correction From Monocular Underwater Images Based on Unsupervised Adaptation Networks","volume":"30","author":"Ye","year":"2020","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Gupta, H., and Mitra, K. (2019, January 22\u201325). Unsupervised Single Image Underwater Depth Estimation. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8804200"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Peng, Y.T., Zhao, X., and Cosman, P.C. (2015, January 27\u201330). Single underwater image enhancement using depth estimation based on blurriness. Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada.","DOI":"10.1109\/ICIP.2015.7351749"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1109\/MCG.2016.26","article-title":"Underwater Depth Estimation and Image Restoration Based on Single Images","volume":"36","author":"Drews","year":"2016","journal-title":"IEEE Comput. Graph. Appl."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/bs.adcom.2020.07.002","article-title":"Chapter Eight - Energy-efficient deep learning inference on edge devices","volume":"Volume 122","author":"Kim","year":"2021","journal-title":"Hardware Accelerator Systems for Artificial Intelligence and Machine Learning"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, Y., Li, B., Luo, R., Chen, Y., Xu, N., and Yang, H. (2014, January 24\u201328). Energy efficient neural networks for big data analytics. Proceedings of the 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany.","DOI":"10.7873\/DATE.2014.358"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lee, E.H., Miyashita, D., Chai, E., Murmann, B., and Wong, S.S. (2017, January 5\u20139). LogNet: Energy-efficient neural networks using logarithmic computation. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7953288"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"4403","DOI":"10.1109\/JIOT.2020.2976702","article-title":"FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things","volume":"7","author":"Wang","year":"2020","journal-title":"IEEE Internet Things J."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Jiao, X., Akhlaghi, V., Jiang, Y., and Gupta, R.K. (2018, January 19\u201323). Energy-efficient neural networks using approximate computation reuse. Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany.","DOI":"10.23919\/DATE.2018.8342202"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"4670","DOI":"10.1109\/TCSI.2020.3019460","article-title":"Weight-Oriented Approximation for Energy-Efficient Neural Network Inference Accelerators","volume":"67","author":"Tasoulas","year":"2020","journal-title":"IEEE Trans. Circuits Syst. I Regul. Pap."},{"key":"ref_26","first-page":"2822","article-title":"Underwater Single Image Color Restoration Using Haze-Lines and a New Quantitative Dataset","volume":"43","author":"Berman","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). ImageNet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., and Vasudevan, V. (2019). Searching for MobileNetV3. arXiv.","DOI":"10.1109\/ICCV.2019.00140"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., Albanie, S., Sun, G., and Wu, E. (2019). Squeeze-and-Excitation Networks. arXiv.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_30","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201323). MobileNetV2: Inverted Residuals and Linear Bottlenecks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_32","unstructured":"Tan, M., and Le, Q.V. (2020). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zoph, B., Vasudevan, V., Shlens, J., and Le, Q. (2018, January 18\u201323). Learning Transferable Architectures for Scalable Image Recognition. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00907"},{"key":"ref_34","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_35","first-page":"2366","article-title":"Depth map prediction from a single image using a multi-scale deep network","volume":"27","author":"Eigen","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1579","DOI":"10.1109\/TIP.2017.2663846","article-title":"Underwater Image Restoration Based on Image Blurriness and Light Absorption","volume":"26","author":"Peng","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3224419","article-title":"What your DRAM power models are not telling you: Lessons from a detailed experimental study","volume":"2","author":"Ghose","year":"2018","journal-title":"Proc. ACM Meas. Anal. Comput. Syst."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/4\/2223\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:37:50Z","timestamp":1760121470000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/4\/2223"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,16]]},"references-count":37,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["s23042223"],"URL":"https:\/\/doi.org\/10.3390\/s23042223","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,16]]}}}