{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T18:26:45Z","timestamp":1763058405732,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2020,6,7]],"date-time":"2020-06-07T00:00:00Z","timestamp":1591488000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002850","name":"Fondo Nacional de Desarrollo Cient\u00edfico y Tecnol\u00f3gico","doi-asserted-by":"publisher","award":["11180856","11180881"],"award-info":[{"award-number":["11180856","11180881"]}],"id":[{"id":"10.13039\/501100002850","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008736","name":"Fondo de Fomento al Desarrollo Cient\u00edfico y Tecnol\u00f3gico","doi-asserted-by":"publisher","award":["ID14I20364"],"award-info":[{"award-number":["ID14I20364"]}],"id":[{"id":"10.13039\/501100008736","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Current CNN-based stereo depth estimation models can barely run under real-time constraints on embedded graphic processing unit (GPU) devices. Moreover, state-of-the-art evaluations usually do not consider model optimization techniques, being that it is unknown what is the current potential on embedded GPU devices. In this work, we evaluate two state-of-the-art models on three different embedded GPU devices, with and without optimization methods, presenting performance results that illustrate the actual capabilities of embedded GPU devices for stereo depth estimation. 
More importantly, based on our evaluation, we propose the use of a U-Net like architecture for postprocessing the cost-volume, instead of a typical sequence of 3D convolutions, drastically augmenting the runtime speed of current models. In our experiments, we achieve real-time inference speed, in the range of 5\u201332 ms, for 1216 \u00d7 368 input stereo images on the Jetson TX2, Jetson Xavier, and Jetson Nano embedded devices.<\/jats:p>","DOI":"10.3390\/s20113249","type":"journal-article","created":{"date-parts":[[2020,6,9]],"date-time":"2020-06-09T06:34:16Z","timestamp":1591684456000},"page":"3249","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Fast CNN Stereo Depth Estimation through Embedded GPU Devices"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2504-9305","authenticated-orcid":false,"given":"Cristhian A.","family":"Aguilera","sequence":"first","affiliation":[{"name":"Universidad Tecnol\u00f3gica de Chile INACAP, Av. Vitacura 10.151, Vitacura 7650033, Santiago, Chile"}]},{"given":"Cristhian","family":"Aguilera","sequence":"additional","affiliation":[{"name":"Departamento de Ingenier\u00eda El\u00e9ctrica y Electr\u00f3nica, University of B\u00edo-B\u00edo, Concepci\u00f3n 4051381, Chile"}]},{"given":"Crist\u00f3bal A.","family":"Navarro","sequence":"additional","affiliation":[{"name":"Institute of Informatics, Universidad Austral de Chile, Valdivia 5111187, Chile"}]},{"given":"Angel D.","family":"Sappa","sequence":"additional","affiliation":[{"name":"Escuela Superior Polit\u00e9cnica del Litoral, ESPOL, Campus Gustavo Galindo, Guayaquil EC090101, Ecuador"},{"name":"Computer Vision Center, Edifici O, Campus UAB, Bellaterra, 08193 Barcelona, Spain"}]}],"member":"1968","published-online":{"date-parts":[[2020,6,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Du, Y.C., Muslikhin, M., Hsieh, T.H., and Wang, M.S. 
(2020). Stereo Vision-Based Object Recognition and Manipulation by Regions with Convolutional Neural Network. Electronics, 9.","DOI":"10.3390\/electronics9020210"},{"key":"ref_2","unstructured":"Xie, M., Xiong, Y., Xiong, C., Liu, H., and Hu, Z. (2009). Stereovision-Based Algorithm for Obstacle Avoidance. Intelligent Robotics and Applications, Springer."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Pon, A.D., Ku, J., Li, C., and Waslander, S.L. (2019). Object-Centric Stereo Matching for 3D Object Detection. arXiv.","DOI":"10.1109\/ICRA40945.2020.9196660"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Chen, Y., Bai, X., Yu, S., Yu, K., Li, Z., and Yang, K. (2020). Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching. arXiv.","DOI":"10.1609\/aaai.v34i07.6991"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wang, Y., Lai, Z., Huang, G., Wang, B.H., van der Maaten, L., Campbell, M., and Weinberger, K.Q. (2019, January 20\u201324). Anytime Stereo Image Depth Estimation on Mobile Devices. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794003"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/TPAMI.2007.1166","article-title":"Stereo processing by semiglobal matching and mutual information","volume":"30","author":"Hirschmuller","year":"2007","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","first-page":"2287","article-title":"Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches","volume":"17","author":"LeCun","year":"2016","journal-title":"J. Mach. Learn. 
Res."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Luo, W., Schwing, A.G., and Urtasun, R. (2016, January 27\u201330). Efficient Deep Learning for Stereo Matching. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.614"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Chang, J., and Chen, Y. (2018, January 18\u201323). Pyramid Stereo Matching Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00567"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1023\/A:1014573219977","article-title":"A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms","volume":"47","author":"Scharstein","year":"2002","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Jie, Z., Wang, P., Ling, Y., Zhao, B., Wei, Y., Feng, J., and Liu, W. (2018, January 18\u201323). Left-Right Comparative Recurrent Model for Stereo Matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00404"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Batsos, K., and Mordohai, P. (2018, January 5\u20138). RecResNet: A Recurrent Residual CNN Architecture for Disparity Map Enhancement. Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00036"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kendall, A., Martirosyan, H., Dasgupta, S., and Henry, P. (2017, January 22\u201329). End-to-End Learning of Geometry and Context for Deep Stereo Regression. 
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.17"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Khamis, S., Fanello, S.R., Rhemann, C., Kowdle, A., Valentin, J.P.C., and Izadi, S. (2018). StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction. arXiv.","DOI":"10.1007\/978-3-030-01267-0_35"},{"key":"ref_16","unstructured":"Hernandez-Juarez, D., Chac\u00f3n, A., Espinosa, A., V\u00e1zquez, D., Moure, J.C., and L\u00f3pez, A.M. (2016, January 6\u20138). Embedded Real-time Stereo Estimation via Semi-Global Matching on the GPU. Proceedings of the International Conference on Computational Science 2016 (ICCS 2016), San Diego, CA, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., and Brox, T. (July, January 26). A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.438"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Montavon, G., Orr, G.B., and M\u00fcller, K.R. (2012). Early Stopping\u2014But When. Neural Networks: Tricks of the Trade, Springer. [2nd ed.].","DOI":"10.1007\/978-3-642-35289-8"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Smolyanskiy, N., Kamenev, A., and Birchfield, S. (2018, January 18\u201322). 
On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00147"},{"key":"ref_21","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_22","unstructured":"Falcon, W.E.A. (2019, March 10). PyTorch Lightning. Available online: https:\/\/github.com\/PytorchLightning\/pytorch-lightning."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., and Xu, C. (2020). GhostNet: More Features from Cheap Operations. arXiv.","DOI":"10.1109\/CVPR42600.2020.00165"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhou, X., Lin, M., and Sun, J. (2017). ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv.","DOI":"10.1109\/CVPR.2018.00716"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/11\/3249\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:36:27Z","timestamp":1760175387000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/11\/3249"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,7]]},"references-count":24,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,6]]}},"alternative-id":["s20113249"],"URL":"https:\/\/doi.org\/10.3390\/s20113249","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2020,6,7]]}}}