{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T17:57:54Z","timestamp":1772906274835,"version":"3.50.1"},"reference-count":51,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T00:00:00Z","timestamp":1723593600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Federal Ministry of Education and Research (BMBF)"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>In this paper, we present a multi-task model that predicts disparities and confidence levels in deep stereo matching simultaneously. We do this by combining a successful model for each separate task into a single multi-task model that can be trained with a proposed loss function. We show the advantages of this model compared to training and predicting disparity and confidence sequentially. This method enables an improvement of 15% to 30% in the area under the curve (AUC) metric when trained in parallel rather than sequentially. In addition, the effect of weighting the components in the loss function on the stereo and confidence performance is investigated. 
By improving the confidence estimate, the practicality of stereo estimators for creating distance images is increased.<\/jats:p>","DOI":"10.3390\/jimaging10080198","type":"journal-article","created":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T09:20:51Z","timestamp":1723627251000},"page":"198","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Simultaneous Stereo Matching and Confidence Estimation Network"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-0030-1836","authenticated-orcid":false,"given":"Tobias","family":"Schm\u00e4hling","sequence":"first","affiliation":[{"name":"Institute for Photonic Systems Hochschule Ravensburg-Weingarten, University of Applied Sciences, Doggenriedstra\u00dfe, 88250 Weingarten, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5171-7385","authenticated-orcid":false,"given":"Tobias","family":"M\u00fcller","sequence":"additional","affiliation":[{"name":"Institute for Photonic Systems Hochschule Ravensburg-Weingarten, University of Applied Sciences, Doggenriedstra\u00dfe, 88250 Weingarten, Germany"}]},{"given":"J\u00f6rg","family":"Eberhardt","sequence":"additional","affiliation":[{"name":"Institute for Photonic Systems Hochschule Ravensburg-Weingarten, University of Applied Sciences, Doggenriedstra\u00dfe, 88250 Weingarten, Germany"}]},{"given":"Stefan","family":"Elser","sequence":"additional","affiliation":[{"name":"Institute for Artificial Intelligence Hochschule Ravensburg-Weingarten, University of Applied Sciences, Doggenriedstra\u00dfe, 88250 Weingarten, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1023\/A:1014573219977","article-title":"A taxonomy and evaluation of dense two-frame stereo correspondence algorithms","volume":"47","author":"Scharstein","year":"2002","journal-title":"Int. J. 
Comput. Vis."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hartley, R., and Zisserman, A. (2003). Multiple View Geometry in Computer Vision, Cambridge University Press.","DOI":"10.1017\/CBO9780511811685"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Hannah, M.J. (1974). Computer Matching of Areas in Stereo Images, Stanford University.","DOI":"10.21236\/AD0786720"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1068\/p140449","article-title":"PMF: A stereo correspondence algorithm using a disparity gradient limit","volume":"14","author":"Pollard","year":"1985","journal-title":"Perception"},{"key":"ref_5","first-page":"17","article-title":"Real-time stereo and motion integration for navigation","volume":"Volume 2357","author":"Baker","year":"1994","journal-title":"Proceedings of the ISPRS Commission III Symposium: Spatial Information from Digital Photogrammetry and Computer Vision"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/TPAMI.2007.1166","article-title":"Stereo processing by semiglobal matching and mutual information","volume":"30","author":"Hirschmuller","year":"2007","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zbontar, J., and LeCun, Y. (2015, January 7\u201312). Computing the stereo matching cost with a convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298767"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Shaked, A., and Wolf, L. (2017, January 21\u201326). Improved stereo matching with constant highway networks and reflective confidence learning. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.730"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Mei, X., Sun, X., Zhou, M., Jiao, S., Wang, H., and Zhang, X. (2011, January 6\u201313). On building an accurate stereo matching system on graphics hardware. Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain.","DOI":"10.1109\/ICCVW.2011.6130280"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1127","DOI":"10.1109\/TPAMI.2002.1023808","article-title":"Detecting binocular half-occlusions: Empirical comparisons of five approaches","volume":"24","author":"Egnal","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1180","DOI":"10.1016\/j.cviu.2010.03.012","article-title":"A fast stereo matching algorithm suitable for embedded real-time systems","volume":"114","author":"Humenberger","year":"2010","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zabih, R., and Woodfill, J. (1994, January 2\u20136). Non-parametric local transforms for computing visual correspondence. Proceedings of the Computer Vision\u2014ECCV\u201994: Third European Conference on Computer Vision, Stockholm, Sweden. Proceedings, Volume II 3.","DOI":"10.1007\/BFb0028345"},{"key":"ref_13","first-page":"807","article-title":"Robust stereo matching using adaptive normalized cross-correlation","volume":"33","author":"Heo","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_14","first-page":"5293","article-title":"On the confidence of stereo matching in a deep-learning era: A quantitative evaluation","volume":"44","author":"Poggi","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1023\/A:1008015117424","article-title":"Stereo matching with nonlinear diffusion","volume":"28","author":"Scharstein","year":"1998","journal-title":"Int. J. Comput. Vis."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"943","DOI":"10.1016\/j.imavis.2004.03.018","article-title":"A stereo confidence metric using single view imagery with comparison to five alternative approaches","volume":"22","author":"Egnal","year":"2004","journal-title":"Image Vis. Comput."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2121","DOI":"10.1109\/TPAMI.2012.46","article-title":"A quantitative evaluation of confidence measures for stereo vision","volume":"34","author":"Hu","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Kim, S., Kim, S., Min, D., and Sohn, K. (2019, January 15\u201320). LAF-net: Locally adaptive fusion networks for stereo confidence estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00029"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Xu, H., and Zhang, J. (2020, January 14\u201319). Aanet: Adaptive aggregation network for efficient stereo matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00203"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Seki, A., and Pollefeys, M. (2017, January 21\u201326). Sgm-nets: Semi-global matching with neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.703"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Schonberger, J.L., Sinha, S.N., and Pollefeys, M. (2018, January 8\u201314). 
Learning to fuse proposals from multiple scanline optimizations in semi-global matching. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_45"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., and Brox, T. (2016, January 27\u201330). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.438"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2281","DOI":"10.1109\/TMI.2019.2903562","article-title":"Ce-net: Context encoder network for 2d medical image segmentation","volume":"38","author":"Gu","year":"2019","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liang, Z., Feng, Y., Guo, Y., Liu, H., Chen, W., Qiao, L., Zhou, L., and Zhang, J. (2018, January 18\u201322). Learning for disparity estimation through feature constancy. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00297"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tonioni, A., Tosi, F., Poggi, M., Mattoccia, S., and Stefano, L.D. (2019, January 15\u201320). Real-time self-adaptive deep stereo. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00028"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., and Bry, A. (2017, January 22\u201329). End-to-end learning of geometry and context for deep stereo regression. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.17"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chang, J.R., and Chen, Y.S. (2018, January 18\u201322). Pyramid stereo matching network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00567"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Nie, G.Y., Cheng, M.M., Liu, Y., Liang, Z., Fan, D.P., Liu, Y., and Wang, Y. (2019, January 15\u201320). Multi-level context ultra-aggregation for stereo matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00340"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chabra, R., Straub, J., Sweeney, C., Newcombe, R., and Fuchs, H. (2019, January 15\u201320). Stereodrnet: Dilated residual stereonet. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01206"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Guo, X., Yang, K., Yang, W., Wang, X., and Li, H. (2019, January 15\u201320). Group-wise correlation stereo network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00339"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, F., Prisacariu, V., Yang, R., and Torr, P.H. (2019, January 15\u201320). Ga-net: Guided aggregation net for end-to-end stereo matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00027"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Menze, M., and Geiger, A. (2015, January 7\u201312). Object Scene Flow for Autonomous Vehicles. 
Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298925"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lipson, L., Teed, Z., and Deng, J. (2021, January 1\u20133). Raft-stereo: Multilevel recurrent field transforms for stereo matching. Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK.","DOI":"10.1109\/3DV53792.2021.00032"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Teed, Z., and Deng, J. (2020, January 23\u201328). Raft: Recurrent all-pairs field transforms for optical flow. Proceedings of the Computer Vision\u2014ECCV 2020: 16th European Conference, Glasgow, UK. Proceedings, Part II 16.","DOI":"10.1007\/978-3-030-58536-5_24"},{"key":"ref_35","unstructured":"Xiao, W., and Zhao, W. (2024). Stepwise Regression and Pre-trained Edge for Robust Stereo Matching. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, J., Wang, P., Xiong, P., Cai, T., Yan, Z., Yang, L., Liu, J., Fan, H., and Liu, S. (2022, January 18\u201324). Practical stereo matching via cascaded recurrent network with adaptive correlation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01578"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Shamsafar, F., Woerz, S., Rahim, R., and Zell, A. (2022, January 3\u20138). Mobilestereonet: Towards lightweight deep networks for stereo matching. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV51458.2022.00075"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201322). Mobilenetv2: Inverted residuals and linear bottlenecks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_39","unstructured":"Guo, X., Zhang, C., Nie, D., Zheng, W., Zhang, Y., and Chen, L. (2024). LightStereo: Channel Boost Is All Your Need for Efficient 2D Cost Aggregation. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Kim, S., Yoo, D.-g., and Kim, Y.H. (2014, January 20\u201323). Stereo confidence metrics using the costs of surrounding pixels. Proceedings of the 2014 19th International Conference on Digital Signal Processing, Hong Kong, China.","DOI":"10.1109\/ICDSP.2014.6900808"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Haeusler, R., Nair, R., and Kondermann, D. (2013, January 23\u201328). Ensemble learning for confidence measures in stereo vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.46"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Spyropoulos, A., Komodakis, N., and Mordohai, P. (2014, January 23\u201328). Learning to detect ground control points for improving the accuracy of stereo matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.210"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Tosi, F., Poggi, M., Benincasa, A., and Mattoccia, S. (2018, January 8\u201314). Beyond local reasoning for stereo confidence estimation with deep learning. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01231-1_20"},{"key":"ref_44","unstructured":"Fu, Z., Ardabilian, M., and Stern, G. (2017, January 17\u201320). Stereo matching confidence learning based on multi-modal convolution neural networks. 
Proceedings of the Representations, Analysis and Recognition of Shape and Motion from Imaging Data: 7th International Workshop, RFMI 2017, Savoie, France. Revised Selected Papers 7."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Mehltretter, M., and Heipke, C. (2019, January 27\u201328). CNN-Based Cost Volume Analysis as Confidence Measure for Dense Matching. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Republic of Korea.","DOI":"10.1109\/ICCVW.2019.00262"},{"key":"ref_46","first-page":"2287","article-title":"Stereo matching by training a convolutional neural network to compare image patches","volume":"17","author":"Zbontar","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"69","DOI":"10.5194\/isprs-annals-V-2-2022-69-2022","article-title":"Joint estimation of depth and its uncertainty from stereo images using bayesian deep learning","volume":"2","author":"Mehltretter","year":"2022","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Chen, L., Wang, W., and Mordohai, P. (2023, January 17\u201324). Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01653"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"427","DOI":"10.5194\/isprsannals-II-3-W5-427-2015","article-title":"Joint 3D Estimation of Vehicles and Scene Flow","volume":"II-3\/W5","author":"Menze","year":"2015","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Agrawal, A., M\u00fcller, T., Schm\u00e4hling, T., Elser, S., and Eberhardt, J. (2023, January 16\u201319). 
RWU3D: Real World ToF and Stereo Dataset with High Quality Ground Truth. Proceedings of the 2023 Twelfth International Conference on Image Processing Theory, Tools and Applications (IPTA), Paris, France.","DOI":"10.1109\/IPTA59101.2023.10320041"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1145\/1531326.1531330","article-title":"PatchMatch: A randomized correspondence algorithm for structural image editing","volume":"28","author":"Barnes","year":"2009","journal-title":"ACM Trans. Graph."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/10\/8\/198\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:36:32Z","timestamp":1760110592000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/10\/8\/198"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,14]]},"references-count":51,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["jimaging10080198"],"URL":"https:\/\/doi.org\/10.3390\/jimaging10080198","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,14]]}}}