{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T17:51:17Z","timestamp":1774893077585,"version":"3.50.1"},"reference-count":41,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2019,5,29]],"date-time":"2019-05-29T00:00:00Z","timestamp":1559088000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"MOTIE Research Grant; ICT R&amp;D program of MSIT\/IITP","award":["10067764; 2016-0-00098"],"award-info":[{"award-number":["10067764; 2016-0-00098"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Herein, we propose an unsupervised learning architecture under coupled consistency conditions to estimate the depth, ego-motion, and optical flow. Previously invented learning techniques in computer vision adopted a large amount of the ground truth dataset for network training. A ground truth dataset, including depth and optical flow collected from the real world, requires tremendous effort in pre-processing due to the exposure to noise artifacts. In this paper, we propose a framework that trains networks while using a different type of data with combined losses that are derived from a coupled consistency structure. The core concept is composed of two parts. First, we compare the optical flows, which are estimated from both the depth plus ego-motion and flow estimation network. Subsequently, to prevent the effects of the artifacts of the occluded regions in the estimated optical flow, we compute flow local consistency along the forward\u2013backward directions. Second, synthesis consistency enables the exploration of the geometric correlation between the spatial and temporal domains in a stereo video. 
We perform extensive experiments on the depth, ego-motion, and optical flow estimation on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset. We verify that the flow local consistency loss improves the optical flow accuracy in terms of the occluded regions. Furthermore, we also show that the view-synthesis-based photometric loss enhances the depth and ego-motion accuracy via scene projection. The experimental results exhibit the competitive performance of the estimated depth and the optical flow; moreover, the induced ego-motion is comparable to that obtained from other unsupervised methods.<\/jats:p>","DOI":"10.3390\/s19112459","type":"journal-article","created":{"date-parts":[[2019,5,29]],"date-time":"2019-05-29T11:31:28Z","timestamp":1559129488000},"page":"2459","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Unsupervised Learning for Depth, Ego-Motion, and Optical Flow Estimation Using Coupled Consistency Conditions"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2037-7273","authenticated-orcid":false,"given":"Ji-Hun","family":"Mun","sequence":"first","affiliation":[{"name":"School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2775-7789","authenticated-orcid":false,"given":"Moongu","family":"Jeon","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea"}]},{"given":"Byung-Geun","family":"Lee","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, 
Korea"}]}],"member":"1968","published-online":{"date-parts":[[2019,5,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/S0921-8890(02)00372-X","article-title":"A survey of socially interactive robots","volume":"42","author":"Fong","year":"2003","journal-title":"Robot. Auton. Syst."},{"key":"ref_2","first-page":"3872","article-title":"Topological mapping, localization and navigation using image collections","volume":"77","author":"Fraundorfer","year":"2007","journal-title":"Int. Conf. Intell. Robot. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/TPAMI.2007.1166","article-title":"Stereo processing by semiglobal matching and mutual information","volume":"30","author":"Hirschmuller","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/j.procs.2016.05.305","article-title":"Embedded real-time stereo estimation via semi-global matching on the GPU","volume":"80","author":"Hernandez","year":"2016","journal-title":"Procedia Comput. Sci."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Newcombe, R.A., Lovegrove, S.J., and Davison, A.J. (2011, November 6\u201313). DTAM: Dense tracking and mapping in real-time. Proceedings of the 2011 International Conference on Computer Vision (ICCV 2011), Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126513"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Agrawal, P., Carreira, J., and Malik, J. (2015, December 13\u201316). Learning to see by moving. Proceedings of the 2015 International Conference on Computer Vision (ICCV 2015), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.13"},{"key":"ref_7","unstructured":"Eigen, D., Puhrsch, C., and Fergus, R. (2014, December 8\u201313). Depth map prediction from a single image using a multi-scale deep network. 
Proceedings of the 2014 Neural Information Processing Systems (NIPS 2014), Montreal, QC, Canada."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Furukawa, Y., Curless, B., Seitz, S.M., and Szeliski, R. (2010, June 13\u201318). Towards internet-scale multi-view stereo. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2010), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539802"},{"key":"ref_9","unstructured":"Triggs, B., McLauchlan, P.F., Hartley, R.I., and Fitzgibbon, A.W. (2002). Bundle adjustment\u2014A modern synthesis. International Workshop on Vision Algorithms, Springer."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kendall, A., Grimes, M., and Cipolla, R. (2015, December 13\u201316). PoseNet: A convolutional network for real-time 6-DoF camera relocalization. Proceedings of the 2015 International Conference on Computer Vision (ICCV 2015), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.336"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wang, S., Clark, R., Wen, H., and Trigoni, N. (2017, May 29\u2013June 3). Deepvo: Toward end-to-end visual odometry with deep recurrent convolutional neural networks. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA 2017), Singapore.","DOI":"10.1109\/ICRA.2017.7989236"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ladicky, L., Zeisl, B., and Pollefeys, M. (2014, September 6\u201312). Discriminatively trained dense surface normal estimation. Proceedings of the 13th European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_31"},{"key":"ref_13","unstructured":"Wang, P., Shen, X., Lin, Z., Cohen, S., Price, B.L., and Yuille, A.L. (2015, June 8\u201310). Towards unified depth and semantic prediction from a single image. 
Proceedings of the 28th IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Garg, R., BG, V.K., Carneiro, G., and Reid, I. (2016, October 8\u201316). Unsupervised CNN for single view depth estimation. Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_45"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xie, J., Girshick, R.B., and Farhadi, A. (2016, October 8\u201316). Deep3D: Fully automatic 2D-to-3D video conversion with deep convolutional neural networks. Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_51"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., and Navab, N. (2016, October 25\u201328). Deeper depth prediction with fully convolutional residual networks. Proceedings of the 4th International Conference on 3D Vision (3DV 2016), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.32"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, P., Bachrach, A., and Bry, A. (2017, October 22\u201329). End-to-end learning of geometry and context for deep stereo regression. Proceedings of the 2017 International Conference on Computer Vision (ICCV 2017), Venice, Italy.","DOI":"10.1109\/ICCV.2017.17"},{"key":"ref_18","unstructured":"Jason, J.Y., Harley, A.W., and Derpanis, K.G. (2016, October 8\u201316). Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands."},{"key":"ref_19","unstructured":"Vijayanarasimhan, S., Ricco, S., Schmid, C., Sukthankar, R., and Fragkiadaki, K. (2017). 
Sfm-net: Learning of structure and motion from video. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhou, T., Brown, M., Snavely, N., and Lowe, D. (2017, July 21\u201326). Unsupervised learning of depth and ego-motion from video. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.700"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Godard, C., Mac Aodha, O., and Brostow, G.J. (2017, July 21\u201326). Unsupervised monocular depth estimation with left-right consistency. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.699"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"32796","DOI":"10.1109\/ACCESS.2019.2903871","article-title":"Unsupervised learning of accurate camera pose and depth from video sequences with Kalman filter","volume":"7","author":"Wang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, June 16\u201321). Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Flynn, J., Neulander, I., Philbin, J., and Snavely, N. (2016, June 27\u201330). Deep stereo: Learning to predict new views from the world\u2019s imagery. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.595"},{"key":"ref_25","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014, December 8\u201313). Generative Adversarial Nets. 
Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhu, J.Y., Park, T., Isola, P., and Efros, A.A. (2017, October 22\u201329). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy.","DOI":"10.1109\/ICCV.2017.244"},{"key":"ref_27","first-page":"2017","article-title":"Spatial transformer networks","volume":"28","author":"Jaderberg","year":"2015","journal-title":"Adv. Neural Info. Process. Syst."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: from error visibility to structural similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Proc."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Meister, S., Hur, J., and Roth, S. (2018, February 2\u20137). UnFlow: Unsupervised learning of optical flow with a bidirectional census loss. Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12276"},{"key":"ref_30","unstructured":"Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., and Antiga, L. (2017, December 4\u20139). Automatic differentiation in PyTorch. Proceedings of the 2017 International Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_31","unstructured":"Kingma, D., and Ba, J. (2015, May 7\u20139). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA."},{"key":"ref_32","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
arXiv."},{"key":"ref_33","unstructured":"Nair, V., and Hinton, G.E. (2010, June 21\u201324). Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 27\u201330). Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015, December 11\u201318). FlowNet: Learning optical flow with convolutional networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.316"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., and Brox, T. (2017, July 21\u201326). FlowNet 2.0: Evolution of optical flow estimation with deep networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.179"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., and Brox, T. (2016, June 27\u201330). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.438"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ren, J., Yan, J., Ni, B., Liu, B., Yang, X., and Zha, H. (2017, February 4\u20139). Unsupervised deep learning for optical flow estimation. 
Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.10723"},{"key":"ref_39","unstructured":"Brox, T., Bruhn, A., Papenberg, N., and Weickert, J. (2004, May 11\u201314). High accuracy optical flow estimation based on a theory for warping. Proceedings of the European Conference on Computer Vision (ECCV 2004), Prague, Czech Republic."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1147","DOI":"10.1109\/TRO.2015.2463671","article-title":"ORB-SLAM: A versatile and accurate monocular SLAM system","volume":"31","author":"Tard\u00f3s","year":"2015","journal-title":"IEEE Trans. Robot."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Yin, Z., and Shi, J. (2018, June 18\u201322). GeoNet: Unsupervised learning of dense depth, optical flow and camera pose. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00212"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/11\/2459\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:54:27Z","timestamp":1760187267000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/11\/2459"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,29]]},"references-count":41,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["s19112459"],"URL":"https:\/\/doi.org\/10.3390\/s19112459","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,5,29]]}}}