{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T02:25:26Z","timestamp":1768703126136,"version":"3.49.0"},"reference-count":28,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2021,9,10]],"date-time":"2021-09-10T00:00:00Z","timestamp":1631232000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62076122"],"award-info":[{"award-number":["62076122"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Natural Science Foundation of Nanjing Institute of Technology","award":["CKJB201804 \uff0c ZKJ201906"],"award-info":[{"award-number":["CKJB201804 \uff0c ZKJ201906"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Depth estimation based on light field imaging is a new methodology that has succeeded the traditional binocular stereo matching and depth from monocular images. Significant progress has been made in light-field depth estimation. Nevertheless, the balance between computational time and the accuracy of depth estimation is still worth exploring. The geometry in light field imaging is the basis of depth estimation, and the abundant light-field data provides convenience for applying deep learning algorithms. The Epipolar Plane Image (EPI) generated from the light-field data has a line texture containing geometric information. The slope of the line is proportional to the depth of the corresponding object. Considering the light field depth estimation as a spatial density prediction task, we design a convolutional neural network (ESTNet) to estimate the accurate depth quickly. 
Inspired by the strong image feature extraction ability of convolutional neural networks, especially for texture images, we propose to generate EPI synthetic images from light field data as the input of ESTNet to improve the effect of feature extraction and depth estimation. The architecture of ESTNet is characterized by three input streams, encoding-decoding structure, and skip-connections. The three input streams receive horizontal EPI synthetic image (EPIh), vertical EPI synthetic image (EPIv), and central view image (CV), respectively. EPIh and EPIv contain rich texture and depth cues, while CV provides pixel position association information. ESTNet consists of two stages: encoding and decoding. The encoding stage includes several convolution modules, and correspondingly, the decoding stage embodies some transposed convolution modules. In addition to the forward propagation of the network ESTNet, some skip-connections are added between the convolution module and the corresponding transposed convolution module to fuse the shallow local and deep semantic features. ESTNet is trained on one part of a synthetic light-field dataset and then tested on another part of the synthetic light-field dataset and real light-field dataset. Ablation experiments show that our ESTNet structure is reasonable. 
Experiments on the synthetic light-field dataset and real light-field dataset show that our ESTNet can balance the accuracy of depth estimation and computational time.<\/jats:p>","DOI":"10.3390\/s21186061","type":"journal-article","created":{"date-parts":[[2021,9,12]],"date-time":"2021-09-12T21:48:01Z","timestamp":1631483281000},"page":"6061","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Depth Estimation from Light Field Geometry Using Convolutional Neural Networks"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8399-9184","authenticated-orcid":false,"given":"Lei","family":"Han","sequence":"first","affiliation":[{"name":"School of Computer Engineering, Nanjing Institute of Technology, Nanjing 211167, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8897-3517","authenticated-orcid":false,"given":"Xiaohua","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Computer Engineering, Nanjing Institute of Technology, Nanjing 211167, China"}]},{"given":"Zhan","family":"Shi","sequence":"additional","affiliation":[{"name":"School of Computer Engineering, Nanjing Institute of Technology, Nanjing 211167, China"}]},{"given":"Shengnan","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Computer Engineering, Nanjing Institute of Technology, Nanjing 211167, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.neucom.2016.09.136","article-title":"High quality depth map estimation of object surface from light-field images","volume":"252","author":"Liu","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2024","DOI":"10.1109\/TPAMI.2015.2505283","article-title":"Learning Depth from Single Monocular Images Using Deep Convolutional Neural 
Fields","volume":"38","author":"Liu","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","first-page":"1","article-title":"Scene reconstruction from high spatio-angular resolution light fields","volume":"32","author":"Kim","year":"2013","journal-title":"ACM Trans. Graph."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Cavalin, P., and Oliveira, L.S. (2017, January 17\u201318). A review of texture classification methods and databases. Proceedings of the 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images Tutorials (SIBGRAPI-T), Niter\u00f3i, Brazil.","DOI":"10.1109\/SIBGRAPI-T.2017.10"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Shin, C., Jeon, H.-G., Yoon, Y., Kweon, I.S., and Kim, S.J. (2018, January 19\u201321). Epinet: A fully-convolutional neural network using epipolar geometry for depth from light field images. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00499"},{"key":"ref_6","unstructured":"Han, L., Huang, X., Shi, Z., and Zheng, S. (2020, January 20\u201322). Learning Depth from Light Field via Deep Convolutional Neural Network. Proceedings of the 2nd International Conference on Big Data and Security (ICBDS), Singapore."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Tao, M.W., Hadap, S., Malik, J., and Ramamoorthi, R. (2013, January 1\u20138). Depth from combining defocus and correspondence using light-field cameras. Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.89"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"390-1","DOI":"10.2352\/ISSN.2470-1173.2018.13.IPAS-390","article-title":"Occlusion Aware Reduced Angular Candidates based Light Field Depth Estimation from an Epipolar Plane Image","volume":"2018","author":"Mun","year":"2018","journal-title":"Electron. 
Imaging"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1016\/j.jvcir.2018.06.020","article-title":"Guided filtering based data fusion for light field depth estimation with L0 gradient minimization","volume":"55","author":"Han","year":"2018","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wanner, S., and Goldluecke, B. (2012, January 16\u201321). Globally consistent depth labeling of 4D light fields. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6247656"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3257","DOI":"10.1109\/TIP.2015.2440760","article-title":"Continuous depth map reconstruction from light fields","volume":"24","author":"Li","year":"2015","journal-title":"IEEE Trans. Image Process."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1109\/TIP.2018.2871753","article-title":"A maximum likelihood approach for depth field estimation based on epipolar plane images","volume":"28","author":"Neri","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lourenco, R., Assuncao, P.A., Tavora, L.M., Fonseca-Pinto, R., and Faria, S.M. (2018, January 7\u201310). Silhouette enhancement in light field disparity estimation using the structure tensor. Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece.","DOI":"10.1109\/ICIP.2018.8451848"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, J., and Jin, X. (2020, January 4\u20138). EPI-neighborhood distribution based light field depth estimation. 
Proceedings of the ICASSP 2020\u20132020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053664"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Schilling, H., Diebold, M., Rother, C., and Jahne, B. (2018, January 18\u201323). Trust your model: Light field depth estimation with inline occlusion handling. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00476"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Jeon, H.-G., Park, J., Choe, G., Park, J., Bok, Y., Tai, Y.-W., and Kweon, I.S. (2015, January 7\u201312). Accurate depth map estimation from a lenslet light field camera. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298762"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lin, H., Chen, C., Kang, S.B., and Yu, J. (2015, January 7\u201313). Depth recovery from light field using focal stack symmetry. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.394"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1109\/MSP.2007.905883","article-title":"Plenoptic manifolds","volume":"24","author":"Berent","year":"2007","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"900","DOI":"10.1109\/JDT.2014.2360992","article-title":"Depth from light fields analyzing 4D local structure","volume":"11","author":"Luke","year":"2014","journal-title":"J. Disp. Technol."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Heber, S., and Pock, T. (2016, January 27\u201330). Convolutional networks for shape from light field. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.407"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Guo, C., Jin, J., Hou, J., and Chen, J. (2020, January 6\u201310). Accurate light field depth estimation via an occlusion-aware network. Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK.","DOI":"10.1109\/ICME46284.2020.9102829"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Heber, S., Yu, W., and Pock, T. (2017, January 22\u201329). Neural epi-volume networks for shape from light field. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.247"},{"key":"ref_23","unstructured":"Liang, L. (2019). Study of Light Field Depth Estimation Based on Deep Learning. [Master\u2019s Thesis, Hangzhou Dianzi University]."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"102585","DOI":"10.1016\/j.dsp.2019.102585","article-title":"A hybrid learning of multimodal cues for light field depth estimation","volume":"95","author":"Zhou","year":"2019","journal-title":"Digit. Signal Process."},{"key":"ref_25","unstructured":"Ma, H., Li, H., Qian, Z., Shi, S., and Mu, T. (2018). VommaNet: An End-to-End network for disparity estimation from reflective and texture-less light field images. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"43369","DOI":"10.1109\/ACCESS.2019.2908685","article-title":"Dense convolutional networks for semantic segmentation","volume":"7","author":"Han","year":"2019","journal-title":"IEEE Access"},{"key":"ref_27","unstructured":"Honauer, K., Johannsen, O., Kondermann, D., and Goldluecke, B. (2016). A dataset and evaluation methodology for depth estimation on 4D light fields. Asian Conference on Computer Vision, Springer."},{"key":"ref_28","unstructured":"Mousnier, A., Vural, E., and Guillemot, C. (2015). 
Partial light field tomographic reconstruction from a fixed-camera focal stack. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/18\/6061\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:00:08Z","timestamp":1760166008000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/18\/6061"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,10]]},"references-count":28,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["s21186061"],"URL":"https:\/\/doi.org\/10.3390\/s21186061","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,10]]}}}