{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T19:45:34Z","timestamp":1769283934458,"version":"3.49.0"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T00:00:00Z","timestamp":1700438400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T00:00:00Z","timestamp":1700438400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100017090","name":"Sony","doi-asserted-by":"publisher","award":["20211011"],"award-info":[{"award-number":["20211011"]}],"id":[{"id":"10.13039\/100017090","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003407","name":"Ministero dell\u2019Istruzione, dell\u2019Universit\u00e1 e della Ricerca","doi-asserted-by":"publisher","award":["232\/2016"],"award-info":[{"award-number":["232\/2016"]}],"id":[{"id":"10.13039\/501100003407","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Vis Comput"],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Monocular depth estimation is an open challenge due to the ill-posed nature of the problem at hand. Deep learning techniques proved capable of producing acceptable depth estimation accuracy but the lack of robust depth cues within RGB images severely limits their performance. Coded aperture-based methods using phase and amplitude masks encode strong depth cues within 2D images by means of depth-dependent Point Spread Functions (PSFs) at the price of a reduced image quality. In this paper, we propose a novel end-to-end learning approach for depth from diffracted rotation. A phase mask that produces a Rotating Point Spread Function (RPSF) as a function of defocus is jointly optimized with the weights of a depth estimation neural network. To this aim, we introduce a differentiable physical model of the aperture mask and exploit an accurate simulation of the camera imaging pipeline. Our approach requires a significantly less complex model and less training data, yet it outperforms existing methods for monocular depth estimation on indoor benchmarks. In addition, we address the image degradation problem by incorporating a non-blind and nonuniform image deblurring module to recover the sharp all-in-focus image from its blurred counterpart.\n<\/jats:p>","DOI":"10.1007\/s00371-023-03147-8","type":"journal-article","created":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T15:03:42Z","timestamp":1700492622000},"page":"5961-5977","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["End-to-end learning for joint depth and image reconstruction from diffracted rotation"],"prefix":"10.1007","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8081-0529","authenticated-orcid":false,"given":"Mazen","family":"Mel","sequence":"first","affiliation":[]},{"given":"Muhammad","family":"Siddiqui","sequence":"additional","affiliation":[]},{"given":"Pietro","family":"Zanuttigh","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,11,20]]},"reference":[{"issue":"2","key":"3147_CR1","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1364\/OL.31.000181","volume":"31","author":"A Greengard","year":"2006","unstructured":"Greengard, A., Schechner, Y.Y., Piestun, R.: Depth from diffracted rotation. Opt. Lett. 31(2), 181\u2013183 (2006)","journal-title":"Opt. Lett."},{"issue":"11","key":"3147_CR2","doi-asserted-by":"publisher","first-page":"8185","DOI":"10.1103\/PhysRevA.45.8185","volume":"45","author":"L Allen","year":"1992","unstructured":"Allen, L., Beijersbergen, M.W., Spreeuw, R., Woerdman, J.: Orbital angular momentum of light and the transformation of Laguerre\u2013Gaussian laser modes. Phys. Rev. A 45(11), 8185 (1992)","journal-title":"Phys. Rev. A"},{"issue":"2","key":"3147_CR3","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1364\/JOSAA.17.000294","volume":"17","author":"R Piestun","year":"2000","unstructured":"Piestun, R., Schechner, Y.Y., Shamir, J.: Propagation-invariant wave fields with finite energy. JOSA A 17(2), 294\u2013303 (2000)","journal-title":"JOSA A"},{"issue":"5","key":"3147_CR4","doi-asserted-by":"publisher","first-page":"3484","DOI":"10.1364\/OE.16.003484","volume":"16","author":"SRP Pavani","year":"2008","unstructured":"Pavani, S.R.P., Piestun, R.: High-efficiency rotating point spread functions. Opt. Express 16(5), 3484\u20133489 (2008)","journal-title":"Opt. Express"},{"issue":"13","key":"3147_CR5","doi-asserted-by":"publisher","first-page":"133902","DOI":"10.1103\/PhysRevLett.113.133902","volume":"113","author":"Y Shechtman","year":"2014","unstructured":"Shechtman, Y., Sahl, S.J., Backer, A.S., Moerner, W.: Optimal point spread function design for 3d imaging. Phys. Rev. Lett. 113(13), 133902 (2014)","journal-title":"Phys. Rev. Lett."},{"issue":"5","key":"3147_CR6","doi-asserted-by":"publisher","first-page":"849","DOI":"10.1364\/JOSAA.22.000849","volume":"22","author":"VV Kotlyar","year":"2005","unstructured":"Kotlyar, V.V., Almazov, A.A., Khonina, S.N., Soifer, V.A., Elfstrom, H., Turunen, J.: Generation of phase singularity through diffracting a plane or gaussian beam by a spiral phase plate. JOSA A 22(5), 849\u2013861 (2005)","journal-title":"JOSA A"},{"issue":"12","key":"3147_CR7","doi-asserted-by":"publisher","first-page":"2656","DOI":"10.1364\/AO.45.002656","volume":"45","author":"VV Kotlyar","year":"2006","unstructured":"Kotlyar, V.V., Kovalev, A.A., Khonina, S.N., Skidanov, R.V., Soifer, V.A., Elfstrom, H., Tossavainen, N., Turunen, J.: Diffraction of conic and gaussian beams by a spiral phase plate. Appl. Opt. 45(12), 2656\u20132665 (2006)","journal-title":"Appl. Opt."},{"issue":"4","key":"3147_CR8","doi-asserted-by":"publisher","first-page":"585","DOI":"10.1364\/OL.38.000585","volume":"38","author":"S Prasad","year":"2013","unstructured":"Prasad, S.: Rotating point spread function via pupil-phase engineering. Opt. Lett. 38(4), 585\u2013587 (2013)","journal-title":"Opt. Lett."},{"issue":"4","key":"3147_CR9","doi-asserted-by":"publisher","first-page":"4873","DOI":"10.1364\/OE.26.004873","volume":"26","author":"R Berlich","year":"2018","unstructured":"Berlich, R., Stallinga, S.: High-order-helix point spread functions for monocular three-dimensional imaging with superior aberration robustness. Opt. Express 26(4), 4873\u20134891 (2018)","journal-title":"Opt. Express"},{"key":"3147_CR10","unstructured":"Kumar, R.: Three-dimensional imaging using a novel rotating point spread function imager, The University of New Mexico (2015)"},{"issue":"3","key":"3147_CR11","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1145\/1276377.1276464","volume":"26","author":"A Levin","year":"2007","unstructured":"Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional camera with a coded aperture. ACM Trans Graph (TOG) 26(3), 70\u2013526 (2007)","journal-title":"ACM Trans Graph (TOG)"},{"issue":"1","key":"3147_CR12","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1007\/s11263-010-0409-8","volume":"93","author":"C Zhou","year":"2011","unstructured":"Zhou, C., Lin, S., Nayar, S.K.: Coded aperture pairs for depth from defocus and defocus deblurring. Int. J. Comput. Vision 93(1), 53\u201372 (2011)","journal-title":"Int. J. Comput. Vision"},{"key":"3147_CR13","unstructured":"Kumar, R., Prasad, S.: Psf rotation with changing defocus and applications to 3d imaging for space situational awareness, In: Proceedings of the 2013 AMOS Technical Conference, Maui, HI (2013)"},{"issue":"4","key":"3147_CR14","doi-asserted-by":"publisher","first-page":"4029","DOI":"10.1364\/OE.22.004029","volume":"22","author":"C Roider","year":"2014","unstructured":"Roider, C., Jesacher, A., Bernet, S., Ritsch-Marte, M.: Axial super-localisation using rotating point spread functions shaped by polarisation-dependent phase modulation. Opt. Express 22(4), 4029\u20134037 (2014)","journal-title":"Opt. Express"},{"issue":"6","key":"3147_CR15","doi-asserted-by":"publisher","first-page":"5946","DOI":"10.1364\/OE.24.005946","volume":"24","author":"R Berlich","year":"2016","unstructured":"Berlich, R., Br\u00e4uer, A., Stallinga, S.: Single shot three-dimensional imaging using an engineered point spread function. Opt. Express 24(6), 5946\u20135960 (2016)","journal-title":"Opt. Express"},{"issue":"3","key":"3147_CR16","doi-asserted-by":"publisher","first-page":"298","DOI":"10.1109\/TCI.2018.2849326","volume":"4","author":"H Haim","year":"2018","unstructured":"Haim, H., Elmalem, S., Giryes, R., Bronstein, A.M., Marom, E.: Depth estimation from a single image using deep learned phase coded mask. IEEE Trans. Comput. Imag. 4(3), 298\u2013310 (2018)","journal-title":"IEEE Trans. Comput. Imag."},{"key":"3147_CR17","doi-asserted-by":"crossref","unstructured":"Chang, J., Wetzstein, G.: Deep optics for monocular depth estimation and 3d object detection, In: Proceedings of the IEEE\/CVF international conference on computer vision, pp. 10193\u201310202 (2019)","DOI":"10.1109\/ICCV.2019.01029"},{"key":"3147_CR18","doi-asserted-by":"crossref","unstructured":"Wu, Y., Boominathan, V., Chen, H., Sankaranarayanan, A., Veeraraghavan, A.: Phasecam3d-learning phase masks for passive single view depth estimation, In: 2019 IEEE International Conference on Computational Photography (ICCP), IEEE, pp. 1\u201312 (2019)","DOI":"10.1109\/ICCPHOT.2019.8747330"},{"issue":"1","key":"3147_CR19","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1109\/10.900255","volume":"48","author":"DR Iskander","year":"2001","unstructured":"Iskander, D.R., Collins, M.J., Davis, B.: Optimal modeling of corneal surfaces with zernike polynomials. IEEE Trans. Biomed. Eng. 48(1), 87\u201395 (2001)","journal-title":"IEEE Trans. Biomed. Eng."},{"key":"3147_CR20","volume-title":"Principles of optics: electromagnetic theory of propagation, interference and diffraction of light","author":"M Born","year":"2013","unstructured":"Born, M., Wolf, E.: Principles of optics: electromagnetic theory of propagation, interference and diffraction of light. Elsevier, Amsterdam (2013)"},{"key":"3147_CR21","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., Brox, T.: U-net Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, pp. 234\u2013241. Springer, Berlin (2015)","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"3147_CR22","doi-asserted-by":"crossref","unstructured":"Anger, J., Facciolo, G., Delbracio, M.: Modeling realistic degradations in non-blind deconvolution, In: 2018 25th IEEE International conference on image processing (ICIP), IEEE, pp. 978\u2013982 (2018)","DOI":"10.1109\/ICIP.2018.8451115"},{"key":"3147_CR23","doi-asserted-by":"crossref","unstructured":"Cho, S., Lee, S.: Fast motion deblurring, In: ACM SIGGRAPH Asia 2009 papers, pp. 1\u20138 (2009)","DOI":"10.1145\/1661412.1618491"},{"issue":"12","key":"3147_CR24","doi-asserted-by":"publisher","first-page":"5468","DOI":"10.1109\/TNNLS.2020.2968289","volume":"31","author":"D Gong","year":"2020","unstructured":"Gong, D., Zhang, Z., Shi, Q., van den Hengel, A., Shen, C., Zhang, Y.: Learning deep gradient descent optimization for image deconvolution. IEEE Trans. Neural Netw. Learn. Syst. 31(12), 5468\u20135482 (2020)","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"3147_CR25","first-page":"1033","volume":"22","author":"D Krishnan","year":"2009","unstructured":"Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-Laplacian priors. Adv. Neural. Inf. Process. Syst. 22, 1033\u20131041 (2009)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"3147_CR26","first-page":"1790","volume":"27","author":"L Xu","year":"2014","unstructured":"Xu, L., Ren, J.S., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. Adv. Neural. Inf. Process. Syst. 27, 1790\u20131798 (2014)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"3147_CR27","doi-asserted-by":"crossref","unstructured":"Schuler, C.J., Christopher\u00a0Burger, H., Harmeling, S., Scholkopf, B.: A machine learning approach for non-blind image deconvolution, In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1067\u20131074 (2013)","DOI":"10.1109\/CVPR.2013.142"},{"key":"3147_CR28","unstructured":"Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: wiener meets deep learning for image deblurring, arXiv preprint arXiv:2103.09962"},{"key":"3147_CR29","doi-asserted-by":"crossref","unstructured":"Mayer, N., Ilg, E., H\u00e4usser, P., Fischer, P., Cremers, D., Dosovitskiy, A., Brox, T.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation, In: IEEE International conference on computer vision and pattern recognition (CVPR), 2016, arXiv:1512.02134. http:\/\/lmb.informatik.uni-freiburg.de\/Publications\/2016\/MIFDB16","DOI":"10.1109\/CVPR.2016.438"},{"key":"3147_CR30","unstructured":"Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network, arXiv preprint arXiv:1406.2283"},{"key":"3147_CR31","doi-asserted-by":"crossref","unstructured":"Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture, In: Proceedings of the IEEE international conference on computer vision, pp. 2650\u20132658 (2015)","DOI":"10.1109\/ICCV.2015.304"},{"key":"3147_CR32","unstructured":"Bhat, S.F., Alhashim, I., Wonka, P.: Adabins: depth estimation using adaptive bins, arXiv preprint arXiv:2011.14141"},{"key":"3147_CR33","doi-asserted-by":"crossref","unstructured":"Song, S., Lichtenberg, S.P., Xiao, J.: Sun rgb-d: A rgb-d scene understanding benchmark suite, In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 567\u2013576 (2015)","DOI":"10.1109\/CVPR.2015.7298655"},{"key":"3147_CR34","doi-asserted-by":"crossref","unstructured":"Malvar, H.S., He, L.-w., Cutler, R.: High-quality linear interpolation for demosaicing of bayer-patterned color images, In: 2004 IEEE International conference on acoustics, speech, and signal processing, Vol.\u00a03, IEEE, pp. iii\u2013485 (2004)","DOI":"10.1109\/ICASSP.2004.1326587"},{"key":"3147_CR35","unstructured":"Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems, software available from tensorflow.org (2015). http:\/\/tensorflow.org\/"},{"key":"3147_CR36","unstructured":"Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization, arXiv preprint arXiv:1412.6980"},{"key":"3147_CR37","unstructured":"Chen, X., Chen, X., Zha, Z.-J.: Structure-aware residual pyramid network for monocular depth estimation, arXiv preprint arXiv:1907.06023"},{"key":"3147_CR38","doi-asserted-by":"crossref","unstructured":"Hao, Z., Li, Y., You, S., Lu, F.: Detail preserving depth estimation from a single image using attention guided networks, In: 2018 International Conference on 3D Vision (3DV), IEEE, pp. 304\u2013313 (2018)","DOI":"10.1109\/3DV.2018.00043"},{"issue":"10","key":"3147_CR39","doi-asserted-by":"publisher","first-page":"1448","DOI":"10.1109\/TIP.2005.854474","volume":"14","author":"SJ Reeves","year":"2005","unstructured":"Reeves, S.J.: Fast image restoration without boundary artifacts. IEEE Trans. Image Process. 14(10), 1448\u20131453 (2005)","journal-title":"IEEE Trans. Image Process."},{"key":"3147_CR40","doi-asserted-by":"crossref","unstructured":"Kraus, M., Strengert, M.: Depth-of-field rendering by pyramidal image processing, In: Computer graphics forum, Vol.\u00a026, Wiley Online Library, pp. 645\u2013654 (2007)","DOI":"10.1111\/j.1467-8659.2007.01088.x"},{"key":"3147_CR41","volume-title":"Extrapolation, interpolation, and smoothing of stationary time series: with engineering applications","author":"N Wiener","year":"1964","unstructured":"Wiener, N., et al.: Extrapolation, interpolation, and smoothing of stationary time series: with engineering applications, vol. 8. MIT press, Cambridge (1964)"},{"key":"3147_CR42","doi-asserted-by":"crossref","unstructured":"Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks, In: 2016 Fourth international conference on 3D vision (3DV). IEEE. 239\u2013248 (2016)","DOI":"10.1109\/3DV.2016.32"},{"key":"3147_CR43","doi-asserted-by":"crossref","unstructured":"Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation, In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2002\u20132011 (2018)","DOI":"10.1109\/CVPR.2018.00214"},{"key":"3147_CR44","doi-asserted-by":"crossref","unstructured":"Qi, X., Liao, R., Liu, Z., Urtasun, R., Jia, J.: Geonet: Geometric neural network for joint depth and surface normal estimation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 283\u2013291 (2018)","DOI":"10.1109\/CVPR.2018.00037"},{"key":"3147_CR45","doi-asserted-by":"crossref","unstructured":"Yin, W., Liu, Y., Shen, C., Yan, Y.: Enforcing geometric constraints of virtual normal for depth prediction, In: Proceedings of the IEEE\/CVF international conference on computer vision, pp. 5684\u20135693 (2019)","DOI":"10.1109\/ICCV.2019.00578"},{"key":"3147_CR46","unstructured":"Lee, J.H., Han, M.-K., Ko, D.W., Suh, I.H.: From big to small: multi-scale local planar guidance for monocular depth estimation, arXiv preprint arXiv:1907.10326"},{"key":"3147_CR47","doi-asserted-by":"crossref","unstructured":"Huynh, L., Nguyen-Ha, P., Matas, J., Rahtu, E., Heikkil\u00e4, J.: Guiding monocular depth estimation using depth-attention volume, In: European conference on computer vision, Springer, pp. 581\u2013597 (2020)","DOI":"10.1007\/978-3-030-58574-7_35"},{"key":"3147_CR48","unstructured":"Alhashim, I., Wonka, P.: High quality monocular depth estimation via transfer learning, arXiv preprint arXiv:1812.11941"},{"key":"3147_CR49","unstructured":"Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction, arXiv preprint arXiv:2103.13413"},{"key":"3147_CR50","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition, In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770\u2013778 (2016)","DOI":"10.1109\/CVPR.2016.90"},{"key":"3147_CR51","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database, In: 2009 IEEE Conference on computer vision and pattern recognition. IEEE, 248\u2013255 (2009)","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"3147_CR52","doi-asserted-by":"crossref","unstructured":"Diakogiannis, F.I., Waldner, F., Caccetta, P., Wu, C.: Resunet-a: a deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote. Sens. 162, 94\u2013114 (2020)","DOI":"10.1016\/j.isprsjprs.2020.01.013"}],"container-title":["The Visual Computer"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00371-023-03147-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00371-023-03147-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00371-023-03147-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,13]],"date-time":"2024-08-13T15:09:52Z","timestamp":1723561792000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00371-023-03147-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,20]]},"references-count":52,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["3147"],"URL":"https:\/\/doi.org\/10.1007\/s00371-023-03147-8","relation":{},"ISSN":["0178-2789","1432-2315"],"issn-type":[{"value":"0178-2789","type":"print"},{"value":"1432-2315","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,20]]},"assertion":[{"value":"20 October 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 November 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}