{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:57:43Z","timestamp":1760234263561,"version":"build-2065373602"},"reference-count":41,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T00:00:00Z","timestamp":1619481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>In this paper, we propose two novel AR glasses pose estimation algorithms from single infrared images by using 3D point clouds as an intermediate representation. Our first approach \u201cPointsToRotation\u201d is based on a Deep Neural Network alone, whereas our second approach \u201cPointsToPose\u201d is a hybrid model combining Deep Learning and a voting-based mechanism. Our methods utilize a point cloud estimator, which we trained on multi-view infrared images in a semi-supervised manner, generating point clouds based on one image only. We generate a point cloud dataset with our point cloud estimator using the HMDPose dataset, consisting of multi-view infrared images of various AR glasses with the corresponding 6-DoF poses. In comparison to another point cloud-based 6-DoF pose estimation named CloudPose, we achieve an error reduction of around 50%. Compared to a state-of-the-art image-based method, we reduce the pose estimation error by around 96%.<\/jats:p>","DOI":"10.3390\/jimaging7050080","type":"journal-article","created":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T21:18:20Z","timestamp":1619558300000},"page":"80","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["From IR Images to Point Clouds to Pose: Point Cloud-Based AR Glasses Pose Estimation"],"prefix":"10.3390","volume":"7","author":[{"given":"Ahmet","family":"Firintepe","sequence":"first","affiliation":[{"name":"BMW Group Research, New Technologies, Innovations, 85748 Munich, Germany"},{"name":"Department of Informatics, University of Kaiserslautern, 67653 Kaiserslautern, Germany"}]},{"given":"Carolin","family":"Vey","sequence":"additional","affiliation":[{"name":"BMW Group Research, New Technologies, Innovations, 85748 Munich, Germany"},{"name":"Department of Data Science and Knowledge Engineering, Maastricht University, 6211 TE Maastricht, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4298-6870","authenticated-orcid":false,"given":"Stylianos","family":"Asteriadis","sequence":"additional","affiliation":[{"name":"Department of Data Science and Knowledge Engineering, Maastricht University, 6211 TE Maastricht, The Netherlands"}]},{"given":"Alain","family":"Pagani","sequence":"additional","affiliation":[{"name":"German Research Center for Artificial Intelligence (DFKI), 67653 Kaiserslautern, Germany"}]},{"given":"Didier","family":"Stricker","sequence":"additional","affiliation":[{"name":"Department of Informatics, University of Kaiserslautern, 67653 Kaiserslautern, Germany"},{"name":"German Research Center for Artificial Intelligence (DFKI), 67653 Kaiserslautern, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Peng, S., Liu, Y., Huang, Q., Zhou, X., and Bao, H. (2019, January 15\u201320). PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation. Proceedings of the The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00469"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Tekin, B., Sinha, S.N., and Fua, P. (2018, January 18\u201323). Real-Time Seamless Single Shot 6D Object Pose Prediction. Proceedings of the The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00038"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Xiang, Y., Schmidt, T., Narayanan, V., and Fox, D. (2018). PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes. Robotics: Science and Systems (RSS), RSS.","DOI":"10.15607\/RSS.2018.XIV.019"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kehl, W., Manhardt, F., Tombari, F., Ilic, S., and Navab, N. (2017, January 22\u201329). SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again. Proceedings of the The IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.169"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Firintepe, A., Pagani, A., and Stricker, D. (2020, January 1\u20134). HMDPose: A large-scale trinocular IR Augmented Reality Glasses Pose Dataset. Proceedings of the 26th ACM Symposium on Virtual Reality Software and Technology, VRST \u201920, Virtual Event.","DOI":"10.1145\/3385956.3422121"},{"key":"ref_6","unstructured":"Berg, A., Oskarsson, M., and O\u2019Connor, M. (2020). Deep Ordinal Regression with Label Diversity. arXiv."},{"key":"ref_7","unstructured":"Gao, G., Lauri, M., Wang, Y., Hu, X., Zhang, J., and Frintrop, S. (August, January 31). 6D Object Pose Regression via Supervised Learning on Point Clouds. Proceedings of the ICRA, Paris, France."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Rambach, J., Deng, C., Pagani, A., and Stricker, D. (2018, January 16\u201320). Learning 6DoF Object Poses from Synthetic Single Channel Images. Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Munich, Germany.","DOI":"10.1109\/ISMAR-Adjunct.2018.00058"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lowe, D.G. (1999, January 20\u201327). Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/0600000001","article-title":"Monocular model-based 3d tracking of rigid objects: A survey","volume":"Volume 1","author":"Lepetit","year":"2005","journal-title":"Foundations and Trends in Computer Graphics and Vision"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Rad, M., and Lepetit, V. (2017, January 22\u201329). BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.413"},{"key":"ref_12","unstructured":"Xu, Z., Chen, K., and Jia, K. (2019). W-PoseNet: Dense Correspondence Regularized Pixel Pair Pose Regression. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Borghi, G., Venturelli, M., Vezzani, R., and Cucchiara, R. (2017, January 21\u201326). POSEidon: Face-From-Depth for Driver Pose Estimation. Proceedings of the The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.583"},{"key":"ref_14","first-page":"596","article-title":"Face-from-Depth for Head Pose Estimation on Depth Images","volume":"42","author":"Guido","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wohlhart, P., and Lepetit, V. (2015, January 7\u201312). Learning descriptors for object recognition and 3D pose estimation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298930"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Kehl, W., Milletari, F., Tombari, F., Ilic, S., and Navab, N. (2016, January 8\u201310). Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation. Proceedings of the Computer Vision\u2013ECCV 2016, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46487-9_13"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Li, C., Bai, J., and Hager, G.D. (2018, January 8\u201314). A Unified Framework for Multi-View Multi-Class Object Pose Estimation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01270-0_16"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Wang, C., Xu, D., Zhu, Y., Mart\u00edn-Mart\u00edn, R., Lu, C., Fei-Fei, L., and Savarese, S. (2019, January 15\u201320). DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00346"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Charles, R.Q., Su, H., Kaichun, M., and Guibas, L.J. (2017, January 21\u201326). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.16"},{"key":"ref_20","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017, January 4\u20139). PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS\u201917, Long Beach, CA, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 15\u201320). PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Gao, G., Lauri, M., Zhang, J., and Frintrop, S. (2018, January 8\u201314). Occlusion Resistant Object Rotation Regression from Point Cloud Segments. Proceedings of the Computer Vision\u2013ECCV 2018 Workshops, Munich, Germany.","DOI":"10.1007\/978-3-030-11009-3_44"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Hu, T., Jha, S., and Busso, C. (November, January 19). Robust Driver Head Pose Estimation in Naturalistic Conditions from Point-Cloud Data. Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA.","DOI":"10.1109\/IV47402.2020.9304592"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Xiao, S., Sang, N., Wang, X., and Ma, X. (2020, January 4\u20138). Leveraging Ordinal Regression With Soft Labels For 3d Head Pose Estimation From Point Sets. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053370"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, Y., Tu, Z., Ge, L., Zhang, D., Chen, R., and Yuan, J. (November, January 27). SO-HandNet: Self-Organizing Network for 3D Hand Pose Estimation With Semi-Supervised Learning. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00706"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ge, L., Ren, Z., and Yuan, J. (2018, January 8\u201314). Point-to-Point Regression PointNet for 3D Hand Pose Estimation. Proceedings of the Computer Vision\u2013 ECCV 2018, Munich, Germany.","DOI":"10.1109\/CVPR.2018.00878"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ge, L., Cai, Y., Weng, J., and Yuan, J. (2018, January 18\u201323). Hand PointNet: 3D Hand Pose Estimation Using Point Sets. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00878"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, Y., Snavely, N., Huttenlocher, D., and Fua, P. (2012, January 7\u201313). Worldwide Pose Estimation Using 3D Point Clouds. Proceedings of the Computer Vision\u2013ECCV 2012, Florence, Italy.","DOI":"10.1007\/978-3-642-33718-5_2"},{"key":"ref_29","unstructured":"Wang, L., Shi, Y., Li, X., and Fang, Y. (2020). Unsupervised Learning of Global Registration of Temporal Sequence of Point Clouds. arXiv."},{"key":"ref_30","unstructured":"Qi, C.R., Litany, O., He, K., and Guibas, L.J. (November, January 27). Deep hough voting for 3d object detection in point clouds. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, Y., Chao, W.L., Garg, D., Hariharan, B., Campbell, M., and Weinberger, K. (2019, January 16\u201320). Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. Proceedings of the CVPR, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00864"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"771","DOI":"10.1016\/S0167-8655(98)00057-9","article-title":"An Iterative Algorithm for Minimum Cross Entropy Thresholding","volume":"19","author":"Li","year":"1998","journal-title":"Pattern Recogn. Lett."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"617","DOI":"10.1016\/0031-3203(93)90115-D","article-title":"Minimum cross entropy thresholding","volume":"26","author":"Li","year":"1993","journal-title":"Pattern Recognit."},{"key":"ref_34","unstructured":"Insafutdinov, E., and Dosovitskiy, A. (2018, January 3\u20138). Unsupervised Learning of Shape and Pose with Differentiable Point Clouds. Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 18, Montreal, QC, Canada."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Kendall, A., and Cipolla, R. (2017, January 21\u201326). Geometric loss functions for camera pose regression with deep learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.694"},{"key":"ref_36","unstructured":"Schwarz, A. (2018). Tiefen-basierte Bestimmung der Kopfposition und -orientierung im Fahrzeuginnenraum. [Ph.D. Thesis, Karlsruher Institut f\u00fcr Technologie (KIT)]."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Kendall, A., Grimes, M., and Cipolla, R. (2015, January 7\u201313). Posenet: A convolutional network for real-time 6-dof camera relocalization. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.336"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Schwarz, A., Haurilet, M., Martinez, M., and Stiefelhagen, R. (2017, January 21\u201326). DriveAHead-a large-scale driver head pose dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.155"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Selim, M., Firintepe, A., Pagani, A., and Stricker, D. (2020, January 27\u201329). AutoPOSE: Large-scale Automotive Driver Head Pose and Gaze Dataset with Deep Head Orientation Baseline. Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Valletta, Malta.","DOI":"10.5220\/0009330105990606"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Firintepe, A., Selim, M., Pagani, A., and Stricker, D. (November, January 19). The More, the Merrier? A Study on In-Car IR-based Head Pose Estimation. Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA.","DOI":"10.1109\/IV47402.2020.9304545"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/7\/5\/80\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:53:19Z","timestamp":1760161999000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/7\/5\/80"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,27]]},"references-count":41,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["jimaging7050080"],"URL":"https:\/\/doi.org\/10.3390\/jimaging7050080","relation":{},"ISSN":["2313-433X"],"issn-type":[{"type":"electronic","value":"2313-433X"}],"subject":[],"published":{"date-parts":[[2021,4,27]]}}}