{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:00:14Z","timestamp":1760241614978,"version":"build-2065373602"},"reference-count":38,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2018,6,17]],"date-time":"2018-06-17T00:00:00Z","timestamp":1529193600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Modern facial motion capture systems employ a two-pronged approach for capturing and rendering facial motion. Visual data (2D) is used for tracking the facial features and predicting facial expression, whereas Depth (3D) data is used to build a series of expressions on 3D face models. An issue with modern research approaches is the use of a single data stream that provides little indication of the 3D facial structure. We compare and analyse the performance of Convolutional Neural Networks (CNN) using visual, Depth and merged data to identify facial features in real-time using a Depth sensor. First, we review the facial landmarking algorithms and its datasets for Depth data. We address the limitation of the current datasets by introducing the Kinect One Expression Dataset (KOED). Then, we propose the use of CNNs for the single data stream and merged data streams for facial landmark detection. We contribute to existing work by performing a full evaluation on which streams are the most effective for the field of facial landmarking. Furthermore, we improve upon the existing work by extending neural networks to predict into 3D landmarks in real-time with additional observations on the impact of using 2D landmarks as auxiliary information. We evaluate the performance by using Mean Square Error (MSE) and Mean Average Error (MAE). 
We observe that the single data stream predicts accurate facial landmarks on Depth data when auxiliary information is used to train the network. The codes and dataset used in this paper will be made available.<\/jats:p>","DOI":"10.3390\/sym10060230","type":"journal-article","created":{"date-parts":[[2018,6,18]],"date-time":"2018-06-18T10:57:11Z","timestamp":1529319431000},"page":"230","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Towards Real-Time Facial Landmark Detection in Depth Data Using Auxiliary Information"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3623-6598","authenticated-orcid":false,"given":"Connah","family":"Kendrick","sequence":"first","affiliation":[{"name":"Visual Computing Lab, School of Computing, Mathematics and Digital Technology, Manchester Metropolitan University, Chester Street, Manchester M1 5GD, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kevin","family":"Tan","sequence":"additional","affiliation":[{"name":"Visual Computing Lab, School of Computing, Mathematics and Digital Technology, Manchester Metropolitan University, Chester Street, Manchester M1 5GD, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kevin","family":"Walker","sequence":"additional","affiliation":[{"name":"Image Metrics Ltd., Manchester M1 3HZ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7681-4287","authenticated-orcid":false,"given":"Moi Hoon","family":"Yap","sequence":"additional","affiliation":[{"name":"Visual Computing Lab, School of Computing, Mathematics and Digital Technology, Manchester Metropolitan University, Chester Street, Manchester M1 5GD, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,6,17]]},"reference":[{"key":"ref_1","unstructured":"Vicon Motion Systems Ltd (2016). 
Capture Systems, Vicon Motion Systems Ltd."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1499","DOI":"10.1109\/LSP.2016.2603342","article-title":"Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks","volume":"23","author":"Zhang","year":"2016","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_3","unstructured":"Ranjan, R., Patel, V.M., and Chellappa, R. (2016). HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition. IEEE Trans. Pattern Anal. Mach. Intell., 1."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bui, H.M., Lech, M., Cheng, E., Neville, K., and Burnett, I.S. (2016, January 27\u201329). Using grayscale images for object recognition with convolutional-recursive neural network. Proceedings of the 2016 IEEE Sixth International Conference on Communications and Electronics (ICCE), Ha Long, Vietnam.","DOI":"10.1109\/CCE.2016.7562656"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhou, E., Fan, H., Cao, Z., Jiang, Y., and Yin, Q. (2013, January 2\u20138). Extensive facial landmark localization with coarse-to-fine convolutional network cascade. Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, Sydney, NSW, Australia.","DOI":"10.1109\/ICCVW.2013.58"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Jourabloo, A., and Liu, X. (2016, January 27\u201330). Large-Pose Face Alignment via CNN-Based Dense 3D Model Fitting. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.454"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"321","DOI":"10.4236\/jsea.2012.55038","article-title":"Face recognition in the presence of expressions","volume":"5","author":"Han","year":"2012","journal-title":"J. Softw. Eng. 
Appl."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Faceware Technologies Inc (2015). Faceware, Faceware Technologies Inc.","DOI":"10.1016\/S1365-6937(15)30249-5"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1016\/S0969-4765(18)30038-9","article-title":"Advances in facial landmark detection","volume":"2018","author":"Feng","year":"2018","journal-title":"Biom. Technol. Today"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1109\/CVPR.2005.268","article-title":"Overview of the Face Recognition Grand Challenge","volume":"Volume 1","author":"Phillips","year":"2005","journal-title":"Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905)"},{"key":"ref_11","unstructured":"Microsoft (2013). Microsoft Kinect, Microsoft."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1007\/978-3-319-10599-4_7","article-title":"Facial landmark detection by deep multi-task learning","volume":"Volume 8694","author":"Zhang","year":"2014","journal-title":"Lecture Notes in Computer Science"},{"key":"ref_13","unstructured":"Hand, E.M., and Chellappa, R. (arXiv, 2016). Attributes for Improved Attributes: A Multi-Task Network for Attribute Classification, arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"918","DOI":"10.1109\/TPAMI.2015.2469286","article-title":"Learning Deep Representation for Face Alignment with Auxiliary Attributes","volume":"38","author":"Zhang","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Sun, Y., Wang, X., and Tang, X. (2013, January 23\u201328). Deep convolutional network cascade for facial point detection. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.446"},{"key":"ref_16","unstructured":"Lai, H., Xiao, S., Pan, Y., Cui, Z., Feng, J., Xu, C., Yin, J., and Yan, S. (arXiv, 2015). Deep Recurrent Regression for Facial Landmark Detection, arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1666","DOI":"10.1109\/TIP.2017.2657118","article-title":"Learning Deep Sharable and Structural Detectors for Face Alignment","volume":"26","author":"Liu","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1109\/72.265960","article-title":"An evolutionary algorithm that constructs recurrent neural networks","volume":"5","author":"Angeline","year":"1994","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Dibeklioglu, H., Salah, A.A., and Akarun, L. (October, January 29). 3D Facial Landmarking under Expression, Pose, and Occlusion Variations. Proceedings of the 2008 IEEE Second International Conference on Biometrics: Theory, Applications and Systems, Arlington, VA, USA.","DOI":"10.1109\/BTAS.2008.4699324"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1109\/TMM.2009.2017629","article-title":"3-D Face Detection, Landmark Localization, and Registration Using a Point Distribution Model","volume":"11","author":"Nair","year":"2009","journal-title":"IEEE Trans. Multimed."},{"key":"ref_22","unstructured":"Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., and Ng, A.Y. (2014, January 3\u20137). Multimodal Deep Learning. 
Proceedings of the 28th International Conference on Machine Learning (ICML), Orlando, FL, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Park, E., Han, X., Tamara, L., and Berg, A.C. (2016, January 7\u20139). Combining Multiple Sources of Knowledge in Deep CNNs for Action Recognition. Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA.","DOI":"10.1109\/WACV.2016.7477589"},{"key":"ref_24","unstructured":"Socher, R., and Huval, B. (2012). Convolutional-recursive deep learning for 3D object classification. Advances in Neural Information Processing Systems 9: Proceedings of the 1996 Conference, MIT Press Ltd."},{"key":"ref_25","first-page":"1493","article-title":"Learning discriminative representations from RGB-D video data","volume":"1","author":"Liu","year":"2013","journal-title":"IJCAI"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1109\/TVCG.2013.249","article-title":"FaceWarehouse: A 3D facial expression database for visual computing","volume":"20","author":"Cao","year":"2014","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_27","unstructured":"Microsoft (2010). Microsoft Kinect 360, Microsoft."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1007\/s11263-012-0549-0","article-title":"Random Forests for Real Time 3D Face Analysis","volume":"101","author":"Fanelli","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1534","DOI":"10.1109\/TSMC.2014.2331215","article-title":"KinectFaceDB: A Kinect Face Database for Face Recognition","volume":"44","author":"Min","year":"2014","journal-title":"IEEE Trans. Syst. Man Cybern. A"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Hg, R.I., Jasek, P., Rofidal, C., Nasrollahi, K., Moeslund, T.B., and Tranchet, G. (2012, January 25\u201329). 
An RGB-D database using microsoft\u2019s kinect for windows for face detection. Proceedings of the 2012 8th International Conference on Signal Image Technology and Internet Based Systems, (SITIS\u20192012), Naples, Italy.","DOI":"10.1109\/SITIS.2012.17"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Erdogmus, N., and Marcel, S. (October, January 29). Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect. Proceedings of the 2013 IEEE 6th International Conference on Biometrics: Theory, Applications and Systems (BTAS), Arlington, VA, USA.","DOI":"10.1109\/BTAS.2013.6712688"},{"key":"ref_32","first-page":"189","article-title":"The Application of Neural Networks for Facial Landmarking on Mobile Devices","volume":"Volume 4","author":"Kendrick","year":"2018","journal-title":"Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP)"},{"key":"ref_33","first-page":"265","article-title":"TensorFlow: A System for Large-Scale Machine Learning","volume":"16","author":"Abadi","year":"2016","journal-title":"Osdi"},{"key":"ref_34","unstructured":"Chollet, F. (2018, June 15). Keras. Available online: https:\/\/keras.io\/."},{"key":"ref_35","unstructured":"Kingma, D.P., and Ba, J.L. (arXiv, 2015). Adam: A Method for Stochastic Optimization, arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Kendrick, C., Tan, K., Williams, T., and Yap, M.H. (June, January 30). An Online Tool for the Annotation of 3D Models. Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA.","DOI":"10.1109\/FG.2017.52"},{"key":"ref_37","unstructured":"Intel (2016). 
RealSense SR300, Intel."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"4508","DOI":"10.1109\/JSEN.2017.2703829","article-title":"On the Performance of the Intel SR300 Depth Camera: Metrological and Critical Characterization","volume":"17","author":"Carfagni","year":"2017","journal-title":"IEEE Sens. J."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/10\/6\/230\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:09:07Z","timestamp":1760195347000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/10\/6\/230"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,17]]},"references-count":38,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2018,6]]}},"alternative-id":["sym10060230"],"URL":"https:\/\/doi.org\/10.3390\/sym10060230","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2018,6,17]]}}}