{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T18:43:04Z","timestamp":1773772984323,"version":"3.50.1"},"reference-count":35,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T00:00:00Z","timestamp":1653955200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Age estimation from human faces is an important yet challenging task in computer vision because of the large differences between physical age and apparent age. Due to the differences including races, genders, and other factors, the performance of a learning method for this task strongly depends on the training data. Although many inspiring works have focused on the age estimation of a single human face through deep learning, the existing methods still have lower performance when dealing with faces in videos because of the differences in head pose between frames, which can lead to greatly different results. In this paper, a combined system of age estimation and head pose estimation is proposed to improve the performance of age estimation from faces in videos. We use deep regression forests (DRFs) to estimate the age of facial images, while a multiloss convolutional neural network is also utilized to estimate the head pose. Accordingly, we estimate the age of faces only for head poses within a set degree threshold to enable value refinement. First, we divided the images in the Cross-Age Celebrity Dataset (CACD) and the Asian Face Age Dataset (AFAD) according to the estimated head pose degrees and generated separate age estimates for images with different poses. 
The experimental results showed that the accuracy of age estimation from frontal facial images was better than that for faces at different angles, thus demonstrating the effect of head pose on age estimation. Further experiments were conducted on several videos to estimate the age of the same person with his or her face at different angles, and the results show that our proposed combined system can provide more precise and reliable age estimates than a system without head pose estimation.<\/jats:p>","DOI":"10.3390\/s22114171","type":"journal-article","created":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T05:25:42Z","timestamp":1653974742000},"page":"4171","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Age Estimation of Faces in Videos Using Head Pose Estimation and Convolutional Neural Networks"],"prefix":"10.3390","volume":"22","author":[{"given":"Beichen","family":"Zhang","sequence":"first","affiliation":[{"name":"Visual Media Laboratory, Department of Information Science, Tokyo City University, Tokyo 1588557, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yue","family":"Bao","sequence":"additional","affiliation":[{"name":"Visual Media Laboratory, Department of Information Science, Tokyo City University, Tokyo 1588557, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1109\/TSMCB.2003.817091","article-title":"Comparing different classifiers for automatic age estimation","volume":"34","author":"Lanitis","year":"2004","journal-title":"IEEE Trans. Syst. Man Cybern. Part Cybern."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Han, H., Otto, C., and Jain, A.K. (2013, January 4\u20137). Age estimation from face images: Human vs. machine performance. 
Proceedings of the ICB, Madrid, Spain.","DOI":"10.1109\/ICB.2013.6613022"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Geng, X., Zhou, Z.-H., Zhang, Y., Li, G., and Dai, H. (2006, January 23\u201327). Learning from facial aging patterns for automatic age estimation. Proceedings of the ACM International Conference on Multimedia, Santa Barbara, CA, USA.","DOI":"10.1145\/1180639.1180711"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1109\/34.993553","article-title":"Toward automatic simulation of aging effects on face images","volume":"24","author":"Lanitis","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Song, Z., Ni, B., Guo, D., Sim, T., and Yan, S. (2011, January 6\u201313). Learning universal multi-view age estimator using video context. Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126248"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Shan, C., Porikli, F., Xiang, T., and Gong, S. (2012). Video Analytics for Business Intelligence. Studies in Computational Intelligence, Springer.","DOI":"10.1007\/978-3-642-28598-1"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1148","DOI":"10.1109\/TPAMI.2014.2362759","article-title":"Demographic estimation from face images: Human vs. machine performance","volume":"37","author":"Han","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Guo, G., Mu, G., Fu, Y., and Huang, T. (2009, January 20\u201325). Human age estimation using bio-inspired features. Proceedings of the IEEE CVPR, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206681"},{"key":"ref_9","first-page":"3349","article-title":"Age progression in human faces: A survey","volume":"15","author":"Ramanathan","year":"2009","journal-title":"J. Vis. Lang. 
Comput."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Chen, S., Zhang, C., Dong, M., Le, J., and Rao, M. (2017, January 22\u201329). Using ranking-CNN for age estimation. Proceedings of the IEEE ICCV, Venice, Italy.","DOI":"10.1109\/CVPR.2017.86"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Shen, W., Guo, Y., Wang, Y., Zhao, K., Wang, B., and Yuille, A. (2018, January 18\u201323). Deep Regression Forests for Age Estimation. Proceedings of the IEEE CVPR, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00245"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Niu, Z., Zhou, M., Wang, L., Gao, X., and Hua, G. (2016, January 27\u201330). Ordinal regression with multiple output cnn for age estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.532"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ruiz, N., Chong, E., and Rehg, J.M. (2018, January 18\u201322). Fine-grained head pose estimation without key-points. Proceedings of the CVPR Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00281"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhu, X., Lei, Z., Liu, X., Shi, H., and Li, S.Z. (2016, January 27\u201330). Face alignment across large poses: A 3d solution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.23"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"804","DOI":"10.1109\/TMM.2015.2420374","article-title":"Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset","volume":"17","author":"Chen","year":"2015","journal-title":"IEEE Trans. Multimed."},{"key":"ref_16","unstructured":"Kwon, Y., and Lobo, N. (1994, January 21\u201323). Age classification from facial images. 
Proceedings of the IEEE CVPR, Seattle, WA, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yi, D., Lei, Z., and Li, S.Z. (2015, January 7\u201313). Age estimation by multi-scale convolutional network. Proceedings of the IEEE ICCV, Santiago, Chile.","DOI":"10.1007\/978-3-319-16811-1_10"},{"key":"ref_18","first-page":"1","article-title":"Deep expectation of real and apparent age from a single image without facial landmarks","volume":"126","author":"Rothe","year":"2016","journal-title":"Int. J. Comput. Vis."},{"key":"ref_19","unstructured":"Wang, F., Han, H., Shan, S., and Chen, X. (June, January 30). Multi-task learning for joint prediction of heterogeneous face attributes. Proceedings of the IEEE FG, Washington, DC, USA."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2597","DOI":"10.1109\/TPAMI.2017.2738004","article-title":"Heterogeneous face attribute estimation: A deep multi-task learning approach","volume":"40","author":"Han","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ji, Z., Lang, C., Li, K., and Xing, J. (2018, January 20\u201324). Deep Age Estimation Model Stabilization from Images to Videos. Proceedings of the International Conference on Pattern Recognition, Beijing, China.","DOI":"10.1109\/ICPR.2018.8545283"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1972","DOI":"10.1109\/TIP.2019.2948288","article-title":"Attended End-to-End Architecture for Age Estimation From Facial Expression Videos","volume":"29","author":"Pei","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_23","unstructured":"Zhu, X., and Ramanan, D. (2012, January 16\u201321). Face detection, pose estimation, and landmark localization in the wild. Proceedings of the IEEE CVPR, Providence, RI, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Belhumeur, P.N., Jacobs, D.W., Kriegman, D., and Kumar, N. 
(2011, January 20\u201325). Localizing parts of faces using a consensus of exemplars. Proceedings of the IEEE CVPR, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995602"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhou, E., Fan, H., Cao, Z., Jiang, Y., and Yin, Q. (2013, January 2\u20138). Extensive facial landmark localization with coarse-to-fine convolutional network cascade. Proceedings of the IEEE ICCVW, Sydney, NSW, Australia.","DOI":"10.1109\/ICCVW.2013.58"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., and Pantic, M. (2013, January 2\u20138). 300 faces in-the-wild challenge: The first facial landmark localization challenge. Proceedings of the IEEE ICCVW, Sydney, NSW, Australia.","DOI":"10.1109\/ICCVW.2013.59"},{"key":"ref_27","unstructured":"Messer, K., Matas, J., Kittler, J., Luettin, J., and Maitre, G. (1999, January 22\u201324). XM2VTSDB: The extended M2VTS database. Proceedings of the Second International Conference on Audio and Video-Based Biometric Person Authentication, Washington, DC, USA."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1499","DOI":"10.1109\/LSP.2016.2603342","article-title":"Joint face detection and alignment using multitask cascaded convolutional networks","volume":"23","author":"Zhang","year":"2016","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_29","unstructured":"Simonyan, K., and Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition (2021, September 30). CoRR abs\/1409.1556. Available online: https:\/\/arxiv.org\/abs\/1409.1556."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis. (IJCV)"},{"key":"ref_31","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. 
(2012, January 3\u20136). Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. arXiv.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Bulat, A., and Tzimiropoulos, G. (2017, January 22\u201329). How far are we from solving the 2d & 3d face alignment problem? (and a dataset of 230,000 3d facial landmarks). Proceedings of the International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.116"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kazemi, V., and Sullivan, J. (2014, January 23\u201328). One millisecond face alignment with an ensemble of regression trees. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.241"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"142632","DOI":"10.1109\/ACCESS.2021.3120098","article-title":"Attention Span Prediction Using Head-Pose Estimation With Deep Neural Networks","volume":"9","author":"Singh","year":"2021","journal-title":"IEEE 
Access"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/11\/4171\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:22:24Z","timestamp":1760138544000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/11\/4171"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,31]]},"references-count":35,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["s22114171"],"URL":"https:\/\/doi.org\/10.3390\/s22114171","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,31]]}}}