{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T03:06:38Z","timestamp":1761102398628,"version":"build-2065373602"},"reference-count":61,"publisher":"MDPI AG","issue":"24","license":[{"start":{"date-parts":[[2019,12,4]],"date-time":"2019-12-04T00:00:00Z","timestamp":1575417600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Education, Korea","award":["NRF-2016R1D1A1B03933895"],"award-info":[{"award-number":["NRF-2016R1D1A1B03933895"]}]},{"DOI":"10.13039\/501100003052","name":"Ministry of Trade, Industry and Energy","doi-asserted-by":"publisher","award":["P0002397"],"award-info":[{"award-number":["P0002397"]}],"id":[{"id":"10.13039\/501100003052","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Facial landmark detection has gained enormous interest for face-related applications due to its success in facial analysis tasks such as facial recognition, cartoon generation, face tracking and facial expression analysis. Many studies have been proposed and implemented to deal with the challenging problems of localizing facial landmarks from given images, including large appearance variations and partial occlusion. Studies have differed in the way they use the facial appearances and shape information of input images. In our work, we consider facial information within both global and local contexts. We aim to obtain local pixel-level accuracy for local-context information in the first stage and integrate this with knowledge of spatial relationships between each key point in a whole image for global-context information in the second stage. 
Thus, the pipeline of our architecture consists of two main components: (1) a deep network for local-context subnet that generates detection heatmaps via fully convolutional DenseNets with additional kernel convolution filters and (2) a dilated skip convolution subnet\u2014a combination of dilated convolutions and skip-connections networks\u2014that are in charge of robustly refining the local appearance heatmaps. Through this proposed architecture, we demonstrate that our approach achieves state-of-the-art performance on challenging datasets\u2014including LFPW, HELEN, 300W and AFLW2000-3D\u2014by leveraging fully convolutional DenseNets, skip-connections and dilated convolution architecture without further post-processing.<\/jats:p>","DOI":"10.3390\/s19245350","type":"journal-article","created":{"date-parts":[[2019,12,5]],"date-time":"2019-12-05T03:16:36Z","timestamp":1575515796000},"page":"5350","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Dilated Skip Convolution for Facial Landmark Detection"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3773-0491","authenticated-orcid":false,"given":"Seyha","family":"Chim","sequence":"first","affiliation":[{"name":"School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1587-7684","authenticated-orcid":false,"given":"Jin-Gu","family":"Lee","sequence":"additional","affiliation":[{"name":"School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ho-Hyun","family":"Park","sequence":"additional","affiliation":[{"name":"School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,12,4]]},"reference":[{"key":"ref_1","unstructured":"Wu, Y., and Ji, Q. (2017). Facial Landmark Detection: A Literature Survey. Int. J. Comput. Vis., 1\u201328."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1548","DOI":"10.1109\/TPAMI.2016.2515606","article-title":"Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-Related Applications","volume":"38","author":"Corneanu","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"668","DOI":"10.1016\/j.neucom.2017.08.015","article-title":"Intelligent facial emotion recognition based on stationary wavelet entropy and Jaya algorithm","volume":"272","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"8375","DOI":"10.1109\/ACCESS.2016.2628407","article-title":"Facial Emotion Recognition Based on Biorthogonal Wavelet Entropy, Fuzzy Support Vector Machine, and Stratified Cross Validation","volume":"4","author":"Zhang","year":"2016","journal-title":"IEEE Access"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Xiong, X., and De la Torre, F. (2013, January 23\u201328). Supervised Descent Method and Its Applications to Face Alignment. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.75"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"617","DOI":"10.1016\/j.patcog.2017.09.006","article-title":"Gaussian mixture 3D morphable face model","volume":"74","author":"Koppen","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sinha, G., Shahi, R., and Shankar, M. (2010, January 19\u201321). Human Computer Interaction. 
Proceedings of the 2010 3rd International Conference on Emerging Trends in Engineering and Technology, Goa, India.","DOI":"10.1109\/ICETET.2010.85"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Bulat, A., and Tzimiropoulos, G. (2017). How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks). CoRR.","DOI":"10.1109\/ICCV.2017.116"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2409","DOI":"10.1109\/TIFS.2018.2800901","article-title":"Combining data-driven and model-driven methods for robust facial landmark detection","volume":"13","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Shi, H., and Wang, Z. (2019, January 2\u20135). Improved Stacked Hourglass Network with Offset Learning for Robust Facial Landmark Detection. Proceedings of the 2019 9th International Conference on Information Science and Technology (ICIST), Hulunbuir, China.","DOI":"10.1109\/ICIST.2019.8836739"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1007\/s11263-013-0667-3","article-title":"Face alignment by explicit shape regression","volume":"107","author":"Cao","year":"2014","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","unstructured":"Luo, P., Wang, X., and Tang, X. (2012, January 16\u201321). Hierarchical face parsing via deep learning. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wu, W., and Yang, S. (2017, January 21\u201326). Leveraging Intra and Inter-Dataset Variations for Robust Face Alignment. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.261"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Fischer, P., Dosovitskiy, A., and Brox, T. (2014). 
Descriptor Matching with Convolutional Neural Networks: A Comparison to SIFT. arXiv.","DOI":"10.1109\/CVPR.2015.7298761"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wu, Y., Wang, Z., and Ji, Q. (2013, January 23\u201328). Facial feature tracking under varying facial expressions and face poses based on restricted boltzmann machines. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.443"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1109\/TCSVT.2016.2645723","article-title":"Deep recurrent regression for facial landmark detection","volume":"28","author":"Lai","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Belagiannis, V., and Zisserman, A. (June, January 30). Recurrent human pose estimation. Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA.","DOI":"10.1109\/FG.2017.64"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Merget, D., Rock, M., and Rigoll, G. (2018, January 18\u201323). Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00088"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"J\u00e9gou, S., Drozdzal, M., Vazquez, D., Romero, A., and Bengio, Y. (2017, January 21\u201326). The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.156"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2930","DOI":"10.1109\/TPAMI.2013.23","article-title":"Localizing Parts of Faces Using a Consensus of Exemplars","volume":"35","author":"Belhumeur","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., and Schmid, C. (2012). Interactive Facial Feature Localization. Computer Vision\u2014ECCV 2012, Springer.","DOI":"10.1007\/978-3-642-33709-3"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., and Pantic, M. (2013, January 2\u20138). 300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge. Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, Sydney, Australia.","DOI":"10.1109\/ICCVW.2013.59"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhu, X., Lei, Z., Liu, X., Shi, H., and Li, S.Z. (2015). Face Alignment Across Large Poses: A 3D Solution. CoRR.","DOI":"10.1109\/CVPR.2016.23"},{"key":"ref_24","unstructured":"Richard, C., Wilson, E.R.H., and Smith, W.A.P. (2016). Convolutional aggregation of local evidence for large pose face alignment. Proceedings of the British Machine Vision Conference (BMVC), 19\u201322 September 2016, BMVA Press."},{"key":"ref_25","unstructured":"Fleet, D., Pajdla, T., Schiele, B., and Tuytelaars, T. (2014). Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment. Computer Vision\u2013ECCV 2014, Springer International Publishing."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Xu, X., and Kakadiaris, I.A. (June, January 30). Joint Head Pose Estimation and Face Alignment Framework Using Global and Local CNN Features. 
Proceedings of the 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), Washington, DC, USA.","DOI":"10.1109\/FG.2017.81"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.image.2019.01.005","article-title":"Face analysis through semantic face segmentation","volume":"74","author":"Benini","year":"2019","journal-title":"Signal Process. Image Commun."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"103881","DOI":"10.1016\/j.dib.2019.103881","article-title":"FASSEG: A FAce semantic SEGmentation repository for face image analysis","volume":"24","author":"Benini","year":"2019","journal-title":"Data Brief"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Toshev, A., and Szegedy, C. (2013). DeepPose: Human Pose Estimation via Deep Neural Networks. CoRR.","DOI":"10.1109\/CVPR.2014.214"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Newell, A., Yang, K., and Deng, J. (2016). Stacked Hourglass Networks for Human Pose Estimation. arXiv.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"ref_31","unstructured":"Jain, A., Tompson, J., LeCun, Y., and Bregler, C. (2014). MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., and Weinberger, K.Q. (2016). Densely Connected Convolutional Networks. arXiv.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_33","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv."},{"key":"ref_34","unstructured":"Nair, V., and Hinton, G.E. (2010, January 21\u201324). Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of the 27th International Conference on International Conference on Machine Learning, Haifa, Israel."},{"key":"ref_35","unstructured":"Yu, F., and Koltun, V. (2015). 
Multi-Scale Context Aggregation by Dilated Convolutions. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wang, Z., and Ji, S. (2018). Smoothed Dilated Convolutions for Improved Dense Prediction. CoRR.","DOI":"10.1145\/3219819.3219944"},{"key":"ref_37","unstructured":"Van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A.W., and Kavukcuoglu, K. (2016). WaveNet: A Generative Model for Raw Audio. arXiv."},{"key":"ref_38","unstructured":"Kalchbrenner, N., van den Oord, A., Simonyan, K., Danihelka, I., Vinyals, O., Graves, A., and Kavukcuoglu, K. (2016). Video Pixel Networks. arXiv."},{"key":"ref_39","unstructured":"Kalchbrenner, N., Espeholt, L., Simonyan, K., van den Oord, A., Graves, A., and Kavukcuoglu, K. (2016). Neural Machine Translation in Linear Time. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Pfister, T., Charles, J., and Zisserman, A. (2015). Flowing ConvNets for Human Pose Estimation in Videos. arXiv.","DOI":"10.1109\/ICCV.2015.222"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_42","unstructured":"Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017, January 8). Automatic differentiation in PyTorch. Proceedings of the Neural Information Processing Systems (NIPS 2017) Workshop on Autodiff, Long Beach, CA, USA."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Burgos-Artizzu, X.P., Perona, P., and Doll\u00e1r, P. (2013, January 1\u20138). Robust face landmark estimation under occlusion. 
Proceedings of the IEEE International Conference on Computer Vision, Sydney, NSW, Australia.","DOI":"10.1109\/ICCV.2013.191"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet Classification with Deep Convolutional Neural Networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_45","first-page":"2627","article-title":"Convolutional Kernel Networks","volume":"Volume 2","author":"Ghahramani","year":"2014","journal-title":"Advances in Neural Information Processing Systems 27, 8\u201313 December 2014"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"3360","DOI":"10.1007\/s10489-018-1150-1","article-title":"Skip-connection convolutional neural network for still image crowd counting","volume":"48","author":"Wang","year":"2018","journal-title":"Appl. Intell."},{"key":"ref_47","unstructured":"Zhu, X., and Ramanan, D. (2012, January 16\u201321). Face detection, pose estimation, and landmark localization in the wild. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., and Pantic, M. (2013, January 23\u201328). A Semi-automatic Methodology for Facial Landmark Annotation. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Portland, OR, USA.","DOI":"10.1109\/CVPRW.2013.132"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"K\u00f6stinger, M., Wohlhart, P., Roth, P.M., and Bischof, H. (2011, January 6\u201313). Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization. 
Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain.","DOI":"10.1109\/ICCVW.2011.6130513"},{"key":"ref_50","first-page":"26","article-title":"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude","volume":"4","author":"Tieleman","year":"2012","journal-title":"COURSERA Neural Netw. Mach. Learn."},{"key":"ref_51","unstructured":"Zhu, S., Li, C., Loy, C.C., and Tang, X. (2015, January 7\u201312). Face alignment by coarse-to-fine shape searching. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Chen, X., Zhou, E., Mo, Y., Liu, J., and Cao, Z. (2017, January 21\u201326). Delving Deep Into Coarse-To-Fine Framework for Facial Landmark Localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.260"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Kowalski, M., Naruniec, J., and Trzcinski, T. (2017). Deep Alignment Network: A convolutional neural network for robust face alignment. arXiv.","DOI":"10.1109\/CVPRW.2017.254"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Tzimiropoulos, G., and Pantic, M. (2014, January 23\u201328). Gauss-Newton Deformable Part Models for Face Alignment In-the-Wild. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.239"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Yang, J., Liu, Q., and Zhang, K. (2017, January 21\u201326). Stacked Hourglass Network for Robust Facial Landmark Localisation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.253"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Wei, S., Ramakrishna, V., Kanade, T., and Sheikh, Y. (2016). Convolutional Pose Machines. arXiv.","DOI":"10.1109\/CVPR.2016.511"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Liu, Y., Jourabloo, A., Ren, W., and Liu, X. (2017). Dense Face Alignment. arXiv.","DOI":"10.1109\/ICCVW.2017.190"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Ren, S., Cao, X., Wei, Y., and Sun, J. (2017, January 21\u201326). Face Alignment at 3000 FPS via Regressing Local Binary Features. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2014.218"},{"key":"ref_59","unstructured":"Leibe, B., Matas, J., Sebe, N., and Welling, M. (2016). Robust Facial Landmark Detection via Recurrent Attentive-Refinement Networks. Computer Vision\u2014ECCV 2016, Springer International Publishing."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Bhagavatula, C., Zhu, C., Luu, K., and Savvides, M. (2017). Faster Than Real-time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses. arXiv.","DOI":"10.1109\/ICCV.2017.429"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Yan, J., Lei, Z., Yi, D., and Li, S.Z. (2013, January 2\u20138). Learn to Combine Multiple Hypotheses for Accurate Face Alignment. 
Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, Sydney, Australia.","DOI":"10.1109\/ICCVW.2013.126"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/24\/5350\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:40:12Z","timestamp":1760190012000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/24\/5350"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,4]]},"references-count":61,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2019,12]]}},"alternative-id":["s19245350"],"URL":"https:\/\/doi.org\/10.3390\/s19245350","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2019,12,4]]}}}