{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T17:30:04Z","timestamp":1774373404898,"version":"3.50.1"},"reference-count":30,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T00:00:00Z","timestamp":1771977600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62507035"],"award-info":[{"award-number":["62507035"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62307034"],"award-info":[{"award-number":["62307034"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>In gaze estimation, existing mainstream methods face significant challenges in capturing the fine-grained structures of eye regions, particularly in the absence of explicit geometric prior information, which hampers gaze prediction accuracy. To address this limitation, we propose the landmark-guided gaze estimation network (LGNet), a gaze estimation method guided by keypoints, which effectively incorporates geometric prior information to enhance estimation performance. The proposed method begins by training an eye-keypoint generator on the synthetic UnityEyes dataset using a Conditional Variational Autoencoder (CVAE). Next, we introduce a Symmetric Spatial Feature Fusion module (SSFF), combined with a dual-stream cross-attention mechanism, to achieve semantic alignment between the keypoint features and the facial image features extracted using ResNet50. 
Furthermore, we propose a Gated Channel Reweighting module (GCR) to suppress redundant information and amplify critical features, thereby enhancing the model\u2019s overall response. Experimental results demonstrate that LGNet outperforms existing methods on three benchmark datasets. The code for this research has been made publicly available.<\/jats:p>","DOI":"10.3390\/info17030224","type":"journal-article","created":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T09:31:32Z","timestamp":1772098292000},"page":"224","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Landmark-Guided Gaze Estimation via Conditional Keypoint Generation and Cross-Attention Fusion"],"prefix":"10.3390","volume":"17","author":[{"given":"Guanghui","family":"Xu","sequence":"first","affiliation":[{"name":"School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan 430068, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8563-0094","authenticated-orcid":false,"given":"Xiaoyang","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan 430068, China"},{"name":"School of Information Management, Wuhan University, Wuhan 430072, China"}]},{"given":"Wanli","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Information Management, Wuhan University, Wuhan 430072, China"}]},{"given":"Zhongjie","family":"Mao","sequence":"additional","affiliation":[{"name":"School of Big Data and Artificial Intelligence, Chizhou University, Chizhou 247000, China"},{"name":"Wuhan HomeLightyear Technology Co., Ltd., Wuhan 430061, China"}]},{"given":"Yue","family":"Li","sequence":"additional","affiliation":[{"name":"School of Information Management, Wuhan University, Wuhan 430072, China"},{"name":"Department of Intelligent Construction, School of Civil Engineering, Wuhan University, Wuhan 430072, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2902-7365","authenticated-orcid":false,"given":"Duantengchuan","family":"Li","sequence":"additional","affiliation":[{"name":"School of Information Management, Wuhan University, Wuhan 430072, China"}]},{"given":"Liangshan","family":"Dong","sequence":"additional","affiliation":[{"name":"School of Physical Education, China University of Geosciences, Wuhan 430074, China"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Steil, J., Hagestedt, I., Huang, M.X., and Bulling, A. (2019, January 25\u201328). Privacy-aware eye tracking using differential privacy. Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, Denver, CO, USA.","DOI":"10.1145\/3314111.3319915"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"101605","DOI":"10.1016\/j.jksuci.2023.101605","article-title":"Gcanet: Geometry cues-aware facial expression recognition based on graph convolutional networks","volume":"35","author":"Wang","year":"2023","journal-title":"J. King Saud-Univ.-Comput. Inf. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"101869","DOI":"10.1016\/j.jksuci.2023.101869","article-title":"Dadl: Double asymmetric distribution learning for head pose estimation in wisdom museum","volume":"36","author":"Zhao","year":"2024","journal-title":"J. King Saud-Univ.-Comput. Inf. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2449","DOI":"10.1109\/TMM.2021.3081873","article-title":"MFDNet: Collaborative poses perception and matrix Fisher distribution for head pose estimation","volume":"24","author":"Liu","year":"2021","journal-title":"IEEE Trans. 
Multimed."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"111892","DOI":"10.1016\/j.engappai.2025.111892","article-title":"Multi-task driver gaze estimation in real world driving scenes","volume":"160","author":"Wu","year":"2025","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wood, E., Baltru\u0161aitis, T., Morency, L.-P., Robinson, P., and Bulling, A. (2016, January 14\u201317). Learning an appearance-based gaze estimator from one million synthesised images. Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, Charleston, SC, USA.","DOI":"10.1145\/2857491.2857492"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1109\/TPAMI.2017.2778103","article-title":"Mpiigaze: Real-world dataset and deep appearance-based gaze estimation","volume":"41","author":"Zhang","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhang, X., Sugano, Y., Fritz, M., and Bulling, A. (2017, January 21\u201326). It\u2019s written all over your face: Full-face appearance-based gaze estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.284"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Funes Mora, K.A., Monay, F., and Odobez, J.-M. (2014, January 26\u201328). Eyediap: A database for the development and evaluation of gaze estimation algorithms from rgb and rgb-d cameras. Proceedings of the Symposium on Eye Tracking Research and Applications, Safety Harbor, FL, USA.","DOI":"10.1145\/2578153.2578190"},{"key":"ref_10","unstructured":"Kellnhofer, P., Recasens, A., Stent, S., Matusik, W., and Torralba, A. (November, January 27). Gaze360: Physically unconstrained gaze estimation in the wild. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_11","unstructured":"Zhang, X., Park, S., Beeler, T., Bradley, D., Tang, S., and Hilliges, O. (2020). Eth-xgaze: A large scale dataset for gaze estimation under extreme head pose and gaze variation. Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, 23\u201328 August 2020, Springer. Proceedings, Part V 16, 2020."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Alberto Funes Mora, K., and Odobez, J.-M. (2014, January 23\u201328). Geometric generative gaze estimation (g3e) for remote rgb-d cameras. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.229"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Cheng, Y., Lu, F., and Zhang, X. (2018, January 8\u201314). Appearance-based gaze estimation via evaluation-guided asymmetric regression. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_7"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Park, S., Spurr, A., and Hilliges, O. (2018, January 8\u201314). Deep pictorial gaze estimation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_44"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Park, S., Zhang, X., Bulling, A., and Hilliges, O. (2018, January 14\u201317). Learning to find eye region landmarks for remote gaze estimation in unconstrained settings. Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, Warsaw, Poland.","DOI":"10.1145\/3204493.3204545"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, X., Sugano, Y., Fritz, M., and Bulling, A. (2015, January 7\u201312). Appearance-based gaze estimation in the wild. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299081"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2592","DOI":"10.1109\/TCYB.2023.3312392","article-title":"Gaze estimation by attention-induced hierarchical variational auto-encoder","volume":"54","author":"Huang","year":"2023","journal-title":"IEEE Trans. Cybern."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"111536","DOI":"10.1016\/j.patcog.2025.111536","article-title":"Adgaze: Anisotropic gaussian label distribution learning for fine-grained gaze estimation","volume":"164","author":"Li","year":"2025","journal-title":"Pattern Recognit."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"110760","DOI":"10.1016\/j.patcog.2024.110760","article-title":"Cascaded learning with transformer for simultaneous eye landmark, eye state and gaze estimation","volume":"156","author":"Gou","year":"2024","journal-title":"Pattern Recognit."},{"key":"ref_20","first-page":"3483","article-title":"Learning structured output representation using deep conditional generative models","volume":"28","author":"Sohn","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_21","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv, Available online: https:\/\/arxiv.org\/abs\/1312.6114."},{"key":"ref_22","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2013MICCAI 2015: 18th International Conference, Munich, Germany, 5\u20139 October 2015, Springer. Proceedings, Part III 18."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zheng, C., Mendieta, M., and Chen, C. (2023, January 1\u20136). Poster: A pyramid cross-fusion transformer network for facial expression recognition. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCVW60793.2023.00339"},{"key":"ref_24","unstructured":"Kingma, D.P. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_25","unstructured":"Paszke, A. (2019). Pytorch: An imperative style, high-performance deep learning library. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Chen, Z., and Shi, B.E. (2018). Appearance-based gaze estimation using dilated-convolutions. Asian Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-030-20876-9_20"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Cheng, Y., Huang, S., Wang, F., Qian, C., and Lu, F. (2020, January 7\u201312). A coarse-to-fine adaptive network for appearance-based gaze estimation. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6636"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"106994","DOI":"10.1016\/j.engappai.2023.106994","article-title":"Attention-guided and fine-grained feature extraction from face images for gaze estimation","volume":"126","author":"Wu","year":"2023","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Cheng, Y., and Lu, F. (2022). Gaze estimation using transformer. Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21\u201325 August 2022, IEEE.","DOI":"10.1109\/ICPR56361.2022.9956687"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Shi, Y., Zhang, F., Yang, W., Wang, G., and Su, N. (2024). Agent-guided gaze estimation network by two-eye asymmetry exploration. 
Proceedings of the 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 27\u201330 October 2024, IEEE.","DOI":"10.1109\/ICIP51287.2024.10648029"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/3\/224\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T05:12:54Z","timestamp":1772169174000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/3\/224"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,25]]},"references-count":30,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["info17030224"],"URL":"https:\/\/doi.org\/10.3390\/info17030224","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,25]]}}}