{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T07:15:59Z","timestamp":1768806959995,"version":"3.49.0"},"reference-count":24,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2020,5,19]],"date-time":"2020-05-19T00:00:00Z","timestamp":1589846400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>In this paper, we explore the effect of using different convolutional layers, batch normalization and the global average pooling layer upon a convolutional neural network (CNN) based gaze tracking system. A novel method is proposed to label the participant\u2019s face images as gaze points retrieved from eye tracker while watching videos for building a training dataset that is closer to human visual behavior. The participants can swing their head freely; therefore, the most real and natural images can be obtained without too many restrictions. The labeled data are classified according to the coordinate of gaze and area of interest on the screen. Therefore, varied network architectures are applied to estimate and compare the effects including the number of convolutional layers, batch normalization (BN) and the global average pooling (GAP) layer instead of the fully connected layer. Three schemes, including the single eye image, double eyes image and facial image, with data augmentation are used to feed into neural network to train and evaluate the efficiency. The input image of the eye or face for an eye tracking system is mostly a small-sized image with relatively few features. The results show that BN and GAP are helpful in overcoming the problem to train models and in reducing the amount of network parameters. It is shown that the accuracy is significantly improved when using GAP and BN at the mean time. Overall, the face scheme has a highest accuracy of 0.883 when BN and GAP are used at the mean time. Additionally, comparing to the fully connected layer set to 512 cases, the number of parameters is reduced by less than 50% and the accuracy is improved by about 2%. A detection accuracy comparison of our model with the existing George and Routray methods shows that our proposed method achieves better prediction accuracy of more than 6%.<\/jats:p>","DOI":"10.3390\/a13050127","type":"journal-article","created":{"date-parts":[[2020,5,20]],"date-time":"2020-05-20T02:48:24Z","timestamp":1589942904000},"page":"127","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["The Effect of Different Deep Network Architectures upon CNN-Based Gaze Tracking"],"prefix":"10.3390","volume":"13","author":[{"given":"Hui-Hui","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7302-1019","authenticated-orcid":false,"given":"Bor-Jiunn","family":"Hwang","sequence":"additional","affiliation":[{"name":"Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jung-Shyr","family":"Wu","sequence":"additional","affiliation":[{"name":"Department of Communication Engineering, National Central University, Taoyuan 320, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Po-Ting","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,5,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1017\/cem.2018.490","article-title":"Peak performance: Simulation and the nature of expertise in emergency medicine","volume":"21","author":"Hicks","year":"2019","journal-title":"Can. J. Emerg. Med."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"31215","DOI":"10.1007\/s11042-019-07940-3","article-title":"Eye gaze tracking based directional control interface for interactive applications","volume":"78","author":"Laddi","year":"2019","journal-title":"Multimed. Tools Appl."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Paul, I.J.L., Sasirekha, S., Maheswari, S.U., Ajith, K.A.M., Arjun, S.M., and Kumar, S.A. (2019). Eye gaze tracking-based adaptive e-learning for enhancing teaching and learning in virtual classrooms. Information and Communication Technology for Competitive Strategies, Springer.","DOI":"10.1007\/978-981-13-0586-3_17"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"478","DOI":"10.1109\/TPAMI.2009.30","article-title":"In the eye of the beholder: A survey of models for eyes and gaze","volume":"32","author":"Hansen","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_5","unstructured":"Duchowski, A. (2007). Eye Tracking Methodology: Theory and Practice, Springer Science & Business Media."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Blignaut, P., and Wium, D. The effect of mapping function on the accuracy of a video-based eye tracker. Proceedings of the 2013 Conference on Eye Tracking South Africa.","DOI":"10.1145\/2513456.2513461"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2246","DOI":"10.1109\/TBME.2007.895750","article-title":"Novel Eye Gaze Tracking Techniques under Natural Head Movement","volume":"54","author":"Zhu","year":"2007","journal-title":"IEEE Trans. Biomed. Eng."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhou, X., Cai, H., Shao, Z., Yu, H., and Liu, H. (2016, January 3\u20137). 3D eye model-based gaze estimation from a depth sensor. Proceedings of the 2016 IEEE International Conference on Robotics and Biomimetics (ROBIO), Qingdao, China.","DOI":"10.1109\/ROBIO.2016.7866350"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"16495","DOI":"10.1109\/ACCESS.2017.2735633","article-title":"A review and analysis of eye-gaze estimation systems, algorithms and performance evaluation methods in consumer platforms","volume":"5","author":"Anuradha","year":"2017","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2037","DOI":"10.1007\/s11042-012-1220-z","article-title":"Gaze direction estimation using support vector machine with active appearance model","volume":"70","author":"Wu","year":"2012","journal-title":"Multimed. Tools Appl."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, X., Sugano, Y., Fritz, M., and Bulling, A. (2015). Appearance-based gaze estimation in the wild. 2015 IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society.","DOI":"10.1109\/CVPR.2015.7299081"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Krafka, K., Khosla, A., Kellnhofer, P., Kannan, H., Bhandarkar, S., Matusik, W., and Torralba, A. (2016, January 27\u201330). Eye tracking for everyone. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.239"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1016\/j.knosys.2016.07.038","article-title":"Appearance-based gaze estimation using deep features and random forest regression","volume":"110","author":"Wang","year":"2016","journal-title":"Knowl. Based Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1109\/TCE.2019.2899869","article-title":"Convolutional Neural Network Implementation for Eye-Gaze Estimation on Low-Quality Consumer Imaging Systems","volume":"65","author":"Lemley","year":"2019","journal-title":"IEEE Trans. Consum. Electron."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, X., Sugano, Y., Fritz, M., and Bulling, A. (2017, January 21\u201326). It\u2019s written all over your face: Full-face appearance-based gaze estimation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.284"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"George, A., and Routray, A. (2016, January 12\u201315). Real-time eye gaze direction classification using convolutional neural network. Proceedings of the 2016 International Conference on Signal Processing and Communications (SPCOM), Bangalore, India.","DOI":"10.1109\/SPCOM.2016.7746701"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"19679","DOI":"10.1007\/s11042-017-5426-y","article-title":"Efficient eye typing with 9-direction gaze estimation","volume":"77","author":"Zhang","year":"2017","journal-title":"Multimed. Tools Appl."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Kao, C.W., Chen, H.H., Wu, S.H., Hwang, B.J., and Fan, K.C. (2017, January 6\u20138). Cluster based gaze estimation and data visualization supporting diverse environments. Proceedings of the International Conference on Watermarking and Image Processing (ICWIP 2017), Paris, France.","DOI":"10.1145\/3150978.3150988"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_20","unstructured":"Lin, M., Chen, Q., and Yan, S. (2014). Network in network. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Computer Vision Foundation.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_22","unstructured":"Simonyan, K., and Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_23","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Klare, B.F., Klein, B., Taborsky, E., Blanton, A., Cheney, J., Allen, K., Grother, P., Mah, A., and Jain, A.K. (2015, January 7\u201312). Pushing the frontiers of unconstrained face detection and recognition: Iarpa janus benchmark a. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298803"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/5\/127\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:30:19Z","timestamp":1760175019000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/5\/127"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,19]]},"references-count":24,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2020,5]]}},"alternative-id":["a13050127"],"URL":"https:\/\/doi.org\/10.3390\/a13050127","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,19]]}}}