{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T21:23:14Z","timestamp":1780694594558,"version":"3.54.1"},"reference-count":80,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2023,9,25]],"date-time":"2023-09-25T00:00:00Z","timestamp":1695600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Veoneer"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput.-Hum. Interact."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>Non-intrusive, real-time analysis of the dynamics of the eye region allows us to monitor humans\u2019 visual attention allocation and estimate their mental state during the performance of real-world tasks, which can potentially benefit a wide range of human-computer interaction (HCI) applications. While commercial eye-tracking devices have been frequently employed, the difficulty of customizing these devices places unnecessary constraints on the exploration of more efficient, end-to-end models of eye dynamics. In this work, we propose CLERA, a unified model for Cognitive Load and Eye Region Analysis, which achieves precise keypoint detection and spatiotemporal tracking in a joint-learning framework. Our method demonstrates significant efficiency and outperforms prior work on tasks including cognitive load estimation, eye landmark detection, and blink estimation. 
We also introduce a large-scale dataset of 30 k human faces with joint pupil, eye-openness, and landmark annotation, which aims at supporting future HCI research on human factors and eye-related analysis.<\/jats:p>","DOI":"10.1145\/3603622","type":"journal-article","created":{"date-parts":[[2023,6,7]],"date-time":"2023-06-07T11:12:52Z","timestamp":1686136372000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild"],"prefix":"10.1145","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1315-1196","authenticated-orcid":false,"given":"Li","family":"Ding","sequence":"first","affiliation":[{"name":"University of Massachusetts Amherst, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0235-7455","authenticated-orcid":false,"given":"Jack","family":"Terwilliger","sequence":"additional","affiliation":[{"name":"University of California San Diego, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8401-217X","authenticated-orcid":false,"given":"Aishni","family":"Parab","sequence":"additional","affiliation":[{"name":"University of California Los Angeles, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3304-0610","authenticated-orcid":false,"given":"Meng","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1484-6843","authenticated-orcid":false,"given":"Lex","family":"Fridman","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5929-4179","authenticated-orcid":false,"given":"Bruce","family":"Mehler","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4850-8738","authenticated-orcid":false,"given":"Bryan","family":"Reimer","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,9,25]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.22002\/D1.20237"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.5399\/osu\/jtrf.46.3.676"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.5220\/0006172700880095"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2021.3098237"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/SMC.2015.460"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.89"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2015.7350892"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00742"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.16910\/jemr.12.1.3"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00147"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/79.911197"},{"key":"e_1_3_2_14_2","unstructured":"Li Ding and Lex Fridman. 2019. Object as distribution. arXiv:1907.12929. 
Retrieved from https:\/\/arxiv.org\/abs\/1907.12929."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/IV47402.2020.9304677"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/IV48863.2021.9575742"},{"key":"e_1_3_2_17_2","article-title":"MIT DriveSeg (Manual) Dataset for Dynamic Driving Scene Segmentation","author":"Ding Li","year":"2020","unstructured":"Li Ding, Jack Terwilliger, Rini Sherony, Bryan Reimer, and Lex Fridman. 2020. MIT DriveSeg (Manual) Dataset for Dynamic Driving Scene Segmentation. Massachusetts Institute of Technology AgeLab Technical Report 2020-1. Cambridge, MA.","journal-title":"Massachusetts Institute of Technology AgeLab Technical Report 2020-1. Cambridge, MA."},{"key":"e_1_3_2_18_2","article-title":"MIT DriveSeg (Semi-auto) Dataset: Large-scale Semi-automated Annotation of Semantic Driving Scenes","author":"Ding Li","year":"2020","unstructured":"Li Ding, Jack Terwilliger, Rini Sherony, Bryan Reimer, and Lex Fridman. 2020. MIT DriveSeg (Semi-auto) Dataset: Large-scale Semi-automated Annotation of Semantic Driving Scenes. Massachusetts Institute of Technology AgeLab Technical Report 2020-2. Cambridge, MA.","journal-title":"Massachusetts Institute of Technology AgeLab Technical Report 2020-2. Cambridge, MA."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIV.2021.3094836"},{"key":"e_1_3_2_20_2","first-page":"436","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Drutarovsky Tomas","year":"2014","unstructured":"Tomas Drutarovsky and Andrej Fogelton. 2014. Eye blink detection using variance of motion vectors. In Proceedings of the European Conference on Computer Vision. 
Springer, 436\u2013448."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1011145532042"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01249-6_21"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2016.03.011"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2926040"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00173"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174226"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025929"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-23192-1_4"},{"issue":"1","key":"e_1_3_2_29_2","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1109\/TSMCA.2007.909557","article-title":"The CAS-PEAL large-scale Chinese face database and baseline evaluations","volume":"38","author":"Gao Wen","year":"2007","unstructured":"Wen Gao, Bo Cao, Shiguang Shan, Xilin Chen, Delong Zhou, Xiaohua Zhang, and Debin Zhao. 2007. The CAS-PEAL large-scale Chinese face database and baseline evaluations. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 38, 1 (2007), 149\u2013161.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans"},{"key":"e_1_3_2_30_2","unstructured":"Stephan J. Garbin Yiru Shen Immo Schuetz Robert Cavin Gregory Hughes and Sachin S. Talathi. 2019. Openeds: Open eye dataset. arXiv:1905.03702. 
Retrieved from https:\/\/arxiv.org\/abs\/1905.03702."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-044451020-4\/50027-X"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/1864349.1864395"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compmedimag.2017.04.006"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.5555\/1763088.1763122"},{"key":"e_1_3_2_37_2","unstructured":"Gary B. Huang Marwan Mattar Tamara Berg and Eric Learned-Miller. 2008. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in \u2018Real-Life\u2019 Images: Detection Alignment and Recognition."},{"key":"e_1_3_2_38_2","first-page":"448","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning. 
PMLR, 448\u2013456."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.37398\/JSR.2020.640137"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-80624-8_13"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300780"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.3390\/s17071534"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CRV.2007.54"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPSN54338.2022.00026"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2851451"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2007.895298"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.3390\/s20082384"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2313123"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-6392-3_3"},{"key":"e_1_3_2_51_2","article-title":"Is supportive driver monitoring needed to maximize trust, use, and the safety-benefits of collaborative automation?","author":"Mehler Bruce","year":"2020","unstructured":"Bruce Mehler. 2020. Is supportive driver monitoring needed to maximize trust, use, and the safety-benefits of collaborative automation? Panel on \u201cEmerging Automotive Technologies - How our Life will Change\u201d. United Nations Economic Commission for Europe (UNECE), Global Forum for Road Traffic Safety (WP.1). (2020).","journal-title":"Panel on \u201cEmerging Automotive Technologies - How our Life will Change\u201d. 
United Nations Economic Commission for Europe (UNECE), Global Forum for Road Traffic Safety (WP.1)."},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1177\/0018720812442086"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-021-89023-8"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1207\/S15326985EP3801_8"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2007.4409068"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.395"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/2929464.2929472"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1080\/10447318.2013.848320"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMECH.2022.3175774"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-822314-7.00007-9"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1037\/1076-898X.9.2.119"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_2_63_2","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv:1804.02767. Retrieved from https:\/\/arxiv.org\/abs\/1804.02767."},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1177\/0018720812437274"},{"key":"e_1_3_2_65_2","first-page":"91","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems. 
91\u201399."},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/HUMANOIDS.2015.7363515"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3314111.3319844"},{"key":"e_1_3_2_68_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2014.03.024"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4419-8126-4_6"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/2168556.2168585"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/2857491.2857520"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.trf.2014.08.003"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1177\/0018720819874544"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.nucengdes.2017.07.012"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.23890\/IJAST.vm02is01.0104"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.net.2019.06.023"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.284"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2778103"},{"key":"e_1_3_2_80_2","first-page":"642","volume-title":"Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (IEEE Cat. No. 04TH8749)","author":"Zhang Yilu","year":"2004","unstructured":"Yilu Zhang, Yuri Owechko, and Jing Zhang. 2004. Driver cognitive workload estimation: A data-driven perspective. In Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (IEEE Cat. No. 04TH8749). 
IEEE, 642\u2013647."},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1080\/10508414.2017.1313096"}],"container-title":["ACM Transactions on Computer-Human Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603622","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3603622","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:21Z","timestamp":1750178241000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603622"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,25]]},"references-count":80,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3603622"],"URL":"https:\/\/doi.org\/10.1145\/3603622","relation":{},"ISSN":["1073-0516","1557-7325"],"issn-type":[{"value":"1073-0516","type":"print"},{"value":"1557-7325","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,25]]},"assertion":[{"value":"2022-03-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-05-02","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}