{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T01:55:42Z","timestamp":1771552542708,"version":"3.50.1"},"reference-count":69,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2023,1,4]],"date-time":"2023-01-04T00:00:00Z","timestamp":1672790400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,1,4]],"date-time":"2023-01-04T00:00:00Z","timestamp":1672790400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100013348","name":"Innosuisse - Schweizerische Agentur f\u00fcr Innovationsf\u00f6rderung","doi-asserted-by":"publisher","award":["43122.1"],"award-info":[{"award-number":["43122.1"]}],"id":[{"id":"10.13039\/501100013348","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Swiss Federal Institute of Technology Zurich"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["User Model User-Adap Inter"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep learning models have shown remarkable performances in egocentric video-based action recognition (EAR), but rely heavily on a large quantity of training data. In specific applications with only limited data available, eye movement data may provide additional valuable sensory information to achieve accurate classification performances. However, little is known about the effectiveness of gaze data as a modality for egocentric action recognition. We, therefore, propose the new Peripheral Vision-Based HMM (PVHMM) classification framework, which utilizes context-rich and object-related gaze features for the detection of human action sequences. Gaze information is quantified using two features, the object-of-interest hit and the object\u2013gaze distance, and human action recognition is achieved by employing a hidden Markov model. The classification performance of the framework is tested and validated on a safety-critical medical device handling task sequence involving seven distinct action classes, using 43 mobile eye tracking recordings. The robustness of the approach is evaluated using the addition of Gaussian noise. Finally, the results are then compared to the performance of a VGG-16 model. The gaze-enhanced PVHMM achieves high classification performances in the investigated medical procedure task, surpassing the purely image-based classification model. 
Consequently, this gaze-enhanced EAR approach shows the potential for the implementation in action sequence-dependent real-world applications, such as surgical training, performance assessment, or medical procedural tasks.\n<\/jats:p>","DOI":"10.1007\/s11257-022-09352-9","type":"journal-article","created":{"date-parts":[[2023,1,4]],"date-time":"2023-01-04T06:03:09Z","timestamp":1672812189000},"page":"939-965","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["What we see is what we do: a practical Peripheral Vision-Based HMM framework for gaze-enhanced recognition of actions in a medical procedural task"],"prefix":"10.1007","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8369-202X","authenticated-orcid":false,"given":"Felix S.","family":"Wang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5816-4302","authenticated-orcid":false,"given":"Thomas","family":"Kreiner","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8358-5221","authenticated-orcid":false,"given":"Alexander","family":"Lutz","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3802-5329","authenticated-orcid":false,"given":"Quentin","family":"Lohmeyer","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5828-5406","authenticated-orcid":false,"given":"Mirko","family":"Meboldt","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,1,4]]},"reference":[{"key":"9352_CR1","unstructured":"Allahverdyan, A., Galstyan, A.: Comparative analysis of viterbi training and maximum likelihood estimation for HMMs. In: Advances in neural information processing systems 24: 25th annual conference on neural information processing systems 2011, NIPS 2011. https:\/\/arxiv.org\/abs\/1312.4551v1. (2011, December 16)"},{"key":"9352_CR2","unstructured":"Almaadeed, N., Elharrouss, O., Al-Maadeed, S., Bouridane, A., Beghdadi, A.: A novel approach for robust multi human action recognition and summarization based on 3D convolutional neural networks. https:\/\/www.researchgate.net\/publication\/334735494. (2019)"},{"issue":"11","key":"9352_CR3","doi-asserted-by":"publisher","first-page":"16299","DOI":"10.1007\/s11042-020-08789-7","volume":"80","author":"MA Arabac\u0131","year":"2018","unstructured":"Arabac\u0131, M.A., \u00d6zkan, F., Surer, E., Jan\u010dovi\u010d, P., Temizel, A.: Multi-modal egocentric activity recognition using audio-visual features. Multimed. Tools Appl 80(11), 16299\u201316328 (2018). https:\/\/doi.org\/10.1007\/s11042-020-08789-7","journal-title":"Multimed. Tools Appl"},{"key":"9352_CR4","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2020.2986648","author":"A Bandini","year":"2020","unstructured":"Bandini, A., Zariffa, J.: Analysis of the hands in egocentric vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell (2020). https:\/\/doi.org\/10.1109\/tpami.2020.2986648","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"9352_CR5","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1016\/J.NEUCOM.2019.10.008","volume":"378","author":"SHS Basha","year":"2020","unstructured":"Basha, S.H.S., Dubey, S.R., Pulabaigari, V., Mukherjee, S.: Impact of fully connected layers on performance of convolutional neural networks for image classification. Neurocomputing 378, 112\u2013119 (2020a). 
https:\/\/doi.org\/10.1016\/J.NEUCOM.2019.10.008","journal-title":"Neurocomputing"},{"key":"9352_CR6","unstructured":"Basha, S. H. S., Pulabaigari, V., Mukherjee, S.: An information-rich sampling technique over spatio-temporal CNN for Classification of human actions in videos. https:\/\/arxiv.org\/abs\/2002.02100v2. (2020b)"},{"key":"9352_CR7","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1016\/J.RCIM.2017.12.001","volume":"51","author":"K Bauters","year":"2018","unstructured":"Bauters, K., Cottyn, J., Claeys, D., Slembrouck, M., Veelaert, P., van Landeghem, H.: Automated work cycle classification and performance measurement for manual work stations. Robot. Comput. Integr. Manuf 51, 139\u2013157 (2018). https:\/\/doi.org\/10.1016\/J.RCIM.2017.12.001","journal-title":"Robot. Comput. Integr. Manuf"},{"issue":"4","key":"9352_CR8","doi-asserted-by":"publisher","first-page":"1048","DOI":"10.1109\/TCSVT.2018.2818407","volume":"29","author":"T Billah","year":"2019","unstructured":"Billah, T., Rahman, S.M.M., Ahmad, M.O., Swamy, M.N.S.: Recognizing distractions for assistive driving by tracking body parts. IEEE Trans. Circuits. Syst. Video. Technol 29(4), 1048\u20131062 (2019). https:\/\/doi.org\/10.1109\/TCSVT.2018.2818407","journal-title":"IEEE Trans. Circuits. Syst. Video. Technol"},{"key":"9352_CR9","doi-asserted-by":"publisher","unstructured":"Boualia, S. N., Amara, N. E. Ben.: 3D cnn for human action recognition. In: 18th IEEE International Multi-Conference on Systems, Signals and Devices, SSD 2021, pp 276\u2013282. https:\/\/doi.org\/10.1109\/SSD52085.2021.9429429. (2021)","DOI":"10.1109\/SSD52085.2021.9429429"},{"key":"9352_CR10","doi-asserted-by":"publisher","unstructured":"Cartas, A., Luque, J., Radeva, P., Segura, C., Dimiccoli, M.: How much does audio matter to recognize egocentric object interactions? https:\/\/doi.org\/10.48550\/arxiv.1906.00634. (2019)","DOI":"10.48550\/arxiv.1906.00634"},{"issue":"4","key":"9352_CR11","doi-asserted-by":"publisher","first-page":"567","DOI":"10.1111\/bju.14852","volume":"124","author":"J Chen","year":"2019","unstructured":"Chen, J., Remulla, D., Nguyen, J.H., Aastha, D., Liu, Y., Dasgupta, P., Hung, A.J.: Current status of artificial intelligence applications in urology and their potential to influence clinical practice. BJU Int 124(4), 567\u2013577 (2019). https:\/\/doi.org\/10.1111\/bju.14852","journal-title":"BJU Int"},{"key":"9352_CR12","doi-asserted-by":"publisher","DOI":"10.1145\/3447744","author":"K Chen","year":"2021","unstructured":"Chen, K., Zhang, D., Yao, L., Wales, S., Yu, Z., Guo, B., Liu, Y.: 77 deep learning for sensor-based human activity recognition: overview, challenges, and opportunities. ACM Comput. Surv (2021). https:\/\/doi.org\/10.1145\/3447744","journal-title":"ACM Comput. Surv"},{"issue":"3","key":"9352_CR13","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1016\/j.intcom.2011.02.008","volume":"23","author":"F Courtemanche","year":"2011","unstructured":"Courtemanche, F., A\u00efmeur, E., Dufresne, A., Najjar, M., Mpondo, F.: Activity recognition using eye-gaze movements and traditional interactions. Interact. Comput 23(3), 202\u2013213 (2011). https:\/\/doi.org\/10.1016\/j.intcom.2011.02.008","journal-title":"Interact. Comput"},{"key":"9352_CR14","doi-asserted-by":"publisher","unstructured":"Czempiel, T., Paschali, M., Keicher, M., Simson, W., Feussner, H., Kim, S. T., Navab, N.: TeCNO: surgical phase recognition with multi-stage temporal convolutional networks. 
In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), 12263 LNCS, pp 343\u2013352. https:\/\/doi.org\/10.1007\/978-3-030-59716-0_33. (2020)","DOI":"10.1007\/978-3-030-59716-0_33"},{"key":"9352_CR15","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2019.105820","author":"C Dai","year":"2020","unstructured":"Dai, C., Liu, X., Lai, J.: Human action recognition using two-stream attention based LSTM networks. Appl. Soft. Comput. J (2020). https:\/\/doi.org\/10.1016\/j.asoc.2019.105820","journal-title":"Appl. Soft. Comput. J"},{"key":"9352_CR16","doi-asserted-by":"publisher","unstructured":"Damen, D., Doughty, H., Farinella, G. M., Fidler, S., Furnari, A., Kazakos, E., Moltisanti, D., Munro, J., Perrett, T., Price, W., Wray, M.: Scaling egocentric vision: the EPIC-KITCHENS Dataset. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), 11208 LNCS, 753\u2013771. https:\/\/doi.org\/10.48550\/arxiv.1804.02748. (2018)","DOI":"10.48550\/arxiv.1804.02748"},{"key":"9352_CR17","doi-asserted-by":"publisher","unstructured":"Eivazi, S., Slupina, M., Fuhl, W., Afkari, H., Hafez, A., Kasneci, E.: Towards automatic skill evaluation in microsurgery. In: International conference on intelligent user interfaces, proceedings IUI, pp 73\u201376. https:\/\/doi.org\/10.1145\/3030024.3040985. (2017)","DOI":"10.1145\/3030024.3040985"},{"key":"9352_CR18","doi-asserted-by":"publisher","unstructured":"Fathi, A., Li, Y., Rehg, J. M.: Learning to recognize daily actions using gaze. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), 7572 LNCS(PART 1), pp 314\u2013327. https:\/\/doi.org\/10.1007\/978-3-642-33718-5_23. (2012)","DOI":"10.1007\/978-3-642-33718-5_23"},{"key":"9352_CR19","doi-asserted-by":"publisher","first-page":"647930","DOI":"10.3389\/fnbot.2021.647930","volume":"15","author":"S Fuchs","year":"2021","unstructured":"Fuchs, S.: Gaze-based intention estimation for shared autonomy in pick-and-place tasks. Front. Neurorobot 15, 647930 (2021). https:\/\/doi.org\/10.3389\/fnbot.2021.647930","journal-title":"Front. Neurorobot"},{"issue":"3","key":"9352_CR20","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1007\/s11257-019-09248-1","volume":"30","author":"E Garcia-Ceja","year":"2020","unstructured":"Garcia-Ceja, E., Riegler, M., Kvernberg, A.K., Torresen, J.: User-adaptive models for activity and emotion recognition using deep transfer learning and data augmentation. User Model User Adap. Inter 30(3), 365\u2013393 (2020). https:\/\/doi.org\/10.1007\/s11257-019-09248-1","journal-title":"User Model User Adap. Inter"},{"key":"9352_CR21","doi-asserted-by":"publisher","unstructured":"Garcia-Hernando, G., Yuan, S., Baek, S., Kim, T. K.: First-person hand action benchmark with RGB-D videos and 3D hand pose annotations. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 409\u2013419. https:\/\/doi.org\/10.1109\/CVPR.2018.00050. 
(2018)","DOI":"10.1109\/CVPR.2018.00050"},{"key":"9352_CR22","doi-asserted-by":"publisher","first-page":"133982","DOI":"10.1109\/ACCESS.2020.3010715","volume":"8","author":"D Gholamiangonabadi","year":"2020","unstructured":"Gholamiangonabadi, D., Kiselov, N., Grolinger, K.: Deep neural networks for human activity recognition with wearable sensors: leave-one-subject-out cross-validation for model selection. IEEE Access 8, 133982\u2013133994 (2020). https:\/\/doi.org\/10.1109\/ACCESS.2020.3010715","journal-title":"IEEE Access"},{"key":"9352_CR23","doi-asserted-by":"publisher","first-page":"115540","DOI":"10.1109\/ACCESS.2019.2936564","volume":"7","author":"H Gunduz","year":"2019","unstructured":"Gunduz, H.: Deep learning-based parkinson\u2019s disease classification using vocal feature sets. IEEE Access 7, 115540\u2013115551 (2019). https:\/\/doi.org\/10.1109\/ACCESS.2019.2936564","journal-title":"IEEE Access"},{"key":"9352_CR24","doi-asserted-by":"publisher","first-page":"7795","DOI":"10.1109\/TIP.2020.3007841","volume":"29","author":"Y Huang","year":"2020","unstructured":"Huang, Y., Cai, M., Li, Z., Lu, F., Sato, Y.: Mutual context network for jointly estimating egocentric gaze and action. IEEE Trans. Image Process 29, 7795\u20137806 (2020). https:\/\/doi.org\/10.1109\/TIP.2020.3007841","journal-title":"IEEE Trans. Image Process"},{"key":"9352_CR25","doi-asserted-by":"publisher","first-page":"698","DOI":"10.1016\/j.procs.2019.08.100","volume":"155","author":"C Jobanputra","year":"2019","unstructured":"Jobanputra, C., Bavishi, J., Doshi, N.: Human activity recognition: a survey. Procedia Comput. Sci 155, 698\u2013703 (2019). https:\/\/doi.org\/10.1016\/j.procs.2019.08.100","journal-title":"Procedia Comput. Sci"},{"issue":"8","key":"9352_CR26","doi-asserted-by":"publisher","first-page":"2442","DOI":"10.1109\/JPROC.2012.2200554","volume":"100","author":"T Kanade","year":"2012","unstructured":"Kanade, T., Hebert, M.: First-person vision. Proc. IEEE 100(8), 2442\u20132453 (2012). https:\/\/doi.org\/10.1109\/JPROC.2012.2200554","journal-title":"Proc. IEEE"},{"key":"9352_CR27","doi-asserted-by":"publisher","unstructured":"Kapidis, G., Poppe, R., Van Dam, E., Noldus, L., Veltkamp, R.: Egocentric hand track and object-based human action recognition. In: Proceedings\u20142019a IEEE smartworld, ubiquitous intelligence and computing, advanced and trusted computing, scalable computing and communications, internet of people and smart city innovation, SmartWorld\/UIC\/ATC\/SCALCOM\/IOP\/SCI 2019a, pp 922\u2013929. https:\/\/doi.org\/10.1109\/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00185. (2019a)","DOI":"10.1109\/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00185"},{"key":"9352_CR28","doi-asserted-by":"publisher","unstructured":"Kapidis, G., Poppe, R., Van Dam, E., Noldus, L., Veltkamp, R.: Multitask learning to improve egocentric action recognition. In: Proceedings\u20142019b International Conference on Computer Vision Workshop, ICCVW 2019b, pp 4396\u20134405. https:\/\/doi.org\/10.1109\/ICCVW.2019.00540. (2019b)","DOI":"10.1109\/ICCVW.2019.00540"},{"key":"9352_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TPAMI.2021.3061479","volume":"01","author":"G Kapidis","year":"2021","unstructured":"Kapidis, G., Poppe, R., Veltkamp, R.C.: Multi-Dataset, Multitask Learning of Egocentric Vision Tasks. IEEE Trans. Pattern. Anal. Mach. Intell 01, 1\u20131 (2021). https:\/\/doi.org\/10.1109\/TPAMI.2021.3061479","journal-title":"IEEE Trans. Pattern. Anal. Mach. 
Intell"},{"key":"9352_CR30","doi-asserted-by":"publisher","unstructured":"Kapidis, G., Poppe, R. W., Van Dam, E. A., Veltkamp, R. C., Noldus, L. P. J. J.: Where Am I? comparing CNN and LSTM for location classification in egocentric videos. In: 2018 IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2018, pp 878\u2013883. https:\/\/doi.org\/10.1109\/PERCOMW.2018.8480258. (2018)","DOI":"10.1109\/PERCOMW.2018.8480258"},{"key":"9352_CR31","doi-asserted-by":"publisher","unstructured":"Kazakos, E., Nagrani, A., Zisserman, A., Damen, Di.: EPIC-fusion: Audio-visual temporal binding for egocentric action recognition. In: Proceedings of the IEEE international conference on computer vision, 2019-Octob, pp 5491\u20135500. https:\/\/doi.org\/10.1109\/ICCV.2019.00559. (2019)","DOI":"10.1109\/ICCV.2019.00559"},{"key":"9352_CR32","doi-asserted-by":"publisher","unstructured":"Kit, D., Sullivan, B.: Classifying mobile eye tracking data with hidden Markov models. In: Proceedings of the 18th international conference on human\u2013computer interaction with mobile devices and services adjunct, MobileHCI 2016, pp 1037\u20131040. https:\/\/doi.org\/10.1145\/2957265.2965014. (2016)","DOI":"10.1145\/2957265.2965014"},{"key":"9352_CR33","doi-asserted-by":"publisher","first-page":"114037","DOI":"10.1016\/j.eswa.2020.114037","volume":"166","author":"AF Klaib","year":"2021","unstructured":"Klaib, A.F., Alsrehin, N.O., Melhem, W.Y., Bashtawi, H.O., Magableh, A.A.: Eye tracking algorithms, techniques, tools, and applications with an emphasis on machine learning and internet of things technologies. Expert Syst. Appl 166, 114037 (2021). https:\/\/doi.org\/10.1016\/j.eswa.2020.114037","journal-title":"Expert Syst. Appl"},{"key":"9352_CR34","doi-asserted-by":"publisher","DOI":"10.1145\/2896452","author":"K Krejtz","year":"2016","unstructured":"Krejtz, K., Duchowski, A., Krejtz, I., Szarkowska, A., Kopacz, A.: Discerning ambient\/focal attention with coefficient K. ACM Trans. Appl. Percept. (TAP) (2016). https:\/\/doi.org\/10.1145\/2896452","journal-title":"ACM Trans. Appl. Percept. (TAP)"},{"issue":"12","key":"9352_CR35","doi-asserted-by":"publisher","first-page":"1543","DOI":"10.1016\/j.humpath.2006.08.024","volume":"37","author":"EA Krupinski","year":"2006","unstructured":"Krupinski, E.A., Tillack, A.A., Richter, L., Henderson, J.T., Bhattacharyya, A.K., Scott, K.M., Graham, A.R., Descour, M.R., Davis, J.R., Weinstein, R.S.: Eye-movement study and human performance using telepathology virtual slides. Implications for medical education and differences with experience. Hum. Pathol 37(12), 1543\u20131556 (2006). https:\/\/doi.org\/10.1016\/j.humpath.2006.08.024","journal-title":"Hum. Pathol"},{"key":"9352_CR36","doi-asserted-by":"publisher","unstructured":"Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: A large video database for human motion recognition. In: Proceedings of the IEEE international conference on computer vision, pp 2556\u20132563. https:\/\/doi.org\/10.1109\/ICCV.2011.6126543. (2011)","DOI":"10.1109\/ICCV.2011.6126543"},{"issue":"25\u201326","key":"9352_CR37","doi-asserted-by":"publisher","first-page":"3559","DOI":"10.1016\/S0042-6989(01)00102-X","volume":"41","author":"MF Land","year":"2001","unstructured":"Land, M.F., Hayhoe, M.: In what ways do eye movements contribute to everyday activities? Vision. Res 41(25\u201326), 3559\u20133565 (2001). https:\/\/doi.org\/10.1016\/S0042-6989(01)00102-X","journal-title":"Vision. 
Res"},{"key":"9352_CR38","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3051319","author":"Y Li","year":"2021","unstructured":"Li, Y., Liu, M., Rehg, J.: In the eye of the beholder: gaze and actions in first person video. IEEE Trans. Pattern Anal. Mach. Intell (2021). https:\/\/doi.org\/10.1109\/TPAMI.2021.3051319","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"9352_CR39","doi-asserted-by":"publisher","unstructured":"Li, Y., Ye, Z., Rehg, J. M.: Delving into egocentric actions. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 07\u201312-June, pp 287\u2013295. https:\/\/doi.org\/10.1109\/CVPR.2015.7298625. (2015)","DOI":"10.1109\/CVPR.2015.7298625"},{"key":"9352_CR40","doi-asserted-by":"publisher","unstructured":"Liao, H., Dong, W., Huang, H., Gartner, G., Liu, H. Inferring user tasks in pedestrian navigation from eye movement data in real-world environments. 33(4):739\u2013763. https:\/\/doi.org\/10.1080\/13658816.2018.1482554. (2018)","DOI":"10.1080\/13658816.2018.1482554"},{"issue":"4","key":"9352_CR41","doi-asserted-by":"publisher","first-page":"41","DOI":"10.4018\/IJMHCI.2017100104","volume":"9","author":"K Lukander","year":"2017","unstructured":"Lukander, K., Toivanen, M., Puolam\u00e4ki, K.: Inferring intent and action from gaze in naturalistic behavior: a review. Int. J. Mob. Hum. Comput Interact 9(4), 41\u201357 (2017). https:\/\/doi.org\/10.4018\/IJMHCI.2017100104","journal-title":"Int. J. Mob. Hum. Comput Interact"},{"key":"9352_CR42","doi-asserted-by":"crossref","unstructured":"Ma, M., Fan, H., Kitani, K. M:. Going deeper into first-person activity recognition (pp. 1894\u20131903). (2016)","DOI":"10.1109\/CVPR.2016.209"},{"key":"9352_CR43","doi-asserted-by":"publisher","unstructured":"Mart\u00ednez-Villase\u00f1or, L., Ponce, H.: A concise review on sensor signal acquisition and transformation applied to human activity recognition and human\u2013robot interaction. 15(6). https:\/\/doi.org\/10.1177\/1550147719853987. (2019)","DOI":"10.1177\/1550147719853987"},{"key":"9352_CR44","doi-asserted-by":"publisher","unstructured":"Min, K., Corso, J. J. Integrating human gaze into attention for egocentric activity recognition. In: Proceedings\u20142021 ieee winter conference on applications of computer vision, WACV 2021,pp 1068\u20131077. https:\/\/doi.org\/10.1109\/WACV48630.2021.00111. (2021)","DOI":"10.1109\/WACV48630.2021.00111"},{"key":"9352_CR45","doi-asserted-by":"publisher","unstructured":"Mizik, N., Hanssens, D.: Machine learning and big data. In: Handbook of marketing analytics, pp. 253\u2013254. https:\/\/doi.org\/10.4337\/9781784716752.00022. (2018)","DOI":"10.4337\/9781784716752.00022"},{"key":"9352_CR46","doi-asserted-by":"publisher","unstructured":"Mojarad, R., Attal, F., Chibani, A., Fiorini, S. R., Amirat, Y.: Hybrid approach for human activity recognition by ubiquitous robots. In: IEEE international conference on intelligent robots and systems, 5660\u20135665. https:\/\/doi.org\/10.1109\/IROS.2018.8594173. (2018)","DOI":"10.1109\/IROS.2018.8594173"},{"key":"9352_CR47","doi-asserted-by":"publisher","unstructured":"Ng, J. Y. H., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G.: Beyond short snippets: Deep networks for video classification. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, 07\u201312-June, 4694\u20134702. 
https:\/\/doi.org\/10.1109\/CVPR.2015.7299101.(2015)","DOI":"10.1109\/CVPR.2015.7299101."},{"key":"9352_CR48","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1016\/j.neucom.2021.11.081","volume":"472","author":"A N\u00fa\u00f1ez-Marcos","year":"2022","unstructured":"N\u00fa\u00f1ez-Marcos, A., Azkune, G., Arganda-Carreras, I.: Egocentric vision-based action recognition: a survey. Neurocomputing 472, 175\u2013197 (2022). https:\/\/doi.org\/10.1016\/j.neucom.2021.11.081","journal-title":"Neurocomputing"},{"issue":"1","key":"9352_CR49","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1109\/TIV.2016.2571067","volume":"1","author":"E Ohn-Bar","year":"2016","unstructured":"Ohn-Bar, E., Trivedi, M.M.: Looking at humans in the age of self-driving and highly automated vehicles. IEEE Trans. Intell. Veh 1(1), 90\u2013104 (2016). https:\/\/doi.org\/10.1109\/TIV.2016.2571067","journal-title":"IEEE Trans. Intell. Veh"},{"key":"9352_CR50","doi-asserted-by":"publisher","DOI":"10.1080\/13645706.2019.1584116","author":"N Padoy","year":"2019","unstructured":"Padoy, N.: Machine and deep learning for workflow recognition during surgery. Minim. Invasive Ther. Allied Technol (2019). https:\/\/doi.org\/10.1080\/13645706.2019.1584116","journal-title":"Minim. Invasive Ther. Allied Technol"},{"key":"9352_CR51","unstructured":"Pupil Labs. (n.d.) Pupil invisible\u2014 Eye tracking glasses technical specifications\u2014Pupil Labs. Retrieved August 24, 2022, from https:\/\/pupil-labs.com\/products\/core\/tech-specs\/"},{"key":"9352_CR52","doi-asserted-by":"publisher","unstructured":"Reingold, E. M., Sheridan, H.: Eye movements and visual expertise in chess and medicine. In The Oxford handbook of eye movements. Oxford University Press. https:\/\/doi.org\/10.1093\/oxfordhb\/9780199539789.013.0029. (2012)","DOI":"10.1093\/oxfordhb\/9780199539789.013.0029"},{"key":"9352_CR53","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1016\/J.NEUCOM.2015.04.022","volume":"166","author":"HM Romero Ugalde","year":"2015","unstructured":"Romero Ugalde, H.M., Carmona, J.C., Reyes-Reyes, J., Alvarado, V.M., Mantilla, J.: Computational cost improvement of neural network models in black box nonlinear system identification. Neurocomputing 166, 96\u2013108 (2015). https:\/\/doi.org\/10.1016\/J.NEUCOM.2015.04.022","journal-title":"Neurocomputing"},{"key":"9352_CR54","unstructured":"Rong, Y., Xu, W., Akata, Z., Kasneci, E.: Human attention in fine-grained classification. http:\/\/arxiv.org\/abs\/2111.01628. (2021)"},{"key":"9352_CR55","doi-asserted-by":"publisher","unstructured":"Sch\u00fcldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings\u2014international conference on pattern recognition, 3, 32\u201336. https:\/\/doi.org\/10.1109\/ICPR.2004.1334462. (2004)","DOI":"10.1109\/ICPR.2004.1334462"},{"key":"9352_CR56","doi-asserted-by":"publisher","unstructured":"Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: 3rd international conference on learning representations, ICLR 2015\u2013conference track proceedings. https:\/\/doi.org\/10.48550\/arxiv.1409.1556.(2014)","DOI":"10.48550\/arxiv.1409.1556."},{"key":"9352_CR57","unstructured":"Soomro, K., Roshan Zamir, A., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. http:\/\/crcv.ucf.edu\/data\/UCF101.php. 
(2012)"},{"key":"9352_CR58","doi-asserted-by":"publisher","unstructured":"Sudhakaran, S., Escalera, S., Lanz, O.: LSTA: Long short-term attention for egocentric action recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, 2019-June, 9946\u20139955. https:\/\/doi.org\/10.1109\/CVPR.2019.01019(2019)","DOI":"10.1109\/CVPR.2019.01019"},{"key":"9352_CR59","unstructured":"Supervisely, O.: Supervisely: unified OS for computer vision. https:\/\/supervise.ly\/. (2022)"},{"key":"9352_CR60","doi-asserted-by":"publisher","unstructured":"Tang, Y., Tian, Y., Lu, J., Feng, J., Zhou, J.: Action recognition in RGB-D egocentric videos. In: Proceedings\u2014international conference on image processing, ICIP, 2017-September, 3410\u20133414. https:\/\/doi.org\/10.1109\/ICIP.2017.8296915(2018)","DOI":"10.1109\/ICIP.2017.8296915"},{"key":"9352_CR61","doi-asserted-by":"publisher","unstructured":"Tekin, B., Bogo, F., Pollefeys, M.: H+O: Unified egocentric recognition of 3D hand-object poses and interactions. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, 2019a-June, 4506\u20134515. https:\/\/doi.org\/10.1109\/CVPR.2019.00464(2019a)","DOI":"10.1109\/CVPR.2019.00464"},{"key":"9352_CR62","doi-asserted-by":"publisher","unstructured":"Tekin, B., Bogo, F., Pollefeys, M.: H+O: Unified egocentric recognition of 3D hand-object poses and interactions. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, 4506\u20134515, 2019b-June. https:\/\/doi.org\/10.1109\/CVPR.2019.00464. (2019b)","DOI":"10.1109\/CVPR.2019.00464"},{"key":"9352_CR63","unstructured":"Tobii Pro (2020) Latest in wearable eye tracking|choose Tobii Pro Glasses 3. https:\/\/www.tobiipro.com\/product-listing\/tobii-pro-glasses-3\/"},{"issue":"2","key":"9352_CR64","doi-asserted-by":"publisher","first-page":"761","DOI":"10.1007\/s10100-019-00628-x","volume":"28","author":"BH Ulutas","year":"2020","unstructured":"Ulutas, B.H., \u00d6zkan, N.F., Michalski, R.: Application of hidden Markov models to eye tracking data analysis of visual quality inspection operations. Cent. Eur. J. Oper. Res 28(2), 761\u2013777 (2020). https:\/\/doi.org\/10.1007\/s10100-019-00628-x","journal-title":"Cent. Eur. J. Oper. Res"},{"key":"9352_CR65","doi-asserted-by":"publisher","first-page":"85284","DOI":"10.1109\/ACCESS.2020.2993227","volume":"8","author":"Y Wan","year":"2020","unstructured":"Wan, Y., Yu, Z., Wang, Y., Li, X.: Action recognition based on two-stream convolutional networks with long-short-term spatiotemporal features. IEEE Access 8, 85284\u201385293 (2020). https:\/\/doi.org\/10.1109\/ACCESS.2020.2993227","journal-title":"IEEE Access"},{"issue":"1","key":"9352_CR66","doi-asserted-by":"publisher","first-page":"1","DOI":"10.16910\/jemr.14.1.5","volume":"14","author":"FS Wang","year":"2021","unstructured":"Wang, F.S., Wolf, J., Farshad, M., Meboldt, M., Lohmeyer, Q.: Object-gaze distance: quantifying near- peripheral gaze behavior in real-world application. J. Eye Mov. Res 14(1), 1\u201313 (2021). https:\/\/doi.org\/10.16910\/jemr.14.1.5","journal-title":"J. Eye Mov. Res"},{"key":"9352_CR67","doi-asserted-by":"publisher","DOI":"10.3929\/ethz-b-000309840","author":"J Wolf","year":"2018","unstructured":"Wolf, J., Hess, S., Bachmann, D., Lohmeyer, Q., Meboldt, M.: Automating areas of interest analysis in mobile eye tracking experiments based on machine learning. J. Eye Mov. Res (2018). 
https:\/\/doi.org\/10.3929\/ethz-b-000309840","journal-title":"J. Eye Mov. Res"},{"key":"9352_CR68","doi-asserted-by":"publisher","unstructured":"Wu, Z., Jiang, Y.G., Wang, X., Ye, H., Xue, X.: Multi-stream multi-class fusion of deep networks for video classification. In: MM 2016\u2014Proceedings of the 2016 ACM multimedia conference, 791\u2013800 (2016). https:\/\/doi.org\/10.1145\/2964284.2964328","DOI":"10.1145\/2964284.2964328"},{"key":"9352_CR69","doi-asserted-by":"publisher","first-page":"144331","DOI":"10.1109\/ACCESS.2020.3014355","volume":"8","author":"J Zhou","year":"2020","unstructured":"Zhou, J., Cao, R., Kang, J., Guo, K., Xu, Y.: An efficient high-quality medical lesion image data labeling method based on active learning. IEEE Access 8, 144331\u2013144342 (2020). https:\/\/doi.org\/10.1109\/ACCESS.2020.3014355","journal-title":"IEEE Access"}],"container-title":["User Modeling and User-Adapted Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11257-022-09352-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11257-022-09352-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11257-022-09352-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T18:12:50Z","timestamp":1692900770000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11257-022-09352-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,4]]},"references-count":69,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["9352"],"URL":"https:\/\/doi.org\/10.1007\/s11257-022-09352-9","relation":{},"ISSN":["0924-1868","1573-1391"],"issn-type":[{"value":"0924-1868","type":"print"},{"value":"1573-1391","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,4]]},"assertion":[{"value":"19 April 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 December 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 January 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have declared no conflict of interest. This research was financially supported by Innosuisse, the Swiss Innovation Agency, under Project No. 43122.1. The datasets generated during and\/or analyzed during the current study are available in the figshare repository (). The experiments were carried out in accordance to the regulations of the Swiss Ethics Committees on research involving humans. None of the experiments was preregistered.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}
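The abstract in this record describes the PVHMM pipeline at a high level: per-frame gaze is quantified with two features (the object-of-interest hit and the object–gaze distance) and the action sequence over seven classes is recognized with a hidden Markov model. Below is a minimal, hypothetical sketch of that idea only, assuming the hmmlearn library, mocked feature values, and unsupervised Gaussian emissions; it is not the authors' implementation, and the paper's actual feature extraction, training scheme, and evaluation are not reproduced here.

```python
# Illustrative sketch (not from the indexed article): decoding action phases from
# per-frame gaze features with a hidden Markov model, loosely following the PVHMM
# idea in the abstract (object-of-interest hit + object-gaze distance -> HMM).
# Assumptions: hmmlearn is available; real eye-tracking feature extraction is mocked.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

N_ACTIONS = 7           # the medical procedure in the abstract involves seven action classes
FRAMES_PER_TRIAL = 300  # hypothetical recording length per trial

def mock_gaze_features(n_frames: int) -> np.ndarray:
    """Stand-in for the two gaze features named in the abstract:
    column 0: object-of-interest hit (0/1), column 1: object-gaze distance (arbitrary px)."""
    hits = rng.integers(0, 2, size=n_frames)
    dist = rng.gamma(shape=2.0, scale=40.0, size=n_frames)
    return np.column_stack([hits, dist]).astype(float)

# Stack several trials; hmmlearn expects concatenated sequences plus their lengths.
trials = [mock_gaze_features(FRAMES_PER_TRIAL) for _ in range(10)]
X = np.vstack(trials)
lengths = [len(t) for t in trials]

# One hidden state per action phase, Gaussian emissions over the two gaze features.
model = hmm.GaussianHMM(n_components=N_ACTIONS, covariance_type="diag",
                        n_iter=50, random_state=0)
model.fit(X, lengths)

# Viterbi decoding of a new trial into a sequence of (unlabeled) hidden states.
states = model.predict(mock_gaze_features(FRAMES_PER_TRIAL))
print(states[:20])
```

In an actual setting the two feature columns would come from fixations mapped onto annotated objects of interest, and the decoded states would be matched against labeled action phases (e.g., via leave-one-subject-out evaluation) rather than inspected on synthetic data as above.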