{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T18:51:54Z","timestamp":1775069514804,"version":"3.50.1"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"ETRA","license":[{"start":{"date-parts":[[2022,5,13]],"date-time":"2022-05-13T00:00:00Z","timestamp":1652400000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Cluster of Excellence - Machine Learning","award":["EXC number 2064\/1 - Project number 390727645."],"award-info":[{"award-number":["EXC number 2064\/1 - Project number 390727645."]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2022,5,13]]},"abstract":"<jats:p>Human drivers use their attentional mechanisms to focus on critical objects and make decisions while driving. As human attention can be revealed from gaze data, capturing and analyzing gaze information has emerged in recent years to benefit autonomous driving technology. Previous works in this context have primarily aimed at predicting \"where\" human drivers look at and lack knowledge of \"what\" objects drivers focus on. Our work bridges the gap between pixel-level and object-level attention prediction. Specifically, we propose to integrate an attention prediction module into a pretrained object detection framework and predict the attention in a grid-based style. Furthermore, critical objects are recognized based on predicted attended-to areas. We evaluate our proposed method on two driver attention datasets, BDD-A and DR(eye)VE. 
Our framework achieves competitive state-of-the-art performance in the attention prediction on both pixel-level and object-level but is far more efficient (75.3 GFLOPs less) in computation.<\/jats:p>","DOI":"10.1145\/3530887","type":"journal-article","created":{"date-parts":[[2022,5,13]],"date-time":"2022-05-13T22:17:43Z","timestamp":1652480263000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Where and What"],"prefix":"10.1145","volume":"6","author":[{"given":"Yao","family":"Rong","sequence":"first","affiliation":[{"name":"University of T\u00fcbingen, T\u00fcbingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Naemi-Rebecca","family":"Kassautzki","sequence":"additional","affiliation":[{"name":"University of T\u00fcbingen, T\u00fcbingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wolfgang","family":"Fuhl","sequence":"additional","affiliation":[{"name":"University of T\u00fcbingen, T\u00fcbingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Enkelejda","family":"Kasneci","sequence":"additional","affiliation":[{"name":"University of T\u00fcbingen, T\u00fcbingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,5,13]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Attend and Brake: An Attention-based Saliency Map Prediction Model for End-to-End Driving. arXiv preprint arXiv:2002.11020","author":"Aksoy Ekrem","year":"2020","unstructured":"Ekrem Aksoy , Ahmet Yazici , and Mahmut Kasap . 2020. See , Attend and Brake: An Attention-based Saliency Map Prediction Model for End-to-End Driving. arXiv preprint arXiv:2002.11020 ( 2020 ). Ekrem Aksoy, Ahmet Yazici, and Mahmut Kasap. 2020. See, Attend and Brake: An Attention-based Saliency Map Prediction Model for End-to-End Driving. 
arXiv preprint arXiv:2002.11020 (2020)."},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Stefano Alletto Andrea Palazzi Francesco Solera Simone Calderara and Rita Cucchiara. 2016. Dr (eye) ve: a dataset for attention-based tasks with applications to autonomous and assisted driving. In CVPRW .  Stefano Alletto Andrea Palazzi Francesco Solera Simone Calderara and Rita Cucchiara. 2016. Dr (eye) ve: a dataset for attention-based tasks with applications to autonomous and assisted driving. In CVPRW .","DOI":"10.1109\/CVPRW.2016.14"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Michael Barz Sebastian Kapp Jochen Kuhn and Daniel Sonntag. 2021. Automatic recognition and augmentation of attended objects in real-time using eye tracking and a head-mounted display. In ACM ETRA. 1--4.  Michael Barz Sebastian Kapp Jochen Kuhn and Daniel Sonntag. 2021. Automatic recognition and augmentation of attended objects in real-time using eye tracking and a head-mounted display. In ACM ETRA. 1--4.","DOI":"10.1145\/3450341.3458766"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3390\/s21124143"},{"key":"e_1_2_1_5_1","volume-title":"Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934","author":"Bochkovskiy Alexey","year":"2020","unstructured":"Alexey Bochkovskiy , Chien-Yao Wang , and Hong-Yuan Mark Liao . 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 ( 2020 ). Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)."},{"key":"e_1_2_1_6_1","volume-title":"Saliency prediction in the deep learning era: Successes, limitations, and future challenges. arXiv preprint arXiv:1810.03716","author":"Borji Ali","year":"2018","unstructured":"Ali Borji . 2018. Saliency prediction in the deep learning era: Successes, limitations, and future challenges. 
arXiv preprint arXiv:1810.03716 ( 2018 ). Ali Borji. 2018. Saliency prediction in the deep learning era: Successes, limitations, and future challenges. arXiv preprint arXiv:1810.03716 (2018)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Jiwoong Choi Dayoung Chun Hyun Kim and Hyuk-Jae Lee. 2019. Gaussian yolov3: An accurate and fast object detector using localization uncertainty for autonomous driving. In ICCV. 502--511.  Jiwoong Choi Dayoung Chun Hyun Kim and Hyuk-Jae Lee. 2019. Gaussian yolov3: An accurate and fast object detector using localization uncertainty for autonomous driving. In ICCV. 502--511.","DOI":"10.1109\/ICCV.2019.00059"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Marcella Cornia Lorenzo Baraldi Giuseppe Serra and Rita Cucchiara. 2016. A deep multi-level network for saliency prediction. In ICPR .  Marcella Cornia Lorenzo Baraldi Giuseppe Serra and Rita Cucchiara. 2016. A deep multi-level network for saliency prediction. In ICPR .","DOI":"10.1109\/ICPR.2016.7900174"},{"key":"e_1_2_1_9_1","volume-title":"ECCVW","volume":"1","author":"Csurka Gabriella","year":"2004","unstructured":"Gabriella Csurka , Christopher Dance , Lixin Fan , Jutta Willamowski , and C\u00e9dric Bray . 2004 . Visual categorization with bags of keypoints . In ECCVW , Vol. 1 . Prague, 1--2. Gabriella Csurka, Christopher Dance, Lixin Fan, Jutta Willamowski, and C\u00e9dric Bray. 2004. Visual categorization with bags of keypoints. In ECCVW, Vol. 1. Prague, 1--2."},{"key":"e_1_2_1_10_1","volume-title":"Imagenet: A large-scale hierarchical image database. In CVPR. Ieee, 248--255.","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . 2009 . Imagenet: A large-scale hierarchical image database. In CVPR. Ieee, 248--255. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In CVPR. 
Ieee, 248--255."},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Tao Deng Hongmei Yan Long Qin Thuyen Ngo and B. Manjunath. 2019. How Do Drivers Allocate Their Potential Attention? Driving Fixation Prediction via Convolutional Neural Networks. T-ITS (2019).  Tao Deng Hongmei Yan Long Qin Thuyen Ngo and B. Manjunath. 2019. How Do Drivers Allocate Their Potential Attention? Driving Fixation Prediction via Convolutional Neural Networks. T-ITS (2019).","DOI":"10.1109\/TITS.2019.2915540"},{"key":"e_1_2_1_12_1","unstructured":"Kaiming He Georgia Gkioxari Piotr Doll\u00e1r and Ross Girshick. 2017. Mask r-cnn. In ICCV . 2961--2969.  Kaiming He Georgia Gkioxari Piotr Doll\u00e1r and Ross Girshick. 2017. Mask r-cnn. In ICCV . 2961--2969."},{"key":"e_1_2_1_13_1","volume-title":"Long short-term memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber . 1997. Long short-term memory. Neural computation , Vol. 9 , 8 ( 1997 ), 1735--1780. Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation , Vol. 9, 8 (1997), 1735--1780."},{"key":"#cr-split#-e_1_2_1_14_1.1","unstructured":"Glenn Jocher Alex Stoken Jirka Borovec NanoCode012 Ayush Chaurasia TaoXie Liu Changyu Abhiram V Laughing tkianai yxNONG Adam Hogan lorenzomammana AlexWang1900 Jan Hajek Laurentiu Diaconu Marc Yonghye Kwon oleg wanghaoyang0106 Yann Defretin Aditya Lohia ml5ah Ben Milanko Benjamin Fineran Daniel Khromov Ding Yiwei Doug Durgesh and Francisco Ingham. 2021. ultralytics\/yolov5: v5.0 - YOLOv5-P6 1280 models. 
https:\/\/doi.org\/10.5281\/zenodo.4679653 10.5281\/zenodo.4679653"},{"key":"#cr-split#-e_1_2_1_14_1.2","unstructured":"Glenn Jocher Alex Stoken Jirka Borovec NanoCode012 Ayush Chaurasia TaoXie Liu Changyu Abhiram V Laughing tkianai yxNONG Adam Hogan lorenzomammana AlexWang1900 Jan Hajek Laurentiu Diaconu Marc Yonghye Kwon oleg wanghaoyang0106 Yann Defretin Aditya Lohia ml5ah Ben Milanko Benjamin Fineran Daniel Khromov Ding Yiwei Doug Durgesh and Francisco Ingham. 2021. ultralytics\/yolov5: v5.0 - YOLOv5-P6 1280 models. https:\/\/doi.org\/10.5281\/zenodo.4679653"},{"key":"e_1_2_1_15_1","volume-title":"Improving Driver Gaze Prediction With Reinforced Attention","author":"Sheng Hao","year":"2020","unstructured":"lv Kai, Hao Sheng , Zhang Xiong , Wei Li , and Liang Zheng . 2020. Improving Driver Gaze Prediction With Reinforced Attention . IEEE Transactions on Multimedia ( 2020 ). lv Kai, Hao Sheng, Zhang Xiong, Wei Li, and Liang Zheng. 2020. Improving Driver Gaze Prediction With Reinforced Attention. IEEE Transactions on Multimedia (2020)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-021-00863-5"},{"key":"e_1_2_1_17_1","unstructured":"Jinkyu Kim Anna Rohrbach Trevor Darrell John Canny and Zeynep Akata. 2018. Textual explanations for self-driving vehicles. In ECCV. 563--578.  Jinkyu Kim Anna Rohrbach Trevor Darrell John Canny and Zeynep Akata. 2018. Textual explanations for self-driving vehicles. In ECCV. 563--578."},{"key":"e_1_2_1_18_1","volume-title":"Adam: A Method for Stochastic Optimization. In ICLR .","author":"Kingma Diederik P","year":"2015","unstructured":"Diederik P Kingma and Jimmy Ba . 2015 . Adam: A Method for Stochastic Optimization. In ICLR . Diederik P Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR ."},{"key":"e_1_2_1_19_1","volume-title":"Imagenet classification with deep convolutional neural networks. 
NeurIPS","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E Hinton . 2012. Imagenet classification with deep convolutional neural networks. NeurIPS ( 2012 ). Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. NeurIPS (2012)."},{"key":"e_1_2_1_20_1","volume-title":"Learning-based approach for online lane change intention prediction","author":"Kumar Puneet","unstructured":"Puneet Kumar , Mathias Perrollaz , St\u00e9phanie Lefevre , and Christian Laugier . 2013. Learning-based approach for online lane change intention prediction . In IV. IEEE , 797--802. Puneet Kumar, Mathias Perrollaz, St\u00e9phanie Lefevre, and Christian Laugier. 2013. Learning-based approach for online lane change intention prediction. In IV. IEEE, 797--802."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3390\/s21227668"},{"key":"e_1_2_1_22_1","volume-title":"Deep gaze i: Boosting saliency prediction with feature maps trained on imagenet. arXiv preprint arXiv:1411.1045","author":"K\u00fcmmerer Matthias","year":"2014","unstructured":"Matthias K\u00fcmmerer , Lucas Theis , and Matthias Bethge . 2014. Deep gaze i: Boosting saliency prediction with feature maps trained on imagenet. arXiv preprint arXiv:1411.1045 ( 2014 ). Matthias K\u00fcmmerer, Lucas Theis, and Matthias Bethge. 2014. Deep gaze i: Boosting saliency prediction with feature maps trained on imagenet. arXiv preprint arXiv:1411.1045 (2014)."},{"key":"e_1_2_1_23_1","volume-title":"Thomas SA Wallis, and Matthias Bethge","author":"K\u00fcmmerer Matthias","year":"2016","unstructured":"Matthias K\u00fcmmerer , Thomas SA Wallis, and Matthias Bethge . 2016 . DeepGaze II : Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016). Matthias K\u00fcmmerer, Thomas SA Wallis, and Matthias Bethge. 2016. 
DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)."},{"key":"e_1_2_1_24_1","volume-title":"Microsoft coco: Common objects in context","author":"Lin Tsung-Yi","unstructured":"Tsung-Yi Lin , Michael Maire , Serge Belongie , James Hays , Pietro Perona , Deva Ramanan , Piotr Doll\u00e1r , and C Lawrence Zitnick . 2014. Microsoft coco: Common objects in context . In ECCV. Springer , 740--755. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In ECCV. Springer, 740--755."},{"key":"e_1_2_1_25_1","unstructured":"Congcong Liu Yuying Chen Lei Tai Haoyang Ye Ming Liu and Bertram E Shi. 2019. A gaze model improves autonomous driving. In ACM ETRA .  Congcong Liu Yuying Chen Lei Tai Haoyang Ye Ming Liu and Bertram E Shi. 2019. A gaze model improves autonomous driving. In ACM ETRA ."},{"key":"e_1_2_1_26_1","volume-title":"Picanet: Learning pixel-wise contextual attention for saliency detection. In CVPR . 3089--3098.","author":"Liu Nian","year":"2018","unstructured":"Nian Liu , Junwei Han , and Ming-Hsuan Yang . 2018 a. Picanet: Learning pixel-wise contextual attention for saliency detection. In CVPR . 3089--3098. Nian Liu, Junwei Han, and Ming-Hsuan Yang. 2018a. Picanet: Learning pixel-wise contextual attention for saliency detection. In CVPR . 3089--3098."},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Shu Liu Lu Qi Haifang Qin Jianping Shi and Jiaya Jia. 2018b. Path aggregation network for instance segmentation. In CVPR . 8759--8768.  Shu Liu Lu Qi Haifang Qin Jianping Shi and Jiaya Jia. 2018b. Path aggregation network for instance segmentation. In CVPR . 8759--8768.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Yang Liu Lei Zhou Xiao Bai Yifei Huang Lin Gu Jun Zhou and Tatsuya Harada. 2021. 
Goal-oriented gaze estimation for zero-shot learning. In CVPR. 3794--3803.  Yang Liu Lei Zhou Xiao Bai Yifei Huang Lin Gu Jun Zhou and Tatsuya Harada. 2021. Goal-oriented gaze estimation for zero-shot learning. In CVPR. 3794--3803.","DOI":"10.1109\/CVPR46437.2021.00379"},{"key":"e_1_2_1_29_1","volume-title":"Visual Attention-Based Object Detection in Cluttered Environments","author":"Silva Machado Eduardo Manuel","unstructured":"Eduardo Manuel Silva Machado , Ivan Carrillo , Miguel Collado , and Liming Chen . 2019. Visual Attention-Based Object Detection in Cluttered Environments . In SmartWorld\/SCALCOM\/UIC\/ATC\/CBDCom\/IOP\/SCI. IEEE , 133--139. Eduardo Manuel Silva Machado, Ivan Carrillo, Miguel Collado, and Liming Chen. 2019. Visual Attention-Based Object Detection in Cluttered Environments. In SmartWorld\/SCALCOM\/UIC\/ATC\/CBDCom\/IOP\/SCI. IEEE, 133--139."},{"key":"e_1_2_1_30_1","volume-title":"Human visual attention prediction boosts learning & performance of autonomous driving agents. arXiv preprint arXiv:1909.05003","author":"Makrigiorgos Alexander","year":"2019","unstructured":"Alexander Makrigiorgos , Ali Shafti , Alex Harston , Julien Gerard , and A Aldo Faisal . 2019. Human visual attention prediction boosts learning & performance of autonomous driving agents. arXiv preprint arXiv:1909.05003 ( 2019 ). Alexander Makrigiorgos, Ali Shafti, Alex Harston, Julien Gerard, and A Aldo Faisal. 2019. Human visual attention prediction boosts learning & performance of autonomous driving agents. arXiv preprint arXiv:1909.05003 (2019)."},{"key":"e_1_2_1_31_1","volume-title":"Towards self-driving car using convolutional neural network and road lane detector","author":"Nugraha Brilian Tafjira","year":"2017","unstructured":"Brilian Tafjira Nugraha , Shun-Feng Su , et al. 2017 . Towards self-driving car using convolutional neural network and road lane detector. In ICACOMIT . Brilian Tafjira Nugraha, Shun-Feng Su, et al. 2017. Towards self-driving car using convolutional neural network and road lane detector. 
In ICACOMIT ."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2003.1246946"},{"key":"e_1_2_1_33_1","unstructured":"Anwesan Pal Sayan Mondal and Henrik I Christensen. 2020. \"Looking at the Right Stuff\"-Guided Semantic-Gaze for Autonomous Driving. In CVPR .  Anwesan Pal Sayan Mondal and Henrik I Christensen. 2020. \"Looking at the Right Stuff\"-Guided Semantic-Gaze for Autonomous Driving. In CVPR ."},{"key":"e_1_2_1_34_1","volume-title":"Predicting the Driver's Focus of Attention: the DR(eye)VE Project. TPAMI","author":"Palazzi Andrea","year":"2018","unstructured":"Andrea Palazzi , Davide Abati , Simone Calderara , Francesco Solera , and Rita Cucchiara . 2018. Predicting the Driver's Focus of Attention: the DR(eye)VE Project. TPAMI ( 2018 ). Andrea Palazzi, Davide Abati, Simone Calderara, Francesco Solera, and Rita Cucchiara. 2018. Predicting the Driver's Focus of Attention: the DR(eye)VE Project. TPAMI (2018)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/THMS.2019.2892919"},{"key":"e_1_2_1_36_1","volume-title":"Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention","author":"Peters Robert J","unstructured":"Robert J Peters and Laurent Itti . 2007. Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention . In CVPR. IEEE , 1--8. Robert J Peters and Laurent Itti. 2007. Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention. In CVPR. IEEE, 1--8."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2070719.2070721"},{"key":"e_1_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Joseph Redmon Santosh Divvala Ross Girshick and Ali Farhadi. 2016. You only look once: Unified real-time object detection. In CVPR. 779--788.  Joseph Redmon Santosh Divvala Ross Girshick and Ali Farhadi. 2016. You only look once: Unified real-time object detection. In CVPR. 
779--788.","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01421202"},{"key":"e_1_2_1_40_1","volume-title":"Driver intention anticipation based on in-cabin and driving scene monitoring","author":"Rong Yao","unstructured":"Yao Rong , Zeynep Akata , and Enkelejda Kasneci . 2020. Driver intention anticipation based on in-cabin and driving scene monitoring . In ITSC. IEEE , 1--8. Yao Rong, Zeynep Akata, and Enkelejda Kasneci. 2020. Driver intention anticipation based on in-cabin and driving scene monitoring. In ITSC. IEEE, 1--8."},{"key":"e_1_2_1_41_1","unstructured":"Yao Rong Wenjia Xu Zeynep Akata and Enkelejda Kasneci. 2021. Human Attention in Fine-grained Classification. In BMVC .  Yao Rong Wenjia Xu Zeynep Akata and Enkelejda Kasneci. 2021. Human Attention in Fine-grained Classification. In BMVC ."},{"key":"e_1_2_1_42_1","volume-title":"MICCAI","author":"Saab Khaled","unstructured":"Khaled Saab , Sarah M Hooper , Nimit S Sohoni , Jupinder Parmar , Brian Pogatchnik , Sen Wu , Jared A Dunnmon , Hongyang R Zhang , Daniel Rubin , and Christopher R\u00e9. 2021. Observational supervision for medical image classification using gaze data . In MICCAI . Springer , 603--614. Khaled Saab, Sarah M Hooper, Nimit S Sohoni, Jupinder Parmar, Brian Pogatchnik, Sen Wu, Jared A Dunnmon, Hongyang R Zhang, Daniel Rubin, and Christopher R\u00e9. 2021. Observational supervision for medical image classification using gaze data. In MICCAI . Springer, 603--614."},{"key":"e_1_2_1_43_1","doi-asserted-by":"crossref","unstructured":"Anthony Santella Maneesh Agrawala Doug DeCarlo David Salesin and Michael Cohen. 2006. Gaze-based interaction for semi-automatic photo cropping. In CHI. 771--780.  Anthony Santella Maneesh Agrawala Doug DeCarlo David Salesin and Michael Cohen. 2006. Gaze-based interaction for semi-automatic photo cropping. In CHI. 
771--780.","DOI":"10.1145\/1124772.1124886"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Karthikeyan Shanmuga Vadivel Thuyen Ngo Miguel Eckstein and BS Manjunath. 2015. Eye tracking assisted extraction of attentionally important objects from videos. In CVPR .  Karthikeyan Shanmuga Vadivel Thuyen Ngo Miguel Eckstein and BS Manjunath. 2015. Eye tracking assisted extraction of attentionally important objects from videos. In CVPR .","DOI":"10.1109\/CVPR.2015.7298944"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Mohsen Shirpour Steven S Beauchemin and Michael A Bauer. 2021. Driver's Eye Fixation Prediction by Deep Neural Network.. In VISIGRAPP .  Mohsen Shirpour Steven S Beauchemin and Michael A Bauer. 2021. Driver's Eye Fixation Prediction by Deep Neural Network.. In VISIGRAPP .","DOI":"10.5220\/0010220800670075"},{"key":"e_1_2_1_46_1","volume-title":"Complex-yolo: An euler-region-proposal for real-time 3d object detection on point clouds. In ECCV .","author":"Simony Martin","year":"2018","unstructured":"Martin Simony , Stefan Milzy , Karl Amendey , and Horst-Michael Gross . 2018 . Complex-yolo: An euler-region-proposal for real-time 3d object detection on point clouds. In ECCV . Martin Simony, Stefan Milzy, Karl Amendey, and Horst-Michael Gross. 2018. Complex-yolo: An euler-region-proposal for real-time 3d object detection on point clouds. In ECCV ."},{"key":"e_1_2_1_47_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_2_1_48_1","volume-title":"Amir Roshan Zamir, and Mubarak Shah","author":"Soomro Khurram","year":"2012","unstructured":"Khurram Soomro , Amir Roshan Zamir, and Mubarak Shah . 2012 . UCF101: A dataset of 101 human actions classes from videos in the wild. CRCV-TR- 12-01 (2012). Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. CRCV-TR-12-01 (2012)."},{"key":"e_1_2_1_49_1","volume-title":"Learning to predict by the methods of temporal differences. Machine learning","author":"Sutton Richard S","year":"1988","unstructured":"Richard S Sutton . 1988. Learning to predict by the methods of temporal differences. Machine learning , Vol. 3 , 1 ( 1988 ), 9--44. Richard S Sutton. 1988. Learning to predict by the methods of temporal differences. Machine learning , Vol. 3, 1 (1988), 9--44."},{"key":"e_1_2_1_50_1","doi-asserted-by":"crossref","unstructured":"Arun Balajee Vasudevan Dengxin Dai and Luc Van Gool. 2018. Object referring in videos with language and human gaze. In CVPR. 4129--4138.  Arun Balajee Vasudevan Dengxin Dai and Luc Van Gool. 2018. Object referring in videos with language and human gaze. In CVPR. 4129--4138.","DOI":"10.1109\/CVPR.2018.00434"},{"key":"e_1_2_1_51_1","volume-title":"Yueh-Hua Wu, Ping-Yang Chen, Jun-Wei Hsieh, and I-Hau Yeh.","author":"Wang Chien-Yao","year":"2020","unstructured":"Chien-Yao Wang , Hong-Yuan Mark Liao , Yueh-Hua Wu, Ping-Yang Chen, Jun-Wei Hsieh, and I-Hau Yeh. 2020 . CSPNet: A new backbone that can enhance learning capability of CNN. In CVPRW . 390--391. Chien-Yao Wang, Hong-Yuan Mark Liao, Yueh-Hua Wu, Ping-Yang Chen, Jun-Wei Hsieh, and I-Hau Yeh. 2020. CSPNet: A new backbone that can enhance learning capability of CNN. In CVPRW . 390--391."},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Wenguan Wang Jianbing Shen Xingping Dong and Ali Borji. 2018. 
Salient object detection driven by fixation prediction. In CVPR . 1711--1720.  Wenguan Wang Jianbing Shen Xingping Dong and Ali Borji. 2018. Salient object detection driven by fixation prediction. In CVPR . 1711--1720.","DOI":"10.1109\/CVPR.2018.00184"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.16910\/jemr.11.6.6"},{"key":"e_1_2_1_54_1","doi-asserted-by":"crossref","unstructured":"Ye Xia Danqing Zhang Jinkyu Kim Ken Nakayama Karl Zipser and David Whitney. 2018. Predicting driver attention in critical situations. In ACCV .  Ye Xia Danqing Zhang Jinkyu Kim Ken Nakayama Karl Zipser and David Whitney. 2018. Predicting driver attention in critical situations. In ACCV .","DOI":"10.1007\/978-3-030-20873-8_42"},{"key":"e_1_2_1_55_1","volume-title":"NeurIPS","volume":"28","author":"Xingjian SHI","year":"2015","unstructured":"SHI Xingjian , Zhourong Chen , Hao Wang , Dit-Yan Yeung , Wai-Kin Wong , and Wang-chun Woo. 2015 . Convolutional LSTM network: A machine learning approach for precipitation nowcasting . In NeurIPS , Vol. 28 . SHI Xingjian, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In NeurIPS, Vol. 28."},{"key":"e_1_2_1_56_1","volume-title":"Eye Movements and Vision","author":"Yarbus Alfred L","unstructured":"Alfred L Yarbus . 1967. Eye movements during perception of complex objects . In Eye Movements and Vision . Springer , 171--211. Alfred L Yarbus. 1967. Eye movements during perception of complex objects. In Eye Movements and Vision . Springer, 171--211."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/689"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2876865"},{"key":"e_1_2_1_59_1","doi-asserted-by":"crossref","unstructured":"Xingyi Zhou Vladlen Koltun and Philipp Kr\u00e4henb\u00fchl. 2020. Tracking objects as points. In ECCV .  
Xingyi Zhou Vladlen Koltun and Philipp Kr\u00e4henb\u00fchl. 2020. Tracking objects as points. In ECCV .","DOI":"10.1007\/978-3-030-58548-8_28"}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3530887","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3530887","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:26Z","timestamp":1750183766000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3530887"}},"subtitle":["Driver Attention-based Object Detection"],"short-title":[],"issued":{"date-parts":[[2022,5,13]]},"references-count":60,"journal-issue":{"issue":"ETRA","published-print":{"date-parts":[[2022,5,13]]}},"alternative-id":["10.1145\/3530887"],"URL":"https:\/\/doi.org\/10.1145\/3530887","relation":{},"ISSN":["2573-0142"],"issn-type":[{"value":"2573-0142","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,13]]},"assertion":[{"value":"2022-05-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}