{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T22:37:06Z","timestamp":1764715026030,"version":"3.41.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,6,23]],"date-time":"2021-06-23T00:00:00Z","timestamp":1624406400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["No. 62090025 and 61932007"],"award-info":[{"award-number":["No. 62090025 and 61932007"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-2008151"],"award-info":[{"award-number":["CNS-2008151"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2021,6,23]]},"abstract":"<jats:p>This work presents MemX: a biologically-inspired attention-aware eyewear system developed with the goal of pursuing the long-awaited vision of a personalized visual Memex. MemX captures human visual attention on the fly, analyzes the salient visual content, and records moments of personal interest in the form of compact video snippets. Accurate attentive scene detection and analysis on resource-constrained platforms is challenging because these tasks are computation and energy intensive. We propose a new temporal visual attention network that unifies human visual attention tracking and salient visual content analysis. 
Attention tracking focuses computation-intensive video analysis on salient regions, while video analysis makes human attention detection and tracking more accurate. Using the YouTube-VIS dataset and 30 participants, we experimentally show that MemX significantly improves the attention tracking accuracy over the eye-tracking-alone method, while maintaining high system energy efficiency. We have also conducted 11 in-field pilot studies across a range of daily usage scenarios, which demonstrate the feasibility and potential benefits of MemX.<\/jats:p>","DOI":"10.1145\/3463509","type":"journal-article","created":{"date-parts":[[2021,6,24]],"date-time":"2021-06-24T16:29:19Z","timestamp":1624552159000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["MemX"],"prefix":"10.1145","volume":"5","author":[{"given":"Yuhu","family":"Chang","sequence":"first","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China"}]},{"given":"Yingying","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China"}]},{"given":"Mingzhi","family":"Dong","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China"}]},{"given":"Yujiang","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computing, Imperial College London, London, United Kingdom"}]},{"given":"Yutian","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, 
China"}]},{"given":"Qin","family":"Lv","sequence":"additional","affiliation":[{"name":"University of Colorado Boulder, Boulder, Colorado, United States"}]},{"given":"Robert P.","family":"Dick","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, Michigan, United States"}]},{"given":"Tun","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China"}]},{"given":"Ning","family":"Gu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China"}]},{"given":"Li","family":"Shang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2021,6,24]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3351227"},{"key":"e_1_2_2_2_1","unstructured":"Ambarella. 2020. Ambarella introduces CV28M SoC with CVflow to enable new categories of intelligent sensing devices. https:\/\/www.ambarella.com"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1038\/nrn3443"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10702-005-7125-6"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0047313"},{"key":"e_1_2_2_6_1","unstructured":"Vannevar Bush et al. 1945. As we may think. The atlantic monthly 176, 1 (1945), 101--108."},{"key":"e_1_2_2_7_1","volume-title":"Visual attention: The past 25 years. 
Vision research 51, 13","author":"Carrasco Marisa","year":"2011","unstructured":"Marisa Carrasco. 2011. Visual attention: The past 25 years. Vision research 51, 13 (2011), 1484--1525."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.3390\/s20133739"},{"key":"e_1_2_2_9_1","volume-title":"Detecting Attended Visual Targets in Video. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5395--5405","author":"Chong E.","year":"2020","unstructured":"E. Chong, Y. Wang, N. Ruiz, and J. M. Rehg. 2020. Detecting Attended Visual Targets in Video. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5395--5405. https:\/\/doi.org\/10.1109\/CVPR42600.2020.00544"},{"key":"e_1_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Dan Witzner Hansen and Qiang Ji. 2009. In the eye of the beholder: A survey of models for eyes and gaze. 
IEEE transactions on pattern analysis and machine intelligence 32, 3 (2009), 478--500.","DOI":"10.1109\/TPAMI.2009.30"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1080\/10447318.2017.1314611"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2699644"},{"key":"e_1_2_2_17_1","volume-title":"Adapting mask-rcnn for automatic nucleus segmentation. arXiv preprint arXiv:1805.00500","author":"Johnson Jeremiah W","year":"2018","unstructured":"Jeremiah W Johnson. 2018. Adapting mask-rcnn for automatic nucleus segmentation. arXiv preprint arXiv:1805.00500 (2018)."},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2638728.2641695"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1136\/archdischild-2017-314449"},{"key":"e_1_2_2_20_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1743666.1743682"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.239"},{"key":"e_1_2_2_23_1","doi-asserted-by":"crossref","unstructured":"Rachavarapu Kranthi Kumar, Kumar Moneish, Gandhi Vineet, and Subramanian Ramanathan. 2018. Watch to Edit: Video Retargeting using Gaze. 205--215.","DOI":"10.1111\/cgf.13354"},{"key":"e_1_2_2_24_1","doi-asserted-by":"crossref","unstructured":"R John Leigh and David S Zee. 2015. The neurology of eye movements. OUP USA.","DOI":"10.1093\/med\/9780199969289.001.0001"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.3389\/fncom.2020.00029"},{"key":"e_1_2_2_26_1","unstructured":"Cherlynn Low. 2018. Google Clips review: A smart but unpredictable camera. https:\/\/www.engadget.com\/2018-02-27-google-clips-ai-camera-review.html"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2858340"},{"volume-title":"From Human Attention to Computational Attention","author":"Mancas Matei","key":"e_1_2_2_28_1","unstructured":"Matei Mancas, Vincent P Ferrera, Nicolas Riche, and John G Taylor. 2016. From Human Attention to Computational Attention. Vol. 2. Springer."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0215-3"},{"volume-title":"gazeNet: End-to-end eye-movement event detection with deep neural networks. Behavior Research Methods","year":"2018","key":"e_1_2_2_30_1","unstructured":"Raimondas Zemblys, Diederick C Niehorster, and Kenneth Holmqvist. 2018. gazeNet: End-to-end eye-movement event detection with deep neural networks. 
Behavior Research Methods (2018)."},{"key":"e_1_2_2_31_1","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems. 91--99."},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/355017.355028"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2857491.2857512"},{"key":"e_1_2_2_35_1","volume-title":"Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (ETRA '18)","author":"Silva Nelson","year":"2018","unstructured":"Nelson Silva, Tobias Schreck, Eduardo Veas, Vedran Sabol, Eva Eggeling, and Dieter W. Fellner. 2018. Leveraging Eye-Gaze and Time-Series Features to Predict User Interests and Build a Recommendation Model for Visual Analysis. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (ETRA '18). Association for Computing Machinery, New York, NY, USA, Article 13, 9 pages. 
https:\/\/doi.org\/10.1145\/3204493.3204546"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13428-018-1144-2"},{"key":"e_1_2_2_37_1","volume-title":"The Psychology of Attention","author":"Styles Elizabeth","unstructured":"Elizabeth Styles. 2006. The Psychology of Attention, 2nd Edition.","edition":"2"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2168556.2168617"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40728-4_56"},{"volume-title":"Facial Thermography for Attention Tracking on Smart Eyewear: An Initial Study","author":"Tag Benjamin","key":"e_1_2_2_40_1","unstructured":"Benjamin Tag, Ryan Mannschreck, Kazunori Sugiura, George Chernyshov, Naohisa Ohta, and Kai Kunze. 2017. Facial Thermography for Attention Tracking on Smart Eyewear: An Initial Study. Association for Computing Machinery, New York, NY, USA, 2959--2966. 
https:\/\/doi.org\/10.1145\/3027063.3053243"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2001.937604"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01006"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01218"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2015.09.042"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00318"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00699"},{"key":"e_1_2_2_47_1","volume-title":"Gaze Estimation Using Residual Neural Network. In 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE, 411--414","author":"Wong En Teng","year":"2019","unstructured":"En Teng Wong, Seanglidet Yean, Qingyao Hu, Bu Sung Lee, Jigang Liu, and Rajan Deepu. 2019. Gaze Estimation Using Residual Neural Network. In 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE, 411--414."},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578153.2578185"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00529"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3204493.3204548"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300646"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2778103"},{"key":"e_1_2_2_53_1","doi-asserted-by":"crossref","unstructured":"Yingying Zhao, Mingzhi Dong, Yujiang Wang, Da Feng, Qin Lv, Robert Dick, Dongsheng Li, Tun Lu, Ning Gu, and Li Shang. 2021. 
A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline. arXiv:2104.04443","DOI":"10.1109\/TMM.2021.3076612"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3463509","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3463509","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3463509","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:28Z","timestamp":1750195888000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3463509"}},"subtitle":["An Attention-Aware Smart Eyewear System for Personalized Moment Auto-capture"],"short-title":[],"issued":{"date-parts":[[2021,6,23]]},"references-count":52,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,6,23]]}},"alternative-id":["10.1145\/3463509"],"URL":"https:\/\/doi.org\/10.1145\/3463509","relation":{},"ISSN":["2474-9567"],"issn-type":[{"type":"electronic","value":"2474-9567"}],"subject":[],"published":{"date-parts":[[2021,6,23]]},"assertion":[{"value":"2021-06-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}