{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T15:16:33Z","timestamp":1761059793458,"version":"3.41.0"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2014,8,1]],"date-time":"2014-08-01T00:00:00Z","timestamp":1406851200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2014,8]]},"abstract":"<jats:p>\n            Regions in video streams attracting human interest contribute significantly to human understanding of the video. Being able to predict\n            <jats:italic>salient<\/jats:italic>\n            and informative Regions of Interest (ROIs) through a sequence of eye movements is a challenging problem. Applications such as content-aware retargeting of videos to different aspect ratios while preserving informative regions and smart insertion of dialog (closed-caption text)\n            <jats:sup>1<\/jats:sup>\n            into the video stream can significantly be improved using the predicted ROIs. We propose an interactive human-in-the-loop framework to model eye movements and predict visual saliency into yet-unseen frames. Eye tracking and video content are used to model visual attention in a manner that accounts for important eye-gaze characteristics such as temporal discontinuities due to sudden eye movements, noise, and behavioral artifacts. A novel statistical- and algorithm-based method\n            <jats:italic>gaze buffering<\/jats:italic>\n            is proposed for eye-gaze analysis and its fusion with content-based features. Our robust saliency prediction is instantiated for two challenging and exciting applications. 
The first application alters video aspect ratios on-the-fly using content-aware video retargeting, thus making them suitable for a variety of display sizes. The second application dynamically localizes active speakers and places dialog captions on-the-fly in the video stream. Our method ensures that dialogs are faithful to active speaker locations and do not interfere with salient content in the video stream. Our framework naturally accommodates personalisation of the application to suit biases and preferences of individual users.\n          <\/jats:p>","DOI":"10.1145\/2632284","type":"journal-article","created":{"date-parts":[[2014,8,29]],"date-time":"2014-08-29T13:03:31Z","timestamp":1409317411000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Online Estimation of Evolving Human Visual Interest"],"prefix":"10.1145","volume":"11","author":[{"given":"Harish","family":"Katti","sequence":"first","affiliation":[{"name":"Indian Institute of Science, India"}]},{"given":"Anoop Kolar","family":"Rajagopal","sequence":"additional","affiliation":[{"name":"Indian Institute of Science, India"}]},{"given":"Mohan","family":"Kankanhalli","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore"}]},{"given":"Ramakrishnan","family":"Kalpathi","sequence":"additional","affiliation":[{"name":"Indian Institute of Science, India"}]}],"member":"320","published-online":{"date-parts":[[2014,9,4]]},"reference":[{"volume-title":"Paris Zarcilla","year":"2009","key":"e_1_2_2_1_1","unstructured":"A1 Clip: Paris Zarcilla. 2009. Smile. in aniBOOM online video clip, YouTube, https:\/\/www.youtube.com\/watch?v=ghgzFY85Gw."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1743666.1743685"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.24"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.978374"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276390"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2009.12.007"},{"key":"e_1_2_2_7_1","unstructured":"M. Cerf, J. Harel, W. Einhuser, and C. Koch. 2007. Predicting human gaze using low-level saliency combined with face detection. In Neural Information Processing Systems, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, Eds. MIT Press, 1--7."},{"volume-title":"Shimit Amin","year":"2007","key":"e_1_2_2_8_1","unstructured":"Chakde Clip: Dir. Shimit Amin. 2007. Chak de! India. Yash Raj films, DVD."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.177"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2272918"},{"volume-title":"Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'10)","author":"Grundmann M.","key":"e_1_2_2_11_1","unstructured":"M. Grundmann, V. Kwatra, M. Han, and I. A. Essa. 2010. Discontinuous seam-carving for video retargeting. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'10). 569--576."},{"key":"e_1_2_2_12_1","first-page":"545","article-title":"Graph-based visual saliency","volume":"19","author":"Harel J.","year":"2007","unstructured":"J. Harel, C. Koch, and P. Perona. 2007. Graph-based visual saliency. Adv. Neural Inf. Process. Syst. 19, 545--552.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874013"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.730558"},{"key":"e_1_2_2_16_1","volume-title":"Kundan Shah","author":"Dir JBDY","year":"1983","unstructured":"JBDY Clip: Dir. Kundan Shah. 1983. Jaane Bhi Do Yaaro. in National Film Development Corporation, Ultra Distributors, DVD 2004."},{"volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV'09)","author":"Judd T.","key":"e_1_2_2_17_1","unstructured":"T. Judd, K. Ehinger, F. Durand, and A. Torralba. 2009. Learning to predict where humans look. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'09). 2106--2113."},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2006.879876"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874047"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1631272.1631317"},{"key":"e_1_2_2_21_1","unstructured":"National Captioning Institute. 1970. Online article on captioned television. http:\/\/www.ncicap.org\/caphist.asp."},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1631272.1631399"},{"volume-title":"Proceedings of the 11th European Conference on Computer Vision (ECCV'10)","author":"Ramanathan S.","key":"e_1_2_2_23_1","unstructured":"S. Ramanathan, H. Katti, M. S. Kankanhalli, T. S. Chua, and N. Sebe. 2010. An eye fixation database for saliency detection in images. In Proceedings of the 11th European Conference on Computer Vision (ECCV'10). 30--43."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360615"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1520340.1520682"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/968363.968368"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1665817.1665828"},{"key":"e_1_2_2_28_1","volume-title":"Tomas Alfredson","author":"Dir TTSS","year":"2011","unstructured":"TTSS Clip: Dir. Tomas Alfredson. 2011. Tinker Tailor Soldier Spy. in StudioCanal, Karla Films, Paradis Films, Kinowelt, Filmproduction, Working Title Films, Canal+, Cin+. STUDIOCANAL, UK, DVD."},{"volume-title":"Proceedings of the 16th IEEE International Conference on Image Processing (ICIP'09)","author":"Xu D.","key":"e_1_2_2_29_1","unstructured":"D. Xu and P. Nasiopoulos. 2009. Logo insertion transcoding for h.264\/avc compressed video. In Proceedings of the 16th IEEE International Conference on Image Processing (ICIP'09). 3693--3696."},{"volume-title":"K. Tam, T. Bradbury, K. Rashidi, and K. Liang.","year":"2009","key":"e_1_2_2_30_1","unstructured":"Y2 Clip: Justin Lee, K. Tam, T. Bradbury, K. Rashidi, and K. Liang. 2009. The 5 second rule. in CAMPUS MOVIEFEST, Outspire Productions online video clip, YouTube, https:\/\/www.youtube.com\/watch?v=9rgCsosjJtl."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2632284","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2632284","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:56:13Z","timestamp":1750229773000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2632284"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,8]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.1145\/2632284"],"URL":"https:\/\/doi.org\/10.1145\/2632284","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2014,8]]},"assertion":[{"value":"2013-08-01","order":0,"name":"received","label":"Received"
,"group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-09-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}