{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:52:55Z","timestamp":1761126775297,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,18]]},"DOI":"10.1145\/3462244.3479942","type":"proceedings-article","created":{"date-parts":[[2021,10,15]],"date-time":"2021-10-15T14:41:47Z","timestamp":1634308907000},"page":"619-627","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Knowing Where and What to Write in Automated Live Video Comments: A Unified Multi-Task Approach"],"prefix":"10.1145","author":[{"given":"Hao","family":"Wu","sequence":"first","affiliation":[{"name":"Department of Electronic &amp; Electrical Engineering &amp; Adapt Centre, Trinity College Dublin, Ireland"}]},{"given":"Gareth James Francis","family":"Jones","sequence":"additional","affiliation":[{"name":"School of Computing, Dublin City University, Ireland"}]},{"given":"Francois","family":"Pitie","sequence":"additional","affiliation":[{"name":"Department of Electronic &amp; Electrical Engineering, Trinity College Dublin, Ireland"}]}],"member":"320","published-online":{"date-parts":[[2021,10,18]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2002.1035909"},{"key":"e_1_3_2_1_2_1","unstructured":"X. Bo, K. Yannis, G. Deepti, and G. Kristen. 2019. Less Is More: Learning Highlight Detection From Video Duration. In CVPR."},{"key":"e_1_3_2_1_3_1","unstructured":"D. Chaoqun, C. Lei, M. Shuming, W. Furu, Z. Conghui, and Z. Tiejun. 2020. Multimodal Matching Transformer for Live Commenting. In ECAI."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.121"},{"volume-title":"Enhancing transformer for end-to-end speech-to-text translation","author":"Di\u00a0Gangi Mattia\u00a0Antonino","key":"e_1_3_2_1_5_1","unstructured":"Mattia\u00a0Antonino Di\u00a0Gangi, Matteo Negri, Roldano Cattoni, Dessi Roberto, and Marco Turchi. 2019. Enhancing transformer for end-to-end speech-to-text translation. In Machine Translation Summit XVII. European Association for Machine Translation, 21\u201331."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00033"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. 958\u2013959","author":"Iashin Vladimir","year":"2020","unstructured":"Vladimir Iashin and Esa Rahtu. 2020. Multi-modal dense video captioning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. 958\u2013959."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Y. Jiao, Z.
Li, S. Huang, X. Yang, B. Liu, and T. Zhang. 2018. Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection. IEEE Transactions on Multimedia (2018).","DOI":"10.1109\/TMM.2018.2815998"},{"key":"e_1_3_2_1_10_1","volume-title":"Tesseract: an open-source optical character recognition engine. Linux Journal","author":"Kay Anthony","year":"2007","unstructured":"Anthony Kay. 2007. Tesseract: an open-source optical character recognition engine. Linux Journal (2007)."},{"key":"e_1_3_2_1_11_1","volume-title":"Adam: A method for stochastic optimization. ICLR","author":"Kingma P","year":"2014","unstructured":"Diederik\u00a0P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. ICLR (2014)."},{"volume-title":"Gossiping the videos: An embedding-based generative adversarial framework for time-sync comments generation","author":"Lv G.","key":"e_1_3_2_1_12_1","unstructured":"G. Lv, T. Xu, Q. Liu, E. Chen, W. He, M. An, and Z. Chen. 2019. Gossiping the videos: An embedding-based generative adversarial framework for time-sync comments generation. In Springer."},{"key":"e_1_3_2_1_13_1","volume-title":"Livebot: Generating live video comments based on visual and textual contexts. In AAAI.","author":"Ma S.","year":"2019","unstructured":"S. Ma, L. Cui, D. Dai, F. Wei, and X. Sun. 2019. Livebot: Generating live video comments based on visual and textual contexts.
In AAAI."},{"key":"e_1_3_2_1_14_1","unstructured":"S. Min, F. Ali, and S. Steve. 2014. Ranking domain-specific highlights by analyzing edited videos. In ECCV."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"P. Qing and C. Chaomei. 2017. Video highlights detection and summarization with lag-calibration based on concept-emotion mapping of crowd-sourced time-sync comments. arXiv (2017).","DOI":"10.18653\/v1\/W17-4501"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3235765.3235781"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379173.3393711"},{"key":"e_1_3_2_1_18_1","unstructured":"Y. Ting, M. Tao, and R. Yong. 2016. Highlight detection with pairwise deep ranking for first-person video summarization. In CVPR."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1656"},{"key":"e_1_3_2_1_20_1","author":"Weiying W.","year":"2020","unstructured":"W. Weiying, C. Jieting, and J. Qin. 2020. VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation. In ACMMM."},{"key":"e_1_3_2_1_21_1","unstructured":"H. Wu, G.\u00a0J.\u00a0F. Jones, and F. Piti\u00e9. 2020.
Response to LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts. arXiv (2020)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.maiworkshop-1.8"},{"key":"e_1_3_2_1_23_1","volume-title":"Investigating Automated Mechanisms for Multi-Modal Prediction of User Online-Video Commenting Behaviour. In 2021 International Conference on Content-Based Multimedia Indexing (CBMI). IEEE, 1\u20136.","author":"Wu Hao","year":"2021","unstructured":"Hao Wu, Fran\u00e7ois Piti\u00e9, and Gareth\u00a0JF Jones. 2021. Investigating Automated Mechanisms for Multi-Modal Prediction of User Online-Video Commenting Behaviour. In 2021 International Conference on Content-Based Multimedia Indexing (CBMI). IEEE, 1\u20136."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2005.1521352"},{"key":"e_1_3_2_1_25_1","unstructured":"C. Xu, Z. Yongfeng, A. Qingyao, X. Hongteng, Y. Junchi, and Q. Zheng. 2017. Personalized key frame recommendation. In SIGIR. 315\u2013324."},{"key":"e_1_3_2_1_26_1","unstructured":"Weilong Yang, Min-hsuan Tsai, and Tomas Izo. 2016. Video thumbnail selection based on deep learning. (2016)."},{"key":"e_1_3_2_1_27_1","volume-title":"DCA: Diversified Co-attention Towards Informative Live Video Commenting.
In CCF NLPCC.","author":"Zhang Z.","year":"2020","unstructured":"Z. Zhang, Z. Yin, S. Ren, X. Li, and S. Li. 2020. DCA: Diversified Co-attention Towards Informative Live Video Commenting. In CCF NLPCC."},{"key":"e_1_3_2_1_28_1","unstructured":"W. Zheng, Z. Jie, M. Jing, L. Jingjing, A. Jiangbo, and Y. Yang. 2020. Discovering attractive segments in the user-generated video streams. Information Processing & Management (2020)."}],"event":{"name":"ICMI '21: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Montr\u00e9al QC Canada","acronym":"ICMI '21"},"container-title":["Proceedings of the 2021 International Conference on Multimodal
Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462244.3479942","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3462244.3479942","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:55Z","timestamp":1750193335000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462244.3479942"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,18]]},"references-count":28,"alternative-id":["10.1145\/3462244.3479942","10.1145\/3462244"],"URL":"https:\/\/doi.org\/10.1145\/3462244.3479942","relation":{},"subject":[],"published":{"date-parts":[[2021,10,18]]},"assertion":[{"value":"2021-10-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}