{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T06:28:19Z","timestamp":1772519299570,"version":"3.50.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,12,11]],"date-time":"2024-12-11T00:00:00Z","timestamp":1733875200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Web"],"published-print":{"date-parts":[[2025,2,28]]},"abstract":"<jats:p>\n            In today\u2019s fast-paced digital landscape, the attention span of users consuming video content is alarmingly brief, often as short as 15 seconds for music or entertainment videos and 6 minutes for lecture videos. This presents a significant challenge for video producers and platform providers as they seek to engage users with longer content. One promising solution involves recommending specific fragments within longer videos that align with individual user profiles. In this article, we address this challenge by introducing a novel framework for video fragment recommendations, guided by three key insights. First, we implement a Self-Attention Block that captures the inter-fragment contextual effect, enhancing the relevance of recommendations. Second, we incorporate video-level preferences to ensure that the fragment recommendations are consistent with users\u2019 overall interests. Third, we propose a Self-Attentive Herding Effect (SAHE) module to model the intra-fragment contextual effect, specifically the herding effect of time-sync comments within a fragment. To evaluate the effectiveness of our proposed method, we conduct extensive experiments comparing our model against the state-of-the-art approaches in terms of NDCG@K and Recall@K. 
Our results demonstrate that the model effectively leverages inter-fragment and intra-fragment contextual effects along with video-level preferences, outperforming existing methods. Additionally, we carry out empirical experiments to analyze the key components and parameters of the proposed model, providing further insights into its performance.\n            <jats:xref ref-type=\"fn\">\n              <jats:sup>1<\/jats:sup>\n            <\/jats:xref>\n          <\/jats:p>","DOI":"10.1145\/3700645","type":"journal-article","created":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T11:20:01Z","timestamp":1729682401000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Fragment of Interest: Personalized Video Fragment Recommendation with Inter-Fragment & Intra-Fragment Contextual Effect"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-8847-9194","authenticated-orcid":false,"given":"Jiaqi","family":"Wang","sequence":"first","affiliation":[{"name":"Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0727-4376","authenticated-orcid":false,"given":"Ricky Y.K.","family":"Kwok","sequence":"additional","affiliation":[{"name":"President's Office, Hong Kong Metropolitan University, Hong Kong, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3454-8731","authenticated-orcid":false,"given":"Edith C.H.","family":"Ngai","sequence":"additional","affiliation":[{"name":"Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong 
Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,12,11]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01365"},{"key":"e_1_3_2_3_2","article-title":"Neural machine translation by jointly learning to align and translate","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).","journal-title":"arXiv preprint arXiv:1409.0473"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080779"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3186070"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080797"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00788"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080776"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240599"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2843948"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2556325.2566239"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"issue":"1","key":"e_1_3_2_13_2","first-page":"1","article-title":"Exploring the emerging type of comment for online videos: DanMu","volume":"12","author":"He Ming","year":"2017","unstructured":"Ming He, Yong Ge, Enhong Chen, Qi Liu, and Xuesong Wang. 2017. Exploring the emerging type of comment for online videos: DanMu. 
ACM Transactions on the Web (TWEB) 12, 1 (2017), 1\u201333.","journal-title":"ACM Transactions on the Web (TWEB)"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3109859.3109882"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401063"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2831682"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3358030"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371776"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271761"},{"key":"e_1_3_2_20_2","article-title":"Session-based recommendations with recurrent neural networks","author":"Hidasi Bal\u00e1zs","year":"2015","unstructured":"Bal\u00e1zs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).","journal-title":"arXiv preprint arXiv:1511.06939"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2018.00035"},{"key":"e_1_3_2_22_2","volume-title":"International Conference on Learning Representations","author":"Ke Guolin","year":"2020","unstructured":"Guolin Ke, Di He, and Tie-Yan Liu. 2020. Rethinking positional encoding in language pre-training. 
In International Conference on Learning Representations."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371786"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2020.05.004"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3172944.3172966"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2019.102099"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016810"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-019-7578-4"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_2_30_2","article-title":"BPR: Bayesian personalized ranking from implicit feedback","author":"Rendle Steffen","year":"2012","unstructured":"Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2012. BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618 (2012).","journal-title":"arXiv preprint arXiv:1205.2618"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772773"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/371920.372071"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159656"},{"key":"e_1_3_2_34_2","article-title":"SKEP: Sentiment knowledge enhanced pre-training for sentiment analysis","author":"Tian Hao","year":"2020","unstructured":"Hao Tian, Can Gao, Xinyan Xiao, Hao Liu, Bolei He, Hua Wu, Haifeng Wang, and Feng Wu. 2020. SKEP: Sentiment knowledge enhanced pre-training for sentiment analysis. arXiv preprint arXiv:2005.05635 (2020).","journal-title":"arXiv preprint arXiv:2005.05635"},{"issue":"11","key":"e_1_3_2_35_2","article-title":"Visualizing data using t-SNE.","volume":"9","author":"Maaten Laurens Van der","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. 
Journal of Machine Learning Research 9, 11 (2008).","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_36_2","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998\u20136008."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/WI-IAT55865.2022.00036"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623625"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3412258"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-19274-7_13"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.5555\/3172077.3172324"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3612920"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00135"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10753"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2019.00085"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3332932"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.3007194"},{"key":"e_1_3_2_48_2","article-title":"A survey on deep learning technique for video segmentation","author":"Zhou Tianfei","year":"2022","unstructured":"Tianfei Zhou, Fatih Porikli, David J. Crandall, Luc Van Gool, and Wenguan Wang. 2022. A survey on deep learning technique for video segmentation. 
IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"}],"container-title":["ACM Transactions on the Web"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3700645","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3700645","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:28Z","timestamp":1750295848000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3700645"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,11]]},"references-count":47,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2,28]]}},"alternative-id":["10.1145\/3700645"],"URL":"https:\/\/doi.org\/10.1145\/3700645","relation":{},"ISSN":["1559-1131","1559-114X"],"issn-type":[{"value":"1559-1131","type":"print"},{"value":"1559-114X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,11]]},"assertion":[{"value":"2024-01-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-31","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}