{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:05:50Z","timestamp":1750309550339,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"8","license":[{"start":{"date-parts":[[2024,8,21]],"date-time":"2024-08-21T00:00:00Z","timestamp":1724198400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science Foundation","award":["IIS-2133650"],"award-info":[{"award-number":["IIS-2133650"]}]},{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"crossref","award":["1R01LM012605-01A1 and R21LM013678-01"],"award-info":[{"award-number":["1R01LM012605-01A1 and R21LM013678-01"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p>\n            Self-attention (SA) mechanisms have been widely used in developing sequential recommendation (SR) methods, and demonstrated state-of-the-art performance. However, in this article, we show that self-attentive SR methods substantially suffer from the over-smoothing issue that item embeddings within a sequence become increasingly similar across attention blocks. As widely demonstrated in the literature, this issue could lead to a loss of information in individual items, and significantly degrade models\u2019 scalability and performance. To address the over-smoothing issue, in this article, we view items within a sequence constituting a star graph and develop a method, denoted as\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\mathop{\\mathtt{MSSG}}\\limits\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , for SR. Different from existing self-attentive methods,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\mathop{\\mathtt{MSSG}}\\limits\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            introduces an additional internal node to specifically capture the global information within the sequence, and does not require information propagation among items. This design fundamentally addresses the over-smoothing issue and enables\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\mathop{\\mathtt{MSSG}}\\limits\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            a linear time complexity with respect to the sequence length. We compare\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\mathop{\\mathtt{MSSG}}\\limits\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            with eleven state-of-the-art baseline methods on six public benchmark datasets. Our experimental results demonstrate that\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\mathop{\\mathtt{MSSG}}\\limits\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            significantly outperforms the baseline methods, with an improvement of as much as 10.10%. Our analysis shows the superior scalability of\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\mathop{\\mathtt{MSSG}}\\limits\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            over the state-of-the-art self-attentive methods. Our complexity analysis and runtime performance comparison together show that\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\mathop{\\mathtt{MSSG}}\\limits\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            is both theoretically and practically more efficient than self-attentive methods. Our analysis of the attention weights learned in SA-based methods indicates that on sparse recommendation data, modeling dependencies in all item pairs using the SA mechanism yields limited information gain, and thus, might not benefit the recommendation performance. Our source code and data are publicly accessible through\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/ninglab\/MSSG\">GitHub<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3676560","type":"journal-article","created":{"date-parts":[[2024,7,9]],"date-time":"2024-07-09T13:20:39Z","timestamp":1720531239000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Modeling Sequences as Star Graphs to Address Over-Smoothing in Self-Attentive Sequential Recommendation"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-7569-1828","authenticated-orcid":false,"given":"Bo","family":"Peng","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4778-8826","authenticated-orcid":false,"given":"Ziqi","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6062-6449","authenticated-orcid":false,"given":"Srinivasan","family":"Parthasarathy","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6842-1165","authenticated-orcid":false,"given":"Xia","family":"Ning","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Department of Computer Science and Engineering, Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,8,21]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330944"},{"key":"e_1_3_2_3_2","unstructured":"Chen Cai and Yusu Wang. 2020. A note on over-smoothing for graph neural networks. arXiv:2006.13318. Retrieved from https:\/\/arxiv.org\/abs\/2006.13318"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1609\/AAAI.V34I04.5747"},{"key":"e_1_3_2_5_2","first-page":"1098","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV \u201915)","author":"Ding Kun","year":"2015","unstructured":"Kun Ding, Chunlei Huo, Bin Fan, and Chunhong Pan. 2015. KNN hashing with factorized neighborhood representation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV \u201915). 1098\u20131106."},{"key":"e_1_3_2_6_2","unstructured":"Vijay P. Dwivedi and Xavier Bresson. 2020. A generalization of transformer networks to graphs. arXiv:2012.09699. Retrieved from https:\/\/arxiv.org\/abs\/2012.09699"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3512077"},{"key":"e_1_3_2_8_2","first-page":"1024","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Hamilton William L.","year":"2017","unstructured":"William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017. Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 1024\u20131034. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/5dd9db5e033da9c6fb5ba83c7a7ebea9-Abstract.html"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2827872"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872427.2883037"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482136"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","unstructured":"Dan Hendrycks and Kevin Gimpel. 2016. Gaussian error linear units (GELUs). arXiv:1606.08415v5. Retrieved from 10.48550\/arXiv.1606.08415","DOI":"10.48550\/arXiv.1606.08415"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271761"},{"key":"e_1_3_2_15_2","unstructured":"Wenbing Huang Yu Rong Tingyang Xu Fuchun Sun and Junzhou Huang. 2020. Tackling over-smoothing for general graph convolutional networks. arXiv:2008.09864. Retrieved from https:\/\/arxiv.org\/abs\/2008.09864"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2018.00035"},{"key":"e_1_3_2_17_2","first-page":"5156","volume-title":"Proceedings of the 37th International Conference on Machine Learning (ICML \u201920)","volume":"119","author":"Katharopoulos Angelos","year":"2020","unstructured":"Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and Fran\u00e7ois Fleuret. 2020. Transformers are RNNs: Fast autoregressive transformers with linear attention. In Proceedings of the 37th International Conference on Machine Learning (ICML \u201920), Vol. 119. PMLR, 5156\u20135165. Retrieved from http:\/\/proceedings.mlr.press\/v119\/katharopoulos20a.html"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3132926"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2023.3345251"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1609\/AAAI.V32I1.11604"},{"key":"e_1_3_2_21_2","first-page":"333","volume-title":"Proceedings of the International Conference on Cyber-Physical Social Intelligence (ICCSI \u201923)","author":"Li Yangding","year":"2023","unstructured":"Yangding Li, Shaobin Fu, Hao Feng, Yangyang Zeng, Jinghao Wang, Zhihao Jiang, and Lvyun Zhang. 2023. Simple and efficient knowledge graph attention network for recommendation. In Proceedings of the International Conference on Cyber-Physical Social Intelligence (ICCSI \u201923). IEEE, 333\u2013338."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2023.111174"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219950"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330984"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.5555\/3104322.3104425"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/D19-1018"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412014"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2021.3049692"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","unstructured":"Shashank Rajput Nikhil Mehta Anima Singh Raghunandan H. Keshavan Trung Vu Lukasz Heldt Lichan Hong Yi Tay Vinh Q. Tran Jonah Samost Maciej Kula Ed H. Chi and Maheswaran Sathiamoorthy. 2023. Recommender systems with generative retrieval. arXiv:2305.05065. Retrieved from 10.48550\/ARXIV.2305.05065","DOI":"10.48550\/ARXIV.2305.05065"},{"key":"e_1_3_2_30_2","first-page":"452","volume-title":"Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI \u201909)","author":"Rendle Steffen","year":"2009","unstructured":"Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI \u201909). Jeff A. Bilmes and Andrew Y. Ng (Eds.), AUAI Press, 452\u2013461. Retrieved from https:\/\/www.auai.org\/uai2009\/papers\/UAI2009_0139_48141db02b9f0b02bc7158819ebfa2c7.pdf"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772773"},{"key":"e_1_3_2_32_2","unstructured":"Prasun Roy Subhankar Ghosh Saumik Bhattacharya and Umapada Pal. 2018. Effects of degradations on deep neural network architectures. arXiv:1807.10108 Retrieved from http:\/\/arxiv.org\/abs\/1807.10108"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1137\/0907087"},{"key":"e_1_3_2_34_2","volume-title":"Proceedings of the 10th International Conference on Learning Representations (ICLR \u201922)","author":"Shi Han","year":"2022","unstructured":"Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, and James T. Kwok. 2022. Revisiting over-smoothing in BERT from the perspective of graph. In Proceedings of the 10th International Conference on Learning Representations (ICLR \u201922). Retrieved from https:\/\/openreview.net\/forum?id=dUV91uaXm3"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3357895"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159656"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3530811"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","unstructured":"Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological) 58 1 (1996) 267\u2013288. Retrieved from http:\/\/www.jstor.org\/stable\/2346178","DOI":"10.1111\/j.2517-6161.1996.tb02080.x"},{"key":"e_1_3_2_39_2","first-page":"5998","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017. Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 5998\u20136008. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3240369"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/P19-1248"},{"key":"e_1_3_2_42_2","unstructured":"Sinong Wang Belinda Z. Li Madian Khabsa Han Fang and Hao Ma. 2020. Linformer: Self-attention with linear complexity. arXiv:2006.04768. Retrieved from https:\/\/arxiv.org\/abs\/2006.04768"},{"key":"e_1_3_2_43_2","first-page":"5449","volume-title":"Proceedings of the 35th International Conference on Machine Learning (ICML \u201918)","volume":"80","author":"Xu Keyulu","year":"2018","unstructured":"Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-Ichi Kawarabayashi, and Stefanie Jegelka. 2018. Representation learning on graphs with jumping knowledge networks. In Proceedings of the 35th International Conference on Machine Learning (ICML \u201918), Vol. 80. Jennifer G. Dy and Andreas Krause (Eds.), PMLR, 5449\u20135458. Retrieved from http:\/\/proceedings.mlr.press\/v80\/xu18c.html"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1007\/s13244-018-0639-9"},{"key":"e_1_3_2_45_2","unstructured":"Chaoqi Yang Ruijie Wang Shuochao Yao Shengzhong Liu and Tarek F. Abdelzaher. 2020. Revisiting \u201cover-smoothing\u201d in deep GCNs. arXiv:2003.13663. Retrieved from https:\/\/arxiv.org\/abs\/2003.13663"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290975"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2021.3119140"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2022.3185149"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/J.NEUCOM.2022.06.082"},{"key":"e_1_3_2_50_2","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920)","author":"Zhao Lingxiao","year":"2020","unstructured":"Lingxiao Zhao and Leman Akoglu. 2020. PairNorm: Tackling oversmoothing in GNNs. In Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920). Retrieved from https:\/\/openreview.net\/forum?id=rkecl1rtwB"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3512111"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3676560","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3676560","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:46Z","timestamp":1750295926000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3676560"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,21]]},"references-count":50,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1145\/3676560"],"URL":"https:\/\/doi.org\/10.1145\/3676560","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2024,8,21]]},"assertion":[{"value":"2024-02-21","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-27","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}