{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T20:52:05Z","timestamp":1768337525043,"version":"3.49.0"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"name":"Beijing Natural Science Foundation","award":["JQ24019"],"award-info":[{"award-number":["JQ24019"]}]},{"name":"Open Project Program of State Key Laboratory of CNS\/ATM","award":["2024B31"],"award-info":[{"award-number":["2024B31"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62036012 and 62276257"],"award-info":[{"award-number":["62036012 and 62276257"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>\n                    Text-Attributed Heterogeneous Graphs (TAHGs), which combine text data with various graph relationship information linked to rich semantic entities, are ubiquitous in real-world scenarios. To extract information from TAHGs, a commonly used method is employing Pretrained Language Models (PLMs). However, existing methods are primarily designed for processing text and face challenges when dealing with graph information, leading to two main issues: incomplete context due to graph sampling and weak integration of text and graph information. In this article, we present a new approach named Metapath-Enhanced Language Model Pretraining (MLMP) on Text-Attributed Heterogeneous Graphs. The proposed model starts by gathering metapath information through pre-computed neighbor aggregation using a simple mean aggregator. Subsequently, this gathered metapath information, combined with textual data, is input into a GNN-nested PLM. Here, GNN components at each layer are nested alongside the transformer blocks of PLMs during the training process. We have also developed corresponding pretraining strategies for joint pretraining. The experimental results indicate that our model efficiently captures information within TAHGs. Across benchmark datasets, it consistently outperforms current state-of-the-art methods, demonstrating remarkable effectiveness in tasks such as link prediction and node classification. 
Our code is available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/chensh911\/MLMP\">https:\/\/github.com\/chensh911\/MLMP<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3763241","type":"journal-article","created":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T16:28:18Z","timestamp":1755880098000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Metapath-Enhanced Language Model Pretraining on Text-Attributed Heterogeneous Graphs"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-3182-0543","authenticated-orcid":false,"given":"Shangheng","family":"Chen","sequence":"first","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4190-1529","authenticated-orcid":false,"given":"Quan","family":"Fang","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9488-2208","authenticated-orcid":false,"given":"Shengsheng","family":"Qian","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8343-9665","authenticated-orcid":false,"given":"Changsheng","family":"Xu","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing, China, University of Chinese Academy of Sciences, Beijing, China and Peng Cheng Laboratory, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2026,1,13]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Rishi Bommasani Drew A. Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S. Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill et al. 2021. On the opportunities and risks of foundation models. arXiv:2108.07258. Retrieved from https:\/\/arxiv.org\/abs\/2108.07258"},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","unstructured":"William Brannon Suyash Fulay Hang Jiang Wonjune Kang Brandon Roy Jad Kabbara and Deb Roy. 2023. ConGraT: Self-supervised contrastive pretraining for joint graph and text embeddings. arXiv:2305.14321. Retrieved from https:\/\/arxiv.org\/abs\/2305.14321","DOI":"10.18653\/v1\/2024.textgraphs-1.2"},{"key":"e_1_3_1_4_2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, Vol. 33, 1877\u20131901.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3616855.3635843"},{"key":"e_1_3_1_6_2","unstructured":"Petar Veli\u010dkovi\u0107 Guillem Cucurull Arantxa Casanova Adriana Romero Pietro Li\u00f2 and Yoshua Bengio. 2018. Graph attention networks. 
In International Conference on Learning Representations."},{"key":"e_1_3_1_7_2","unstructured":"Eli Chien Wei-Cheng Chang Cho-Jui Hsieh Hsiang-Fu Yu Jiong Zhang Olgica Milenkovic and Inderjit S. Dhillon. 2021. Node feature extraction by self-supervised multi-scale neighborhood prediction. arXiv:2111.00064. Retrieved from https:\/\/arxiv.org\/abs\/2111.00064"},{"key":"e_1_3_1_8_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_1_9_2","unstructured":"Keyu Duan Qian Liu Tat-Seng Chua Shuicheng Yan Wei Tsang Ooi Qizhe Xie and Junxian He. 2023. Simteg: A frustratingly simple approach improves textual graph learning. arXiv:2308.02565. Retrieved from https:\/\/arxiv.org\/abs\/2308.02565"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3580501"},{"key":"e_1_3_1_11_2","article-title":"Inductive representation learning on large graphs","volume":"30","author":"Hamilton Will","year":"2017","unstructured":"Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, Vol. 30.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872427.2883037"},{"key":"e_1_3_1_13_2","first-page":"11790","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"39","author":"Hu Jun","year":"2025","unstructured":"Jun Hu, Bryan Hooi, Bingsheng He, and Yinwei Wei. 2025. Modality-Independent graph neural networks with global transformers for multimodal recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, 11790\u201311798."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380027"},{"key":"e_1_3_1_15_2","doi-asserted-by":"crossref","unstructured":"Bowen Jin Wentao Zhang Yu Zhang Yu Meng Xinyang Zhang Qi Zhu and Jiawei Han. 2023. Patton: Language model pretraining on text-rich networks. arXiv:2305.12268. Retrieved from https:\/\/arxiv.org\/abs\/2305.12268","DOI":"10.18653\/v1\/2023.acl-long.387"},{"key":"e_1_3_1_16_2","unstructured":"Bowen Jin Yu Zhang Qi Zhu and Jiawei Han. 2022. Heterformer: Transformer-based deep node representation learning on heterogeneous text-rich networks. arXiv:2205.10282. Retrieved from https:\/\/arxiv.org\/abs\/2205.10282"},{"key":"e_1_3_1_17_2","doi-asserted-by":"crossref","unstructured":"Vladimir Karpukhin Barlas O\u011fuz Sewon Min Patrick Lewis Ledell Wu Sergey Edunov Danqi Chen and Wen-Tau Yih. 2020. Dense passage retrieval for open-domain question answering. arXiv:2004.04906. Retrieved from https:\/\/arxiv.org\/abs\/2004.04906","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_3_1_18_2","unstructured":"Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907. Retrieved from https:\/\/arxiv.org\/abs\/1609.02907"},{"key":"e_1_3_1_19_2","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive nlp tasks","volume":"33","author":"Lewis Patrick","year":"2020","unstructured":"Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-Tau Yih, Tim Rockt\u00e4schel, et al. 2020. 
Retrieval-augmented generation for knowledge-intensive nlp tasks. In Advances in Neural Information Processing Systems, Vol. 33, 9459\u20139474.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3538533"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2024.3378914"},{"key":"e_1_3_1_22_2","unstructured":"Feng Liu Ziwang Fu Yunlong Wang and Qijian Zheng. 2025. TACFN: Transformer-based adaptive cross-modal fusion network for multimodal emotion recognition. arXiv:2505.06536. Retrieved from https:\/\/arxiv.org\/abs\/2505.06536"},{"key":"e_1_3_1_23_2","unstructured":"Hao Liu Jiarui Feng Lecheng Kong Ningyue Liang Dacheng Tao Yixin Chen and Muhan Zhang. 2023. One for all: Towards training one graph model for all classification tasks. arXiv:2310.00149. Retrieved from https:\/\/arxiv.org\/abs\/2310.00149"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Xin Liu Mingyu Yan Lei Deng Guoqi Li Xiaochun Ye Dongrui Fan Shirui Pan and Yuan Xie. 2022. Survey on graph neural network acceleration: An algorithmic perspective. arXiv:2202.04822. Retrieved from https:\/\/arxiv.org\/abs\/2202.04822","DOI":"10.24963\/ijcai.2022\/772"},{"key":"e_1_3_1_25_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3664816"},{"key":"e_1_3_1_27_2","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv:1711.05101. Retrieved from https:\/\/arxiv.org\/abs\/1711.05101"},{"key":"e_1_3_1_28_2","article-title":"Distributed representations of words and phrases and their compositionality","volume":"26","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, Vol. 26.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_29_2","first-page":"14200","article-title":"Attention bottlenecks for multimodal fusion","volume":"34","author":"Nagrani Arsha","year":"2021","unstructured":"Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, and Chen Sun. 2021. Attention bottlenecks for multimodal fusion. In Advances in Neural Information Processing Systems, Vol. 34, 14200\u201314213.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2024.3352100"},{"key":"e_1_3_1_31_2","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever. 2018. Improving language understanding by generative pre-training. OpenAI Technical Report. Retrieved from https:\/\/www.mikecaptain.com\/resources\/pdf\/GPT-1.pdf"},{"issue":"140","key":"e_1_3_1_32_2","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. 
Journal of Machine Learning Research 21, 140 (2020), 1\u201367.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-93417-4_38"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3357866"},{"key":"e_1_3_1_35_2","unstructured":"Junwei Su Lingjun Mao and Chuan Wu. 2024. BG-HGNN: Toward scalable and efficient heterogeneous graph neural network. arXiv:2403.08207. Retrieved from https:\/\/arxiv.org\/abs\/2403.08207"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.14778\/3402707.3402736"},{"key":"e_1_3_1_37_2","article-title":"The Harvard USPTO patent dataset: A large-scale, well-structured, and multi-purpose corpus of patent applications","volume":"36","author":"Suzgun Mirac","year":"2024","unstructured":"Mirac Suzgun, Luke Melas-Kyriazi, Suproteem Sarkar, Scott D. Kominers, and Stuart Shieber. 2024. The Harvard USPTO patent dataset: A large-scale, well-structured, and multi-purpose corpus of patent applications. In Advances in Neural Information Processing Systems, Vol. 36.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","first-page":"2842","DOI":"10.1145\/3637528.3671987","volume-title":"Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","author":"Tang Jiabin","year":"2024","unstructured":"Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Long Xia, Dawei Yin, and Chao Huang. 2024. Higpt: Heterogeneous graph language model. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2842\u20132853."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i17.29875"},{"key":"e_1_3_1_40_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"issue":"11","key":"e_1_3_1_41_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Van der Maaten Laurens","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 11 (2008), 2579\u20132605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3240369"},{"key":"e_1_3_1_43_2","unstructured":"Mengting Wan Rishabh Misra Ndapa Nakashole and Julian McAuley. 2019. Fine-grained spoiler detection from large-scale review corpora. arXiv:1905.13416. Retrieved from https:\/\/arxiv.org\/abs\/1905.13416"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3618301"},{"key":"e_1_3_1_45_2","doi-asserted-by":"crossref","unstructured":"Zhihao Wen and Yuan Fang. 2023. Augmenting low-resource text classification with graph-grounded pre-training and prompting. arXiv:2305.03324. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.03324","DOI":"10.1145\/3539618.3591641"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2978386"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i3.20204"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.3045924"},{"key":"e_1_3_1_49_2","first-page":"28798","article-title":"GraphFormers: GNN-nested transformers for representation learning on textual graph","volume":"34","author":"Yang Junhan","year":"2021","unstructured":"Junhan Yang, Zheng Liu, Shitao Xiao, Chaozhuo Li, Defu Lian, Sanjay Agrawal, Amit Singh, Guangzhong Sun, and Xing Xie. 2021. GraphFormers: GNN-nested transformers for representation learning on textual graph. In Advances in Neural Information Processing Systems, Vol. 34, 28798\u201328810.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2024.3360454"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i9.26283"},{"key":"e_1_3_1_52_2","unstructured":"Michihiro Yasunaga Jure Leskovec and Percy Liang. 2022. Linkbert: Pretraining language models with document links. arXiv:2203.15827. Retrieved from https:\/\/arxiv.org\/abs\/2203.15827"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2022.3160208"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i15.29596"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330961"},{"key":"e_1_3_1_56_2","unstructured":"Wayne Xin Zhao Kun Zhou Junyi Li Tianyi Tang Xiaolei Wang Yupeng Hou Yingqian Min Beichen Zhang Junjie Zhang Zican Dong et al. 2023. A survey of large language models. arXiv:2303.18223. Retrieved from https:\/\/arxiv.org\/abs\/2303.18223"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3634918"},{"key":"e_1_3_1_58_2","first-page":"1534","volume-title":"Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM)","author":"Zhu Shichao","year":"2019","unstructured":"Shichao Zhu, Chuan Zhou, Shirui Pan, Xingquan Zhu, and Bin Wang. 2019. Relation structure-aware heterogeneous graph neural network. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 1534\u20131539."},{"key":"e_1_3_1_59_2","unstructured":"Yanqiao Zhu Yichen Xu Feng Yu Qiang Liu Shu Wu and Liang Wang. 2020. Deep graph contrastive representation learning. arXiv:2006.04131. Retrieved from https:\/\/arxiv.org\/abs\/2006.04131"},{"key":"e_1_3_1_60_2","first-page":"10316","volume-title":"Findings of the Association for Computational Linguistics (EMNLP \u201923)","author":"Zou Tao","year":"2023","unstructured":"Tao Zou, Le Yu, Yifei Huang, Leilei Sun, and Bowen Du. 2023. Pretraining language models with text-attributed heterogeneous graphs. 
In Findings of the Association for Computational Linguistics (EMNLP \u201923), 10316\u201310333."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3763241","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T14:19:49Z","timestamp":1768313989000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3763241"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,13]]},"references-count":59,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3763241"],"URL":"https:\/\/doi.org\/10.1145\/3763241","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,13]]},"assertion":[{"value":"2025-04-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-10","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
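The abstract above describes MLMP as first pre-computing metapath information with a simple mean aggregator and only then feeding it, together with the text, into a GNN-nested PLM. The following is a minimal illustrative sketch of that pre-computation step alone, under assumed data structures (a NumPy feature matrix and per-node metapath neighbor lists); the function name mean_aggregate_metapath is hypothetical, and this is not the authors' released implementation, which is at the GitHub link given in the abstract.

import numpy as np

def mean_aggregate_metapath(node_features, metapath_neighbors):
    # node_features: (num_nodes, dim) array of per-node embeddings.
    # metapath_neighbors: dict mapping each node id to the ids of its
    # metapath-reachable neighbors (e.g., paper-author-paper instances),
    # assumed to be pre-computed offline as in the abstract's description.
    # Returns one aggregated vector per node; nodes without metapath
    # neighbors keep a zero vector.
    aggregated = np.zeros_like(node_features)
    for node, neighbors in metapath_neighbors.items():
        if neighbors:
            aggregated[node] = node_features[neighbors].mean(axis=0)
    return aggregated

# Toy usage: 4 nodes, 3-dimensional features, one metapath's neighbor lists.
features = np.random.rand(4, 3).astype(np.float32)
neighbors = {0: [1, 2], 1: [0], 2: [3], 3: []}
metapath_features = mean_aggregate_metapath(features, neighbors)
print(metapath_features.shape)  # (4, 3)

In the paper's pipeline these aggregated metapath vectors would then be combined with the token-level text representations inside the GNN-nested PLM; that joint model and its pretraining objectives are not reproduced here.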