{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T05:05:59Z","timestamp":1765343159974,"version":"3.46.0"},"publisher-location":"New York, NY, USA","reference-count":43,"publisher":"ACM","funder":[{"name":"National Natural Science Foundation of China","award":["62376062"],"award-info":[{"award-number":["62376062"]}]},{"name":"Philosophy and Social Sciences 14th Five-Year Plan Project of Guangdong Province","award":["GD23CTS03"],"award-info":[{"award-number":["GD23CTS03"]}]},{"name":"Ministry of Education of Humanities and Social Science Project","award":["23YJAZH220, 24YJAZH244"],"award-info":[{"award-number":["23YJAZH220, 24YJAZH244"]}]},{"DOI":"10.13039\/501100012456","name":"National Social Science Fund of China","doi-asserted-by":"publisher","award":["24BXW047"],"award-info":[{"award-number":["24BXW047"]}],"id":[{"id":"10.13039\/501100012456","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,27]]},"DOI":"10.1145\/3746027.3755668","type":"proceedings-article","created":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T07:26:55Z","timestamp":1761377215000},"page":"6280-6288","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["DiffTMR: Diffusion-based Hierarchical Alignment for Text-Molecule Retrieval"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-2031-7635","authenticated-orcid":false,"given":"Chenxu","family":"Wang","sequence":"first","affiliation":[{"name":"Guangdong University of Foreign Studies, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3310-8347","authenticated-orcid":false,"given":"Dong","family":"Zhou","sequence":"additional","affiliation":[{"name":"Guangdong University of 
Foreign Studies, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-4205-9530","authenticated-orcid":false,"given":"Ting","family":"Liu","sequence":"additional","affiliation":[{"name":"Guangdong University of Foreign Studies, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5935-2110","authenticated-orcid":false,"given":"Jianghao","family":"Lin","sequence":"additional","affiliation":[{"name":"Guangdong University of Foreign Studies, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2661-3078","authenticated-orcid":false,"given":"Yongmei","family":"Zhou","sequence":"additional","affiliation":[{"name":"Guangdong University of Foreign Studies, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7671-6150","authenticated-orcid":false,"given":"Aimin","family":"Yang","sequence":"additional","affiliation":[{"name":"Lingnan Normal University, Zhanjiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,27]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1371"},{"key":"e_1_3_2_1_2_1","first-page":"3","article-title":"Generative or discriminative? getting the best of both worlds","volume":"8","author":"Bernardo JM","year":"2007","unstructured":"JM Bernardo, MJ Bayarri, JO Berger, AP Dawid, D Heckerman, AFM Smith, and M West. 2007. Generative or discriminative? getting the best of both worlds. Bayesian Statistics, Vol. 8, 3 (2007), 3-24.","journal-title":"Bayesian Statistics"},{"key":"e_1_3_2_1_3_1","volume-title":"Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096","author":"Brock Andrew","year":"2018","unstructured":"Andrew Brock, Jeff Donahue, and Karen Simonyan. 
2018. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)."},{"key":"e_1_3_2_1_4_1","first-page":"1877","article-title":"Language models are few-shot learners","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al., 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 1877-1901.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_5_1","unstructured":"Jinho Chang and Jong Chul Ye. 2024. LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space. arXiv:2405.17829 [cs.LG]"},{"key":"e_1_3_2_1_6_1","volume-title":"Ilvr: Conditioning method for denoising diffusion probabilistic models. arXiv preprint arXiv:2108.02938","author":"Choi Jooyoung","year":"2021","unstructured":"Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. 2021. Ilvr: Conditioning method for denoising diffusion probabilistic models. arXiv preprint arXiv:2108.02938 (2021)."},{"key":"e_1_3_2_1_7_1","first-page":"6140","volume-title":"Proceedings of International Conference on Machine Learning, ICML","volume":"202","author":"Christofidellis Dimitrios","year":"2023","unstructured":"Dimitrios Christofidellis, Giorgio Giannone, Jannis Born, Ole Winther, Teodoro Laino, and Matteo Manica. 2023. Unifying Molecular and Textual Representations via Multi-task Language Modelling. In Proceedings of International Conference on Machine Learning, ICML 2023, 23-29 July, Vol. 202. 6140-6157."},{"key":"e_1_3_2_1_8_1","first-page":"8780","article-title":"Diffusion models beat gans on image synthesis","volume":"34","author":"Dhariwal Prafulla","year":"2021","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat gans on image synthesis. 
Advances in Neural Information Processing Systems, Vol. 34 (2021), 8780-8794.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_9_1","volume-title":"Drug discovery: a historical perspective. science","author":"Drews Jurgen","year":"2000","unstructured":"Jurgen Drews. 2000. Drug discovery: a historical perspective. science, Vol. 287, 5460 (2000), 1960-1964."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.26"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.47"},{"key":"e_1_3_2_1_12_1","volume-title":"the 14th international conference on artificial intelligence and statistics. 315-323","author":"Glorot Xavier","year":"2011","unstructured":"Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Deep sparse rectifier neural networks. In the 14th international conference on artificial intelligence and statistics. 315-323."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1002\/wcms.1637"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/380995.381002"},{"key":"e_1_3_2_1_15_1","first-page":"6840","article-title":"Denoising Diffusion Probabilistic Models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems, Vol. 33, 6840-6851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_16_1","first-page":"13242","article-title":"Stochastic Solutions for Linear Inverse Problems using the Prior Implicit in a Denoiser","volume":"34","author":"Kadkhodaie Zahra","year":"2021","unstructured":"Zahra Kadkhodaie and Eero Simoncelli. 2021. Stochastic Solutions for Linear Inverse Problems using the Prior Implicit in a Denoiser. Advances in Neural Information Processing Systems, Vol. 
34, 13242-13254.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Sunghwan Kim Paul A Thiessen Evan E Bolton Jie Chen Gang Fu Asta Gindulyte Lianyi Han Jane He Siqian He Benjamin A Shoemaker et al. 2016. PubChem substance and compound databases. Nucleic acids research Vol. 44 D1 (2016) D1202-D1213.","DOI":"10.1093\/nar\/gkv951"},{"key":"e_1_3_2_1_18_1","volume-title":"Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015."},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of The Twelfth International Conference on Learning Representations.","author":"Li Sihang","year":"2024","unstructured":"Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, and Qi Tian. 2024. Towards 3D Molecule-Text Interpretation in Language Models. In Proceedings of The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_1_20_1","first-page":"31360","article-title":"GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models","volume":"35","author":"Liang Chen","year":"2022","unstructured":"Chen Liang, Wenguan Wang, Jiaxu Miao, and Yi Yang. 2022. GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models. Advances in Neural Information Processing Systems, Vol. 35, 31360-31375.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_21_1","volume-title":"Git-mol: A multi-modal large language model for molecular science with graph, image, and text. Computers in biology and medicine","author":"Liu Pengfei","year":"2024","unstructured":"Pengfei Liu, Yiming Ren, Jun Tao, and Zhixiang Ren. 2024. 
Git-mol: A multi-modal large language model for molecular science with graph, image, and text. Computers in biology and medicine, Vol. 171 (2024), 108073."},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the 11th International Conference on Learning Representations.","author":"Liu Shengchao","year":"2023","unstructured":"Shengchao Liu, Hongyu Guo, and Jian Tang. 2023a. Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-023-00759-6"},{"key":"e_1_3_2_1_24_1","volume-title":"Molca: Molecular graph-language modeling with cross-modal projector and uni-modal adapter. arXiv preprint arXiv:2310.12798","author":"Liu Zhiyuan","year":"2023","unstructured":"Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, and Tat-Seng Chua. 2023b. Molca: Molecular graph-language modeling with cross-modal projector and uni-modal adapter. arXiv preprint arXiv:2310.12798 (2023)."},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of 5th International Conference on Learning Representations.","author":"Loshchilov Ilya","year":"2017","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of 5th International Conference on Learning Representations."},{"key":"e_1_3_2_1_26_1","volume-title":"Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval. CoRR","author":"Luo Dezhao","year":"2024","unstructured":"Dezhao Luo, Shaogang Gong, Jiabo Huang, Hailin Jin, and Yang Liu. 2024. Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval. CoRR, Vol. abs\/2401.13329 (2024). 
arXiv:2401.13329"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00286"},{"key":"e_1_3_2_1_28_1","volume-title":"Xing Yi Liu, and Zaiqing Nie","author":"Luo Yizhen","year":"2023","unstructured":"Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, and Zaiqing Nie. 2023. MolFM: A Multimodal Molecular Foundation Model. CoRR, Vol. abs\/2307.09484 (2023). arXiv:2307.09484"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/BIBM62325.2024.10822800"},{"key":"e_1_3_2_1_30_1","volume-title":"Aaron Van den Oord, and Oriol Vinyals","author":"Razavi Ali","year":"2019","unstructured":"Ali Razavi, Aaron Van den Oord, and Oriol Vinyals. 2019. Generating diverse high-fidelity images with vq-vae-2. Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_1_31_1","volume-title":"Breckon","author":"Sasaki Hiroshi","year":"2021","unstructured":"Hiroshi Sasaki, Chris G. Willcocks, and Toby P. Breckon. 2021. UNIT-DDPM: UNpaired Image Translation with Denoising Diffusion Probabilistic Models. CoRR, Vol. abs\/2104.05358 (2021). arXiv:2104.05358"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/BIBM62325.2024.10821722"},{"key":"e_1_3_2_1_33_1","volume-title":"A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language. CoRR","author":"Su Bing","year":"2022","unstructured":"Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, and Ji-Rong Wen. 2022. A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language. CoRR, Vol. abs\/2209.05481 (2022). arXiv:2209.05481"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.sbi.2023.102559"},{"key":"e_1_3_2_1_35_1","unstructured":"Aaron van den Oord Yazhe Li and Oriol Vinyals. 2019. Representation Learning with Contrastive Predictive Coding. arXiv:1807.03748 [cs.LG]"},{"key":"e_1_3_2_1_36_1","volume-title":"Virtual screening - an overview. 
Drug discovery today","author":"Walters W Patrick","year":"1998","unstructured":"W Patrick Walters, Matthew T Stahl, and Mark A Murcko. 1998. Virtual screening - an overview. Drug discovery today, Vol. 3, 4 (1998), 160-178."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2022.01.013"},{"key":"e_1_3_2_1_38_1","volume-title":"CLASS: Enhancing Cross-Modal Text-Molecule Retrieval Performance and Training Efficiency. arXiv:2502.11633 [cs.CL]","author":"Wu Hongyan","year":"2025","unstructured":"Hongyan Wu, Peijian Zeng, Weixiong Zheng, Lianxi Wang, Nankai Lin, Shengyi Jiang, and Aimin Yang. 2025. CLASS: Enhancing Cross-Modal Text-Molecule Retrieval Performance and Training Efficiency. arXiv:2502.11633 [cs.CL]"},{"key":"e_1_3_2_1_39_1","volume-title":"DrugAssist: A Large Language Model for Molecule Optimization. CoRR","author":"Ye Geyan","year":"2024","unstructured":"Geyan Ye, Xibao Cai, Houtim Lai, Xing Wang, Junhong Huang, Longyue Wang, Wei Liu, and Xiangxiang Zeng. 2024. DrugAssist: A Large Language Model for Molecule Optimization. CoRR, Vol. abs\/2401.10334 (2024). arXiv:2401.10334"},{"key":"e_1_3_2_1_40_1","volume-title":"A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals. Nature communications","author":"Zeng Zheni","year":"2022","unstructured":"Zheni Zeng, Yuan Yao, Zhiyuan Liu, and Maosong Sun. 2022. A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals. Nature communications, Vol. 13, 1 (2022), 862."},{"key":"e_1_3_2_1_41_1","volume-title":"Proceedings of the 13th International Conference on Learning Representations.","author":"Zhang Yikun","year":"2025","unstructured":"Yikun Zhang, Geyan Ye, Chaohao Yuan, Bo Han, Long-Kai Huang, Jianhua Yao, Wei Liu, and Yu Rong. 2025. Atomas: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation. 
In Proceedings of the 13th International Conference on Learning Representations."},{"key":"e_1_3_2_1_42_1","volume-title":"Rui Yan, and Zechao Li.","author":"Zhao Henghao","year":"2024","unstructured":"Henghao Zhao, Kevin Qinghong Lin, Rui Yan, and Zechao Li. 2024a. DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection. IEEE Transactions on Neural Networks and Learning Systems (2024), 1-14."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAI.2023.3254518"}],"event":{"name":"MM '25: The 33rd ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"MM '25"},"container-title":["Proceedings of the 33rd ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3746027.3755668","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T05:02:46Z","timestamp":1765342966000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3746027.3755668"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,27]]},"references-count":43,"alternative-id":["10.1145\/3746027.3755668","10.1145\/3746027"],"URL":"https:\/\/doi.org\/10.1145\/3746027.3755668","relation":{},"subject":[],"published":{"date-parts":[[2025,10,27]]},"assertion":[{"value":"2025-10-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}