{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T16:27:26Z","timestamp":1778084846841,"version":"3.51.4"},"reference-count":292,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T00:00:00Z","timestamp":1739145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent of scientific LLMs, a novel subclass specifically engineered for facilitating scientific discovery. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration. However, a systematic and up-to-date survey introducing them is currently lacking. In this article, we endeavor to methodically delineate the concept of \u201cscientific language,\u201d whilst providing a thorough review of the latest advancements in scientific LLMs. Given the expansive realm of scientific disciplines, our analysis adopts a focused lens, concentrating on the biological and chemical domains. This includes an in-depth examination of LLMs for textual knowledge, small molecules, macromolecular proteins, genomic sequences, and their combinations, analyzing them in terms of model architectures, capabilities, datasets, and evaluation. Finally, we critically examine the prevailing challenges and point out promising research directions along with the advances of LLMs. 
By offering a comprehensive overview of technical developments in this field, this survey aspires to be an invaluable resource for researchers navigating the intricate landscape of scientific LLMs.<\/jats:p>","DOI":"10.1145\/3715318","type":"journal-article","created":{"date-parts":[[2025,1,26]],"date-time":"2025-01-26T09:45:54Z","timestamp":1737884754000},"page":"1-38","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":67,"title":["Scientific Large Language Models: A Survey on Biological &amp; Chemical Domains"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1636-5269","authenticated-orcid":false,"given":"Qiang","family":"Zhang","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"},{"name":"ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2900-7313","authenticated-orcid":false,"given":"Keyan","family":"Ding","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"},{"name":"ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-2691-6467","authenticated-orcid":false,"given":"Tianwen","family":"Lv","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-5559-1714","authenticated-orcid":false,"given":"Xinda","family":"Wang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-5612-5521","authenticated-orcid":false,"given":"Qingyu","family":"Yin","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4551-6938","authenticated-orcid":false,"given":"Yiwen","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1156-9326","authenticated-orcid":false,"given":"Jing","family":"Yu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-0013-7259","authenticated-orcid":false,"given":"Yuhao","family":"Wang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3930-4474","authenticated-orcid":false,"given":"Xiaotong","family":"Li","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-4230-3026","authenticated-orcid":false,"given":"Zhuoyi","family":"Xiang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0253-1476","authenticated-orcid":false,"given":"Xiang","family":"Zhuang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5036-9602","authenticated-orcid":false,"given":"Zeyuan","family":"Wang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8607-8965","authenticated-orcid":false,"given":"Ming","family":"Qin","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-6545-8171","authenticated-orcid":false,"given":"Mengyao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-3336-8074","authenticated-orcid":false,"given":"Jinlu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1409-7338","authenticated-orcid":false,"given":"Jiyu","family":"Cui","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7566-7948","authenticated-orcid":false,"given":"Renjun","family":"Xu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7626-0162","authenticated-orcid":false,"given":"Hongyang","family":"Chen","sequence":"additional","affiliation":[{"name":"Zhejiang Lab, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6336-3007","authenticated-orcid":false,"given":"Xiaohui","family":"Fan","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"},{"name":"Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7418-0046","authenticated-orcid":false,"given":"Huabin","family":"Xing","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"},{"name":"ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China"},{"name":"Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, Hangzhou China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5496-7442","authenticated-orcid":false,"given":"Huajun","family":"Chen","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"},{"name":"ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou China"}]}],"member":"320","published-online":{"date-parts":[[2025,2,10]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.2c00715"},{"key":"e_1_3_1_3_2","first-page":"10757","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"Abdine Hadi","year":"2024","unstructured":"Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, and Michalis Vazirgiannis. 2024. Prot2text: Multimodal protein\u2019s function generation with gnns and transformers. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 10757\u201310765."},{"key":"e_1_3_1_4_2","article-title":"Gpt-4 technical report","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat et\u00a0al. 2023. Gpt-4 technical report. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774","journal-title":"Retrieved from"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Sanjar Adilov. 2021. Generative pre-training from molecules. Retrieved from https:\/\/chemrxiv.org\/engage\/chemrxiv\/article-details\/6142f60742198e8c31782e9e","DOI":"10.26434\/chemrxiv-2021-5fwjd"},{"key":"e_1_3_1_6_2","article-title":"Chemberta-2: Towards chemical foundation models","author":"Ahmad Walid","year":"2022","unstructured":"Walid Ahmad, Elana Simon, Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. 2022. 
Chemberta-2: Towards chemical foundation models. Retrieved from https:\/\/arxiv.org\/abs\/2209.01712","journal-title":"Retrieved from"},{"key":"e_1_3_1_7_2","article-title":"The impact of large language models on scientific discovery: A preliminary study using gpt-4","author":"AI4Science Microsoft Research","year":"2023","unstructured":"Microsoft Research AI4Science and Microsoft Azure Quantum. 2023. The impact of large language models on scientific discovery: A preliminary study using gpt-4. Retrieved from https:\/\/arxiv.org\/abs\/2311.07361","journal-title":"Retrieved from"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","first-page":"221","DOI":"10.18653\/v1\/2021.bionlp-1.24","volume-title":"Proceedings of the 20th Workshop on Biomedical Language Processing","author":"Alrowili Sultan","year":"2021","unstructured":"Sultan Alrowili and K. Vijay-Shanker. 2021. BioM-transformers: Building large biomedical language models with BERT, ALBERT, and ELECTRA. In Proceedings of the 20th Workshop on Biomedical Language Processing. 221\u2013227."},{"key":"e_1_3_1_9_2","first-page":"1","volume-title":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","author":"An Weizhi","year":"2022","unstructured":"Weizhi An, Yuzhi Guo, Yatao Bian, Hehuan Ma, Jinyu Yang, Chunyuan Li, and Junzhou Huang. 2022. MoDNA: Motif-oriented pre-training for DNA language model. In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 1\u20135."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1038\/75556"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-021-01252-x"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-022-01288-4"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.1c00600"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/28.1.45"},{"key":"e_1_3_1_15_2","article-title":"GPT-MolBERTa: GPT molecular features language model for molecular property prediction","author":"Balaji Suryanarayanan","year":"2023","unstructured":"Suryanarayanan Balaji, Rishikesh Magar, Yayati Jadhav et\u00a0al. 2023. GPT-MolBERTa: GPT molecular features language model for molecular property prediction. Retrieved from https:\/\/arxiv.org\/abs\/2310.03030","journal-title":"Retrieved from"},{"key":"e_1_3_1_16_2","article-title":"Towards foundational models for molecular learning on large-scale multi-task datasets","author":"Beaini Dominique","year":"2023","unstructured":"Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Gabriela Moisescu-Pareja, Oleksandr Dymov, Samuel Maddrell-Mander, Callum McLean, Frederik Wenkel, Luis M\u00fcller, Jama Hussein Mohamud et\u00a0al. 2023. Towards foundational models for molecular learning on large-scale multi-task datasets. Retrieved from https:\/\/arxiv.org\/abs\/2310.04292","journal-title":"Retrieved from"},{"key":"e_1_3_1_17_2","article-title":"SciBERT: A pretrained language model for scientific text","author":"Beltagy Iz","year":"2019","unstructured":"Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A pretrained language model for scientific text. Retrieved from https:\/\/arxiv.org\/abs\/1903.10676","journal-title":"Retrieved from"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","unstructured":"Gonzalo Benegas Carlos Albors Alan J. Aw Chengzhong Ye and Yun S. Song. 2023. 
GPN-MSA: An alignment-based DNA language model for genome-wide variant effect prediction. Retrieved from 10.1101\/2023.10.10.561776v1","DOI":"10.1101\/2023.10.10.561776v1"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","unstructured":"Gonzalo Benegas Sanjit Singh Batra and Yun S. Song. 2022. DNA language models are powerful zero-shot predictors of genome-wide variant effects. Retrieved from 10.1101\/2022.08.22.504706v1","DOI":"10.1101\/2022.08.22.504706v1"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2311219120"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cels.2021.05.017"},{"key":"e_1_3_1_22_2","article-title":"Translating embeddings for modeling multi-relational data","volume":"26","author":"Bordes Antoine","year":"2013","unstructured":"Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. Adv. Neural Inf. Process. Syst. 26 (2013).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_23_2","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/978-1-4939-3167-5_2","article-title":"UniProtKB\/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: How to use the entry view","author":"Boutet Emmanuel","year":"2016","unstructured":"Emmanuel Boutet, Damien Lieberherr, Michael Tognolli, Michel Schneider, Parit Bansal, Alan J. Bridge, Sylvain Poux, Lydie Bougueleret, and Ioannis Xenarios. 2016. UniProtKB\/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: How to use the entry view. Plant Bioinform.: Methods Protocols (2016), 23\u201354.","journal-title":"Plant Bioinform.: Methods Protocols"},{"key":"e_1_3_1_24_2","article-title":"ChemCrow: Augmenting large-language models with chemistry tools","author":"Bran Andres M.","year":"2023","unstructured":"Andres M. Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D. White, and Philippe Schwaller. 2023. ChemCrow: Augmenting large-language models with chemistry tools. Retrieved from https:\/\/arxiv.org\/abs\/2304.05376","journal-title":"Retrieved from"},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1007\/978-981-97-4828-0_8","volume-title":"Drug Development Supported by Informatics","author":"Bran Andres M.","year":"2024","unstructured":"Andres M. Bran and Philippe Schwaller. 2024. Transformers and large language models for chemistry and drug discovery. In Drug Development Supported by Informatics. Springer, 143\u2013163."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btac020"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.8b00839"},{"key":"e_1_3_1_28_2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33 (2020), 1877\u20131901.","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_3_1_29_2","article-title":"Sparks of artificial general intelligence: Early experiments with gpt-4","author":"Bubeck S\u00e9bastien","year":"2023","unstructured":"S\u00e9bastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg et\u00a0al. 2023. Sparks of artificial general intelligence: Early experiments with gpt-4. Retrieved from https:\/\/arxiv.org\/abs\/2303.12712","journal-title":"Retrieved from"},{"key":"e_1_3_1_30_2","article-title":"Uni-SMART: Universal science multimodal analysis and research transformer","author":"Cai Hengxing","year":"2024","unstructured":"Hengxing Cai, Xiaochen Cai, Shuwen Yang, Jiankun Wang, Lin Yao, Zhifeng Gao, Junhan Chang, Sihang Li, Mingjun Xu, Changxin Wang et\u00a0al. 2024. Uni-SMART: Universal science multimodal analysis and research transformer. Retrieved from https:\/\/arXiv:2403.10301","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_31_2","first-page":"D498\u2013D508","article-title":"BRENDA, the ELIXIR core data resource in 2021: New developments and updates","volume":"49","author":"Chang Antje","year":"2021","unstructured":"Antje Chang, Lisa Jeske, Sandra Ulbrich, Julia Hofmann, Julia Koblitz, Ida Schomburg, Meina Neumann-Schaal, Dieter Jahn, and Dietmar Schomburg. 2021. BRENDA, the ELIXIR core data resource in 2021: New developments and updates. Nucl. Acids Res. 49, D1 (2021), D498\u2013D508.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_32_2","article-title":"xTrimoPGLM: Unified 100B-scale pre-trained transformer for deciphering the language of protein","author":"Chen Bo","year":"2024","unstructured":"Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng et\u00a0al. 2024. xTrimoPGLM: Unified 100B-scale pre-trained transformer for deciphering the language of protein. Retrieved from https:\/\/arxiv.org\/abs\/2401.06199","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"3521","DOI":"10.1038\/s41467-021-23720-w","article-title":"Algebraic graph-assisted bidirectional transformers for molecular property prediction","volume":"12","author":"Chen Dong","year":"2021","unstructured":"Dong Chen, Kaifu Gao, Duc Duy Nguyen, Xin Chen, Yi Jiang, Guo-Wei Wei, and Feng Pan. 2021. Algebraic graph-assisted bidirectional transformers for molecular property prediction. Nature Commun. 12, 1 (2021), 3521.","journal-title":"Nature Commun."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","unstructured":"Ken Chen Yue Zhou Maolin Ding Yu Wang Zhixiang Ren and Yuedong Yang. 2023. Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction. Retrieved from 10.1101\/2023.01.31.526427v1","DOI":"10.1101\/2023.01.31.526427v1"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","unstructured":"Qiyuan Chen and Cheng Deng. 2023. Bioinfo-Bench: A simple benchmark framework for LLM bioinformatics skills evaluation. Retrieved from 10.1101\/2023.10.18.563023v1","DOI":"10.1101\/2023.10.18.563023v1"},{"issue":"1","key":"e_1_3_1_36_2","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1186\/s13321-023-00702-2","article-title":"Deep generative model for drug design from protein target sequence","volume":"15","author":"Chen Yangyang","year":"2023","unstructured":"Yangyang Chen, Zixu Wang, Lei Wang, Jianmin Wang, Pengyong Li, Dongsheng Cao, Xiangxiang Zeng, Xiucai Ye, and Tetsuya Sakurai. 2023. 
Deep generative model for drug design from protein target sequence. J. Cheminform. 15, 1 (2023), 38.","journal-title":"J. Cheminform."},{"key":"e_1_3_1_37_2","article-title":"Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality","author":"Chiang Wei-Lin","year":"2023","unstructured":"Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez et\u00a0al. 2023. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality. Retrieved 14 April 2023 from https:\/\/vicuna.lmsys.org","journal-title":"Retrieved 14 April 2023 from"},{"key":"e_1_3_1_38_2","article-title":"Bartsmiles: Generative masked language models for molecular representations","author":"Chilingaryan Gayane","year":"2022","unstructured":"Gayane Chilingaryan, Hovhannes Tamoyan, Ani Tevosyan, Nelly Babayan, Lusine Khondkaryan, Karen Hambardzumyan, Zaven Navoyan, Hrant Khachatrian, and Armen Aghajanyan. 2022. Bartsmiles: Generative masked language models for molecular representations. Retrieved from https:\/\/arxiv.org\/abs\/2211.16349","journal-title":"Retrieved from"},{"key":"e_1_3_1_39_2","article-title":"ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction","author":"Chithrananda Seyone","year":"2020","unstructured":"Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. 2020. ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction. Retrieved from https:\/\/arxiv.org\/abs\/2010.09885","journal-title":"Retrieved from"},{"issue":"240","key":"e_1_3_1_40_2","first-page":"1","article-title":"Palm: Scaling language modeling with pathways","volume":"24","author":"Chowdhery Aakanksha","year":"2023","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann et\u00a0al. 2023. Palm: Scaling language modeling with pathways. J. Mach. Learn. Res. 24, 240 (2023), 1\u2013113.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_1_41_2","first-page":"6140","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Christofidellis Dimitrios","year":"2023","unstructured":"Dimitrios Christofidellis, Giorgio Giannone, Jannis Born, Ole Winther, Teodoro Laino, and Matteo Manica. 2023. Unifying molecular and textual representations via multi-task language modelling. In Proceedings of the International Conference on Machine Learning. PMLR, 6140\u20136157."},{"key":"e_1_3_1_42_2","article-title":"Generative antibody design for complementary chain pairing sequences through encoder-decoder language model","author":"Chu Simon K. S.","year":"2023","unstructured":"Simon K. S. Chu and Kathy Y. Wei. 2023. Generative antibody design for complementary chain pairing sequences through encoder-decoder language model. Retrieved from https:\/\/arxiv.org\/abs\/2301.02748","journal-title":"Retrieved from"},{"key":"e_1_3_1_43_2","article-title":"Electra: Pre-training text encoders as discriminators rather than generators","author":"Clark Kevin","year":"2020","unstructured":"Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. Retrieved from https:\/\/arxiv.org\/abs\/2003.10555","journal-title":"Retrieved from"},{"key":"e_1_3_1_44_2","article-title":"Think you have solved question answering? 
Try arc, the ai2 reasoning challenge","author":"Clark Peter","year":"2018","unstructured":"Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. 2018. Think you have solved question answering? Try arc, the ai2 reasoning challenge. Retrieved from https:\/\/arxiv.org\/abs\/1803.05457","journal-title":"Retrieved from"},{"key":"e_1_3_1_45_2","article-title":"To transformers and beyond: Large language models for the genome","author":"Consens Micaela E.","year":"2023","unstructured":"Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, and Bo Wang. 2023. To transformers and beyond: Large language models for the genome. Retrieved from https:\/\/arxiv.org\/abs\/2311.07621","journal-title":"Retrieved from"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature15393"},{"issue":"1","key":"e_1_3_1_47_2","first-page":"D523\u2013D531","article-title":"UniProt: The universal protein knowledgebase in 2023","volume":"51","author":"Consortium The UniProt","year":"2023","unstructured":"The UniProt Consortium. 2023. UniProt: The universal protein knowledgebase in 2023. Nucl. Acids Res. 51, D1 (2023), D523\u2013D531.","journal-title":"Nucl. Acids Res."},{"issue":"7414","key":"e_1_3_1_48_2","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"82 ENCODE Project Consortium Overall coordination (data analysis coordination) Dunham Ian 2 Kundaje Anshul 3 81 82","year":"2012","unstructured":"ENCODE Project Consortium Overall coordination (data analysis coordination) Dunham Ian 2 Kundaje Anshul 3 81 82 82, Writing group Bernstein Bradley E. 7 34 Birney Ewan Dunham Ian Green Eric D. 35 Gunter Chris 15 Snyder Michael 13 et\u00a0al. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 7414 (2012), 57\u201374.","journal-title":"Nature"},{"key":"e_1_3_1_49_2","article-title":"The nucleotide transformer: Building and evaluating robust foundation models for human genomics","author":"Dalla-Torre Hugo","year":"2023","unstructured":"Hugo Dalla-Torre, Liam Gonzalez, Javier Mendoza-Revilla, Nicolas Lopez Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago, Evan Trop, Bernardo P. de Almeida, Hassan Sirelkhatim et\u00a0al. 2023. The nucleotide transformer: Building and evaluating robust foundation models for human genomics. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.01.11.523679","journal-title":"Retrieved from"},{"key":"e_1_3_1_50_2","article-title":"FLIP: Benchmark tasks in fitness landscape inference for proteins","author":"Dallago Christian","year":"2021","unstructured":"Christian Dallago, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, and Kevin K. Yang. 2021. FLIP: Benchmark tasks in fitness landscape inference for proteins. Retrieved from https:\/\/www.biorxiv.org\/content\/2021.11.09.467890","journal-title":"Retrieved from"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","unstructured":"Lucas Paulo de Lima Camillo Raghav Sehgal Jenel Armstrong Albert Tzongyang Higgins-Chen Steve Horvath and Bo Wang. 2024. CpGPT: A foundation model for DNA methylation. 
Retrieved from 10.1101\/2024.10.24.619766v1","DOI":"10.1101\/2024.10.24.619766v1"},{"key":"e_1_3_1_52_2","article-title":"Chemical language model linker: Blending text and molecules with modular adapters","author":"Deng Yifan","year":"2024","unstructured":"Yifan Deng, Spencer S. Ericksen, and Anthony Gitter. 2024. Chemical language model linker: Blending text and molecules with modular adapters. Retrieved from https:\/\/arXiv:2410.20182","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_53_2","first-page":"D157\u2013D164","article-title":"EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era","volume":"41","author":"Dreos Ren\u00e9","year":"2013","unstructured":"Ren\u00e9 Dreos, Giovanna Ambrosini, Rouayda Cavin P\u00e9rier, and Philipp Bucher. 2013. EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era. Nucl. Acids Res. 41, D1 (2013), D157\u2013D164.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_54_2","article-title":"Molgensurvey: A systematic survey in machine learning models for molecule design","author":"Du Yuanqi","year":"2022","unstructured":"Yuanqi Du, Tianfan Fu, Jimeng Sun, and Shengchao Liu. 2022. Molgensurvey: A systematic survey in machine learning models for molecule design. Retrieved from https:\/\/arxiv.org\/abs\/2203.14500","journal-title":"Retrieved from"},{"key":"e_1_3_1_55_2","article-title":"Glm: General language model pretraining with autoregressive blank infilling","author":"Du Zhengxiao","year":"2021","unstructured":"Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. 2021. Glm: General language model pretraining with autoregressive blank infilling. Retrieved from https:\/\/arxiv.org\/abs\/2103.10360","journal-title":"Retrieved from"},{"key":"e_1_3_1_56_2","article-title":"Translation between molecules and natural language","author":"Edwards Carl","year":"2022","unstructured":"Carl Edwards, Tuan Lai, Kevin Ros, Garrett Honke, Kyunghyun Cho, and Heng Ji. 2022. Translation between molecules and natural language. Retrieved from https:\/\/arxiv.org\/abs\/2204.11817","journal-title":"Retrieved from"},{"key":"e_1_3_1_57_2","article-title":"MolCap-Arena: A comprehensive captioning benchmark on language-enhanced molecular property prediction","author":"Edwards Carl","year":"2024","unstructured":"Carl Edwards, Ziqing Lu, Ehsan Hajiramezanali, Tommaso Biancalani, Heng Ji, and Gabriele Scalia. 2024. MolCap-Arena: A comprehensive captioning benchmark on language-enhanced molecular property prediction. Retrieved from https:\/\/arXiv:2411.00737","journal-title":"Retrieved from"},{"key":"e_1_3_1_58_2","first-page":"595","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Edwards Carl","year":"2021","unstructured":"Carl Edwards, ChengXiang Zhai, and Heng Ji. 2021. Text2mol: Cross-modal molecule retrieval with natural language queries. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 595\u2013607."},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3095381"},{"key":"e_1_3_1_60_2","article-title":"Mol-instructions: A large-scale biomolecular instruction dataset for large language models","author":"Fang Yin","year":"2023","unstructured":"Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, and Huajun Chen. 2023. Mol-instructions: A large-scale biomolecular instruction dataset for large language models. 
Retrieved from https:\/\/arxiv.org\/abs\/2306.08018","journal-title":"Retrieved from"},{"key":"e_1_3_1_61_2","article-title":"Domain-agnostic molecular generation with self-feedback","author":"Fang Yin","year":"2023","unstructured":"Yin Fang, Ningyu Zhang, Zhuo Chen, Lingbing Guo, Xiaohui Fan, and Huajun Chen. 2023. Domain-agnostic molecular generation with self-feedback. Retrieved from https:\/\/arxiv.org\/abs\/2301.11259","journal-title":"Retrieved from"},{"key":"e_1_3_1_62_2","article-title":"SciKnowEval: Evaluating multi-level scientific knowledge of large language models","author":"Feng Kehua","year":"2024","unstructured":"Kehua Feng, Keyan Ding, Weijie Wang, Xiang Zhuang, Zeyuan Wang, Ming Qin, Yu Zhao, Jianhua Yao, Qiang Zhang, and Huajun Chen. 2024. SciKnowEval: Evaluating multi-level scientific knowledge of large language models. Retrieved from https:\/\/arxiv.org\/abs\/2406.09098","journal-title":"Retrieved from"},{"key":"e_1_3_1_63_2","article-title":"A deep unsupervised language model for protein design","author":"Ferruz Noelia","year":"2022","unstructured":"Noelia Ferruz, Steffen Schmidt, and Birte H\u00f6cker. 2022. A deep unsupervised language model for protein design. Retrieved from https:\/\/www.biorxiv.org\/content\/2022.03.09.483666","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_64_2","first-page":"D247\u2013D251","article-title":"Pfam: Clans, web tools and services","volume":"34","author":"Finn Robert D.","year":"2006","unstructured":"Robert D. Finn, Jaina Mistry, Benjamin Schuster-B\u00f6ckler, Sam Griffiths-Jones, Volker Hollich, Timo Lassmann, Simon Moxon, Mhairi Marshall, Ajay Khanna, Richard Durbin et\u00a0al. 2006. Pfam: Clans, web tools and services. Nucl. Acids Res. 34, suppl_1 (2006), D247\u2013D251.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_65_2","article-title":"GENA-LM: A family of open-source foundational models for long DNA sequences","author":"Fishman Veniamin","year":"2023","unstructured":"Veniamin Fishman, Yuri Kuratov, Maxim Petrov, Aleksei Shmelev, Denis Shepelin, Nikolay Chekanov, Olga Kardymon, and Mikhail Burtsev. 2023. GENA-LM: A family of open-source foundational models for long DNA sequences. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.06.12.544594","journal-title":"Retrieved from"},{"issue":"4","key":"e_1_3_1_66_2","first-page":"47","article-title":"Bloom\u2019s taxonomy","volume":"41","author":"Forehand Mary","year":"2010","unstructured":"Mary Forehand. 2010. Bloom\u2019s taxonomy. Emerg. Perspect. Learn. Teach. Technol. 41, 4 (2010), 47\u201356.","journal-title":"Emerg. Perspect. Learn. Teach. Technol."},{"key":"e_1_3_1_67_2","first-page":"baz046","article-title":"PanglaoDB: A web server for exploration of mouse and human single-cell RNA sequencing data","volume":"2019","author":"Franz\u00e9n Oscar","year":"2019","unstructured":"Oscar Franz\u00e9n, Li-Ming Gan, and Johan L. M. Bj\u00f6rkegren. 2019. PanglaoDB: A web server for exploration of mouse and human single-cell RNA sequencing data. Database 2019 (2019), baz046.","journal-title":"Database"},{"key":"e_1_3_1_68_2","first-page":"1","article-title":"A foundation model of transcription across human cell types","author":"Fu Xi","year":"2025","unstructured":"Xi Fu, Shentong Mo, Alejandro Buendia, Anouchka P. Laurent, Anqi Shao, Maria del Mar Alvarez-Torres, Tianji Yu, Jimin Tan, Jiayu Su, Romella Sagatelian et\u00a0al. 2025. A foundation model of transcription across human cell types. 
Nature (2025), 1\u20139.","journal-title":"Nature"},{"key":"e_1_3_1_69_2","article-title":"Species-aware DNA language modeling","author":"Gankin Dennis","year":"2023","unstructured":"Dennis Gankin, Alexander Karollus, Martin Grosshauser, Kristian Klemon, Johannes Hingerl, and Julien Gagneur. 2023. Species-aware DNA language modeling. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.01.26.525670","journal-title":"Retrieved from"},{"key":"e_1_3_1_70_2","article-title":"DrugCLIP: Contrasive protein-molecule representation learning for virtual screening","volume":"36","author":"Gao Bowen","year":"2024","unstructured":"Bowen Gao, Bo Qiang, Haichuan Tan, Yinjun Jia, Minsi Ren, Minsi Lu, Jingjing Liu, Wei-Ying Ma, and Yanyan Lan. 2024. DrugCLIP: Contrasive protein-molecule representation learning for virtual screening. Adv. Neural Inf. Process. Syst. 36 (2024).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_71_2","article-title":"EpiGePT: A pretrained transformer model for epigenomics","author":"Gao Zijing","year":"2023","unstructured":"Zijing Gao, Qiao Liu, Wanwen Zeng, Wing H. Wong, and Rui Jiang. 2023. EpiGePT: A pretrained transformer model for epigenomics. Retrieved from https:\/\/bioRxiv:2023.07.15.549134","journal-title":"Retrieved from"},{"key":"e_1_3_1_72_2","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1351\/pac198456050595","article-title":"Nomenclature and symbolism for amino acids and peptides","volume":"56","author":"Bielka H.","year":"1984","unstructured":"H. Bielka, N. Sharon, and E. W. Australia. 1984. Nomenclature and symbolism for amino acids and peptides. Pure Appl. Chem. 56 (1984), 595\u2013624.","journal-title":"Pure Appl. Chem."},{"issue":"1","key":"e_1_3_1_73_2","first-page":"D1045\u2013D1053","article-title":"BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology","volume":"44","author":"Gilson Michael K.","year":"2016","unstructured":"Michael K. Gilson, Tiqing Liu, Michael Baitaluk, George Nicola, Linda Hwang, and Jenny Chong. 2016. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucl. Acids Res. 44, D1 (2016), D1045\u2013D1053.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458754"},{"key":"e_1_3_1_75_2","first-page":"18099","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"Gu Zhouhong","year":"2024","unstructured":"Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Jianchen Wang, Yixin Zhu, Sihang Jiang, Zhuozhi Xiong, Zihan Li, Weijie Wu et\u00a0al. 2024. Xiezhi: An ever-updating benchmark for holistic domain knowledge evaluation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 18099\u201318107."},{"issue":"9","key":"e_1_3_1_76_2","first-page":"2035","article-title":"Automated chemical reaction extraction from scientific literature","volume":"62","author":"Guo Jiang","year":"2021","unstructured":"Jiang Guo, A. Santiago Ibanez-Lopez, Hanyu Gao, Victor Quach, Connor W. Coley, Klavs F. Jensen, and Regina Barzilay. 2021. Automated chemical reaction extraction from scientific literature. J. Chem. Inf. Model. 62, 9 (2021), 2035\u20132045.","journal-title":"J. Chem. Inf. 
Model."},{"key":"e_1_3_1_77_2","article-title":"ProtDAT: A unified framework for protein sequence design from any protein text description","author":"Guo Xiao-Yu","year":"2024","unstructured":"Xiao-Yu Guo, Yi-Fan Li, Yuan Liu, Xiaoyong Pan, and Hong-Bin Shen. 2024. ProtDAT: A unified framework for protein sequence design from any protein text description. Retrieved from https:\/\/arXiv:2412.04069","journal-title":"Retrieved from"},{"issue":"2","key":"e_1_3_1_78_2","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.ygeno.2017.01.005","article-title":"Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis","volume":"109","author":"Guo Yan","year":"2017","unstructured":"Yan Guo, Yulin Dai, Hui Yu, Shilin Zhao, David C. Samuels, and Yu Shyr. 2017. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis. Genomics 109, 2 (2017), 83\u201390.","journal-title":"Genomics"},{"issue":"1","key":"e_1_3_1_79_2","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1038\/s41524-022-00784-w","article-title":"MatSciBERT: A materials domain language model for text mining and information extraction","volume":"8","author":"Gupta Tanishq","year":"2022","unstructured":"Tanishq Gupta, Mohd Zaki, N. M. Anoop Krishnan, and Mausam. 2022. MatSciBERT: A materials domain language model for text mining and information extraction. npj Comput. Mater. 8, 1 (2022), 102.","journal-title":"npj Comput. Mater."},{"key":"e_1_3_1_80_2","article-title":"Pre-training co-evolutionary protein representation via a pairwise masked language model","author":"He Liang","year":"2021","unstructured":"Liang He, Shizhuo Zhang, Lijun Wu, Huanhuan Xia, Fusong Ju, He Zhang, Siyuan Liu, Yingce Xia, Jianwei Zhu, Pan Deng et\u00a0al. 2021. Pre-training co-evolutionary protein representation via a pairwise masked language model. Retrieved from https:\/\/arxiv.org\/abs\/2110.15527","journal-title":"Retrieved from"},{"key":"e_1_3_1_81_2","article-title":"ProstT5: Bilingual language model for protein sequence and structure","author":"Heinzinger Michael","year":"2023","unstructured":"Michael Heinzinger, Konstantin Weissenow, Joaquin Gomez Sanchez, Adrian Henkel, Martin Steinegger, and Burkhard Rost. 2023. ProstT5: Bilingual language model for protein sequence and structure. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.07.23.550085","journal-title":"Retrieved from"},{"key":"e_1_3_1_82_2","article-title":"Measuring massive multitask language understanding","author":"Hendrycks Dan","year":"2020","unstructured":"Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2020. Measuring massive multitask language understanding. Retrieved from https:\/\/arxiv.org\/abs\/2009.03300","journal-title":"Retrieved from"},{"key":"e_1_3_1_83_2","article-title":"Rita: A study on scaling up generative protein sequence models","author":"Hesslow Daniel","year":"2022","unstructured":"Daniel Hesslow, Niccol\u00f3 Zanichelli, Pascal Notin, Iacopo Poli, and Debora Marks. 2022. Rita: A study on scaling up generative protein sequence models. 
Retrieved from https:\/\/arxiv.org\/abs\/2205.05789","journal-title":"Retrieved from"},{"issue":"5","key":"e_1_3_1_84_2","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1093\/bioinformatics\/16.5.484","article-title":"Viral genome DataBase: Storing and analyzing genes and proteins from complete viral genomes","volume":"16","author":"Hiscock David","year":"2000","unstructured":"David Hiscock and Chris Upton. 2000. Viral genome DataBase: Storing and analyzing genes and proteins from complete viral genomes. Bioinformatics 16, 5 (2000), 484\u2013485.","journal-title":"Bioinformatics"},{"key":"e_1_3_1_85_2","article-title":"Smiles transformer: Pre-trained molecular fingerprint for low data drug discovery","author":"Honda Shion","year":"2019","unstructured":"Shion Honda, Shoi Shi, and Hiroki R. Ueda. 2019. Smiles transformer: Pre-trained molecular fingerprint for low data drug discovery. Retrieved from https:\/\/arxiv.org\/abs\/1911.04738","journal-title":"Retrieved from"},{"key":"e_1_3_1_86_2","article-title":"The diminishing returns of masked language models to science","author":"Hong Zhi","year":"2022","unstructured":"Zhi Hong, Aswathy Ajith, Gregory Pauloski, Eamon Duede, Kyle Chard, and Ian Foster. 2022. The diminishing returns of masked language models to science. Retrieved from https:\/\/arxiv.org\/abs\/2205.11342","journal-title":"Retrieved from"},{"issue":"5","key":"e_1_3_1_87_2","doi-asserted-by":"crossref","first-page":"bbad307","DOI":"10.1093\/bib\/bbad307","article-title":"A systematic benchmark of machine learning methods for protein\u2013RNA interaction prediction","volume":"24","author":"Horlacher Marc","year":"2023","unstructured":"Marc Horlacher, Giulia Cantini, Julian Hesse, Patrick Schinke, Nicolas Goedert, Shubhankar Londhe, Lambert Moyon, and Annalisa Marsico. 2023. A systematic benchmark of machine learning methods for protein\u2013RNA interaction prediction. Brief. Bioinform. 24, 5 (2023), bbad307.","journal-title":"Brief. Bioinform."},{"key":"e_1_3_1_88_2","article-title":"Protein language models and structure prediction: Connection and progression","author":"Hu Bozhen","year":"2022","unstructured":"Bozhen Hu, Jun Xia, Jiangbin Zheng, Cheng Tan, Yufei Huang, Yongjie Xu, and Stan Z. Li. 2022. Protein language models and structure prediction: Connection and progression. Retrieved from https:\/\/arxiv.org\/abs\/2211.16742","journal-title":"Retrieved from"},{"key":"e_1_3_1_89_2","article-title":"Ogb-lsc: A large-scale challenge for machine learning on graphs","author":"Hu Weihua","year":"2021","unstructured":"Weihua Hu, Matthias Fey, Hongyu Ren, Maho Nakata, Yuxiao Dong, and Jure Leskovec. 2021. Ogb-lsc: A large-scale challenge for machine learning on graphs. Retrieved from https:\/\/arxiv.org\/abs\/2103.09430","journal-title":"Retrieved from"},{"key":"e_1_3_1_90_2","first-page":"22118","article-title":"Open graph benchmark: Datasets for machine learning on graphs","volume":"33","author":"Hu Weihua","year":"2020","unstructured":"Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. Adv. Neural Inf. Process. Syst. 33 (2020), 22118\u201322133.","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_3_1_91_2","article-title":"C-eval: A multi-level multi-discipline chinese evaluation suite for foundation models","volume":"36","author":"Huang Yuzhen","year":"2024","unstructured":"Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Yao Fu et\u00a0al. 2024. C-eval: A multi-level multi-discipline chinese evaluation suite for foundation models. Adv. Neural Inf. Process. Syst. 36 (2024).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1021\/ci3001277"},{"key":"e_1_3_1_93_2","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.0c00675"},{"key":"e_1_3_1_94_2","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac3ffb"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btab083"},{"issue":"01","key":"e_1_3_1_96_2","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1142\/S0219720008003278","article-title":"Protein structure\u2013structure alignment with discrete Fr\u00e9chet distance","volume":"6","author":"Jiang Minghui","year":"2008","unstructured":"Minghui Jiang, Ying Xu, and Binhai Zhu. 2008. Protein structure\u2013structure alignment with discrete Fr\u00e9chet distance. J. Bioinform. Comput. Biol. 6, 01 (2008), 51\u201364.","journal-title":"J. Bioinform. Comput. Biol."},{"issue":"1","key":"e_1_3_1_97_2","first-page":"1","article-title":"iDNA-ABF: Multi-scale deep biological language learning model for the interpretable prediction of DNA methylations","volume":"23","author":"Jin Junru","year":"2022","unstructured":"Junru Jin, Yingying Yu, Ruheng Wang, Xin Zeng, Chao Pang, Yi Jiang, Zhongshen Li, Yutong Dai, Ran Su, Quan Zou et\u00a0al. 2022. iDNA-ABF: Multi-scale deep biological language learning model for the interpretable prediction of DNA methylations. Genome Biol. 23, 1 (2022), 1\u201323.","journal-title":"Genome Biol."},{"key":"e_1_3_1_98_2","unstructured":"Mingyu Jin Haochen Xue Zhenting Wang Boming Kang Ruosong Ye Kaixiong Zhou Mengnan Du and Yongfeng Zhang. 2024. ProLLM: Protein chain-of-thoughts enhanced LLM for protein-protein interaction prediction. Retrieved from https:\/\/arxiv.org\/abs\/2405.06649"},{"key":"e_1_3_1_99_2","article-title":"Pubmedqa: A dataset for biomedical research question answering","author":"Jin Qiao","year":"2019","unstructured":"Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William W. Cohen, and Xinghua Lu. 2019. Pubmedqa: A dataset for biomedical research question answering. Retrieved from https:\/\/arxiv.org\/abs\/1909.06146","journal-title":"Retrieved from"},{"key":"e_1_3_1_100_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-021-03819-2"},{"key":"e_1_3_1_101_2","article-title":"Does your model understand genes? A benchmark of gene properties for biological and text models","author":"Kan-Tor Yoav","year":"2024","unstructured":"Yoav Kan-Tor, Michael Morris Danziger, Eden Zohar, Matan Ninio, and Yishai Shimoni. 2024. Does your model understand genes? A benchmark of gene properties for biological and text models. Retrieved from https:\/\/arxiv.org\/abs\/2412.04075","journal-title":"Retrieved from"},{"key":"e_1_3_1_102_2","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkg129"},{"key":"e_1_3_1_103_2","first-page":"817","volume-title":"Proceedings of the International Conference on Artificial Neural Networks","author":"Karpov Pavel","year":"2019","unstructured":"Pavel Karpov, Guillaume Godin, and Igor V. Tetko. 2019. A transformer model for retrosynthesis. 
In Proceedings of the International Conference on Artificial Neural Networks. Springer, 817\u2013830."},{"issue":"9","key":"e_1_3_1_104_2","doi-asserted-by":"crossref","first-page":"1436","DOI":"10.1002\/humu.23873","article-title":"CAGI5: Objective performance assessments of predictions based on the Evolutionary Action equation","volume":"40","author":"Katsonis Panagiotis","year":"2019","unstructured":"Panagiotis Katsonis and Olivier Lichtarge. 2019. CAGI5: Objective performance assessments of predictions based on the Evolutionary Action equation. Hum. Mutat. 40, 9 (2019), 1436\u20131454.","journal-title":"Hum. Mutat."},{"key":"e_1_3_1_105_2","first-page":"2","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT\u201919)","volume":"1","author":"Kenton Jacob Devlin Ming-Wei Chang","year":"2019","unstructured":"Jacob Devlin Ming-Wei Chang Kenton and Lee Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT\u201919), Vol. 1. Minneapolis, Minnesota, 2."},{"issue":"12","key":"e_1_3_1_106_2","doi-asserted-by":"crossref","first-page":"5804","DOI":"10.1021\/acs.jcim.1c01289","article-title":"Generative chemical transformer: Neural machine learning of molecular geometric structures from chemical language via attention","volume":"61","author":"Kim Hyunseung","year":"2021","unstructured":"Hyunseung Kim, Jonggeol Na, and Won Bo Lee. 2021. Generative chemical transformer: Neural machine learning of molecular geometric structures from chemical language via attention. J. Chem. Inf. Model. 61, 12 (2021), 5804\u20135814.","journal-title":"J. Chem. Inf. Model."},{"issue":"1","key":"e_1_3_1_107_2","first-page":"D1373\u2013D1380","article-title":"PubChem 2023 update","volume":"51","author":"Kim Sunghwan","year":"2023","unstructured":"Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A. Shoemaker, Paul A. Thiessen, Bo Yu et\u00a0al. 2023. PubChem 2023 update. Nucl. Acids Res. 51, D1 (2023), D1373\u2013D1380.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_108_2","doi-asserted-by":"publisher","DOI":"10.1207\/s15430421tip4104_2"},{"key":"e_1_3_1_109_2","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba947"},{"issue":"1","key":"e_1_3_1_110_2","first-page":"D1202\u2013D1210","article-title":"The arabidopsis information resource (TAIR): Improved gene annotation and new tools","volume":"40","author":"Lamesch Philippe","year":"2012","unstructured":"Philippe Lamesch, Tanya Z. Berardini, Donghui Li, David Swarbreck, Christopher Wilks, Rajkumar Sasidharan, Robert Muller, Kate Dreher, Debbie L. Alexander, Margarita Garcia-Hernandez et\u00a0al. 2012. The arabidopsis information resource (TAIR): Improved gene annotation and new tools. Nucl. Acids Res. 40, D1 (2012), D1202\u2013D1210.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_111_2","article-title":"Albert: A lite bert for self-supervised learning of language representations","author":"Lan Zhenzhong","year":"2019","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. 
Retrieved from https:\/\/arxiv.org\/abs\/1909.11942","journal-title":"Retrieved from"},{"key":"e_1_3_1_112_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btz682"},{"key":"e_1_3_1_113_2","article-title":"Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension","author":"Lewis M.","year":"2019","unstructured":"M. Lewis. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. Retrieved from https:\/\/arxiv.org\/abs\/1910.13461","journal-title":"Retrieved from"},{"key":"e_1_3_1_114_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539426"},{"issue":"22","key":"e_1_3_1_115_2","first-page":"e129\u2013e129","article-title":"BioSeq-BLM: A platform for analyzing DNA, RNA and protein sequences based on biological language models","volume":"49","author":"Li Hong-Liang","year":"2021","unstructured":"Hong-Liang Li, Yi-He Pang, and Bin Liu. 2021. BioSeq-BLM: A platform for analyzing DNA, RNA and protein sequences based on biological language models. Nucl. Acids Res. 49, 22 (2021), e129\u2013e129.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_116_2","first-page":"1","article-title":"Mol-BERT: An effective molecular representation with BERT for molecular property prediction","volume":"2021","author":"Li Juncai","year":"2021","unstructured":"Juncai Li and Xiaofei Jiang. 2021. Mol-BERT: An effective molecular representation with BERT for molecular property prediction. Wireless Commun. Mobile Comput. 2021 (2021), 1\u20137.","journal-title":"Wireless Commun. Mobile Comput."},{"key":"e_1_3_1_117_2","article-title":"BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. Retrieved from https:\/\/arxiv.org\/abs\/2301.12597","journal-title":"Retrieved from"},{"key":"e_1_3_1_118_2","article-title":"TOMG-Bench: Evaluating LLMs on text-based open molecule generation","author":"Li Jiatong","year":"2024","unstructured":"Jiatong Li, Junxian Li, Yunqing Liu, Dongzhan Zhou, and Qing Li. 2024. TOMG-Bench: Evaluating LLMs on text-based open molecule generation. Retrieved from https:\/\/arXiv:2412.14642","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_119_2","doi-asserted-by":"crossref","first-page":"vbad043","DOI":"10.1093\/bioadv\/vbad043","article-title":"iEnhancer-ELM: Improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models","volume":"3","author":"Li Jiahao","year":"2023","unstructured":"Jiahao Li, Zhourun Wu, Wenhao Lin, Jiawei Luo, Jun Zhang, Qingcai Chen, and Junjie Chen. 2023. iEnhancer-ELM: Improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models. Bioinform. Adv. 3, 1 (2023), vbad043.","journal-title":"Bioinform. Adv."},{"key":"e_1_3_1_120_2","article-title":"Druggpt: A gpt-based strategy for designing potential ligands targeting specific proteins","author":"Li Yuesen","year":"2023","unstructured":"Yuesen Li, Chengyi Gao, Xin Song, Xiangyu Wang, Yungang Xu, and Suxia Han. 2023. Druggpt: A gpt-based strategy for designing potential ligands targeting specific proteins. 
Retrieved from https:\/\/www.biorxiv.org\/content\/2023.06.29.543848","journal-title":"Retrieved from"},{"key":"e_1_3_1_121_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2023.107260"},{"key":"e_1_3_1_122_2","article-title":"DrugChat: Towards enabling ChatGPT-like capabilities on drug molecule graphs","author":"Liang Youwei","year":"2023","unstructured":"Youwei Liang, Ruiyi Zhang, Li Zhang, and Pengtao Xie. 2023. DrugChat: Towards enabling ChatGPT-like capabilities on drug molecule graphs. Retrieved from https:\/\/arxiv.org\/abs\/2309.03907","journal-title":"Retrieved from"},{"key":"e_1_3_1_123_2","first-page":"74","volume-title":"Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out. Association for Computational Linguistics, Barcelona, 74\u201381."},{"key":"e_1_3_1_124_2","doi-asserted-by":"publisher","unstructured":"Zeming Lin Halil Akin Roshan Rao Brian Hie Zhongkai Zhu Wenting Lu Allan dos Santos Costa Maryam Fazel-Zarandi Tom Sercu Sal Candido et\u00a0al. 2022. Language models of protein sequences at the scale of evolution enable accurate structure prediction. Retrieved from 10.1101\/2022.07.20.500902v1","DOI":"10.1101\/2022.07.20.500902v1"},{"key":"e_1_3_1_125_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2024.108073"},{"key":"e_1_3_1_126_2","article-title":"A text-guided protein design framework","author":"Liu Shengchao","year":"2023","unstructured":"Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao et\u00a0al. 2023. A text-guided protein design framework. Retrieved from https:\/\/arxiv.org\/abs\/2302.04611","journal-title":"Retrieved from"},{"key":"e_1_3_1_127_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-023-00759-6"},{"key":"e_1_3_1_128_2","article-title":"Pre-training molecular graph representation with 3D geometry","author":"Liu Shengchao","year":"2021","unstructured":"Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, and Jian Tang. 2021. Pre-training molecular graph representation with 3D geometry. Retrieved from https:\/\/arxiv.org\/abs\/2110.07728","journal-title":"Retrieved from"},{"key":"e_1_3_1_129_2","article-title":"Chatgpt-powered conversational drug editing using retrieval and domain feedback","author":"Liu Shengchao","year":"2023","unstructured":"Shengchao Liu, Jiongxiao Wang, Yijin Yang, Chengpeng Wang, Ling Liu, Hongyu Guo, and Chaowei Xiao. 2023. Chatgpt-powered conversational drug editing using retrieval and domain feedback. Retrieved from https:\/\/arxiv.org\/abs\/2305.18090","journal-title":"Retrieved from"},{"key":"e_1_3_1_130_2","article-title":"MolecularGPT: Open large language model (LLM) for few-shot molecular property prediction","author":"Liu Yuyan","year":"2024","unstructured":"Yuyan Liu, Sirui Ding, Sheng Zhou, Wenqi Fan, and Qiaoyu Tan. 2024. MolecularGPT: Open large language model (LLM) for few-shot molecular property prediction. Retrieved from https:\/\/arXiv:2406.12950","journal-title":"Retrieved from"},{"key":"e_1_3_1_131_2","article-title":"Roberta: A robustly optimized bert pretraining approach","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. 
Retrieved from https:\/\/arxiv.org\/abs\/1907.11692","journal-title":"Retrieved from"},{"key":"e_1_3_1_132_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmgm.2022.108344"},{"key":"e_1_3_1_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_1_134_2","article-title":"ProtT3: Protein-to-text generation for text-based protein understanding","author":"Liu Zhiyuan","year":"2024","unstructured":"Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, and Tat-Seng Chua. 2024. ProtT3: Protein-to-text generation for text-based protein understanding. Retrieved from https:\/\/arxiv.org\/abs\/2405.12564","journal-title":"Retrieved from"},{"key":"e_1_3_1_135_2","first-page":"1606","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics","author":"Liu Zequn","year":"2023","unstructured":"Zequn Liu, Wei Zhang, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Ming Zhang, and Tie-Yan Liu. 2023. MolXPT: Wrapping molecules with text for generative pre-training. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 1606\u20131616."},{"key":"e_1_3_1_136_2","article-title":"S2ORC: The semantic scholar open research corpus","author":"Lo Kyle","year":"2019","unstructured":"Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Dan S. Weld. 2019. S2ORC: The semantic scholar open research corpus. Retrieved from https:\/\/arxiv.org\/abs\/1911.02782","journal-title":"Retrieved from"},{"key":"e_1_3_1_137_2","volume-title":"Extraction of chemical structures and reactions from the literature","author":"Lowe Daniel Mark","year":"2012","unstructured":"Daniel Mark Lowe. 2012. Extraction of chemical structures and reactions from the literature. Ph.D. Dissertation, University of Cambridge."},{"key":"e_1_3_1_138_2","first-page":"2507","article-title":"Learn to explain: Multimodal reasoning via thought chains for science question answering","volume":"35","author":"Lu Pan","year":"2022","unstructured":"Pan Lu, Swaroop Mishra, Tanglin Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. 2022. Learn to explain: Multimodal reasoning via thought chains for science question answering. Adv. Neural Inf. Process. Syst. 35 (2022), 2507\u20132521.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_139_2","article-title":"MoleculeQA: A dataset to evaluate factual accuracy in molecular comprehension","author":"Lu Xingyu","year":"2024","unstructured":"Xingyu Lu, He Cao, Zijing Liu, Shengyuan Bai, Leqing Chen, Yuan Yao, Hai-Tao Zheng, and Yu Li. 2024. MoleculeQA: A dataset to evaluate factual accuracy in molecular comprehension. Retrieved from https:\/\/arxiv.org\/abs\/2403.08192","journal-title":"Retrieved from"},{"key":"e_1_3_1_140_2","doi-asserted-by":"publisher","unstructured":"Yen-Chun Lu Ashley Varghese Rahul Nahar Hao Chen Kunming Shao Xiaoping Bao and Can Li. 2024. scChat: A large language model-powered co-pilot for contextualized single-cell RNA sequencing analysis. Retrieved from 10.1101\/2024.10.01.616063v1","DOI":"10.1101\/2024.10.01.616063v1"},{"key":"e_1_3_1_141_2","first-page":"153","volume-title":"Proceedings of the International Conference on Intelligent Computing","author":"Luo Hanyu","year":"2022","unstructured":"Hanyu Luo, Cheng Chen, Wenyu Shan, Pingjian Ding, and Lingyun Luo. 2022. iEnhancer-BERT: A novel transfer learning architecture based on DNA-Language model for identifying enhancers and their strength. 
In Proceedings of the International Conference on Intelligent Computing. Springer, 153\u2013165."},{"issue":"1","key":"e_1_3_1_142_2","first-page":"32","article-title":"Improving language model of human genome for DNA\u2013protein binding prediction based on task-specific pre-training","volume":"15","author":"Luo Hanyu","year":"2023","unstructured":"Hanyu Luo, Wenyu Shan, Cheng Chen, Pingjian Ding, and Lingyun Luo. 2023. Improving language model of human genome for DNA\u2013protein binding prediction based on task-specific pre-training. Interdisc. Sci.: Comput. Life Sci. 15, 1 (2023), 32\u201343.","journal-title":"Interdisc. Sci.: Comput. Life Sci."},{"issue":"6","key":"e_1_3_1_143_2","doi-asserted-by":"crossref","first-page":"bbac409","DOI":"10.1093\/bib\/bbac409","article-title":"BioGPT: Generative pre-trained transformer for biomedical text generation and mining","volume":"23","author":"Luo Renqian","year":"2022","unstructured":"Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, and Tie-Yan Liu. 2022. BioGPT: Generative pre-trained transformer for biomedical text generation and mining. Brief. Bioinform. 23, 6 (2022), bbac409.","journal-title":"Brief. Bioinform."},{"key":"e_1_3_1_144_2","article-title":"Molfm: A multimodal molecular foundation model","author":"Luo Yizhen","year":"2023","unstructured":"Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, and Zaiqing Nie. 2023. Molfm: A multimodal molecular foundation model. Retrieved from https:\/\/arxiv.org\/abs\/2307.09484","journal-title":"Retrieved from"},{"key":"e_1_3_1_145_2","article-title":"Biomedgpt: Open multimodal generative pre-trained transformer for biomedicine","author":"Luo Yizhen","year":"2023","unstructured":"Yizhen Luo, Jiahuan Zhang, Siqi Fan, Kai Yang, Yushuai Wu, Mu Qiao, and Zaiqing Nie. 2023. Biomedgpt: Open multimodal generative pre-trained transformer for biomedicine. Retrieved from https:\/\/arxiv.org\/abs\/2308.09442","journal-title":"Retrieved from"},{"key":"e_1_3_1_146_2","article-title":"ProLLaMA: A protein large language model for multi-task protein language processing","author":"Lv Liuzhenghao","year":"2024","unstructured":"Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, and Yonghong Tian. 2024. ProLLaMA: A protein large language model for multi-task protein language processing. Retrieved from https:\/\/arxiv.org\/abs\/2402.16445","journal-title":"Retrieved from"},{"key":"e_1_3_1_147_2","article-title":"Retrieved sequence augmentation for protein representation learning","author":"Ma Chang","year":"2023","unstructured":"Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, and Lingpeng Kong. 2023. Retrieved sequence augmentation for protein representation learning. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.02.22.529597","journal-title":"Retrieved from"},{"key":"e_1_3_1_148_2","article-title":"Progen: Language modeling for protein generation","author":"Madani Ali","year":"2020","unstructured":"Ali Madani, Bryan McCann, Nikhil Naik, Nitish Shirish Keskar, Namrata Anand, Raphael R. Eguchi, Po-Ssu Huang, and Richard Socher. 2020. Progen: Language modeling for protein generation. 
Retrieved from https:\/\/arxiv.org\/abs\/2004.03497","journal-title":"Retrieved from"},{"key":"e_1_3_1_149_2","article-title":"Understanding the natural language of DNA using encoder-decoder foundation models with byte-level precision","author":"Malusare Aditya","year":"2023","unstructured":"Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia A. Lanman, and Vaneet Aggarwal. 2023. Understanding the natural language of DNA using encoder-decoder foundation models with byte-level precision. Retrieved from https:\/\/arxiv.org\/abs\/2311.02333","journal-title":"Retrieved from"},{"issue":"3","key":"e_1_3_1_150_2","doi-asserted-by":"crossref","first-page":"e17190","DOI":"10.1002\/aic.17190","article-title":"Predicting chemical reaction outcomes: A grammar ontology-based transformer framework","volume":"67","author":"Mann Vipul","year":"2021","unstructured":"Vipul Mann and Venkat Venkatasubramanian. 2021. Predicting chemical reaction outcomes: A grammar ontology-based transformer framework. AIChE J. 67, 3 (2021), e17190.","journal-title":"AIChE J."},{"key":"e_1_3_1_151_2","unstructured":"Jiashun Mao Jianmin Wang Kwang-Hwi Cho and Kyoung Tai No. 2023. iupacGPT: IUPAC-based large-scale molecular pre-trained model for property prediction and molecule generation. Retrieved from https:\/\/chemrxiv.org\/engage\/chemrxiv\/article-details\/645f49f9a32ceeff2d90c9ae"},{"key":"e_1_3_1_152_2","article-title":"Molecule attention transformer","author":"Maziarka \u0141ukasz","year":"2020","unstructured":"\u0141ukasz Maziarka, Tomasz Danel, S\u0142awomir Mucha, Krzysztof Rataj, Jacek Tabor, and Stanis\u0142aw Jastrz\u0119bski. 2020. Molecule attention transformer. Retrieved from https:\/\/arxiv.org\/abs\/2002.08264","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_153_2","doi-asserted-by":"crossref","first-page":"8799","DOI":"10.1038\/s41598-023-35648-w","article-title":"Molecule generation using transformers and policy gradient reinforcement learning","volume":"13","author":"Mazuz Eyal","year":"2023","unstructured":"Eyal Mazuz, Guy Shtar, Bracha Shapira, and Lior Rokach. 2023. Molecule generation using transformers and policy gradient reinforcement learning. Sci. Rep. 13, 1 (2023), 8799.","journal-title":"Sci. Rep."},{"issue":"1","key":"e_1_3_1_154_2","first-page":"D593\u2013D597","article-title":"ExplorEnz: The primary source of the IUBMB enzyme list","volume":"37","author":"McDonald Andrew G.","year":"2009","unstructured":"Andrew G. McDonald, Sin\u00e9ad Boyce, and Keith F. Tipton. 2009. ExplorEnz: The primary source of the IUBMB enzyme list. Nucl. Acids Res. 37, suppl_1 (2009), D593\u2013D597.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_155_2","first-page":"29287","article-title":"Language models enable zero-shot prediction of the effects of mutations on protein function","volume":"34","author":"Meier Joshua","year":"2021","unstructured":"Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alex Rives. 2021. Language models enable zero-shot prediction of the effects of mutations on protein function. Adv. Neural Inf. Process. Syst. 34 (2021), 29287\u201329303.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_156_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/15.3.219"},{"key":"e_1_3_1_157_2","volume-title":"Proceedings of the Machine Learning for Structural Biology Workshop. NeurIPS 2022","author":"Munsamy Geraldene","year":"2022","unstructured":"Geraldene Munsamy, Sebastian Lindner, Philipp Lorenz, and Noelia Ferruz. 2022. 
ZymCTRL: A conditional language model for the controllable generation of artificial enzymes. In Proceedings of the Machine Learning for Structural Biology Workshop. NeurIPS 2022."},{"key":"e_1_3_1_158_2","doi-asserted-by":"publisher","DOI":"10.1021\/jm300687e"},{"key":"e_1_3_1_159_2","doi-asserted-by":"publisher","unstructured":"Eric Nguyen Michael Poli Matthew G. Durrant Armin W. Thomas Brian Kang Jeremy Sullivan Madelena Y. Ng Ashley Lewis Aman Patel Aaron Lou et\u00a0al. 2024. Sequence modeling and design from molecular to genome scale with Evo. Retrieved from 10.1126\/science.ado9336","DOI":"10.1126\/science.ado9336"},{"key":"e_1_3_1_160_2","article-title":"Hyenadna: Long-range genomic sequence modeling at single nucleotide resolution","volume":"36","author":"Nguyen Eric","year":"2024","unstructured":"Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Michael Wornow, Callum Birch-Sykes, Stefano Massaroli, Aman Patel, Clayton Rabideau, Yoshua Bengio et\u00a0al. 2024. Hyenadna: Long-range genomic sequence modeling at single nucleotide resolution. Adv. Neural Inf. Process. Syst. 36 (2024).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_161_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cels.2023.10.002"},{"key":"e_1_3_1_162_2","first-page":"16990","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Notin Pascal","year":"2022","unstructured":"Pascal Notin, Mafalda Dias, Jonathan Frazer, Javier Marchena-Hurtado, Aidan N. Gomez, Debora Marks, and Yarin Gal. 2022. Tranception: Protein fitness prediction with autoregressive transformers and inference-time retrieval. In Proceedings of the International Conference on Machine Learning. PMLR, 16990\u201317017."},{"key":"e_1_3_1_163_2","article-title":"ProteinGym: Large-scale benchmarks for protein design and fitness prediction","author":"Notin Pascal","year":"2023","unstructured":"Pascal Notin, Aaron W. Kollasch, Daniel Ritter, Lood van Niekerk, Steffanie Paul, Hansen Spinner, Nathan Rollins, Ada Shaw, Ruben Weitzman, Jonathan Frazer et\u00a0al. 2023. ProteinGym: Large-scale benchmarks for protein design and fitness prediction. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.12.07.570727","journal-title":"Retrieved from"},{"key":"e_1_3_1_164_2","article-title":"ProteinNPT: Improving protein property prediction and design with non-parametric transformers","author":"Notin Pascal","year":"2023","unstructured":"Pascal Notin, Ruben Weitzman, Debora S. Marks, and Yarin Gal. 2023. ProteinNPT: Improving protein property prediction and design with non-parametric transformers. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.12.06.570473","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_165_2","first-page":"D678\u2013D689","article-title":"Introducing the bacterial and viral bioinformatics resource center (BV-BRC): A resource combining PATRIC, IRD and ViPR","volume":"51","author":"Olson Robert D.","year":"2023","unstructured":"Robert D. Olson, Rida Assaf, Thomas Brettin, Neal Conrad, Clark Cucinell, James J. Davis, Donald M. Dempsey, Allan Dickerman, Emily M. Dietrich, Ronald W. Kenyon et\u00a0al. 2023. Introducing the bacterial and viral bioinformatics resource center (BV-BRC): A resource combining PATRIC, IRD and ViPR. Nucl. Acids Res. 51, D1 (2023), D678\u2013D689.","journal-title":"Nucl. 
Acids Res."},{"key":"e_1_3_1_166_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray et\u00a0al. 2022. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35 (2022), 27730\u201327744.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_167_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311\u2013318."},{"key":"e_1_3_1_168_2","article-title":"Biot5+: Towards generalized biological understanding with iupac integration and multi-task tuning","author":"Pei Qizhi","year":"2024","unstructured":"Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, and Rui Yan. 2024. Biot5+: Towards generalized biological understanding with iupac integration and multi-task tuning. Retrieved from https:\/\/arxiv.org\/abs\/2402.17810","journal-title":"Retrieved from"},{"key":"e_1_3_1_169_2","article-title":"Biot5: Enriching cross-modal integration in biology with chemical knowledge and natural language associations","author":"Pei Qizhi","year":"2023","unstructured":"Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, and Rui Yan. 2023. Biot5: Enriching cross-modal integration in biology with chemical knowledge and natural language associations. Retrieved from https:\/\/arxiv.org\/abs\/2310.07276","journal-title":"Retrieved from"},{"key":"e_1_3_1_170_2","article-title":"An empirical study of multi-task learning on BERT for biomedical text mining","author":"Peng Yifan","year":"2020","unstructured":"Yifan Peng, Qingyu Chen, and Zhiyong Lu. 2020. An empirical study of multi-task learning on BERT for biomedical text mining. Retrieved from https:\/\/arxiv.org\/abs\/2005.02799","journal-title":"Retrieved from"},{"key":"e_1_3_1_171_2","doi-asserted-by":"publisher","DOI":"10.3389\/fphar.2020.565644"},{"key":"e_1_3_1_172_2","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever et\u00a0al. 2018. Improving language understanding by generative pre-training. https:\/\/gwern.net\/doc\/www\/s3-us-west-2.amazonaws.com\/d73fdc5ffa8627bce44dcda2fc012da638ffb158.pdf"},{"issue":"8","key":"e_1_3_1_173_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever et\u00a0al. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_1_174_2","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455856"},{"key":"e_1_3_1_175_2","article-title":"Few shot protein generation","author":"Ram Soumya","year":"2022","unstructured":"Soumya Ram and Tristan Bepler. 2022. Few shot protein generation. 
Retrieved from https:\/\/arxiv.org\/abs\/2204.01168","journal-title":"Retrieved from"},{"key":"e_1_3_1_176_2","article-title":"Evaluating protein transfer learning with TAPE","volume":"32","author":"Rao Roshan","year":"2019","unstructured":"Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Peter Chen, John Canny, Pieter Abbeel, and Yun Song. 2019. Evaluating protein transfer learning with TAPE. Adv. Neural Inf. Process. Syst. 32 (2019).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_177_2","first-page":"8844","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Rao Roshan M.","year":"2021","unstructured":"Roshan M. Rao, Jason Liu, Robert Verkuil, Joshua Meier, John Canny, Pieter Abbeel, Tom Sercu, and Alexander Rives. 2021. MSA transformer. In Proceedings of the International Conference on Machine Learning. PMLR, 8844\u20138856."},{"key":"e_1_3_1_178_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2016239118"},{"key":"e_1_3_1_179_2","first-page":"12559","article-title":"Self-supervised graph transformer on large-scale molecular data","volume":"33","author":"Rong Yu","year":"2020","unstructured":"Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. 2020. Self-supervised graph transformer on large-scale molecular data. Adv. Neural Inf. Process. Syst. 33 (2020), 12559\u201312571.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_180_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-022-00580-7"},{"key":"e_1_3_1_181_2","doi-asserted-by":"publisher","DOI":"10.1021\/ci300415d"},{"issue":"8","key":"e_1_3_1_182_2","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1038\/s41592-018-0087-y","article-title":"Sequence-based prediction of variants\u2019 effects","volume":"15","author":"Rusk Nicole","year":"2018","unstructured":"Nicole Rusk. 2018. Sequence-based prediction of variants\u2019 effects. Nature Methods 15, 8 (2018), 571\u2013571.","journal-title":"Nature Methods"},{"key":"e_1_3_1_183_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0031826"},{"key":"e_1_3_1_184_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-93417-4_38"},{"key":"e_1_3_1_185_2","article-title":"Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences","author":"Schmidinger Niklas","year":"2024","unstructured":"Niklas Schmidinger, Lisa Schneckenreiter, Philipp Seidl, Johannes Schimunek, Pieter-Jan Hoedt, Johannes Brandstetter, Andreas Mayr, Sohvi Luukkonen, Sepp Hochreiter, and G\u00fcnter Klambauer. 2024. Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences. Retrieved from https:\/\/arxiv.org\/abs\/2411.04165","journal-title":"Retrieved from"},{"key":"e_1_3_1_186_2","doi-asserted-by":"publisher","DOI":"10.1021\/acscentsci.9b00576"},{"key":"e_1_3_1_187_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-020-00284-w"},{"key":"e_1_3_1_188_2","first-page":"30458","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Seidl Philipp","year":"2023","unstructured":"Philipp Seidl, Andreu Vall, Sepp Hochreiter, and G\u00fcnter Klambauer. 2023. Enhancing activity prediction models in drug discovery with the ability to understand human language. In Proceedings of the International Conference on Machine Learning. 
PMLR, 30458\u201330490."},{"key":"e_1_3_1_189_2","doi-asserted-by":"publisher","DOI":"10.1145\/3624724"},{"issue":"1","key":"e_1_3_1_190_2","doi-asserted-by":"crossref","first-page":"9392","DOI":"10.1038\/s41467-024-53759-4","article-title":"A long-context language model for deciphering and generating bacteriophage genomes","volume":"15","author":"Shao Bin","year":"2024","unstructured":"Bin Shao and Jiawei Yan. 2024. A long-context language model for deciphering and generating bacteriophage genomes. Nature Commun. 15, 1 (2024), 9392.","journal-title":"Nature Commun."},{"key":"e_1_3_1_191_2","article-title":"A fine-tuning dataset and benchmark for large language models for protein understanding","author":"Shen Yiqing","year":"2024","unstructured":"Yiqing Shen, Zan Chen, Michail Mamalakis, Luhan He, Haiyang Xia, Tianbin Li, Yanzhou Su, Junjun He, and Yu Guang Wang. 2024. A fine-tuning dataset and benchmark for large language models for protein understanding. Retrieved from https:\/\/arxiv.org\/abs\/2406.05540","journal-title":"Retrieved from"},{"key":"e_1_3_1_192_2","article-title":"Toursynbio: A multi-modal large model and agent framework to bridge text and protein sequences for protein engineering","author":"Shen Yiqing","year":"2024","unstructured":"Yiqing Shen, Zan Chen, Michail Mamalakis, Yungeng Liu, Tianbin Li, Yanzhou Su, Junjun He, Pietro Li\u00f2, and Yu Guang Wang. 2024. Toursynbio: A multi-modal large model and agent framework to bridge text and protein sequences for protein engineering. Retrieved from https:\/\/arxiv.org\/abs\/2408.15299","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_193_2","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1038\/s41524-023-01003-w","article-title":"A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing","volume":"9","author":"Shetty Pranav","year":"2023","unstructured":"Pranav Shetty, Arunkumar Chitteth Rajan, Chris Kuenneth, Sonakshi Gupta, Lakshmi Prerana Panchumarti, Lauren Holm, Chao Zhang, and Rampi Ramprasad. 2023. A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing. npj Comput. Mater. 9, 1 (2023), 52.","journal-title":"npj Comput. Mater."},{"key":"e_1_3_1_194_2","first-page":"4700","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201920)","author":"Shin Hoo-Chang","year":"2020","unstructured":"Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, and Raghav Mani. 2020. BioMegatron: Larger biomedical domain language model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201920). 4700\u20134706."},{"key":"e_1_3_1_195_2","article-title":"MAMMAL\u2013molecular aligned multi-modal architecture and language","author":"Shoshan Yoel","year":"2024","unstructured":"Yoel Shoshan, Moshiko Raboh, Michal Ozery-Flato, Vadim Ratner, Alex Golts, Jeffrey K. Weber, Ella Barkan, Simona Rabinovici-Cohen, Sagi Polaczek, Ido Amos et\u00a0al. 2024. MAMMAL\u2013molecular aligned multi-modal architecture and language. Retrieved from https:\/\/arxiv.org\/abs\/2410.22367","journal-title":"Retrieved from"},{"key":"e_1_3_1_196_2","doi-asserted-by":"publisher","unstructured":"Richard W. Shuai Jeffrey A. Ruffolo and Jeffrey J. Gray. 2021. Generative language modeling for antibody design. 
Retrieved from 10.1101\/2021.12.13.472419v2","DOI":"10.1101\/2021.12.13.472419v2"},{"key":"e_1_3_1_197_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-019-0437-4"},{"key":"e_1_3_1_198_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-018-04964-5"},{"key":"e_1_3_1_199_2","doi-asserted-by":"crossref","first-page":"2324","DOI":"10.1021\/acs.jcim.5b00559","article-title":"ZINC 15\u2013ligand discovery for everyone","author":"Sterling Teague","year":"2015","unstructured":"Teague Sterling and John J. Irwin. 2015. ZINC 15\u2013ligand discovery for everyone. J. Chem. Inf. Model. (2015), 2324\u20132337.","journal-title":"J. Chem. Inf. Model."},{"issue":"5439","key":"e_1_3_1_200_2","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1126\/science.286.5439.455","article-title":"The mammalian gene collection","volume":"286","author":"Strausberg Robert L.","year":"1999","unstructured":"Robert L. Strausberg, Elise A. Feingold, Richard D. Klausner, and Francis S. Collins. 1999. The mammalian gene collection. Science 286, 5439 (1999), 455\u2013457.","journal-title":"Science"},{"key":"e_1_3_1_201_2","article-title":"A molecular multimodal foundation model associating molecule graphs with natural language","author":"Su Bing","year":"2022","unstructured":"Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, and Ji-Rong Wen. 2022. A molecular multimodal foundation model associating molecule graphs with natural language. Retrieved from https:\/\/arxiv.org\/abs\/2209.05481","journal-title":"Retrieved from"},{"key":"e_1_3_1_202_2","article-title":"Saprot: Protein language modeling with structure-aware vocabulary","author":"Su Jin","year":"2023","unstructured":"Jin Su, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, and Fajie Yuan. 2023. Saprot: Protein language modeling with structure-aware vocabulary. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.10.01.560349","journal-title":"Retrieved from"},{"key":"e_1_3_1_203_2","first-page":"1","article-title":"ExCAPE-DB: An integrated large scale dataset facilitating Big Data analysis in chemogenomics","volume":"9","author":"Sun Jiangming","year":"2017","unstructured":"Jiangming Sun, Nina Jeliazkova, Vladimir Chupakhin, Jose-Felipe Golib-Dzib, Ola Engkvist, Lars Carlsson, J\u00f6rg Wegner, Hugo Ceulemans, Ivan Georgiev, Vedrin Jeliazkov et\u00a0al. 2017. ExCAPE-DB: An integrated large scale dataset facilitating Big Data analysis in chemogenomics. J. Cheminform. 9 (2017), 1\u20139.","journal-title":"J. Cheminform."},{"key":"e_1_3_1_204_2","first-page":"19053","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"Sun Liangtai","year":"2024","unstructured":"Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, and Kai Yu. 2024. Scieval: A multi-level large language model evaluation benchmark for scientific research. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 19053\u201319061."},{"key":"e_1_3_1_205_2","article-title":"SciDFM: A large language model with mixture-of-experts for science","author":"Sun Liangtai","year":"2024","unstructured":"Liangtai Sun, Danyu Luo, Da Ma, Zihan Zhao, Baocai Chen, Zhennan Shen, Su Zhu, Lu Chen, Xin Chen, and Kai Yu. 2024. SciDFM: A large language model with mixture-of-experts for science. 
Retrieved from https:\/\/arxiv.org\/abs\/2409.18412","journal-title":"Retrieved from"},{"key":"e_1_3_1_206_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm098"},{"key":"e_1_3_1_207_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btu739"},{"key":"e_1_3_1_208_2","article-title":"Galactica: A large language model for science","author":"Taylor Ross","year":"2022","unstructured":"Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. 2022. Galactica: A large language model for science. Retrieved from https:\/\/arxiv.org\/abs\/2211.09085","journal-title":"Retrieved from"},{"key":"e_1_3_1_209_2","article-title":"Gemini: A family of highly capable multimodal models","author":"Team Gemini","year":"2023","unstructured":"Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth et\u00a0al. 2023. Gemini: A family of highly capable multimodal models. Retrieved from https:\/\/arxiv.org\/abs\/2312.11805","journal-title":"Retrieved from"},{"key":"e_1_3_1_210_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-020-19266-y"},{"issue":"7","key":"e_1_3_1_211_2","doi-asserted-by":"crossref","first-page":"1488","DOI":"10.1021\/acscentsci.3c00372","article-title":"Unbiasing retrosynthesis language models with disconnection prompts","volume":"9","author":"Thakkar Amol","year":"2023","unstructured":"Amol Thakkar, Alain C. Vaucher, Andrea Byekwaso, Philippe Schwaller, Alessandra Toniato, and Teodoro Laino. 2023. Unbiasing retrosynthesis language models with disconnection prompts. ACS Central Sci. 9, 7 (2023), 1488\u20131498.","journal-title":"ACS Central Sci."},{"issue":"2","key":"e_1_3_1_212_2","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1039\/D2DD00110A","article-title":"Enhancing diversity in language based models for single-step retrosynthesis","volume":"2","author":"Toniato Alessandra","year":"2023","unstructured":"Alessandra Toniato, Alain C. Vaucher, Philippe Schwaller, and Teodoro Laino. 2023. Enhancing diversity in language based models for single-step retrosynthesis. Dig. Discov. 2, 2 (2023), 489\u2013501.","journal-title":"Dig. Discov."},{"key":"e_1_3_1_213_2","article-title":"Llama: Open and efficient foundation language models","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar et\u00a0al. 2023. Llama: Open and efficient foundation language models. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971","journal-title":"Retrieved from"},{"issue":"4","key":"e_1_3_1_214_2","doi-asserted-by":"crossref","first-page":"3775","DOI":"10.3390\/ijms24043775","article-title":"Survey of protein sequence embedding models","volume":"24","author":"Tran Chau","year":"2023","unstructured":"Chau Tran, Siddharth Khadkikar, and Aleksey Porollo. 2023. Survey of protein sequence embedding models. Int. J. Mol. Sci. 24, 4 (2023), 3775.","journal-title":"Int. J. Mol. 
Sci."},{"key":"e_1_3_1_215_2","doi-asserted-by":"publisher","DOI":"10.3390\/ijms241511948"},{"key":"e_1_3_1_216_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-022-28857-w"},{"issue":"2","key":"e_1_3_1_217_2","first-page":"ii155\u2013ii161","article-title":"Exploiting pretrained biochemical language models for targeted drug design","volume":"38","author":"Uludo\u011fan G\u00f6k\u00e7e","year":"2022","unstructured":"G\u00f6k\u00e7e Uludo\u011fan, Elif Ozkirimli, Kutlu O. Ulgen, Nilg\u00fcn Karal\u0131, and Arzucan \u00d6zg\u00fcr. 2022. Exploiting pretrained biochemical language models for targeted drug design. Bioinformatics 38, Supplement_2 (2022), ii155\u2013ii161.","journal-title":"Bioinformatics"},{"key":"e_1_3_1_218_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-022-00457-9"},{"key":"e_1_3_1_219_2","doi-asserted-by":"publisher","unstructured":"Michel van Kempen Stephanie S. Kim Charlotte Tumescheit Milot Mirdita Cameron L. M. Gilchrist Johannes S\u00f6ding and Martin Steinegger. 2022. Foldseek: Fast and accurate protein structure search. Retrieved from 10.1101\/2022.02.07.479398v1","DOI":"10.1101\/2022.02.07.479398v1"},{"issue":"1","key":"e_1_3_1_220_2","first-page":"D439\u2013D444","article-title":"AlphaFold protein structure database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models","volume":"50","author":"Varadi Mihaly","year":"2021","unstructured":"Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon et\u00a0al. 2021. AlphaFold protein structure database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucl. Acids Res. 50, D1 (2021), D439\u2013D444.","journal-title":"Nucl. Acids Res."},{"issue":"10","key":"e_1_3_1_221_2","doi-asserted-by":"crossref","first-page":"101600","DOI":"10.1016\/j.xcrp.2023.101600","article-title":"Deciphering the protein landscape with ProtFlash: A lightweight language model","volume":"4","author":"Wang Lei","year":"2023","unstructured":"Lei Wang, Hui Zhang, Wei Xu, Zhidong Xue, and Yan Wang. 2023. Deciphering the protein landscape with ProtFlash: A lightweight language model. Cell Rep. Phys. Sci. 4, 10 (2023), 101600.","journal-title":"Cell Rep. Phys. Sci."},{"key":"e_1_3_1_222_2","doi-asserted-by":"publisher","DOI":"10.1021\/jm048957q"},{"key":"e_1_3_1_223_2","doi-asserted-by":"publisher","DOI":"10.1145\/3307339.3342186"},{"key":"e_1_3_1_224_2","unstructured":"Wenlu Wang Ye Wang Honggang Zhao and Simone Sciabola. 2022. A pre-trained conditional transformer for Target-specific De Novo Molecular Generation. https:\/\/tamucc-ir.tdl.org\/items\/d77bfc4e-a96b-4d4b-8216-0b5c97ca750f"},{"issue":"3","key":"e_1_3_1_225_2","doi-asserted-by":"crossref","first-page":"bbad093","DOI":"10.1093\/bib\/bbad093","article-title":"miProBERT: Identification of microRNA promoters based on the pre-trained model BERT","volume":"24","author":"Wang Xin","year":"2023","unstructured":"Xin Wang, Xin Gao, Guohua Wang, and Dan Li. 2023. miProBERT: Identification of microRNA promoters based on the pre-trained model BERT. Brief. Bioinform. 24, 3 (2023), bbad093.","journal-title":"Brief. Bioinform."},{"key":"e_1_3_1_226_2","article-title":"UNI-RNA: Universal pre-trained models revolutionize RNA research","author":"Wang Xi","year":"2023","unstructured":"Xi Wang, Ruichu Gu, Zhiyuan Chen, Yongge Li, Xiaohong Ji, Guolin Ke, and Han Wen. 2023. 
UNI-RNA: Universal pre-trained models revolutionize RNA research. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.07.11.548588","journal-title":"Retrieved from"},{"key":"e_1_3_1_227_2","doi-asserted-by":"publisher","DOI":"10.3390\/molecules28114430"},{"key":"e_1_3_1_228_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-022-10775-y"},{"key":"e_1_3_1_229_2","article-title":"BioBridge: Bridging biomedical foundation models via knowledge graph","author":"Wang Zifeng","year":"2023","unstructured":"Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N. Ioannidis, Huzefa Rangwala, and Rishita Anubhai. 2023. BioBridge: Bridging biomedical foundation models via knowledge graph. Retrieved from https:\/\/arxiv.org\/abs\/2310.03320","journal-title":"Retrieved from"},{"key":"e_1_3_1_230_2","article-title":"InstructProtein: Aligning human and protein language via knowledge instruction","author":"Wang Zeyuan","year":"2023","unstructured":"Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, and Huajun Chen. 2023. InstructProtein: Aligning human and protein language via knowledge instruction. Retrieved from https:\/\/arxiv.org\/abs\/2310.03269","journal-title":"Retrieved from"},{"key":"e_1_3_1_231_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Wang Zeyuan","year":"2023","unstructured":"Zeyuan Wang, Qiang Zhang, Shuang-Wei HU, Haoran Yu, Xurui Jin, Zhichen Gong, and Huajun Chen. 2023. Multi-level protein structure pre-training via prompt learning. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_1_232_2","doi-asserted-by":"publisher","DOI":"10.1021\/ci00057a005"},{"key":"e_1_3_1_233_2","article-title":"Crowdsourcing multiple choice science questions","author":"Welbl Johannes","year":"2017","unstructured":"Johannes Welbl, Nelson F. Liu, and Matt Gardner. 2017. Crowdsourcing multiple choice science questions. Retrieved from https:\/\/arxiv.org\/abs\/1707.06209","journal-title":"Retrieved from"},{"key":"e_1_3_1_234_2","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2016.18"},{"issue":"1","key":"e_1_3_1_235_2","first-page":"D1074\u2013D1082","article-title":"DrugBank 5.0: A major update to the DrugBank database for 2018","volume":"46","author":"Wishart David S.","year":"2018","unstructured":"David S. Wishart, Yannick D. Feunang, An C. Guo, Elvis J. Lo, Ana Marcu, Jason R. Grant, Tanvir Sajed, Daniel Johnson, Carin Li, Zinat Sayeeda et\u00a0al. 2018. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucl. Acids Res. 46, D1 (2018), D1074\u2013D1082.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_236_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i4.25662"},{"key":"e_1_3_1_237_2","doi-asserted-by":"publisher","DOI":"10.1101\/2024.05.14.594226v1"},{"key":"e_1_3_1_238_2","doi-asserted-by":"publisher","DOI":"10.1039\/C7SC02664A"},{"issue":"1","key":"e_1_3_1_239_2","first-page":"D520\u2013D528","article-title":"Protein data bank: The single global archive for 3D macromolecular structure data","volume":"47","author":"consortium wwPDB","year":"2018","unstructured":"wwPDB consortium. 2018. Protein data bank: The single global archive for 3D macromolecular structure data. Nucl. Acids Res. 47, D1 (2018), D520\u2013D528.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_240_2","article-title":"A systematic survey of chemical pre-trained models","author":"Xia Jun","year":"2022","unstructured":"Jun Xia, Yanqiao Zhu, Yuanqi Du, and Stan Z. Li. 
2022. A systematic survey of chemical pre-trained models. Retrieved from https:\/\/arxiv.org\/abs\/2210.16484","journal-title":"Retrieved from"},{"key":"e_1_3_1_241_2","volume-title":"Proceedings of the NeurIPS Workshop Foundation Models for Science: Progress, Opportunities, and Challenges","author":"Xiao Hongwang","unstructured":"Hongwang Xiao, Wenjun Lin, Hui Wang, Zheng Liu, and Qiwei Ye. [n.d.]. OPI: An open instruction dataset for adapting large language models to protein-related tasks. In Proceedings of the NeurIPS Workshop Foundation Models for Science: Progress, Opportunities, and Challenges."},{"key":"e_1_3_1_242_2","article-title":"Molbind: Multimodal alignment of language, molecules, and proteins","author":"Xiao Teng","year":"2024","unstructured":"Teng Xiao, Chao Cui, Huaisheng Zhu, and Vasant G. Honavar. 2024. Molbind: Multimodal alignment of language, molecules, and proteins. Retrieved from https:\/\/arxiv.org\/abs\/2403.08167","journal-title":"Retrieved from"},{"key":"e_1_3_1_243_2","first-page":"8717","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"36","author":"Xie Renchunzi","year":"2022","unstructured":"Renchunzi Xie, Hongxin Wei, Lei Feng, and Bo An. 2022. Gearnet: Stepwise dual learning for weakly supervised domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 8717\u20138725."},{"key":"e_1_3_1_244_2","article-title":"Darwin series: Domain specific large language models for natural science","author":"Xie Tong","year":"2023","unstructured":"Tong Xie, Yuwei Wan, Wei Huang, Zhenyu Yin, Yixuan Liu, Shaozhou Wang, Qingyuan Linghu, Chunyu Kit, Clara Grazian, Wenjie Zhang et\u00a0al. 2023. Darwin series: Domain specific large language models for natural science. Retrieved from https:\/\/arxiv.org\/abs\/2308.13565","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_245_2","first-page":"W5\u2013W14","article-title":"ADMETlab 2.0: An integrated online platform for accurate and comprehensive predictions of ADMET properties","volume":"49","author":"Xiong Guoli","year":"2021","unstructured":"Guoli Xiong, Zhenxing Wu, Jiacai Yi, Li Fu, Zhijiang Yang, Changyu Hsieh, Mingzhu Yin, Xiangxiang Zeng, Chengkun Wu, Aiping Lu et\u00a0al. 2021. ADMETlab 2.0: An integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucl. Acids Res. 49, W1 (2021), W5\u2013W14.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_246_2","first-page":"279","volume-title":"Proceedings of the International Conference on Research in Computational Molecular Biology","author":"Xu Hanwen","year":"2022","unstructured":"Hanwen Xu and Sheng Wang. 2022. ProTranslator: Zero-shot protein function prediction using textual description. In Proceedings of the International Conference on Research in Computational Molecular Biology. Springer, 279\u2013294."},{"issue":"1","key":"e_1_3_1_247_2","doi-asserted-by":"crossref","first-page":"738","DOI":"10.1038\/s41467-023-36476-2","article-title":"Multilingual translation for zero-shot biomedical classification using BioTranslator","volume":"14","author":"Xu Hanwen","year":"2023","unstructured":"Hanwen Xu, Addie Woicik, Hoifung Poon, Russ B. Altman, and Sheng Wang. 2023. Multilingual translation for zero-shot biomedical classification using BioTranslator. Nature Commun. 
14, 1 (2023), 738.","journal-title":"Nature Commun."},{"key":"e_1_3_1_248_2","first-page":"38749","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Xu Minghao","year":"2023","unstructured":"Minghao Xu, Xinyu Yuan, Santiago Miret, and Jian Tang. 2023. Protst: Multi-modality learning of protein sequences and biomedical texts. In Proceedings of the International Conference on Machine Learning. PMLR, 38749\u201338767."},{"key":"e_1_3_1_249_2","first-page":"35156","article-title":"Peer: A comprehensive and multi-task benchmark for protein sequence understanding","volume":"35","author":"Xu Minghao","year":"2022","unstructured":"Minghao Xu, Zuobai Zhang, Jiarui Lu, Zhaocheng Zhu, Yangtian Zhang, Ma Chang, Runcheng Liu, and Jian Tang. 2022. Peer: A comprehensive and multi-task benchmark for protein sequence understanding. Adv. Neural Inf. Process. Syst. 35 (2022), 35156\u201335173.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_250_2","article-title":"X-MOL: Large-scale pre-training for molecular understanding and diverse molecular analysis","author":"Xue Dongyu","year":"2020","unstructured":"Dongyu Xue, Han Zhang, Dongling Xiao, Yukang Gong, Guohui Chuai, Yu Sun, Hao Tian, Hua Wu, Yukun Li, and Qi Liu. 2020. X-MOL: Large-scale pre-training for molecular understanding and diverse molecular analysis. Retrieved from https:\/\/www.biorxiv.org\/content\/2020.12.23.424259","journal-title":"Retrieved from"},{"key":"e_1_3_1_251_2","article-title":"Baichuan 2: Open large-scale language models","author":"Yang Aiyuan","year":"2023","unstructured":"Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan et\u00a0al. 2023. Baichuan 2: Open large-scale language models. Retrieved from https:\/\/arxiv.org\/abs\/2309.10305","journal-title":"Retrieved from"},{"key":"e_1_3_1_252_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-022-00534-z"},{"key":"e_1_3_1_253_2","article-title":"Harnessing the power of llms in practice: A survey on chatgpt and beyond","author":"Yang Jingfeng","year":"2023","unstructured":"Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Bing Yin, and Xia Hu. 2023. Harnessing the power of llms in practice: A survey on chatgpt and beyond. Retrieved from https:\/\/arxiv.org\/abs\/2304.13712","journal-title":"Retrieved from"},{"issue":"1","key":"e_1_3_1_254_2","first-page":"D1096\u2013D1103","article-title":"BioLiP: A semi-manually curated database for biologically relevant ligand\u2013protein interactions","volume":"41","author":"Yang Jianyi","year":"2012","unstructured":"Jianyi Yang, Ambrish Roy, and Yang Zhang. 2012. BioLiP: A semi-manually curated database for biologically relevant ligand\u2013protein interactions. Nucl. Acids Res. 41, D1 (2012), D1096\u2013D1103.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_255_2","doi-asserted-by":"crossref","unstructured":"Meng Yang Haiping Huang Lichao Huang Nan Zhang Jihong Wu Huanming Yang and Feng Mu. 2021. LOGO a contextualized pre-trained language model of human genome flexibly adapts to various downstream tasks by fine-tuning. 
https:\/\/www.researchgate.net\/publication\/354105080_LOGO_a_contextualized_pre-trained_language_model_of_human_genome_flexibly_adapts_to_various_downstream_tasks_by_fine-tuning","DOI":"10.21203\/rs.3.rs-448927\/v1"},{"key":"e_1_3_1_256_2","article-title":"Linkbert: Pretraining language models with document links","author":"Yasunaga Michihiro","year":"2022","unstructured":"Michihiro Yasunaga, Jure Leskovec, and Percy Liang. 2022. Linkbert: Pretraining language models with document links. Retrieved from https:\/\/arxiv.org\/abs\/2203.15827","journal-title":"Retrieved from"},{"key":"e_1_3_1_257_2","article-title":"First place solution of KDD Cup 2021 & OGB large-scale challenge graph prediction track","author":"Ying Chengxuan","year":"2021","unstructured":"Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, and Di He. 2021. First place solution of KDD Cup 2021 & OGB large-scale challenge graph prediction track. Retrieved from https:\/\/arxiv.org\/abs\/2106.08279","journal-title":"Retrieved from"},{"key":"e_1_3_1_258_2","article-title":"Llasmol: Advancing large language models for chemistry with a large-scale, comprehensive, high-quality instruction tuning dataset","author":"Yu Botao","year":"2024","unstructured":"Botao Yu, Frazier N. Baker, Ziqi Chen, Xia Ning, and Huan Sun. 2024. Llasmol: Advancing large language models for chemistry with a large-scale, comprehensive, high-quality instruction tuning dataset. Retrieved from https:\/\/arxiv.org\/abs\/2402.09391","journal-title":"Retrieved from"},{"key":"e_1_3_1_259_2","article-title":"Selformer: Molecular representation learning via selfies language models","author":"Y\u00fcksel Atakan","year":"2023","unstructured":"Atakan Y\u00fcksel, Erva Ulusoy, Atabey \u00dcnl\u00fc, and Tunca Do\u011fan. 2023. Selformer: Molecular representation learning via selfies language models. Mach. Learn.: Sci. Technol. (2023).","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"e_1_3_1_260_2","first-page":"gkad1004","article-title":"The ChEMBL database in 2023: A drug discovery platform spanning multiple bioactivity data types and time periods","author":"Zdrazil Barbara","year":"2023","unstructured":"Barbara Zdrazil, Eloy Felix, Fiona Hunter, Emma J. Manners, James Blackshaw, Sybilla Corbett, Marleen de Veij, Harris Ioannidis, David Mendez Lopez, Juan F. Mosquera et\u00a0al. 2023. The ChEMBL database in 2023: A drug discovery platform spanning multiple bioactivity data types and time periods. Nucl. Acids Res. (2023), gkad1004.","journal-title":"Nucl. Acids Res."},{"key":"e_1_3_1_261_2","article-title":"Glm-130b: An open bilingual pre-trained model","author":"Zeng Aohan","year":"2022","unstructured":"Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia et\u00a0al. 2022. Glm-130b: An open bilingual pre-trained model. Retrieved from https:\/\/arxiv.org\/abs\/2210.02414","journal-title":"Retrieved from"},{"issue":"12","key":"e_1_3_1_262_2","first-page":"i121\u2013i127","article-title":"Convolutional neural network architectures for predicting DNA\u2013protein binding","volume":"32","author":"Zeng Haoyang","year":"2016","unstructured":"Haoyang Zeng, Matthew D. Edwards, Ge Liu, and David K. Gifford. 2016. Convolutional neural network architectures for predicting DNA\u2013protein binding. 
Bioinformatics 32, 12 (2016), i121\u2013i127.","journal-title":"Bioinformatics"},{"key":"e_1_3_1_263_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-022-28494-3"},{"key":"e_1_3_1_264_2","article-title":"Interactive molecular discovery with natural language","author":"Zeng Zheni","year":"2023","unstructured":"Zheni Zeng, Bangchen Yin, Shipeng Wang, Jiarui Liu, Cheng Yang, Haishen Yao, Xingzhi Sun, Maosong Sun, Guotong Xie, and Zhiyuan Liu. 2023. Interactive molecular discovery with natural language. Retrieved from https:\/\/arxiv.org\/abs\/2306.11976","journal-title":"Retrieved from"},{"key":"e_1_3_1_265_2","article-title":"Sciglm: Training scientific language models with self-reflective instruction annotation and tuning","author":"Zhang Dan","year":"2024","unstructured":"Dan Zhang, Ziniu Hu, Sining Zhoubian, Zhengxiao Du, Kaiyu Yang, Zihan Wang, Yisong Yue, Yuxiao Dong, and Jie Tang. 2024. Sciglm: Training scientific language models with self-reflective instruction annotation and tuning. Retrieved from https:\/\/arxiv.org\/abs\/2401.07950","journal-title":"Retrieved from"},{"key":"e_1_3_1_266_2","article-title":"Chemllm: A chemical large language model","author":"Zhang Di","year":"2024","unstructured":"Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Dongzhan Zhou et\u00a0al. 2024. Chemllm: A chemical large language model. Retrieved from https:\/\/arxiv.org\/abs\/2402.06852","journal-title":"Retrieved from"},{"key":"e_1_3_1_267_2","article-title":"DNAGPT: A generalized pretrained tool for multiple DNA sequence analysis tasks","author":"Zhang Daoan","year":"2023","unstructured":"Daoan Zhang, Weitong Zhang, Bing He, Jianguo Zhang, Chenchen Qin, and Jianhua Yao. 2023. DNAGPT: A generalized pretrained tool for multiple DNA sequence analysis tasks. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.07.11.548628","journal-title":"Retrieved from"},{"key":"e_1_3_1_268_2","article-title":"Enhancing the protein tertiary structure prediction by multiple sequence alignment generation","author":"Zhang Le","year":"2023","unstructured":"Le Zhang, Jiayang Chen, Tao Shen, Yu Li, and Siqi Sun. 2023. Enhancing the protein tertiary structure prediction by multiple sequence alignment generation. Retrieved from https:\/\/arxiv.org\/abs\/2306.01824","journal-title":"Retrieved from"},{"key":"e_1_3_1_269_2","article-title":"Ontoprotein: Protein pretraining with gene ontology embedding","author":"Zhang Ningyu","year":"2022","unstructured":"Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Jiazhang Lian, Qiang Zhang, and Huajun Chen. 2022. Ontoprotein: Protein pretraining with gene ontology embedding. Retrieved from https:\/\/arxiv.org\/abs\/2201.11147","journal-title":"Retrieved from"},{"key":"e_1_3_1_271_2","article-title":"OPT: Open pre-trained transformer language models","author":"Zhang Susan","year":"2022","unstructured":"Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin et\u00a0al. 2022. 
OPT: Open pre-trained transformer language models. Retrieved from https:\/\/arxiv.org\/abs\/2205.01068","journal-title":"Retrieved from"},{"key":"e_1_3_1_272_2","article-title":"BERTScore: Evaluating text generation with bert","author":"Zhang Tianyi","year":"2019","unstructured":"Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating text generation with bert. Retrieved from https:\/\/arxiv.org\/abs\/1904.09675","journal-title":"Retrieved from"},{"issue":"6","key":"e_1_3_1_273_2","doi-asserted-by":"crossref","first-page":"bbab152","DOI":"10.1093\/bib\/bbab152","article-title":"MG-BERT: Leveraging unsupervised atomic representation learning for molecular property prediction","volume":"22","author":"Zhang Xiao-Chen","year":"2021","unstructured":"Xiao-Chen Zhang, Cheng-Kun Wu, Zhi-Jiang Yang, Zhen-Xing Wu, Jia-Cai Yi, Chang-Yu Hsieh, Ting-Jun Hou, and Dong-Sheng Cao. 2021. MG-BERT: Leveraging unsupervised atomic representation learning for molecular property prediction. Brief. Bioinform. 22, 6 (2021), bbab152.","journal-title":"Brief. Bioinform."},{"key":"e_1_3_1_274_2","doi-asserted-by":"publisher","DOI":"10.34133\/research.0004"},{"key":"e_1_3_1_275_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2023.3283985"},{"key":"e_1_3_1_276_2","article-title":"Multiple sequence-alignment-based RNA language model and its application to structural inference","author":"Zhang Yikun","year":"2023","unstructured":"Yikun Zhang, Mei Lang, Jiuhong Jiang, Zhiqiang Gao, Fan Xu, Thomas Litfin, Ke Chen, Jaswinder Singh, Xiansong Huang, Guoli Song et\u00a0al. 2023. Multiple sequence-alignment-based RNA language model and its application to structural inference. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.03.15.532863","journal-title":"Retrieved from"},{"key":"e_1_3_1_277_2","article-title":"A systematic study of joint representation learning on protein sequences and structures","author":"Zhang Zuobai","year":"2023","unstructured":"Zuobai Zhang, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aur\u00e9lie Lozano, Payel Das, and Jian Tang. 2023. A systematic study of joint representation learning on protein sequences and structures. Retrieved from https:\/\/arxiv.org\/abs\/2303.06275","journal-title":"Retrieved from"},{"key":"e_1_3_1_278_2","article-title":"Enhancing protein language models with structure-based encoder and pre-training","author":"Zhang Zuobai","year":"2023","unstructured":"Zuobai Zhang, Minghao Xu, Vijil Chenthamarakshan, Aur\u00e9lie Lozano, Payel Das, and Jian Tang. 2023. Enhancing protein language models with structure-based encoder and pre-training. Retrieved from https:\/\/arxiv.org\/abs\/2303.06275","journal-title":"Retrieved from"},{"key":"e_1_3_1_279_2","doi-asserted-by":"crossref","unstructured":"Haiteng Zhao Shengchao Liu Chang Ma Hannan Xu Jie Fu Zhi-Hong Deng Lingpeng Kong and Qi Liu. 2023. GIMLET: A unified graph-text model for instruction-based molecule zero-shot learning. Retrieved from https:\/\/arxiv.org\/abs\/2306.13089","DOI":"10.1101\/2023.05.30.542904"},{"key":"e_1_3_1_280_2","article-title":"A survey of large language models","author":"Zhao Wayne Xin","year":"2023","unstructured":"Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong et\u00a0al. 2023. A survey of large language models. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.18223","journal-title":"Retrieved from"},{"key":"e_1_3_1_281_2","article-title":"Chemdfm: Dialogue foundation model for chemistry","author":"Zhao Zihan","year":"2024","unstructured":"Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen et\u00a0al. 2024. Chemdfm: Dialogue foundation model for chemistry. Retrieved from https:\/\/arxiv.org\/abs\/2401.14818","journal-title":"Retrieved from"},{"key":"e_1_3_1_282_2","first-page":"46595","article-title":"Judging llm-as-a-judge with mt-bench and chatbot arena","volume":"36","author":"Zheng Lianmin","year":"2023","unstructured":"Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing et\u00a0al. 2023. Judging llm-as-a-judge with mt-bench and chatbot arena. Adv. Neural Inf. Process. Syst. 36 (2023), 46595\u201346623.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_283_2","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.9b00949"},{"key":"e_1_3_1_284_2","article-title":"Structure-informed language models are protein designers","author":"Zheng Zaixiang","year":"2023","unstructured":"Zaixiang Zheng, Yifan Deng, Dongyu Xue, Yi Zhou, Fei Ye, and Quanquan Gu. 2023. Structure-informed language models are protein designers. Retrieved from https:\/\/www.biorxiv.org\/content\/2023.02.03.526917","journal-title":"Retrieved from"},{"key":"e_1_3_1_285_2","article-title":"Agieval: A human-centric benchmark for evaluating foundation models","author":"Zhong Wanjun","year":"2023","unstructured":"Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, and Nan Duan. 2023. Agieval: A human-centric benchmark for evaluating foundation models. Retrieved from https:\/\/arxiv.org\/abs\/2304.06364","journal-title":"Retrieved from"},{"key":"e_1_3_1_286_2","unstructured":"Gengmo Zhou Zhifeng Gao Qiankun Ding Hang Zheng Hongteng Xu Zhewei Wei Linfeng Zhang and Guolin Ke. 2023. Uni-Mol: A universal 3D molecular representation learning framework. Retrieved from https:\/\/chemrxiv.org\/engage\/chemrxiv\/article-details\/628e5b4d5d948517f5ce6d72"},{"key":"e_1_3_1_287_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Zhou Hong-Yu","year":"2023","unstructured":"Hong-Yu Zhou, Yunxiang Fu, Zhicheng Zhang, Bian Cheng, and Yizhou Yu. 2023. Protein representation learning via knowledge enhanced primary structure reasoning. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_1_288_2","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.3547"},{"key":"e_1_3_1_289_2","article-title":"Dnabert-2: Efficient foundation model and benchmark for multi-species genome","author":"Zhou Zhihan","year":"2023","unstructured":"Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, and Han Liu. 2023. Dnabert-2: Efficient foundation model and benchmark for multi-species genome. Retrieved from https:\/\/arxiv.org\/abs\/2306.15006","journal-title":"Retrieved from"},{"key":"e_1_3_1_290_2","article-title":"Learning over molecular conformer ensembles: Datasets and benchmarks","author":"Zhu Yanqiao","year":"2023","unstructured":"Yanqiao Zhu, Jeehyun Hwang, Keir Adams, Zhen Liu, Bozhao Nan, Brock Stenfors, Yuanqi Du, Jatin Chauhan, Olaf Wiest, Olexandr Isayev et\u00a0al. 2023. Learning over molecular conformer ensembles: Datasets and benchmarks. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.00115","journal-title":"Retrieved from"},{"key":"e_1_3_1_291_2","doi-asserted-by":"publisher","DOI":"10.1039\/D1RA03086H"},{"key":"e_1_3_1_292_2","article-title":"ProtLLM: An interleaved protein-language LLM with protein-as-word pre-training","author":"Zhuo Le","year":"2024","unstructured":"Le Zhuo, Zewen Chi, Minghao Xu, Heyan Huang, Heqi Zheng, Conghui He, Xian-Ling Mao, and Wentao Zhang. 2024. ProtLLM: An interleaved protein-language LLM with protein-as-word pre-training. Retrieved from https:\/\/arxiv.org\/abs\/2403.07920","journal-title":"Retrieved from"},{"key":"e_1_3_1_293_2","doi-asserted-by":"publisher","DOI":"10.1177\/10943420231201154"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715318","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715318","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:18Z","timestamp":1750295898000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715318"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,10]]},"references-count":292,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3715318"],"URL":"https:\/\/doi.org\/10.1145\/3715318","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,10]]},"assertion":[{"value":"2024-02-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}