{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T00:35:04Z","timestamp":1774398904415,"version":"3.50.1"},"reference-count":120,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T00:00:00Z","timestamp":1753401600000},"content-version":"vor","delay-in-days":24,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Large language models (LLMs), representing a breakthrough advancement in artificial intelligence, have demonstrated substantial application value and development potential in bioinformatics research, particularly showing significant progress in the processing and analysis of complex biological data. This comprehensive review systematically examines the development and applications of LLMs in bioinformatics, with particular emphasis on their advancements in protein and nucleic acid structure prediction, omics analysis, drug design and screening, and biomedical literature mining. This work highlights the distinctive capabilities of LLMs in end-to-end learning and knowledge transfer paradigms. Additionally, this paper thoroughly discusses the major challenges confronting LLMs in current applications, including key issues such as model interpretability and data bias. Furthermore, this review comprehensively explores the potential of LLMs in cross-modal learning and interdisciplinary development. 
In conclusion, this paper aims to systematically summarize the current research status of LLMs in bioinformatics, objectively evaluate their advantages and limitations, and provide insights and recommendations for future research directions, thereby positioning LLMs as essential tools in bioinformatics research and fostering innovative developments in the biomedical field.<\/jats:p>","DOI":"10.1093\/bib\/bbaf357","type":"journal-article","created":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T04:44:14Z","timestamp":1753418654000},"source":"Crossref","is-referenced-by-count":20,"title":["Bridging artificial intelligence and biological sciences: a comprehensive review of large language models in bioinformatics"],"prefix":"10.1093","volume":"26","author":[{"given":"Anqi","family":"Lin","sequence":"first","affiliation":[{"name":"Donghai County People\u2019s Hospital (Affiliated Kangda College of Nanjing Medical University); Department of Oncology, Zhujiang Hospital, Southern Medical University , Lianyungang 222000 ,","place":["China"]}]},{"given":"Junpu","family":"Ye","sequence":"additional","affiliation":[{"name":"Donghai County People\u2019s Hospital (Affiliated Kangda College of Nanjing Medical University); Department of Oncology, Zhujiang Hospital, Southern Medical University , Lianyungang 222000 ,","place":["China"]}]},{"given":"Chang","family":"Qi","sequence":"additional","affiliation":[{"name":"Institute of Logic and Computation, Vienna University of Technology , Vienna ,","place":["Austria"]}]},{"given":"Lingxuan","family":"Zhu","sequence":"additional","affiliation":[{"name":"Department of Oncology, Zhujiang Hospital, Southern Medical University , Guangzhou 510282 ,","place":["China"]}]},{"given":"Weiming","family":"Mou","sequence":"additional","affiliation":[{"name":"Department of Oncology, Zhujiang Hospital, Southern Medical University , Guangzhou 510282 ,","place":["China"]},{"name":"Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong 
University School of Medicine , Shanghai ,","place":["China"]}]},{"given":"Wenyi","family":"Gan","sequence":"additional","affiliation":[{"name":"Department of Joint Surgery and Sports Medicine, Zhuhai People\u2019s Hospital (Zhuhai hospital affiliated with Jinan University) , Guangdong ,","place":["China"]}]},{"given":"Dongqiang","family":"Zeng","sequence":"additional","affiliation":[{"name":"Department of Oncology , Nanfang Hospital, Southern Medical University, Guangzhou, 510515,","place":["China"]},{"name":"Cancer Center , The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, Foshan, 528000,","place":["China"]}]},{"given":"Bufu","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Radiation Oncology, Zhongshan Hospital Affiliated to Fudan University , Shanghai ,","place":["China"]}]},{"given":"Mingjia","family":"Xiao","sequence":"additional","affiliation":[{"name":"Hepatobiliary Surgery Department, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People\u2019s Hospital ,","place":["China"]}]},{"given":"Guangdi","family":"Chu","sequence":"additional","affiliation":[{"name":"Department of Urology, The Affiliated Hospital of Qingdao University , Qingdao ,","place":["China"]}]},{"given":"Shengkun","family":"Peng","sequence":"additional","affiliation":[{"name":"Department of Radiology, Sichuan Provincial People\u2019s Hospital, University of Electronic Science and Technology of China , Chengdu, 610072 ,","place":["China"]}]},{"given":"Hank Z H","family":"Wong","sequence":"additional","affiliation":[{"name":"Li Ka Shing Faculty of Medicine, The University of Hong Kong , Hong Kong SAR ,","place":["China"]}]},{"given":"Lin","family":"Zhang","sequence":"additional","affiliation":[{"name":"The School of Public Health and Preventive Medicine, Monash University , Melbourne, VIC 3000 ,","place":["Australia"]},{"name":"Suzhou Industrial Park Monash Research Institute of Science and Technology , 
Suzhou, Jiangsu 215000 ,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4438-8348","authenticated-orcid":false,"given":"Hengguo","family":"Zhang","sequence":"additional","affiliation":[{"name":"College & Hospital of Stomatology, Anhui Medical University, Key Laboratory of Oral Diseases Research of Anhui Province , Hefei, 230032 ,","place":["China"]}]},{"given":"Xinpei","family":"Deng","sequence":"additional","affiliation":[{"name":"Department of Urology, State Key Laboratory of Oncology in Southern China, Sun Yat-sen University Cancer Center, Guangdong Provincial Clinical Research Center for Cancer , Guangzhou, 510060 ,","place":["China"]}]},{"given":"Kailai","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Oncology, Zhujiang Hospital, Southern Medical University , Guangzhou 510282 ,","place":["China"]}]},{"given":"Jian","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Oncology, Zhujiang Hospital, Southern Medical University , Guangzhou 510282 ,","place":["China"]}]},{"given":"Aimin","family":"Jiang","sequence":"additional","affiliation":[{"name":"Department of Urology, Changhai Hospital, Naval Medical University (Second Military Medical University) , Shanghai ,","place":["China"]}]},{"given":"Zhengrui","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Oral and Cranio-Maxillofacial Surgery, Shanghai Ninth People\u2019s Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Oral Diseases, Shanghai Key Laboratory of Stomatology and Shanghai Research Institute of Stomatology , Shanghai 200011 ,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8215-2045","authenticated-orcid":false,"given":"Peng","family":"Luo","sequence":"additional","affiliation":[{"name":"Donghai County People\u2019s Hospital (Affiliated Kangda College of Nanjing Medical University); Department of Oncology, Zhujiang 
Hospital, Southern Medical University , Lianyungang 222000 ,","place":["China"]},{"name":"Department of Microbiology, State Key Laboratory of Emerging Infectious Diseases, Carol Yu Centre for Infection, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong , Hong Kong SAR 999077 ,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,7,25]]},"reference":[{"key":"2025072500440683300_ref1","doi-asserted-by":"publisher","first-page":"383","DOI":"10.2165\/00129785-200404060-00005","article-title":"Biomedical literature mining: challenges and solutions in the \u2018omics\u2019 era","volume":"4","author":"Chaussabel","year":"2004","journal-title":"Am J Pharmacogenomics"},{"key":"2025072500440683300_ref2","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1002\/mdr2.70000","article-title":"GseaVis: an R package for enhanced visualization of gene set enrichment analysis in biomedicine","volume":"1","author":"Zhang","journal-title":"Med Research"},{"key":"2025072500440683300_ref3","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1038\/s41746-025-01435-2","article-title":"Computational frameworks transform antagonism to synergy in optimizing combination therapies","volume":"8","author":"Chen","year":"2025","journal-title":"NPJ Digit Med"},{"key":"2025072500440683300_ref4","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1038\/s41576-019-0122-6","article-title":"Deep learning: new computational modelling techniques for genomics","volume":"20","author":"Eraslan","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2025072500440683300_ref5","doi-asserted-by":"publisher","first-page":"e52700","DOI":"10.2196\/52700","article-title":"ChatGPT and medicine: together we embrace the AI renaissance","volume":"5","author":"Hacking","year":"2024","journal-title":"JMIR Bioinform Biotechnol"},{"key":"2025072500440683300_ref6","doi-asserted-by":"publisher","DOI":"10.1002\/imo2.7","article-title":"STAGER 
checklist: Standardized testing and assessment guidelines for evaluating generative artificial intelligence reliability","volume-title":"iMetaOmics","author":"Chen"},{"key":"2025072500440683300_ref7","doi-asserted-by":"publisher","first-page":"4547","DOI":"10.1097\/JS9.0000000000001583","article-title":"Advancing generative artificial intelligence in medicine: recommendations for standardized evaluation","volume":"110","author":"Lin","year":"2024","journal-title":"Int J Surg"},{"key":"2025072500440683300_ref8","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1186\/s12859-022-04688-w","article-title":"Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT","volume":"23","author":"Naseem","year":"2022","journal-title":"BMC Bioinformatics"},{"key":"2025072500440683300_ref9","doi-asserted-by":"publisher","first-page":"108796","DOI":"10.1016\/j.compbiomed.2024.108796","article-title":"GPT-4 as a biomedical simulator","volume":"178","author":"Schaefer","year":"2024","journal-title":"Comput Biol Med"},{"key":"2025072500440683300_ref10","doi-asserted-by":"publisher","first-page":"105","DOI":"10.15302\/J-QB-023-0327","article-title":"Empowering beginners in bioinformatics with ChatGPT","volume":"11","author":"Shue","year":"2023","journal-title":"bioRxiv"},{"key":"2025072500440683300_ref11","doi-asserted-by":"publisher","first-page":"1422","DOI":"10.1038\/s41592-024-02354-y","article-title":"Language models for biological research: a primer","volume":"21","author":"Simon","year":"2024","journal-title":"Nat Methods"},{"key":"2025072500440683300_ref12","doi-asserted-by":"publisher","first-page":"4096","DOI":"10.1097\/JS9.0000000000001359","article-title":"Step into the era of large multimodal models: a pilot study on ChatGPT-4V(ision)\u2019s ability to interpret radiological images","volume":"110","author":"Zhu","year":"2024","journal-title":"Int J Surg"},{"key":"2025072500440683300_ref13","article-title":"Large language models in 
bioinformatics: applications and perspectives","author":"Liu","year":"2024","journal-title":"ArXiv"},{"key":"2025072500440683300_ref14","doi-asserted-by":"publisher","DOI":"10.3390\/ijms24043775","article-title":"Survey of protein sequence embedding models","volume":"24","author":"Tran","year":"2023","journal-title":"Int J Mol Sci"},{"key":"2025072500440683300_ref15","doi-asserted-by":"publisher","first-page":"9622","DOI":"10.1038\/s41467-024-54011-9","article-title":"In silico formulation optimization and particle engineering of pharmaceutical products using a generative artificial intelligence structure synthesis method","volume":"15","author":"Hornick","year":"2024","journal-title":"Nat Commun"},{"key":"2025072500440683300_ref16","doi-asserted-by":"crossref","first-page":"674296","DOI":"10.1155\/2015\/674296","article-title":"Survey of natural language processing techniques in bioinformatics","volume":"2015","author":"Zeng","year":"2015","journal-title":"Comput Math Methods Med"},{"key":"2025072500440683300_ref17","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.91415.3","article-title":"Sensitive remote homology search by local alignment of small positional embeddings from protein language models","volume":"12","author":"Johnson","year":"2024","journal-title":"Elife"},{"key":"2025072500440683300_ref18","doi-asserted-by":"publisher","first-page":"723","DOI":"10.1186\/s12859-019-3220-8","article-title":"Modeling aspects of the language of life through transfer-learning protein sequences","volume":"20","author":"Heinzinger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2025072500440683300_ref19","doi-asserted-by":"publisher","first-page":"114416","DOI":"10.1016\/j.ab.2021.114416","article-title":"Identification of efflux proteins based on contextual representations with deep bidirectional transformer encoders","volume":"633","author":"Taju","year":"2021","journal-title":"Anal 
Biochem"},{"key":"2025072500440683300_ref20","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbab005","article-title":"A transformer architecture based on BERT and 2D convolutional neural network to identify DNA enhancers from sequence information","volume":"22","author":"Le","year":"2021","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref21","doi-asserted-by":"publisher","first-page":"2556","DOI":"10.1093\/bioinformatics\/btab133","article-title":"BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides","volume":"37","author":"Charoenkwan","year":"2021","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref22","doi-asserted-by":"publisher","first-page":"648","DOI":"10.1093\/bioinformatics\/btab712","article-title":"BERT-Kcr: prediction of lysine crotonylation sites by a transfer learning method with pre-trained BERT models","volume":"38","author":"Qiao","year":"2022","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref23","doi-asserted-by":"publisher","first-page":"2102","DOI":"10.1093\/bioinformatics\/btac020","article-title":"ProteinBERT: a universal deep-learning model of protein sequence and function","volume":"38","author":"Brandes","year":"2022","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref24","doi-asserted-by":"crossref","DOI":"10.1101\/2022.02.27.481241","article-title":"EpiBERTope: a sequence-based pre-trained BERT model improves linear and structural epitope prediction by learning long-distance protein interactions effectively","author":"Park","year":"2022"},{"key":"2025072500440683300_ref25","doi-asserted-by":"publisher","first-page":"1099","DOI":"10.1038\/s41587-022-01618-2","article-title":"Large language models generate functional protein sequences across diverse families","volume":"41","author":"Madani","year":"2023","journal-title":"Nat 
Biotechnol"},{"key":"2025072500440683300_ref26","doi-asserted-by":"publisher","first-page":"968","DOI":"10.1016\/j.cels.2023.10.002","article-title":"ProGen2: exploring the boundaries of protein language models","volume":"14","author":"Nijkamp","year":"2023","journal-title":"Cell Syst"},{"key":"2025072500440683300_ref27","doi-asserted-by":"publisher","first-page":"4348","DOI":"10.1038\/s41467-022-32007-7","article-title":"ProtGPT2 is a deep unsupervised language model for protein design","volume":"13","author":"Ferruz","year":"2022","journal-title":"Nat Commun"},{"key":"2025072500440683300_ref28","doi-asserted-by":"publisher","first-page":"6699","DOI":"10.1038\/s41467-024-51071-9","article-title":"Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model","volume":"15","author":"Shrestha","year":"2024","journal-title":"Nat Commun"},{"key":"2025072500440683300_ref29","doi-asserted-by":"publisher","first-page":"4961","DOI":"10.1016\/j.csbj.2021.08.044","article-title":"EMCBOW-GPCR: a method for identifying G-protein coupled receptors based on word embedding and wordbooks","volume":"19","author":"Qiu","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2025072500440683300_ref30","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbac599","article-title":"Alignment-free estimation of sequence conservation for identifying functional sites using protein sequence embeddings","volume":"24","author":"Yeung","year":"2023","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref31","doi-asserted-by":"publisher","first-page":"12","DOI":"10.7554\/eLife.79854","article-title":"Generative power of a protein language model trained on multiple sequence alignments","volume":"12","author":"Sgarbossa","year":"2023","journal-title":"Elife"},{"key":"2025072500440683300_ref32","doi-asserted-by":"publisher","first-page":"778","DOI":"10.1007\/s11684-024-1085-3","article-title":"Artificial intelligence methods available for cancer 
research","volume":"18","author":"Murmu","year":"2024","journal-title":"Front Med"},{"key":"2025072500440683300_ref33","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1007\/s11103-021-01204-1","article-title":"Using k-mer embeddings learned from a skip-gram based neural network for building a cross-species DNA N6-methyladenine site prediction model","volume":"107","author":"Nguyen","year":"2021","journal-title":"Plant Mol Biol"},{"key":"2025072500440683300_ref34","doi-asserted-by":"publisher","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome","volume":"37","author":"Ji","year":"2021","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref35","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2306.15006","article-title":"DNABERT-2: efficient foundation model and benchmark for multi-species genome","author":"Zhou","year":"2023"},{"key":"2025072500440683300_ref36","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2402.08777","article-title":"DNABERT-S: pioneering species differentiation with species-aware DNA embeddings","author":"Zhou","year":"2024"},{"key":"2025072500440683300_ref37","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbae157","article-title":"DSNetax: a deep learning species annotation method based on a deep-shallow parallel framework","volume":"25","author":"Zhao","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref38","volume-title":"2023 IEEE Signal Processing in Medicine and Biology Symposium (SPMB)","author":"Refahi","year":"2023"},{"key":"2025072500440683300_ref39","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2306.15794","article-title":"HyenaDNA: long-range genomic sequence modeling at single nucleotide 
resolution","author":"Nguyen","year":"2023"},{"key":"2025072500440683300_ref40","doi-asserted-by":"publisher","first-page":"1183","DOI":"10.1093\/bioinformatics\/btab815","article-title":"Identification, semantic annotation and comparison of combinations of functional elements in multiple biological conditions","volume":"38","author":"Leone","year":"2022","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref41","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2307.05628","article-title":"DNAGPT: a generalized pre-trained tool for versatile DNA sequence analysis tasks","author":"Zhang","year":"2023"},{"key":"2025072500440683300_ref42","doi-asserted-by":"publisher","first-page":"lqac012","DOI":"10.1093\/nargab\/lqac012","article-title":"Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning","volume":"4","author":"Akiyama","year":"2022","journal-title":"NAR Genom Bioinform"},{"key":"2025072500440683300_ref43","doi-asserted-by":"publisher","first-page":"5120","DOI":"10.1093\/bioinformatics\/btaa647","article-title":"GeoBoost2: a natural language processing pipeline for GenBank metadata enrichment for virus phylogeography","volume":"36","author":"Magge","year":"2020","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref44","doi-asserted-by":"publisher","DOI":"10.3390\/biomedicines11051323","article-title":"Classification of highly divergent viruses from DNA\/RNA sequence using transformer-based models","volume":"11","author":"Sadad","year":"2023","journal-title":"Biomedicines"},{"key":"2025072500440683300_ref45","first-page":"580","article-title":"Enabling integrative genomic analysis of high-impact human diseases through text mining","author":"Dudley","year":"2008","journal-title":"Pac Symp Biocomput"},{"key":"2025072500440683300_ref46","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1038\/s42256-024-00855-1","article-title":"TopoFormer: multiscale topology-enabled 
structure-to-sequence transformer for protein-ligand interaction predictions","volume":"6","author":"Chen","year":"2024","journal-title":"Res Sq"},{"key":"2025072500440683300_ref47","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1016\/j.tibs.2022.11.001","article-title":"Novel machine learning approaches revolutionize protein knowledge","volume":"48","author":"Bordin","year":"2023","journal-title":"Trends Biochem Sci"},{"key":"2025072500440683300_ref48","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025072500440683300_ref49","doi-asserted-by":"publisher","first-page":"e4780","DOI":"10.1002\/pro.4780","article-title":"Zero-shot mutation effect prediction on protein stability and function using RoseTTAFold","volume":"32","author":"Mansoor","year":"2023","journal-title":"Protein Sci"},{"key":"2025072500440683300_ref50","doi-asserted-by":"crossref","DOI":"10.1101\/2022.04.10.487779","article-title":"Learning inverse folding from millions of predicted structures","author":"Hsu","year":"2022"},{"key":"2025072500440683300_ref51","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025072500440683300_ref52","doi-asserted-by":"publisher","first-page":"1617","DOI":"10.1038\/s41587-022-01432-w","article-title":"Single-sequence protein structure prediction using a language model and deep learning","volume":"40","author":"Chowdhury","year":"2022","journal-title":"Nat 
Biotechnol"},{"key":"2025072500440683300_ref53","doi-asserted-by":"publisher","first-page":"e4497","DOI":"10.1002\/pro.4497","article-title":"BepiPred-3.0: improved B-cell epitope prediction using protein language models","volume":"31","author":"Clifford","year":"2022","journal-title":"Protein Sci"},{"key":"2025072500440683300_ref54","doi-asserted-by":"publisher","first-page":"2705","DOI":"10.1093\/bioinformatics\/btac188","article-title":"TransPPMP: predicting pathogenicity of frameshift and non-sense mutations by a transformer based on protein features","volume":"38","author":"Nie","year":"2022","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref55","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btac723","article-title":"Improving protein structure prediction using templates and sequence embedding","volume":"39","author":"Wu","year":"2023","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref56","doi-asserted-by":"publisher","first-page":"e0220182","DOI":"10.1371\/journal.pone.0220182","article-title":"rawMSA: end-to-end deep learning using raw multiple sequence alignments","volume":"14","author":"Mirabello","year":"2019","journal-title":"PloS One"},{"key":"2025072500440683300_ref57","doi-asserted-by":"publisher","first-page":"1653","DOI":"10.1093\/bioinformatics\/bti165","article-title":"Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction","volume":"21","author":"Santos","year":"2005","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref58","doi-asserted-by":"publisher","first-page":"W598","DOI":"10.1093\/nar\/gkac426","article-title":"PRECOGx: exploring GPCR signaling mechanisms with deep protein representations","volume":"50","author":"Matic","year":"2022","journal-title":"Nucleic Acids 
Res"},{"key":"2025072500440683300_ref59","doi-asserted-by":"publisher","first-page":"494","DOI":"10.1021\/acs.jproteome.3c00364","article-title":"AraPathogen2.0: an improved prediction of plant-pathogen protein-protein interactions empowered by the natural language processing technique","volume":"23","author":"Lei","year":"2024","journal-title":"J Proteome Res"},{"key":"2025072500440683300_ref60","article-title":"GeneBERT: BERT for predicting differential gene expression from histone modifications","author":"Ruan","year":"2021"},{"key":"2025072500440683300_ref61","doi-asserted-by":"publisher","first-page":"vbac023","DOI":"10.1093\/bioadv\/vbac023","article-title":"Prediction of RNA-protein interactions using a nucleotide language model","volume":"2","author":"Yamada","year":"2022","journal-title":"Bioinform Adv"},{"key":"2025072500440683300_ref62","article-title":"TCR-BERT: a deep learning approach for t-cell receptor sequence analysis","volume-title":"International Conference on Learning Representations (ICLR) 2021","author":"Wu","year":"2021"},{"key":"2025072500440683300_ref63","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbac173","article-title":"HLAB: learning the BiLSTM features from the ProtBert-encoded proteins for the class I HLA-peptide binding prediction","volume":"23","author":"Zhang","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref64","doi-asserted-by":"publisher","first-page":"4603","DOI":"10.1093\/bioinformatics\/btab677","article-title":"iDNA-ABT: advanced deep learning model for detecting DNA methylation with adaptive features and transductive information maximization","volume":"37","author":"Yu","year":"2021","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref65","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1093\/gigascience\/giad054","article-title":"MuLan-methyl-multiple transformer-based language models for accurate DNA methylation 
prediction","volume":"12","author":"Zeng","year":"2022","journal-title":"Gigascience"},{"key":"2025072500440683300_ref66","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbae195","article-title":"BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning","volume":"25","author":"Wang","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref67","volume-title":"Intelligent Computing Theories and Application: 14th International Conference, ICIC 2018, Wuhan, China, August 15\u201318, 2018, Proceedings, Part II. Vol. 10955","author":"Huang","year":"2018"},{"key":"2025072500440683300_ref68","doi-asserted-by":"publisher","first-page":"852","DOI":"10.1038\/s42256-022-00534-z","article-title":"scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data","volume":"4","author":"Yang","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2025072500440683300_ref69","doi-asserted-by":"publisher","first-page":"1470","DOI":"10.1038\/s41592-024-02201-0","article-title":"scGPT: toward building a foundation model for single-cell multi-omics using generative AI","volume":"21","author":"Cui","year":"2024","journal-title":"Nat Methods"},{"key":"2025072500440683300_ref70","doi-asserted-by":"crossref","DOI":"10.1101\/2023.12.07.569910","article-title":"scelmo: Embeddings from language models are good learners for single-cell data analysis","volume-title":"bioRxiv (Cold Spring Harbor Laboratory)","author":"Liu"},{"key":"2025072500440683300_ref71","article-title":"Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials","volume-title":"ArXiv","author":"Zheng"},{"key":"2025072500440683300_ref72","article-title":"GP-GPT: Large Language Model for Gene-Phenotype 
Mapping","volume-title":"ArXiv","author":"Lyu"},{"key":"2025072500440683300_ref73","doi-asserted-by":"crossref","DOI":"10.1016\/j.eswa.2023.120047","article-title":"DeepGene transformer: Transformer for the gene expression-based classification of cancer subtypes","volume-title":"Expert Systems With Applications","author":"Khan"},{"key":"2025072500440683300_ref74","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2407.09089","article-title":"Lomics: generation of pathways and gene sets using large language models for transcriptomic analysis","author":"Wong","year":"2024"},{"key":"2025072500440683300_ref75","doi-asserted-by":"publisher","article-title":"Iterative prompt refinement for mining gene relationships from ChatGPT","author":"Chen","DOI":"10.1101\/2023.12.23.573201"},{"key":"2025072500440683300_ref76","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1186\/s12918-016-0352-6","article-title":"Identifying the missing proteins in human proteome by biological language model","volume":"10","author":"Dong","year":"2016","journal-title":"BMC Syst Biol"},{"key":"2025072500440683300_ref77","doi-asserted-by":"publisher","first-page":"752","DOI":"10.1021\/acs.jafc.3c07143","article-title":"pLM4Alg: protein language model-based predictors for allergenic proteins and peptides","volume":"72","author":"Du","year":"2024","journal-title":"J Agric Food Chem"},{"key":"2025072500440683300_ref78","doi-asserted-by":"publisher","DOI":"10.1128\/msystems.01004-23","article-title":"POOE: predicting oomycete effectors based on a pre-trained large protein language model","volume":"9","author":"Zhao","year":"2024","journal-title":"mSystems"},{"key":"2025072500440683300_ref79","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2024.108114","article-title":"Mtx-COBRA: subcellular localization prediction for bacterial proteins","volume":"171","author":"Arora","year":"2024","journal-title":"Comput Biol 
Med"},{"key":"2025072500440683300_ref80","doi-asserted-by":"publisher","first-page":"998","DOI":"10.1002\/prot.26694","article-title":"Exploiting protein language models for the precise classification of ion channels and ion transporters","volume":"92","author":"Ghazikhani","year":"2024","journal-title":"Proteins"},{"key":"2025072500440683300_ref81","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbae404","article-title":"PLM_Sol: predicting protein solubility by benchmarking multiple protein language models with the updated Escherichia coli protein solubility dataset","volume":"25","author":"Zhang","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref82","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbac131","article-title":"Knowledge-based BERT: a method to extract molecular features like computational chemists","volume":"23","author":"Wu","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref83","article-title":"Dropedge: towards deep graph convolutional networks on node classification","author":"Rong","year":"2019"},{"key":"2025072500440683300_ref84","article-title":"ChemBERTa: large-scale self-supervised pretraining for molecular property prediction","author":"Chithrananda","year":"2020"},{"key":"2025072500440683300_ref85","doi-asserted-by":"publisher","first-page":"2064","DOI":"10.1021\/acs.jcim.1c00600","article-title":"MolGPT: molecular generation using a transformer-decoder model","volume":"62","author":"Bagal","year":"2022","journal-title":"J Chem Inf Model"},{"key":"2025072500440683300_ref86","doi-asserted-by":"publisher","first-page":"13930","DOI":"10.1038\/s41598-024-64585-5","article-title":"Multi role ChatGPT framework for transforming medical data analysis","volume":"14","author":"Chen","year":"2024","journal-title":"Sci Rep"},{"key":"2025072500440683300_ref87","doi-asserted-by":"publisher","DOI":"10.3389\/fgene.2022.859188","article-title":"DTI-BERT: identifying drug-target interactions 
in cellular networking based on BERT and deep learning method","volume":"13","author":"Zheng","year":"2022","journal-title":"Front Genet"},{"key":"2025072500440683300_ref88","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbad226","article-title":"PharmBERT: a domain-specific BERT model for drug labels","volume":"24","author":"ValizadehAslani","year":"2023","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref89","doi-asserted-by":"publisher","first-page":"2706","DOI":"10.1021\/acsomega.1c05203","article-title":"TransDTI: transformer-based language models for estimating DTIs and building a drug recommendation workflow","volume":"7","author":"Kalakoti","year":"2022","journal-title":"ACS Omega"},{"key":"2025072500440683300_ref90","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1016\/j.jbi.2018.09.015","article-title":"PISTON: predicting drug indications and side effects using topic modeling and natural language processing","volume":"87","author":"Jang","year":"2018","journal-title":"J Biomed Inform"},{"key":"2025072500440683300_ref91","article-title":"Foundational model aided automatic high-throughput drug screening using self-controlled cohort study","author":"Xu","year":"2024","journal-title":"medRxiv"},{"key":"2025072500440683300_ref92","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1186\/s13326-016-0060-6","article-title":"MicrO: an ontology of phenotypic and metabolic characters, assays, and culture media found in prokaryotic taxonomic descriptions","volume":"7","author":"Blank","year":"2016","journal-title":"J Biomed Semantics"},{"key":"2025072500440683300_ref93","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref94","volume-title":"Proceedings of the 2nd Clinical Natural 
Language Processing Workshop","author":"Alsentzer","year":"2019"},{"key":"2025072500440683300_ref95","volume-title":"The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks","author":"Park","year":"2023"},{"key":"2025072500440683300_ref96","doi-asserted-by":"publisher","first-page":"1844","DOI":"10.1093\/jamia\/ocae029","article-title":"BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights","volume":"31","author":"Remy","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"2025072500440683300_ref97","doi-asserted-by":"publisher","first-page":"1973","DOI":"10.1093\/bioinformatics\/btz807","article-title":"GenCLiP 3: mining human genes\u2019 functions and regulatory networks from PubMed based on co-occurrences and natural language processing","volume":"36","author":"Wang","year":"2019","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref98","doi-asserted-by":"publisher","first-page":"404","DOI":"10.1093\/bioinformatics\/btaa721","article-title":"LBERT: lexically aware transformer-based bidirectional encoder representation model for learning universal bio-entity relations","volume":"37","author":"Warikoo","year":"2021","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref99","doi-asserted-by":"publisher","first-page":"vbac035","DOI":"10.1093\/bioadv\/vbac035","article-title":"MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction","volume":"2","author":"Gu","year":"2022","journal-title":"Bioinform Adv"},{"key":"2025072500440683300_ref100","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btae500","article-title":"GENEVIC: GENetic data exploration and visualization via intelligent interactive 
console","volume":"40","author":"Nath","year":"2024","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref101","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btae196","article-title":"Effect of tokenization on transformers for biological sequences","volume":"40","author":"Dotan","year":"2024","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref102","doi-asserted-by":"publisher","first-page":"5678","DOI":"10.1093\/bioinformatics\/btaa1087","article-title":"BERT-GT: cross-sentence n-ary relation extraction with BERT and graph transformer","volume":"36","author":"Lai","year":"2021","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref103","doi-asserted-by":"crossref","first-page":"1331233","DOI":"10.3389\/fmicb.2023.1331233","article-title":"ProkBERT family: genomic language models for microbiome applications","volume":"14","author":"Ligeti","year":"2023","journal-title":"Front Microbiol"},{"key":"2025072500440683300_ref104","doi-asserted-by":"publisher","first-page":"1648","DOI":"10.1093\/bioinformatics\/btac001","article-title":"STonKGs: a sophisticated transformer trained on biomedical text and knowledge graphs","volume":"38","author":"Balabin","year":"2022","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref105","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1038\/s41698-024-00576-z","article-title":"Scientific figures interpreted by ChatGPT: strengths in plot recognition and limits in color perception","volume":"8","author":"Wang","year":"2024","journal-title":"NPJ Precis Oncol"},{"key":"2025072500440683300_ref106","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2401.14656","article-title":"Scientific large language models: a survey on biological & chemical domains","author":"Zhang","year":"2024"},{"key":"2025072500440683300_ref107","doi-asserted-by":"publisher","first-page":"2219","DOI":"10.1093\/bib\/bbaa054","article-title":"Biomedical named entity recognition and linking 
datasets: survey and our recent development","volume":"21","author":"Huang","year":"2020","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref108","doi-asserted-by":"publisher","first-page":"e0003522","DOI":"10.1128\/msystems.00035-22","article-title":"Mapping data to deep understanding: making the most of the deluge of SARS-CoV-2 genome sequences","volume":"7","author":"Sokhansanj","year":"2022","journal-title":"mSystems"},{"key":"2025072500440683300_ref109","doi-asserted-by":"publisher","first-page":"2190","DOI":"10.1016\/j.ajhg.2024.08.010","article-title":"Assessing the utility of large language models for phenotype-driven gene prioritization in rare genetic disorder diagnosis","volume":"111","author":"Kim","year":"2024","journal-title":"Am J Hum Genet"},{"key":"2025072500440683300_ref110","doi-asserted-by":"publisher","first-page":"500","DOI":"10.1186\/s12859-021-04421-z","article-title":"Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes","volume":"22","author":"Pourreza Shahri","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2025072500440683300_ref111","doi-asserted-by":"publisher","first-page":"4087","DOI":"10.1093\/bioinformatics\/bty449","article-title":"Transfer learning for biomedical named entity recognition with neural networks","volume":"34","author":"Giorgi","year":"2018","journal-title":"Bioinformatics"},{"key":"2025072500440683300_ref112","doi-asserted-by":"publisher","first-page":"vbad001","DOI":"10.1093\/bioadv\/vbad001","article-title":"Applications of transformer-based language models in bioinformatics: a survey","volume":"3","author":"Zhang","year":"2023","journal-title":"Bioinform Adv"},{"key":"2025072500440683300_ref113","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2201.07338","article-title":"Controllable protein design with language 
models","author":"Ferruz","year":"2022"},{"key":"2025072500440683300_ref114","doi-asserted-by":"publisher","first-page":"754","DOI":"10.1007\/s10439-023-03324-9","article-title":"Code interpreter for bioinformatics: are we there yet?","volume":"52","author":"Wang","year":"2024","journal-title":"Ann Biomed Eng"},{"key":"2025072500440683300_ref115","doi-asserted-by":"publisher","first-page":"e2400005","DOI":"10.1002\/pmic.202400005","article-title":"Open-source large language models in action: a bioinformatics chatbot for PRIDE database","volume":"24","author":"Bai","year":"2024","journal-title":"Proteomics"},{"key":"2025072500440683300_ref116","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1186\/s13040-024-00414-9","article-title":"Open challenges and opportunities in federated foundation models towards biomedical healthcare","volume":"18","author":"Li","year":"2025","journal-title":"BioData Min"},{"key":"2025072500440683300_ref117","doi-asserted-by":"publisher","DOI":"10.3390\/jcm14051605","article-title":"Shaping the future of healthcare: ethical clinical challenges and pathways to trustworthy AI","volume":"14","author":"Goktas","year":"2025","journal-title":"J Clin Med"},{"key":"2025072500440683300_ref118","doi-asserted-by":"publisher","first-page":"2471001","DOI":"10.1142\/S021972002471001X","article-title":"How much can ChatGPT really help computational biologists in programming?","volume":"22","author":"Rahman","year":"2024","journal-title":"J Bioinform Comput Biol"},{"key":"2025072500440683300_ref119","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbac232","article-title":"Transfer learning in proteins: evaluating novel protein learned representations for bioinformatics tasks","volume":"23","author":"Fenoy","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025072500440683300_ref120","doi-asserted-by":"publisher","first-page":"i266","DOI":"10.1093\/bioinformatics\/btae230","article-title":"BioCoder: a benchmark for bioinformatics code 
generation with large language models","volume":"40","author":"Tang","year":"2024","journal-title":"Bioinformatics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf357\/63842851\/bbaf357.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf357\/63842851\/bbaf357.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T04:44:18Z","timestamp":1753418658000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf357\/8212018"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":120,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf357","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7]]},"article-number":"bbaf357"}}