{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T14:02:53Z","timestamp":1763992973990,"version":"3.45.0"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T00:00:00Z","timestamp":1763942400000},"content-version":"vor","delay-in-days":23,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01HG012572","R01DA063316"],"award-info":[{"award-number":["R01HG012572","R01DA063316"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The single-cell sequencing revolution enables simultaneous molecular profiling of various modalities across thousands of individual cells, allowing scientists to investigate the diverse functions of complex tissues. Among all the analysis steps, assigning individual cells to specific types is fundamental for understanding cellular heterogeneity. However, this process is labor-intensive and requires extensive expert knowledge. Recent advances in large language models (LLMs) have demonstrated their ability to automatically extract biological knowledge, such as marker genes, promoting efficient and automated cell-type annotation. To evaluate the capability of modern LLMs in automating the cell-type identification process, we first introduce an automated cell-type annotation method with a comprehensive benchmark: Single-cell Omics Arena. 
Specifically, we began by compiling 11 publicly available single-cell RNA sequencing (scRNA-seq) datasets and evaluating eight LLMs across 1226 cell-type annotation-related tasks. This effort established a foundation for automated cell-type annotation from scRNA-seq data using interpretable features such as gene names. Building upon this benchmark, we introduced domain-specific chain-of-thought prompting techniques to enhance the accuracy of cell-type annotation and facilitate the extraction of relevant biological insights. Finally, to accommodate non-interpretable features, we proposed to leverage a pretrained VAE-based cross-modality translation module to convert features such as epigenetic marks into interpretable representations, which enables the seamless extension of LLM-based cell-type annotation to non-RNA-based sequencing technologies. In summary, our benchmark provides key insights into automated cell-type annotation from scRNA-seq data and demonstrates the potential of cross-modality translation for handling non-interpretable features.<\/jats:p>","DOI":"10.1093\/bib\/bbaf622","type":"journal-article","created":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T13:57:48Z","timestamp":1763992668000},"source":"Crossref","is-referenced-by-count":0,"title":["Single-cell omics arena: evaluation of large language models for automatic cell-type annotations on single-cell omics data via RNA-seq bridging"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-3694-7799","authenticated-orcid":false,"given":"Junhao","family":"Liu","sequence":"first","affiliation":[{"name":"Department of Computer Science , University of California, Irvine, 6210 Donald Bren Hall, Irvine, CA 92697,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2828-3706","authenticated-orcid":false,"given":"Siwei","family":"Xu","sequence":"additional","affiliation":[{"name":"Department of 
Computer Science , University of California, Irvine, 6210 Donald Bren Hall, Irvine, CA 92697,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1497-2444","authenticated-orcid":false,"given":"Yongxian","family":"Wu","sequence":"additional","affiliation":[{"name":"Department of Chemical and Biomolecular Engineering , University of California, Irvine, 5200 Engineering Hall, Irvine, CA 92697,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5970-0509","authenticated-orcid":false,"given":"Jing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science , University of California, Irvine, 6210 Donald Bren Hall, Irvine, CA 92697,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,11,24]]},"reference":[{"key":"2025112408574334400_ref1","doi-asserted-by":"publisher","first-page":"494","DOI":"10.1038\/s41576-023-00580-2","article-title":"Methods and applications for single-cell and spatial multi-omics","volume":"24","author":"Vandereyken","year":"2023","journal-title":"Nat Rev Genet"},{"key":"2025112408574334400_ref2","doi-asserted-by":"publisher","first-page":"695","DOI":"10.1038\/s41580-023-00615-w","article-title":"The technological landscape and applications of single-cell multi-omics","volume":"24","author":"Baysoy","year":"2023","journal-title":"Nat Rev Mol Cell Biol"},{"key":"2025112408574334400_ref3","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1038\/s41576-019-0093-7","article-title":"Integrative single-cell analysis","volume":"20","author":"Stuart","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2025112408574334400_ref4","doi-asserted-by":"publisher","first-page":"1007","DOI":"10.1016\/j.tibtech.2020.02.013","article-title":"Integrative methods and practical challenges for 
single-cell multi-omics","volume":"38","author":"Ma","year":"2020","journal-title":"Trends Biotechnol"},{"key":"2025112408574334400_ref5","doi-asserted-by":"publisher","first-page":"e2023070118","DOI":"10.1073\/pnas.2023070118","article-title":"Babel enables cross-modality translation between multiomic profiles at single-cell resolution","volume":"118","author":"Wu","year":"2021","journal-title":"Proc Natl Acad Sci"},{"key":"2025112408574334400_ref6","doi-asserted-by":"publisher","first-page":"1479","DOI":"10.1038\/s41588-022-01187-9","article-title":"Identifying disease-critical cell types and cellular processes by integrating single-cell rna-sequencing and human genetics","volume":"54","author":"Jagadeesh","year":"2022","journal-title":"Nat Genet"},{"key":"2025112408574334400_ref7","doi-asserted-by":"publisher","first-page":"eabl4290","DOI":"10.1126\/science.abl4290","article-title":"Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function","volume":"376","author":"Eraslan","year":"2022","journal-title":"Science"},{"article-title":"Cell2Sentence: teaching large language models the language of biology","volume-title":"Proceedings of the 41st International Conference on Machine Learning","year":"2024","key":"2025112408574334400_ref8"},{"key":"2025112408574334400_ref9"},{"key":"2025112408574334400_ref10"},{"key":"2025112408574334400_ref11"},{"key":"2025112408574334400_ref12"},{"key":"2025112408574334400_ref13"},{"key":"2025112408574334400_ref14","doi-asserted-by":"publisher","first-page":"616","DOI":"10.1038\/s41586-023-06139-9","article-title":"Transfer learning enables predictions in network biology","volume":"618","author":"Theodoris","year":"2023","journal-title":"Nature"},{"key":"2025112408574334400_ref15","doi-asserted-by":"crossref","DOI":"10.1038\/s41592-024-02201-0","article-title":"scGPT: toward building a foundation model for single-cell multi-omics using generative 
AI","volume":"21","author":"Cui","year":"2024","journal-title":"Nat Methods"},{"key":"2025112408574334400_ref16","doi-asserted-by":"crossref","DOI":"10.1038\/s41592-024-02235-4","article-title":"Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis","volume":"21","author":"Hou","year":"2024","journal-title":"Nat Methods"},{"key":"2025112408574334400_ref17","doi-asserted-by":"crossref","first-page":"D870","DOI":"10.1093\/nar\/gkac947","article-title":"CellMarker 2.0: an updated database of manually curated cell markers in human\/mouse and web tools based on scRNA-seq data","volume":"51","author":"Congxue","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025112408574334400_ref18","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1038\/s41590-018-0276-y","article-title":"Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage","volume":"20","author":"Aran","year":"2019","journal-title":"Nat Immunol"},{"key":"2025112408574334400_ref19","doi-asserted-by":"publisher","first-page":"1246","DOI":"10.1038\/s41467-022-28803-w","article-title":"Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data","volume":"13","author":"Ianevski","year":"2022","journal-title":"Nat Commun"},{"key":"2025112408574334400_ref20","article-title":"Measuring massive multitask language understanding","volume-title":"International Conference on Learning Representations","author":"Hendrycks","year":"2021"},{"key":"2025112408574334400_ref21"},{"key":"2025112408574334400_ref22","first-page":"22199","article-title":"Large language models are zero-shot reasoners","volume-title":"Advances in neural information processing systems","author":"Kojima","year":"2022"},{"key":"2025112408574334400_ref23"},{"key":"2025112408574334400_ref24","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1126\/science.1181369","article-title":"Comprehensive mapping 
of long-range interactions reveals folding principles of the human genome","volume":"326","author":"Lieberman-Aiden","year":"2009","journal-title":"Science"},{"key":"2025112408574334400_ref25"},{"key":"2025112408574334400_ref26"},{"key":"2025112408574334400_ref27"},{"key":"2025112408574334400_ref28"},{"volume-title":"Stanford Alpaca: An Instruction-Following Llama Model","year":"2023","author":"Taori","key":"2025112408574334400_ref29"},{"key":"2025112408574334400_ref30"},{"key":"2025112408574334400_ref31","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2024.naacl-long.478","article-title":"On learning to summarize with large language models as references","volume-title":"Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)","author":"Liu"},{"key":"2025112408574334400_ref32","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume-title":"Advances in neural information processing systems","author":"Ouyang","year":"2022"},{"key":"2025112408574334400_ref33"},{"key":"2025112408574334400_ref34","doi-asserted-by":"crossref","DOI":"10.1145\/3627673.3679576","article-title":"scACT: accurate cross-modality translation via cycle-consistent training from unpaired single-cell data","volume-title":"Proceedings of the 33rd ACM International Conference on Information and Knowledge Management","author":"Xu"},{"key":"2025112408574334400_ref35","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1038\/s41586-019-1629-x","article-title":"The human body at cellular resolution: the NIH human biomolecular atlas program","volume":"574","author":"consortium HuBMAP.","year":"2019","journal-title":"Nature"},{"key":"2025112408574334400_ref36","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1038\/s41586-020-2157-4","article-title":"Construction of a human cell landscape at single-cell 
level","volume":"581","author":"Han","year":"2020","journal-title":"Nature"},{"key":"2025112408574334400_ref37","doi-asserted-by":"publisher","first-page":"1091","DOI":"10.1016\/j.cell.2018.02.001","article-title":"Mapping the mouse cell atlas by Microwell-seq","volume":"172","author":"Han","year":"2018","journal-title":"Cell"},{"key":"2025112408574334400_ref38","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1038\/s41421-023-00559-7","article-title":"Single-cell landscape of primary central nervous system diffuse large B-cell lymphoma","volume":"9","author":"Liu","year":"2023","journal-title":"Cell Discovery"},{"key":"2025112408574334400_ref39","doi-asserted-by":"publisher","first-page":"594","DOI":"10.1038\/s41588-020-0636-z","article-title":"Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer","volume":"52","author":"Lee","year":"2020","journal-title":"Nat Genet"},{"key":"2025112408574334400_ref40","doi-asserted-by":"publisher","first-page":"2285","DOI":"10.1038\/s41467-020-16164-1","article-title":"Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma","volume":"11","author":"Kim","year":"2020","journal-title":"Nat Commun"},{"key":"2025112408574334400_ref41","doi-asserted-by":"crossref","first-page":"eabl4896","DOI":"10.1126\/science.abl4896","article-title":"The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans","volume":"376","author":"Consortium The Tabula Sapiens","year":"2022","journal-title":"Science"},{"key":"2025112408574334400_ref42","doi-asserted-by":"publisher","first-page":"7083","DOI":"10.1038\/s41467-021-27162-2","article-title":"Single cell atlas for 11 non-model mammals, reptiles and birds","volume":"12","author":"Chen","year":"2021","journal-title":"Nat Commun"},{"key":"2025112408574334400_ref43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-017-1382-0","article-title":"Scanpy: 
large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol"},{"key":"2025112408574334400_ref44","first-page":"118","article-title":"A new ontology lookup service at EMBL-EBI","volume":"2","author":"Jupp","year":"2015","journal-title":"SWAT4LS"},{"key":"2025112408574334400_ref45","doi-asserted-by":"crossref","first-page":"W170","DOI":"10.1093\/nar\/gkp440","article-title":"BioPortal: ontologies and integrated data resources at the click of a mouse","volume":"37","author":"Noy","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2025112408574334400_ref46","doi-asserted-by":"crossref","first-page":"eadi5199","DOI":"10.1126\/science.adi5199","article-title":"Single-cell genomics and regulatory networks for 388 human brains","volume":"384","author":"Emani","year":"2024","journal-title":"Science"},{"author":"","key":"2025112408574334400_ref47"},{"author":"","key":"2025112408574334400_ref48"},{"author":"","key":"2025112408574334400_ref49"},{"key":"2025112408574334400_ref50","doi-asserted-by":"crossref","DOI":"10.3115\/1073083.1073135","article-title":"BLEU: a method for automatic evaluation of machine translation","volume-title":"Proceedings of the 40th annual meeting of the Association for Computational Linguistics","author":"Papineni"},{"key":"2025112408574334400_ref51","doi-asserted-by":"crossref","DOI":"10.3115\/1220355.1220427","article-title":"ORANGE: a method for evaluating automatic evaluation metrics for machine translation","volume-title":"COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics","author":"Lin"},{"article-title":"SQuAD: 100,000+ questions for machine comprehension of text","author":"Rajpurkar","key":"2025112408574334400_ref52","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D16-1264"},{"author":"Chen","key":"2025112408574334400_ref53","article-title":"When do you need chain-of-thought prompting for 
chatGPT?"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/6\/bbaf622\/65490434\/bbaf622.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/6\/bbaf622\/65490434\/bbaf622.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T13:57:52Z","timestamp":1763992672000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf622\/8341160"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,1]]},"references-count":53,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf622","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2025,11]]},"published":{"date-parts":[[2025,11,1]]},"article-number":"bbaf622"}}