{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T22:18:03Z","timestamp":1780093083674,"version":"3.54.0"},"reference-count":43,"publisher":"China Science Publishing & Media Ltd.","issue":"4","license":[{"start":{"date-parts":[[2023,12,20]],"date-time":"2023-12-20T00:00:00Z","timestamp":1703030400000},"content-version":"vor","delay-in-days":353,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,11,1]]},"abstract":"<jats:title>ABSTRACT<\/jats:title><jats:p>ChatGPT has attracted extensive attention from academia and industry. This paper aims to evaluate ChatGPT's Chinese language understanding capability on 6 tasks using 11 datasets. Experiments indicate that ChatGPT achieved competitive results in sentiment analysis, summarization, and reading comprehension in Chinese, while it is prone to factual errors in closed-book QA. Further, on two more difficult Chinese understanding tasks, that is, idiom fill-in-the-blank and cant understanding, we found that a simple chain-of-thought prompt can improve the accuracy of ChatGPT in complex reasoning. This paper further analyses the possible risks of using ChatGPT based on the results. 
Finally, we briefly describe the research and development progress of our ChatBIT.<\/jats:p>","DOI":"10.1162\/dint_a_00232","type":"journal-article","created":{"date-parts":[[2023,9,12]],"date-time":"2023-09-12T15:37:56Z","timestamp":1694533076000},"page":"885-903","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":17,"title":["Evaluation on ChatGPT for Chinese Language Understanding"],"prefix":"10.3724","volume":"5","author":[{"given":"Linhan","family":"Li","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Huaping","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chunjin","family":"Li","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Haowen","family":"You","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wenyao","family":"Cui","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"2026","published-online":{"date-parts":[[2023,11,1]]},"reference":[{"key":"2023122015381488700_ref1","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023122015381488700_ref2","volume-title":"Lamda: Language models for dialog applications","author":"Thoppilan","year":"2022"},{"key":"2023122015381488700_ref3","volume-title":"Ernie 3.0 titan: Exploring larger-scale knowledge enhanced pre-training for language understanding and generation","author":"Wang","year":"2021"},{"key":"2023122015381488700_ref4","volume-title":"PanGu-\u03b1: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel 
Computation","author":"Zeng","year":"2021"},{"key":"2023122015381488700_ref5","volume-title":"WeLM: A Well-Read Pre-trained Language Model for Chinese","author":"Su","year":"2022"},{"key":"2023122015381488700_ref6","volume-title":"Glm-130b: An open bilingual pre-trained model","author":"Zeng","year":"2022"},{"key":"2023122015381488700_ref7","first-page":"4110","volume-title":"Dynabench: Rethinking benchmarking in NLP","author":"Kiela","year":"2021"},{"key":"2023122015381488700_ref8","first-page":"1","volume-title":"ChatGPT: potential, prospects, and limitations","author":"Zhou","year":"2023"},{"issue":"7947","key":"2023122015381488700_ref9","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1038\/d41586-023-00288-7","article-title":"ChatGPT: five priorities for research","volume":"614","author":"van Dis","year":"2023","journal-title":"Nature"},{"issue":"6630","key":"2023122015381488700_ref10","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1126\/science.adg7879","article-title":"ChatGPT is fun, but not an author","volume":"379","author":"Thorp","year":"2023","journal-title":"Science"},{"key":"2023122015381488700_ref11","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2023.emnlp-main.85","volume-title":"Is ChatGPT a General-Purpose Natural Language Processing Task Solver?","author":"Qin","year":"2023"},{"key":"2023122015381488700_ref12","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2023.ijcnlp-main.45","volume-title":"A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity","author":"Bang","year":"2023"},{"key":"2023122015381488700_ref13","volume-title":"How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks","author":"Chen","year":"2023"},{"key":"2023122015381488700_ref14","volume-title":"Is ChatGPT a good translator? 
A preliminary study","author":"Jiao","year":"2023"},{"key":"2023122015381488700_ref15","first-page":"4171","volume-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2019"},{"key":"2023122015381488700_ref16","volume-title":"Improving language understanding by generative pretraining","author":"Radford"},{"key":"2023122015381488700_ref17","volume-title":"Language models are unsupervised multitask learners","author":"Radford"},{"issue":"2","key":"2023122015381488700_ref18","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1207\/s15516709cog1402_1","article-title":"Finding structure in time","volume":"14","author":"Elman","year":"1990","journal-title":"Cognitive Science"},{"issue":"8","key":"2023122015381488700_ref19","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2023122015381488700_ref20","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023122015381488700_ref21","volume-title":"Evaluating large language models trained on code","author":"Chen","year":"2021"},{"key":"2023122015381488700_ref22","volume-title":"Finetuned language models are zero-shot learners","author":"Wei","year":"2021"},{"key":"2023122015381488700_ref23","first-page":"270","volume-title":"Dialogpt: Large-scale generative pre-training for conversational response generation","author":"Zhang","year":"2020"},{"key":"2023122015381488700_ref24","volume-title":"Webgpt: Browser-assisted question-answering with human feedback","author":"Nakano","year":"2021"},{"key":"2023122015381488700_ref25","volume-title":"Training language models to follow instructions with human 
feedback","author":"Ouyang","year":"2022"},{"key":"2023122015381488700_ref26","first-page":"4299","volume-title":"Deep reinforcement learning from human preferences","author":"Christiano","year":"2017"},{"key":"2023122015381488700_ref27","volume-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017"},{"key":"2023122015381488700_ref28","volume-title":"Fewclue: A Chinese few-shot learning evaluation benchmark","author":"Xu","year":"2021"},{"key":"2023122015381488700_ref29","first-page":"1967","volume-title":"LCSTS: A large scale Chinese short text summarization dataset","author":"Hu","year":"2015"},{"key":"2023122015381488700_ref30","first-page":"942","volume-title":"Overview of the NLPCC 2017 shared task: single document summarization","author":"Hua","year":"2018"},{"key":"2023122015381488700_ref31","first-page":"5883","volume-title":"A span-extraction dataset for Chinese machine reading comprehension","author":"Cui","year":"2019"},{"key":"2023122015381488700_ref32","volume-title":"DRCD: A Chinese machine reading comprehension dataset","author":"Shao","year":"2018"},{"key":"2023122015381488700_ref33","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1162\/tacl_a_00305","article-title":"Investigating prior knowledge for challenging chinese machine reading comprehension","volume":"8","author":"Sun","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2023122015381488700_ref34","volume-title":"Dataset and neural recurrent sequence labeling model for open-domain factoid question answering","author":"Li","year":"2016"},{"key":"2023122015381488700_ref35","volume-title":"ROUGE: A Package for Automatic Evaluation of summaries","author":"Lin","year":"2004"},{"key":"2023122015381488700_ref36","volume-title":"GPT-4 Technical Report","author":"OpenAI","year":"2023"},{"key":"2023122015381488700_ref37","volume-title":"Chain of thought prompting elicits reasoning in large language 
models","author":"Wei","year":"2022"},{"key":"2023122015381488700_ref38","volume-title":"Self-consistency improves chain of thought reasoning in language models","author":"Wang","year":"2022"},{"key":"2023122015381488700_ref39","volume-title":"Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks","author":"Chen","year":"2022"},{"key":"2023122015381488700_ref40","volume-title":"Large language models can self-improve","author":"Huang","year":"2022"},{"key":"2023122015381488700_ref41","volume-title":"Blow the dog whistle: A Chinese dataset for cant understanding with common sense and world knowledge","author":"Xu","year":"2021"},{"issue":"12","key":"2023122015381488700_ref42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3571730","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji","year":"2023","journal-title":"ACM Computing Surveys"},{"key":"2023122015381488700_ref43","volume-title":"Mixed precision training","author":"Micikevicius","year":"2017"}],"container-title":["Data 
Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/5\/4\/885\/2199763\/dint_a_00232.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/5\/4\/885\/2199763\/dint_a_00232.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T07:43:26Z","timestamp":1741938206000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.sciengine.com\/doi\/10.1162\/dint_a_00232"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,11,1]]}},"URL":"https:\/\/doi.org\/10.1162\/dint_a_00232","relation":{},"ISSN":["2641-435X"],"issn-type":[{"value":"2641-435X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023]]},"published":{"date-parts":[[2023]]}}}