{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T01:22:47Z","timestamp":1773278567340,"version":"3.50.1"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Manage. Inf. Syst."],"published-print":{"date-parts":[[2025,12,31]]},"abstract":"<jats:p>The strategic significance of Large Language Models (LLMs) in economic expansion, innovation, societal development, and national security has been increasingly recognized since the advent of ChatGPT. This study provides a comprehensive comparative evaluation of LLMs developed in the U.S. and China, in both English and Chinese contexts. We propose an evaluation framework that encompasses natural language proficiency, disciplinary expertise, and safety and responsibility, and systematically assess notable models from the U.S. and China under various operational tasks and scenarios. Our key findings show that GPT-4 Turbo leads in English contexts, whereas the Chinese LLM Ernie-Bot 4 stands out in Chinese contexts. The study also highlights disparities in LLM performance across languages and tasks, stressing the necessity for linguistically and culturally nuanced model development. The complementary strengths of LLMs developed in the U.S. and China highlight the value of cross-national collaboration in advancing LLM technology.
The research delineates the current LLM competition landscape and offers valuable insights for policymakers and businesses regarding strategic LLM investments and development.<\/jats:p>","DOI":"10.1145\/3769086","type":"journal-article","created":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T11:36:52Z","timestamp":1758627412000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["AI Development and Innovation: A Comparison of Large Language Models from the U.S. and China"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4161-3447","authenticated-orcid":false,"given":"Jiaxin","family":"Li","sequence":"first","affiliation":[{"name":"The University of Hong Kong","place":["Hong Kong, Hong Kong"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5803-3919","authenticated-orcid":false,"given":"Zhenhui","family":"Jiang","sequence":"additional","affiliation":[{"name":"The University of Hong Kong","place":["Hong Kong, Hong Kong"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7469-9027","authenticated-orcid":false,"given":"Yang","family":"Liu","sequence":"additional","affiliation":[{"name":"Xi'an Jiaotong University","place":["Xi'an, China"]}]}],"member":"320","published-online":{"date-parts":[[2025,11,14]]},"reference":[{"key":"e_1_3_4_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3682069"},{"key":"e_1_3_4_3_2","doi-asserted-by":"crossref","unstructured":"Yejin Bang Samuel Cahyawijaya Nayeon Lee Wenliang Dai Dan Su Bryan Wilie Holy Lovenia Ziwei Ji Tiezheng Yu Willy Chung Quyet V. Do Yan Xu and Pascale Fung. 2023. A multitask multilingual multimodal evaluation of ChatGPT on reasoning hallucination and interactivity. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (volume 1: Long Papers). November 2023. 
Association for Computational Linguistics Nusa Dua Bali 675--718.","DOI":"10.18653\/v1\/2023.ijcnlp-main.45"},{"key":"e_1_3_4_4_2","doi-asserted-by":"crossref","unstructured":"Ning Bian Xianpei Han Le Sun Hongyu Lin Yaojie Lu Ben He Shanshan Jiang and Bin Dong. 2024. ChatGPT is a knowledgeable but inexperienced solver: An investigation of commonsense problem in large language models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics Language Resources and Evaluation (LREC-COLING 2024). 3098--3110.","DOI":"10.63317\/32y85i5g9gso"},{"key":"e_1_3_4_5_2","unstructured":"Rebecca Cairns. 2025. China pitches global AI governance group as the US goes it alone | CNN Business. CNN. Retrieved August 14 2025 from https:\/\/www.cnn.com\/2025\/07\/28\/tech\/china-global-ai-cooperation-organization-waic-hnk-spc"},{"key":"e_1_3_4_6_2","doi-asserted-by":"crossref","unstructured":"Yupeng Chang Xu Wang Jindong Wang Yuan Wu Linyi Yang Kaijie Zhu Hao Chen Xiaoyuan Yi Cunxiang Wang and Yidong Wang. 2024. A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology 15 3 (2024) 1--45.","DOI":"10.1145\/3641289"},{"key":"e_1_3_4_7_2","doi-asserted-by":"crossref","unstructured":"Minje Choi Jiaxin Pei Sagar Kumar Chang Shu and David Jurgens. 2023. Do LLMs understand social knowledge? Evaluating the sociability of large language models with SocKET benchmark. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 11370--11403.","DOI":"10.18653\/v1\/2023.emnlp-main.699"},{"key":"e_1_3_4_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3688400"},{"key":"e_1_3_4_9_2","doi-asserted-by":"crossref","unstructured":"Sorouralsadat Fatemi Yuheng Hu and Maryam Mousavi. 2025. A comparative analysis of instruction fine-tuning large language models for financial text classification. ACM Trans. Manage. Inf. Syst.
16 1 (March 2025) 1--30.","DOI":"10.1145\/3706119"},{"key":"e_1_3_4_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3705734"},{"key":"e_1_3_4_11_2","unstructured":"Dan Hendrycks Collin Burns Steven Basart Andy Zou Mantas Mazeika Dawn Song and Jacob Steinhardt. 2021. Measuring massive multitask language understanding. In 9th International Conference on Learning Representations 2021. Virtual Event Austria. 2021."},{"key":"e_1_3_4_12_2","doi-asserted-by":"publisher","unstructured":"Taojun Hu and Xiao-Hua Zhou. 2024. Unveiling LLM evaluation focused on metrics: Challenges and solutions. arXiv:2404.09135. Retrieved from https:\/\/arxiv.org\/abs\/2404.09135. DOI:10.48550\/arXiv.2404.09135","DOI":"10.48550\/arXiv.2404.09135"},{"key":"e_1_3_4_13_2","doi-asserted-by":"crossref","unstructured":"Yuzhen Huang Yuzhuo Bai Zhihao Zhu Junlei Zhang Jinghan Zhang Tangjun Su Junteng Liu Chuancheng Lv Yikai Zhang and Yao Fu. 2023. C-Eval: A multi-level multi-discipline Chinese evaluation suite for foundation models. Advances in Neural Information Processing Systems 36 (2023) 62991--63010.","DOI":"10.52202\/075280-2749"},{"key":"e_1_3_4_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3700597"},{"key":"e_1_3_4_15_2","doi-asserted-by":"crossref","unstructured":"Yukyung Lee Soonwon Ka Bokyung Son Pilsung Kang and Jaewook Kang. 2025. Navigating the path of writing: Outline-guided text generation with large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track). 233--250.","DOI":"10.18653\/v1\/2025.naacl-industry.20"},{"key":"e_1_3_4_16_2","doi-asserted-by":"crossref","unstructured":"Haonan Li Yixuan Zhang Fajri Koto Yifei Yang Hai Zhao Yeyun Gong Nan Duan and Timothy Baldwin. 2024. CMMLU: Measuring massive multitask language understanding in Chinese. In Findings of the Association for Computational Linguistics: ACL 2024.
11260--11285.","DOI":"10.18653\/v1\/2024.findings-acl.671"},{"key":"e_1_3_4_17_2","unstructured":"Percy Liang Rishi Bommasani Tony Lee Dimitris Tsipras Dilara Soylu Michihiro Yasunaga Yian Zhang Deepak Narayanan Yuhuai Wu Ananya Kumar Benjamin Newman Binhang Yuan Bobby Yan Ce Zhang Christian Cosgrove Christopher D. Manning Christopher Re Diana Acosta-Navas Drew A. Hudson Eric Zelikman Esin Durmus Faisal Ladhak Frieda Rong Hongyu Ren Huaxiu Yao Jue WANG Keshav Santhanam Laurel Orr Lucia Zheng Mert Yuksekgonul Mirac Suzgun Nathan Kim Neel Guha Niladri S. Chatterji Omar Khattab Peter Henderson Qian Huang Ryan Andrew Chi Sang Michael Xie Shibani Santurkar Surya Ganguli Tatsunori Hashimoto Thomas Icard Tianyi Zhang Vishrav Chaudhary William Wang Xuechen Li Yifan Mai Yuhui Zhang and Yuta Koreeda. 2023. Holistic evaluation of language models. Transactions on Machine Learning Research (2023)."},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-acl.165"},{"key":"e_1_3_4_19_2","doi-asserted-by":"crossref","unstructured":"Alejandro Pe\u00f1a Aythami Morales Julian Fierrez Ignacio Serna Javier Ortega-Garcia \u00cd\u00f1igo Puente Jorge C\u00f3rdova and Gonzalo C\u00f3rdova. 2023. Leveraging large language models for topic classification in the domain of public affairs. In Document Analysis and Recognition -- ICDAR 2023 Workshops. Springer Nature Switzerland Cham 20--33.","DOI":"10.1007\/978-3-031-41498-5_2"},{"key":"e_1_3_4_20_2","doi-asserted-by":"crossref","unstructured":"Chengwei Qin Aston Zhang Zhuosheng Zhang Jiaao Chen Michihiro Yasunaga and Diyi Yang. 2023. Is ChatGPT a general-purpose natural language processing task solver? In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 
1339--1384.","DOI":"10.18653\/v1\/2023.emnlp-main.85"},{"key":"e_1_3_4_21_2","article-title":"Everything you need to know about Multilingual LLMs: Towards fair, performant and reliable models for languages of the world","author":"Sitaram Sunayana","year":"2023","unstructured":"Sunayana Sitaram, Monojit Choudhury, Barun Patra, Vishrav Chaudhary, Kabir Ahuja, and Kalika Bali. 2023. Everything you need to know about Multilingual LLMs: Towards fair, performant and reliable models for languages of the world. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Toronto, Canada. Retrieved from https:\/\/aclanthology.org\/2023.acl-tutorials.3","journal-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics"},{"key":"e_1_3_4_22_2","doi-asserted-by":"crossref","unstructured":"Zhengwei Tao Zhi Jin Yifan Zhang Xiancai Chen Haiyan Zhao Jia Li Bin Liang Chongyang Tao Qun Liu and Kam-Fai Wong. 2025. A comprehensive evaluation on event reasoning of large language models. In Proceedings of the AAAI Conference on Artificial Intelligence. 25273--25281.","DOI":"10.1609\/aaai.v39i24.34714"},{"key":"e_1_3_4_23_2","unstructured":"Wen Wang Siqi Pei and Tianshu Sun. 2023. Unraveling generative AI from a human intelligence perspective: A battery of experiments. Available SSRN 4543351 (2023)."},{"key":"e_1_3_4_24_2","unstructured":"Zengzhi Wang Qiming Xie Yi Feng Zixiang Ding Zinong Yang and Rui Xia. 2024. Is ChatGPT a good sentiment analyzer? In First Conference on Language Modeling. 2024."},{"key":"e_1_3_4_25_2","doi-asserted-by":"crossref","unstructured":"Fangyuan Xu Yixiao Song Mohit Iyyer and Eunsol Choi. 2023. A critical evaluation of evaluations for long-form question answering. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (volume 1: Long Papers). 
Association for Computational Linguistics Toronto Canada 3225--3245.","DOI":"10.18653\/v1\/2023.acl-long.181"},{"key":"e_1_3_4_26_2","unstructured":"Guohai Xu Jiayi Liu Ming Yan Haotian Xu Jinghui Si Zhuoran Zhou Peng Yi Xing Gao Jitao Sang Rong Zhang et al. 2023. CValues: Measuring the values of Chinese large language models from safety to responsibility. arXiv:2307.09705. Retrieved from http:\/\/arxiv.org\/abs\/2307.09705"},{"key":"e_1_3_4_27_2","unstructured":"Liang Xu Anqi Li Lei Zhu Hang Xue Changtai Zhu Kangkang Zhao Haonan He Xuanwei Zhang Qiyue Kang and Zhenzhong Lan. 2023. SuperCLUE: A comprehensive Chinese large language model benchmark. arXiv:2307.15020. Retrieved from http:\/\/arxiv.org\/abs\/2307.15020"},{"key":"e_1_3_4_28_2","doi-asserted-by":"crossref","unstructured":"Wenxuan Zhang Yue Deng Bing Liu Sinno Pan and Lidong Bing. 2024. Sentiment analysis in the era of large language models: A reality check. In Findings of the Association for Computational Linguistics: NAACL 2024. Association for Computational Linguistics Mexico City Mexico 3881--3906.","DOI":"10.18653\/v1\/2024.findings-naacl.246"},{"key":"e_1_3_4_29_2","doi-asserted-by":"crossref","unstructured":"Zhexin Zhang Leqi Lei Lindong Wu Rui Sun Yongkang Huang Chong Long Xiao Liu Xuanyu Lei Jie Tang and Minlie Huang. 2024. SafetyBench: Evaluating the safety of large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 15537--15553.","DOI":"10.18653\/v1\/2024.acl-long.830"},{"key":"e_1_3_4_30_2","unstructured":"Wayne Xin Zhao Kun Zhou Junyi Li Tianyi Tang Xiaolei Wang Yupeng Hou Yingqian Min Beichen Zhang Junjie Zhang Zican Dong et al. 2023. A survey of large language models. arXiv:2303.18223. Retrieved from http:\/\/arxiv.org\/abs\/2303.18223"},{"key":"e_1_3_4_31_2","doi-asserted-by":"crossref","unstructured":"Wanjun Zhong Ruixiang Cui Yiduo Guo Yaobo Liang Shuai Lu Yanlin Wang Amin Saied Weizhu Chen and Nan Duan. 
2024. AGIEval: A human-centric benchmark for evaluating foundation models. In Findings of the Association for Computational Linguistics: NAACL 2024 (2024) 2299--2314.","DOI":"10.18653\/v1\/2024.findings-naacl.149"},{"key":"e_1_3_4_32_2","unstructured":"The White House. 2025. White House Unveils America's AI Action Plan. The White House. Retrieved August 14 2025 from https:\/\/www.whitehouse.gov\/articles\/2025\/07\/white-house-unveils-americas-ai-action-plan\/"},{"key":"e_1_3_4_33_2","unstructured":"The 2025 AI Index Report | Stanford HAI. Retrieved April 22 2025 from https:\/\/hai.stanford.edu\/ai-index\/2025-ai-index-report"},{"key":"e_1_3_4_34_2","unstructured":"CGTN. 2024. China U.S. hold first meeting of inter-governmental dialogue on AI. Retrieved April 30 2025 from https:\/\/english.www.gov.cn\/news\/202405\/16\/content_WS664579edc6d0868f4e8e7268.html"},{"key":"e_1_3_4_35_2","unstructured":"Open LLM Leaderboard - a Hugging Face Space by open-llm-leaderboard. Retrieved August 6 2025 from https:\/\/huggingface.co\/spaces\/open-llm-leaderboard\/open_llm_leaderboard"},{"key":"e_1_3_4_36_2","unstructured":"MMLU-Pro Benchmark Leaderboard. Artificial Analysis. Retrieved August 6 2025 from https:\/\/artificialanalysis.ai\/evaluations\/mmlu-pro"},{"key":"e_1_3_4_37_2","unstructured":"SuperCLUE Chinese large model evaluation benchmark: Evaluation leaderboard. Retrieved from https:\/\/www.superclueai.com"},{"key":"e_1_3_4_38_2","unstructured":"Chatbot Arena Leaderboard - a Hugging Face Space by lmarena-ai.
Retrieved April 20 2025 from https:\/\/huggingface.co\/spaces\/lmarena-ai\/chatbot-arena-leaderboard"}],"container-title":["ACM Transactions on Management Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3769086","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T10:53:06Z","timestamp":1773226386000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3769086"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,14]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,12,31]]}},"alternative-id":["10.1145\/3769086"],"URL":"https:\/\/doi.org\/10.1145\/3769086","relation":{},"ISSN":["2158-656X","2158-6578"],"issn-type":[{"value":"2158-656X","type":"print"},{"value":"2158-6578","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,14]]},"assertion":[{"value":"2025-03-05","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}