{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T21:48:55Z","timestamp":1776116935173,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":52,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,6,23]]},"DOI":"10.1145\/3715275.3732147","type":"proceedings-article","created":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T17:01:18Z","timestamp":1750698078000},"page":"2151-2165","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7613-5636","authenticated-orcid":false,"given":"Ariba","family":"Khan","sequence":"first","affiliation":[{"name":"MIT, Cambridge, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0084-1937","authenticated-orcid":false,"given":"Stephen","family":"Casper","sequence":"additional","affiliation":[{"name":"MIT, Cambridge, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6168-4763","authenticated-orcid":false,"given":"Dylan","family":"Hadfield-Menell","sequence":"additional","affiliation":[{"name":"MIT, Cambridge, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,6,23]]},"reference":[{"key":"e_1_3_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.emnlp-main.882"},{"key":"e_1_3_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.671"},{"key":"e_1_3_3_2_4_2","unstructured":"Sotiris Anagnostidis and Jannis Bulian. 2024. How Susceptible are LLMs to Influence in Prompts? arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2408.11865 (2024)."},{"key":"e_1_3_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.c3nlp-1.12"},{"key":"e_1_3_3_2_6_2","unstructured":"Leif Azzopardi and Yashar Moshfeghi. 2024. PRISM: a methodology for auditing biases in large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.18906 (2024)."},{"key":"e_1_3_3_2_7_2","unstructured":"Noam Benkler Drisana Mosaphir Scott Friedman Andrew Smart and Sonja Schmer-Galunder. 2023. Assessing LLMs for Moral Value Pluralism. arxiv:https:\/\/arXiv.org\/abs\/2312.10075\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2312.10075"},{"key":"e_1_3_3_2_8_2","doi-asserted-by":"publisher","unstructured":"S. Beugelsdijk and C. Welzel. 2018. Dimensions and Dynamics of National Culture: Synthesizing Hofstede With Inglehart. Journal of Cross-Cultural Psychology 49 10 (2018) 1469\u20131505. https:\/\/doi.org\/10.1177\/0022022118798505","DOI":"10.1177\/0022022118798505"},{"key":"e_1_3_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.15465\/gesis-sgen016"},{"key":"e_1_3_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.c3nlp-1.7"},{"key":"e_1_3_3_2_11_2","unstructured":"Stephen Casper Xander Davies Claudia Shi Thomas\u00a0Krendl Gilbert J\u00e9r\u00e9my Scheurer Javier Rando Rachel Freedman Tomasz Korbak David Lindner Pedro Freire et\u00a0al. 2023. Open problems and fundamental limitations of reinforcement learning from human feedback. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.15217 (2023)."},{"key":"e_1_3_3_2_12_2","doi-asserted-by":"crossref","unstructured":"Tanise Ceron Neele Falk Ana Bari\u0107 Dmitry Nikolaev and Sebastian Pad\u00f3. 2024. 
Beyond Prompt Brittleness: Evaluating the Reliability and Consistency of Political Worldviews in LLMs. Transactions of the Association for Computational Linguistics 12 (2024) 1378\u20131400.","DOI":"10.1162\/tacl_a_00710"},{"key":"e_1_3_3_2_13_2","unstructured":"Ricardo Dominguez-Olmedo Moritz Hardt and Celestine Mendler-D\u00fcnner. 2024. Questioning the Survey Responses of Large Language Models. arxiv:https:\/\/arXiv.org\/abs\/2306.07951\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2306.07951"},{"key":"e_1_3_3_2_14_2","unstructured":"E. Durmus K. Nguyen T. Liao N. Schiefer A. Askell A. Bakhtin C. Chen Z. Hatfield-Dodds D. Hernandez N. Joseph L. Lovitt S. McCandlish O. Sikder A. Tamkin J. Thamkul J. Kaplan J. Clark and D. Ganguli. 2023. Towards Measuring the Representation of Subjective Global Opinions in Language Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2306.16388 (June 2023) 1\u201343. https:\/\/arxiv.org\/abs\/2306.16388"},{"key":"e_1_3_3_2_15_2","unstructured":"Federico Errica Giuseppe Siracusano Davide Sanvito and Roberto Bifulco. 2024. What Did I Do Wrong? Quantifying LLMs\u2019 Sensitivity and Consistency to Prompt Engineering. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.12334 (2024)."},{"key":"e_1_3_3_2_16_2","unstructured":"Akshat Gupta Xiaoyang Song and Gopala Anumanchipalli. 2024. Self-Assessment Tests are Unreliable Measures of LLM Personality. arxiv:https:\/\/arXiv.org\/abs\/2309.08163\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2309.08163"},{"key":"e_1_3_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.14281\/18241.24"},{"key":"e_1_3_3_2_18_2","doi-asserted-by":"publisher","unstructured":"J. Haidt. 2001. The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review 108 4 (2001) 814\u2013834. https:\/\/doi.org\/10.1037\/0033-295X.108.4.814","DOI":"10.1037\/0033-295X.108.4.814"},{"key":"e_1_3_3_2_19_2","volume-title":"Values Survey Module 2013 Manual","author":"Hofstede G.","year":"2013","unstructured":"G. Hofstede and M. Minkov. 2013. Values Survey Module 2013 Manual. Geert Hofstede BV, Wageningen, Netherlands. https:\/\/geerthofstede.com\/wp-content\/uploads\/2016\/07\/Manual-VSM-2013.pdf"},{"key":"e_1_3_3_2_20_2","doi-asserted-by":"publisher","unstructured":"C.\u00a0K. Hsee. 1996. The evaluability hypothesis: An explanation for preference reversals between joint and separate evaluations of alternatives. Organizational Behavior and Human Decision Processes 67 3 (1996) 247\u2013257. https:\/\/doi.org\/10.1006\/obhd.1996.0072","DOI":"10.1006\/obhd.1996.0072"},{"key":"e_1_3_3_2_21_2","unstructured":"Guangyuan Jiang Manjie Xu Song-Chun Zhu Wenjuan Han Chi Zhang and Yixin Zhu. 2023. Evaluating and Inducing Personality in Pre-trained Language Models. arxiv:https:\/\/arXiv.org\/abs\/2206.07550\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2206.07550"},{"key":"e_1_3_3_2_22_2","unstructured":"Guangyuan Jiang Manjie Xu Song-Chun Zhu Wenjuan Han Chi Zhang and Yixin Zhu. 2024. Evaluating and inducing personality in pre-trained language models. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_2_23_2","unstructured":"R.\u00a0L. Johnson G. Pistilli N. Men\u00e9dez-Gonz\u00e1lez L.\u00a0D.\u00a0D. Duran E. Panai J. Kalpokiene and D.\u00a0J. Bertulfo. 2022. The ghost in the Machine has an American accent: Value conflict in GPT-3. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2203.07785 (March 2022) 1\u2013xx. 
https:\/\/arxiv.org\/abs\/2203.07785"},{"key":"e_1_3_3_2_24_2","unstructured":"Justin Kaashoek Manish Raghavan and John\u00a0J. Horton. 2024. The Impact of Generative AI on Labor Market Matching. MIT Generative AI (March 2024). https:\/\/mit-genai.pubpub.org\/pub\/4t8pqt06\/release\/4 Accessed: 2025-01-16."},{"key":"e_1_3_3_2_25_2","doi-asserted-by":"crossref","unstructured":"Pratyusha Kalluri. 2020. Don\u2019t ask if artificial intelligence is good or fair ask how it shifts power. Nature 583 (2020) 169 \u2013 169. https:\/\/api.semanticscholar.org\/CorpusID:256822507","DOI":"10.1038\/d41586-020-02003-2"},{"key":"e_1_3_3_2_26_2","unstructured":"Omar Khattab Arnav Singhvi Paridhi Maheshwari Zhiyuan Zhang Keshav Santhanam Sri Vardhamanan Saiful Haq Ashutosh Sharma Thomas\u00a0T Joshi Hanna Moazam et\u00a0al. 2023. Dspy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.03714 (2023)."},{"key":"e_1_3_3_2_27_2","unstructured":"G. Kova\u010d M. Sawayama R. Portelas C. Colas P.\u00a0F. Dominey and P. Oudeyer. 2023. Large Language Models as Superpositions of Cultural Perspectives. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.07870 (July 2023) 1\u201335. https:\/\/arxiv.org\/abs\/2307.07870"},{"key":"e_1_3_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1002\/9781118490013.ch6"},{"key":"e_1_3_3_2_29_2","doi-asserted-by":"publisher","unstructured":"S. Lindgren and J. Holmstr\u00f6m. 2020. A Social Science Perspective on Artificial Intelligence: Building Blocks for a Research Agenda. Journal of Digital Social Research 2 3 (2020) 1\u201315. https:\/\/doi.org\/10.33621\/jdsr.v2i3.65","DOI":"10.33621\/jdsr.v2i3.65"},{"key":"e_1_3_3_2_30_2","unstructured":"Y. Liu Y. Yao J. Ton X. Zhang R. Guo H. Cheng Y. Klochkov M.\u00a0F. Taufiq and H. Li. 2023. Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models\u2019 Alignment. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.05374 (August 2023) 1\u201367. https:\/\/arxiv.org\/abs\/2308.05374"},{"key":"e_1_3_3_2_31_2","unstructured":"R. Masoud Z. Liu M. Ferianc P. Treleaven and M. Rodrigues. 2023. Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede\u2019s Cultural Dimensions. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.12342 (August 2023) 1\u201328. https:\/\/arxiv.org\/abs\/2309.12342"},{"key":"e_1_3_3_2_32_2","unstructured":"Mantas Mazeika Xuwang Yin Rishub Tamirisa Jaehyuk Lim Bruce\u00a0W Lee Richard Ren Long Phan Norman Mu Adam Khoja Oliver Zhang et\u00a0al. 2025. Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2502.08640 (2025)."},{"key":"e_1_3_3_2_33_2","unstructured":"Jared Moore Tanvi Deshpande and Diyi Yang. 2024. Are Large Language Models Consistent over Value-laden Questions? arxiv:https:\/\/arXiv.org\/abs\/2407.02996\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2407.02996"},{"key":"e_1_3_3_2_34_2","unstructured":"Pawe\u0142 Niszczota Mateusz Janczak and Micha\u0142 Misiak. 2024. Large Language Models Can Replicate Cross-Cultural Differences in Personality. arxiv:https:\/\/arXiv.org\/abs\/2310.10679\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2310.10679"},{"key":"e_1_3_3_2_35_2","unstructured":"Natalia O\u017cegalska-\u0141ukasik and Szymon \u0141ukasik. 2023. Culturally Responsive Artificial Intelligence: Problems Challenges and Solutions. 
arxiv:https:\/\/arXiv.org\/abs\/2312.08467\u00a0[cs.CY] https:\/\/arxiv.org\/abs\/2312.08467"},{"key":"e_1_3_3_2_36_2","doi-asserted-by":"crossref","unstructured":"Joseph P\u00a0Simmons Leif D\u00a0Nelson and Uri Simonsohn. 2021. Pre-registration: Why and how. Journal of Consumer Psychology 31 1 (2021) 151\u2013162.","DOI":"10.1002\/jcpy.1208"},{"key":"e_1_3_3_2_37_2","doi-asserted-by":"crossref","unstructured":"S. Pawar J. Park J. Jin A. Arora J. Myung S. Yadav F.\u00a0G. Haznitrama I. Song A. Oh and I. Augenstein. 2024. Survey of cultural awareness in language models: Text and beyond. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2411.00860 (October 2024) 1\u2013xx. https:\/\/arxiv.org\/abs\/2411.00860","DOI":"10.1162\/COLI.a.14"},{"key":"e_1_3_3_2_38_2","unstructured":"Vinodkumar Prabhakaran Rida Qadri and Ben Hutchinson. 2022. Cultural Incongruencies in Artificial Intelligence. arxiv:https:\/\/arXiv.org\/abs\/2211.13069\u00a0[cs.CY] https:\/\/arxiv.org\/abs\/2211.13069"},{"key":"e_1_3_3_2_39_2","doi-asserted-by":"publisher","unstructured":"L. Ross T.\u00a0M. Amabile and J.\u00a0L. Steinmetz. 1977. Social roles social control and biases in social-perception processes. Journal of Personality and Social Psychology 35 7 (1977) 485\u2013494. https:\/\/doi.org\/10.1037\/0022-3514.35.7.485","DOI":"10.1037\/0022-3514.35.7.485"},{"key":"e_1_3_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.816"},{"key":"e_1_3_3_2_41_2","series-title":"Proceedings of Machine Learning Research","first-page":"29971","volume-title":"Proceedings of the 40th International Conference on Machine Learning","volume":"202","author":"Santurkar Shibani","year":"2023","unstructured":"Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose Opinions Do Language Models Reflect?. In Proceedings of the 40th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol.\u00a0202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.). PMLR, 29971\u201330004. https:\/\/proceedings.mlr.press\/v202\/santurkar23a.html"},{"key":"e_1_3_3_2_42_2","volume-title":"sklearn.metrics.adjusted_rand_score","author":"developers scikit-learn","year":"2025","unstructured":"scikit-learn developers. 2025. sklearn.metrics.adjusted_rand_score. https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.adjusted_rand_score.html Accessed: 2025-01-19."},{"key":"e_1_3_3_2_43_2","unstructured":"Melanie Sclar Yejin Choi Yulia Tsvetkov and Alane Suhr. 2023. Quantifying Language Models\u2019 Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.11324 (2023)."},{"key":"e_1_3_3_2_44_2","unstructured":"Rusheb Shah Soroush Pour Arush Tagade Stephen Casper Javier Rando et\u00a0al. 2023. Scalable and transferable black-box jailbreaks for language models via persona modulation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2311.03348 (2023)."},{"key":"e_1_3_3_2_45_2","doi-asserted-by":"publisher","unstructured":"Y. Tao O. Viberg R.\u00a0S. Baker and R.\u00a0F. Kizilcec. 2024. Cultural Bias and Cultural Alignment of Large Language Models. PNAS Nexus 3 9 (2024) 1\u201312. https:\/\/doi.org\/10.1093\/pnasnexus\/pgae346","DOI":"10.1093\/pnasnexus\/pgae346"},{"key":"e_1_3_3_2_46_2","unstructured":"Jen tse Huang Wenxiang Jiao Man\u00a0Ho Lam Eric\u00a0John Li Wenxuan Wang and Michael\u00a0R. 
Lyu. 2024. Revisiting the Reliability of Psychological Scales on Large Language Models. arxiv:https:\/\/arXiv.org\/abs\/2305.19926\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2305.19926"},{"key":"e_1_3_3_2_47_2","unstructured":"Shashikant Vishwakarma. 2023. Cover Letter Dataset. https:\/\/huggingface.co\/datasets\/ShashiVish\/cover-letter-dataset Accessed: 2025-01-16."},{"key":"e_1_3_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.345"},{"key":"e_1_3_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.c3nlp-1.1"},{"key":"e_1_3_3_2_50_2","doi-asserted-by":"publisher","unstructured":"T.\u00a0D. Wilson and J.\u00a0W. Schooler. 1991. Thinking too much: Introspection can reduce the quality of preferences and decisions. Journal of Personality and Social Psychology 60 2 (1991) 181\u2013192. https:\/\/doi.org\/10.1037\/0022-3514.60.2.181","DOI":"10.1037\/0022-3514.60.2.181"},{"key":"e_1_3_3_2_51_2","first-page":"17696","volume-title":"Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)","author":"Zhao Wenlong","year":"2024","unstructured":"Wenlong Zhao, Debanjan Mondal, Niket Tandon, Danica Dillion, Kurt Gray, and Yuling Gu. 2024. WorldValuesBench: A Large-Scale Benchmark Dataset for Multi-Cultural Value Awareness of Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL, Torino, Italia, 17696\u201317706."},{"key":"e_1_3_3_2_52_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Zheng Chujie","year":"2023","unstructured":"Chujie Zheng, Hao Zhou, Fandong Meng, Jie Zhou, and Minlie Huang. 2023. Large language models are not robust multiple choice selectors. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_3_2_53_2","doi-asserted-by":"crossref","unstructured":"Jingming Zhuo Songyang Zhang Xinyu Fang Haodong Duan Dahua Lin and Kai Chen. 2024. ProSA: Assessing and understanding the prompt sensitivity of LLMs. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.12405 (2024).","DOI":"10.18653\/v1\/2024.findings-emnlp.108"}],"event":{"name":"FAccT '25: The 2025 ACM Conference on Fairness, Accountability, and Transparency","location":"Athens Greece","acronym":"FAccT '25"},"container-title":["Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715275.3732147","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T11:17:11Z","timestamp":1750763831000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715275.3732147"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,23]]},"references-count":52,"alternative-id":["10.1145\/3715275.3732147","10.1145\/3715275"],"URL":"https:\/\/doi.org\/10.1145\/3715275.3732147","relation":{},"subject":[],"published":{"date-parts":[[2025,6,23]]},"assertion":[{"value":"2025-06-23","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
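The record above is a Crossref REST API work response; the payload sits under the "message" key. Below is a minimal sketch of how such a record can be fetched and read, assuming the public api.crossref.org works endpoint and the third-party requests library (neither is named in the record itself; the endpoint and field names follow the structure visible in the JSON above):

import requests

# Fetch the work record shown above from the public Crossref REST API.
# The DOI comes from the record's own "DOI" field.
doi = "10.1145/3715275.3732147"
resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]  # Crossref wraps the payload in "message"

# Read the headline fields present in the record.
print(work["title"][0])  # "title" is a list; the paper title is its first entry
print(", ".join(f'{a["given"]} {a["family"]}' for a in work["author"]))
print("references:", work["references-count"])

Run against the live API, this prints the paper title, the three authors, and the reference count (52) seen in the JSON above.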