{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T02:49:32Z","timestamp":1777344572655,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":53,"publisher":"ACM","funder":[{"name":"SFB1310","award":[""],"award-info":[{"award-number":[""]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,11,5]]},"DOI":"10.1145\/3757887.3763016","type":"proceedings-article","created":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T07:42:58Z","timestamp":1761723778000},"page":"74-81","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Challenging the Validity of Personality Tests for Large Language Models"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1751-0691","authenticated-orcid":false,"given":"Tom","family":"S\u00fchr","sequence":"first","affiliation":[{"name":"Max Planck Institute for Intelligent Systems, T\u00fcbingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-9258-6664","authenticated-orcid":false,"given":"Florian E.","family":"Dorner","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3423-4414","authenticated-orcid":false,"given":"Samira","family":"Samadi","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Intelligent Systems, T\u00fcbingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6053-0415","authenticated-orcid":false,"given":"Augustin","family":"Kelava","sequence":"additional","affiliation":[{"name":"Methods Center, Eberhard Karls University of T\u00fcbingen, T\u00fcbingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,11,4]]},"reference":[{"key":"e_1_3_3_3_2_2","unstructured":"Julius Adebayo Justin Gilmer Michael Muelly Ian Goodfellow Moritz Hardt and Been Kim. 2018. Sanity checks for saliency maps. Advances in neural information processing systems 31 (2018)."},{"key":"e_1_3_3_3_3_2","volume-title":"Personality: A psychological interpretation.","author":"Allport Gordon\u00a0Willard","year":"1937","unstructured":"Gordon\u00a0Willard Allport. 1937. Personality: A psychological interpretation.Holt."},{"key":"e_1_3_3_3_4_2","unstructured":"American Educational Research Association American\u00a0Psychological Association and National\u00a0Council on\u00a0Measurement\u00a0in Education. 2014. Standards for educational and psychological testing."},{"key":"e_1_3_3_3_5_2","doi-asserted-by":"crossref","unstructured":"Michael\u00a0C Ashton Kibeom Lee Marco Perugini Piotr Szarota Reinout\u00a0E De\u00a0Vries Lisa Di\u00a0Blas Kathleen Boies and Boele De\u00a0Raad. 2004. A six-factor structure of personality-descriptive adjectives: solutions from psycholexical studies in seven languages. Journal of Personality and Social Psychology 86 2 (2004) 356.","DOI":"10.1037\/0022-3514.86.2.356"},{"key":"e_1_3_3_3_6_2","doi-asserted-by":"crossref","unstructured":"Marcel Binz and Eric Schulz. 2023. Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences 120 6 (2023) e2218523120.","DOI":"10.1073\/pnas.2218523120"},{"key":"e_1_3_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1002\/9781118619179"},{"key":"e_1_3_3_3_8_2","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared\u00a0D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877\u20131901."},{"key":"e_1_3_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.156"},{"key":"e_1_3_3_3_10_2","volume-title":"Description and measurement of personality.","author":"Cattell Raymond\u00a0Bernard","year":"1946","unstructured":"Raymond\u00a0Bernard Cattell. 1946. Description and measurement of personality.World Book Company."},{"key":"e_1_3_3_3_11_2","unstructured":"Aakanksha Chowdhery Sharan Narang Jacob Devlin Maarten Bosma Gaurav Mishra Adam Roberts Paul Barham Hyung\u00a0Won Chung Charles Sutton Sebastian Gehrmann et\u00a0al. 2022. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2204.02311 (2022)."},{"key":"e_1_3_3_3_12_2","doi-asserted-by":"crossref","unstructured":"Paul\u00a0T Costa\u00a0Jr and Robert\u00a0R McCrae. 1992. The five-factor model of personality and its relevance to personality disorders. Journal of Personality Disorders 6 4 (1992) 343\u2013359.","DOI":"10.1521\/pedi.1992.6.4.343"},{"key":"e_1_3_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.6102\/zis247"},{"key":"e_1_3_3_3_14_2","volume-title":"The biological basis of personality","author":"Eysenck Hans\u00a0J\u00fcrgen","year":"1967","unstructured":"Hans\u00a0J\u00fcrgen Eysenck. 1967. The biological basis of personality. Vol.\u00a0689. Transaction publishers."},{"key":"e_1_3_3_3_15_2","doi-asserted-by":"crossref","unstructured":"David Gallardo-Pujol V\u00edctor Rouco Anna Cortijos-Bernabeu Luis Oceja Christopher\u00a0J Soto and Oliver\u00a0P John. 2022. Factor structure gender invariance measurement properties and short forms of the Spanish adaptation of the Big Five Inventory-2. Psychological Test Adaptation and Development (2022).","DOI":"10.31234\/osf.io\/nxr4q"},{"key":"e_1_3_3_3_16_2","doi-asserted-by":"publisher","unstructured":"Leo Gao Jonathan Tow Baber Abbasi Stella Biderman Sid Black Anthony DiPofi Charles Foster Laurence Golding Jeffrey Hsu Alain Le\u00a0Noac\u2019h Haonan Li Kyle McDonell Niklas Muennighoff Chris Ociepa Jason Phang Laria Reynolds Hailey Schoelkopf Aviya Skowron Lintang Sutawika Eric Tang Anish Thite Ben Wang Kevin Wang and Andy Zou. 2024. A framework for few-shot language model evaluation. 10.5281\/zenodo.12608602","DOI":"10.5281\/zenodo.12608602"},{"key":"e_1_3_3_3_17_2","unstructured":"Akshat Gupta Xiaoyang Song and Gopala Anumanchipalli. 2023. Investigating the applicability of self-assessment tests for personality measurement of large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.08163 (2023)."},{"key":"e_1_3_3_3_18_2","doi-asserted-by":"crossref","unstructured":"Michael Gurven Christopher Von\u00a0Rueden Maxim Massenkoff Hillard Kaplan and Marino Lero\u00a0Vie. 2013. How universal is the Big Five? Testing the five-factor model of personality variation among forager\u2013farmers in the Bolivian Amazon. Journal of personality and social psychology 104 2 (2013) 354.","DOI":"10.1037\/a0030841"},{"key":"e_1_3_3_3_19_2","doi-asserted-by":"publisher","unstructured":"Li\u2010tze Hu and Peter\u00a0M. Bentler. 1999. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling 6 1 (1999) 1\u201355. 10.1080\/10705519909540118 arXiv:10.1080\/10705519909540118","DOI":"10.1080\/10705519909540118"},{"key":"e_1_3_3_3_20_2","unstructured":"Jen-tse Huang Wenxuan Wang M Lam E Li Wenxiang Jiao and M Lyu. 2023. Revisiting the reliability of psychological scales on large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.19926 (2023)."},{"key":"e_1_3_3_3_21_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Huang Jen-tse","year":"2023","unstructured":"Jen-tse Huang, Wenxuan Wang, Eric\u00a0John Li, Man\u00a0Ho LAM, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, and Michael Lyu. 2023. On the Humanity of Conversational AI: Evaluating the Psychological Portrayal of LLMs. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_3_3_22_2","unstructured":"Jen-tse Huang Wenxuan Wang Eric\u00a0John Li Man\u00a0Ho Lam Shujie Ren Youliang Yuan Wenxiang Jiao Zhaopeng Tu and Michael\u00a0R Lyu. 2023. Who is ChatGPT? Benchmarking LLMs\u2019 Psychological Portrayal Using PsychoBench. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.01386 (2023)."},{"key":"e_1_3_3_3_23_2","unstructured":"International Personality Item Pool. [n. d.]. Administering IPIP Measures with a 50-item Sample Questionnaire. https:\/\/ipip.ori.org\/new_ipip-50-item-scale.htm. Accessed: 2023-09-24."},{"key":"e_1_3_3_3_24_2","volume-title":"Thirty-seventh Conference on Neural Information Processing Systems","author":"Jiang Guangyuan","year":"2023","unstructured":"Guangyuan Jiang, Manjie Xu, Song-Chun Zhu, Wenjuan Han, Chi Zhang, and Yixin Zhu. 2023. Evaluating and Inducing Personality in Pre-trained Language Models. In Thirty-seventh Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=I9xE1Jsjfx"},{"key":"e_1_3_3_3_25_2","unstructured":"Hang Jiang Xiajie Zhang Xubo Cao Jad Kabbara and Deb Roy. 2023. Personallm: Investigating the ability of gpt-3.5 to express personality traits and gender differences. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.02547 (2023)."},{"key":"e_1_3_3_3_26_2","doi-asserted-by":"crossref","unstructured":"Oliver\u00a0P John Eileen\u00a0M Donahue and Robert\u00a0L Kentle. 1991. Big five inventory. Journal of Personality and Social Psychology (1991).","DOI":"10.1037\/t07550-000"},{"key":"e_1_3_3_3_27_2","doi-asserted-by":"crossref","unstructured":"Daniel\u00a0N Jones and Delroy\u00a0L Paulhus. 2014. Introducing the short dark triad (SD3) a brief measure of dark personality traits. Assessment 21 1 (2014) 28\u201341.","DOI":"10.1177\/1073191113514105"},{"key":"e_1_3_3_3_28_2","unstructured":"Saketh\u00a0Reddy Karra Son\u00a0The Nguyen and Theja Tulabandhula. 2022. Estimating the Personality of White-Box Language Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2204.12000 (2022)."},{"key":"e_1_3_3_3_29_2","unstructured":"Xingxuan Li Yutong Li Linlin Liu Lidong Bing and Shafiq Joty. 2022. Is gpt-3 a psychopath? evaluating large language models from a psychological perspective. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2212.10529 (2022)."},{"key":"e_1_3_3_3_30_2","doi-asserted-by":"crossref","unstructured":"Riccardo Loconte Graziella Orr\u00f9 Mirco Tribastone Pietro Pietrini and Giuseppe Sartori. 2023. Challenging ChatGPT\u2019Intelligence\u2019with Human Tools: A Neuropsychological Investigation on Prefrontal Functioning of a Large Language Model. Intelligence (2023).","DOI":"10.2139\/ssrn.4471829"},{"key":"e_1_3_3_3_31_2","unstructured":"Yang Lu Jordan Yu and Shou-Hsuan\u00a0Stephen Huang. 2023. Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2312.14202 (2023)."},{"key":"e_1_3_3_3_32_2","unstructured":"Shengyu Mao Ningyu Zhang Xiaohan Wang Mengru Wang Yunzhi Yao Yong Jiang Pengjun Xie Fei Huang and Huajun Chen. 2023. Editing personality for llms. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.02168 (2023)."},{"key":"e_1_3_3_3_33_2","doi-asserted-by":"crossref","unstructured":"Robert\u00a0R McCrae and Oliver\u00a0P John. 1992. An introduction to the five-factor model and its applications. Journal of personality 60 2 (1992) 175\u2013215.","DOI":"10.1111\/j.1467-6494.1992.tb00970.x"},{"key":"e_1_3_3_3_34_2","unstructured":"R McDonald. 1999. Test theory: A unified treatment. Nueva York."},{"key":"e_1_3_3_3_35_2","doi-asserted-by":"crossref","unstructured":"Qiaozhu Mei Yutong Xie Walter Yuan and Matthew\u00a0O Jackson. 2024. A Turing test of whether AI chatbots are behaviorally similar to humans. Proceedings of the National Academy of Sciences 121 9 (2024) e2313925121.","DOI":"10.1073\/pnas.2313925121"},{"key":"e_1_3_3_3_36_2","doi-asserted-by":"publisher","unstructured":"William Meredith. 1993. Measurement invariance factor analysis and factorial invariance. Psychometrika 58 4 (1993) 525\u2013543. 10.1007\/BF02294825","DOI":"10.1007\/BF02294825"},{"key":"e_1_3_3_3_37_2","doi-asserted-by":"crossref","unstructured":"IB Myers. 1962. The Myers-Briggs Type Indicator. Educational Testing Service\/Princeton (1962).","DOI":"10.1037\/14404-000"},{"key":"e_1_3_3_3_38_2","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray et\u00a0al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022) 27730\u201327744."},{"key":"e_1_3_3_3_39_2","unstructured":"Keyu Pan and Yawen Zeng. 2023. Do llms possess a personality? making the mbti test an amazing evaluation for large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.16180 (2023)."},{"key":"e_1_3_3_3_40_2","doi-asserted-by":"crossref","unstructured":"Max Pellert Clemens\u00a0M Lechner Claudia Wagner Beatrice Rammstedt and Markus Strohmaier. 2023. AI Psychometrics: Assessing the psychological profiles of large language models through psychometric inventories. Perspectives on Psychological Science (2023) 17456916231214460.","DOI":"10.31234\/osf.io\/jv5dt"},{"key":"e_1_3_3_3_41_2","unstructured":"Nikolay\u00a0B Petrov Gregory Serapio-Garc\u00eda and Jason Rentfrow. 2024. Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2405.07248 (2024)."},{"key":"e_1_3_3_3_42_2","doi-asserted-by":"crossref","unstructured":"David\u00a0J Pittenger. 2005. Cautionary comments regarding the Myers-Briggs type indicator. Consulting Psychology Journal: Practice and Research 57 3 (2005) 210.","DOI":"10.1037\/1065-9293.57.3.210"},{"key":"e_1_3_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1524\/9783050063782"},{"key":"e_1_3_3_3_44_2","doi-asserted-by":"crossref","unstructured":"Mustafa Safdari Greg Serapio-Garc\u00eda Cl\u00e9ment Crepy Stephen Fitz Peter Romero Luning Sun Marwa Abdulhai Aleksandra Faust and Maja Matari\u0107. 2023. Personality traits in large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.00184 (2023).","DOI":"10.21203\/rs.3.rs-3296728\/v1"},{"key":"e_1_3_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CAI54212.2023.00097"},{"key":"e_1_3_3_3_46_2","unstructured":"Xiaoyang Song Akshat Gupta Kiyan Mohebbizadeh Shujie Hu and Anant Singh. 2023. Have large language models developed a personality?: Applicability of self-assessment tests in measuring personality in llms. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.14693 (2023)."},{"key":"e_1_3_3_3_47_2","doi-asserted-by":"crossref","unstructured":"Christopher\u00a0J Soto. 2019. How replicable are links between personality traits and consequential life outcomes? The life outcomes of personality replication project. Psychological Science 30 5 (2019) 711\u2013727.","DOI":"10.1177\/0956797619831612"},{"key":"e_1_3_3_3_48_2","doi-asserted-by":"crossref","unstructured":"Christopher\u00a0J Soto and Oliver\u00a0P John. 2017. The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth fidelity and predictive power. Journal of Personality and Social psychology 113 1 (2017) 117.","DOI":"10.1037\/pspp0000096"},{"key":"e_1_3_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1037\/11491-005"},{"key":"e_1_3_3_3_50_2","doi-asserted-by":"crossref","unstructured":"Randy Stein and Alexander\u00a0B Swan. 2019. Evaluating the validity of Myers-Briggs Type Indicator theory: A teaching tool and window into intuitive psychology. Social and Personality Psychology Compass 13 2 (2019) e12434.","DOI":"10.1111\/spc3.12434"},{"key":"e_1_3_3_3_51_2","doi-asserted-by":"crossref","unstructured":"Ross\u00a0David Stewart Ren\u00e9 M\u00f5ttus Anne Seeboth Christopher\u00a0John Soto and Wendy Johnson. 2022. The finer details? The predictability of life outcomes from Big Five domains facets and nuances. Journal of Personality 90 2 (2022) 167\u2013182.","DOI":"10.1111\/jopy.12660"},{"key":"e_1_3_3_3_52_2","first-page":"35","volume-title":"Test scoring","author":"Wainer Howard","year":"2001","unstructured":"Howard Wainer and David Thissen. 2001. True score theory: The traditional method. In Test scoring. Routledge, 35\u201384."},{"key":"e_1_3_3_3_53_2","unstructured":"Taylor Webb Keith\u00a0J Holyoak and Hongjing Lu. 2023. Emergent analogical reasoning in large language models. Nature Human Behaviour (2023) 1\u201316."},{"key":"e_1_3_3_3_54_2","volume-title":"Cronbach\u2019s \u03b1 , Revelle\u2019s \u03b2 , and McDonald\u2019s \u03c9 H: Their relations with each other and two alternative conceptualizations of reliability","author":"Zinbarg Richard\u00a0E","year":"2005","unstructured":"Richard\u00a0E Zinbarg, William Revelle, Iftah Yovel, and Wen Li. 2005. Cronbach\u2019s \u03b1 , Revelle\u2019s \u03b2 , and McDonald\u2019s \u03c9 H: Their relations with each other and two alternative conceptualizations of reliability. Vol.\u00a070. Springer. 123\u2013133 pages."}],"event":{"name":"EAAMO '25: Equity and Access in Algorithms, Mechanisms, and Optimization","location":"Pittsburgh USA","acronym":"EAAMO '25","sponsor":["SIGecom Special Interest Group on Economics and Computation","SIGAI ACM Special Interest Group on Artificial Intelligence"]},"container-title":["Proceedings of the 5th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization"],"original-title":[],"deposited":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T09:14:37Z","timestamp":1761815677000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3757887.3763016"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,4]]},"references-count":53,"alternative-id":["10.1145\/3757887.3763016","10.1145\/3757887"],"URL":"https:\/\/doi.org\/10.1145\/3757887.3763016","relation":{},"subject":[],"published":{"date-parts":[[2025,11,4]]},"assertion":[{"value":"2025-11-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}