{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T21:14:22Z","timestamp":1775164462772,"version":"3.50.1"},"reference-count":36,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T00:00:00Z","timestamp":1745884800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"3S Holding O\u00dc R&amp;D fund"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Large language models (LLMs) have revolutionized natural language processing across diverse domains, yet they also raise critical fairness and ethical concerns, particularly regarding gender bias. In this study, we conduct a systematic, mathematically grounded investigation of gender bias in four leading LLMs\u2014GPT-4o, Gemini 1.5 Pro, Sonnet 3.5, and LLaMA 3.1:8b\u2014by evaluating the gender distributions produced when generating \u201cperfect personas\u201d for a wide range of occupational roles spanning healthcare, engineering, and professional services. Leveraging standardized prompts, controlled experimental settings, and repeated trials, our methodology quantifies bias against an ideal uniform distribution using rigorous statistical measures and information-theoretic metrics. Our results reveal marked discrepancies: GPT-4o exhibits pronounced occupational gender segregation, disproportionately linking healthcare roles to female identities while assigning male labels to engineering and physically demanding positions. In contrast, Gemini 1.5 Pro, Sonnet 3.5, and LLaMA 3.1:8b predominantly favor female assignments, albeit with less job-specific precision. These findings demonstrate how architectural decisions, training data composition, and token embedding strategies critically influence gender representation. The study underscores the urgent need for inclusive datasets, advanced bias-mitigation techniques, and continuous model audits to develop AI systems that are not only free from stereotype perpetuation but actively promote equitable and representative information processing.<\/jats:p>","DOI":"10.3390\/info16050358","type":"journal-article","created":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T05:17:12Z","timestamp":1745903832000},"page":"358","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Quantifying Gender Bias in Large Language Models Using Information-Theoretic and Statistical Analysis"],"prefix":"10.3390","volume":"16","author":[{"given":"Imran","family":"Mirza","sequence":"first","affiliation":[{"name":"Emerald School, 3600 Central Parkway, Dublin, CA 94568, USA"}]},{"given":"Akbar Anbar","family":"Jafari","sequence":"additional","affiliation":[{"name":"iCV Lab, Institute of Technology, University of Tartu, 51009 Tartu, Estonia"}]},{"given":"Cagri","family":"Ozcinar","sequence":"additional","affiliation":[{"name":"iCV Lab, Institute of Technology, University of Tartu, 51009 Tartu, Estonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8460-5717","authenticated-orcid":false,"given":"Gholamreza","family":"Anbarjafari","sequence":"additional","affiliation":[{"name":"PwC Advisory, It\u00e4merentori 2, 00180 Helsinki, Finland"},{"name":"3S Holding, 62220 Tartu, Estonia"},{"name":"Department of Digitalisation & Data, Estonian Business School, A. 
Lauteri 3, 10114 Tallinn, Estonia"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,29]]},"reference":[{"key":"ref_1","unstructured":"Chen, Z.Z., Ma, J., Zhang, X., Hao, N., Yan, A., Nourbakhsh, A., Yang, X., McAuley, J.J., Petzold, L.R., and Wang, W.Y. (2024). A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law. Trans. Mach. Learn. Res."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Nazi, Z.A., and Peng, W. (2024). Large language models in healthcare and medical domain: A review. Informatics, 11.","DOI":"10.3390\/informatics11030057"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Moeslund, T.B., Escalera, S., Anbarjafari, G., Nasrollahi, K., and Wan, J. (2020). Statistical machine learning for human behaviour analysis. Entropy, 22.","DOI":"10.3390\/e22050530"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"100129","DOI":"10.1016\/j.chbah.2025.100129","article-title":"More is More: Addition Bias in Large Language Models","volume":"3","author":"Santagata","year":"2025","journal-title":"Comput. Hum. Behav. Artif. Hum."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Yu, J., Kim, S.U., Choi, J., and Choi, J.D. (2024). What Is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models. Information, 15.","DOI":"10.3390\/info15090549"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"736","DOI":"10.1162\/qss_a_00310","article-title":"A critical review of large language models: Sensitivity, bias, and the path toward specialized ai","volume":"5","author":"Hajikhani","year":"2024","journal-title":"Quant. Sci. Stud."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1982","DOI":"10.1111\/bjet.13505","article-title":"The life cycle of large language models in education: A framework for understanding sources of bias","volume":"55","author":"Lee","year":"2024","journal-title":"Br. J. Educ. Technol."},{"key":"ref_8","unstructured":"Domnich, A., and Anbarjafari, G. (2021). Responsible AI: Gender bias assessment in emotion recognition. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Rizhinashvili, D., Sham, A.H., and Anbarjafari, G. (2022). Gender neutralisation for unbiased speech synthesising. Electronics, 11.","DOI":"10.3390\/electronics11101594"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1007\/s11760-022-02246-8","article-title":"Ethical AI in facial expression analysis: Racial bias","volume":"17","author":"Sham","year":"2023","journal-title":"Signal Image Video Process."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P.M., and Bowman, S. (2022, January 22\u201327). BBQ: A hand-built bias benchmark for question answering. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.findings-acl.165"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zake, I. (2023). Holistic Bias in Sociology: Contemporary Trends. 
The Palgrave Handbook of Methodological Individualism: Volume II, Springer.","DOI":"10.1007\/978-3-031-41508-1_17"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hartvigsen, T., Gabriel, S., Palangi, H., Sap, M., Ray, D., and Kamar, E. (2022, January 22\u201327). ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland.","DOI":"10.18653\/v1\/2022.acl-long.234"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Dai, S., Xu, C., Xu, S., Pang, L., Dong, Z., and Xu, J. (2024, January 25\u201329). Bias and unfairness in information retrieval systems: New challenges in the llm era. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain.","DOI":"10.1145\/3637528.3671458"},{"key":"ref_16","unstructured":"Duan, Y. (2024, December 15). The Large Language Model (LLM) Bias Evaluation (Age Bias). DIKWP Research Group International Standard Evaluation, Available online: https:\/\/www.researchgate.net\/profile\/Yucong-Duan\/publication\/378861188_The_Large_Language_Model_LLM_Bias_Evaluation_Age_Bias_\u2013DIKWP_Research_Group_International_Standard_Evaluation\/links\/65ee981eb7819b433bf53822\/The-Large-Language-Model-LLM-Bias-Evaluation-Age-Bias\u2013DIKWP-Research-Group-International-Standard-Evaluation.pdf."},{"key":"ref_17","unstructured":"Oketunji, A., Anas, M., and Saina, D. (2023). Large Language Model (LLM) Bias Index\u2014LLMBI. Data Policy."},{"key":"ref_18","unstructured":"Lin, L., Wang, L., Guo, J., and Wong, K.F. (2025, January 19\u201324). Investigating Bias in LLM-Based Bias Detection: Disparities between LLMs and Human Perception. Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wan, Y., Pu, G., Sun, J., Garimella, A., Chang, K.W., and Peng, N. (2023). \u201ckelly is a warm person, joseph is a role model\u201d: Gender biases in llm-generated reference letters. arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.243"},{"key":"ref_20","unstructured":"Dong, X., Wang, Y., Yu, P.S., and Caverlee, J. (2024). Disclosure and mitigation of gender bias in llms. arXiv."},{"key":"ref_21","unstructured":"Rhue, L., Goethals, S., and Sundararajan, A. (2024). Evaluating LLMs for Gender Disparities in Notable Persons. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"You, Z., Lee, H., Mishra, S., Jeoung, S., Mishra, A., Kim, J., and Diesner, J. (2024, January 16). Beyond Binary Gender Labels: Revealing Gender Bias in LLMs through Gender-Neutral Name Predictions. Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.gebnlp-1.16"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Smith, E.M., Hall, M., Kambadur, M., Presani, E., and Williams, A. (2022, January 7\u201311). \u201cI\u2019m sorry to hear that\u201d: Finding New Biases in Language Models with a Holistic Descriptor Dataset. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.emnlp-main.625"},{"key":"ref_24","first-page":"46595","article-title":"Judging llm-as-a-judge with mt-bench and chatbot arena","volume":"36","author":"Zheng","year":"2023","journal-title":"Adv. Neural Inf. 
Process. Syst."},{"key":"ref_25","unstructured":"Zhu, L., Wang, X., and Wang, X. (2025, January 24\u201328). Judgelm: Fine-tuned large language models are scalable judges. Proceedings of the Thirteenth International Conference on Learning Representations, Singapore."},{"key":"ref_26","unstructured":"Shayegani, E., Mamun, M.A.A., Fu, Y., Zaree, P., Dong, Y., and Abu-Ghazaleh, N. (2023). Survey of vulnerabilities in large language models revealed by adversarial attacks. arXiv."},{"key":"ref_27","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30."},{"key":"ref_28","unstructured":"Arslan, H.S., Fishel, M., and Anbarjafari, G. (2018). Doubly attentive transformer machine translation. arXiv."},{"key":"ref_29","unstructured":"Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., and Anadkat, S. (2023). Gpt-4 technical report. arXiv."},{"key":"ref_30","unstructured":"Anthropic (2024, December 15). Claude 3.5 Sonnet. Available online: https:\/\/www.anthropic.com\/news\/claude-3-5-sonnet."},{"key":"ref_31","unstructured":"Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., and Fan, A. (2024). The llama 3 herd of models. arXiv."},{"key":"ref_32","first-page":"6","article-title":"Do I have to be an \u201cother\u201d to be myself? Exploring gender diversity in taxonomy, data collection, and through the research data lifecycle","volume":"10","author":"Gofman","year":"2021","journal-title":"J. eSci. Libr."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Ben Amor, M., Granitzer, M., and Mitrovi\u0107, J. (2024, January 8\u201312). Impact of Position Bias on Language Models in Token Classification. Proceedings of the 39th ACM\/SIGAPP Symposium on Applied Computing, \u00c1vila, Spain.","DOI":"10.1145\/3605098.3636126"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Yang, J., Wang, Z., Lin, Y., and Zhao, Z. (2024, January 15\u201318). Problematic Tokens: Tokenizer Bias in Large Language Models. Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA.","DOI":"10.1109\/BigData62323.2024.10825615"},{"key":"ref_35","unstructured":"Wu, X., Ajorlou, A., Wang, Y., Jegelka, S., and Jadbabaie, A. (2024, January 10\u201315). On the Role of Attention Masks and LayerNorm in Transformers. Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Naim, O., and Asher, N. (2024). On explaining with attention matrices. 
ECAI 2024, IOS Press.","DOI":"10.3233\/FAIA240594"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/358\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:23:50Z","timestamp":1760030630000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/358"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,29]]},"references-count":36,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["info16050358"],"URL":"https:\/\/doi.org\/10.3390\/info16050358","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,29]]}}}
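
The abstract in the record above states that bias was quantified by comparing observed gender distributions against an ideal uniform distribution using statistical and information-theoretic measures. The record itself carries no formulas, so the following Python sketch only illustrates that general idea and is not the paper's implementation; the function name, the specific choice of KL divergence and total variation distance, and the example counts are all assumptions for illustration.

import math
from collections import Counter

def bias_from_uniform(labels):
    """Illustrative sketch (hypothetical helper, not the paper's method):
    measure how far an observed gender-label distribution deviates from an
    ideal uniform baseline, via KL divergence and total variation distance."""
    counts = Counter(labels)
    k = len(counts)                    # distinct gender labels observed
    n = sum(counts.values())           # total number of generated personas
    p = {g: c / n for g, c in counts.items()}  # empirical distribution
    u = 1.0 / k                                 # uniform baseline probability
    # KL(P || U) in bits; 0 means perfectly uniform, larger means more skewed.
    kl = sum(pg * math.log2(pg / u) for pg in p.values())
    # Total variation distance in [0, 1 - 1/k]; 0 means perfectly uniform.
    tv = 0.5 * sum(abs(pg - u) for pg in p.values())
    return kl, tv

# Example with made-up counts: 100 personas for one occupation,
# 70 labeled female and 30 labeled male.
kl, tv = bias_from_uniform(["female"] * 70 + ["male"] * 30)
print(f"KL from uniform: {kl:.3f} bits, total variation: {tv:.2f}")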