{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:44:45Z","timestamp":1767339885896,"version":"build-2065373602"},"reference-count":45,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2024,12,5]],"date-time":"2024-12-05T00:00:00Z","timestamp":1733356800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Author Gender Identification (AGI) is an extensively studied subject owing to its significance in several domains, such as security and marketing. Recognizing an author\u2019s gender may assist marketers in segmenting consumers more effectively and crafting tailored content that aligns with a gender\u2019s preferences. Also, in cybersecurity, identifying an author\u2019s gender might aid in detecting phishing attempts where hackers could imitate individuals of a specific gender. Although studies in Arabic have mostly concentrated on written dialects, such as tweets, there is a paucity of studies addressing Modern Standard Arabic (MSA) in journalistic genres. To address the AGI issue, this work combines the beneficial properties of natural language processing with cutting-edge deep learning methods. Firstly, we propose a large 8k MSA article dataset composed of various columns sourced from news platforms, labeled with each author\u2019s gender. Moreover, we extract and analyze textual features that may be beneficial in identifying gender-related cues through their writings, focusing on semantics and syntax linguistics. Furthermore, we probe several innovative deep learning models, namely, Convolutional Neural Networks (CNNs), LSTM, Bidirectional LSTM (BiLSTM), and Bidirectional Encoder Representations from Transformers (BERT). Beyond that, a novel enhanced BERT model is proposed by incorporating gender-specific textual features. Through various experiments, the results underscore the potential of both BERT and the textual features, resulting in a 91% accuracy for the enhanced BERT model and a range of accuracy from 80% to 90% accuracy for deep learning models. We also employ these features for AGI in informal, dialectal text, with the enhanced BERT model reaching 68.7% accuracy. This demonstrates that these gender-specific textual features are conducive to AGI across MSA and dialectal texts.<\/jats:p>","DOI":"10.3390\/info15120779","type":"journal-article","created":{"date-parts":[[2024,12,5]],"date-time":"2024-12-05T08:33:51Z","timestamp":1733387631000},"page":"779","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Advancing Author Gender Identification in Modern Standard Arabic with Innovative Deep Learning and Textual Feature Techniques"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0182-2511","authenticated-orcid":false,"given":"Hanen","family":"Himdi","sequence":"first","affiliation":[{"name":"Computer Science and Artificial Intelligence Department, College of Computer Science and Engineering, University of Jeddah, Jeddah 21955, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0823-8390","authenticated-orcid":false,"given":"Khaled","family":"Shaalan","sequence":"additional","affiliation":[{"name":"Faculty of Engineering and IT, The British University in Dubai, DIAC Block 11, Dubai P.O. Box 345015, United Arab Emirates"}]}],"member":"1968","published-online":{"date-parts":[[2024,12,5]]},"reference":[{"key":"ref_1","first-page":"332","article-title":"What is gender, anyway: A review of the options for operationalising gender","volume":"12","author":"Lindqvist","year":"2021","journal-title":"Psychol. Sex."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1016\/j.diin.2011.04.002","article-title":"Author gender identification from text","volume":"8","author":"Cheng","year":"2011","journal-title":"Digit. Investig."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"102895","DOI":"10.1016\/j.jretconser.2021.102895","article-title":"The role of brand experience, brand resonance and brand trust in luxury consumption","volume":"66","author":"Husain","year":"2022","journal-title":"J. Retail. Consum. Serv."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"102404","DOI":"10.1016\/j.jretconser.2020.102404","article-title":"Social commerce website design, perceived value and loyalty behavior intentions: The moderating roles of gender, age and frequency of use","volume":"63","author":"Molinillo","year":"2021","journal-title":"J. Retail. Consum. Serv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Saeed, S. (2023). A customer-centric view of E-commerce security and privacy. Appl. Sci., 13.","DOI":"10.3390\/app13021020"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1080\/01638530802073712","article-title":"Gender Differences in Language Use: An Analysis of 14,000 Text Samples","volume":"45","author":"Newman","year":"2008","journal-title":"Discourse Process."},{"key":"ref_7","unstructured":"Block, A. (2024, October 01). Why Newspapers Should Not Have Columnists. Available online: https:\/\/stanforddaily.com\/2014\/11\/09\/why-newspapers-should-not-have-columnists\/."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Sulochana, B.C., Pragada, B.S., Kiran, B.C., Reddy, G.A., and Venugopalan, M. (2024, January 21\u201323). Author Identity Unveiled: Gender and Age Prediction from Textual Patterns using BERT. Proceedings of the 2024 4th International Conference on Intelligent Technologies (CONIT), Hubballi, India.","DOI":"10.1109\/CONIT61985.2024.10626311"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"100018","DOI":"10.1016\/j.nlp.2023.100018","article-title":"Gender prediction with descriptive textual data using a Machine Learning approach","volume":"4","author":"Onikoyi","year":"2023","journal-title":"Nat. Lang. Process. J."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, Y., Singh, L., and Mneimneh, Z. (2021, January 7\u20139). A Comparative Analysis of Classic and Deep Learning Models for Inferring Gender and Age of Twitter Users. Proceedings of the 2nd International Conference on Deep Learning Theory and Applications-DeLTA, Online.","DOI":"10.5220\/0010559500480058"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Sezerer, E., Polatbilek, O., and Tekir, S. (2019, January 1\u20132). A Turkish Dataset for Gender Identification of Twitter Users. Proceedings of the 13th Linguistic Annotation Workshop, Florence, Italy.","DOI":"10.18653\/v1\/W19-4023"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"15399","DOI":"10.1109\/ACCESS.2024.3358199","article-title":"AGI-P: A Gender Identification Framework for Authorship Analysis Using Customized Fine-Tuning of Multilingual Language Model","volume":"12","author":"Sarwar","year":"2024","journal-title":"IEEE Access"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"88","DOI":"10.3991\/ijim.v18i03.43013","article-title":"Convolutional Neural Network Architectures for Gender, Emotional Detection from Speech and Speaker Diarization","volume":"18","author":"Taha","year":"2024","journal-title":"Int. J. Interact. Mob. Technol."},{"key":"ref_14","first-page":"50669","article-title":"Biological gender identification in Turkish news text using deep learning models","volume":"83","year":"2024","journal-title":"Multimed. Tools Appl."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Sakaki, S., Miura, Y., Ma, X., Hattori, K., and Ohkuma, T. (2014, January 23\u201329). Twitter user gender inference using combined analysis of text and image processing. Proceedings of the Third Workshop on Vision and Language, Dublin, Ireland.","DOI":"10.3115\/v1\/W14-5408"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Taniguchi, T., Sakaki, S., Shigenaka, R., Tsuboshita, Y., and Ohkuma, T. (2015, January 18). A weighted combination of text and image classifiers for user gender inference. Proceedings of the Fourth Workshop on Vision and Language, Lisbon, Portugal.","DOI":"10.18653\/v1\/W15-2814"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1016\/j.eij.2020.04.001","article-title":"Gender identification for Egyptian Arabic dialect in twitter using deep learning models","volume":"21","author":"ElSayed","year":"2020","journal-title":"Egypt. Inform. J."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"AlZahrani, F.M., and Al-Yahya, M. (2023). A Transformer-Based Approach to Authorship Attribution in Classical Arabic Texts. Appl. Sci., 13.","DOI":"10.3390\/app13127255"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/j.eij.2018.12.002","article-title":"Gender identification of egyptian dialect in twitter","volume":"20","author":"Hussein","year":"2019","journal-title":"Egypt. Inform. J."},{"key":"ref_20","unstructured":"Halpern, J. (2009, January 22\u201323). Lexicon-driven approach to the recognition of Arabic named entities. Proceedings of the Second International Conference on Arabic Language Resources and Tools, Cairo, Egypt."},{"key":"ref_21","unstructured":"Mubarak, H. (2017). Build fast and accurate lemmatization for Arabic. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"378","DOI":"10.1037\/h0031619","article-title":"Measuring nominal scale agreement among many raters","volume":"76","author":"Fleiss","year":"1971","journal-title":"Psychol. Bull."},{"key":"ref_23","first-page":"182","article-title":"Gender language differences do men and women really speak differently","volume":"2","author":"Choucane","year":"2016","journal-title":"Glob. Engl.-Oriented Res. J. (GEORJ)"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"6063","DOI":"10.21105\/joss.06063","article-title":"Tashaphyne: A Python package for Arabic Light Stemming","volume":"9","author":"Zerrouki","year":"2024","journal-title":"J. Open Source Softw."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Abdelali, A., Darwish, K., Durrani, N., and Mubarak, H. (2016, January 12\u201317). Farasa: A fast and furious segmenter for arabic. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, San Diego, CA, USA.","DOI":"10.18653\/v1\/N16-3003"},{"key":"ref_26","unstructured":"Ayeni, A. (2024, October 01). Empirics of Standard Deviation. Available online: https:\/\/www.researchgate.net\/publication\/264276808_Empirics_of_Standard_Deviation?channel=doi&linkId=53d74d290cf228d363eae74b&showFulltext=true."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Goldberg, Y. (2015). A Primer on Neural Network Models for Natural Language Processing. arXiv.","DOI":"10.1613\/jair.4992"},{"key":"ref_28","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Safaya, A., Abdullatif, M., and Yuret, D. (2020, January 12\u201313). KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media. Proceedings of the Fourteenth Workshop on Semantic Evaluation, Barcelona (Online), Spain.","DOI":"10.18653\/v1\/2020.semeval-1.271"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Husain, F., and Uzuner, O. (2021). Transfer Learning Approach for Arabic Offensive Language Detection System\u2014BERT-Based Model. arXiv.","DOI":"10.1109\/IALP57159.2022.9961263"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1145\/1461928.1461959","article-title":"Automatically profiling the author of an anonymous text","volume":"52","author":"Argamon","year":"2009","journal-title":"Commun. ACM"},{"key":"ref_32","first-page":"18","article-title":"Gendered-Linked Differences in Speech Styles: Analysing Linguistic and Gender in the Malaysian Context\/DIFF\u00c9RENCES DE SEXE DANS LE STYLE DE DISCOURS: ANALYSES LINGUISTIQUES ET ANALYSES SUR LE SEXE DANS LE CAS DE MALAISIE","volume":"6","author":"Subrayan","year":"2010","journal-title":"Cross-Cult. Commun."},{"key":"ref_33","first-page":"67","article-title":"Gender differences in the use of linguistic forms in the speech of men and women in the Malaysian context","volume":"13","author":"Subon","year":"2013","journal-title":"J. Humanit. Soc. Sci."},{"key":"ref_34","first-page":"85","article-title":"Author gender identification from Arabic text","volume":"35","author":"Alsmearat","year":"2017","journal-title":"J. Inf. Secur. Appl."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"B\u0103dic\u0103, C., Treur, J., Benslimane, D., Hnatkowska, B., and Kr\u00f3tkiewicz, M. (2022, January 28\u201330). Bots and Gender Detection on Twitter Using Stylistic Features. Proceedings of the Advances in Computational Collective Intelligence, Hammamet, Tunisia.","DOI":"10.1007\/978-3-031-16210-7"},{"key":"ref_36","first-page":"110","article-title":"Introduction to Psycholinguistics\u2014A Review","volume":"2","author":"Balamurugan","year":"2018","journal-title":"Stud. Linguist. Lit."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1515\/ijsl-2020-2084","article-title":"The Scope of Sociolinguistics","volume":"2020","author":"Hymes","year":"2020","journal-title":"Int. J. Sociol. Lang."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"48428","DOI":"10.1109\/ACCESS.2020.2973509","article-title":"An Author Gender Detection Method Using Whale Optimization Algorithm and Artificial Neural Network","volume":"8","author":"Safara","year":"2020","journal-title":"IEEE Access"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Morales S\u00e1nchez, D., Moreno, A., and Jim\u00e9nez L\u00f3pez, M.D. (2022). A White-Box Sociolinguistic Model for Gender Detection. Appl. Sci., 12.","DOI":"10.3390\/app12052676"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Freihat, A., Bella, G., Mubarak, H., and Giunchiglia, F. (2018, January 25\u201326). A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition. Proceedings of the 2018 2nd International Conference on Natural Language and Speech Processing (ICNLSP), Algiers, Algeria.","DOI":"10.1109\/ICNLSP.2018.8374393"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Alluhaibi, R., Alfraidi, T., Abdeen, M.A., and Yatimi, A. (2021). A Comparative Study of Arabic Part of Speech Taggers Using Literary Text Samples from Saudi Novels. Information, 12.","DOI":"10.3390\/info12120523"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3439726","article-title":"Deep Learning\u2013based Text Classification: A Comprehensive Review","volume":"54","author":"Minaee","year":"2021","journal-title":"ACM Comput. Surv."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1111\/josl.12080","article-title":"Gender identity and lexical variation in social media","volume":"18","author":"Bamman","year":"2014","journal-title":"J. Socioling."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Rao, D., Yarowsky, D., Shreevats, A., and Gupta, M. (2010, January 26\u201330). Classifying latent user attributes in twitter. Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents, Toronto, ON, Canada.","DOI":"10.1145\/1871985.1871993"},{"key":"ref_45","first-page":"612","article-title":"Evaluating the impact of GINI index and information gain on classification using decision tree classifier algorithm","volume":"11","author":"Tangirala","year":"2020","journal-title":"Int. J. Adv. Comput. Sci. Appl."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/12\/779\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:47:10Z","timestamp":1760114830000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/12\/779"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,5]]},"references-count":45,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["info15120779"],"URL":"https:\/\/doi.org\/10.3390\/info15120779","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2024,12,5]]}}}