{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T16:24:42Z","timestamp":1780676682000,"version":"3.54.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,5,17]],"date-time":"2021-05-17T00:00:00Z","timestamp":1621209600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,5,17]],"date-time":"2021-05-17T00:00:00Z","timestamp":1621209600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The ever-increasing social media users has dramatically contributed to significant growth as far as the volume of online information is concerned. Often, the contents that these users put in social media can give valuable insights on their personalities (e.g., in terms of predicting job satisfaction, specific preferences, as well as the success of professional and romantic relationship) and getting it without the hassle of taking formal personality test. Termed personality prediction, the process involves extracting the digital content into features and mapping it according to a personality model. Owing to its simplicity and proven capability, a well-known personality model, called the big five personality traits, has often been adopted in the literature as the de facto standard for personality assessment. To date, there are many algorithms that can be used to extract embedded contextualized word from textual data for personality prediction system; some of them are based on ensembled model and deep learning. 
Although useful, existing algorithms such as RNN and LSTM suffer from the following limitations. Firstly, these algorithms take a long time to train owing to their sequential inputs. Secondly, these algorithms lack the ability to capture the true (semantic) meaning of words; therefore, some context is lost. To address these limitations, this paper introduces a new prediction approach using a multi-model deep learning architecture combined with multiple pre-trained language models such as BERT, RoBERTa, and XLNet as the feature extraction method on social media data sources. Finally, the system makes its prediction based on model averaging. Unlike earlier work, which adopts a single social media data source with open- and closed-vocabulary extraction methods, the proposed work uses multiple social media data sources, namely Facebook and Twitter, and produces a predictive model for each trait using bidirectional context features combined with the extraction method. Our experience with the proposed work has been encouraging, as it has outperformed similar existing works in the literature. 
More precisely, our results achieve a maximum accuracy of 86.2% and 0.912 f1 measure score on the Facebook dataset; 88.5% accuracy and 0.882 f1 measure score on the Twitter dataset.<\/jats:p>","DOI":"10.1186\/s40537-021-00459-1","type":"journal-article","created":{"date-parts":[[2021,5,17]],"date-time":"2021-05-17T12:03:48Z","timestamp":1621253028000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":115,"title":["Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging"],"prefix":"10.1186","volume":"8","author":[{"given":"Hans","family":"Christian","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3271-5874","authenticated-orcid":false,"given":"Derwin","family":"Suhartono","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andry","family":"Chowanda","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kamal Z.","family":"Zamli","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,5,17]]},"reference":[{"issue":"2","key":"459_CR1","doi-asserted-by":"publisher","first-page":"159","DOI":"10.22146\/gamaijb.34931","volume":"21","author":"N Abood","year":"2019","unstructured":"Abood N. Big five traits: a critical review. Gadjah Mada Int J Business. 2019;21(2):159\u201386. https:\/\/doi.org\/10.22146\/gamaijb.34931.","journal-title":"Gadjah Mada Int J Business"},{"key":"459_CR2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-021-09958-2","author":"FA Acheampong","year":"2021","unstructured":"Acheampong FA, Nunoo-Mensah H, Chen W. Transformer models for text-based emotion detection: a review of BERT-based approaches. Artif Intell Rev. 2021. 
https:\/\/doi.org\/10.1007\/s10462-021-09958-2.","journal-title":"Artif Intell Rev"},{"key":"459_CR3","doi-asserted-by":"publisher","first-page":"473","DOI":"10.1016\/j.procs.2018.08.199","volume":"135","author":"GYNN Adi","year":"2018","unstructured":"Adi GYNN, Tandio MH, Ong V, Suhartono D. Optimization for automatic personality recognition on Twitter in Bahasa Indonesia. Procedia Comp Sci. 2018;135:473\u201380. https:\/\/doi.org\/10.1016\/j.procs.2018.08.199.","journal-title":"Procedia Comp Sci"},{"key":"459_CR4","doi-asserted-by":"crossref","unstructured":"Alam F, Stepanov EA, Riccardi G. Personality traits recognition on social network\u2014Facebook. AAAI Workshop\u2014Technical Report, WS-13-01, 2013. pp 6\u20139.","DOI":"10.1609\/icwsm.v7i2.14464"},{"key":"459_CR5","doi-asserted-by":"publisher","unstructured":"Aung ZMM, Myint PH. Personality prediction based on content of facebook users: a literature review. Proceedings - 20th IEEE\/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel\/Distributed Computing, SNPD 2019; 2019. pp. 34\u201338. https:\/\/doi.org\/10.1109\/SNPD.2019.8935692.","DOI":"10.1109\/SNPD.2019.8935692"},{"key":"459_CR6","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1613\/JAIR.1.11849","volume":"68","author":"O Ben-Porat","year":"2020","unstructured":"Ben-Porat O, Hirsch S, Kuchy L, Elad G, Reichart R, Tennenholtz M. Predicting strategic behavior from free text. J Artif Intell Res. 2020;68:413\u201345. https:\/\/doi.org\/10.1613\/JAIR.1.11849.","journal-title":"J Artif Intell Res"},{"issue":"1","key":"459_CR7","doi-asserted-by":"publisher","first-page":"35","DOI":"10.3233\/WEB-200427","volume":"18","author":"R Bin Tareaf","year":"2020","unstructured":"Bin Tareaf R, Berger P, Hennig P, Meinel C. Cross-platform personality exploration system for online social networks: Facebook vs. Twitter Web Intell. 2020;18(1):35\u201351. 
https:\/\/doi.org\/10.3233\/WEB-200427.","journal-title":"Twitter Web Intell"},{"key":"459_CR8","unstructured":"Carvalho F, Guedesa GP. TF-IDFC-RF: a novel supervised term weighting scheme. ArXiv. 2020."},{"issue":"4","key":"459_CR9","doi-asserted-by":"publisher","first-page":"285","DOI":"10.21512\/comtech.v7i4.3746","volume":"7","author":"H Christian","year":"2016","unstructured":"Christian H, Agus MP, Suhartono D. Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF). ComTech Comp Math Eng Appl. 2016;7(4):285. https:\/\/doi.org\/10.21512\/comtech.v7i4.3746.","journal-title":"ComTech Comp Math Eng Appl."},{"key":"459_CR10","unstructured":"Cui B (n.d.). Survey analysis of machine learning methods for natural language processing for MBTI Personality Type Prediction. http:\/\/cs229.stanford.edu\/proj2017\/final-reports\/5242471.pdf."},{"key":"459_CR11","doi-asserted-by":"publisher","DOI":"10.1016\/j.tele.2020.101516","author":"M Dalvi-Esfahani","year":"2020","unstructured":"Dalvi-Esfahani M, Niknafs A, Alaedini Z, Barati Ahmadabadi H, Kuss DJ, Ramayah T. Social Media Addiction and Empathy: Moderating impact of personality traits among high school students. Telematics Inform. 2020. https:\/\/doi.org\/10.1016\/j.tele.2020.101516.","journal-title":"Telematics Inform"},{"key":"459_CR12","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1109\/CTEMS.2018.8769304","volume":"2018","author":"PS Dandannavar","year":"2018","unstructured":"Dandannavar PS, Mangalwede SR, Kulkarni PM. Social media text\u2014a source for personality prediction. Proc Int Conference Comput Tech Electronics Mech Syst CTEMS. 2018;2018:62\u20135. https:\/\/doi.org\/10.1109\/CTEMS.2018.8769304.","journal-title":"Proc Int Conference Comput Tech Electronics Mech Syst CTEMS"},{"key":"459_CR13","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. 
NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, 1(Mlm), 2019. pp. 4171\u20134186."},{"key":"459_CR14","doi-asserted-by":"crossref","unstructured":"Ergu \u0130. Twitter Verisi ve Makine \u00d6 \u011f renmesi Modelleriyle Ki \u015f ilik Tahminleme Predicting Personality with Twitter Data and Machine Learning Models. 1. 2019.","DOI":"10.1109\/ASYU48272.2019.8946355"},{"key":"459_CR15","doi-asserted-by":"publisher","unstructured":"Farnadi G, Sushmita S, Sitaraman G, Ton N, De Cock M, Davalos S. A multivariate regression approach to personality impression recognition of vloggers. WCPR 2014 - Proceedings of the 2014 Workshop on Computational Personality Recognition, Workshop of MM 2014, 1\u20136. 2014. https:\/\/doi.org\/10.1145\/2659522.2659526.","DOI":"10.1145\/2659522.2659526"},{"key":"459_CR16","doi-asserted-by":"publisher","first-page":"105550","DOI":"10.1016\/j.knosys.2020.105550","volume":"194","author":"S Han","year":"2020","unstructured":"Han S, Huang H, Tang Y. Knowledge of words: An interpretable approach for personality recognition from social media. Knowl-Based Syst. 2020;194:105550. https:\/\/doi.org\/10.1016\/j.knosys.2020.105550.","journal-title":"Knowl-Based Syst"},{"key":"459_CR17","unstructured":"Hernandez and Knight. (n.d.). Predicting MBTI from text."},{"key":"459_CR18","doi-asserted-by":"publisher","DOI":"10.1145\/3167132.3167166","author":"P Howlader","year":"2018","unstructured":"Howlader P, Pal KK, Cuzzocrea A, Kumar SDM. Predicting facebook-users\u2019 personality based on status and linguistic features via flexible regression analysis techniques. Proc ACM Symposium Appl Comput. 2018. 
https:\/\/doi.org\/10.1145\/3167132.3167166.","journal-title":"Proc ACM Symposium Appl Comput"},{"issue":"4","key":"459_CR19","doi-asserted-by":"publisher","first-page":"283","DOI":"10.5391\/IJFIS.2019.19.4.283","volume":"19","author":"NH Jeremy","year":"2019","unstructured":"Jeremy NH, Prasetyo C, Suhartono D. Identifying personality traits for Indonesian user from twitter dataset. Int J Fuzzy Logic Intell Syst. 2019;19(4):283\u20139. https:\/\/doi.org\/10.5391\/IJFIS.2019.19.4.283.","journal-title":"Int J Fuzzy Logic Intell Syst"},{"key":"459_CR20","unstructured":"Jiang H, Zhang X, Choi JD. Automatic text-based personality recognition on monologues and multiparty dialogues using attentive networks and contextual embeddings. ArXiv, 2019. pp. 2\u20134."},{"key":"459_CR21","unstructured":"Ju C, Laan MJ, Van Der (n.d.). The relative performance of ensemble methods with deep convolutional neural networks for image classification. pp. 1\u201320."},{"key":"459_CR22","unstructured":"Kazameini A, Fatehi S, Mehta Y, Eetemadi S, Cambria E, Computational G, Unit N. Personality Trait Detection Using Bagged SVM over BERT Word Embedding Ensembles. 2020. pp. 1\u20134."},{"key":"459_CR23","unstructured":"Keh SS, Cheng I-T. Myers-Briggs personality classification and personality-specific language generation using pre-trained language models. July. 2019. http:\/\/arxiv.org\/abs\/1907.06333."},{"key":"459_CR24","unstructured":"Khurana D, Koli A, Khatter K, Singh S. Natural Language Processing : State of The Art , Current Trends and Challenges Natural Language Processing : State of The Art , Current Trends and Challenges Department of Computer Science and Engineering Manav Rachna International University , Faridabad-. ArXiv Preprint ArXiv, August 2017. 
2018."},{"issue":"3","key":"459_CR25","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1007\/s11469-018-9940-6","volume":"18","author":"K Kircaburun","year":"2020","unstructured":"Kircaburun K, Alhabash S, Tosunta\u015f \u015eB, Griffiths MD. Uses and gratifications of problematic social media use among university students: a simultaneous examination of the big five of personality traits, social media platforms, and social media use motives. Int J Ment Heal Addict. 2020;18(3):525\u201347. https:\/\/doi.org\/10.1007\/s11469-018-9940-6.","journal-title":"Int J Ment Heal Addict"},{"key":"459_CR26","doi-asserted-by":"publisher","DOI":"10.1002\/cb.1898","author":"HS Lim","year":"2020","unstructured":"Lim HS, Bouchacourt L, Brown-Devlin N. Nonprofit organization advertising on social media: the role of personality, advertizing appeals, and bandwagon effects. J Consumer Behav. 2020. https:\/\/doi.org\/10.1002\/cb.1898.","journal-title":"J Consumer Behav."},{"key":"459_CR27","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V. RoBERTa: A robustly optimized BERT pretraining approach. ArXiv; 2019. 1."},{"key":"459_CR28","doi-asserted-by":"crossref","unstructured":"Lynn VE, Balasubramanian N, Schwartz HA. Hierarchical modeling for user personality prediction: the role of message-level attention. 2020. 5306\u20135316.","DOI":"10.18653\/v1\/2020.acl-main.472"},{"issue":"3","key":"459_CR29","doi-asserted-by":"publisher","first-page":"587","DOI":"10.1109\/TCSS.2020.2966910","volume":"7","author":"AA Marouf","year":"2020","unstructured":"Marouf AA, Hasan MK, Mahmud H. Comparative analysis of feature selection algorithms for computational personality prediction from social media. IEEE Trans Comput Social Syst. 2020;7(3):587\u201399. 
https:\/\/doi.org\/10.1109\/TCSS.2020.2966910.","journal-title":"IEEE Trans Comput Social Syst"},{"key":"459_CR30","doi-asserted-by":"publisher","DOI":"10.3390\/app10238631","author":"V Maslej-kre\u0161","year":"2020","unstructured":"Maslej-kre\u0161 V, Sarnovsk\u00fd M, Butka P. Comparison of deep learning models and various text pre-processing techniques for the toxic comments classification. Appl Sci. 2020. https:\/\/doi.org\/10.3390\/app10238631.","journal-title":"Appl Sci"},{"key":"459_CR31","doi-asserted-by":"publisher","unstructured":"Ong V, Rahmanto ADS, Williem W, Suhartono D, Nugroho AE, Andangsari EW, Suprayogi MN. Personality prediction based on Twitter information in Bahasa Indonesia. Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, FedCSIS 2017, 11; 2017. pp. 367\u2013372. https:\/\/doi.org\/10.15439\/2017F359","DOI":"10.15439\/2017F359"},{"issue":"1","key":"459_CR32","first-page":"65","volume":"9","author":"V Ong","year":"2017","unstructured":"Ong V, Rahmanto ADS, Williem, & Suhartono, D. . Exploring personality prediction from text on social media: a literature review. Internetworking Indonesia J. 2017;9(1):65\u201370.","journal-title":"Internetworking Indonesia J"},{"key":"459_CR33","doi-asserted-by":"publisher","unstructured":"Peters ME, Neumann M, Zettlemoyer L, Yih WT. Dissecting contextual word embeddings: Architecture and representation. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018; 2020. pp. 1499\u20131509. https:\/\/doi.org\/10.18653\/v1\/d18-1179.","DOI":"10.18653\/v1\/d18-1179"},{"key":"459_CR34","doi-asserted-by":"publisher","unstructured":"Pratama BY, Sarno R. Personality classification based on Twitter text using Naive Bayes, KNN and SVM. Proceedings of 2015 International Conference on Data and Software Engineering, ICODSE 2015; 2016. pp. 170\u2013174. 
https:\/\/doi.org\/10.1109\/ICODSE.2015.7436992.","DOI":"10.1109\/ICODSE.2015.7436992"},{"issue":"2","key":"459_CR35","doi-asserted-by":"publisher","first-page":"49","DOI":"10.11648\/j.ijdst.20180402.12","volume":"4","author":"S Redhu","year":"2018","unstructured":"Redhu S. Sentiment analysis using text mining: a review. Int J Data Sci Technol. 2018;4(2):49. https:\/\/doi.org\/10.11648\/j.ijdst.20180402.12.","journal-title":"Int J Data Sci Technol"},{"issue":"2016","key":"459_CR36","doi-asserted-by":"publisher","first-page":"61959","DOI":"10.1109\/ACCESS.2018.2876502","volume":"6","author":"MM Tadesse","year":"2018","unstructured":"Tadesse MM, Lin H, Xu B, Yang L. Personality predictions based on user behavior on the Facebook social media platform. IEEE Access. 2018;6(2016):61959\u201369. https:\/\/doi.org\/10.1109\/ACCESS.2018.2876502.","journal-title":"IEEE Access"},{"key":"459_CR37","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1016\/j.procs.2017.10.016","volume":"116","author":"T Tandera","year":"2017","unstructured":"Tandera T, Hendro S, D., Wongso, R., & Prasetio, Y. L. . Personality prediction system from facebook users. Procedia Comp Sci. 2017;116:604\u201311. https:\/\/doi.org\/10.1016\/j.procs.2017.10.016.","journal-title":"Procedia Comp Sci"},{"key":"459_CR38","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I. Attention is all you need. Advances in Neural Information Processing Systems, 2017-Decem(Nips), 2017. pp. 5999\u20136009."},{"issue":"2","key":"459_CR39","first-page":"17","volume":"54","author":"B Violino","year":"2020","unstructured":"Violino B. Social media trends. Association for Computing Machinery. Commun ACM. 2020;54(2):17.","journal-title":"Commun ACM"},{"key":"459_CR40","unstructured":"Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov R, Le QV. XLNet: generalized autoregressive pretraining for language understanding. ArXiv, NeurIPS; 2019. pp. 
1\u201318."},{"key":"459_CR41","doi-asserted-by":"publisher","unstructured":"Yuan C, Wu J, Li H, Wang L. Personality recognition based on user generated content. 2018 15th International Conference on Service Systems and Service Management, ICSSSM 2018; 2018. pp. 1\u20136. https:\/\/doi.org\/10.1109\/ICSSSM.2018.8465006","DOI":"10.1109\/ICSSSM.2018.8465006"},{"key":"459_CR42","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1145\/3318299.3318363","volume":"F1481","author":"H Zheng","year":"2019","unstructured":"Zheng H, Wu C. Predicting personality using facebook status based on semi-supervised learning. ACM Int Conference Proc Series, Part. 2019;F1481:59\u201364. https:\/\/doi.org\/10.1145\/3318299.3318363.","journal-title":"ACM Int Conference Proc Series, Part"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00459-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-021-00459-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00459-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,27]],"date-time":"2022-12-27T15:50:40Z","timestamp":1672156240000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-021-00459-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,17]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["459"],"URL":"https:\/\/doi.org\/10.1186\/s40537-021-00459-1","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"p
ublished":{"date-parts":[[2021,5,17]]},"assertion":[{"value":"22 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 May 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 May 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"68"}}