{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T01:09:54Z","timestamp":1779239394780,"version":"3.51.4"},"reference-count":33,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T00:00:00Z","timestamp":1653264000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>This study aims to explore how machine learning classification accuracy changes with different demographic groups. The HappyDB is a dataset that contains over 100,000 happy statements, incorporating demographic information that includes marital status, gender, age, and parenthood status. Using the happiness category field, we test different types of machine learning classifiers to predict what category of happiness the statements belong to, for example, whether they indicate happiness relating to achievement or affection. The tests were initially conducted with three distinct classifiers and the best performing model was the convolutional neural network (CNN) model, which is a deep learning algorithm, achieving an F1 score of 0.897 when used with the complete dataset. This model was then used as the main classifier to further analyze the results and to establish any variety in performance when tested on different demographic groups. We analyzed the results to see if classification accuracy was improved for different demographic groups, and found that the accuracy of prediction within this dataset declined with age, with the exception of the single parent subgroup. The results also showed improved performance for the married and parent subgroups, and lower performances for the non-parent and un-married subgroups, even when investigating a balanced sample.<\/jats:p>","DOI":"10.3390\/computers11050083","type":"journal-article","created":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T22:57:22Z","timestamp":1653346642000},"page":"83","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["How Machine Learning Classification Accuracy Changes in a Happiness Dataset with Different Demographic Groups"],"prefix":"10.3390","volume":"11","author":[{"given":"Colm","family":"Sweeney","sequence":"first","affiliation":[{"name":"School of Psychology, Ulster University, Coleraine BT52 1SA, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9677-0725","authenticated-orcid":false,"given":"Edel","family":"Ennis","sequence":"additional","affiliation":[{"name":"School of Psychology, Ulster University, Coleraine BT52 1SA, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1554-0785","authenticated-orcid":false,"given":"Maurice","family":"Mulvenna","sequence":"additional","affiliation":[{"name":"School of Computing, Ulster University, Jordanstown BT37 0QB, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raymond","family":"Bond","sequence":"additional","affiliation":[{"name":"School of Computing, Ulster University, Jordanstown BT37 0QB, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8786-2118","authenticated-orcid":false,"given":"Siobhan","family":"O\u2019Neill","sequence":"additional","affiliation":[{"name":"School of Psychology, Ulster University, Coleraine BT52 1SA, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,23]]},"reference":[{"key":"ref_1","unstructured":"Compton, W., and Hoffman, E. (2013). Positive Psychology: The Science of Happiness and Flourishing, Sage Publications. [3rd ed.]."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"347","DOI":"10.3390\/stats2030025","article-title":"Computing Happiness from Textual Data","volume":"2","author":"Mohamed","year":"2019","journal-title":"Stats"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Vashisth, P., and Meehan, K. (2020, January 9\u201310). Gender Classification using Twitter Text Data. Proceedings of the 31st Irish Signals and Systems Conference, Cork, Ireland.","DOI":"10.1109\/ISSC49989.2020.9180161"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1177\/0256090919853933","article-title":"Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy","volume":"44","year":"2019","journal-title":"Vikalpa"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Sweeney, C., Ennis, E., Bond, R., Mulvenna, M., and O\u2019Neill, S. (2021, January 5). Understanding a Happiness Dataset: How the Machine Learning Classification Accuracy Changes with Different Demographic Groups. Proceedings of the 2021 IEEE Symposium on Computers and Communications (ISCC \u201821), Athens, Greece.","DOI":"10.1109\/ISCC53001.2021.9631455"},{"key":"ref_6","unstructured":"Akari, A., Evensen, S., Golshan, B., Halevy, A., Li, V., Lopatenko, A., Stepanov, D., Suhara, Y., Tan, W.-C., and Xu, Y. (2018). Happydb: A Corpus of 100,000 Crowdsourced Happy Moments. arXiv."},{"key":"ref_7","unstructured":"Raj Kumar, G., Bhattacharya, P., and Yang, Y. (2020, May 26). What Constitutes Happiness? Predicting and Characterizing the Ingredients of Happiness Using Emotion Intensity Analysis. Available online: http:\/\/ceur-ws.org\/Vol-2328\/4_3_paper_22.pdf."},{"key":"ref_8","unstructured":"Jaidka, K., Mumick, S., Chhaya, N., and Ungar, L. (2020, May 27). The CL-Aff Happiness Shared Task: Results and Key Insights. Available online: http:\/\/ceur-ws.org\/Vol-2328\/2_paper.pdf."},{"key":"ref_9","unstructured":"Siriaraya, P., Suzuki, K., and Nakajima, S. (2019, January 20). Utilizing Collaborative Filtering to Recommend Opportunities for Positive Affect in Daily Life. Proceedings of the HealthRecSys@ RecSys, Copenhagen, Denmark."},{"key":"ref_10","unstructured":"Torres, J., and Vaca, C. (2019, January 27). Neural Semi-Supervised Learning for Multi-Labeled Short-Texts. Proceedings of the 2nd Workshop on Affective Content Analysis@ AAAI (AffCon2019), Honolulu, HI, USA."},{"key":"ref_11","unstructured":"VanZyl, L., and Rothmann, S. (2019). Effect on Happiness of Happiness Self-monitoring and Comparison with Others. Positive Psychology Interventions: Theories, Methodologies and Applications within Multi-Cultural Contexts, Springer International."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1521\/ijct.2015.8.2.114","article-title":"The Three-Step Theory (3ST): A New Theory of Suicide Rooted in the \u201cIdeation-to-Action\u201d Framework","volume":"8","author":"Klonsky","year":"2015","journal-title":"Int. J. Cogn. Ther."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gillick, D. (2010, January 26\u201330). Can Conversational Word Usage Be Used to Predict Speaker Demographics?. Proceedings of the Eleventh Annual Conference of the International Speech Communication Association, Makuhari, Japan.","DOI":"10.21437\/Interspeech.2010-421"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1093\/llc\/17.4.401","article-title":"Automatically Categorizing Written Texts by Author Gender","volume":"17","author":"Koppel","year":"2002","journal-title":"Lit. Linguist. Comput."},{"key":"ref_15","unstructured":"Schler, J., Koppel, M., Argamon, S., and Pennebaker, J. (2006, January 27\u201329). Effects of Age and Gender on Blogging. Proceedings of the 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs, Palo Alto, CA, USA."},{"key":"ref_16","unstructured":"Filippova, K. (2012, January 12\u201314). User Demographics and Language in an Implicit Social Network. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL \u201812), Jeju Island, Korea."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1111\/j.1467-9841.2006.00287.x","article-title":"Gender and Genre Variation in Weblogs","volume":"10","author":"Herring","year":"2006","journal-title":"J. Socioling."},{"key":"ref_18","first-page":"10211","article-title":"Gender, Identity and Language Use in Teenager Blogs","volume":"10","author":"Huffaker","year":"2005","journal-title":"J. Comput.-Mediat. Commun."},{"key":"ref_19","unstructured":"Burger, J.D., and Henderson, J. (2006, January 27\u201329). An Exploration of Observable Features Related to Blogger Age. Proceedings of the 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs, Palo Alto, CA, USA."},{"key":"ref_20","unstructured":"Yan, X., and Yan, L. (2006, January 27\u201329). Gender Classification of Weblogs Authors. Proceedings of the 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs, Palo Alto, CA, USA."},{"key":"ref_21","unstructured":"Popescu, A., and Grefenstette, G. (2010, January 23\u201326). Mining User Home Location and Gender from Flickr Tags. Proceedings of the International Conference on Weblogs and Social Media (ICWSM-10), Washington, DC, USA."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kincaid, J.P., Fishburne, R.P., Rogers, R.L., and Chissom, B.S. (1975). Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel, Naval Technical Training. Research Branch Report.","DOI":"10.21236\/ADA006655"},{"key":"ref_23","unstructured":"Mishra, P. (2021, November 02). 4 Popular Techniques to Measure the Readability of a Text Document. Available online: https:\/\/medium.com\/mlearning-ai\/4-popular-techniques-to-measure-the-readability-of-a-text-document-32a0882db6b2."},{"key":"ref_24","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C. (2014, January 26\u201328). GloVe: Global Vectors for Word Representation. Proceedings of the Conference on Empirical Methods on Natural Language Processing (EMNLP \u201814), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_26","unstructured":"Lewis, D. (1992). Representation and Learning in Information Retrieval. [Ph.D. Thesis, University of Massachusetts]."},{"key":"ref_27","unstructured":"Pak, A., and Paroubek, P. (2010, January 17\u201323). Twitter as a Corpus for Sentiment Analysis and Opinion Mining. Proceedings of the International Conference on Language Resources and Evaluation (LREC \u201910), Valletta, Malta."},{"key":"ref_28","unstructured":"Melville, P., Gryc, W., and Lawrence, R. (July, January 28). Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France."},{"key":"ref_29","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T. (2017, January 4\u20139). Lightgbm: A Highly Efficient Gradient Boosting Decision Tree. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1109\/21.97458","article-title":"A Survey of Decision Tree Classifier Methodology","volume":"21","author":"Safavian","year":"1991","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1016\/j.patcog.2017.10.013","article-title":"Recent Advances in Convolutional Neural Networks","volume":"77","author":"Gu","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Tiwari, V., Lennon, R., and Dowling, T. (2020, January 11\u201312). Not Everything You Read Is True! Fake News Detection using Machine learning Algorithms. Proceedings of the 2020 31st Irish Signals and Systems Conference (ISSC), Letterkenny, Ireland.","DOI":"10.1109\/ISSC49989.2020.9180206"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Li, Y., Sun, G., and Zhu, Y. (2010, January 15\u201317). Data Imbalance Problem in Text Classification. Proceedings of the 2010 Third International Symposium on Information Processing, Qingdao, China.","DOI":"10.1109\/ISIP.2010.47"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/11\/5\/83\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:16:48Z","timestamp":1760138208000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/11\/5\/83"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,23]]},"references-count":33,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["computers11050083"],"URL":"https:\/\/doi.org\/10.3390\/computers11050083","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,23]]}}}