{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T06:14:03Z","timestamp":1774678443940,"version":"3.50.1"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,3,22]],"date-time":"2024-03-22T00:00:00Z","timestamp":1711065600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,22]],"date-time":"2024-03-22T00:00:00Z","timestamp":1711065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005967","name":"Linnaeus University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005967","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Intell Inf Syst"],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Nowadays, various applications across industries, healthcare, and security have begun adopting automatic sentiment analysis and emotion detection in short texts, such as posts from social media. Twitter stands out as one of the most popular online social media platforms due to its easy, unique, and advanced accessibility using the API. On the other hand, supervised learning is the most widely used paradigm for tasks involving sentiment polarity and fine-grained emotion detection in short and informal texts, such as Twitter posts. However, supervised learning models are data-hungry and heavily reliant on abundant labeled data, which remains a challenge. This study aims to address this challenge by creating a large-scale real-world dataset of 17.5 million tweets. A distant supervision approach relying on emojis available in tweets is applied to label tweets corresponding to Ekman\u2019s six basic emotions. Additionally, we conducted a series of experiments using various conventional machine learning models and deep learning, including transformer-based models, on our dataset to establish baseline results. The experimental results and an extensive ablation analysis on the dataset showed that BiLSTM with FastText and an attention mechanism outperforms other models in both classification tasks, achieving an F1-score of 70.92% for sentiment classification and 54.85% for emotion detection.<\/jats:p>","DOI":"10.1007\/s10844-024-00845-0","type":"journal-article","created":{"date-parts":[[2024,3,22]],"date-time":"2024-03-22T08:39:20Z","timestamp":1711096760000},"page":"1045-1070","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Leveraging distant supervision and deep learning for twitter sentiment and emotion classification"],"prefix":"10.1007","volume":"62","author":[{"given":"Muhamet","family":"Kastrati","sequence":"first","affiliation":[]},{"given":"Zenun","family":"Kastrati","sequence":"additional","affiliation":[]},{"given":"Ali","family":"Shariq Imran","sequence":"additional","affiliation":[]},{"given":"Marenglen","family":"Biba","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,3,22]]},"reference":[{"key":"845_CR1","unstructured":"Aragon, M.E., Lopez-Monroy, A.P., Gonzalez-Gurrola, L.-C.G., & Montes, M. (2021) Detecting mental disorders in social media through emotional patterns-the case of anorexia and depression. IEEE Transactions on Affective Computing"},{"key":"845_CR2","unstructured":"Batra, R., Kastrati, Z., Imran, A.S., Daudpota, S.M., & Ghafoor, A. (2021). A large-scale tweet dataset for urdu text sentiment analysis. arXiv:2021.03057"},{"issue":"10","key":"845_CR3","doi-asserted-by":"publisher","first-page":"5344","DOI":"10.3390\/su13105344","volume":"13","author":"R Batra","year":"2021","unstructured":"Batra, R., Imran, A. S., Kastrati, Z., Ghafoor, A., Daudpota, S. M., & Shaikh, S. (2021). Evaluating polarity trend amidst the coronavirus crisis in peoples\u2019 attitudes toward the vaccination drive. Sustainability, 13(10), 5344.","journal-title":"Sustainability"},{"issue":"1","key":"845_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/2200000006","volume":"2","author":"Y Bengio","year":"2009","unstructured":"Bengio, Y., et al. (2009). Learning deep architectures for ai. Foundations and Trends\u00ae in Machine Learning, 2(1), 1\u2013127.","journal-title":"Foundations and Trends\u00ae in Machine Learning"},{"issue":"8","key":"845_CR5","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","volume":"35","author":"Y Bengio","year":"2013","unstructured":"Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798\u20131828.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"845_CR6","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"P Bojanowski","year":"2017","unstructured":"Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135\u2013146.","journal-title":"Transactions of the Association for Computational Linguistics"},{"issue":"3","key":"845_CR7","doi-asserted-by":"publisher","first-page":"181","DOI":"10.26599\/BDMA.2019.9020002","volume":"2","author":"M Bouazizi","year":"2019","unstructured":"Bouazizi, M., & Ohtsuki, T. (2019). Multi-class sentiment analysis on twitter: Classification performance and challenges. Big Data Mining and Analytics, 2(3), 181\u2013194.","journal-title":"Big Data Mining and Analytics"},{"key":"845_CR8","doi-asserted-by":"crossref","unstructured":"Byrkjeland, M., Lichtenberg, F. G., & Gamb\u00e4ck, B. (2018). Ternary twitter sentiment classification with distant supervision and sentiment-specific word embeddings. In: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 97\u2013106","DOI":"10.18653\/v1\/W18-6215"},{"key":"845_CR9","unstructured":"Canales, L., Daelemans, W., Boldrini, E., & Mart\u00ednez-Barco, P. (2019). Emolabel: semi-automatic methodology for emotion annotation of social media text. IEEE Transactions on Affective Computing"},{"issue":"3","key":"845_CR10","doi-asserted-by":"publisher","first-page":"433","DOI":"10.1109\/TAFFC.2018.2807817","volume":"11","author":"N Colneri\u010d","year":"2018","unstructured":"Colneri\u010d, N., & Dem\u0161ar, J. (2018). Emotion recognition on twitter: Comparative study and training a unison model. IEEE Transactions on Affective Computing, 11(3), 433\u2013446.","journal-title":"IEEE Transactions on Affective Computing"},{"key":"845_CR11","unstructured":"Davidov, D., Tsur, O., & Rappoport, A. (2010). Enhanced sentiment learning using twitter hashtags and smileys. In: Coling 2010: Posters, pp. 241\u2013249"},{"key":"845_CR12","unstructured":"Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805"},{"key":"845_CR13","doi-asserted-by":"crossref","unstructured":"Edalati, M., Imran, A.S., Kastrati, Z., & Daudpota, S.M. (2021). The potential of machine learning algorithms for sentiment classification of students\u2019 feedback on mooc. In: Proceedings of SAI Intelligent Systems Conference, pp. 11\u201322. Springer","DOI":"10.1007\/978-3-030-82199-9_2"},{"issue":"4","key":"845_CR14","doi-asserted-by":"publisher","first-page":"384","DOI":"10.1037\/0003-066X.48.4.384","volume":"48","author":"P Ekman","year":"1993","unstructured":"Ekman, P. (1993). Facial expression and emotion. American Psychologist, 48(4), 384.","journal-title":"American Psychologist"},{"issue":"12","key":"845_CR15","first-page":"2009","volume":"1","author":"A Go","year":"2009","unstructured":"Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(12), 2009.","journal-title":"CS224N Project Report, Stanford"},{"key":"845_CR16","doi-asserted-by":"publisher","first-page":"181074","DOI":"10.1109\/ACCESS.2020.3027350","volume":"8","author":"AS Imran","year":"2020","unstructured":"Imran, A. S., Daudpota, S. M., Kastrati, Z., & Batra, R. (2020). Cross-cultural polarity and emotion detection using sentiment analysis and deep learning on covid-19 related tweets. IEEE Access, 8, 181074\u2013181090.","journal-title":"IEEE Access"},{"key":"845_CR17","doi-asserted-by":"crossref","unstructured":"Islam, J., Ahmed, S., Akhand, M., & Siddique, N. (2020). Improved emotion recognition from microblog focusing on both emoticon and text. In: 2020 IEEE Region 10 Symposium (TENSYMP), pp. 778\u2013782. IEEE","DOI":"10.1109\/TENSYMP50017.2020.9230725"},{"key":"845_CR18","unstructured":"Kang, X., Shi, X., Wu, Y., & Ren, F. (2020). Active learning with complementary sampling for instructing class-biased multi-label text emotion classification. IEEE Transactions on Affective Computing"},{"issue":"3","key":"845_CR19","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1007\/s10796-017-9810-y","volume":"20","author":"KK Kapoor","year":"2018","unstructured":"Kapoor, K. K., Tamilmani, K., Rana, N. P., Patil, P., Dwivedi, Y. K., & Nerur, S. (2018). Advances in social media research: Past, present and future. Information Systems Frontiers, 20(3), 531\u2013558.","journal-title":"Information Systems Frontiers"},{"key":"845_CR20","unstructured":"Kastrati, M., & Biba, M. (2021). A state-of-the-art survey on deep learning methods and applications. International Journal of Computer Science and Information Security (IJCSIS),19(7)"},{"key":"845_CR21","doi-asserted-by":"crossref","unstructured":"Kastrati, M., Biba, M., Imran, A.S., & Kastrati, Z. (2022). Sentiment polarity and emotion detection from tweets using distant supervision and deep learning models. In: International Symposium on Methodologies for Intelligent Systems, pp. 13\u201323. Springer","DOI":"10.1007\/978-3-031-16564-1_2"},{"issue":"10","key":"845_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/electronics10101133","volume":"10","author":"Z Kastrati","year":"2021","unstructured":"Kastrati, Z., Ahmedi, L., Kurti, A., Kadriu, F., Murtezaj, D., & Gashi, F. (2021). A deep learning sentiment analyser for social media comments in low-resource languages. Electronics, 10(10), 1\u201319.","journal-title":"Electronics"},{"issue":"12","key":"845_CR23","doi-asserted-by":"publisher","first-page":"0144296","DOI":"10.1371\/journal.pone.0144296","volume":"10","author":"P Kralj Novak","year":"2015","unstructured":"Kralj Novak, P., Smailovi\u0107, J., Sluban, B., & Mozeti\u010d, I. (2015). Sentiment of emojis. PloS One, 10(12), 0144296.","journal-title":"PloS One"},{"key":"845_CR24","doi-asserted-by":"crossref","unstructured":"Krommyda, M., Rigos, A., Bouklas, K., & Amditis, A. (2020). Emotion detection in twitter posts: a rule-based algorithm for annotated data acquisition. In: 2020 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 257\u2013262. IEEE","DOI":"10.1109\/CSCI51800.2020.00050"},{"issue":"3","key":"845_CR25","doi-asserted-by":"publisher","first-page":"43","DOI":"10.3390\/bdcc5030043","volume":"5","author":"S Kusal","year":"2021","unstructured":"Kusal, S., Patil, S., Kotecha, K., Aluvalu, R., & Varadarajan, V. (2021). Ai based emotion detection for textual big data: Techniques and contribution. Big Data and Cognitive Computing, 5(3), 43.","journal-title":"Big Data and Cognitive Computing"},{"issue":"7553","key":"845_CR26","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436\u2013444.","journal-title":"Nature"},{"key":"845_CR27","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692"},{"key":"845_CR28","doi-asserted-by":"crossref","unstructured":"Mohammad, S.M. (2021). Sentiment analysis: Automatically detecting valence, emotions, and other affectual states from text. In: Emotion Measurement (pp. 323\u2013379). Elsevier","DOI":"10.1016\/B978-0-12-821124-3.00011-9"},{"key":"845_CR29","doi-asserted-by":"crossref","unstructured":"Mohammad, S.M., & Bravo-Marquez, F. (2017). Wassa-2017 shared task on emotion intensity. arXiv:1708.03700","DOI":"10.18653\/v1\/W17-5205"},{"issue":"2","key":"845_CR30","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1111\/coin.12024","volume":"31","author":"SM Mohammad","year":"2015","unstructured":"Mohammad, S. M., & Kiritchenko, S. (2015). Using hashtags to capture fine emotion categories from tweets. Computational Intelligence, 31(2), 301\u2013326.","journal-title":"Computational Intelligence"},{"issue":"3","key":"845_CR31","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1111\/j.1467-8640.2012.00460.x","volume":"29","author":"SM Mohammad","year":"2013","unstructured":"Mohammad, S. M., & Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3), 436\u2013465.","journal-title":"Computational Intelligence"},{"key":"845_CR32","unstructured":"Ng, A. (2017). Machine learning yearning. 139.http:\/\/www.mlyearning.org\/(96)"},{"key":"845_CR33","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., Manning, C.D. (2014). Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532\u20131543","DOI":"10.3115\/v1\/D14-1162"},{"key":"845_CR34","doi-asserted-by":"crossref","unstructured":"Plutchik, R. (1980). A general psychoevolutionary theory of emotion. (pp. 3\u201333) Elsevier","DOI":"10.1016\/B978-0-12-558701-3.50007-7"},{"key":"845_CR35","doi-asserted-by":"crossref","unstructured":"Polignano, M., Basile, P., Gemmis, M., & Semeraro, G. (2019). A comparison of word-embeddings in emotion detection from text using bilstm, cnn and self-attention. In: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization, pp. 63\u201368","DOI":"10.1145\/3314183.3324983"},{"key":"845_CR36","doi-asserted-by":"crossref","unstructured":"Schoene, A.M., Bojani\u0107, L., Nghiem, M.-Q., Hunt, I.M., & Ananiadou, S. (2022). Classifying suicide-related content and emotions on twitter using graph convolutional neural networks. IEEE Transactions on Affective Computing","DOI":"10.1109\/TAFFC.2022.3221683"},{"issue":"1","key":"845_CR37","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929\u20131958.","journal-title":"The Journal of Machine Learning Research"},{"key":"845_CR38","doi-asserted-by":"crossref","unstructured":"Suttles, J., & Ide, N. (2013). Distant supervision for emotion classification with discrete binary values. In: International Conference on Intelligent Text Processing and Computational Linguistics, pp. 121\u2013136. Springer","DOI":"10.1007\/978-3-642-37256-8_11"},{"key":"845_CR39","unstructured":"Teja, R. (2021). Twitter-Sentiment-Analysis-and-Tweet-Extraction. GitHub"},{"key":"845_CR40","doi-asserted-by":"crossref","unstructured":"Wang, W., Chen, L., Thirunarayan, K., & Sheth, A.P. (2012). Harnessing twitter \"big data\" for automatic emotion identification. In: 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, pp. 587\u2013592. IEEE","DOI":"10.1109\/SocialCom-PASSAT.2012.119"},{"key":"845_CR41","unstructured":"Wood, I., & Ruder, S. (2016). Emoji as emotion tags for tweets. In: Proc. of the Emotion and Sentiment Analysis Workshop, Portoro\u017e, pp. 76\u201379"},{"key":"845_CR42","doi-asserted-by":"publisher","first-page":"6286","DOI":"10.1109\/ACCESS.2020.3047831","volume":"9","author":"A Yousaf","year":"2020","unstructured":"Yousaf, A., Umer, M., Sadiq, S., Ullah, S., Mirjalili, S., Rupapara, V., & Nappi, M. (2020). Emotion recognition by textual tweets classification using voting classifier (lr-sgd). IEEE Access, 9, 6286\u20136295.","journal-title":"IEEE Access"},{"issue":"2","key":"845_CR43","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3185045","volume":"9","author":"D Zimbra","year":"2018","unstructured":"Zimbra, D., Abbasi, A., Zeng, D., & Chen, H. (2018). The state-of-the-art in twitter sentiment analysis: A review and benchmark evaluation. ACM Transactions on Management Information Systems (TMIS), 9(2), 1\u201329.","journal-title":"ACM Transactions on Management Information Systems (TMIS)"},{"key":"845_CR44","doi-asserted-by":"crossref","unstructured":"Zucco, C., Calabrese, B., & Cannataro, M. (2017). Sentiment analysis and affective computing for depression monitoring. In: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1988\u20131995. IEEE","DOI":"10.1109\/BIBM.2017.8217966"}],"container-title":["Journal of Intelligent Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10844-024-00845-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10844-024-00845-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10844-024-00845-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T08:11:59Z","timestamp":1725523919000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10844-024-00845-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,22]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["845"],"URL":"https:\/\/doi.org\/10.1007\/s10844-024-00845-0","relation":{},"ISSN":["0925-9902","1573-7675"],"issn-type":[{"value":"0925-9902","type":"print"},{"value":"1573-7675","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,22]]},"assertion":[{"value":"14 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 January 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 January 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 March 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}