{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T23:43:12Z","timestamp":1771026192580,"version":"3.50.1"},"reference-count":59,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T00:00:00Z","timestamp":1615507200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["769066"],"award-info":[{"award-number":["769066"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Opinion mining techniques, investigating if text is expressing a positive or negative opinion, continuously gain in popularity, attracting the attention of many scientists from different disciplines. Specific use cases, however, where the expressed opinion is indisputably positive or negative, render such solutions obsolete and emphasize the need for a more in-depth analysis of the available text. Emotion analysis is a solution to this problem, but the multi-dimensional elements of the expressed emotions in text along with the complexity of the features that allow their identification pose a significant challenge. Machine learning solutions fail to achieve a high accuracy, mainly due to the limited availability of annotated training datasets, and the bias introduced to the annotations by the personal interpretations of emotions from individuals. A hybrid rule-based algorithm that allows the acquisition of a dataset that is annotated with regard to the Plutchik\u2019s eight basic emotions is proposed in this paper. 
Emoji, keywords and semantic relationships are used in order to identify in an objective and unbiased way the emotion expressed in a short phrase or text. The acquired datasets are used to train machine learning classification models. The accuracy of the models and the parameters that affect it are presented in length through an experimental analysis. The most accurate model is selected and offered through an API to tackle the emotion detection in social media posts.<\/jats:p>","DOI":"10.3390\/informatics8010019","type":"journal-article","created":{"date-parts":[[2021,3,15]],"date-time":"2021-03-15T02:51:48Z","timestamp":1615776708000},"page":"19","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["An Experimental Analysis of Data Annotation Methodologies for Emotion Detection in Short Text Posted on Social Media"],"prefix":"10.3390","volume":"8","author":[{"given":"Maria","family":"Krommyda","sequence":"first","affiliation":[{"name":"Institute of Communication and Computer Systems (ICCS), 15772 Athens, Greece"}]},{"given":"Anastasios","family":"Rigos","sequence":"additional","affiliation":[{"name":"Institute of Communication and Computer Systems (ICCS), 15772 Athens, Greece"}]},{"given":"Kostas","family":"Bouklas","sequence":"additional","affiliation":[{"name":"Institute of Communication and Computer Systems (ICCS), 15772 Athens, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4089-1990","authenticated-orcid":false,"given":"Angelos","family":"Amditis","sequence":"additional","affiliation":[{"name":"Institute of Communication and Computer Systems (ICCS), 15772 Athens, Greece"}]}],"member":"1968","published-online":{"date-parts":[[2021,3,12]]},"reference":[{"key":"ref_1","unstructured":"Bakshi, R.K., Kaur, N., Kaur, R., and Kaur, G. (2016, January 16\u201318). Opinion mining and sentiment analysis. 
Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India."},{"key":"ref_2","first-page":"1","article-title":"Sentiment analysis and opinion mining","volume":"5","author":"Liu","year":"2012","journal-title":"Synth. Lect. Hum. Lang. Technol."},{"key":"ref_3","unstructured":"Agarwal, A., Xie, B., Vovsha, I., Rambow, O., and Passonneau, R.J. (2011, January 23). Sentiment analysis of twitter data. Proceedings of the Workshop on Language in Social Media (LSM 2011), Portland, OR, USA."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1145\/2436256.2436274","article-title":"Techniques and applications for sentiment analysis","volume":"56","author":"Feldman","year":"2013","journal-title":"Commun. ACM"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"e12189","DOI":"10.1002\/eng2.12189","article-title":"Text-based emotion detection: Advances, challenges, and opportunities","volume":"2","author":"Acheampong","year":"2020","journal-title":"Eng. Rep."},{"key":"ref_6","unstructured":"Bollen, J., Pepe, A., and Mao, H. (2009). Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1246","DOI":"10.1109\/JBHI.2015.2403839","article-title":"We feel: Mapping emotion on Twitter","volume":"19","author":"Larsen","year":"2015","journal-title":"IEEE J. Biomed. Health Informat."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wang, W., Chen, L., Thirunarayan, K., and Sheth, A.P. (2012, January 3\u20135). Harnessing twitter \u201cbig data\u201d for automatic emotion identification. 
Proceedings of the 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, Amsterdam, The Netherlands.","DOI":"10.1109\/SocialCom-PASSAT.2012.119"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Felbo, B., Mislove, A., S\u00f8gaard, A., Rahwan, I., and Lehmann, S. (2017). Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv.","DOI":"10.18653\/v1\/D17-1169"},{"key":"ref_10","first-page":"48","article-title":"Multi-class twitter emotion classification: A new approach","volume":"4","author":"Balabantaray","year":"2012","journal-title":"Int. J. Appl. Inf. Syst."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Alm, C.O., Roth, D., and Sproat, R. (2005, January 6\u20138). Emotions from text: Machine learning for text-based emotion prediction. Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, BC, Canada.","DOI":"10.3115\/1220575.1220648"},{"key":"ref_12","unstructured":"Plutchik, R. (1991). The Emotions, University Press of America."},{"key":"ref_13","first-page":"11","article-title":"Cognitive determinants of emotion: A structural theory","volume":"5","author":"Roseman","year":"1984","journal-title":"Rev. Personal. Soc. Psychol."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Darwin, C., and Prodger, P. (1998). The Expression of the Emotions in Man and Animals, Oxford University Press.","DOI":"10.1093\/oso\/9780195112719.002.0002"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1080\/00223980.1957.9713059","article-title":"A methodological discussion of nonverbal behavior","volume":"43","author":"Ekman","year":"1957","journal-title":"J. Psychol."},{"key":"ref_16","unstructured":"Tomkins, S.S. (1962). 
Affect Imagery Consciousness: Volume I: The Positive Affects, Springer Publishing Company."},{"key":"ref_17","unstructured":"Segerstrale, U.P., and Molnar, P. (1997). Universal facial expressions of emotion. Nonverbal Communication: Where Nature Meets Culture, University Of California."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Plutchik, R. (1980). A general psychoevolutionary theory of emotion. Theories of Emotion, Elsevier.","DOI":"10.1016\/B978-0-12-558701-3.50007-7"},{"key":"ref_19","unstructured":"(2020, November 07). Deep Learning for NLP: An Overview of Recent Trends. Available online: https:\/\/medium.com\/dair-ai\/deep-learning-for-nlp-an-overview-of-recent-trends-d0d8f40a776d."},{"key":"ref_20","unstructured":"Jurafsky, D. (2000). Speech & Language Processing, Pearson Education India."},{"key":"ref_21","unstructured":"Manning, C., and Schutze, H. (1999). Foundations of Statistical Natural Language Processing, MIT Press."},{"key":"ref_22","first-page":"1","article-title":"Natural language processing for social media","volume":"8","author":"Farzindar","year":"2015","journal-title":"Synth. Lect. Hum. Lang. Technol."},{"key":"ref_23","unstructured":"Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, O\u2019Reilly Media, Inc."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Canales, L., and Mart\u00ednez-Barco, P. (2014, January 20\u201324). Emotion detection from text: A survey. Proceedings of the Workshop on Natural Language Processing in the 5th Information Systems Research Working Days (JISIC), Quito, Ecuador.","DOI":"10.3115\/v1\/W14-6905"},{"key":"ref_25","unstructured":"Twitter (2020, November 17). Available online: https:\/\/twitter.com\/home?lang=en."},{"key":"ref_26","unstructured":"(2020, November 07). Twitter Developer Docs. Available online: https:\/\/developer.twitter.com\/en\/docs."},{"key":"ref_27","unstructured":"Roesslein, J. 
(2020, November 17). Tweepy: Twitter for Python!. Available online: https:\/\/github.com\/tweepy\/tweepy."},{"key":"ref_28","unstructured":"(2020, March 10). Filter realtime Tweets. Available online: https:\/\/developer.twitter.com\/en\/docs\/twitter-api\/v1\/tweets\/filter-realtime\/guides\/basic-stream-parameters."},{"key":"ref_29","unstructured":"(2021, February 14). The 500 Most Frequently Used Words on Twitter. Available online: https:\/\/techland.time.com\/2009\/06\/08\/the-500-most-frequently-used-words-on-twitter\/."},{"key":"ref_30","unstructured":"(2020, November 17). NLTK Corpus. Available online: http:\/\/www.nltk.org\/howto\/corpus.html."},{"key":"ref_31","unstructured":"(2020, November 17). Abbreviations. Available online: https:\/\/www.abbreviations.com\/."},{"key":"ref_32","unstructured":"(2020, November 17). Tweet Preprocessor. Available online: https:\/\/pypi.org\/project\/tweet-preprocessor\/."},{"key":"ref_33","unstructured":"(2020, October 28). Emoji. Available online: https:\/\/github.com\/carpedm20\/emoji\/."},{"key":"ref_34","unstructured":"(2020, November 17). Emoji Tracker. Available online: http:\/\/emojitracker.com\/."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Krommyda, M., Rigos, A., Bouklas, K., and Amditis, A. (2020, January 16\u201318). Emotion detection in Twitter posts: A rule-based algorithm for annotated data acquisition. Proceedings of the 2020 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA.","DOI":"10.1109\/CSCI51800.2020.00050"},{"key":"ref_36","unstructured":"Mohammad, S.M., and Turney, P.D. (2013). Nrc Emotion Lexicon, National Research Council Canada."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Fellbaum, C. (2012). WordNet. The Encyclopedia of Applied Linguistics, John Wiley and Sons, Inc.","DOI":"10.1002\/9781405198431.wbeal1285"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Krommyda, M., and Kantere, V. 
(2019, January 25\u201327). Understanding SPARQL Endpoints through Targeted Exploration and Visualization. Proceedings of the 2019 First International Conference on Graph Computing (GC), Laguna Hills, CA, USA.","DOI":"10.1109\/GC46384.2019.00012"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"39","DOI":"10.35708\/GC1868-126723","article-title":"A Framework for Exploration and Visualization of SPARQL Endpoint Information","volume":"1","author":"Krommyda","year":"2020","journal-title":"Int. J. Graph Comput."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Krommyda, M., and Kantere, V. (2019, January 9\u201311). Improving the Quality of the Conversational Datasets through Extensive Semantic Analysis. Proceedings of the 2019 IEEE International Conference on Conversational Data Knowledge Engineering (CDKE), San Diego, CA, USA.","DOI":"10.1109\/CDKE46621.2019.00008"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1142\/S1793351X2050004X","article-title":"Semantic analysis for conversational datasets: Improving their quality using semantic relationships","volume":"14","author":"Krommyda","year":"2020","journal-title":"Int. J. Semant. Comput."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1207\/s15516709cog1402_1","article-title":"Finding structure in time","volume":"14","author":"Elman","year":"1990","journal-title":"Cogn. Sci."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Nowak, J., Taspinar, A., and Scherer, R. (2017, January 11\u201315). LSTM recurrent neural networks for short text and sentiment classification. 
Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland.","DOI":"10.1007\/978-3-319-59060-8_50"},{"key":"ref_45","unstructured":"Olah, C. (2020, November 17). Understanding lstm networks. Available online: http:\/\/colah.github.io\/posts\/2015-08-Understanding-LSTMs."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/j.neucom.2018.09.082","article-title":"Time series forecasting of petroleum production using deep LSTM recurrent networks","volume":"323","author":"Sagheer","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.neunet.2019.12.030","article-title":"Transductive LSTM for time-series prediction: An application to weather forecasting","volume":"125","author":"Karevan","year":"2020","journal-title":"Neural Netw."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1016\/j.neucom.2019.12.129","article-title":"Application of LSTM for Short Term Fog Forecasting based on Meteorological Elements","volume":"408","author":"Miao","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.neucom.2018.04.045","article-title":"LSTM with sentence representations for document-level sentiment classification","volume":"308","author":"Rao","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.neucom.2018.09.049","article-title":"Using a stacked residual LSTM model for sentiment intensity prediction","volume":"322","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_51","unstructured":"Rao, A., and Spasojevic, N. (2016). Actionable and political text classification using word embeddings and lstm. arXiv."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Bottou, L. (2012). Stochastic gradient descent tricks. 
Neural Networks: Tricks of the Trade, Springer.","DOI":"10.1007\/978-3-642-35289-8_25"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Kibriya, A.M., Frank, E., Pfahringer, B., and Holmes, G. (2004, January 4\u20136). Multinomial naive bayes for text categorization revisited. Proceedings of the Australasian Joint Conference on Artificial Intelligence, Cairns, QLD, Australia.","DOI":"10.1007\/978-3-540-30549-1_43"},{"key":"ref_55","unstructured":"Zuo, Z. (2021, February 14). Sentiment Analysis of Ateam Review Datasets Using Naive Bayes and Decision Tree Classifier. Available online: http:\/\/hdl.handle.net\/2142\/100126."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1016\/j.procs.2018.01.150","article-title":"Random forest and support vector machine based hybrid approach to sentiment analysis","volume":"127","author":"Lazaar","year":"2018","journal-title":"Procedia Comput. Sci."},{"key":"ref_57","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_58","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_59","unstructured":"Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., and Devin, M. (2021, February 14). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 
Available online: tensorflow.org."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/8\/1\/19\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:34:30Z","timestamp":1760160870000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/8\/1\/19"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,12]]},"references-count":59,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["informatics8010019"],"URL":"https:\/\/doi.org\/10.3390\/informatics8010019","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,12]]}}}