{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T09:28:45Z","timestamp":1767864525750,"version":"3.49.0"},"reference-count":61,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2017,3,4]],"date-time":"2017-03-04T00:00:00Z","timestamp":1488585600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Sentiment Analysis on Twitter Data is indeed a challenging problem due to the nature, diversity and volume of the data. People tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast amount of opinions towards a wide spectrum of topics. This amount of information offers huge potential and can be harnessed to receive the sentiment tendency towards these topics. However, since no one can invest an infinite amount of time to read through these tweets, an automated decision making approach is necessary. Nevertheless, most existing solutions are limited in centralized environments only. Thus, they can only process at most a few thousand tweets. Such a sample is not representative in order to define the sentiment polarity towards a topic due to the massive number of tweets published daily. In this work, we develop two systems: the first in the MapReduce and the second in the Apache Spark framework for programming with Big Data. The algorithm exploits all hashtags and emoticons inside a tweet, as sentiment labels, and proceeds to a classification method of diverse sentiment types in a parallel and distributed manner. Moreover, the sentiment analysis tool is based on Machine Learning methodologies alongside Natural Language Processing techniques and utilizes Apache Spark\u2019s Machine learning library, MLlib. 
In order to address the nature of Big Data, we introduce some pre-processing steps for achieving better results in Sentiment Analysis as well as Bloom filters to compact the storage size of intermediate data and boost the performance of our algorithm. Finally, the proposed system was trained and validated with real data crawled by Twitter, and, through an extensive experimental evaluation, we prove that our solution is efficient, robust and scalable while confirming the quality of our sentiment identification.<\/jats:p>","DOI":"10.3390\/a10010033","type":"journal-article","created":{"date-parts":[[2017,3,9]],"date-time":"2017-03-09T06:56:43Z","timestamp":1489042603000},"page":"33","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":53,"title":["Large Scale Implementations for Twitter Sentiment Classification"],"prefix":"10.3390","volume":"10","author":[{"given":"Andreas","family":"Kanavos","sequence":"first","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, Patras 26504, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nikolaos","family":"Nodarakis","sequence":"additional","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, Patras 26504, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Spyros","family":"Sioutas","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, Corfu 49100, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Athanasios","family":"Tsakalidis","sequence":"additional","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, Patras 26504, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dimitrios","family":"Tsolis","sequence":"additional","affiliation":[{"name":"Department of Cultural Heritage Management and New Technologies, University 
of Patras, Agrinio 30100, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4073-7256","authenticated-orcid":false,"given":"Giannis","family":"Tzimas","sequence":"additional","affiliation":[{"name":"Computer & Informatics Engineering Department, Technological Educational Institute of Western Greece, Patras 26334, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2017,3,4]]},"reference":[{"key":"ref_1","unstructured":"Sentiment. Available online: http:\/\/www.thefreedictionary.com\/sentiment."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wang, X., Wei, F., Liu, X., Zhou, M., and Zhang, M. (2011, January 24\u201328). Topic Sentiment Analysis in Twitter: A Graph-based Hashtag Sentiment Classification Approach. Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), Glasgow, UK.","DOI":"10.1145\/2063576.2063726"},{"key":"ref_3","unstructured":"Emoticon. Available online: http:\/\/dictionary.reference.com\/browse\/emoticon."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lin, J., and Dyer, C. (2010). Data-Intensive Text Processing with MapReduce, Morgan and Claypool Publishers.","DOI":"10.1007\/978-3-031-02136-7"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"van Banerveld, M., Le-Khac, N., and Kechadi, M.T. (2014, January 19\u201321). Performance Evaluation of a Natural Language Processing Approach Applied in White Collar Crime Investigation. Proceedings of the Future Data and Security Engineering (FDSE), Ho Chi Minh City, Vietnam.","DOI":"10.1007\/978-3-319-12778-1_3"},{"key":"ref_6","unstructured":"Agarwal, A., Xie, B., Vovsha, I., Rambow, O., and Passonneau, R. (2011). Workshop on Languages in Social Media, Association for Computational Linguistics."},{"key":"ref_7","unstructured":"Davidov, D., Tsur, O., and Rappoport, A. (2010, January 23\u201327). 
Enhanced Sentiment Learning Using Twitter Hashtags and Smileys. Proceedings of the International Conference on Computational Linguistics, Posters, Beijing, China."},{"key":"ref_8","unstructured":"Jiang, L., Yu, M., Zhou, M., Liu, X., and Zhao, T. (2011, January 19\u201324). Target-dependent Twitter Sentiment Classification. Proceedings of the Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","article-title":"MapReduce: Simplified Data Processing on Large Clusters","volume":"51","author":"Dean","year":"2008","journal-title":"Commun. ACM"},{"key":"ref_10","unstructured":"White, T. (2012). Hadoop: The Definitive Guide, O\u2019Reilly Media\/Yahoo Press. [3rd ed.]."},{"key":"ref_11","unstructured":"Karau, H., Konwinski, A., Wendell, P., and Zaharia, M. (2015). Learning Spark: Lightning-Fast Big Data Analysis, O\u2019Reilly Media."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/1500000011","article-title":"Opinion Mining and Sentiment Analysis","volume":"2","author":"Pang","year":"2008","journal-title":"Found. Trends Inf. Retr."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hu, M., and Liu, B. (2004, January 22\u201325). Mining and Summarizing Customer Reviews. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA.","DOI":"10.1145\/1014052.1014073"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhuang, L., Jing, F., and Zhu, X.Y. (2006, January 5\u201311). Movie Review Mining and Summarization. Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), Arlington, VA, USA.","DOI":"10.1145\/1183614.1183625"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, W., Yu, C., and Meng, W. (2007, January 6\u201310). Opinion Retrieval from Blogs. 
Proceedings of the ACM Conference on Conference on Information and Knowledge Management (CIKM), Lisbon, Portugal.","DOI":"10.1145\/1321440.1321555"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Turney, P.D. (2002, January 6\u201312). Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA.","DOI":"10.3115\/1073083.1073153"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wilson, T., Wiebe, J., and Hoffmann, P. (2005, January 6\u20138). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT\/EMNLP), Vancouver, BC, Canada.","DOI":"10.3115\/1220575.1220619"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1162\/coli.08-012-R1-06-90","article-title":"Recognizing Contextual Polarity: An Exploration of Features for Phrase-level Sentiment Analysis","volume":"35","author":"Wilson","year":"2009","journal-title":"Comput. Linguist."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Yu, H., and Hatzivassiloglou, V. (2003, January 11\u201312). Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK.","DOI":"10.3115\/1119355.1119372"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Lin, C., and He, Y. (2009, January 2\u20136). Joint Sentiment\/Topic Model for Sentiment Analysis. Proceedings of the ACM Conference on Information and Knowledge Management, Hong Kong, China.","DOI":"10.1145\/1645953.1646003"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Mei, Q., Ling, X., Wondra, M., Su, H., and Zhai, C. (2007, January 8\u201312). 
Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs. Proceedings of the International Conference on World Wide Web (WWW), Banff, AB, Canada.","DOI":"10.1145\/1242572.1242596"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Pang, B., Lee, L., and Vaithyanathan, S. (2002, January 6\u20137). Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of the ACL Conference on Empirical methods in Natural Language Processing, Philadelphia, PA, USA.","DOI":"10.3115\/1118693.1118704"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"526","DOI":"10.1007\/s10791-008-9070-z","article-title":"A Machine Learning Approach to Sentiment Analysis in Multilingual Web Texts","volume":"12","author":"Boiy","year":"2009","journal-title":"Inf. Retr."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Nasukawa, T., and Yi, J. (2003, January 23\u201325). Sentiment Analysis: Capturing Favorability Using Natural Language Processing. Proceedings of the International Conference on Knowledge Capture, Sanibel Island, FL, USA.","DOI":"10.1145\/945645.945658"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ding, X., and Liu, B. (2007, January 23\u201327). The Utility of Linguistic Rules in Opinion Mining. Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands.","DOI":"10.1145\/1277741.1277921"},{"key":"ref_26","unstructured":"Xavier, U.H.R. (2013, January 25\u201328). Sentiment Analysis of Hollywood Movies on Twitter. Proceedings of the IEEE\/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Paris, France."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yamamoto, Y., Kumamoto, T., and Nadamoto, A. (2014, January 4\u20136). Role of Emoticons for Multidimensional Sentiment Analysis of Twitter. 
Proceedings of the International Conference on Information Integration and Web-based Applications Services (iiWAS), Hanoi, Vietnam.","DOI":"10.1145\/2684200.2684283"},{"key":"ref_28","first-page":"11315","article-title":"Twitter Sentiment Analysis with Emoticons","volume":"4","author":"Kinikar","year":"2015","journal-title":"Int. J. Eng. Comput. Sci."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chikersal, P., Poria, S., and Cambria, E. (2015, January 4\u20135). SeNTU: Sentiment Analysis of Tweets by Combining a Rule-based Classifier with Supervised Learning. Proceedings of the International Workshop on Semantic Evaluation (SemEval), Denver, CO, USA.","DOI":"10.18653\/v1\/S15-2108"},{"key":"ref_30","unstructured":"Barbosa, L., and Feng, J. (2010, January 23\u201327). Robust Sentiment Detection on Twitter from Biased and Noisy Data. Proceedings of the International Conference on Computational Linguistics: Posters, Beijing, China."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Naveed, N., Gottron, T., Kunegis, J., and Alhadi, A.C. (2011, January 15\u201317). Bad News Travel Fast: A Content-based Analysis of Interestingness on Twitter. Proceedings of the 3rd International Web Science Conference (WebSci\u201911), Koblenz, Germany.","DOI":"10.1145\/2527031.2527052"},{"key":"ref_32","unstructured":"Nakov, P., Rosenthal, S., Kozareva, Z., Stoyanov, V., Ritter, A., and Wilson, T. (2013, January 14\u201315). SemEval-2013 Task 2: Sentiment Analysis in Twitter. Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval@NAACL-HLT), Atlanta, GA, USA."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Rosenthal, S., Ritter, A., Nakov, P., and Stoyanov, V. (2014, January 23\u201324). SemEval-2014 Task 9: Sentiment Analysis in Twitter. 
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval@COLING), Dublin, Ireland.","DOI":"10.3115\/v1\/S14-2009"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Rosenthal, S., Nakov, P., Kiritchenko, S., Mohammad, S., Ritter, A., and Stoyanov, V. (2015, January 4\u20135). SemEval-2015 Task 10: Sentiment Analysis in Twitter. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval@NAACL-HLT), Denver, CO, USA.","DOI":"10.18653\/v1\/S15-2078"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., and Stoyanov, V. (2016, January 16\u201317). SemEval-2016 Task 4: Sentiment Analysis in Twitter. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval@NAACL-HLT), San Diego, CA, USA.","DOI":"10.18653\/v1\/S16-1001"},{"key":"ref_36","unstructured":"Lee, C., and Roth, D. (2015, January 6\u201311). Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM. Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France."},{"key":"ref_37","unstructured":"Zhuang, Y., Chin, W., Juan, Y., and Lin, C. (2015, January 19\u201322). Distributed Newton Methods for Regularized Logistic Regression. Proceedings of the 19th Pacific-Asia Conference, Advances in Knowledge Discovery and Data Mining (PAKDD), Ho Chi Minh City, Vietnam."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Sahni, T., Chandak, C., Chedeti, N.R., and Singh, M. (arXiv, 2017). Efficient Twitter Sentiment Classification using Subjective Distant Supervision, arXiv.","DOI":"10.1109\/COMSNETS.2017.7945451"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Kanavos, A., Perikos, I., Vikatos, P., Hatzilygeroudis, I., Makris, C., and Tsakalidis, A. (2014, January 10\u201312). Conversation Emotional Modeling in Social Networks. 
Proceedings of the IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Limassol, Cyprus.","DOI":"10.1109\/ICTAI.2014.78"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Kanavos, A., Perikos, I., Hatzilygeroudis, I., and Tsakalidis, A. (2016, January 8\u201310). Integrating User\u2019s Emotional Behavior for Community Detection in Social Networks. Proceedings of the International Conference on Web Information Systems and Technologies (WEBIST), Rome, Italy.","DOI":"10.5220\/0005862703550362"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Baltas, A., Kanavos, A., and Tsakalidis, A. (2016, January 22\u201326). An Apache Spark Implementation for Sentiment Analysis on Twitter Data. Proceedings of the International Workshop on Algorithmic Aspects of Cloud Computing (ALGOCLOUD), Aarhus, Denmark.","DOI":"10.1007\/978-3-319-57045-7_2"},{"key":"ref_42","unstructured":"Nodarakis, N., Sioutas, S., Tsakalidis, A., and Tzimas, G. (2016, January 15\u201318). Large Scale Sentiment Analysis on Twitter with Spark. Proceedings of the EDBT\/ICDT Workshops, Bordeaux, France."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Khuc, V.N., Shivade, C., Ramnath, R., and Ramanathan, J. (2012, January 24\u201328). Towards Building Large-Scale Distributed Systems for Twitter Sentiment Analysis. Proceedings of the Annual ACM Symposium on Applied Computing, Gyeongju, Korea.","DOI":"10.1145\/2245276.2245364"},{"key":"ref_44","unstructured":"Apache Spark. Available online: http:\/\/spark.apache.org\/."},{"key":"ref_45","unstructured":"MLlib. Available online: http:\/\/spark.apache.org\/mllib\/."},{"key":"ref_46","first-page":"139","article-title":"kdANN+: A Rapid AkNN Classifier for Big Data","volume":"23","author":"Nodarakis","year":"2016","journal-title":"Trans. Large Scale Data Knowl. Cent. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Davidov, D., and Rappoport, A. (2006, January 17\u201321). 
Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words. Proceedings of the International Conference on Computational Linguistics, Sydney, Australia.","DOI":"10.3115\/1220175.1220213"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/362686.362692","article-title":"Space\/Time Trade-offs in Hash Coding with Allowable Errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun. ACM"},{"key":"ref_49","unstructured":"Using Hadoop for Large Scale Analysis on Twitter: A Technical Report. Available online: http:\/\/arxiv.org\/abs\/1602.01248."},{"key":"ref_50","unstructured":"Toutanova, K., Klein, D., Manning, C.D., and Singer, Y. (June, January 31). Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. Proceedings of the HLT-NAACL, Edmonton, AB, Canada."},{"key":"ref_51","unstructured":"Twitter Developer Documentation. Available online: https:\/\/dev.Twitter.com\/rest\/public\/search."},{"key":"ref_52","unstructured":"Go, A., Bhayani, R., and Huang, L. (2009). Twitter Sentiment Classification Using Distant Supervision, Stanford University. CS224N Project Report."},{"key":"ref_53","unstructured":"Sentiment140 API. Available online: http:\/\/help.sentiment140.com\/api."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Cheng, Z., Caverlee, J., and Lee, K. (2010, January 25\u201328). You Are Where You Tweet: A Content-based Approach to Geo-locating Twitter Users. Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), Washington, DC, USA.","DOI":"10.1145\/1871437.1871535"},{"key":"ref_55","unstructured":"Twitter Cikm 2010. Available online: https:\/\/archive.org\/details\/Twitter_cikm_2010."},{"key":"ref_56","unstructured":"Twitter Sentiment Analysis Training Corpus (Dataset). 
Available online: http:\/\/thinknook.com\/Twitter-sentiment-analysis-training-corpus-dataset-2012-09-22\/."},{"key":"ref_57","unstructured":"Ternary Classification. Available online: https:\/\/www.crowdflower.com\/data-for-everyone\/."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Barbieri, F., and Saggion, H. (2014, January 26\u201331). Modelling Irony in Twitter: Feature Analysis and Evaluation. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland.","DOI":"10.3115\/v1\/E14-3007"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1109\/MIS.2013.28","article-title":"Developing Corpora for Sentiment Analysis: The Case of Irony and Senti-TUT","volume":"28","author":"Bosco","year":"2013","journal-title":"IEEE Intell. Syst."},{"key":"ref_60","unstructured":"Gonz\u00e1lez-Ib\u00e1\u00f1ez, R.I., Muresan, S., and Wacholder, N. (2011, January 19\u201324). Identifying Sarcasm in Twitter: A Closer Look. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), Portland, OR, USA."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1007\/s10579-012-9196-x","article-title":"A Multidimensional Approach for Detecting Irony in Twitter","volume":"47","author":"Reyes","year":"2013","journal-title":"Lang. Resour. 
Eval."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/10\/1\/33\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:29:45Z","timestamp":1760207385000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/10\/1\/33"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,3,4]]},"references-count":61,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2017,3]]}},"alternative-id":["a10010033"],"URL":"https:\/\/doi.org\/10.3390\/a10010033","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,3,4]]}}}