{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T17:40:39Z","timestamp":1772041239566,"version":"3.50.1"},"reference-count":54,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2017,3,6]],"date-time":"2017-03-06T00:00:00Z","timestamp":1488758400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Sentiment analysis has played a primary role in text classification. It is an undoubted fact that some years ago, textual information was spreading in manageable rates; however, nowadays, such information has overcome even the most ambiguous expectations and constantly grows within seconds. It is therefore quite complex to cope with the vast amount of textual data particularly if we also take the incremental production speed into account. Social media, e-commerce, news articles, comments and opinions are broadcasted on a daily basis. A rational solution, in order to handle the abundance of data, would be to build automated information processing systems, for analyzing and extracting meaningful patterns from text. The present paper focuses on sentiment analysis applied in Greek texts. Thus far, there is no wide availability of natural language processing tools for Modern Greek. Hence, a thorough analysis of Greek, from the lexical to the syntactical level, is difficult to perform. This paper attempts a different approach, based on the proven capabilities of gradient boosting, a well-known technique for dealing with high-dimensional data. 
The main rationale is that, since English dominates the area of preprocessing tools and quite reliable translation services exist, we can exploit them to translate individual Greek tokens into English; translating token by token keeps the translation precise, whereas the translation of large texts is not always reliable or meaningful. The new feature set of English tokens is augmented with the original set of Greek tokens, producing a high-dimensional dataset that poses difficulties for any traditional classifier. Accordingly, we apply gradient boosting machines, an ensemble algorithm that can learn with different loss functions and works efficiently with high-dimensional data. Moreover, for the task at hand, we deal with a class imbalance issue, since the distribution of sentiments in real-world applications is often unequal. For example, in political forums or electronic discussions about immigration or religion, negative comments outnumber the positive ones. The class imbalance problem was addressed using a hybrid technique that combines a variation of under-sampling the majority class with over-sampling the minority class. 
Experimental results, considering different settings, such as translation of tokens against translation of sentences, consideration of limited Greek text preprocessing and omission of the translation phase, demonstrated that the proposed gradient boosting framework can effectively cope with both high-dimensional and imbalanced datasets and performs significantly better than a plethora of traditional machine learning classification approaches in terms of precision and recall measures.<\/jats:p>","DOI":"10.3390\/a10010034","type":"journal-article","created":{"date-parts":[[2017,3,9]],"date-time":"2017-03-09T06:56:43Z","timestamp":1489042603000},"page":"34","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":41,"title":["A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for  Modern Greek"],"prefix":"10.3390","volume":"10","author":[{"given":"Vasileios","family":"Athanasiou","sequence":"first","affiliation":[{"name":"Artificial Intelligence Laboratory, University of the Aegean, 2 Palama Street, 83200 Samos, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7701-0141","authenticated-orcid":false,"given":"Manolis","family":"Maragoudakis","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Laboratory, University of the Aegean, 2 Palama Street, 83200 Samos, Greece"}]}],"member":"1968","published-online":{"date-parts":[[2017,3,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Vel\u00e1squez, J.D., Palade, V., and Jain, L.C. (2012). Techniques in Web Intelligence-2, SCI 452, Springer.","DOI":"10.1007\/978-3-642-33326-2"},{"key":"ref_2","unstructured":"Maynard, D., Bontcheva, K., and Rout, D. Challenges in developing opinion mining tools for social media. Proceedings of the Workshop at LREC 2012, Istambul, Turkey."},{"key":"ref_3","unstructured":"Ravikant, N., and Rifkin, A. 
Why Twitter is Massively Undervalued Compared to Facebook. Available online: https:\/\/techcrunch.com\/2010\/10\/16\/why-twitter-is-massively-undervalued-compared-to-facebook\/."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Skeels, M.M., and Grudin, J. (2009, January 10\u201313). When social net-works cross boundaries: A case study of workplace use of Facebook and LinkedIn. Proceedings of the ACM 2009 International Conference on Supporting Group Work, (GROUP \u201909), Sanibel Island, FL, USA.","DOI":"10.1145\/1531674.1531689"},{"key":"ref_5","first-page":"375","article-title":"Semantic Enrichment of Twitter Posts for User Profile Construction on the Social Web","volume":"6644","author":"Abel","year":"2011","journal-title":"ESWC"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Mendes, P.N., Passant, A., Kapanipathi, P., and Sheth, A.P. (September, January 31). Linked open social signals. Proceedings of the 2010 IEEE\/WIC\/ACM International Conference on Web Intelligence and Intelligent Agent Technology, (WI-IAT \u201910), Washington, DC, USA.","DOI":"10.1109\/WI-IAT.2010.314"},{"key":"ref_7","unstructured":"Han, B., and Baldwin, T. (2011, January 19\u201324). Lexical normalisation of short text messages: Makn sens a #twitter. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, (HLT \u201911), Portland, OR, USA."},{"key":"ref_8","unstructured":"Gouws, S., Metzler, D., Cai, C., and Hovy, E. (2011, January 23). Contextual bearing on linguistic variation in social media. Proceedings of the Workshop on Languages in Social Media, (LSM \u201911), Portland, OR, USA."},{"key":"ref_9","first-page":"21","article-title":"Gradient boosting machines, a tutorial","volume":"7","author":"Natekin","year":"2011","journal-title":"Front. 
Neurorobot."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic Minority Over-Sampling Technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"key":"ref_11","unstructured":"Shlens, J. A Tutorial on Principal Component Analysis, Derivation, Discussion and Singular Value Decomposition. Available online: https:\/\/www.semanticscholar.org\/paper\/A-TUTORIAL-ON-PRINCIPAL-COMPONENT-ANALYSIS-Shlens\/a99e0f8f58af7a91e26c1eda54e0cca3e3e03df3."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/1500000011","article-title":"Opinion mining and sentiment analysis","volume":"2","author":"Pang","year":"2008","journal-title":"Found. Trends Inf. Retr."},{"key":"ref_13","unstructured":"Liu, B., and Zhang, L. (2012). Mining Text Data, Springer."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Meiselman, H. (2016). Emotion Measurement, Elsevier.","DOI":"10.1016\/B978-0-08-100508-8.00026-6"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Naaman, M., Boase, J., and Lai, C. (2010, January 6\u201310). Is it really about me? Message content in social awareness streams. Proceedings of the 2010 ACM conference on Computer Supported Cooperative Work, Savannah, GA, USA.","DOI":"10.1145\/1718918.1718953"},{"key":"ref_16","unstructured":"Bollen, J., Pepe, A., and Mao, H. (2009). Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. Comput. 
Sci."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2169","DOI":"10.1002\/asi.21149","article-title":"Twitter power: Tweets as electronic word of mouth","volume":"60","author":"Jansen","year":"2009","journal-title":"JASIST"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1162\/COLI_a_00049","article-title":"Lexicon-based methods for sentiment analysis","volume":"37","author":"Taboada","year":"2011","journal-title":"Comput. Linguist."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"526","DOI":"10.1007\/s10791-008-9070-z","article-title":"A machine learning approach to sentiment analysis in multilingual web texts","volume":"12","author":"Boiy","year":"2009","journal-title":"Inf. Retr."},{"key":"ref_20","unstructured":"Moghaddam, S., and Popowich, F. (2010). Opinion polarity identification through adjectives. CoRR."},{"key":"ref_21","first-page":"329","article-title":"A context-dependent supervised learning approach to sentiment detection in large textual databases","volume":"1","author":"Weichselbraun","year":"2010","journal-title":"J. Inf. Data Manag."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1162\/coli.08-012-R1-06-90","article-title":"Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis","volume":"35","author":"Wilson","year":"2009","journal-title":"Comput. Linguist."},{"key":"ref_23","unstructured":"Gindl, S., Weichselbraun, A., and Scharl, A. (2010, January 16\u201320). Cross-domain contextualization of sentiment lexicons. Proceedings of the 19th European Conference on Artificial Intelligence (ECAI-2010), Lisbon, Portugal."},{"key":"ref_24","unstructured":"Pak, A., and Paroubek, P. (2010, January 15\u201416). Twitter Based System: Using Twitter for Disambiguating Sentiment Ambiguous Adjectives. 
Proceedings of the 5th International Workshop on Semantic Evaluation, Los Angeles, CA, USA."},{"key":"ref_25","unstructured":"Zhang, L., Ghosh, R., Dekhil, M., Hsu, M., and Liu, B. Combining Lexicon-Based and Learning-Based Methods for Twitter Sentiment Analysis Technical Report HPL-2011-89, HP 21 June 2011. Available online: http:\/\/www.hpl.hp.com\/techreports\/2011\/HPL-2011-89.html."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Politopoulou, V., and Maragoudakis, M. (2013, January 13\u201316). On Mining Opinions from Social Media, Communications in Computer and Information Science, Engineering Applications of Neural Networks. Proceedings of the Lazaros Iliadis, Harris Papadopoulos, Chrisina Jayne, Halkidiki, Greece.","DOI":"10.1007\/978-3-642-41013-0_49"},{"key":"ref_27","unstructured":"Maynard, D., and Funk, A. (2011, January 29\u201330). Automatic detection of political opinions in tweets. Proceedings of the 8th International Conference on the Semantic Web (ESWC 2011), Heraklion, Crete, Greece."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000006","article-title":"Learning deep architectures for AI","volume":"2","author":"Bengio","year":"2009","journal-title":"Found. Trends Mach. Learn."},{"key":"ref_29","unstructured":"Go, A., Bhayani, R., and Huang, L. (2009). Twitter Sentiment Classification Using Distant Supervision, Stanford University. Technical Report CS224N Project Report."},{"key":"ref_30","unstructured":"Liu, X., Zhang, S., Wei, F., and Zhou, M. (2011, January 19\u201324). Recognizing named entities in tweets. 
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (HLT \u201911), Portland, Oregon."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"855","DOI":"10.1109\/TPAMI.2008.137","article-title":"A Novel Connectionist System for Improved Unconstrained Handwriting Recognition","volume":"31","author":"Graves","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_32","first-page":"211","article-title":"Sentilo: Frame-based sentiment analysis","volume":"7","author":"Recupero","year":"2014","journal-title":"Cognit. Comput."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/MCI.2013.2291688","article-title":"Frame-Based Detection of Opinion Holders and Topics: A Model and a Tool","volume":"9","author":"Gangemi","year":"2014","journal-title":"IEEE Comput. Intell. Mag."},{"key":"ref_34","first-page":"245","article-title":"A semantic web based core engine to efficiently perform sentiment analysis","volume":"Volume 8798","author":"Presutti","year":"2014","journal-title":"ESWC Satellite Events"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Kalamatianos, G., Malis, D., and Arampatzis, A. (2015, January 1\u20133). Sentiment analysis of greek tweets and hashtags using a sentiment lexicon. Proceedings of the 19th Panhellenic Conference on Informatics, Athens, Greece.","DOI":"10.1145\/2801948.2802010"},{"key":"ref_36","unstructured":"Burnside, G., Papadopoulos, S., and Petkos, G. D2.3 Social Stream Mining Framework. Available online: http:\/\/www.socialsensor.eu\/images\/D2.3.pdf."},{"key":"ref_37","unstructured":"Triantafyllides, G. (1998). Dictionary of Standard Modern Greek, Institute for Modern Greek Studies of the Aristotle University of Thessaloniki."},{"key":"ref_38","unstructured":"Markopoulos, G., Mikros, G., Iliadi, A., and Liontos, M. (2015). 
Springer Proceedings in Business and Economics, Springer."},{"key":"ref_39","first-page":"327","article-title":"Sentiment analysis for reputation management: Mining the Greek web","volume":"Volume 8445","author":"Likas","year":"2014","journal-title":"Artificial Intelligence: Methods and Applications"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1504\/IJSNM.2013.059090","article-title":"Political sentiment analysis of tweets before and after the Greek elections of May 2012","volume":"1","author":"Kermanidis","year":"2013","journal-title":"Int. J. Soc. Netw. Min."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Giouli, V., and Fotopoulou, A. (2014, January 24). Linguistically motivated language resources for sentiment analysis. Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing, Dublin, Ireland.","DOI":"10.3115\/v1\/W14-5806"},{"key":"ref_42","unstructured":"Mihalcea, R., Banea, C., and Wiebe, J. (2007, January 23\u201330). Learning multilingual subjective language via cross-lingual projections. Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech."},{"key":"ref_43","first-page":"95","article-title":"How translation alters sentiment","volume":"55","author":"Mohammad","year":"2016","journal-title":"J. Arti. Intell. Res."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.csl.2013.03.004","article-title":"Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis","volume":"28","author":"Balahur","year":"2014","journal-title":"Comput. Speech Lang."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Chen, B., and Zhu, X. (2014, January 26\u201330). Bilingual sentiment consistency for statistical machine translation. 
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden.","DOI":"10.3115\/v1\/E14-1064"},{"key":"ref_46","unstructured":"Gabrilovich, E., and Markovitch, S. (August, January 30). Feature generation for text categorization using world knowledge. Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI\u201905), Edinburgh, UK."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1023\/A:1014046307775","article-title":"Feature Generation Using General Constructor Functions","volume":"49","author":"Markovitch","year":"2002","journal-title":"Mach. Learn."},{"key":"ref_48","unstructured":"Hu, Y., and Kibler, D. A Wrapper Approach for Constructive Induction. Available online: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.41.9922."},{"key":"ref_49","unstructured":"Murphy, P., and Pazzani, M. ID2-of-3: Constructive Induction of M-of-N Concepts for Discriminators in Decision Trees. Available online: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.144.6995."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1023\/A:1002632712378","article-title":"Integrating linguistic resources in TC through WSD","volume":"35","author":"Buenaga","year":"2001","journal-title":"Comput. Hum."},{"key":"ref_51","first-page":"25","article-title":"Handling imbalanced datasets: A review","volume":"30","author":"Kotsiantis","year":"2006","journal-title":"GESTS Int. Trans. Comput. Sci. Eng."},{"key":"ref_52","unstructured":"Ntais, G. (2006). Development of a Stemmer for the Greek Language. [Master\u2019s Thesis, Department of Computer and System Sciences, Royal Institute of Technology, Stockholm University]."},{"key":"ref_53","first-page":"18","article-title":"Feature Selection based on Information Gain","volume":"2","author":"Azhagusundari","year":"2013","journal-title":"Int. J. Innov. Technol. Explor. Eng. 
(IJITEE)"},{"key":"ref_54","unstructured":"Athanasiou, V., and Maragoudakis, M. (2016, January 16\u201318). Dealing with High Dimensional Sentiment Data Using Gradient Boosting Machines. Proceedings of the 12th IFIP WG 12.5 International Conference and Workshops, (AIAI 2016), Thessaloniki, Greece."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/10\/1\/34\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:29:49Z","timestamp":1760207389000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/10\/1\/34"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,3,6]]},"references-count":54,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2017,3]]}},"alternative-id":["a10010034"],"URL":"https:\/\/doi.org\/10.3390\/a10010034","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,3,6]]}}}