{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:08:32Z","timestamp":1773799712969,"version":"3.50.1"},"reference-count":63,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,5,17]],"date-time":"2021-05-17T00:00:00Z","timestamp":1621209600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Mining social web text has been at the heart of the Natural Language Processing and Data Mining research community in the last 15 years. Though most of the reported work is on widely spoken languages, such as English, the significance of approaches that deal with less commonly spoken languages, such as Greek, is evident for reasons of preserving and documenting minority languages, cultural and ethnic diversity, and identifying intercultural similarities and differences. The present work aims at identifying, documenting and comparing social text data sets, as well as mining techniques and applications on social web text that target Modern Greek, focusing on the arising challenges and the potential for future research in the specific less widely spoken language.<\/jats:p>","DOI":"10.3390\/data6050052","type":"journal-article","created":{"date-parts":[[2021,5,17]],"date-time":"2021-05-17T04:25:07Z","timestamp":1621225507000},"page":"52","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["The Modern Greek Language on the Social Web: A Survey of Data Sets and Mining Applications"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0118-8821","authenticated-orcid":false,"given":"Maria Nefeli","family":"Nikiforos","sequence":"first","affiliation":[{"name":"Department of Informatics, Ionian University, 49132 Corfu, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4039-7714","authenticated-orcid":false,"given":"Yorghos","family":"Voutos","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49132 Corfu, Greece"}]},{"given":"Anthi","family":"Drougani","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49132 Corfu, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6916-3129","authenticated-orcid":false,"given":"Phivos","family":"Mylonas","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49132 Corfu, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3270-5078","authenticated-orcid":false,"given":"Katia Lida","family":"Kermanidis","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49132 Corfu, Greece"}]}],"member":"1968","published-online":{"date-parts":[[2021,5,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Alexandridis, G., Michalakis, K., Aliprantis, J., Polydoras, P., Tsantilas, P., and Caridakis, G. (2020, January 5\u20137). A Deep Learning Approach to Aspect-Based Sentiment Prediction. Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Neos Marmaras, Greece.","DOI":"10.1007\/978-3-030-49161-1_33"},{"key":"ref_2","unstructured":"Nikiforos, M.N., and Kermanidis, K.L. (2020, January 11\u201316). A Supervised Part-Of-Speech Tagger for the Greek Language of the Social Web. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Markopoulos, G., Mikros, G., Iliadi, A., and Liontos, M. (2015). Sentiment analysis of hotel reviews in Greek: A comparison of unigram features. Cultural Tourism in a Digital Era, Springer.","DOI":"10.1007\/978-3-319-15859-4_31"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1007\/s40692-020-00166-5","article-title":"Virtual learning communities (VLCs) rethinking: Influence on behavior modification\u2014Bullying detection through machine learning and natural language processing","volume":"7","author":"Nikiforos","year":"2020","journal-title":"J. Comput. Educ."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Petasis, G., Spiliotopoulos, D., Tsirakis, N., and Tsantilas, P. (2014, January 15\u201317). Sentiment analysis for reputation management: Mining the greek web. Proceedings of the Hellenic Conference on Artificial Intelligence, Ioannina, Greece.","DOI":"10.1007\/978-3-319-07064-3_26"},{"key":"ref_6","unstructured":"Pitenis, Z., Zampieri, M., and Ranasinghe, T. (2020). Offensive language identification in greek. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sababa, H., and Stassopoulou, A. (2018, January 15\u201318). A classifier to distinguish between cypriot greek and standard modern greek. Proceedings of the 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), Valencia, Spain.","DOI":"10.1109\/SNAMS.2018.8554709"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Tsakalidis, A., Aletras, N., Cristea, A.I., and Liakata, M. (2018, January 22\u201326). Nowcasting the stance of social media users in a sudden vote: The case of the Greek Referendum. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy.","DOI":"10.1145\/3269206.3271783"},{"key":"ref_9","unstructured":"Vallet, D., Fernandez, M., Castells, P., Mylonas, P., and Avrithis, Y. (September, January 28). A contextual personalization approach based on ontological knowledge. Proceedings of the 17th European Conference on Artificial Intelligence (ECAI 2006), Contexts and Ontologies: Theory, Practice and Applications, Riva del Garda, Italy."},{"key":"ref_10","first-page":"21","article-title":"Authorship attribution and gender identification in Greek blogs","volume":"21","author":"Mikros","year":"2012","journal-title":"Methods Appl. Quant. Linguist."},{"key":"ref_11","unstructured":"Baxevanakis, S., Gavras, S., Mouratidis, D., and Kermanidis, K.L. (July, January 30). A machine learning approach for gender identification of Greek tweet authors. Proceedings of the 13th ACM International Conference on PErvasive Technologies Related to Assistive Environments, Corfu, Greece."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kalamatianos, G., Mallis, D., Symeonidis, S., and Arampatzis, A. (2015, January 1\u20133). Sentiment analysis of Greek tweets and hashtags using a sentiment lexicon. Proceedings of the 19th Panhellenic Conference on Informatics, Athens, Greece.","DOI":"10.1145\/2801948.2802010"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Goudas, T., Louizos, C., Petasis, G., and Karkaletsis, V. (2014). Argument extraction from news, blogs, and social media. Hellenic Conference on Artificial Intelligence, Springer.","DOI":"10.1007\/978-3-319-07064-3_23"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1540024","DOI":"10.1142\/S0218213015400242","article-title":"Argument extraction from news, blogs, and the social web","volume":"24","author":"Goudas","year":"2015","journal-title":"Int. J. Artif. Intell. Tools"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Sardianos, C., Katakis, I.M., Petasis, G., and Karkaletsis, V. (2015, January 17\u201321). Argument extraction from news. Proceedings of the 2nd Workshop on Argumentation Mining, Lisbon, Portugal.","DOI":"10.3115\/v1\/W15-0508"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Nikiforos, S., Tzanavaris, S., and Kermanidis, K.L. (2020, January 25\u201327). Bullying Behavior and Project-based Activities in Virtual Learning Communities (VLCs). Proceedings of the 2020 5th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), Corfu, Greece.","DOI":"10.1109\/SEEDA-CECNSM49515.2020.9221829"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/s10639-020-10270-9","article-title":"Virtual Learning Communities (VLCs) rethinking: From negotiation and conflict to prompting and inspiring","volume":"26","author":"Tzanavaris","year":"2020","journal-title":"Educ. Inf. Technol."},{"key":"ref_18","unstructured":"Pontiki, M., Gavriilidou, M., Gkoumas, D., and Piperidis, S. (2020, January 11\u201316). Verbal Aggression as an Indicator of Xenophobic Attitudes in Greek Twitter during and after the Financial Crisis. Proceedings of the Workshop about Language Resources for the SSH Cloud, Marseille, France."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1007\/s10462-016-9508-4","article-title":"Multilingual sentiment analysis: From formal to informal and scarce resource languages","volume":"48","author":"Lo","year":"2017","journal-title":"Artif. Intell. Rev."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Cambria, E., Das, D., Bandyopadhyay, S., and Feraco, A. (2017). Affective computing and sentiment analysis. A Practical Guide to Sentiment Analysis, Springer.","DOI":"10.1007\/978-3-319-55394-8"},{"key":"ref_21","unstructured":"Alpaydin, E. (2020). Introduction to Machine Learning, MIT Press."},{"key":"ref_22","unstructured":"Russell, S., and Norvig, P. (2003). Artificial Intelligence: A Modern Approach, Prentice Hall. [2nd ed.]."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","article-title":"Reinforcement learning: A survey","volume":"4","author":"Kaelbling","year":"1996","journal-title":"J. Artif. Intell. Res."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1016\/S1364-6613(99)01331-5","article-title":"Reinforcement learning: An introduction, by Sutton, RS and Barto, AG","volume":"3","author":"Montague","year":"1999","journal-title":"Trends Cogn. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Van Otterlo, M., and Wiering, M. (2012). Reinforcement learning and markov decision processes. Reinforcement Learning, Springer.","DOI":"10.1007\/978-3-642-27645-3_1"},{"key":"ref_26","unstructured":"Petasis, G., Karkaletsis, V., Paliouras, G., Androutsopoulos, I., and Spyropoulos, C.D. (2002). Ellogon: A new text engineering platform. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Goutte, C., and Gaussier, E. (2005, January 21\u201323). A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. Proceedings of the European Conference on Information Retrieval, Santiago de Compostela, Spain.","DOI":"10.1007\/978-3-540-31865-1_25"},{"key":"ref_28","unstructured":"Thanopoulos, A., Kermanidis, K., and Fakotakis, N. (September, January 28). Challenges in extracting terminology from Modern Greek texts. Proceedings of the 3rd International Workshop on Text-Based Information Retrieval (TIR-06), Riva del Garda, Italy."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Clackson, J. (2007). Indo-European Linguistics: An Introduction, Cambridge University Press.","DOI":"10.1017\/CBO9780511808616"},{"key":"ref_30","first-page":"511","article-title":"Reconstructing constructional semantics: The dative subject construction in old norse-icelandic, latin, ancient greek, old russian and old lithuanian","volume":"36","author":"Smitherman","year":"2012","journal-title":"Stud. Lang. Int. J. Spons. Found. Found. Lang."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sido, J., Pra\u017e\u00e1k, O., P\u0159ib\u00e1\u0148, P., Pa\u0161ek, J., Sej\u00e1k, M., and Konop\u00edk, M. (2021). Czert\u2013Czech BERT-like Model for Language Representation. arXiv.","DOI":"10.26615\/978-954-452-072-4_149"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3421504","article-title":"A Survey of Offensive Language Detection for the Arabic Language","volume":"20","author":"Husain","year":"2021","journal-title":"ACM Trans. Asian Low-Resour. Lang. Inf. Process. (TALLIP)"},{"key":"ref_33","unstructured":"Lopez, C.E., Vasu, M., and Gallemore, C. (2020). Understanding the perception of COVID-19 policies by mining a multilanguage Twitter dataset. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Vilares, D., Peng, H., Satapathy, R., and Cambria, E. (2018, January 18\u201321). BabelSenticNet: A commonsense reasoning framework for multilingual sentiment analysis. Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India.","DOI":"10.1109\/SSCI.2018.8628718"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Athanasiou, V., and Maragoudakis, M. (2017). A novel, gradient boosting framework for sentiment analysis in languages where NLP resources are not plentiful: A case study for modern Greek. Algorithms, 10.","DOI":"10.3390\/a10010034"},{"key":"ref_36","unstructured":"Chatzikyriakidis, S. (2010). Clitics in Four Dialects of Modern Greek: A Dynamic Account. [Ph.D Thesis, University of London]."},{"key":"ref_37","unstructured":"Sosoni, V., Kermanidis, K.L., Stasimioti, M., Naskos, T., Takoulidou, E., Van Zaanen, M., Castilho, S., Georgakopoulou, P., Kordoni, V., and Egg, M. (2018, January 7\u201312). Translation crowdsourcing: Creating a multilingual corpus of online educational content. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1109\/MIS.2013.30","article-title":"New avenues in opinion mining and sentiment analysis","volume":"28","author":"Cambria","year":"2013","journal-title":"IEEE Intell. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1504\/IJSNM.2013.059090","article-title":"Political sentiment analysis of tweets before and after the Greek elections of May 2012","volume":"1","author":"Kermanidis","year":"2013","journal-title":"Int. J. Soc. Netw. Min."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1016\/j.engappai.2016.01.007","article-title":"A comparison between semi-supervised and supervised text mining techniques on detecting irony in greek political tweets","volume":"51","author":"Charalampakis","year":"2016","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Charalampakis, B., Spathis, D., Kouslis, E., and Kermanidis, K. (2015, January 25\u201328). Detecting irony on greek political tweets: A text mining approach. Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), Rhodes, Greece.","DOI":"10.1145\/2797143.2797183"},{"key":"ref_42","unstructured":"Papanikolaou, K., Papageorgiou, H., Papasarantopoulos, N., Stathopoulou, T., and Papastefanatos, G. (2016, January 17\u201320). \u201cJust the Facts\u201d with PALOMAR: Detecting Protest Events in Media Outlets and Twitter. Proceedings of the International AAAI Conference on Web and Social Media, Cologne, Germany."},{"key":"ref_43","unstructured":"Papanikolaou, K., and Papageorgiou, H. (2020, January 11\u201316). Protest Event Analysis: A Longitudinal Analysis for Greece. Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020, Marseille, France."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Antonakaki, D., Spiliotopoulos, D., Samaras, C.V., Pratikakis, P., Ioannidis, S., and Fragopoulou, P. (2017). Social media analysis during political turbulence. PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0186836"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Tziovas, D. (2017). Greece in Crisis: The Cultural Politics of Austerity, Bloomsbury Publishing.","DOI":"10.5040\/9781350986657"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Bond, F., Fellbaum, C., Hsieh, S.K., Huang, C.R., Pease, A., and Vossen, P. (2014). A multilingual lexico-semantic database and ontology. Towards the Multilingual Semantic Web, Springer.","DOI":"10.1007\/978-3-662-43585-4_15"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Alessia, D., Ferri, F., Grifoni, P., and Guzzo, T. (2015). Approaches, tools and applications for sentiment analysis implementation. Int. J. Comput. Appl., 125.","DOI":"10.5120\/ijca2015905866"},{"key":"ref_48","first-page":"283","article-title":"Passive crowdsourcing in government using social media","volume":"8","author":"Charalabidis","year":"2014","journal-title":"Transform. Gov. People Process Policy"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1177\/002224299405800204","article-title":"Competitive marketing behavior in industrial markets","volume":"58","author":"Ramaswamy","year":"1994","journal-title":"J. Mark."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"782","DOI":"10.1177\/0165551515610513","article-title":"Arabic tweets sentiment analysis\u2013a hybrid scheme","volume":"42","author":"Aldayel","year":"2016","journal-title":"J. Inf. Sci."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Psomakelis, E., Tserpes, K., Anagnostopoulos, D., and Varvarigou, T. (2015). Comparing methods for twitter sentiment analysis. arXiv.","DOI":"10.5220\/0005075302250232"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Tripathi, P., Vishwakarma, S.K., and Lala, A. (2015, January 12\u201314). Sentiment analysis of english tweets using rapid miner. Proceedings of the 2015 International Conference on Computational Intelligence and Communication Networks (CICN), Jabalpur, India.","DOI":"10.1109\/CICN.2015.137"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Shoemark, P., Kirby, J., and Goldwater, S. (2018, January 1). Inducing a lexicon of sociolinguistic variables from code-mixed text. Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, Brussels, Belgium.","DOI":"10.18653\/v1\/W18-6101"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Trye, D., Calude, A.S., Bravo-Marquez, F., and Keegan, T.T.A.G. (2019, January 1\u20133). M\u0101ori loanwords: A corpus of New Zealand English tweets. Proceedings of the Vocab@ Leuven 2019, Florence, Italy.","DOI":"10.18653\/v1\/P19-2018"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Erdmann, A., and Habash, N. (2018, January 31). Complementary strategies for low resourced morphological modeling. Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, Brussels, Belgium.","DOI":"10.18653\/v1\/W18-5806"},{"key":"ref_56","unstructured":"Foster, J., Cetinoglu, O., Wagner, J., Le Roux, J., Hogan, S., Nivre, J., Hogan, D., and Van Genabith, J. (2011, January 7\u201311). # hardtoparse: POS Tagging and Parsing the Twitterverse. Proceedings of the AAAI-11 Workshop on Analyzing Microtext, San Francisco, CA, USA."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.csl.2017.12.004","article-title":"An empirical study on POS tagging for Vietnamese social media text","volume":"50","author":"Bach","year":"2018","journal-title":"Comput. Speech Lang."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.tele.2017.10.006","article-title":"Sentiment analysis on Twitter: A text mining approach to the Syrian refugee crisis","volume":"35","author":"Ayvaz","year":"2018","journal-title":"Telemat. Inform."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.neunet.2015.02.012","article-title":"Multilingual part-of-speech tagging with weightless neural networks","volume":"66","author":"Carneiro","year":"2015","journal-title":"Neural Netw."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Gimpel, K., Schneider, N., O\u2019Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., and Smith, N.A. (2010, January 19\u201324). Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, OR, USA.","DOI":"10.21236\/ADA547371"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"012090","DOI":"10.1088\/1742-6596\/1437\/1\/012090","article-title":"HRCE: Detecting Food Security Events in Social Media","volume":"1437","author":"Gao","year":"2020","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Popescu, A.M., and Pennacchiotti, M. (2010, January 26\u201330). Detecting controversial events from twitter. Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada.","DOI":"10.1145\/1871437.1871751"},{"key":"ref_63","unstructured":"Popescu, A.M., Pennacchiotti, M., and Paranjpe, D. (April, January 28). Extracting events and event descriptions from twitter. Proceedings of the 20th International Conference Companion on World Wide Web, Hyderabad, India."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/5\/52\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:02:27Z","timestamp":1760162547000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/5\/52"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,17]]},"references-count":63,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["data6050052"],"URL":"https:\/\/doi.org\/10.3390\/data6050052","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,17]]}}}