{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T14:48:50Z","timestamp":1777301330450,"version":"3.51.4"},"reference-count":62,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T00:00:00Z","timestamp":1641254400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T00:00:00Z","timestamp":1641254400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100013209","name":"Hellenic Foundation for Research and Innovation","doi-asserted-by":"publisher","award":["514"],"award-info":[{"award-number":["514"]}],"id":[{"id":"10.13039\/501100013209","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Online hate speech is a recent problem in our society that is rising at a steady pace by leveraging the vulnerabilities of the corresponding regimes that characterise most social media platforms. This phenomenon is primarily fostered by offensive comments, either during user interaction or in the form of a posted multimedia context. Nowadays, giant corporations own platforms where millions of users log in every day, and protection from exposure to similar phenomena appears to be necessary to comply with the corresponding legislation and maintain a high level of service quality. A robust and reliable system for detecting and preventing the uploading of relevant content will have a significant impact on our digitally interconnected society. Several aspects of our daily lives are undeniably linked to our social profiles, making us vulnerable to abusive behaviours. As a result, the lack of accurate hate speech detection mechanisms would severely degrade the overall user experience, although its erroneous operation would pose many ethical concerns. In this paper, we present \u2018ETHOS\u2019 (multi-labEl haTe speecH detectiOn dataSet), a textual dataset with two variants: binary and multi-label, based on YouTube and Reddit comments validated using the Figure-Eight crowdsourcing platform. Furthermore, we present the annotation protocol used to create this dataset: an active sampling procedure for balancing our data in relation to the various aspects defined. Our key assumption is that, even gaining a small amount of labelled data from such a time-consuming process, we can guarantee hate speech occurrences in the examined material.<\/jats:p>","DOI":"10.1007\/s40747-021-00608-2","type":"journal-article","created":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T07:03:04Z","timestamp":1641279784000},"page":"4663-4678","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":73,"title":["ETHOS: a multi-label hate speech detection dataset"],"prefix":"10.1007","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7765-7903","authenticated-orcid":false,"given":"Ioannis","family":"Mollas","sequence":"first","affiliation":[]},{"given":"Zoe","family":"Chrysopoulou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5307-6186","authenticated-orcid":false,"given":"Stamatis","family":"Karlos","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7879-669X","authenticated-orcid":false,"given":"Grigorios","family":"Tsoumakas","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,1,4]]},"reference":[{"key":"608_CR1","doi-asserted-by":"publisher","unstructured":"Alharthi DN, Regan AC (2020) Social engineering defense mechanisms: a taxonomy and a survey of employees\u2019 awareness level. In: Arai K, Kapoor S, Bhatia R (eds) Intelligent computing - proceedings of the 2020 computing conference, volume 1, SAI, London, UK, 16\u201317 July 2020, Advances in Intelligent Systems and Computing, vol. 1228, pp. 521\u2013541. Springer (2020). https:\/\/doi.org\/10.1007\/978-3-030-52249-0_35","DOI":"10.1007\/978-3-030-52249-0_35"},{"issue":"1","key":"608_CR2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/2190-8532-2-1","volume":"2","author":"T Almeida","year":"2013","unstructured":"Almeida T, Hidalgo JMG, Silva TP (2013) Towards sms spam filtering: results under a new dataset. Int J Inform Secur Sci 2(1):1\u201318","journal-title":"Int J Inform Secur Sci"},{"key":"608_CR3","doi-asserted-by":"publisher","unstructured":"Anagnostou A, Mollas I, Tsoumakas G (2018) Hatebusters: a web application for actively reporting youtube hate speech. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp. 5796\u20135798. International Joint Conferences on Artificial Intelligence Organization, Stockholm, Sweden. https:\/\/doi.org\/10.24963\/ijcai.2018\/841","DOI":"10.24963\/ijcai.2018\/841"},{"key":"608_CR4","unstructured":"Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations, ICLR 2015, May 7-9, 2015, Conference Track Proceedings. San Diego, California, USA"},{"key":"608_CR5","doi-asserted-by":"publisher","unstructured":"Benites F, Sapozhnikova E (2015) Haram: a hierarchical aram neural network for large-scale text classification. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 847\u2013854. IEEE Computer Society, USA. https:\/\/doi.org\/10.1109\/ICDMW.2015.14","DOI":"10.1109\/ICDMW.2015.14"},{"key":"608_CR6","doi-asserted-by":"publisher","unstructured":"Chen J, Mao J, Liu Y, Zhang M, Ma S (2019) Tiangong-st: a new dataset with large-scale refined real-world web search sessions. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, November 3-7, 2019 pp. 2485\u20132488. ACM, Beijing, China. https:\/\/doi.org\/10.1145\/3357384.3358158","DOI":"10.1145\/3357384.3358158"},{"key":"608_CR7","doi-asserted-by":"crossref","unstructured":"Davidson T, Warmsley D, Macy M, Weber I (2017) Automated hate speech detection and the problem of offensive language. In: Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM \u201917, pp. 512\u2013515. AAAI Press, Montreal, Canada","DOI":"10.1609\/icwsm.v11i1.14955"},{"key":"608_CR8","doi-asserted-by":"publisher","unstructured":"de\u00a0Gibert O, Perez N, Garc\u00eda-Pablos A, Cuadros M (2018) Hate speech dataset from a white supremacy forum. In: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). https:\/\/doi.org\/10.18653\/v1\/w18-5102","DOI":"10.18653\/v1\/w18-5102"},{"key":"608_CR9","unstructured":"Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (1), pp. 4171\u20134186. Association for Computational Linguistics"},{"key":"608_CR10","unstructured":"Dinakar K, Picard RW, Lieberman H (2015) Common sense reasoning for detection, prevention, and mitigation of cyberbullying (extended abstract). In: Yang Q, Wooldridge MJ (eds) Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25\u201331, 2015, pp. 4168\u20134172. AAAI Press. http:\/\/ijcai.org\/Abstract\/15\/589"},{"key":"608_CR11","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1186\/s13326-016-0073-1","volume":"7","author":"K Dram\u00e9","year":"2016","unstructured":"Dram\u00e9 K, Mougin F, Diallo G (2016) Large scale biomedical texts classification: a knn and an esa-based approaches. J Biomed Semant 7:40. https:\/\/doi.org\/10.1186\/s13326-016-0073-1","journal-title":"J Biomed Semant"},{"key":"608_CR12","doi-asserted-by":"crossref","unstructured":"Fersini E, Rosso P, Anzovino M (2018) Overview of the task on automatic misogyny identification at ibereval 2018. In: IberEval@ SEPLN, pp. 214\u2013228","DOI":"10.4000\/books.aaccademia.4497"},{"key":"608_CR13","unstructured":"Friedman J (1999) Stochastic gradient boosting. department of statistics. Tech. rep., Stanford University, Technical Report, San Francisco, CA"},{"issue":"4","key":"608_CR14","doi-asserted-by":"publisher","first-page":"771","DOI":"10.1007\/s00779-018-1142-5","volume":"22","author":"M Furini","year":"2018","unstructured":"Furini M, Montangero M (2018) Sentiment analysis and twitter: a game proposal. Pers. Ubiquitous Comput. 22(4):771\u2013785. https:\/\/doi.org\/10.1007\/s00779-018-1142-5","journal-title":"Pers. Ubiquitous Comput."},{"key":"608_CR15","doi-asserted-by":"publisher","unstructured":"Gamb\u00e4ck B, Sikdar UK (2017) Using convolutional neural networks to classify hate-speech. In: Waseem Z, Chung WHK, Hovy D, Tetreault JR (eds) Proceedings of the First Workshop on Abusive Language Online, ALW@ACL 2017, Vancouver, BC, Canada, August 4, 2017, pp. 85\u201390. Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/w17-3013","DOI":"10.18653\/v1\/w17-3013"},{"key":"608_CR16","doi-asserted-by":"crossref","unstructured":"Gao L, Huang R (2017) Detecting online hate speech using context aware models. In: RANLP","DOI":"10.26615\/978-954-452-049-6_036"},{"key":"608_CR17","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-4467-2","volume-title":"Predictive inference","author":"S Geisser","year":"1993","unstructured":"Geisser S (1993) Predictive inference, vol 55. CRC Press, Boca Raton"},{"key":"608_CR18","unstructured":"Haagsma H, Bos J, Nissim M (2020) MAGPIE: a large corpus of potentially idiomatic expressions. In: Calzolari N, B\u00e9chet F, Blache P, Choukri K, Cieri C, Declerck T, Goggi S, Isahara HH, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S (eds) Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 11-16, 2020, pp. 279\u2013287. European Language Resources Association. https:\/\/www.aclweb.org\/anthology\/2020.lrec-1.35\/"},{"key":"608_CR19","doi-asserted-by":"publisher","unstructured":"Hoang T, Vo KD, Nejdl W (2018) W2E: a worldwide-event benchmark dataset for topic detection and tracking. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22\u201326, 2018, pp. 1847\u20131850. ACM. https:\/\/doi.org\/10.1145\/3269206.3269309","DOI":"10.1145\/3269206.3269309"},{"key":"608_CR20","unstructured":"Inc., M.: Kappa statistics for attribute agreement analysis. Available at https:\/\/support.minitab.com\/en-us\/minitab\/18\/help-and-how-to\/quality-and-process-improvement\/measurement-system-analysis\/how-to\/attribute-agreement-analysis\/attribute-agreement-analysis\/interpret-the-results\/all-statistics-and-graphs\/kappa-statistics\/ (2021\/04\/17)"},{"key":"608_CR21","doi-asserted-by":"publisher","unstructured":"Jirotka M, Stahl BC (2020) The need for responsible technology. J Respons Technol 1: 100002. https:\/\/doi.org\/10.1016\/j.jrt.2020.100002. http:\/\/www.sciencedirect.com\/science\/article\/pii\/S2666659620300020","DOI":"10.1016\/j.jrt.2020.100002"},{"key":"608_CR22","unstructured":"Joulin A, Grave E, Bojanowski P, Douze M, J\u00e9gou H, Mikolov T (2016) Fasttext.zip: compressing text classification models"},{"key":"608_CR23","doi-asserted-by":"publisher","unstructured":"Karlos S, Kanas VG, Aridas CK, Fazakis N, Kotsiantis S (2019) Combining active learning with self-train algorithm for classification of multimodal problems. In: IISA 2019, Patras, Greece, July 15-17, 2019, pp. 1\u20138. https:\/\/doi.org\/10.1109\/IISA.2019.8900724","DOI":"10.1109\/IISA.2019.8900724"},{"key":"608_CR24","doi-asserted-by":"publisher","unstructured":"Kim S, Kim D, Cho M, Kwak S (2020) Proxy anchor loss for deep metric learning. In: 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 3235\u20133244. IEEE. https:\/\/doi.org\/10.1109\/CVPR42600.2020.00330","DOI":"10.1109\/CVPR42600.2020.00330"},{"issue":"2\u20133","key":"608_CR25","doi-asserted-by":"publisher","first-page":"449","DOI":"10.1007\/s10994-015-5504-1","volume":"100","author":"G Krempl","year":"2015","unstructured":"Krempl G, Kottke D, Lemaire V (2015) Optimised probabilistic active learning (OPAL) - for fast, non-myopic, cost-sensitive active classification. Mach Learn 100(2\u20133):449\u2013476. https:\/\/doi.org\/10.1007\/s10994-015-5504-1","journal-title":"Mach Learn"},{"issue":"4","key":"608_CR26","doi-asserted-by":"publisher","first-page":"913","DOI":"10.1007\/s11390-020-9487-4","volume":"35","author":"P Kumar","year":"2020","unstructured":"Kumar P, Gupta A (2020) Active learning query strategies for classification, regression, and clustering: a survey. J Comput Sci Technol 35(4):913\u2013945. https:\/\/doi.org\/10.1007\/s11390-020-9487-4","journal-title":"J Comput Sci Technol"},{"key":"608_CR27","unstructured":"Kumari K, Singh JP (2020) Ai_ml_nit_patna @hasoc 2020: BERT models for hate speech identification in indo-european languages. In: Mehta P, Mandl T, Majumder P, Mitra M (eds) Working notes of FIRE 2020\u2014forum for information retrieval evaluation, Hyderabad, India, December 16\u201320, 2020, CEUR Workshop Proceedings, vol. 2826, pp. 319\u2013324. CEUR-WS.org. http:\/\/ceur-ws.org\/Vol-2826\/T2-29.pdf"},{"key":"608_CR28","doi-asserted-by":"publisher","unstructured":"Kumari K, Singh JP (2021) Identification of cyberbullying on multi-modal social media posts using genetic algorithm. Trans Emerg Telecommun Technol 32(2). https:\/\/doi.org\/10.1002\/ett.3907","DOI":"10.1002\/ett.3907"},{"key":"608_CR29","unstructured":"Kumari K, Singh JP (May 2020) Ai_ml_nit_patna @ TRAC - 2: Deep learning approach for multi-lingual aggression identification. In: Kumar R, Ojha AK, Lahiri B, Zampieri M, Malmasi S, Murdock V, Kadar D (eds) Proceedings of the second workshop on trolling, aggression and cyberbullying, TRAC@LREC 2020, Marseille, France, pp. 113\u2013119. European Language Resources Association (ELRA) (2020). https:\/\/aclanthology.org\/2020.trac-1.18\/"},{"key":"608_CR30","doi-asserted-by":"publisher","unstructured":"Ljube\u0161i\u0107 N, Erjavec T, Fi\u0161er D (2018) Datasets of slovene and croatian moderated news comments. In: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pp. 124\u2013131. Association for Computational Linguistics, Brussels, Belgium. https:\/\/doi.org\/10.18653\/v1\/W18-5116. https:\/\/www.aclweb.org\/anthology\/W18-5116","DOI":"10.18653\/v1\/W18-5116"},{"key":"608_CR31","unstructured":"McCallum A, Nigam K, et\u00a0al. (1998) A comparison of event models for naive bayes text classification. In: AAAI-98 workshop on learning for text categorization, vol. 752, pp. 41\u201348. Citeseer"},{"issue":"11","key":"608_CR32","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1145\/219717.219748","volume":"38","author":"GA Miller","year":"1995","unstructured":"Miller GA (1995) Wordnet: a lexical database for english. Commun ACM 38(11):39\u201341","journal-title":"Commun ACM"},{"key":"608_CR33","doi-asserted-by":"crossref","unstructured":"Nghiem M, Baylis P, Ananiadou S (2021) Paladin: an annotation tool based on active and proactive learning. In: Gkatzia D, Seddah D (eds) Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, EACL 2021, Online, April 19\u201323, 2021, pp. 238\u2013243. Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/2021.eacl-demos.28\/","DOI":"10.18653\/v1\/2021.eacl-demos.28"},{"key":"608_CR34","doi-asserted-by":"publisher","unstructured":"Ousidhoum N, Lin Z, Zhang H, Song Y, Yeung D (2019) Multilingual and multi-aspect hate speech analysis. In: EMNLP-IJCNLP 2019, November 3\u20137, 2019, pp. 4674\u20134683. Association for Computational Linguistics, Hong Kong, China. https:\/\/doi.org\/10.18653\/v1\/D19-1474","DOI":"10.18653\/v1\/D19-1474"},{"key":"608_CR35","doi-asserted-by":"crossref","unstructured":"Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP), pp. 1532\u20131543. Doha, Qatar. http:\/\/www.aclweb.org\/anthology\/D14-1162","DOI":"10.3115\/v1\/D14-1162"},{"key":"608_CR36","unstructured":"Pitenis Z, Zampieri M, Ranasinghe T (2020) Offensive language identification in greek. In: LREC, pp. 5113\u20135119. European Language Resources Association"},{"key":"608_CR37","unstructured":"Polignano M, Basile P, de\u00a0Gemmis M, Semeraro G, Basile V (2019) Alberto: Italian BERT language understanding model for NLP challenging tasks based on tweets. In: Bernardi R, Navigli R, Semeraro G (eds) Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy, November 13\u201315, 2019, CEUR Workshop Proceedings, vol. 2481. CEUR-WS.org. http:\/\/ceur-ws.org\/Vol-2481\/paper57.pdf"},{"key":"608_CR38","unstructured":"Porter MF (2001) Snowball: A language for stemming algorithms. Published online. http:\/\/snowball.tartarus.org\/texts\/introduction.html. Accessed 11.03.2008, 15.00h"},{"key":"608_CR39","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1016\/j.knosys.2018.01.033","volume":"145","author":"OGR Pupo","year":"2018","unstructured":"Pupo OGR, Altalhi AH, Ventura S (2018) Statistical comparisons of active learning strategies over multiple datasets. Knowl Based Syst 145:274\u2013288. https:\/\/doi.org\/10.1016\/j.knosys.2018.01.033","journal-title":"Knowl Based Syst"},{"key":"608_CR40","unstructured":"Ranasinghe T, Zampieri M, Hettiarachchi H (2019) BRUMS at HASOC 2019: Deep learning models for multilingual hate speech and offensive language identification. In: Working Notes of FIRE 2019, December 12-15, 2019, CEUR Workshop Proceedings, vol. 2517, pp. 199\u2013207. CEUR-WS.org, Kolkata, India. http:\/\/ceur-ws.org\/Vol-2517\/T3-3.pdf"},{"key":"608_CR41","doi-asserted-by":"crossref","unstructured":"Read J, Pfahringer B, Holmes G, Frank E (2009) Classifier chains for multi-label classification. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 254\u2013269. Springer, Springer, Bled, Slovenia","DOI":"10.1007\/978-3-642-04174-7_17"},{"key":"608_CR42","doi-asserted-by":"crossref","unstructured":"Rosenthal S, Atanasova P, Karadzhov G, Zampieri M, Nakov, P (2021) SOLID: A large-scale semi-supervised dataset for offensive language identification. In: ACL\/IJCNLP (Findings), Findings of ACL, vol. ACL\/IJCNLP 2021, pp. 915\u2013928. Association for Computational Linguistics","DOI":"10.18653\/v1\/2021.findings-acl.80"},{"key":"608_CR43","unstructured":"Sanh V, Debut L, Chaumond J, Wolf T (2019) Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. In: NeurIPS EMC$$^2$$ Workshop"},{"key":"608_CR44","doi-asserted-by":"publisher","unstructured":"Sharma M, Zhuang D, Bilgic M (2015) Active learning with rationales for text classification. In: Mihalcea R, Chai JY, Sarkar A (eds) NAACL HLT 2015, Denver, Colorado, USA, May 31 - June 5, 2015, pp. 441\u2013451. The Association for Computational Linguistics. https:\/\/doi.org\/10.3115\/v1\/n15-1047","DOI":"10.3115\/v1\/n15-1047"},{"key":"608_CR45","doi-asserted-by":"publisher","unstructured":"Shim H, Luca S, Lowet D, Vanrumste B (2020) Data augmentation and semi-supervised learning for deep neural networks-based text classifier. In: Hung C, Cern\u00fd T, Shin D, Bechini A (eds) SAC \u201920: The 35th ACM\/SIGAPP Symposium on Applied Computing, online event, [Brno, Czech Republic], March 30 - April 3, 2020, pp. 1119\u20131126. ACM. https:\/\/doi.org\/10.1145\/3341105.3373992","DOI":"10.1145\/3341105.3373992"},{"key":"608_CR46","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1016\/j.csl.2020.101104","volume":"65","author":"B Skrlj","year":"2021","unstructured":"Skrlj B, Martinc M, Kralj J, Lavrac N, Pollak S (2021) tax2vec: constructing interpretable features from taxonomies for short text classification. Comput Speech Lang 65:101\u2013104. https:\/\/doi.org\/10.1016\/j.csl.2020.101104","journal-title":"Comput Speech Lang"},{"key":"608_CR47","doi-asserted-by":"publisher","unstructured":"Sun C, Asudeh A, Jagadish HV, Howe B, Stoyanovich J (2019) Mithralabel: flexible dataset nutritional labels for responsible data science. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, November 3\u20137, pp. 2893\u20132896. ACM, Beijing, China (2019). https:\/\/doi.org\/10.1145\/3357384.3357853","DOI":"10.1145\/3357384.3357853"},{"key":"608_CR48","doi-asserted-by":"publisher","unstructured":"Tang MJ, Chan ET (2020) Social media: influences and impacts on culture. In: Arai K, Kapoor S, Bhatia R (eds) Intelligent computing\u2014proceedings of the 2020 computing conference, Volume 1, SAI 2020, London, UK, 16\u201317 July 2020, Advances in Intelligent Systems and Computing, vol. 1228, pp. 491\u2013501. Springer. https:\/\/doi.org\/10.1007\/978-3-030-52249-0_33","DOI":"10.1007\/978-3-030-52249-0_33"},{"key":"608_CR49","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.inffus.2017.05.003","volume":"40","author":"A Tommasel","year":"2018","unstructured":"Tommasel A, Godoy D (2018) A social-aware online short-text feature selection technique for social media. Inf Fus. 40:1\u201317. https:\/\/doi.org\/10.1016\/j.inffus.2017.05.003","journal-title":"Inf Fus."},{"key":"608_CR50","doi-asserted-by":"publisher","first-page":"e7","DOI":"10.1017\/S0269888919000018","volume":"34","author":"A Tommasel","year":"2019","unstructured":"Tommasel A, Godoy D (2019) Short-text learning in social media: a review. Knowl Eng Rev 34:e7. https:\/\/doi.org\/10.1017\/S0269888919000018","journal-title":"Knowl Eng Rev"},{"key":"608_CR51","doi-asserted-by":"publisher","unstructured":"Tommasel A, Godoy D (2018) A social-aware online short-text feature selection technique for social media. Inform Fus 40:1\u201317 https:\/\/doi.org\/10.1016\/j.inffus.2017.05.003. http:\/\/www.sciencedirect.com\/science\/article\/pii\/S1566253516302354","DOI":"10.1016\/j.inffus.2017.05.003"},{"issue":"3","key":"608_CR52","doi-asserted-by":"publisher","first-page":"1","DOI":"10.4018\/jdwm.2007070101","volume":"3","author":"G Tsoumakas","year":"2007","unstructured":"Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Warehous Min (IJDWM) 3(3):1\u201313","journal-title":"Int J Data Warehous Min (IJDWM)"},{"issue":"1","key":"608_CR53","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1007\/s10676-019-09516-z","volume":"22","author":"S Ullmann","year":"2020","unstructured":"Ullmann S, Tomalin M (2020) Quarantining online hate speech: technical and ethical perspectives. Ethics Inf Technol 22(1):69\u201380. https:\/\/doi.org\/10.1007\/s10676-019-09516-z","journal-title":"Ethics Inf Technol"},{"key":"608_CR54","doi-asserted-by":"publisher","unstructured":"Van Hee C, Jacobs G, Emmery C, Desmet B, Lefever E, Verhoeven B, De Pauw G, Daelemans W, Hoste V (2018) Automatic detection of cyberbullying in social media text. PLOS One 13(10). https:\/\/doi.org\/10.1371\/journal.pone.0203794","DOI":"10.1371\/journal.pone.0203794"},{"key":"608_CR55","unstructured":"van Rosendaal J, Caselli T, Nissim M (2020) Lower bias, higher density abusive language datasets: a recipe. In: Monti J, Basile V, di\u00a0Buono MP, Manna R, Pascucci A, Tonelli S (eds) Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language, ResTUP@LREC 2020, Marseille, France, May 2020, pp. 14\u201319. European Language Resources Association (ELRA). https:\/\/www.aclweb.org\/anthology\/2020.restup-1.4\/"},{"key":"608_CR56","doi-asserted-by":"crossref","unstructured":"Vapnik VN (2000) The nature of statistical learning theory, Second Edition. Statistics for Engineering and Information Science. Springer","DOI":"10.1007\/978-1-4757-3264-1_8"},{"key":"608_CR57","doi-asserted-by":"crossref","unstructured":"Varma S, Simon R (2006) Bias in error estimation when using cross-validation for model selection. BMC Bioinform 7(1):91","DOI":"10.1186\/1471-2105-7-91"},{"key":"608_CR58","doi-asserted-by":"crossref","unstructured":"Waseem Z, Hovy D (2016) Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In: Proceedings of the NAACL Student Research Workshop, pp. 88\u201393. Association for Computational Linguistics, San Diego, California. http:\/\/www.aclweb.org\/anthology\/N16-2013","DOI":"10.18653\/v1\/N16-2013"},{"key":"608_CR59","doi-asserted-by":"publisher","unstructured":"Yang F, Peng X, Ghosh G, Shilon R, Ma H, Moore E, Predovic G (2019) Exploring deep multimodal fusion of text and photo for hate speech classification. In: Proceedings of the Third Workshop on Abusive Language Online, pp. 11\u201318. Association for Computational Linguistics, Florence, Italy. https:\/\/doi.org\/10.18653\/v1\/W19-3502. https:\/\/www.aclweb.org\/anthology\/W19-3502","DOI":"10.18653\/v1\/W19-3502"},{"issue":"5","key":"608_CR60","doi-asserted-by":"publisher","first-page":"1093","DOI":"10.1007\/s13042-018-0787-8","volume":"10","author":"D Yu","year":"2019","unstructured":"Yu D, Fu B, Xu G, Qin A (2019) Constrained nonnegative matrix factorization-based semi-supervised multilabel learning. Int J Mach Learn Cyber 10(5):1093\u20131100. https:\/\/doi.org\/10.1007\/s13042-018-0787-8","journal-title":"Int J Mach Learn Cyber"},{"key":"608_CR61","doi-asserted-by":"publisher","unstructured":"Zampieri M, Malmasi S, Nakov P, Rosenthal S, Farra N, Kumar R (2019) Predicting the type and target of offensive posts in social media. In: NAACL-HLT 2019, Minneapolis, MN, USA, June 2\u20137, 2019, Volume 1 (Long and Short Papers), pp. 1415\u20131420. https:\/\/doi.org\/10.18653\/v1\/n19-1144","DOI":"10.18653\/v1\/n19-1144"},{"issue":"7","key":"608_CR62","doi-asserted-by":"publisher","first-page":"2038","DOI":"10.1016\/j.patcog.2006.12.019","volume":"40","author":"ML Zhang","year":"2007","unstructured":"Zhang ML, Zhou ZH (2007) Ml-knn: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038\u20132048","journal-title":"Pattern Recogn"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00608-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-021-00608-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00608-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,21]],"date-time":"2023-01-21T18:44:55Z","timestamp":1674326695000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-021-00608-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,4]]},"references-count":62,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["608"],"URL":"https:\/\/doi.org\/10.1007\/s40747-021-00608-2","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,4]]},"assertion":[{"value":"1 July 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 December 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 January 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}