{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T09:53:04Z","timestamp":1780653184808,"version":"3.54.1"},"reference-count":43,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2021,9,4]],"date-time":"2021-09-04T00:00:00Z","timestamp":1630713600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MTI"],"abstract":"<jats:p>We introduce the first study of the automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used for processing toxic content on social media or for eliminating toxicity in automatically generated texts. While much work has been done for the English language in this field, there are no works on detoxification for the Russian language. We suggest two types of models\u2014an approach based on BERT architecture that performs local corrections and a supervised approach based on a pretrained GPT-2 language model. We compare these methods with several baselines. In addition, we provide the training datasets and describe the evaluation setup and metrics for automatic and manual evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.<\/jats:p>","DOI":"10.3390\/mti5090054","type":"journal-article","created":{"date-parts":[[2021,9,6]],"date-time":"2021-09-06T13:18:26Z","timestamp":1630934306000},"page":"54","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Methods for Detoxification of Texts for the Russian Language"],"prefix":"10.3390","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0929-4140","authenticated-orcid":false,"given":"Daryna","family":"Dementieva","sequence":"first","affiliation":[{"name":"Skolkovo Institute of Science and Technology, 121205 Moscow, Russia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Daniil","family":"Moskovskiy","sequence":"additional","affiliation":[{"name":"Skolkovo Institute of Science and Technology, 121205 Moscow, Russia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Varvara","family":"Logacheva","sequence":"additional","affiliation":[{"name":"Skolkovo Institute of Science and Technology, 121205 Moscow, Russia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David","family":"Dale","sequence":"additional","affiliation":[{"name":"Skolkovo Institute of Science and Technology, 121205 Moscow, Russia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Olga","family":"Kozlova","sequence":"additional","affiliation":[{"name":"Mobile TeleSystems (MTS), 109147 Moscow, Russia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nikita","family":"Semenov","sequence":"additional","affiliation":[{"name":"Mobile TeleSystems (MTS), 109147 Moscow, Russia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6097-6118","authenticated-orcid":false,"given":"Alexander","family":"Panchenko","sequence":"additional","affiliation":[{"name":"Skolkovo Institute of Science and Technology, 121205 Moscow, Russia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,4]]},"reference":[{"key":"ref_1","unstructured":"D\u2019Sa, A.G., Illina, I., and Fohr, D. (2020, January 16). Towards Non-Toxic Landscapes: Automatic Toxic Comment Detection Using DNN. Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, Marseille, France."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Schmidt, A., and Wiegand, M. (2017, January 3). A Survey on Hate Speech Detection using Natural Language Processing. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, Valencia, Spain.","DOI":"10.18653\/v1\/W17-1101"},{"key":"ref_3","unstructured":"Pamungkas, E.W., and Patti, V. (August, January 28). Cross-domain and Cross-lingual Abusive Language Detection: A Hybrid Approach with Deep Learning and a Multilingual Lexicon. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Florence, Italy."},{"key":"ref_4","unstructured":"Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., and Garnett, R. (2017, January 4\u20139). Style Transfer from Non-Parallel Text by Cross-Alignment. Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA."},{"key":"ref_5","unstructured":"Melnyk, I., dos Santos, C.N., Wadhawan, K., Padhi, I., and Kumar, A. (2017). Improved Neural Text Attribute Transfer with Non-parallel Data. arXiv."},{"key":"ref_6","unstructured":"Pryzant, R., Martinez, R.D., Dass, N., Kurohashi, S., Jurafsky, D., and Yang, D. (2020, January 7\u201312). Automatically Neutralizing Subjective Bias in Text. Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Rao, S., and Tetreault, J. (2018, January 1\u20136). Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-1012"},{"key":"ref_8","unstructured":"Jin, D., Jin, Z., Hu, Z., Vechtomova, O., and Mihalcea, R. (2020). Deep Learning for Text Style Transfer: A Survey. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Nogueira dos Santos, C., Melnyk, I., and Padhi, I. (2018, January 15\u201320). Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia.","DOI":"10.18653\/v1\/P18-2031"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Tran, M., Zhang, Y., and Soleymani, M. (2020, January 8\u201313). Towards A Friendly Online Community: An Unsupervised Style Transfer Framework for Profanity Redaction. Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain.","DOI":"10.18653\/v1\/2020.coling-main.190"},{"key":"ref_11","unstructured":"Jurafsky, D., Chai, J., Schluter, N., and Tetreault, J.R. (2020, January 5\u201310). Politeness Transfer: A Tag and Generate Approach. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online."},{"key":"ref_12","unstructured":"Jigsaw (2021, March 01). Toxic Comment Classification Challenge. Available online: https:\/\/www.kaggle.com\/c\/jigsaw-toxic-comment-classification-challenge."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., and Kumar, R. (2019, January 2\u20137). Predicting the Type and Target of Offensive Posts in Social Media. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA.","DOI":"10.18653\/v1\/N19-1144"},{"key":"ref_14","unstructured":"Wiegand, M., Siegel, M., and Ruppenhofer, J. (2018, January 19\u201321). Overview of the GermEval 2018 Shared Task on the Identification of Offensive Language. Proceedings of the GermEval 2018, 14th Conference on Natural Language Processing (KONVENS 2018), Vienna, Austria."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3232676","article-title":"A Survey on Automatic Detection of Hate Speech in Text","volume":"51","author":"Fortuna","year":"2018","journal-title":"ACM Comput. Surv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Waseem, Z., and Hovy, D. (2016, January 12\u201317). Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. Proceedings of the NAACL Student Research Workshop, San Diego, CA, USA.","DOI":"10.18653\/v1\/N16-2013"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Davidson, T., Warmsley, D., Macy, M., and Weber, I. (2017, January 15\u201318). Automated Hate Speech Detection and the Problem of Offensive Language. Proceedings of the 11th International AAAI Conference on Web and Social Media (ICWSM-17), Montr\u00e9al, QC, Canada.","DOI":"10.1609\/icwsm.v11i1.14955"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Rangel Pardo, F.M., Rosso, P., and Sanguinetti, M. (2019, January 6\u20137). SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN, USA.","DOI":"10.18653\/v1\/S19-2007"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Breitfeller, L., Ahn, E., Jurgens, D., and Tsvetkov, Y. (2019, January 3\u20137). Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1176"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1037\/0003-066X.62.4.271","article-title":"Racial microaggressions in everyday life: Implications for clinical practice","volume":"62","author":"Sue","year":"2007","journal-title":"Am. Psychol."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Han, X., and Tsvetkov, Y. (2020, January 16\u201320). Fortifying Toxic Speech Detectors Against Veiled Toxicity. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online.","DOI":"10.18653\/v1\/2020.emnlp-main.622"},{"key":"ref_22","unstructured":"Lees, A., Borkan, D., Kivlichan, I., Nario, J., and Goyal, T. (2021, January 20). Capturing Covertly Toxic Speech via Crowdsourcing. Proceedings of the First Workshop on Bridging Human\u2014Computer Interaction and Natural Language Processing, Online."},{"key":"ref_23","unstructured":"Tikhonov, A., and Yamshchikov, I.P. (2018). What is wrong with style transfer for texts?. arXiv."},{"key":"ref_24","unstructured":"King, M. (1985, January 27\u201329). A Computational Theory of Prose Style for Natural Language Generation. Proceedings of the EACL 1985, 2nd Conference of the European Chapter of the Association for Computational Linguistics, Geneva, Switzerland."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Gatys, L.A., Ecker, A.S., and Bethge, M. (2016, January 27\u201330). Image Style Transfer Using Convolutional Neural Networks. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.265"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Li, J., Jia, R., He, H., and Liang, P. (2018, January 1\u20136). Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-1169"},{"key":"ref_27","unstructured":"John, V., Mou, L., Bahuleyan, H., and Vechtomova, O. (August, January 28). Disentangled Representation Learning for Non-Parallel Text Style Transfer. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_28","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"ref_29","first-page":"140:1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_30","unstructured":"Webber, B., Cohn, T., He, Y., and Liu, Y. (2020, January 16\u201320). Reformulating Unsupervised Style Transfer as Paraphrase Generation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online."},{"key":"ref_31","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Arefyev, N., Sheludko, B., Podolskiy, A., and Panchenko, A. (2020, January 8\u201313). Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution. Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain.","DOI":"10.18653\/v1\/2020.coling-main.107"},{"key":"ref_33","first-page":"84","article-title":"Conditional BERT Contextual Augmentation","volume":"Volume 11539","author":"Rodrigues","year":"2019","journal-title":"Proceedings of the Computational Science\u2014ICCS 2019\u201419th International Conference"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wu, X., Zhang, T., Zang, L., Han, J., and Hu, S. (2019). \u201cMask and Infill\u201d: Applying Masked Language Model to Sentiment Transfer. arXiv.","DOI":"10.24963\/ijcai.2019\/732"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_36","unstructured":"Belchikov, A. (2021, July 22). Russian Language Toxic Comments. Available online: https:\/\/www.kaggle.com\/blackmoon\/russian-language-toxic-comments."},{"key":"ref_37","unstructured":"Semiletov, A. (2021, July 22). Toxic Russian Comments. Available online: https:\/\/www.kaggle.com\/alexandersemiletov\/toxic-russian-comments."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching Word Vectors with Subword Information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Kutuzov, A., and Kuzmenko, E. (2017). WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models. Analysis of Images, Social Networks and Texts, Proceedings of the 5th International Conference, AIST 2016, Yekaterinburg, Russia, 7\u20139 April 2016, Springer International Publishing. Revised Selected Papers.","DOI":"10.1007\/978-3-319-52920-2_15"},{"key":"ref_40","unstructured":"Kuratov, Y., and Arkhipov, M. (2019). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Abdaoui, A., Pradel, C., and Sigel, G. (2020, January 11). Load What You Need: Smaller Versions of Mutililingual BERT. Proceedings of the SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, Stroudsburg, PA, USA.","DOI":"10.18653\/v1\/2020.sustainlp-1.16"},{"key":"ref_42","unstructured":"Birch, A., Finch, A.M., Hayashi, H., Konstas, I., Luong, T., Neubig, G., Oda, Y., and Sudoh, K. (2019, January 4). Unsupervised Evaluation Metrics and Learning Criteria for Non-Parallel Textual Transfer. Proceedings of the 3rd Workshop on Neural Generation and Translation@EMNLP-IJCNLP 2019, Hong Kong, China."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Yamshchikov, I.P., Shibaev, V., Khlebnikov, N., and Tikhonov, A. (2020). Style-transfer and Paraphrase: Looking for a Sensible Semantic Similarity Metric. arXiv.","DOI":"10.1609\/aaai.v35i16.17672"}],"container-title":["Multimodal Technologies and Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2414-4088\/5\/9\/54\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:56:17Z","timestamp":1760165777000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2414-4088\/5\/9\/54"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,4]]},"references-count":43,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["mti5090054"],"URL":"https:\/\/doi.org\/10.3390\/mti5090054","relation":{},"ISSN":["2414-4088"],"issn-type":[{"value":"2414-4088","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,4]]}}}