{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T19:44:21Z","timestamp":1780083861588,"version":"3.54.0"},"reference-count":28,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2023,1,16]],"date-time":"2023-01-16T00:00:00Z","timestamp":1673827200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"KAKENHI","award":["20K11830"],"award-info":[{"award-number":["20K11830"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models like BERT and DistilBERT, have achieved impressive results in text classification tasks. However, these models\u2019 robustness against adversarial attacks remains an area of concern. To address this concern, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks to evaluate their robustness. In addition to enhancing the robustness, our proposed methods can improve the accuracy and F1-score on three datasets. We also conducted comparison experiments with two existing data augmentation methods. We found that one of our proposed methods demonstrates a similar improvement in terms of performance, but all demonstrate a superior robustness improvement.<\/jats:p>","DOI":"10.3390\/a16010059","type":"journal-article","created":{"date-parts":[[2023,1,16]],"date-time":"2023-01-16T06:52:10Z","timestamp":1673851930000},"page":"59","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5141-2457","authenticated-orcid":false,"given":"Huidong","family":"Tang","sequence":"first","affiliation":[{"name":"Graduate School of Advanced Science and Engineering, Hiroshima University, Kagamiyama 1-7-1, Higashi-Hiroshima 739-8521, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1716-3028","authenticated-orcid":false,"given":"Sayaka","family":"Kamei","sequence":"additional","affiliation":[{"name":"Graduate School of Advanced Science and Engineering, Hiroshima University, Kagamiyama 1-7-1, Higashi-Hiroshima 739-8521, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yasuhiko","family":"Morimoto","sequence":"additional","affiliation":[{"name":"Graduate School of Advanced Science and Engineering, Hiroshima University, Kagamiyama 1-7-1, Higashi-Hiroshima 739-8521, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1016\/j.inffus.2018.08.002","article-title":"An intelligent system for spam detection and identification of the most relevant features based on evolutionary random weight networks","volume":"48","author":"Faris","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"101801","DOI":"10.1016\/j.is.2021.101801","article-title":"Optimizing semantic deep forest for tweet topic classification","volume":"101","author":"Daouadi","year":"2021","journal-title":"Inf. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Fan, F., Feng, Y., and Zhao, D. (November, January 31). Multi-grained Attention Network for Aspect-Level Sentiment Classification. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-1380"},{"key":"ref_4","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota."},{"key":"ref_5","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Jin, D., Jin, Z., Zhou, J.T., and Szolovits, P. (2020, January 7\u201312). Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i05.6311"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ribeiro, M.T., Wu, T., Guestrin, C., and Singh, S. (2020, January 5\u201310). Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.442"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Li, D., Zhang, Y., Peng, H., Chen, L., Brockett, C., Sun, M.-T., and Dolan, B. (2021, January 6\u201311). Contextualized Perturbation for Textual Adversarial Attack. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online.","DOI":"10.18653\/v1\/2021.naacl-main.400"},{"key":"ref_9","unstructured":"Ren, S., Deng, Y., He, K., and Che, W. (August, January 28). Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wei, J., and Zou, K. (2019, January 3\u20137). EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1670"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Karimi, A., Rossi, L., and Prati, A. (2021, January 7\u201311). AEDA: An Easier Data Augmentation Technique for Text Classification. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.findings-emnlp.234"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Liu, R., Xu, G., Jia, C., Ma, W., Wang, L., and Vosoughi, S. (2020, January 16\u201320). Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online.","DOI":"10.18653\/v1\/2020.emnlp-main.726"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Anaby-Tavor, A., Carmeli, B., Goldbraich, E., Kantor, A., Kour, G., Shlomov, S., Tepper, N., and Zwerdling, N. (2020, January 7\u201312). Do Not Have Enough Data? Deep Learning to the Rescue!. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i05.6233"},{"key":"ref_14","unstructured":"Xie, Q., Dai, Z., Hovy, E., Luong, T., and Le, Q. (2020, January 6\u201312). Unsupervised data augmentation for consistency training. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kobayashi, S. (2018, January 1\u20136). Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-2072"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"\u015eahin, G.G., and Steedman, M. (November, January 31). Data Augmentation via Dependency Tree Morphing for Low-Resource Languages. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-1545"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Niu, T., and Bansal, M. (2019, January 3\u20137). Automatically Learning Data Augmentation Policies for Dialogue Tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1132"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"063120","DOI":"10.1063\/1.4954215","article-title":"Topic segmentation via community detection in complex networks","volume":"26","author":"Costa","year":"2016","journal-title":"Chaos: Interdiscip. J. Nonlinear Sci."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Machicao, J., Corr\u00eaa Jr, E.A., Miranda, G.H., Amancio, D.R., and Bruno, O.M. (2018). Authorship attribution based on life-like network automata. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0193703"},{"key":"ref_20","unstructured":"Zhang, X., Zhao, J., and LeCun, Y. (2015, January 7\u201312). Character-level convolutional networks for text classification. Proceedings of the 28th International Conference on Neural Information Processing Systems\u2014Volume 1, Montreal, Quebec, Canada."},{"key":"ref_21","unstructured":"Li, X., and Roth, D. (September, January 24). Learning Question Classifiers. Proceedings of the 19th International Conference on Computational Linguistics, Taipei, Taiwan."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Hovy, E., Gerber, L., Hermjakob, U., Lin, C.-Y., and Ravichandran, D. (2001, January 18\u201321). Toward semantics-based answer pinpointing. Proceedings of the First International Conference on Human Language Technology Research, San Diego, CA, USA.","DOI":"10.3115\/1072133.1072221"},{"key":"ref_23","unstructured":"Conneau, A., and Kiela, D. (2018, January 7\u201312). SentEval: An Evaluation Toolkit for Universal Sentence Representations. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Almeida, T.A., Hidalgo, J.M.G., and Yamakami, A. (2011, January 19\u201322). Contributions to the study of SMS spam filtering: New collection and results. Proceedings of the 11th ACM symposium on Document engineering, Mountain View, CA, USA.","DOI":"10.1145\/2034691.2034742"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1145\/219717.219748","article-title":"WordNet: A lexical database for English","volume":"38","author":"Miller","year":"1995","journal-title":"Commun. ACM"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Cambria, E., Li, Y., Xing, F.Z., Poria, S., and Kwok, K. (2020, January 19\u201323). SenticNet 6: Ensemble application of symbolic and sub-symbolic AI for sentiment analysis. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Online.","DOI":"10.1145\/3340531.3412003"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Morris, J.X., Lifland, E., Yoo, J.Y., Grigsby, J., Jin, D., and Qi, Y. (2020, January 16\u201320). TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online.","DOI":"10.18653\/v1\/2020.emnlp-demos.16"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020, January 16\u201320). Transformers: State-of-the-Art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/59\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:07:26Z","timestamp":1760119646000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/59"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,16]]},"references-count":28,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["a16010059"],"URL":"https:\/\/doi.org\/10.3390\/a16010059","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,16]]}}}