{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T07:21:23Z","timestamp":1780730483925,"version":"3.54.1"},"reference-count":38,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T00:00:00Z","timestamp":1758067200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Machine learning in natural language processing (NLP) analyzes datasets to make future predictions, but developing accurate models requires large, high-quality, and balanced datasets. However, collecting such datasets, especially for low-resource languages, is time-consuming and costly. As a solution, data augmentation can be used to increase the dataset size by generating synthetic samples from existing data. This study examines the effect of translation-based data augmentation on sentiment analysis using small datasets in three diverse languages: French, German, and Japanese. We use two neural machine translation (NMT) services\u2014Google Translate and DeepL\u2014to generate augmented datasets through intermediate language translation. Sentiment analysis models based on Support Vector Machine (SVM) are trained on both original and augmented datasets and evaluated using accuracy, precision, recall, and F1 score. Our results demonstrate that translation augmentation significantly enhances model performance in both French and Japanese. For example, using Google Translate, model accuracy improved from 62.50% to 83.55% in Japanese (+21.05%) and from 87.66% to 90.26% in French (+2.6%). In contrast, the German dataset showed a minor improvement or decline, depending on the translator used. Google-based augmentation generally outperformed DeepL, which yielded smaller or negative gains. To evaluate cross-lingual generalization, models trained on one language were tested on datasets in the other two. Notably, a model trained on augmented German data improved its accuracy on French test data from 81.17% to 85.71% and on Japanese test data from 71.71% to 79.61%. Similarly, a model trained on augmented Japanese data improved accuracy on German test data by up to 3.4%. These findings highlight that translation-based augmentation can enhance sentiment classification and cross-language adaptability, particularly in low-resource and multilingual NLP settings.<\/jats:p>","DOI":"10.3390\/info16090806","type":"journal-article","created":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T05:59:51Z","timestamp":1758088791000},"page":"806","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Multilingual Sentiment Analysis with Data Augmentation: A Cross-Language Evaluation in French, German, and Japanese"],"prefix":"10.3390","volume":"16","author":[{"given":"Suboh","family":"Alkhushayni","sequence":"first","affiliation":[{"name":"Department of Information Systems, Faculty of Information Technology and Computer Science, Yarmouk University, Irbid 21163, Jordan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hyesu","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Computer Information Science, Minnesota State University, Mankato, MN 56001, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Taylor, L., and Nitschke, G. (2018, January 18\u201321). Improving deep learning with generic data augmentation. Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India.","DOI":"10.1109\/SSCI.2018.8628742"},{"key":"ref_2","first-page":"10073","article-title":"A two-stage balancing strategy based on data augmentation for imbalanced text sentiment classification","volume":"40","author":"Pang","year":"2021","journal-title":"J. Intell. Fuzzy Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1177\/01655515221137270","article-title":"Improved multi-lingual sentiment analysis and recognition using deep learning","volume":"51","author":"Khan","year":"2025","journal-title":"J. Inf. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1016\/j.ymssp.2017.04.042","article-title":"On the quantification and efficient propagation of imprecise probabilities resulting from small datasets","volume":"98","author":"Zhang","year":"2018","journal-title":"Mech. Syst. Signal Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"703","DOI":"10.1038\/nmeth.3968","article-title":"Points of significance: Model selection and overfitting","volume":"13","author":"Lever","year":"2016","journal-title":"Nat. Methods"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"106631","DOI":"10.1016\/j.knosys.2020.106631","article-title":"On the class overlap problem in imbalanced data classification","volume":"212","author":"Vuttipittayamongkol","year":"2021","journal-title":"Knowl.-Based Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"674","DOI":"10.1016\/j.patcog.2018.03.008","article-title":"Handling data irregularities in classification: Foundations, trends, and future challenges","volume":"81","author":"Das","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"101891","DOI":"10.1016\/j.jretconser.2019.101891","article-title":"Positive emotion bias: Role of emotional content from online customer reviews in purchase decisions","volume":"52","author":"Guo","year":"2020","journal-title":"J. Retail. Consum. Serv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Hasan, M., Rahman, M.T., Zillanee, A.H., Alam, M.G.R., Islam, M.F.U., and Chakrabarty, A. (2024, January 2\u20134). Multilingual sentiment analysis on social media: Harnessing deep learning for enhanced insights and decision support for foreign travelers. Proceedings of the 2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), Dhaka, Bangladesh.","DOI":"10.1109\/ICEEICT62016.2024.10534557"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"115033","DOI":"10.1016\/j.eswa.2021.115033","article-title":"Using back-and-forth translation to create artificial augmented textual data for sentiment analysis models","volume":"178","author":"Body","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_11","first-page":"3264378","article-title":"M-da: A multifeature text data-augmentation model for improving accuracy of chinese sentiment analysis","volume":"2022","author":"Wang","year":"2022","journal-title":"Sci. Program."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1007\/s13042-022-01553-3","article-title":"Data augmentation in natural language processing: A novel text generation approach for long and short text classifiers","volume":"14","author":"Bayer","year":"2023","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1109\/TAI.2021.3114390","article-title":"Toward text data augmentation for sentiment analysis","volume":"3","author":"Abonizio","year":"2021","journal-title":"IEEE Trans. Artif. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"109803","DOI":"10.1016\/j.asoc.2022.109803","article-title":"Data augmentation techniques in natural language processing","volume":"132","author":"Pellicer","year":"2023","journal-title":"Appl. Soft Comput."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"e12746","DOI":"10.1111\/exsy.12746","article-title":"Cassava disease recognition from low-quality images using enhanced data augmentation model and deep learning","volume":"38","author":"Damasevicius","year":"2021","journal-title":"Expert Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Maeda, K., Takada, S., Haruyama, T., Togo, R., Ogawa, T., and Haseyama, M. (2022). Distress detection in subway tunnel images via data augmentation based on selective image cropping and patching. Sensors, 22.","DOI":"10.3390\/s22228932"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Nanni, L., Paci, M., Brahnam, S., and Lumini, A. (2021). Comparison of different image data augmentation approaches. J. Imaging, 7.","DOI":"10.20944\/preprints202111.0047.v1"},{"key":"ref_19","first-page":"46","article-title":"Sentiment analysis of customer feedbacks in online food ordering services","volume":"12","author":"Nguyen","year":"2021","journal-title":"Bus. Syst. Res. Int. J. Soc. Adv. Innov. Res. Econ."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"117605","DOI":"10.1016\/j.eswa.2022.117605","article-title":"Tailored text augmentation for sentiment analysis","volume":"205","author":"Feng","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wei, J., and Zou, K. (2019). EDA: Easy data augmentation techniques for boosting performance on text classification tasks. arXiv.","DOI":"10.18653\/v1\/D19-1670"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sugiyama, A., and Yoshinaga, N. (2019, January 3). Data augmentation using back-translation for context-aware neural machine translation. Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019), Hong Kong, China.","DOI":"10.18653\/v1\/D19-6504"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Kry\u015bci\u0144ski, W., McCann, B., Xiong, C., and Socher, R. (2019). Evaluating the factual consistency of abstractive text summarization. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-main.750"},{"key":"ref_24","first-page":"508","article-title":"Sentiment analysis of amazon products using ensemble machine learning algorithm","volume":"4","author":"Sadhasivam","year":"2019","journal-title":"Int. J. Math. Eng. Manag. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1177\/0165551510388123","article-title":"Aspect-based sentiment analysis of movie reviews on discussion boards","volume":"36","author":"Thet","year":"2010","journal-title":"J. Inf. Sci."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1145\/2436256.2436274","article-title":"Techniques and applications for sentiment analysis","volume":"56","author":"Feldman","year":"2013","journal-title":"Commun. ACM"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1016\/j.jocs.2017.11.006","article-title":"Deep recurrent neural network vs. support vector machine for aspect-based sentiment analysis of arabic hotels\u2019 reviews","volume":"27","author":"Qawasmeh","year":"2018","journal-title":"J. Comput. Sci."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wen, Y., Liang, Y., and Zhu, X. (2023). Sentiment analysis of hotel online reviews using the bert model and ernie model\u2014Data from China. PLoS ONE, 18.","DOI":"10.1371\/journal.pone.0275382"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Alkhushayni, S., Alomari, Z., and Al-Zaleq, D. (2023, January 21\u201323). A sentiment analysis study of twitter users\u2019 reactions to the COVID-19 vaccine. Proceedings of the 2023 14th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan.","DOI":"10.1109\/ICICS60529.2023.10330455"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3464524","DOI":"10.1155\/2022\/3464524","article-title":"Efficient long short-term memory-based sentiment analysis of e-commerce reviews","volume":"2022","author":"Gondhi","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"118246","DOI":"10.1016\/j.eswa.2022.118246","article-title":"Cross lingual transfer learning for sentiment analysis of italian tripadvisor reviews","volume":"209","author":"Catelli","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1108\/IJCHM-02-2021-0132","article-title":"Sentiment analysis in hospitality and tourism: A thematic and methodological review","volume":"34","author":"Mehraliyev","year":"2022","journal-title":"Int. J. Contemp. Hosp. Manag."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1001\/jamainternmed.2018.7653","article-title":"Assessing the use of google translate for spanish and chinese translations of emergency department discharge instructions","volume":"179","author":"Khoong","year":"2019","journal-title":"JAMA Intern. Med."},{"key":"ref_34","first-page":"243","article-title":"Google translate and deepl: Breaking taboos in translator training. observational study and analysis","volume":"45","author":"Burbat","year":"2023","journal-title":"Ib\u00e9rica"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2214","DOI":"10.1021\/acs.jcim.8b00534","article-title":"Is machine translation a reliable tool for reading german scientific databases and research articles?","volume":"58","author":"Zulfiqar","year":"2018","journal-title":"J. Chem. Inf. Model."},{"key":"ref_36","first-page":"18","article-title":"Simple patient care instructions translate best: Safety guidelines for physician use of google translate","volume":"25","author":"Miller","year":"2018","journal-title":"J. Clin. Outcomes Manag."},{"key":"ref_37","unstructured":"Bradley, M.M., and Lang, P.J. (1999). Affective Norms for English Words (ANEW): Instruction Manual and Affective Ratings, The Center for Research in Psychophysiology, University of Florida. Tech. Rep. C-1, technical Report."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1108","DOI":"10.3758\/s13428-013-0426-y","article-title":"ANGST: Affective norms for german sentiment terms, derived from the affective norms for english words","volume":"46","author":"Schmidtke","year":"2014","journal-title":"Behav. Res. Methods"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/9\/806\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:46:46Z","timestamp":1760035606000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/9\/806"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,17]]},"references-count":38,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["info16090806"],"URL":"https:\/\/doi.org\/10.3390\/info16090806","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,17]]}}}