{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T17:58:45Z","timestamp":1768413525399,"version":"3.49.0"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,6,27]],"date-time":"2024-06-27T00:00:00Z","timestamp":1719446400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62372005"],"award-info":[{"award-number":["62372005"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2024,7,31]]},"abstract":"<jats:p>Machine translation is integral to international communication and extensively employed in diverse human-related applications. Despite remarkable progress, fairness issues persist within current machine translation systems. In this article, we propose FairMT, an automated fairness testing approach tailored for machine translation systems. FairMT operates on the assumption that translations of semantically similar sentences, containing protected attributes from distinct demographic groups, should maintain comparable meanings. It comprises three key steps: (1) test input generation, producing inputs covering various demographic groups; (2) test oracle generation, identifying potential unfair translations based on semantic similarity measurements; and (3) regression, discerning genuine fairness issues from those caused by low-quality translation. Leveraging FairMT, we conduct an empirical study on three leading machine translation systems\u2013Google Translate, T5, and Transformer. Our investigation uncovers up to 832, 1,984, and 2,627 unfair translations across the three systems, respectively. Intriguingly, we observe that fair translations tend to exhibit superior translation performance, challenging the conventional wisdom of a fairness-performance tradeoff prevalent in the fairness literature.<\/jats:p>","DOI":"10.1145\/3664608","type":"journal-article","created":{"date-parts":[[2024,5,9]],"date-time":"2024-05-09T08:46:47Z","timestamp":1715244407000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Fairness Testing of Machine Translation Systems"],"prefix":"10.1145","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9990-9120","authenticated-orcid":false,"given":"Zeyu","family":"Sun","sequence":"first","affiliation":[{"name":"Science &amp; Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4765-1893","authenticated-orcid":false,"given":"Zhenpeng","family":"Chen","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0481-7264","authenticated-orcid":false,"given":"Jie","family":"Zhang","sequence":"additional","affiliation":[{"name":"King's College London, London, United Kingdom of Great Britain and Northern Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8295-303X","authenticated-orcid":false,"given":"Dan","family":"Hao","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (Peking University), MoE, School of Computer Science, Peking University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2024,6,27]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"2013. Google Translate\u2019s Gender Problem (And Bing Translate\u2019s and Systrans\u2019s...). https:\/\/www.fastcompany.com\/3010223\/google-translates-gender-problem-and-bing-translates-and-systrans"},{"key":"e_1_3_2_3_2","unstructured":"2015. Google Apologizes After Its Translator Produced Homophobic Slurs For The Word \u2018Gay\u2019. https:\/\/www.businessinsider.com\/google-apologizes-for-translate-flaw-producing-homophobic-slurs-2015-1"},{"key":"e_1_3_2_4_2","unstructured":"2020. Female Historians and Male Nurses do not Exist. https:\/\/algorithmwatch.org\/en\/google-translate-gender-bias\/"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338937"},{"issue":"12","key":"e_1_3_2_6_2","first-page":"5087","article-title":"BiasFinder: Metamorphic test generation to uncover bias for sentiment analysis systems","volume":"48","author":"Asyrofi Muhammad Hilmi","year":"2022","unstructured":"Muhammad Hilmi Asyrofi, Zhou Yang, Imam Nur Bani Yusuf, Hong Jin Kang, Ferdian Thung, and David Lo. 2022. BiasFinder: Metamorphic test generation to uncover bias for sentiment analysis systems. IEEE Transactions on Software Engineering 48, 12 (2022), 5087\u20135101.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_7_2","volume-title":"Proceedings of ICLR","author":"Belinkov Yonatan","year":"2018","unstructured":"Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and natural noise both break neural machine translation. In Proceedings of ICLR."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409704"},{"key":"e_1_3_2_9_2","article-title":"SemMT: A semantic-based testing approach for machine translation systems","volume":"2012","author":"Cao Jialun","year":"2020","unstructured":"Jialun Cao, Meiziniu Li, Yeting Li, Ming Wen, and Shing-Chi Cheung. 2020. SemMT: A semantic-based testing approach for machine translation systems. CoRR abs\/2012.01815 (2020). arxiv:2012.01815https:\/\/arxiv.org\/abs\/2012.01815","journal-title":"CoRR"},{"key":"e_1_3_2_10_2","volume-title":"Metamorphic Testing: A New Approach for Generating Next Test Cases","author":"Chen Tsong Y.","year":"1998","unstructured":"Tsong Y. Chen, Shing C. Cheung, and Shiu Ming Yiu. 1998. Metamorphic Testing: A New Approach for Generating Next Test Cases. Technical Report."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3143561"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3652155"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549093"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3583561"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639083"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098095"},{"key":"e_1_3_2_17_2","unstructured":"CWMT. 2018. The CWMT Dataset. http:\/\/nlp.nju.edu.cn\/cwmt-wmt\/"},{"key":"e_1_3_2_18_2","article-title":"Universal transformers","author":"Dehghani Mostafa","year":"2018","unstructured":"Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and \u0141ukasz Kaiser. 2018. Universal transformers. arXiv preprint arXiv:1807.03819 (2018).","journal-title":"arXiv preprint arXiv:1807.03819"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Lyle Campbell and Ver\u00f3nica Grondona. 2008. Ethnologue: Languages of the world. Language 84 3 (2008) 636\u2013641.","DOI":"10.1353\/lan.0.0054"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01841"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510137"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3106237.3106277"},{"key":"e_1_3_2_23_2","unstructured":"Google. 2023. Google Translate. http:\/\/translate.google.com"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409756"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380339"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00047"},{"key":"e_1_3_2_27_2","first-page":"68","volume-title":"Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (AMTA 2018) (Boston, MA, USA, March 17\u201321, 2018) - Volume 1: Research Papers","author":"Heigold Georg","year":"2018","unstructured":"Georg Heigold, Stalin Varanasi, G\u00fcnter Neumann, and Josef van Genabith. 2018. How robust are character-based word embeddings in tagging and MT against wrod scramlbing or randdm nouse? In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (AMTA 2018) (Boston, MA, USA, March 17\u201321, 2018) - Volume 1: Research Papers. 68\u201380. https:\/\/aclanthology.info\/papers\/W18-1807\/w18-1807"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468565"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531968"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00136"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1075\/li.30.1.03nad"},{"key":"e_1_3_2_32_2","unstructured":"OpenAI. 2023. ChatGPT. https:\/\/chat.openai.com\/"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3193977.3193980"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-019-04144-6"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455856"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.165"},{"key":"e_1_3_2_38_2","unstructured":"Stefan Schweter and Alan Akbik. 2020. FLERT: Document-Level Features for Named Entity Recognition. arxiv:2011.06993 [cs.CL]"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.5555\/972597.972602"},{"issue":"12","key":"e_1_3_2_40_2","first-page":"5188","article-title":"Astraea: Grammar-based fairness testing","volume":"48","author":"Soremekun Ezekiel","year":"2022","unstructured":"Ezekiel Soremekun, Sakshi Udeshi, and Sudipta Chattopadhyay. 2022. Astraea: Grammar-based fairness testing. IEEE Transactions on Software Engineering 48, 12 (2022), 5188\u20135211.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASWEC.2018.00021"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380420"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510206"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549169"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238165"},{"key":"e_1_3_2_46_2","article-title":"Tensor2Tensor for neural machine translation","volume":"1803","author":"Vaswani Ashish","year":"2018","unstructured":"Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, \u0141ukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, and Jakob Uszkoreit. 2018. Tensor2Tensor for neural machine translation. CoRR abs\/1803.07416 (2018). http:\/\/arxiv.org\/abs\/1803.07416","journal-title":"CoRR"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3616310"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597926.3598081"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.184"},{"key":"e_1_3_2_51_2","first-page":"8780","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 (NeurIPS 2019)","author":"Wick Michael L.","year":"2019","unstructured":"Michael L. Wick, Swetasudha Panda, and Jean-Baptiste Tristan. 2019. Unlocking fairness: A trade-off revisited. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 (NeurIPS 2019). 8780\u20138789."},{"key":"e_1_3_2_52_2","unstructured":"Wikipedia. 2014. Wikipedia. https:\/\/dumps.wikimedia.org\/"},{"key":"e_1_3_2_53_2","unstructured":"WMT. 2018. News-Commentary. http:\/\/data.statmt.org\/wmt18\/translation-task\/"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597926.3598099"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460319.3464820"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380331"},{"key":"e_1_3_2_58_2","article-title":"Generating natural adversarial examples","volume":"1710","author":"Zhao Zhengli","year":"2017","unstructured":"Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2017. Generating natural adversarial examples. CoRR abs\/1710.11342 (2017). arxiv:1710.11342http:\/\/arxiv.org\/abs\/1710.11342","journal-title":"CoRR"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510123"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1328"},{"key":"e_1_3_2_61_2","first-page":"3530","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Ziemski Micha\u0142","year":"2016","unstructured":"Micha\u0142 Ziemski, Marcin Junczys-Dowmunt, and Bruno Pouliquen. 2016. The united nations parallel corpus v1. 0. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916). 3530\u20133534."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664608","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3664608","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:44Z","timestamp":1750291424000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664608"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,27]]},"references-count":60,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,7,31]]}},"alternative-id":["10.1145\/3664608"],"URL":"https:\/\/doi.org\/10.1145\/3664608","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,27]]},"assertion":[{"value":"2024-01-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-15","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}