{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T19:15:05Z","timestamp":1776885305470,"version":"3.51.2"},"reference-count":38,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2023,7,13]],"date-time":"2023-07-13T00:00:00Z","timestamp":1689206400000},"content-version":"vor","delay-in-days":193,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,7,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Idioms are a kind of idiomatic expression in Chinese, most of which consist of four Chinese characters. Due to the properties of non-compositionality and metaphorical meaning, Chinese idioms are hard to be understood by children and non-native speakers. This study proposes a novel task, denoted as Chinese Idiom Paraphrasing (CIP). CIP aims to rephrase idiom-containing sentences to non-idiomatic ones under the premise of preserving the original sentence\u2019s meaning. Since the sentences without idioms are more easily handled by Chinese NLP systems, CIP can be used to pre-process Chinese datasets, thereby facilitating and improving the performance of Chinese NLP tasks, e.g., machine translation systems, Chinese idiom cloze, and Chinese idiom embeddings. In this study, we can treat the CIP task as a special paraphrase generation task. To circumvent difficulties in acquiring annotations, we first establish a large-scale CIP dataset based on human and machine collaboration, which consists of 115,529 sentence pairs. In addition to three sequence-to-sequence methods as the baselines, we further propose a novel infill-based approach based on text infilling. The results show that the proposed method has better performance than the baselines based on the established CIP dataset.<\/jats:p>","DOI":"10.1162\/tacl_a_00572","type":"journal-article","created":{"date-parts":[[2023,7,13]],"date-time":"2023-07-13T16:42:07Z","timestamp":1689266527000},"page":"740-754","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":12,"title":["Chinese Idiom Paraphrasing"],"prefix":"10.1162","volume":"11","author":[{"given":"Jipeng","family":"Qiang","sequence":"first","affiliation":[{"name":"Yangzhou University, China. jpqiang@yzu.edu.cn"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Li","sequence":"additional","affiliation":[{"name":"Yangzhou University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chaowei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Yangzhou University, China. cwzhang@yzu.edu.cn"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yun","family":"Li","sequence":"additional","affiliation":[{"name":"Yangzhou University, China. liyun@yzu.edu.cn"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi","family":"Zhu","sequence":"additional","affiliation":[{"name":"Yangzhou University, China. zhuyi@yzu.edu.cn"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunhao","family":"Yuan","sequence":"additional","affiliation":[{"name":"Yangzhou University, China. yhyuan@yzu.edu.cn"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xindong","family":"Wu","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, China. xwu@hfut.edu.cn"},{"name":"Zhejiang Lab, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2023,7,12]]},"reference":[{"key":"2023071316415424300_bib1","article-title":"Neural machine translation by jointly learning to align and translate","volume-title":"3rd International Conference on Learning Representations, ICLR 2015","author":"Bahdanau","year":"2015"},{"key":"2023071316415424300_bib2","doi-asserted-by":"publisher","first-page":"272","DOI":"10.18653\/v1\/W18-6401","article-title":"Findings of the 2018 conference on machine translation (wmt18)","volume-title":"Proceedings of the Third Conference on Machine Translation, Volume 2: Shared Task Papers","author":"Bojar","year":"2018"},{"key":"2023071316415424300_bib3","doi-asserted-by":"publisher","first-page":"295","DOI":"10.18653\/v1\/2020.emnlp-main.21","article-title":"Calibration of pre-trained transformers","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Desai","year":"2020"},{"key":"2023071316415424300_bib4","doi-asserted-by":"publisher","first-page":"4171","DOI":"10.18653\/v1\/N19-1423","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2023071316415424300_bib5","doi-asserted-by":"crossref","first-page":"2492","DOI":"10.18653\/v1\/2020.acl-main.225","article-title":"Enabling language models to fill in the blanks","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Donahue","year":"2020"},{"key":"2023071316415424300_bib6","article-title":"Maskgan: Better text generation via filling in the _________","volume-title":"6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 \u2013 May 3, 2018, Conference Track Proceedings","author":"Fedus","year":"2018"},{"issue":"5","key":"2023071316415424300_bib7","doi-asserted-by":"publisher","first-page":"378","DOI":"10.1037\/h0031619","article-title":"Measuring nominal scale agreement among many raters","volume":"76","author":"Fleiss","year":"1971","journal-title":"Psychological Bulletin"},{"key":"2023071316415424300_bib8","first-page":"716","article-title":"Identifying idioms in Chinese translations.","volume-title":"LREC","author":"Ho","year":"2014"},{"issue":"8","key":"2023071316415424300_bib9","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2023071316415424300_bib10","doi-asserted-by":"crossref","first-page":"154","DOI":"10.18653\/v1\/W18-0516","article-title":"Chengyu cloze test","volume-title":"Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications","author":"Jiang","year":"2018"},{"key":"2023071316415424300_bib11","doi-asserted-by":"publisher","first-page":"229","DOI":"10.18653\/v1\/2021.acl-srw.24","article-title":"Edit distance based curriculum learning for paraphrase generation","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop","author":"Kadotani","year":"2021"},{"key":"2023071316415424300_bib12","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2203.16634","article-title":"A method for stochastic optimization","author":"Kingma","year":"2015"},{"key":"2023071316415424300_bib13","first-page":"74","article-title":"ROUGE: A package for automatic evaluation of summaries","volume-title":"Text Summarization Branches Out","author":"Lin","year":"2004"},{"key":"2023071316415424300_bib14","doi-asserted-by":"crossref","first-page":"6826","DOI":"10.18653\/v1\/2022.findings-emnlp.508","article-title":"WANLI: Worker and AI collaboration for natural language inference dataset creation","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2022","author":"Liu","year":"2022"},{"key":"2023071316415424300_bib15","doi-asserted-by":"publisher","first-page":"5522","DOI":"10.18653\/v1\/P19-1552","article-title":"Neural-based Chinese idiom recommendation for enhancing elegance in essay writing","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Liu","year":"2019"},{"key":"2023071316415424300_bib16","doi-asserted-by":"publisher","first-page":"227","DOI":"10.18653\/v1\/2021.findings-emnlp.22","article-title":"An unsupervised method for building sentence simplification corpora in multiple languages","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2021","author":"Xinyu","year":"2021"},{"key":"2023071316415424300_bib17","doi-asserted-by":"crossref","first-page":"67","DOI":"10.3115\/982163.982182","article-title":"Paraphrasing using given and new information in a question-answer system","volume-title":"17th Annual Meeting of the Association for Computational Linguistics","author":"McKeown","year":"1979"},{"key":"2023071316415424300_bib18","doi-asserted-by":"publisher","first-page":"2551","DOI":"10.18653\/v1\/2021.emnlp-main.199","article-title":"ConRPG: Paraphrase generation using contexts as regularizer","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Meng","year":"2021"},{"key":"2023071316415424300_bib19","doi-asserted-by":"publisher","first-page":"431","DOI":"10.3115\/991719.991724","article-title":"Strategies for effective paraphrasing","volume-title":"COLING Budapest 1988 Volume 2: International Conference on Computational Linguistics","author":"Meteer","year":"1988"},{"key":"2023071316415424300_bib20","doi-asserted-by":"publisher","first-page":"48","DOI":"10.18653\/v1\/N19-4009","article-title":"fairseq: A fast, extensible toolkit for sequence modeling","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)","author":"Ott","year":"2019"},{"key":"2023071316415424300_bib21","doi-asserted-by":"publisher","first-page":"311","DOI":"10.3115\/1073083.1073135","article-title":"BLEU: A method for automatic evaluation of machine translation","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni","year":"2002"},{"key":"2023071316415424300_bib22","doi-asserted-by":"publisher","first-page":"76","DOI":"10.18653\/v1\/W15-2709","article-title":"Idiom paraphrases: Seventh heaven vs cloud nine","volume-title":"Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics","author":"Pershina","year":"2015"},{"key":"2023071316415424300_bib23","doi-asserted-by":"publisher","first-page":"1819","DOI":"10.1109\/TASLP.2021.3078361","article-title":"Chinese lexical simplification","author":"Qiang","year":"2021","journal-title":"IEEE\/ACM Transactions on Audio, Speech, and Language Processing"},{"key":"2023071316415424300_bib24","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2023071316415424300_bib25","article-title":"Evaluating machine translation performance on Chinese idioms with a blacklist method","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Shao","year":"2018"},{"key":"2023071316415424300_bib26","doi-asserted-by":"crossref","first-page":"5186","DOI":"10.18653\/v1\/2020.emnlp-main.420","article-title":"Blank language models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Shen","year":"2020"},{"key":"2023071316415424300_bib27","first-page":"1387","article-title":"Learning and evaluating Chinese idiom embeddings","volume-title":"Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)","author":"Tan","year":"2021"},{"issue":"4","key":"2023071316415424300_bib28","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1177\/107769905303000401","article-title":"\u201cCloze procedure\u201d: A new tool for measuring readability","volume":"30","author":"Taylor","year":"1953","journal-title":"Journalism Quarterly"},{"key":"2023071316415424300_bib29","first-page":"5998","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2023071316415424300_bib30","doi-asserted-by":"publisher","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: State-of-the-art natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Wolf","year":"2020"},{"key":"2023071316415424300_bib31","doi-asserted-by":"publisher","first-page":"1958","DOI":"10.18653\/v1\/2022.acl-long.138","article-title":"BiTIIMT: A bilingual text-infilling method for interactive machine translation","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Xiao","year":"2022"},{"key":"2023071316415424300_bib32","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.18653\/v1\/2022.findings-acl.97","article-title":"Multi-task learning for paraphrase generation with keyword and part-of-speech reconstruction","volume-title":"Findings of the Association for Computational Linguistics: ACL 2022","author":"Xie","year":"2022"},{"key":"2023071316415424300_bib33","doi-asserted-by":"publisher","first-page":"483","DOI":"10.18653\/v1\/2021.naacl-main.41","article-title":"mT5: A massively multilingual pre-trained text-to-text transformer","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Xue","year":"2021"},{"key":"2023071316415424300_bib34","article-title":"Decoding as dynamic programming for recurrent autoregressive models","volume-title":"International Conference on Learning Representations","author":"Zaidi","year":"2020"},{"key":"2023071316415424300_bib35","doi-asserted-by":"crossref","first-page":"177","DOI":"10.18653\/v1\/2020.repl4nlp-1.21","article-title":"Enhancing transformer with sememe knowledge","volume-title":"Proceedings of the 5th Workshop on Representation Learning for NLP","author":"Zhang","year":"2020"},{"key":"2023071316415424300_bib36","doi-asserted-by":"publisher","first-page":"778","DOI":"10.18653\/v1\/P19-1075","article-title":"ChID: A large-scale Chinese IDiom dataset for cloze test","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zheng","year":"2019"},{"key":"2023071316415424300_bib37","doi-asserted-by":"publisher","first-page":"5075","DOI":"10.18653\/v1\/2021.emnlp-main.414","article-title":"Paraphrase generation: A survey of the state of the art","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Zhou","year":"2021"},{"key":"2023071316415424300_bib38","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1901.00158","article-title":"Text infilling","author":"Zhu","year":"2019","journal-title":"CoRR"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00572\/2143279\/tacl_a_00572.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00572\/2143279\/tacl_a_00572.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,17]],"date-time":"2023-12-17T01:16:27Z","timestamp":1702775787000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00572\/116713\/Chinese-Idiom-Paraphrasing"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"references-count":38,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00572","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023]]},"published":{"date-parts":[[2023]]}}}