{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,13]],"date-time":"2025-05-13T22:00:30Z","timestamp":1747173630726,"version":"3.40.5"},"reference-count":43,"publisher":"Cambridge University Press (CUP)","issue":"5","license":[{"start":{"date-parts":[[2024,1,23]],"date-time":"2024-01-23T00:00:00Z","timestamp":1705968000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Recent work on text simplification has focused on the use of control tokens to further the state-of-the-art. However, it is difficult to improve further without an in-depth understanding of the mechanisms underlying control tokens. One previously unexplored factor, which we also investigate, is the tokenization strategy. In this paper, we (1) reimplement AudienCe-CEntric Sentence Simplification, (2) explore the effects and interactions of varying control tokens, (3) test the influence of different tokenization strategies, (4) demonstrate how individual control tokens affect performance and (5) propose new methods to predict the values of control tokens. We show how performance varies with each of the four control tokens separately. We also uncover how the design of control tokens can influence performance and give some suggestions for designing control tokens. 
We show that the newly proposed method achieves higher performance in both SARI (a common scoring metric in text simplification) and BERTScore (a score derived from the BERT language model) and has potential in real applications.<\/jats:p>","DOI":"10.1017\/s1351324923000566","type":"journal-article","created":{"date-parts":[[2024,1,23]],"date-time":"2024-01-23T09:14:22Z","timestamp":1706001262000},"page":"915-942","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":0,"title":["How do control tokens affect natural language generation tasks like text simplification"],"prefix":"10.1017","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-1071-5708","authenticated-orcid":false,"given":"Zihao","family":"Li","sequence":"first","affiliation":[]},{"given":"Matthew","family":"Shardlow","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2024,1,23]]},"reference":[{"key":"S1351324923000566_ref24","doi-asserted-by":"crossref","unstructured":"Nishihara, D. , Kajiwara, T. and Arase, Y. (2019). Controllable text simplification with lexical constraint loss. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Florence, Italy. Association for Computational Linguistics, pp. 260\u2013266.","DOI":"10.18653\/v1\/P19-2036"},{"key":"S1351324923000566_ref10","doi-asserted-by":"crossref","unstructured":"Dong, Y. , Li, Z. , Rezagholizadeh, M. and Cheung, J.C.K. (2019). Editnts: An neural programmer-interpreter model for sentence simplification through explicit editing. arXiv preprint arXiv:1906.08104.","DOI":"10.18653\/v1\/P19-1331"},{"key":"S1351324923000566_ref16","doi-asserted-by":"crossref","unstructured":"Lebret, R. , Grangier, D. and Auli, M. (2016). Neural text generation from structured data with application to the biography domain. 
arXiv preprint arXiv:1603.07771.","DOI":"10.18653\/v1\/D16-1128"},{"key":"S1351324923000566_ref18","doi-asserted-by":"crossref","unstructured":"Lewis, M. , Liu, Y. , Goyal, N. , Ghazvininejad, M. , Mohamed, A. , Levy, O. , Stoyanov, V. and Zettlemoyer, L. (2020). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 7871\u20137880. Online.","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"S1351324923000566_ref27","doi-asserted-by":"crossref","unstructured":"Petersen, S. E. and Ostendorf, M. (2007). Text simplification for language learners: a corpus analysis. In Workshop on Speech and Language Technology in Education. Citeseer.","DOI":"10.21437\/SLaTE.2007-20"},{"key":"S1351324923000566_ref7","doi-asserted-by":"crossref","unstructured":"Devaraj, A. , Wallace, B.C. , Marshall, I.J. and Li, J.J. (2021). Paragraph-level simplification of medical texts. In Proceedings of the Conference. Association for Computational Linguistics. North American Chapter. Meeting. NIH Public Access, vol. 2021, p. 
4972.","DOI":"10.18653\/v1\/2021.naacl-main.395"},{"key":"S1351324923000566_ref13","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1007\/s11023-020-09548-1","article-title":"Gpt-3: Its nature, scope, limits, and consequences","volume":"30","author":"Floridi","year":"2020","journal-title":"Minds and Machines"},{"key":"S1351324923000566_ref32","doi-asserted-by":"crossref","first-page":"58","DOI":"10.14569\/SpecialIssue.2014.040109","article-title":"A survey of automated text simplification","volume":"4","author":"Shardlow","year":"2014","journal-title":"International Journal of Advanced Computer Science and Applications"},{"key":"S1351324923000566_ref39","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1162\/tacl_a_00107","article-title":"Optimizing statistical machine translation for text simplification","volume":"4","author":"Xu","year":"2016","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"S1351324923000566_ref33","doi-asserted-by":"crossref","unstructured":"Sun, R. , Lin, Z. and Wan, X. (2020). On the helpfulness of document context to sentence simplification. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online). International Committee on Computational Linguistics, pp. 
1411\u20131423.","DOI":"10.18653\/v1\/2020.coling-main.121"},{"volume-title":"Advances in Neural Information Processing Systems","year":"2017","author":"Vaswani","key":"S1351324923000566_ref35"},{"key":"S1351324923000566_ref38","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1162\/tacl_a_00139","article-title":"Problems in current text simplification research: New data can help","volume":"3","author":"Xu","year":"2015","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"S1351324923000566_ref17","first-page":"707","volume-title":"Soviet Physics Doklady","volume":"10","author":"Levenshtein","year":"1966"},{"key":"S1351324923000566_ref22","unstructured":"Martin, L. , Fan, A. , de la Clergerie, \u00c9. , Bordes, A. and Sagot, B. (2020b). Multilingual unsupervised sentence simplification. arXiv preprint arXiv:2005.00352."},{"key":"S1351324923000566_ref2","doi-asserted-by":"crossref","unstructured":"Alva-Manchego, F. , Martin, L. , Bordes, A. , Scarton, C. , Sagot, B. and Specia, L. (2020a). ASSET: A dataset for tuning and evaluation of sentence simplification models with multiple rewriting transformations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 4668\u20134679. 
Online.","DOI":"10.18653\/v1\/2020.acl-main.424"},{"key":"S1351324923000566_ref15","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/0004-3702(90)90084-D","article-title":"Pragmatics and natural language generation","volume":"43","author":"Hovy","year":"1990","journal-title":"Artificial Intelligence"},{"key":"S1351324923000566_ref12","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1037\/h0057532","article-title":"A new readability yardstick","volume":"32","author":"Flesch","year":"1948","journal-title":"Journal of Applied Psychology"},{"key":"S1351324923000566_ref28","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"S1351324923000566_ref25","unstructured":"Omelianchuk, K. , Raheja, V. and Skurzhanskyi, O. (2021). Text simplification by tagging. arXiv preprint arXiv:2103.05070."},{"key":"S1351324923000566_ref8","unstructured":"Devlin, J. , Chang, M.-W. , Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota. Association for Computational Linguistics, pp. 4171\u20134186."},{"key":"S1351324923000566_ref23","doi-asserted-by":"crossref","unstructured":"Mei, H. , Bansal, M. and Walter, M.R. (2016). Listen, attend, and walk: Neural mapping of navigational instructions to action sequences. In Thirtieth AAAI Conference on Artificial Intelligence.","DOI":"10.1609\/aaai.v30i1.10364"},{"key":"S1351324923000566_ref19","doi-asserted-by":"crossref","unstructured":"Li, Z. , Shardlow, M. and Hassan, S. (2022). An investigation into the effect of control tokens on text simplification. 
In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics, pp. 154\u2013165.","DOI":"10.18653\/v1\/2022.tsar-1.14"},{"key":"S1351324923000566_ref34","doi-asserted-by":"crossref","unstructured":"Surya, S. , Mishra, A. , Laha, A. , Jain, P. and Sankaranarayanan, K. (2019). Unsupervised neural text simplification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy. Association for Computational Linguistics, pp. 2058\u20132068.","DOI":"10.18653\/v1\/P19-1198"},{"key":"S1351324923000566_ref36","doi-asserted-by":"crossref","unstructured":"Wen, T.-H. , Gasic, M. , Mrksic, N. , Su, P.-H. , Vandyke, D. and Young, S. (2015). Semantically conditioned lstm-based natural language generation for spoken dialogue systems. arXiv preprint arXiv:1508.01745.","DOI":"10.18653\/v1\/D15-1199"},{"key":"S1351324923000566_ref11","doi-asserted-by":"crossref","unstructured":"Du\u0161ek, O. and Jur\u010d\u00ed\u010dek, F. (2016). Sequence-to-sequence generation for spoken dialogue via deep syntax trees and strings. arXiv preprint arXiv:1606.05491.","DOI":"10.18653\/v1\/P16-2008"},{"key":"S1351324923000566_ref26","doi-asserted-by":"crossref","unstructured":"Papineni, K. , Roukos, S. , Ward, T. and Zhu, W.-J. (2002). Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics, pp. 311\u2013318.","DOI":"10.3115\/1073083.1073135"},{"volume-title":"Linguistic Databases","year":"1998","author":"Devlin","key":"S1351324923000566_ref9"},{"key":"S1351324923000566_ref20","unstructured":"Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, Barcelona, Spain. 
Association for Computational Linguistics, pp. 74\u201381."},{"volume-title":"Advances in Neural Information Processing Systems","year":"2019","author":"Yang","key":"S1351324923000566_ref40"},{"key":"S1351324923000566_ref41","unstructured":"Zhang, T. , Kishore, V. , Wu, F. , Weinberger, K.Q. and Artzi, Y. (2019). Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904."},{"key":"S1351324923000566_ref42","doi-asserted-by":"crossref","unstructured":"Zhang, X. and Lapata, M. (2017). Sentence simplification with deep reinforcement learning. arXiv preprint arXiv:1703.10931.","DOI":"10.18653\/v1\/D17-1062"},{"key":"S1351324923000566_ref43","doi-asserted-by":"crossref","unstructured":"Zhao, S. , Meng, R. , He, D. , Saptono, A. and Parmanto, B. (2018). Integrating transformer and paraphrase rules for sentence simplification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium. Association for Computational Linguistics, pp. 3164\u20133173.","DOI":"10.18653\/v1\/D18-1355"},{"key":"S1351324923000566_ref14","unstructured":"Guo, H. , Pasunuru, R. and Bansal, M. (2018). Dynamic multi-level multi-task learning for sentence simplification. CoRR, abs\/1806.07304."},{"key":"S1351324923000566_ref21","unstructured":"Martin, L. , de la Clergerie, \u00c9. , Sagot, B. and Bordes, A. (2020a). Controllable sentence simplification. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France. European Language Resources Association, pp. 
4689\u20134698."},{"key":"S1351324923000566_ref6","first-page":"19","volume-title":"Proceedings of the SIGIR Workshop on Accessible Search Systems","author":"De Belder","year":"2010"},{"key":"S1351324923000566_ref1","doi-asserted-by":"crossref","first-page":"3757","DOI":"10.18653\/v1\/2021.findings-acl.330","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Agrawal","year":"2021"},{"key":"S1351324923000566_ref30","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1017\/S1351324997001502","article-title":"Building applied natural language generation systems","volume":"3","author":"Reiter","year":"1997","journal-title":"Natural Language Engineering"},{"key":"S1351324923000566_ref37","unstructured":"Wubben, S. , van den Bosch, A. and Krahmer, E. (2012). Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jeju Island, Korea. Association for Computational Linguistics, pp.1015\u20131024."},{"key":"S1351324923000566_ref29","unstructured":"Rapin, J. and Teytaud, O. (2018). Nevergrad - a gradient-free optimization platform. Available at: https:\/\/GitHub.com\/FacebookResearch\/Nevergrad."},{"key":"S1351324923000566_ref3","doi-asserted-by":"crossref","unstructured":"Alva-Manchego, F. , Martin, L. , Scarton, C. and Specia, L. (2019). EASSE: Easier automatic sentence simplification evaluation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, Hong Kong, China. Association for Computational Linguistics, pp. 49\u201354.","DOI":"10.18653\/v1\/D19-3009"},{"key":"S1351324923000566_ref31","unstructured":"Scialom, T. , Martin, L. , Staiano, J. , de la Clergerie, \u00c9.V. and Sagot, B. (2021). Rethinking automatic evaluation in sentence simplification. 
arXiv preprint arXiv:2104.07560."},{"key":"S1351324923000566_ref5","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1162\/coli_a_00418","article-title":"The (un)suitability of automatic evaluation metrics for text simplification","volume":"47","author":"Alva-Manchego","year":"2021","journal-title":"Computational Linguistics"},{"key":"S1351324923000566_ref4","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1162\/coli_a_00370","article-title":"Data-driven sentence simplification: Survey and benchmark","volume":"46","author":"Alva-Manchego","year":"2020","journal-title":"Computational Linguistics"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324923000566","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T01:04:42Z","timestamp":1731027882000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324923000566\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,23]]},"references-count":43,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["S1351324923000566"],"URL":"https:\/\/doi.org\/10.1017\/s1351324923000566","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"type":"print","value":"1351-3249"},{"type":"electronic","value":"1469-8110"}],"subject":[],"published":{"date-parts":[[2024,1,23]]},"assertion":[{"value":"\u00a9 The Author(s), 2024. 
Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}