{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T04:48:26Z","timestamp":1776746906736,"version":"3.51.2"},"reference-count":22,"publisher":"World Scientific Pub Co Pte Ltd","issue":"01","funder":[{"name":"NSF","award":["2101021"],"award-info":[{"award-number":["2101021"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Semantic Computing"],"published-print":{"date-parts":[[2026,3]]},"abstract":"<jats:p>We introduce a Reinforcement Learning (RL)-based framework that optimizes discrete natural-language prompts to improve both the accuracy and the clarity of sentence simplification. Using a lightweight Proximal Policy Optimization (PPO) policy, our method learns prompts that guide a frozen small-scale LLaMA-3.2 3B model toward effective simplification in support of user-centric computational thinking tasks. Results show that our RL-optimized prompts significantly surpass manual baselines in semantic fidelity, logical coherence, and instructional quality. Moreover, RL-optimized prompting enables this much smaller LLM to produce outputs comparable in clarity and instructional value to those of the far larger LLaMA-3.3 70B model.<\/jats:p>","DOI":"10.1142\/s1793351x26410035","type":"journal-article","created":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T09:55:41Z","timestamp":1770976541000},"page":"45-69","source":"Crossref","is-referenced-by-count":0,"title":["RL\u2013Based Adaptive Prompt Optimization for User\u2013Centric Structured Sentence Simplification via Small Language Models"],"prefix":"10.1142","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-4874-1469","authenticated-orcid":false,"given":"Shubham S.","family":"Bhatt","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1562-4409","authenticated-orcid":false,"given":"Michael S.","family":"Hsiao","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2026,3,6]]},"reference":[{"key":"S1793351X26410035BIB001","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1062"},{"key":"S1793351X26410035BIB002","doi-asserted-by":"crossref","unstructured":"D. Aumiller and M. Gertz, UniHD at TSAR-2022 shared task: Is compute all we need for lexical simplification? (2023).","DOI":"10.18653\/v1\/2022.tsar-1.28"},{"key":"S1793351X26410035BIB003","unstructured":"L. V\u00e1squez-Rodr\u00edguez, N. T. H. Nguyen, P. Przyby\u0142a, M. Shardlow and S. Ananiadou, Simple is not enough: Document-level text simplification using readability and coherence, preprint (2024), arXiv:2412.18655 [cs.CL]."},{"key":"S1793351X26410035BIB004","unstructured":"BigScience Workshop: T. L. Scao et al., BLOOM: A 176B-parameter open-access multilingual language model, preprint (2023), arXiv:2211.05100 [cs.CL]."},{"key":"S1793351X26410035BIB005","first-page":"1","volume-title":"Proc. Int. Conf. Artificial Intelligence x Humanities, Education, and Art","author":"Bhatt S. S.","year":"2025"},{"key":"S1793351X26410035BIB006","unstructured":"M. Ranzato, S. Chopra, M. Auli and W. Zaremba, Sequence level training with recurrent neural networks, preprint (2016), arXiv:1511.06732 [cs.LG]."},{"key":"S1793351X26410035BIB007","doi-asserted-by":"crossref","unstructured":"J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao and D. Jurafsky, Deep reinforcement learning for dialogue generation, preprint (2016), arXiv:1606.01541 [cs.CL].","DOI":"10.18653\/v1\/D16-1127"},{"key":"S1793351X26410035BIB008","unstructured":"R. Paulus, C. Xiong and R. Socher, A deep reinforced model for abstractive summarization, preprint (2017), arXiv:1705.04304 [cs.CL]."},{"key":"S1793351X26410035BIB009","unstructured":"D. M. Ziegler, N. Stiennon, J. Wu, T. B. Brown, A. Radford, D. Amodei, P. Christiano and G. Irving, Fine-tuning language models from human preferences, preprint (2020), arXiv:1909.08593v2 [cs.CL]."},{"key":"S1793351X26410035BIB010","doi-asserted-by":"crossref","unstructured":"S. Kumar, S. Bhatia, M. Aggarwal and T. Chakraborty, Dialogue agents 101: A beginner\u2019s guide to critical ingredients for designing effective conversational systems, preprint (2024), arXiv:2307.07255 [cs.CL].","DOI":"10.1017\/nlp.2024.42"},{"key":"S1793351X26410035BIB011","doi-asserted-by":"crossref","unstructured":"B. Lester, R. Al-Rfou and N. Constant, The power of scale for parameter-efficient prompt tuning, preprint (2021), arXiv:2104.08691 [cs.CL].","DOI":"10.18653\/v1\/2021.emnlp-main.243"},{"key":"S1793351X26410035BIB012","unstructured":"X. L. Li and P. Liang, Prefix-tuning: Optimizing continuous prompts for generation, preprint (2021), arXiv:2101.00190 [cs.CL]."},{"key":"S1793351X26410035BIB013","doi-asserted-by":"crossref","unstructured":"T. Shin, Y. Razeghi, R. L. Logan IV, E. Wallace and S. Singh, AutoPrompt: Eliciting knowledge from language models with automatically generated prompts, preprint (2020), arXiv:2010.15980 [cs.CL].","DOI":"10.18653\/v1\/2020.emnlp-main.346"},{"key":"S1793351X26410035BIB014","doi-asserted-by":"crossref","unstructured":"M. Deng, J. Wang, C.-P. Hsieh, Y. Wang, H. Guo, T. Shu, M. Song, E. P. Xing and Z. Hu, RLPrompt: Optimizing discrete text prompts with reinforcement learning, preprint (2022), arXiv:2205.12548 [cs.CL].","DOI":"10.18653\/v1\/2022.emnlp-main.222"},{"key":"S1793351X26410035BIB015","doi-asserted-by":"crossref","unstructured":"M. Kwon, G. Kim, J. Kim, H. Lee and J. Kim, StablePrompt: Automatic prompt tuning using reinforcement learning for large language models, preprint (2024), arXiv:2410.07652 [cs.CL].","DOI":"10.18653\/v1\/2024.emnlp-main.551"},{"key":"S1793351X26410035BIB016","doi-asserted-by":"publisher","DOI":"10.1145\/3657604.3662032"},{"key":"S1793351X26410035BIB017","first-page":"2790","volume-title":"Proc. 36th Int. Conf. Machine Learning","author":"Houlsby N.","year":"2019"},{"key":"S1793351X26410035BIB018","first-page":"1","volume-title":"Int. Conf. Learning Representations","author":"Hu E. J.","year":"2022"},{"key":"S1793351X26410035BIB019","unstructured":"J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, Proximal policy optimization algorithms, preprint (2017), arXiv:1707.06347."},{"key":"S1793351X26410035BIB020","unstructured":"L. Ouyang et al., Training language models to follow instructions with human feedback, preprint (2022), arXiv:2203.02155 [cs.CL]."},{"key":"S1793351X26410035BIB021","unstructured":"J. Schulman, P. Moritz, S. Levine, M. I. Jordan and P. Abbeel, High-dimensional continuous control using generalized advantage estimation, preprint (2016), arXiv:1506.02438."},{"key":"S1793351X26410035BIB022","doi-asserted-by":"publisher","DOI":"10.1137\/0218082"}],"container-title":["International Journal of Semantic Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S1793351X26410035","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T04:28:23Z","timestamp":1776745703000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S1793351X26410035"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3]]},"references-count":22,"journal-issue":{"issue":"01","published-print":{"date-parts":[[2026,3]]}},"alternative-id":["10.1142\/S1793351X26410035"],"URL":"https:\/\/doi.org\/10.1142\/s1793351x26410035","relation":{},"ISSN":["1793-351X","1793-7108"],"issn-type":[{"value":"1793-351X","type":"print"},{"value":"1793-7108","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3]]}}}