{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:41:17Z","timestamp":1760060477829,"version":"build-2065373602"},"reference-count":32,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T00:00:00Z","timestamp":1756425600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Hankuk University of Foreign Studies Research Fund"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Systems"],"abstract":"<jats:p>Prompt optimization through textual feedback has shown promising results in improving the performance of large language models (LLMs) on downstream tasks. However, existing approaches often rely on selecting prompt edits from a pool of candidate gradients using random sampling or local heuristics, requiring multiple evaluations to find effective modifications. In this work, we propose a center-aware selection method that identifies high-quality gradient candidates based on their proximity to a robust semantic center representation of the gradient pool. Rather than sampling or scoring candidates iteratively, our method embeds all textual gradients and deterministically selects the top-k closest to the semantic center, which captures the consensus of the candidate pool. Experiments on three diverse datasets demonstrate that our approach not only improves predictive performance but also reduces the number of required model queries. In addition, qualitative analyses reveal that gradients near the center tend to encode more generalizable reasoning patterns. 
These findings highlight the utility of semantic embedding space as a reliable signal for selecting effective prompt edits in a resource-efficient manner.<\/jats:p>","DOI":"10.3390\/systems13090748","type":"journal-article","created":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:42:21Z","timestamp":1756485741000},"page":"748","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Prompt Optimization System Based on Center-Aware Textual Gradients"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-2945-3649","authenticated-orcid":false,"given":"Yeryung","family":"Jang","sequence":"first","affiliation":[{"name":"Division of Computer Engineering, Hankuk University of Foreign Studies, Yongin-si 17035, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1992-0731","authenticated-orcid":false,"given":"Jaekeol","family":"Choi","sequence":"additional","affiliation":[{"name":"Division of AI Data Convergence, Hankuk University of Foreign Studies, Yongin-si 17035, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,29]]},"reference":[{"key":"ref_1","unstructured":"Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., and Lundberg, S. (2023). Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv."},{"key":"ref_2","unstructured":"OpenAI (2023). GPT-4 Technical Report. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Reynolds, L., and McDonell, K. (2021, January 8\u201313). Prompt programming for large language models: Beyond the few-shot paradigm. Proceedings of the Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan.","DOI":"10.1145\/3411763.3451760"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Sahoo, P., Singh, A.K., Saha, S., Jain, V., Mondal, S., and Chadha, A. 
(2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv.","DOI":"10.1007\/979-8-8688-0569-1_4"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Lu, Y., Bartolo, M., Moore, A., Riedel, S., and Stenetorp, P. (2021). Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. arXiv.","DOI":"10.18653\/v1\/2022.acl-long.556"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Webson, A., and Pavlick, E. (2021). Do prompt-based models really understand the meaning of their prompts?. arXiv.","DOI":"10.18653\/v1\/2022.naacl-main.167"},{"key":"ref_7","unstructured":"Zhou, Y., Muresanu, A.I., Han, Z., Paster, K., Pitis, S., Chan, H., and Ba, J. (2022, January 25\u201329). Large Language Models are Human-Level Prompt Engineers. Proceedings of the Eleventh International Conference on Learning Representations, Online."},{"key":"ref_8","unstructured":"Chang, K., Xu, S., Wang, C., Luo, Y., Liu, X., Xiao, T., and Zhu, J. (2024). Efficient Prompting Methods for Large Language Models: A Survey. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Pryzant, R., Iter, D., Li, J., Lee, Y.T., Zhu, C., and Zeng, M. (2023). Automatic prompt optimization with \u201cgradient descent\u201d and beam search. arXiv.","DOI":"10.18653\/v1\/2023.emnlp-main.494"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ye, Q., Axmed, M., Pryzant, R., and Khani, F. (2023). Prompt engineering a prompt engineer. arXiv.","DOI":"10.18653\/v1\/2024.findings-acl.21"},{"key":"ref_11","first-page":"28893","article-title":"STaR: Bootstrapping Reasoning With Reasoning","volume":"35","author":"Zelikman","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Lester, B., Al-Rfou, R., and Constant, N. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning. 
arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.243"},{"key":"ref_13","unstructured":"Qin, C., and Joty, S. (2021). LFPT5: A unified framework for lifelong few-shot language learning based on prompt tuning of t5. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Deng, M., Wang, J., Hsieh, C.P., Wang, Y., Guo, H., Shu, T., Song, M., Xing, E.P., and Hu, Z. (2022). Rlprompt: Optimizing discrete text prompts with reinforcement learning. arXiv.","DOI":"10.18653\/v1\/2022.emnlp-main.222"},{"key":"ref_15","unstructured":"Sun, H., Li, X., Xu, Y., Homma, Y., Cao, Q., Wu, M., Jiao, J., and Charles, D. (2023). Autohint: Automatic prompt optimization with hint generation. arXiv."},{"key":"ref_16","unstructured":"Wang, X., Li, C., Wang, Z., Bai, F., Luo, H., Zhang, J., Jojic, N., Xing, E.P., and Hu, Z. (2023). Promptagent: Strategic planning with language models enables expert-level prompt optimization. arXiv."},{"key":"ref_17","unstructured":"Ma, R., Wang, X., Zhou, X., Li, J., Du, N., Gui, T., Zhang, Q., and Huang, X. (2024). Are large language models good prompt optimizers?. arXiv."},{"key":"ref_18","unstructured":"Shinn, N., Cassano, F., Labash, B., Gopinath, A., Narasimhan, K., and Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. arXiv."},{"key":"ref_19","unstructured":"Madaan, A., Lin, X., Lee, R., Yang, K., Baral, C., Hakkani-T\u00fcr, D., Zaiane, O., and Liu, X. (2023). Self-Refine: Iterative Refinement with Self-Feedback. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, Y., Yang, C., and Ettinger, A. (2024, January 16\u201321). When Hindsight is Not 20\/20: Testing Limits on Reflective Thinking in Large Language Models. 
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico.","DOI":"10.18653\/v1\/2024.findings-naacl.237"},{"key":"ref_21","unstructured":"Liu, F., Scao, T., Xie, L., Shelton, M., Gundersen, O.E., Ruder, S., Wang, T., Zettlemoyer, L., Reichart, R., and Gurevych, I. (2023). ProTeGi: Prompt Tuning with Textual Gradients. arXiv."},{"key":"ref_22","unstructured":"Yuksekgonul, M., Bianchi, F., Boen, J., Liu, S., Huang, Z., Guestrin, C., and Zou, J. (2024). TextGrad: Automatic \u201cDifferentiation\u201d via Text. arXiv."},{"key":"ref_23","unstructured":"Zhu, K., Zhao, Q., Chen, H., Wang, J., and Xie, X. (2023). PromptBench: A Unified Library for Evaluation of Large Language Models. arXiv."},{"key":"ref_24","unstructured":"Chen, K., Zhou, Y., Zhang, X., and Wang, H. (2025). Prompt Stability Matters: Evaluating and Optimizing Auto-Generated Prompts. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Gao, T., Fisch, A., and Chen, D. (2021). Making pre-trained language models better few-shot learners. arXiv.","DOI":"10.18653\/v1\/2021.acl-long.295"},{"key":"ref_26","unstructured":"Qiang, Y., Nandi, S., Mehrabi, N., Ver Steeg, G., Kumar, A., Rumshisky, A., and Galstyan, A. (2024, January 17\u201322). Prompt Perturbation Consistency Learning for Robust Language Models. Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, St. Julians, Malta."},{"key":"ref_27","unstructured":"Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., Zhang, Y., Narayanan, D., Wu, Y., and Kumar, A. (2022). Holistic evaluation of language models. arXiv."},{"key":"ref_28","first-page":"2687","article-title":"Binary or Graded, Few-Shot or Zero-Shot: Prompt Design for GPTs in Relevance Evaluation","volume":"4","author":"Choi","year":"2024","journal-title":"Adv. Artif. Intell. Mach. Learn."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Choi, J. (2025). 
Efficient Prompt Optimization for Relevance Evaluation via LLM-Based Confusion Matrix Feedback. Appl. Sci., 15.","DOI":"10.3390\/app15095198"},{"key":"ref_30","unstructured":"Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q.V., Zhou, D., and Chen, X. (2023). Large language models as optimizers. arXiv."},{"key":"ref_31","first-page":"4077","article-title":"Prototypical Networks for Few-shot Learning","volume":"30","author":"Snell","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_32","first-page":"19368","article-title":"Zhongjing: Enhancing the chinese medical capabilities of large language model through expert feedback and real-world multi-turn dialogue","volume":"38","author":"Yang","year":"2024","journal-title":"AAAI Conf. Artif. Intell."}],"container-title":["Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-8954\/13\/9\/748\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:35:28Z","timestamp":1760034928000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-8954\/13\/9\/748"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,29]]},"references-count":32,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["systems13090748"],"URL":"https:\/\/doi.org\/10.3390\/systems13090748","relation":{},"ISSN":["2079-8954"],"issn-type":[{"type":"electronic","value":"2079-8954"}],"subject":[],"published":{"date-parts":[[2025,8,29]]}}}