{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T00:08:18Z","timestamp":1755907698278,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,11,14]],"date-time":"2024-11-14T00:00:00Z","timestamp":1731542400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,11,14]]},"DOI":"10.1145\/3677052.3698651","type":"proceedings-article","created":{"date-parts":[[2024,11,14]],"date-time":"2024-11-14T06:38:06Z","timestamp":1731566286000},"page":"711-718","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Online Personalizing White-box LLMs Generation with Neural Bandits"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5564-137X","authenticated-orcid":false,"given":"Zekai","family":"Chen","sequence":"first","affiliation":[{"name":"JPMorganChase, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8568-8988","authenticated-orcid":false,"given":"Po-Yu","family":"Chen","sequence":"additional","affiliation":[{"name":"JPMorganChase, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2164-7087","authenticated-orcid":false,"given":"Francois","family":"Buet-Golfouse","sequence":"additional","affiliation":[{"name":"Barclays, AI\/ML Markets, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2024,11,14]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Shipra Agrawal and Navin Goyal. 2012. Thompson Sampling for Contextual Bandits with Linear Payoffs. In ICML. https:\/\/api.semanticscholar.org\/CorpusID:96146"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944941"},{"key":"e_1_3_2_1_3_1","volume-title":"Finite-time analysis of the multiarmed bandit problem. Machine learning 47","author":"Auer Peter","year":"2002","unstructured":"Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine learning 47 (2002), 235\u2013256."},{"key":"e_1_3_2_1_4_1","volume-title":"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. ArXiv abs\/2204.05862","author":"Bai Yuntao","year":"2022","unstructured":"Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, T.\u00a0J. Henighan, Nicholas Joseph, Saurav Kadavath, John Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom\u00a0B. Brown, Jack Clark, Sam McCandlish, Christopher Olah, Benjamin Mann, and Jared Kaplan. 2022. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. ArXiv abs\/2204.05862 (2022). https:\/\/api.semanticscholar.org\/CorpusID:248118878"},{"key":"e_1_3_2_1_5_1","volume-title":"Language models are few-shot learners. Advances in neural information processing systems 33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared\u00a0D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877\u20131901."},{"key":"e_1_3_2_1_6_1","volume-title":"Language Models are Few-Shot Learners. NeurIPS abs\/2005.14165","author":"Brown B.","year":"2020","unstructured":"Tom\u00a0B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, T.\u00a0J. Henighan, Rewon Child, Aditya Ramesh, Daniel\u00a0M. Ziegler, Jeff Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. NeurIPS abs\/2005.14165 (2020). https:\/\/arxiv.org\/pdf\/2005.14165.pdf"},{"key":"e_1_3_2_1_7_1","volume-title":"InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models. ArXiv abs\/2306.03082","author":"Chen Lichang","year":"2023","unstructured":"Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, and Tianyi Zhou. 2023. InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models. ArXiv abs\/2306.03082 (2023). https:\/\/api.semanticscholar.org\/CorpusID:259075794"},{"key":"e_1_3_2_1_8_1","volume-title":"Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack\u00a0W. Rae, Oriol Vinyals, and L. Sifre.","author":"Hoffmann Jordan","year":"2022","unstructured":"Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las\u00a0Casas, Lisa\u00a0Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van\u00a0den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack\u00a0W. Rae, Oriol Vinyals, and L. Sifre. 2022. Training Compute-Optimal Large Language Models. ArXiv abs\/2203.15556 (2022). https:\/\/arxiv.org\/pdf\/2203.15556.pdf"},{"key":"e_1_3_2_1_9_1","volume-title":"UserNLP\u201922: 2022 International Workshop on User-centered Natural Language Processing. Companion Proceedings of the Web Conference 2022","author":"Huang Xiaolei","year":"2022","unstructured":"Xiaolei Huang, Lucie Flek, Franck Dernoncourt, Charles\u00a0F Welch, Silvio Amir, Ramit Sawhney, and Diyi Yang. 2022. UserNLP\u201922: 2022 International Workshop on User-centered Natural Language Processing. Companion Proceedings of the Web Conference 2022 (2022). http:\/\/dl.acm.org\/citation.cfm?id=3524879"},{"key":"e_1_3_2_1_10_1","volume-title":"Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback. ArXiv abs\/2303.05453","author":"Kirk Hannah\u00a0Rose","year":"2023","unstructured":"Hannah\u00a0Rose Kirk, Bertie Vidgen, Paul R\u00f6ttger, and Scott\u00a0A. Hale. 2023. Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback. ArXiv abs\/2303.05453 (2023). https:\/\/arxiv.org\/pdf\/2303.05453.pdf"},{"key":"e_1_3_2_1_11_1","volume-title":"Large Language Models are Zero-Shot Reasoners. NeurIPS abs\/2205.11916","author":"Kojima Takeshi","year":"2022","unstructured":"Takeshi Kojima, Shixiang\u00a0Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large Language Models are Zero-Shot Reasoners. NeurIPS abs\/2205.11916 (2022). https:\/\/arxiv.org\/pdf\/2205.11916.pdf"},{"volume-title":"Bandit algorithms","author":"Lattimore Tor","key":"e_1_3_2_1_12_1","unstructured":"Tor Lattimore and Csaba Szepesv\u00e1ri. 2020. Bandit algorithms. Cambridge University Press."},{"key":"e_1_3_2_1_13_1","volume-title":"A survey of large language models in finance (finllms). arXiv preprint arXiv:2402.02315","author":"Lee Jean","year":"2024","unstructured":"Jean Lee, Nicholas Stevens, Soyeon\u00a0Caren Han, and Minseok Song. 2024. A survey of large language models in finance (finllms). arXiv preprint arXiv:2402.02315 (2024)."},{"key":"e_1_3_2_1_14_1","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive nlp tasks","volume":"33","author":"Lewis Patrick","year":"2020","unstructured":"Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-tau Yih, Tim Rockt\u00e4schel, 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems 33 (2020), 9459\u20139474.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_15_1","volume-title":"Teach LLMs to Personalize - An Approach inspired by Writing Education. ArXiv abs\/2308.07968","author":"Li Cheng","year":"2023","unstructured":"Cheng Li, Mingyang Zhang, Qiaozhu Mei, Yaqing Wang, Spurthi\u00a0Amba Hombaiah, Yi Liang, and Michael Bendersky. 2023. Teach LLMs to Personalize - An Approach inspired by Writing Education. ArXiv abs\/2308.07968 (2023). https:\/\/arxiv.org\/pdf\/2308.07968.pdf"},{"key":"e_1_3_2_1_16_1","volume-title":"Multi-step jailbreaking privacy attacks on chatgpt. arXiv preprint arXiv:2304.05197","author":"Li Haoran","year":"2023","unstructured":"Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, and Yangqiu Song. 2023. Multi-step jailbreaking privacy attacks on chatgpt. arXiv preprint arXiv:2304.05197 (2023)."},{"key":"e_1_3_2_1_17_1","volume-title":"Multi-step Jailbreaking Privacy Attacks on ChatGPT. EMNLP abs\/2304.05197","author":"Li Haoran","year":"2023","unstructured":"Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, and Yangqiu Song. 2023. Multi-step Jailbreaking Privacy Attacks on ChatGPT. EMNLP abs\/2304.05197 (2023). https:\/\/arxiv.org\/pdf\/2304.05197.pdf"},{"key":"e_1_3_2_1_18_1","volume-title":"Knowledge-Enhanced Personalized Review Generation with Capsule Graph Neural Network. CIKM","author":"Li Junyi","year":"2020","unstructured":"Junyi Li, Siqing Li, Wayne\u00a0Xin Zhao, Gaole He, Zhicheng Wei, Nicholas\u00a0Jing Yuan, and Ji rong Wen. 2020. Knowledge-Enhanced Personalized Review Generation with Capsule Graph Neural Network. CIKM (2020). http:\/\/dl.acm.org\/citation.cfm?id=3411893"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772758"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"P. Li and Alexander Tuzhilin. 2019. Towards Controllable and Personalized Review Generation. In EMNLP. https:\/\/www.aclweb.org\/anthology\/D19-1319.pdf","DOI":"10.18653\/v1\/D19-1319"},{"key":"e_1_3_2_1_21_1","volume-title":"ROUGE: A Package for Automatic Evaluation of Summaries. In ACL. https:\/\/aclanthology.org\/W04-1013.pdf","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In ACL. https:\/\/aclanthology.org\/W04-1013.pdf"},{"key":"e_1_3_2_1_22_1","volume-title":"Use Your INSTINCT: INSTruction optimization usIng Neural bandits Coupled with Transformers. ArXiv abs\/2310.02905","author":"Lin Xiaoqiang","year":"2023","unstructured":"Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, and Bryan Kian\u00a0Hsiang Low. 2023. Use Your INSTINCT: INSTruction optimization usIng Neural bandits Coupled with Transformers. ArXiv abs\/2310.02905 (2023). https:\/\/api.semanticscholar.org\/CorpusID:263620801"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3560815"},{"key":"e_1_3_2_1_24_1","volume-title":"Fingpt: Democratizing internet-scale data for financial large language models. arXiv preprint arXiv:2307.10485","author":"Liu Xiao-Yang","year":"2023","unstructured":"Xiao-Yang Liu, Guoxuan Wang, and Daochen Zha. 2023. Fingpt: Democratizing internet-scale data for financial large language models. arXiv preprint arXiv:2307.10485 (2023)."},{"key":"e_1_3_2_1_25_1","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled Weight Decay Regularization. In ICLR. https:\/\/api.semanticscholar.org\/CorpusID:53592270"},{"key":"e_1_3_2_1_26_1","volume-title":"Direct Preference Optimization: Your Language Model is Secretly a Reward Model. ArXiv abs\/2305.18290","author":"Rafailov Rafael","year":"2023","unstructured":"Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher\u00a0D. Manning, and Chelsea Finn. 2023. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. ArXiv abs\/2305.18290 (2023). https:\/\/api.semanticscholar.org\/CorpusID:258959321"},{"key":"e_1_3_2_1_27_1","volume-title":"Deep bayesian bandits showdown: An empirical comparison of bayesian deep networks for thompson sampling. arXiv preprint arXiv:1802.09127","author":"Riquelme Carlos","year":"2018","unstructured":"Carlos Riquelme, George Tucker, and Jasper Snoek. 2018. Deep bayesian bandits showdown: An empirical comparison of bayesian deep networks for thompson sampling. arXiv preprint arXiv:1802.09127 (2018)."},{"key":"e_1_3_2_1_28_1","volume-title":"Learning to retrieve prompts for in-context learning. arXiv preprint arXiv:2112.08633","author":"Rubin Ohad","year":"2021","unstructured":"Ohad Rubin, Jonathan Herzig, and Jonathan Berant. 2021. Learning to retrieve prompts for in-context learning. arXiv preprint arXiv:2112.08633 (2021)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1561\/9781680834710"},{"key":"e_1_3_2_1_30_1","volume-title":"LaMP: When Large Language Models Meet Personalization. arxiv:2304","author":"Salemi Alireza","year":"2023","unstructured":"Alireza Salemi, Sheshera Mysore, Michael Bendersky, and Hamed Zamani. 2023. LaMP: When Large Language Models Meet Personalization. arxiv:2304.11406\u00a0[cs.CL]"},{"volume-title":"Reinforcement learning: An introduction","author":"Sutton S","key":"e_1_3_2_1_31_1","unstructured":"Richard\u00a0S Sutton and Andrew\u00a0G Barto. 2018. Reinforcement learning: An introduction. MIT press."},{"key":"e_1_3_2_1_32_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan\u00a0N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_33_1","volume-title":"F. Xia, Quoc Le, and Denny Zhou.","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed\u00a0Huai hsin Chi, F. Xia, Quoc Le, and Denny Zhou. 2022. Chain of Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS abs\/2201.11903 (2022). https:\/\/arxiv.org\/pdf\/2201.11903.pdf"},{"key":"e_1_3_2_1_34_1","volume-title":"Denny Zhou","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc\u00a0V Le, Denny Zhou, 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824\u201324837."},{"key":"e_1_3_2_1_35_1","unstructured":"Tengyang Xie John Langford Paul Mineiro and Ida Momennejad. 2021. Interaction-Grounded Learning. In ICML. https:\/\/api.semanticscholar.org\/CorpusID:235376933"},{"key":"e_1_3_2_1_36_1","unstructured":"Weitong Zhang Dongruo Zhou Lihong Li and Quanquan Gu. 2021. Neural Thompson Sampling. In ICLR."},{"key":"e_1_3_2_1_37_1","volume-title":"A Survey of Large Language Models. ArXiv abs\/2303.18223","author":"Zhao Wayne\u00a0Xin","year":"2023","unstructured":"Wayne\u00a0Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Z. Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jianyun Nie, and Ji rong Wen. 2023. A Survey of Large Language Models. ArXiv abs\/2303.18223 (2023). https:\/\/arxiv.org\/pdf\/2303.18223.pdf"},{"key":"e_1_3_2_1_38_1","volume-title":"International conference on machine learning. PMLR, 12697\u201312706","author":"Zhao Zihao","year":"2021","unstructured":"Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. In International conference on machine learning. PMLR, 12697\u201312706."},{"key":"e_1_3_2_1_39_1","unstructured":"Dongruo Zhou Lihong Li and Quanquan Gu. 2019. Neural Contextual Bandits with UCB-based Exploration. In ICML."},{"key":"e_1_3_2_1_40_1","unstructured":"Dongruo Zhou Lihong Li and Quanquan Gu. 2020. Neural contextual bandits with ucb-based exploration. In ICML. PMLR 11492\u201311502."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Jian Zhu and David Jurgens. 2021. Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles. In EMNLP. https:\/\/api.semanticscholar.org\/CorpusID:237431146","DOI":"10.18653\/v1\/2021.emnlp-main.25"}],"event":{"name":"ICAIF '24: 5th ACM International Conference on AI in Finance","acronym":"ICAIF '24","location":"Brooklyn NY USA"},"container-title":["Proceedings of the 5th ACM International Conference on AI in Finance"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3677052.3698651","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3677052.3698651","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T17:15:05Z","timestamp":1755882905000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3677052.3698651"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,14]]},"references-count":41,"alternative-id":["10.1145\/3677052.3698651","10.1145\/3677052"],"URL":"https:\/\/doi.org\/10.1145\/3677052.3698651","relation":{},"subject":[],"published":{"date-parts":[[2024,11,14]]},"assertion":[{"value":"2024-11-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}