{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T17:00:25Z","timestamp":1783098025945,"version":"3.54.6"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","license":[{"start":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T00:00:00Z","timestamp":1720742400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2024,7,12]]},"abstract":"<jats:p>Pythonic idioms are highly valued and widely used in the Python programming community. However, many Python users find it challenging to use Pythonic idioms. Adopting rule-based approach or LLM-only approach is not sufficient to overcome three persistent challenges of code idiomatization including code miss, wrong detection and wrong refactoring. Motivated by the determinism of rules and adaptability of LLMs, we propose a hybrid approach consisting of three modules. We not only write prompts to instruct LLMs to complete tasks, but we also invoke Analytic Rule Interfaces (ARIs) to accomplish tasks. The ARIs are Python code generated by prompting LLMs to generate code. We first construct a knowledge module with three elements including ASTscenario, ASTcomponent and Condition, and prompt LLMs to generate Python code for incorporation into an ARI library for subsequent use. After that, for any syntax-error-free Python code, we invoke ARIs from the ARI library to extract ASTcomponent from the ASTscenario, and then filter out ASTcomponent that does not meet the condition. Finally, we design prompts to instruct LLMs to abstract and idiomatize code, and then invoke ARIs from the ARI library to rewrite non-idiomatic code into the idiomatic code. 
Next, we conduct a comprehensive evaluation of our approach, RIdiom, and Prompt-LLM on nine established Pythonic idioms in RIdiom. Our approach exhibits superior accuracy, F1-score, and recall, while maintaining precision levels comparable to RIdiom, all of which consistently exceed or come close to 90% for each metric of each idiom. Lastly, we extend our evaluation to encompass four new Pythonic idioms. Our approach consistently outperforms Prompt-LLM, achieving metrics with values consistently exceeding 90% for accuracy, F1-score, precision, and recall.<\/jats:p>","DOI":"10.1145\/3643776","type":"journal-article","created":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T10:22:09Z","timestamp":1720779729000},"page":"1107-1128","source":"Crossref","is-referenced-by-count":15,"title":["Refactoring to Pythonic Idioms: A Hybrid Knowledge-Driven Approach Leveraging Large Language Models"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-8877-4762","authenticated-orcid":false,"given":"Zejun","family":"Zhang","sequence":"first","affiliation":[{"name":"Australian National University, Canberra, Australia"},{"name":"CSIRO's Data61, Canberra, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7663-1421","authenticated-orcid":false,"given":"Zhenchang","family":"Xing","sequence":"additional","affiliation":[{"name":"CSIRO's Data61, Canberra, Australia"},{"name":"Australian National University, Canberra, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5526-1617","authenticated-orcid":false,"given":"Xiaoxue","family":"Ren","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9466-1672","authenticated-orcid":false,"given":"Qinghua","family":"Lu","sequence":"additional","affiliation":[{"name":"CSIRO's 
Data61, Sydney, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2273-1862","authenticated-orcid":false,"given":"Xiwei","family":"Xu","sequence":"additional","affiliation":[{"name":"CSIRO's Data61, Sydney, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,7,12]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"2022. Programming Idioms. https:\/\/programming-idioms.org\/"},{"key":"e_1_3_1_3_2","unstructured":"2023. GPT. https:\/\/platform.openai.com\/docs\/guides\/gpt"},{"key":"e_1_3_1_4_2","unstructured":"2023. Introducing ChatGPT. https:\/\/chat.openai.com\/"},{"key":"e_1_3_1_5_2","unstructured":"2023. OpenAI Codex. https:\/\/openai.com\/blog\/openai-codex"},{"key":"e_1_3_1_6_2","unstructured":"2023. Replication Package. https:\/\/github.com\/idiomaticrefactoring\/IdiomatizationLLM"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Carol V Alexandru Jos\u00e9 J Merchante Sebastiano Panichella Sebastian Proksch Harald C Gall and Gregorio Robles. 2018. On the usage of pythonic idioms. In Proceedings of the 2018 ACM SIGPLAN International Symposium on New Ideas New Paradigms and Reflections on Programming and Software. 1\u201311.","DOI":"10.1145\/3276954.3276960"},{"key":"e_1_3_1_8_2","unstructured":"D. Bader. 2017. Python Tricks: A Buffet of Awesome Python Features. BookBaby. https:\/\/books.google.co.in\/books?id=C0VKDwAAQBAJ"},{"key":"e_1_3_1_9_2","unstructured":"D. Beazley and B.K. Jones. 2013. Python Cookbook: 3rd Edition. O\u2019Reilly Media Incorporated. https:\/\/books.google.com.au\/books?id=oBKwkgEACAAJ"},{"key":"e_1_3_1_10_2","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel M. 
Ziegler Jeffrey Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL]"},{"key":"e_1_3_1_11_2","unstructured":"Bei Chen Fengji Zhang Anh Nguyen Daoguang Zan Zeqi Lin Jian-Guang Lou and Weizhu Chen. 2022. CodeT: Code Generation with Generated Tests. arXiv:2207.10397 [cs.CL]"},{"key":"e_1_3_1_12_2","unstructured":"Quantified Code. 2014. The Little Book of Python Anti-Patterns. https:\/\/github.com\/quantifiedcode\/python-anti-patterns"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","unstructured":"Antonia Creswell Murray Shanahan and Irina Higgins. 2022. Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning. https:\/\/doi.org\/10.48550\/arXiv.2205.09712","DOI":"10.48550\/arXiv.2205.09712"},{"key":"e_1_3_1_14_2","unstructured":"Python developers. 2000. Python Enhancement Proposals. https:\/\/peps.python.org\/pep-0000\/"},{"key":"e_1_3_1_15_2","doi-asserted-by":"crossref","unstructured":"Malinda Dilhara Danny Dig and Ameya Ketkar. 2023. PYEVOLVE: Automating Frequent Code Changes in Python ML Systems. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE 995\u20131007.","DOI":"10.1109\/ICSE48619.2023.00091"},{"key":"e_1_3_1_16_2","unstructured":"Yihong Dong Xue Jiang Zhi Jin and Ge Li. 2023. Self-collaboration Code Generation via ChatGPT. arXiv:2304.07590 [cs.SE]"},{"key":"e_1_3_1_17_2","doi-asserted-by":"crossref","unstructured":"Aamir Farooq and Vadim Zaytsev. 2021. There is More than One Way to Zen Your Python. In Proceedings of the 14th ACM SIGPLAN International Conference on Software Language Engineering. 68\u201382.","DOI":"10.1145\/3486608.3486909"},{"key":"e_1_3_1_18_2","doi-asserted-by":"crossref","unstructured":"Sidong Feng and Chunyang Chen. 2023. 
Prompting Is All You Need: Automated Android Bug Replay with Large Language Models. arXiv:2306.01987 [cs.SE]","DOI":"10.1145\/3597503.3608137"},{"key":"e_1_3_1_19_2","unstructured":"Daniel Fried Armen Aghajanyan Jessy Lin Sida Wang Eric Wallace Freda Shi Ruiqi Zhong Wen-tau Yih Luke Zettlemoyer and Mike Lewis. 2023. InCoder: A Generative Model for Code Infilling and Synthesis. arXiv:2204.05999 [cs.SE]"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549098"},{"key":"e_1_3_1_21_2","unstructured":"Raymond Hettinger. 2013. Transforming code into beautiful idiomatic Python. https:\/\/www.youtube.com\/watch?v=OSGv2VnC0go"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556912"},{"key":"e_1_3_1_23_2","unstructured":"Qing Huang Jiahui Zhu Zhenchang Xing Huan Jin Changjing Wang and Xiwei Xu. 2023. A Chain of AI-based Solutions for Resolving FQNs and Fixing Syntax Errors in Partial Code. arXiv:2306.11981 [cs.SE]"},{"key":"e_1_3_1_24_2","unstructured":"Qing Huang Zhou Zou Zhenchang Xing Zhenkang Zuo Xiwei Xu and Qinghua Lu. 2023. AI Chain on Large Language Model for Unsupervised Control Flow Graph Generation for Statically-Typed Partial Code. arXiv:2306.00757 [cs.SE]"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510203"},{"key":"e_1_3_1_26_2","volume-title":"Writing Idiomatic Python 3.3","author":"Knupp Jeff","year":"2013","unstructured":"Jeff Knupp. 2013. Writing Idiomatic Python 3.3. Jeff Knupp."},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Pattara Leelaprute Bodin Chinthanet Supatsara Wattanakriengkrai Raula Gaikovina Kula Pongchai Jaisri and Takashi Ishio. 2022. Does coding in pythonic zen peak performance? preliminary experiments of nine pythonic idioms at scale. In Proceedings of the 30th IEEE\/ACM International Conference on Program Comprehension. 
575\u2013579.","DOI":"10.1145\/3524610.3527879"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","unstructured":"Caroline Lemieux Jeevana Priya Inala Shuvendu K. Lahiri and Siddhartha Sen. 2023. CodaMosa: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 919\u2013931. https:\/\/doi.org\/10.1109\/ICSE48619.2023.00085","DOI":"10.1109\/ICSE48619.2023.00085"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.abq1158"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00119"},{"key":"e_1_3_1_31_2","unstructured":"Jos\u00e9 J. Merchante. 2017. From Python to Pythonic: Searching for Python idioms in GitHub. https:\/\/api.semanticscholar.org\/CorpusID:211530803"},{"key":"e_1_3_1_32_2","unstructured":"Jos\u00e9 Javier Merchante and Gregorio Robles. 2017. From Python to Pythonic: Searching for Python idioms in GitHub. In Proceedings of the Seminar Series on Advanced Techniques and Tools for Software Evolution. 1\u20133."},{"key":"e_1_3_1_33_2","unstructured":"Erik Nijkamp Bo Pang Hiroaki Hayashi Lifu Tu Haiquan Wang Yingbo Zhou Silvio Savarese and Caiming Xiong. 2022. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In International Conference on Learning Representations. https:\/\/api.semanticscholar.org\/CorpusID:252668917"},{"key":"e_1_3_1_34_2","unstructured":"OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]"},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","unstructured":"Yun Peng Chaozheng Wang Wenxuan Wang Cuiyun Gao and Michael R. Lyu. 2023. Generative Type Inference for Python. 
arXiv:2307.09163 [cs.SE]","DOI":"10.1109\/ASE56229.2023.00031"},{"key":"e_1_3_1_36_2","doi-asserted-by":"crossref","unstructured":"Purit Phan-udom Naruedon Wattanakul Tattiya Sakulniwat Chaiyong Ragkhitwetsagul Thanwadee Sunetnanta Morakot Choetkiertikul and Raula Gaikovina Kula. 2020. Teddy: Automatic Recommendation of Pythonic Idiom Usage For Pull-Based Software Projects. In 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE 806\u2013809.","DOI":"10.1109\/ICSME46990.2020.00098"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00143"},{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","unstructured":"Tattiya Sakulniwat Raula Gaikovina Kula Chaiyong Ragkhitwetsagul Morakot Choetkiertikul Thanwadee Sunetnanta Dong Wang Takashi Ishio and Kenichi Matsumoto. 2019. Visualizing the usage of pythonic idioms over time: A case study of the with open idiom. In 2019 10th International Workshop on Empirical Software Engineering in Practice (IWESEP). IEEE 43\u2013435.","DOI":"10.1109\/IWESEP49350.2019.00016"},{"key":"e_1_3_1_39_2","doi-asserted-by":"crossref","unstructured":"Danilo Silva and Marco Tulio Valente. 2017. RefDiff: Detecting refactorings in version histories. In 2017 IEEE\/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE 269\u2013279.","DOI":"10.1109\/MSR.2017.14"},{"key":"e_1_3_1_40_2","volume-title":"Effective Python : 90 specific ways to write better Python \/ Brett Slatkin","author":"Slatkin Brett","year":"2020","unstructured":"Brett Slatkin. 2020. Effective Python : 90 specific ways to write better Python \/ Brett Slatkin. (second edition. ed.). Addison-Wesley, Place of publication not identified."},{"key":"e_1_3_1_41_2","volume-title":"Programming in Python 3: A Complete Introduction to the Python Language","author":"Summerfield Mark","year":"2009","unstructured":"Mark Summerfield. 2009. 
Programming in Python 3: A Complete Introduction to the Python Language (2nd ed.). Addison-Wesley Professional.","edition":"2"},{"key":"e_1_3_1_42_2","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1007\/978-3-031-37963-5_47","volume-title":"Intelligent Computing","author":"Szalontai Bal\u00e1zs","year":"2023","unstructured":"Bal\u00e1zs Szalontai, \u00c1kos Kukucska, Andr\u00e1s Vad\u00e1sz, Bal\u00e1zs Pint\u00e9r, and Tibor Gregorics. 2023. Localizing and Idiomatizing Nonidiomatic Python Code with Deep Learning. In Intelligent Computing, Kohei Arai (Ed.). Springer Nature Switzerland, Cham, 683\u2013702."},{"key":"e_1_3_1_43_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288 [cs.CL]"},{"key":"e_1_3_1_44_2","doi-asserted-by":"crossref","unstructured":"Nikolaos Tsantalis Matin Mansouri Laleh M Eshkevari Davood Mazinanian and Danny Dig. 2018. Accurate and efficient refactoring detection in commit history. 
In Proceedings of the 40th international conference on software engineering. 483\u2013494.","DOI":"10.1145\/3180155.3180206"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","unstructured":"Priyan Vaithilingam Tianyi Zhang and Elena L. Glassman. 2022. Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In CHI \u201922: CHI Conference on Human Factors in Computing Systems New Orleans LA USA 29 April 2022 - 5 May 2022 Extended Abstracts Simone D. J. Barbosa Cliff Lampe Caroline Appert and David A. Shamma (Eds.). ACM 332:1\u2013332:7. https:\/\/doi.org\/10.1145\/3491101.3519665","DOI":"10.1145\/3491101.3519665"},{"issue":"5","key":"e_1_3_1_46_2","first-page":"360","article-title":"Understanding interobserver agreement: the kappa statistic","volume":"37","author":"Viera Anthony J","year":"2005","unstructured":"Anthony J Viera, Joanne M Garrett, et al. 2005. Understanding interobserver agreement: the kappa statistic. Fam med 37, 5 (2005), 360\u2013363.","journal-title":"Fam med"},{"key":"e_1_3_1_47_2","unstructured":"Jason Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter Fei Xia Ed Chi Quoc Le and Denny Zhou. 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903 [cs.CL]"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3517582"},{"key":"e_1_3_1_49_2","unstructured":"Tongshuang Wu Michael Terry and Carrie J. Cai. 2022. AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts. arXiv:2110.01691 [cs.HC]"},{"key":"e_1_3_1_50_2","doi-asserted-by":"crossref","unstructured":"Kevin Yang Nanyun Peng Yuandong Tian and Dan Klein. 2022. Re3: Generating Longer Stories With Recursive Reprompting and Revision. In Conference on Empirical Methods in Natural Language Processing. 
https:\/\/api.semanticscholar.org\/CorpusID:252873593","DOI":"10.18653\/v1\/2022.emnlp-main.296"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549143"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00130"},{"key":"e_1_3_1_53_2","doi-asserted-by":"crossref","unstructured":"Zejun Zhang Zhenchang Xing Xiwei Xu and Liming Zhu. 2023. RIdiom: Automatically Refactoring Non-Idiomatic Python Code with Pythonic Idioms. In 2023 IEEE\/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE 102\u2013106.","DOI":"10.1109\/ICSE-Companion58688.2023.00034"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639101"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643776","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643776","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T08:02:18Z","timestamp":1770192138000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643776"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,12]]},"references-count":53,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2024,7,12]]}},"alternative-id":["10.1145\/3643776"],"URL":"https:\/\/doi.org\/10.1145\/3643776","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,12]]}}}