{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T03:29:44Z","timestamp":1769743784158,"version":"3.49.0"},"reference-count":115,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","funder":[{"name":"National Natural Science Foundation of China","award":["Grant No. 623B2006, Grant No. 92464301"],"award-info":[{"award-number":["Grant No. 623B2006, Grant No. 92464301"]}]},{"name":"Amazon","award":["Amazon Trust AI Research Award"],"award-info":[{"award-number":["Amazon Trust AI Research Award"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Foundation models (FMs) have become the backbone of intelligent systems. Collaborative development of FMs enables multiple teams to fine-tune different aspects of an FM simultaneously. However, conflicts in model updates across teams, particularly when modifying overlapping parameters, pose significant challenges to maintaining model performance. To address these challenges, in this paper, we propose Medusa, a novel framework designed to support collaborative FM development by managing model branches and introducing a structured system of parameter ownership. Medusa tracks fine-tuning efforts as separate branches, similar to Git, allowing developers to work on different tasks without destabilizing the base model. Instead of passively merging parameters from already fine-tuned models, Medusa proactively controls the merging process through our novel algorithm for assigning ownership of parameters by generating merging-aware masks to guide the fine-tuning process, ensuring that only specific branches can modify designated parameters. Medusa approximates the optimal assignment even as model complexity increases, ensuring scalability in large models. 
To investigate the efficacy of Medusa, we conduct extensive evaluations on five datasets and three models fine-tuned by three popular techniques, and compare our approach against six state-of-the-art approaches for post-training model merging. The evaluation results show that Medusa substantially and generally improves the effectiveness of collaborative model development across different models, fine-tuning techniques, and datasets. Specifically, with automated parameter ownership assignment and masked fine-tuning, Medusa outperforms post-training model-merging approaches, improving post-merge model performance by 3.19 absolute percentage points. Ablation studies further demonstrate the efficacy of the algorithms in Medusa.<\/jats:p>","DOI":"10.1145\/3729385","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"2594-2617","source":"Crossref","is-referenced-by-count":1,"title":["Medusa: A Framework for Collaborative Development of Foundation Models with Automated Parameter Ownership Assignment"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7916-255X","authenticated-orcid":false,"given":"Dezhi","family":"Ran","sequence":"first","affiliation":[{"name":"Key Lab of HCST (PKU), MOE; SCS, Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-6970-1833","authenticated-orcid":false,"given":"Yuan","family":"Cao","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-7533-1478","authenticated-orcid":false,"given":"Yuzhe","family":"Guo","sequence":"additional","affiliation":[{"name":"Beijing Jiaotong University, Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0163-8555","authenticated-orcid":false,"given":"Yuetong","family":"Li","sequence":"additional","affiliation":[{"name":"University of Chicago, Chicago, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-0300-0994","authenticated-orcid":false,"given":"Mengzhou","family":"Wu","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5035-3398","authenticated-orcid":false,"given":"Simin","family":"Chen","sequence":"additional","affiliation":[{"name":"University of Texas at Dallas, Richardson, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5338-7347","authenticated-orcid":false,"given":"Wei","family":"Yang","sequence":"additional","affiliation":[{"name":"University of Texas at Dallas, Richardson, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6731-216X","authenticated-orcid":false,"given":"Tao","family":"Xie","sequence":"additional","affiliation":[{"name":"Key Lab of HCST (PKU), MOE; SCS, Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Ahmed A Al-Saedi Veselka Boeva and Emiliano Casalicchio. 2021. Reducing communication overhead of federated learning through clustering analysis. In ISCC. 1\u20137.","DOI":"10.1109\/ISCC53001.2021.9631391"},{"key":"e_1_2_1_2_1","unstructured":"Dzmitry Bahdanau. 2014. Neural machine translation by jointly learning to align and translate. 
arXiv preprint arXiv:1409.0473."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.796139"},{"key":"e_1_2_1_4_1","unstructured":"Rishi Bommasani Drew A Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S Bernstein Jeannette Bohg Antoine Bosselut and Emma Brunskill. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258."},{"key":"e_1_2_1_5_1","volume-title":"Bagging predictors. Machine learning, 24","author":"Breiman Leo","year":"1996","unstructured":"Leo Breiman. 1996. Bagging predictors. Machine learning, 24 (1996), 123\u2013140."},{"key":"e_1_2_1_6_1","volume-title":"Random forests. Machine learning, 45","author":"Breiman Leo","year":"2001","unstructured":"Leo Breiman. 2001. Random forests. Machine learning, 45 (2001), 5\u201332."},{"key":"e_1_2_1_7_1","unstructured":"Tom B Brown. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165."},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Neil Burgess Jelena Milanovic Nigel Stephens Konstantinos Monachopoulos and David Mansell. 2019. Bfloat16 processing for neural networks. In ARITH. 88\u201391.","DOI":"10.1109\/ARITH.2019.00022"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the ACM on Software Engineering, 1, FSE","author":"Chen Simin","year":"2024","unstructured":"Simin Chen, Xiaoning Feng, Xiaohong Han, Cong Liu, and Wei Yang. 2024. PPM: Automated generation of diverse programming problems for benchmarking code generation models. Proceedings of the ACM on Software Engineering, 1, FSE (2024), 1194\u20131215."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Simin Chen Cong Liu Mirazul Haque Zihe Song and Wei Yang. 2022. Nmtsloth: understanding and testing efficiency degradation of neural machine translation systems. In ESEC\/FSE. 1148\u20131160.","DOI":"10.1145\/3540250.3549102"},{"key":"e_1_2_1_11_1","unstructured":"Simin Chen Pranav Pusarla and Baishakhi Ray. 2025. 
Dynamic benchmarking of reasoning capabilities in code large language models under data contamination. arXiv preprint arXiv:2503.04149."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In SIGKDD. 785\u2013794.","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_2_1_13_1","unstructured":"Weiyu Chen and James Kwok. 2024. You only merge once: Learning the pareto set of preference-aware model merging. arXiv preprint arXiv:2408.12105."},{"key":"e_1_2_1_14_1","volume-title":"Lifelong machine learning","author":"Chen Zhiyuan","unstructured":"Zhiyuan Chen and Bing Liu. 2022. Lifelong machine learning. Springer Nature."},{"key":"e_1_2_1_15_1","unstructured":"Rajas Chitale Ankit Vaidya Aditya Kane and Archana Ghotkar. 2023. Task arithmetic with LoRA for continual learning. arxiv:2311.02428. arxiv:2311.02428"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"KR1442 Chowdhary and KR Chowdhary. 2020. Natural language processing. Fundamentals of artificial intelligence 603\u2013649.","DOI":"10.1007\/978-81-322-3972-7_19"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Catarina Costa and Leonardo Murta. 2013. Version control in distributed software development: A systematic mapping study. In ICGSE. 90\u201399.","DOI":"10.1109\/ICGSE.2013.19"},{"key":"e_1_2_1_18_1","volume-title":"Cl\u00e9mentine Fourrier, Enrique Manjavacas, Stefan Schweter, and Daniel Van Strien.","author":"Toni Francesco De","year":"2022","unstructured":"Francesco De Toni, Christopher Akiki, Javier De La Rosa, Cl\u00e9mentine Fourrier, Enrique Manjavacas, Stefan Schweter, and Daniel Van Strien. 2022. Entities, dates, and languages: Zero-shot on historical texts with t0. arXiv preprint arXiv:2204.05211."},{"key":"e_1_2_1_19_1","volume-title":"Deep learning in natural language processing","author":"Deng Li","unstructured":"Li Deng and Yang Liu. 2018. 
Deep learning in natural language processing. Springer."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45014-9_1"},{"key":"e_1_2_1_21_1","volume-title":"Unified language model pre-training for natural language understanding and generation. NIPS, 32","author":"Dong Li","year":"2019","unstructured":"Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. NIPS, 32 (2019)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-019-8208-z"},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Cynthia Dwork. 2008. Differential privacy: A survey of results. In TAMC. 1\u201319.","DOI":"10.1007\/978-3-540-79228-4_1"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3664812"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.37"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1504"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1013203451"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Takashi Fukuda Masayuki Suzuki Gakuto Kurata Samuel Thomas Jia Cui and Bhuvana Ramabhadran. 2017. Efficient knowledge distillation from an ensemble of teachers.. In Interspeech. 3697\u20133701.","DOI":"10.21437\/Interspeech.2017-614"},{"key":"e_1_2_1_29_1","volume-title":"Principles of software engineering management. 11","author":"Gilb Tom","unstructured":"Tom Gilb and Susannah Finzi. 1988. Principles of software engineering management. 11, Addison-wesley Reading, MA."},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Mandy Guo Joshua Ainslie David Uthus Santiago Ontanon Jianmo Ni Yun-Hsuan Sung and Yinfei Yang. 2021. LongT5: Efficient text-to-text transformer for long sequences. 
arXiv preprint arXiv:2112.07916.","DOI":"10.18653\/v1\/2022.findings-naacl.55"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.58871"},{"key":"e_1_2_1_32_1","volume-title":"The elements of statistical learning: data mining, inference, and prediction. 2","author":"Hastie Trevor","unstructured":"Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. 2009. The elements of statistical learning: data mining, inference, and prediction. 2, Springer."},{"key":"e_1_2_1_33_1","volume-title":"Ping Tak Peter Tang, and Alexander Heinecke","author":"Henry Greg","year":"2019","unstructured":"Greg Henry, Ping Tak Peter Tang, and Alexander Heinecke. 2019. Leveraging the bfloat16 artificial intelligence datatype for higher-precision computations. In ARITH. 69\u201376."},{"key":"e_1_2_1_34_1","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531."},{"key":"e_1_2_1_35_1","first-page":"1","article-title":"Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks","volume":"22","author":"Hoefler Torsten","year":"2021","unstructured":"Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. 2021. Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks. JMLR, 22, 241 (2021), 1\u2013124.","journal-title":"JMLR"},{"key":"e_1_2_1_36_1","unstructured":"Ellis Horowitz and Barry W Boehm. 1975. Practical strategies for developing large software systems. Citeseer."},{"key":"e_1_2_1_37_1","unstructured":"Edward J. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. LoRA: Low-rank adaptation of large language models. arxiv:2106.09685. arxiv:2106.09685"},{"key":"e_1_2_1_38_1","unstructured":"Wenlong Huang Pieter Abbeel Deepak Pathak and Igor Mordatch. 2022. 
Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In ICML. 9118\u20139147."},{"key":"e_1_2_1_39_1","unstructured":"Shuning Huo Yafei Xiang Hanyi Yu Mengran Zhu and Yulu Gong. 2024. Deep learning approaches for improving question answering systems in hepatocellular carcinoma research. arxiv:2402.16038. arxiv:2402.16038"},{"key":"e_1_2_1_40_1","volume-title":"Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi.","author":"Ilharco Gabriel","year":"2023","unstructured":"Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. 2023. Editing models with task arithmetic. arxiv:2212.04089. arxiv:2212.04089"},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Jingjing Jiang and Nanning Zheng. 2023. MixPHM: redundancy-aware parameter-efficient tuning for low-resource visual question answering. In CVPR. 24203\u201324213.","DOI":"10.1109\/CVPR52729.2023.02318"},{"key":"e_1_2_1_42_1","unstructured":"Xisen Jin Xiang Ren Daniel Preotiuc-Pietro and Pengxiang Cheng. 2022. Dataless knowledge fusion by merging weights of language models. arXiv preprint arXiv:2212.09849."},{"key":"e_1_2_1_43_1","unstructured":"Hayeon Jo Hyesong Choi Minhee Cho and Dongbo Min. 2024. iConFormer: Dynamic parameter-efficient tuning with input-conditioned adaptation. arxiv:2409.02838. arxiv:2409.02838"},{"key":"e_1_2_1_44_1","volume-title":"Kallista Bonawitz, Zachary Charles, Graham Cormode, and Rachel Cummings.","author":"Kairouz Peter","year":"2021","unstructured":"Peter Kairouz, H Brendan McMahan, Brendan Avent, Aur\u00e9lien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, and Rachel Cummings. 2021. Advances and open problems in federated learning. 
Foundations and trends\u00ae in machine learning, 14, 1\u20132 (2021), 1\u2013210."},{"key":"e_1_2_1_45_1","volume-title":"Git-theta: A git extension for collaborative development of machine learning models. In ICML. 15708\u201315719.","author":"Kandpal Nikhil","year":"2023","unstructured":"Nikhil Kandpal, Brian Lester, Mohammed Muqeeth, Anisha Mascarenhas, Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, and Colin Raffel. 2023. Git-theta: A git extension for collaborative development of machine learning models. In ICML. 15708\u201315719."},{"key":"e_1_2_1_46_1","volume-title":"Paul Denny, Michelle Craig, and Tovi Grossman.","author":"Kazemitabaar Majeed","year":"2024","unstructured":"Majeed Kazemitabaar, Runlong Ye, Xiaoning Wang, Austin Zachary Henley, Paul Denny, Michelle Craig, and Tovi Grossman. 2024. Codeaid: Evaluating a classroom deployment of an LLM-based programming assistant that balances student and educator needs. In CHI. 1\u201320."},{"key":"e_1_2_1_47_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980."},{"key":"e_1_2_1_48_1","unstructured":"Fanshuang Kong Richong Zhang and Ziqiao Wang. 2024. Activated parameter locating via causal intervention for model merging. arXiv preprint arXiv:2408.09485."},{"key":"e_1_2_1_49_1","volume-title":"Deep learning. Nature, 521, 7553","author":"LeCun Yann","year":"2015","unstructured":"Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature, 521, 7553 (2015), 436\u2013444."},{"key":"e_1_2_1_50_1","volume-title":"BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arxiv:1910.13461. 
arxiv:1910.13461","author":"Lewis Mike","year":"2019","unstructured":"Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arxiv:1910.13461. arxiv:1910.13461"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485730.3485929"},{"key":"e_1_2_1_52_1","unstructured":"Siyuan Li Luyuan Zhang Zedong Wang Di Wu Lirong Wu Zicheng Liu Jun Xia Cheng Tan Yang Liu and Baigui Sun. 2023. Masked modeling for self-supervised representation learning on vision and beyond. arXiv preprint arXiv:2401.00897."},{"key":"e_1_2_1_53_1","unstructured":"Haokun Liu Derek Tam Mohammed Muqeeth Jay Mohta Tenghao Huang Mohit Bansal and Colin Raffel. 2022. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. arxiv:2205.05638. arxiv:2205.05638"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3560815"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Yinhan Liu Jiatao Gu Naman Goyal Xian Li Sergey Edunov Marjan Ghazvininejad Mike Lewis and Luke Zettlemoyer. 2020. Multilingual denoising pre-training for neural machine translation. ACL 726\u2013742.","DOI":"10.1162\/tacl_a_00343"},{"key":"e_1_2_1_56_1","unstructured":"Yucheng Low Rajat Arya Ajit Banerjee Ann Huang Brian Ronan Hoyt Koepke Joseph Godlewski and Zach Nation. 2023. Git Is For Data. In CIDR."},{"key":"e_1_2_1_57_1","unstructured":"Priyanka Mary Mammen. 2021. Federated learning: Opportunities and challenges. arXiv preprint arXiv:2101.05428."},{"key":"e_1_2_1_58_1","first-page":"1","article-title":"Finding similar files in a large file system","volume":"94","author":"Manber Udi","year":"1994","unstructured":"Udi Manber. 1994. Finding similar files in a large file system.. In USENIX Winter. 
94, 1\u201310.","journal-title":"USENIX Winter."},{"key":"e_1_2_1_59_1","unstructured":"Yuzhu Mao Siqi Ping Zihao Zhao Yang Liu and Wenbo Ding. 2024. Enhancing parameter efficiency and generalization in large-scale models: A regularized and masked low-rank adaptation approach. arXiv preprint arXiv:2407.12074."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00041"},{"key":"e_1_2_1_61_1","volume-title":"Caiming Xiong, and Richard Socher.","author":"McCann Bryan","year":"2018","unstructured":"Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. The natural language decathlon: Multitask learning as question answering. arxiv:1806.08730. arxiv:1806.08730"},{"key":"e_1_2_1_62_1","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1147\/sj.194.0414","article-title":"The management of software engineering, Part I: Principles of software engineering","volume":"19","author":"Mills Harlan D","year":"1980","unstructured":"Harlan D Mills. 1980. The management of software engineering, Part I: Principles of software engineering. IBM Systems Journal, 19, 4 (1980), 414\u2013420.","journal-title":"IBM Systems Journal"},{"key":"e_1_2_1_63_1","volume-title":"International Conference on Artificial Intelligence and Statistics. 5876\u20135890","author":"Mohtashami Amirkeivan","year":"2022","unstructured":"Amirkeivan Mohtashami, Martin Jaggi, and Sebastian Stich. 2022. Masked training of neural networks with partial gradients. In International Conference on Artificial Intelligence and Statistics. 5876\u20135890."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3182399"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1136\/amiajnl-2011-000464"},{"key":"e_1_2_1_66_1","unstructured":"OpenAI. 2024. Fine-tuning OpenAI models available through the API.. 
https:\/\/platform.openai.com\/docs\/guides\/fine-tuning"},{"key":"e_1_2_1_67_1","first-page":"66727","article-title":"Task arithmetic in the tangent space: Improved editing of pre-trained models","volume":"36","author":"Ortiz-Jimenez Guillermo","year":"2023","unstructured":"Guillermo Ortiz-Jimenez, Alessandro Favero, and Pascal Frossard. 2023. Task arithmetic in the tangent space: Improved editing of pre-trained models. NIPS, 36 (2023), 66727\u201366754.","journal-title":"NIPS"},{"key":"e_1_2_1_68_1","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, and Alex Ray. 2022. Training language models to follow instructions with human feedback. NIPS, 35 (2022), 27730\u201327744.","journal-title":"NIPS"},{"key":"e_1_2_1_69_1","doi-asserted-by":"crossref","first-page":"2017","DOI":"10.1609\/aaai.v36i2.20097","article-title":"Teach: Task-driven embodied agents that chat","volume":"36","author":"Padmakumar Aishwarya","year":"2022","unstructured":"Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, and Dilek Hakkani-Tur. 2022. Teach: Task-driven embodied agents that chat. In AAAI. 36, 2017\u20132025.","journal-title":"AAAI."},{"key":"e_1_2_1_70_1","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic differentiation in PyTorch."},{"key":"e_1_2_1_71_1","volume-title":"Google Translate: One billion installs, one billion stories. https:\/\/blog.google\/products\/translate\/one-billion-installs\/","author":"Pitman Jeff","year":"2021","unstructured":"Jeff Pitman. 2021. Google Translate: One billion installs, one billion stories. 
https:\/\/blog.google\/products\/translate\/one-billion-installs\/"},{"key":"e_1_2_1_72_1","volume-title":"Liu","author":"Raffel Colin","year":"2023","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2023. Exploring the limits of transfer learning with a unified text-to-text transformer. arxiv:1910.10683. arxiv:1910.10683"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","unstructured":"Dezhi Ran Yuan Cao Yuzhe Guo Yuetong Li Mengzhou Wu Simin Chen Wei Yang and Tao Xie. 2025. Medusa Artifacts. https:\/\/doi.org\/10.5281\/zenodo.15203401 10.5281\/zenodo.15203401","DOI":"10.5281\/zenodo.15203401"},{"key":"e_1_2_1_74_1","doi-asserted-by":"crossref","unstructured":"Dezhi Ran Zongyang Li Chenxu Liu Wenyu Wang Weizhi Meng Xionglin Wu Hui Jin Jing Cui Xing Tang and Tao Xie. 2022. Automated visual testing for mobile apps in an industrial setting. In ICSE-SEIP. 55\u201364.","DOI":"10.1145\/3510457.3513027"},{"key":"e_1_2_1_75_1","volume-title":"Guardian: A Runtime Framework for LLM-Based UI Exploration. In ISSTA. 958\u2013970.","author":"Ran Dezhi","year":"2024","unstructured":"Dezhi Ran, Hao Wang, Zihe Song, Mengzhou Wu, Yuan Cao, Ying Zhang, Wei Yang, and Tao Xie. 2024. Guardian: A Runtime Framework for LLM-Based UI Exploration. In ISSTA. 958\u2013970."},{"key":"e_1_2_1_76_1","volume-title":"BADGE: Prioritizing UI Events with Hierarchical Multi-Armed Bandits for Automated UI Testing. In ICSE. 894\u2013905.","author":"Ran Dezhi","year":"2023","unstructured":"Dezhi Ran, Hao Wang, Wenyu Wang, and Tao Xie. 2023. BADGE: Prioritizing UI Events with Hierarchical Multi-Armed Bandits for Automated UI Testing. In ICSE. 894\u2013905."},{"key":"e_1_2_1_77_1","doi-asserted-by":"crossref","unstructured":"Dezhi Ran Mengzhou Wu Wei Yang and Tao Xie. 2025. Foundation model engineering: engineering foundation models just as engineering software. 
TOSEM.","DOI":"10.1145\/3719005"},{"key":"e_1_2_1_78_1","unstructured":"Dezhi Ran Mengzhou Wu Hao Yu Yuetong Li Jun Ren Yuan Cao Xia Zeng Haochuan Lu Zexin Xu and Mengqian Xu. 2025. Beyond pass or fail: A multi-dimensional benchmark for mobile UI navigation. arXiv preprint arXiv:2501.02863."},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0301738"},{"key":"e_1_2_1_80_1","doi-asserted-by":"crossref","unstructured":"Steven I Ross Fernando Martinez Stephanie Houde Michael Muller and Justin D Weisz. 2023. The programmer\u2019s assistant: Conversational interaction with a large language model for software development. In IUI. 491\u2013514.","DOI":"10.1145\/3581641.3584037"},{"key":"e_1_2_1_81_1","unstructured":"Winston W Royce. 1987. Managing the development of large software systems: concepts and techniques. In ICSE. 328\u2013338."},{"key":"e_1_2_1_82_1","volume-title":"Teven Le Scao, and Arun Raja","author":"Sanh Victor","year":"2021","unstructured":"Victor Sanh, Albert Webson, Colin Raffel, Stephen H Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, and Arun Raja. 2021. Multitask prompted training enables zero-shot task generalization. arXiv preprint arXiv:2110.08207."},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2017.04.004"},{"key":"e_1_2_1_84_1","volume-title":"Manning","author":"Liu Peter J.","year":"2017","unstructured":"Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. arxiv:1704.04368. arxiv:1704.04368"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2005.140"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2012.61"},{"key":"e_1_2_1_87_1","first-page":"24193","article-title":"Training neural networks with fixed sparse masks","volume":"34","author":"Sung Yi-Lin","year":"2021","unstructured":"Yi-Lin Sung, Varun Nair, and Colin A Raffel. 
2021. Training neural networks with fixed sparse masks. NIPS, 34 (2021), 24193\u201324205.","journal-title":"NIPS"},{"key":"e_1_2_1_88_1","volume-title":"Xunzhu Tang, Shing-Chi Cheung, Jacques Klein, and Tegawend\u00e9 F Bissyand\u00e9.","author":"Tian Haoye","year":"2023","unstructured":"Haoye Tian, Weiqi Lu, Tsz On Li, Xunzhu Tang, Shing-Chi Cheung, Jacques Klein, and Tegawend\u00e9 F Bissyand\u00e9. 2023. Is ChatGPT the ultimate programming assistant\u2013how far is it? arXiv preprint arXiv:2304.11938."},{"key":"e_1_2_1_89_1","volume-title":"Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, and Faisal Azhar. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971."},{"key":"e_1_2_1_90_1","unstructured":"Barak Turovsky. 2016. Ten years of Google Translate. https:\/\/www.blog.google\/products\/translate\/ten-years-of-google-translate\/"},{"key":"e_1_2_1_91_1","volume-title":"Hans Van Vliet, and JC Van Vliet","author":"Vliet Hans Van","year":"2008","unstructured":"Hans Van Vliet, Hans Van Vliet, and JC Van Vliet. 2008. Software engineering: principles and practice. 13, John Wiley & Sons Hoboken, NJ."},{"key":"e_1_2_1_92_1","volume-title":"\u0141 ukasz Kaiser, and Illia Polosukhin","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. NIPS, 30 (2017)."},{"key":"e_1_2_1_93_1","volume-title":"Bowman","author":"Wang Alex","year":"2020","unstructured":"Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2020. 
SuperGLUE: A stickier benchmark for general-purpose language understanding systems. arxiv:1905.00537. arxiv:1905.00537"},{"key":"e_1_2_1_94_1","volume-title":"Bowman","author":"Wang Alex","year":"2019","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arxiv:1804.07461. arxiv:1804.07461"},{"key":"e_1_2_1_95_1","volume-title":"Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.","author":"Wang Guanzhi","year":"2023","unstructured":"Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291."},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSAC.2019.2904348"},{"key":"e_1_2_1_97_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2020.2988575"},{"key":"e_1_2_1_98_1","doi-asserted-by":"crossref","unstructured":"Alexander Wettig Tianyu Gao Zexuan Zhong and Danqi Chen. 2022. Should you mask 15% in masked language modeling? arXiv preprint arXiv:2202.08005.","DOI":"10.18653\/v1\/2023.eacl-main.217"},{"key":"e_1_2_1_99_1","volume-title":"Transformers: State-of-the-art natural language processing. In EMNLP: system demonstrations. 38\u201345.","author":"Wolf Thomas","year":"2020","unstructured":"Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, R\u00e9mi Louf, and Morgan Funtowicz. 2020. Transformers: State-of-the-art natural language processing. In EMNLP: system demonstrations. 
38\u201345."},{"key":"e_1_2_1_100_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(05)80023-1"},{"key":"e_1_2_1_101_1","volume-title":"Wiley Encyclopedia of Computer Science and Engineering, 1\u201310.","author":"Wolsey Laurence A","unstructured":"Laurence A Wolsey. 2007. Mixed integer programming. Wiley Encyclopedia of Computer Science and Engineering, 1\u201310."},{"key":"e_1_2_1_102_1","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1109\/T-C.1974.224002","article-title":"The cost of developing large-scale software","volume":"100","author":"Wolverton Ray W","year":"1974","unstructured":"Ray W Wolverton. 1974. The cost of developing large-scale software. IEEE Trans. Comput., 100, 6 (1974), 615\u2013636.","journal-title":"IEEE Trans. Comput."},{"key":"e_1_2_1_103_1","volume-title":"Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, and Simon Kornblith.","author":"Wortsman Mitchell","year":"2022","unstructured":"Mitchell Wortsman, Gabriel Ilharco, Samir Ya Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, and Simon Kornblith. 2022. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In ICML. 23965\u201323998."},{"key":"e_1_2_1_104_1","doi-asserted-by":"crossref","unstructured":"L Xue. 2020. mT5: A massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934.","DOI":"10.18653\/v1\/2021.naacl-main.41"},{"key":"e_1_2_1_105_1","volume-title":"Ties-merging: Resolving interference when merging models. NIPS, 36","author":"Yadav Prateek","year":"2024","unstructured":"Prateek Yadav, Derek Tam, Leshem Choshen, Colin A Raffel, and Mohit Bansal. 2024. Ties-merging: Resolving interference when merging models. 
NIPS, 36 (2024)."},{"key":"e_1_2_1_106_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-022-10283-5"},{"key":"e_1_2_1_107_1","unstructured":"Peng Ye Yongqi Huang Chongjun Tu Minglei Li Tao Chen Tong He and Wanli Ouyang. 2023. Partial fine-tuning: A successor to full fine-tuning for vision transformers. arxiv:2312.15681."},{"key":"e_1_2_1_108_1","doi-asserted-by":"crossref","unstructured":"Hao Yu Bo Shen Dezhi Ran Jiaxin Zhang Qi Zhang Yuchi Ma Guangtai Liang Ying Li Qianxiang Wang and Tao Xie. 2024. CoderEval: A benchmark of pragmatic code generation with generative pre-trained models. In ICSE. 1\u201312.","DOI":"10.1145\/3597503.3623316"},{"key":"e_1_2_1_109_1","unstructured":"Le Yu Bowen Yu Haiyang Yu Fei Huang and Yongbin Li. 2024. Language models are super mario: Absorbing abilities from homologous models as a free lunch. In ICML."},{"key":"e_1_2_1_110_1","first-page":"39","article-title":"Accelerating the machine learning lifecycle with MLflow","volume":"41","author":"Zaharia Matei","year":"2018","unstructured":"Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, and Mani Parkhe. 2018. Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull., 41, 4 (2018), 39\u201345.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_111_1","unstructured":"Jure Zbontar Li Jing Ishan Misra Yann LeCun and St\u00e9phane Deny. 2021. Barlow twins: Self-supervised learning via redundancy reduction. In ICML. 12310\u201312320."},{"key":"e_1_2_1_112_1","unstructured":"Yuchen Zeng and Kangwook Lee. 2024. The expressive power of low-rank adaptation. arxiv:2310.17513. 
"},{"key":"e_1_2_1_113_1","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1111\/rssb.12317","article-title":"MALMEM: model averaging in linear measurement error models","volume":"81","author":"Zhang Xinyu","year":"2019","unstructured":"Xinyu Zhang, Yanyuan Ma, and Raymond J Carroll. 2019. MALMEM: model averaging in linear measurement error models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 81, 4 (2019), 763\u2013779.","journal-title":"Journal of the Royal Statistical Society Series B: Statistical Methodology"},{"key":"e_1_2_1_114_1","first-page":"8110","article-title":"BoostTree and BoostForest for ensemble learning","volume":"45","author":"Zhao Changming","year":"2022","unstructured":"Changming Zhao, Dongrui Wu, Jian Huang, Ye Yuan, Hai-Tao Zhang, Ruimin Peng, and Zhenhua Shi. 2022. BoostTree and BoostForest for ensemble learning. TPAMI, 45, 7 (2022), 8110\u20138126.","journal-title":"TPAMI"},{"key":"e_1_2_1_115_1","volume-title":"Adapi: Facilitating dnn model adaptivity for efficient private inference in edge computing. arXiv preprint arXiv:2407.05633.","author":"Zhou Tong","year":"2024","unstructured":"Tong Zhou, Jiahui Zhao, Yukui Luo, Xi Xie, Wujie Wen, Caiwen Ding, and Xiaolin Xu. 2024. Adapi: Facilitating dnn model adaptivity for efficient private inference in edge computing. 
arXiv preprint arXiv:2407.05633."}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729385","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:16:04Z","timestamp":1750346164000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729385"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":115,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3729385"],"URL":"https:\/\/doi.org\/10.1145\/3729385","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}