{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T23:39:40Z","timestamp":1777419580206,"version":"3.51.4"},"reference-count":75,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"name":"Dieter Schwarz Foundation and the Technical University of Munich\u2014Institute for Advanced Study, Germany"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput.-Hum. Interact."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>In automated UI design generation, a key challenge is the lack of support for iterative processes, as most systems focus solely on end-to-end output. This stems from limited capabilities in interpreting design intent and a lack of transparency for refining intermediate results. To better understand these challenges, we conducted a formative study that identified concrete and actionable requirements for supporting iterative design with Generative Tools. Guided by these findings, we propose PrototypeFlow, a human-centered system for automated UI generation that leverages multi-modal inputs and models. PrototypeFlow takes natural language descriptions and layout preferences as input to generate the high-fidelity UI design. At its core is a theme design module that clarifies implicit design intent through prompt enhancement and orchestrates sub-modules for component-level generation. Designers retain full control over inputs, intermediate results, and final prototypes, enabling flexible and targeted refinement by steering generation and directly editing outputs. Our experiments and user studies confirmed the effectiveness and usefulness of our proposed PrototypeFlow.<\/jats:p>","DOI":"10.1145\/3773035","type":"journal-article","created":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T16:09:52Z","timestamp":1761667792000},"page":"1-45","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Towards Human\u2013AI Synergy in UI Design: Supporting Iterative Generation with LLMs"],"prefix":"10.1145","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-5797-0945","authenticated-orcid":false,"given":"Mingyue","family":"Yuan","sequence":"first","affiliation":[{"name":"University of New South Wales, Sydney, Australia, and CSIRO\u2019s Data61, Eveleigh, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2700-7478","authenticated-orcid":false,"given":"Jieshan","family":"Chen","sequence":"additional","affiliation":[{"name":"CSIRO\u2019s Data61, Eveleigh, Australia &amp; TUM-IAS, M\u00fcnchen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1315-8969","authenticated-orcid":false,"given":"Yongquan","family":"Hu","sequence":"additional","affiliation":[{"name":"University of New South Wales, Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7740-0377","authenticated-orcid":false,"given":"Sidong","family":"Feng","sequence":"additional","affiliation":[{"name":"Monash University, Melbourne, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0481-2167","authenticated-orcid":false,"given":"Mulong","family":"Xie","sequence":"additional","affiliation":[{"name":"CSIRO\u2019s Data61, Eveleigh, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8087-2241","authenticated-orcid":false,"given":"Gelareh","family":"Mohammadi","sequence":"additional","affiliation":[{"name":"University of New South Wales, Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7663-1421","authenticated-orcid":false,"given":"Zhenchang","family":"Xing","sequence":"additional","affiliation":[{"name":"CSIRO\u2019s Data61, Eveleigh, Australia and Australian National University, Canberra, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5274-6889","authenticated-orcid":false,"given":"Aaron","family":"Quigley","sequence":"additional","affiliation":[{"name":"CSIRO\u2019s Data61, Eveleigh, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,2,25]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"UX Pilot AI. 2024. UX Pilot AI: UI Design Wireframes Generation Sitemaps Templates and AI Tools. Retrieved from https:\/\/www.figma.com\/community\/plugin\/1257688030051249633\/ux-pilot-ai-ui-design-wireframes-generation-sitemaps-templates-ai-tools\/"},{"key":"e_1_3_1_3_2","unstructured":"Adobe. 2024. Adobe XD Platform. Retrieved from https:\/\/adobexdplatform.com\/"},{"key":"e_1_3_1_4_2","unstructured":"Stability AI. 2022. Stable-diffusion-v1-5. Retrieved from https:\/\/huggingface.co\/stable-diffusion-v1-5\/stable-diffusion-v1-5"},{"key":"e_1_3_1_5_2","unstructured":"Stability AI. 2023. Stable-diffusion-2-1. Retrieved from https:\/\/huggingface.co\/stabilityai\/stable-diffusion-2-1"},{"key":"e_1_3_1_6_2","unstructured":"Alibaba. 2024. Iconfont. Retrieved from https:\/\/www.iconfont.cn\/"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3220134.3220135"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-Companion.2019.00041"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-86334-0_36"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445762"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33012564"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3359282"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3391613"},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","unstructured":"Jieshan Chen Mulong Xie Zhenchang Xing Chunyang Chen Xiwei Xu Liming Zhu and Guoqiang Li. 2020. Object detection for graphical user interface: Old fashioned or deep learning or a combination? arXiv:2008.05132. Retrieved from https:\/\/arxiv.org\/abs\/2008.05132","DOI":"10.1145\/3368089.3409691"},{"key":"e_1_3_1_15_2","unstructured":"Jian Chen Ruiyi Zhang Yufan Zhou Rajiv Jain Zhiqiang Xu Ryan Rossi and Changyou Chen. 2024. Towards aligned layout generation via diffusion model with aesthetic constraints. arXiv:2402.04754. Retrieved from https:\/\/arxiv.org\/abs\/2402.04754"},{"key":"e_1_3_1_16_2","unstructured":"Chin-Yi Cheng Forrest Huang Gang Li and Yang Li. 2023. PLay: Parametrically conditioned layout generation using latent diffusion. arXiv:2301.11529. Retrieved from https:\/\/arxiv.org\/abs\/2301.11529"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1037\/10096-006"},{"key":"e_1_3_1_18_2","first-page":"222","article-title":"Thematic analysis","volume":"3","author":"Clarke Victoria","year":"2015","unstructured":"Victoria Clarke, Virginia Braun, and Nikki Hayfield. 2015. Thematic analysis. Qualitative Psychology: A Practical Guide to Research Methods 3 (2015), 222\u2013248.","journal-title":"Qualitative Psychology: A Practical Guide to Research Methods"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3580969"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3126594.3126651"},{"key":"e_1_3_1_21_2","unstructured":"Dribbble. 2024. Dribbble\u2014Discover the World\u2019s Top Designers and Creative Professionals. Retrieved from https:\/\/dribbble.com\/"},{"key":"e_1_3_1_22_2","doi-asserted-by":"crossref","unstructured":"Peitong Duan Jeremy Warner Yang Li and Bjoern Hartmann. 2024. Generating automatic feedback on UI mockups with large language models. arXiv:2403.13139. Retrieved from https:\/\/arxiv.org\/abs\/2403.13139","DOI":"10.1145\/3613904.3642782"},{"key":"e_1_3_1_23_2","unstructured":"Sidong Feng Mingyue Yuan Jieshan Chen Zhenchang Xing and Chunyang Chen. 2023. Designing with language: Wireframing UI design intent with generative large language models. arXiv:2312.07755. Retrieved from https:\/\/arxiv.org\/abs\/2312.07755"},{"key":"e_1_3_1_24_2","unstructured":"Figma. 2024. Figma: The Collaborative Interface Design Tool. Retrieved from https:\/\/www.figma.com\/"},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","first-page":"5207","DOI":"10.52202\/068431-0376","article-title":"Clipdraw: Exploring text-to-drawing synthesis through language-image encoders","volume":"35","author":"Frans Kevin","year":"2022","unstructured":"Kevin Frans, Lisa Soros, and Olaf Witkowski. 2022. Clipdraw: Exploring text-to-drawing synthesis through language-image encoders. In Advances in Neural Information Processing Systems, Vol. 35 (2022), 5207\u20135218.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613905.3650786"},{"key":"e_1_3_1_27_2","unstructured":"Songwei Ge Vedanuj Goswami C. Lawrence Zitnick and Devi Parikh. 2020. Creative sketch generation. arXiv:2011.10039. Retrieved from https:\/\/arxiv.org\/abs\/2011.10039"},{"key":"e_1_3_1_28_2","unstructured":"Google. 2024. Material Icons. Retrieved from https:\/\/github.com\/google\/material-design-icons"},{"key":"e_1_3_1_29_2","volume-title":"Significant Gravitas\/Auto-GPT","author":"GitHub","year":"2023","unstructured":"GitHub. 2023. Significant Gravitas\/Auto-GPT. GitHub."},{"key":"e_1_3_1_30_2","first-page":"56","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Guerreiro Julian Jorge Andrade","year":"2024","unstructured":"Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui, Mayu Otani, and Hideki Nakayama. 2024. Layoutflow: Flow matching for layout generation. In Proceedings of the European Conference on Computer Vision. Springer, 56\u201372."},{"key":"e_1_3_1_31_2","first-page":"25","article-title":"GANs trained by a two time-scale update rule converge to a local Nash equilibrium","volume":"30","author":"Heusel Martin","year":"2017","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, Vol. 30 (2017), 25\u201334.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300334"},{"key":"e_1_3_1_33_2","unstructured":"Forrest Huang Gang Li Xin Zhou John F. Canny and Yang Li. 2021. Creating user interface mock-ups from high-level text descriptions with deep-learning models. arXiv:2110.07775. Retrieved from https:\/\/arxiv.org\/abs\/2110.07775"},{"key":"e_1_3_1_34_2","unstructured":"Inception. 2015. Inception-v3. Retrieved from http:\/\/download.tensorflow.org\/models\/image\/imagenet\/inception-2015-12-05.tgz"},{"key":"e_1_3_1_35_2","unstructured":"Amir Hossein Kargaran Nafiseh Nikeghbal Abbas Heydarnoori and Hinrich Sch\u00fctze. 2023. MenuCraft: Interactive Menu System Design with Large Language Models. arXiv:2303.04496. Retrieved from https:\/\/arxiv.org\/abs\/2303.04496"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3713500"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3501931"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3415289"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10515-023-00377-x"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10515-023-00377-x"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3406324.3410710"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642114"},{"key":"e_1_3_1_43_2","first-page":"19730","volume-title":"International Conference on Machine Learning","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International Conference on Machine Learning. PMLR, 19730\u201319742."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025483"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3379337.3415820"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-95579-7_6"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445049"},{"key":"e_1_3_1_48_2","unstructured":"Yang Li Gang Li Luheng He Jingjie Zheng Hong Li and Zhiwei Guan. 2020. Widget captioning: Generating natural language description for mobile user interface elements. arXiv:2010.04295. Retrieved from https:\/\/arxiv.org\/abs\/2010.04295"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/634067.634190"},{"key":"e_1_3_1_50_2","first-page":"43447","article-title":"Chameleon: Plug-and-play compositional reasoning with large language models","volume":"36","author":"Lu Pan","year":"2024","unstructured":"Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, and Jianfeng Gao. 2024. Chameleon: Plug-and-play compositional reasoning with large language models. In Advances in Neural Information Processing Systems, Vol. 36 (2024), 43447\u201343478.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491101.3519809"},{"key":"e_1_3_1_52_2","unstructured":"Midjourney. 2024. Midjourney. Retrieved from https:\/\/www.midjourney.com"},{"key":"e_1_3_1_53_2","unstructured":"OpenAI. 2024. Embeddings\u2014OpenAI API. Retrieved from https:\/\/platform.openai.com\/docs\/guides\/embeddings"},{"key":"e_1_3_1_54_2","unstructured":"OpenAI. 2024. GPT4\u2014OpenAI API. Retrieved from https:\/\/platform.openai.com\/docs\/models\/gpt-4"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/302979.303163"},{"key":"e_1_3_1_56_2","first-page":"14866","article-title":"Generating diverse high-fidelity images with VQ-VAE-2","volume":"32","author":"Razavi Ali","year":"2019","unstructured":"Ali Razavi, Aaron Van den Oord, and Oriol Vinyals. 2019. Generating diverse high-fidelity images with VQ-VAE-2. Advances in Neural Information Processing Systems 32, Article 1331 (2019), 14866\u201314876.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/2470654.2481281"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_1_59_2","first-page":"25278","article-title":"LAION-5B: An open large-scale dataset for training next generation image-text models","volume":"35","author":"Schuhmann Christoph","year":"2022","unstructured":"Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. 2022. LAION-5B: An open large-scale dataset for training next generation image-text models. In Advances in Neural Information Processing Systems, Vol. 35 (2022), 25278\u201325294.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_60_2","doi-asserted-by":"crossref","unstructured":"Yongliang Shen Kaitao Song Xu Tan Dongsheng Li Weiming Lu and Yueting Zhuang. 2023. HuggingGPT: Solving AI tasks with chatGPT and its friends in hugging face. arXiv:2303.17580. Retrieved from https:\/\/arxiv.org\/abs\/2303.17580","DOI":"10.52202\/075280-1657"},{"key":"e_1_3_1_61_2","unstructured":"Sketch. 2024. Sketch.IO\u2014The Make of Sketchpad. Retrieved from https:\/\/sketch.io\/"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1161"},{"key":"e_1_3_1_63_2","unstructured":"Mirac Suzgun and Adam Tauman Kalai. 2024. Meta-prompting: Enhancing language models with task-agnostic scaffolding. arXiv:2401.12954. Retrieved from https:\/\/arxiv.org\/abs\/2401.12954"},{"key":"e_1_3_1_64_2","unstructured":"Uizard. 2024. UI Design Made Easy Powered By AI\u2014Uizard. Retrieved from https:\/\/uizard.io\/"},{"key":"e_1_3_1_65_2","unstructured":"Vercel. 2024. V0 Development Platform. Retrieved from https:\/\/v0.dev\/"},{"key":"e_1_3_1_66_2","unstructured":"Zijun Wan Jiawei Tang Linghang Cai Xin Tong and Can Liu. 2024. Breaking the Midas Spell: Understanding progressive novice-AI collaboration in spatial design. arXiv:2410.20124. Retrieved from https:\/\/arxiv.org\/abs\/2410.20124"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3580895"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472749.3474765"},{"key":"e_1_3_1_69_2","unstructured":"Guanzhi Wang Yuqi Xie Yunfan Jiang Ajay Mandlekar Chaowei Xiao Yuke Zhu Linxi Fan and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models. arXiv:2305.16291. Retrieved from https:\/\/arxiv.org\/abs\/2305.16291"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581402"},{"key":"e_1_3_1_71_2","unstructured":"Jason Wu Eldon Schoop Alan Leung Titus Barik Jeffrey P. Bigham and Jeffrey Nichols. 2024. Uicoder: Finetuning large language models to generate user interface code through automated feedback. arXiv:2406.07739. Retrieved from https:\/\/arxiv.org\/abs\/2406.07739"},{"key":"e_1_3_1_72_2","unstructured":"Shunyu Yao Jeffrey Zhao Dian Yu Nan Du Izhak Shafran Karthik Narasimhan and Yuan Cao. 2022. ReAct: Synergizing reasoning and acting in language models. arXiv:2210.03629. Retrieved from https:\/\/arxiv.org\/abs\/2210.03629"},{"key":"e_1_3_1_73_2","unstructured":"Christoph Zauner. 2010. Implementation and Benchmarking of Perceptual Image Hash Functions. Thesis Fachhochschul-Masterstudiengang."},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2007.08.003"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_1_76_2","unstructured":"Tianming Zhao Chunyang Chen Yuanning Liu and Xiaodong Zhu. 2021. GUIGAN: Learning to generate GUI designs using generative adversarial networks. arXiv:2101.09978. Retrieved from https:\/\/arxiv.org\/abs\/2101.09978"}],"container-title":["ACM Transactions on Computer-Human Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3773035","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T13:51:02Z","timestamp":1773323462000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3773035"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,25]]},"references-count":75,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3773035"],"URL":"https:\/\/doi.org\/10.1145\/3773035","relation":{},"ISSN":["1073-0516","1557-7325"],"issn-type":[{"value":"1073-0516","type":"print"},{"value":"1557-7325","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,25]]},"assertion":[{"value":"2024-09-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-14","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}