{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T18:57:55Z","timestamp":1771613875612,"version":"3.50.1"},"reference-count":93,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:p>Business Intelligence (BI) plays a critical role in empowering modern enterprises to make informed data-driven decisions, and has grown into a billion-dollar business. Self-service BI tools like Power BI and Tableau have democratized the \"dashboarding\" phase of BI, by offering user-friendly, drag-and-drop interfaces that are tailored to non-technical enterprise users. However, despite these advances, we observe that the \"data preparation\" phase of BI continues to be a key pain point for BI users today.<\/jats:p>\n          <jats:p>In this work, we systematically study around 2K real BI projects harvested from public sources, focusing on the data-preparation phase of the BI workflows. We observe that users often have to program both (1) data transformation steps and (2) table joins steps, before their raw data can be ready for dashboarding and analysis. A careful study of the BI workflows reveals that transformation and join steps are often intertwined in the same BI project, such that considering both holistically is crucial to accurately predict these steps. Leveraging this observation, we develop an Auto-Prep system to holistically predict transformations and joins, using a principled graph-based algorithm inspired by Steiner-tree, with provable quality guarantees. Extensive evaluations using real BI projects suggest that Auto-Prep can correctly predict over 70% transformation and join steps, significantly more accurate than existing algorithms as well as language-models such as GPT-4.<\/jats:p>","DOI":"10.14778\/3734839.3734856","type":"journal-article","created":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:01:06Z","timestamp":1756483266000},"page":"2212-2225","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence"],"prefix":"10.14778","volume":"18","author":[{"given":"Eugenie Y.","family":"Lai","sequence":"first","affiliation":[{"name":"MIT"}]},{"given":"Yeye","family":"He","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]},{"given":"Surajit","family":"Chaudhuri","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]}],"member":"320","published-online":{"date-parts":[[2025,8,29]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. Auto-Prep: full technical report. https:\/\/arxiv.org\/abs\/2504.11627."},{"key":"e_1_2_1_2_1","unstructured":"[n.d.]. Calibration of probabilities in sklearn. https:\/\/scikit-learn.org\/stable\/modules\/calibration.html."},{"key":"e_1_2_1_3_1","unstructured":"[n.d.]. Create and manage join relationships in Power BI. https:\/\/learn.microsoft.com\/en-us\/power-bi\/transform-model\/desktop-create-and-manage-relationships."},{"key":"e_1_2_1_4_1","unstructured":"[n.d.]. Gartner: the Future of Self-Service Is Customer-Led Automation. https:\/\/www.gartner.com\/en\/newsroom\/press-releases\/2019-05-28-gartner-says-the-future-of-self-service-is-customer-l. Accessed: 2024-05-11."},{"key":"e_1_2_1_5_1","unstructured":"[n.d.]. IDC: U.S. Business Intelligence and Analytics Platforms."},{"key":"e_1_2_1_6_1","unstructured":"[n.d.]. Model relationships in Power BI. https:\/\/learn.microsoft.com\/en-us\/power-bi\/transform-model\/desktop-relationships-understand."},{"key":"e_1_2_1_7_1","unstructured":"[n.d.]. OpenAI Fine-tuning. https:\/\/platform.openai.com\/docs\/guides\/fine-tuning. Accessed: 2025-01."},{"key":"e_1_2_1_8_1","unstructured":"[n.d.]. Pivot in Power BI. https:\/\/learn.microsoft.com\/en-us\/power-query\/pivot-columns."},{"key":"e_1_2_1_9_1","unstructured":"[n.d.]. Pivot in Python Pandas. https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.pivot.html."},{"key":"e_1_2_1_10_1","unstructured":"[n.d.]. Pivot in R. https:\/\/tidyr.tidyverse.org\/articles\/pivot.html."},{"key":"e_1_2_1_11_1","unstructured":"[n.d.]. Pivot in Tableau. https:\/\/help.tableau.com\/current\/pro\/desktop\/en-us\/pivot.htm."},{"key":"e_1_2_1_12_1","unstructured":"[n.d.]. Power BI forum question: Help with transform data. https:\/\/community.fabric.microsoft.com\/t5\/Power-Query\/Help-with-transforming-data\/m-p\/3699752."},{"key":"e_1_2_1_13_1","unstructured":"[n.d.]. Power BI forum question: How to Transforming data as indicated. https:\/\/community.fabric.microsoft.com\/t5\/Power-Query\/Transforming-data-as-indicated\/m-p\/3794251."},{"key":"e_1_2_1_14_1","unstructured":"[n.d.]. Power BI forum question: Join after transform. https:\/\/community.fabric.microsoft.com\/t5\/Power-Query\/Join-all-sheets-in-one-excel-after-transforming-them\/m-p\/3155786."},{"key":"e_1_2_1_15_1","unstructured":"[n.d.]. Power BI forum question: over 10000 questions for a search on \"Transform data\". https:\/\/community.fabric.microsoft.com\/t5\/forums\/searchpage\/tab\/message?filter=location&q=transformdata."},{"key":"e_1_2_1_16_1","unstructured":"[n.d.]. Power BI forum question: Tables Join after transform. https:\/\/community.fabric.microsoft.com\/t5\/Desktop\/Tables-Join\/m-p\/631121."},{"key":"e_1_2_1_17_1","unstructured":"[n.d.]. Python concatenate. https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.concat.html."},{"key":"e_1_2_1_18_1","unstructured":"[n.d.]. Python split. https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.Series.str.split.html."},{"key":"e_1_2_1_19_1","unstructured":"[n.d.]. Python substring. https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.Series.str.slice.html."},{"key":"e_1_2_1_20_1","unstructured":"[n.d.]. Relationships: Asking questions across multiple related tables. https:\/\/www.tableau.com\/blog\/relationships-asking-questions-across-multiple-related-tables."},{"key":"e_1_2_1_21_1","unstructured":"[n.d.]. Tableau forum question: How to pivot or transform. https:\/\/community.tableau.com\/s\/question\/0D58b0000CFihDyCQJ\/how-to-pivot-or-transform-in-tableau."},{"key":"e_1_2_1_22_1","unstructured":"[n.d.]. Tableau forum question: How to transform this table. https:\/\/community.tableau.com\/s\/question\/0D54T00000C6BsmSAF\/how-to-transform-data."},{"key":"e_1_2_1_23_1","unstructured":"[n.d.]. Tableau Forum question: over 3000 questions for a search on \"Transform\". https:\/\/community.tableau.com\/s\/global-search\/@uri#q=transform&t=All."},{"key":"e_1_2_1_24_1","unstructured":"[n.d.]. Tableau forum question: Transform table and join. https:\/\/community.tableau.com\/s\/question\/0D54T00000C5tE4SAJ\/transform-table-and-join."},{"key":"e_1_2_1_25_1","unstructured":"[n.d.]. Tableau forum question: Transform table and join. https:\/\/community.tableau.com\/s\/question\/0D58b0000ADNKzZCQX\/im-trying-to-find-a-way-to-define-a-relationship-between-two-excel-sheets-where-the-common-field-may-extend-across-multiple-fields."},{"key":"e_1_2_1_26_1","unstructured":"[n.d.]. Transpose in Python Pandas. https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.transpose.html."},{"key":"e_1_2_1_27_1","unstructured":"[n.d.]. Trifacta: Standardize Using Patterns. (Retrieved in 07\/2023). https:\/\/docs.trifacta.com\/display\/DP\/Standardize+Using+Patterns."},{"key":"e_1_2_1_28_1","unstructured":"[n.d.]. Unpivot in Power BI. https:\/\/support.microsoft.com\/en-us\/office\/unpivot-columns-power-query-0f7bad4b-9ea1-49c1-9d95-f588221c7098."},{"key":"e_1_2_1_29_1","unstructured":"[n.d.]. Unpivot in Python Pandas (melt). https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.melt.html."},{"key":"e_1_2_1_30_1","unstructured":"[n.d.]. Unpivot in R (melt). https:\/\/rdatatable.gitlab.io\/data.table\/reference\/melt.data.table.html."},{"key":"e_1_2_1_31_1","unstructured":"[n.d.]. Unpivot in Tableau. https:\/\/community.tableau.com\/s\/question\/0D54T00000C6ndgSAB\/how-to-unpivot-data."},{"key":"e_1_2_1_32_1","unstructured":"[n.d.]. Use Relationships for Multi-table Data Analysis. https:\/\/help.tableau.com\/current\/server\/en-us\/datasource_multitable_normalized.htm."},{"key":"e_1_2_1_33_1","unstructured":"Retrieved in 2023-01. Power BI. https:\/\/powerbi.microsoft.com\/en-us\/."},{"key":"e_1_2_1_34_1","unstructured":"Retrieved in 2023-01. Tableau. https:\/\/www.tableau.com\/."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2813885.2737952"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3360594"},{"key":"e_1_2_1_37_1","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel M. Ziegler Jeffrey Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL]"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1978542.1978562"},{"key":"e_1_2_1_39_1","first-page":"128","article-title":"Rigel: Transforming tabular data by declarative mapping","volume":"29","author":"Chen Ran","year":"2022","unstructured":"Ran Chen, Di Weng, Yanwei Huang, Xinhuan Shu, Jiayi Zhou, Guodao Sun, and Yingcai Wu. 2022. Rigel: Transforming tabular data by declarative mapping. IEEE Transactions on Visualization and Computer Graphics 29, 1 (2022), 128\u2013138.","journal-title":"IEEE Transactions on Visualization and Computer Graphics"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733014"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639279"},{"key":"e_1_2_1_43_1","volume-title":"International conference on machine learning. PMLR, 990\u2013998","author":"Devlin Jacob","year":"2017","unstructured":"Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. 2017. Robustfill: Neural program learning under noisy i\/o. In International conference on machine learning. PMLR, 990\u2013998."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2240236.2240260"},{"key":"e_1_2_1_45_1","volume-title":"Data Mining and Reverse Engineering: Searching for semantics. IFIP TC2 WG2. 6 IFIP Seventh Conference on Database Semantics (DS-7) 7\u201310","author":"Han Jiawei","year":"1997","unstructured":"Jiawei Han. 1998. OLAP mining: An integration of OLAP with data mining. In Data Mining and Reverse Engineering: Searching for semantics. IFIP TC2 WG2. 6 IFIP Seventh Conference on Database Semantics (DS-7) 7\u201310 October 1997, Leysin, Switzerland. Springer, 3\u201320."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3231751.3231766"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824036"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the 2018 International Conference on Management of Data. 1785\u20131788","author":"He Yeye","year":"2018","unstructured":"Yeye He, Kris Ganjam, Kukjin Lee, Yue Wang, Vivek Narasayya, Surajit Chaudhuri, Xu Chu, and Yudian Zheng. 2018. Transform-data-by-example (tde) extensible data transformation in excel. In Proceedings of the 2018 International Conference on Management of Data. 1785\u20131788."},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 2019 International Conference on Management of Data. 829\u2013846","author":"Heidari Alireza","year":"2019","unstructured":"Alireza Heidari, Joshua McGrath, Ihab F Ilyas, and Theodoros Rekatsinas. 2019. Holodetect: Few-shot learning for error detection. In Proceedings of the 2019 International Conference on Management of Data. 829\u2013846."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 2018 International Conference on Management of Data. 1377\u20131392","author":"Huang Zhipeng","year":"2018","unstructured":"Zhipeng Huang and Yeye He. 2018. Auto-detect: Data-driven error detection in tables. In Proceedings of the 2018 International Conference on Management of Data. 1377\u20131392."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10844-019-00562-z"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064034"},{"key":"e_1_2_1_53_1","volume-title":"CLX: Towards verifiable PBE data transformation. arXiv preprint arXiv:1803.00701","author":"Jin Zhongjun","year":"2018","unstructured":"Zhongjun Jin, Michael Cafarella, HV Jagadish, Sean Kandel, Michael Minar, and Joseph M Hellerstein. 2018. CLX: Towards verifiable PBE data transformation. arXiv preprint arXiv:1803.00701 (2018)."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407831"},{"key":"e_1_2_1_55_1","volume-title":"Reducibility among combinatorial problems","author":"Karp Richard M","unstructured":"Richard M Karp. 2010. Reducibility among combinatorial problems. Springer."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/358024.358054"},{"key":"e_1_2_1_57_1","volume-title":"The data warehouse toolkit: the complete guide to dimensional modeling","author":"Kimball Ralph","unstructured":"Ralph Kimball and Margy Ross. 2011. The data warehouse toolkit: the complete guide to dimensional modeling. John Wiley & Sons."},{"key":"e_1_2_1_58_1","volume-title":"A fast algorithm for Steiner trees. Acta informatica 15","author":"Kou Lawrence","year":"1981","unstructured":"Lawrence Kou, George Markowsky, and Leonard Berman. 1981. A fast algorithm for Steiner trees. Acta informatica 15 (1981), 141\u2013145."},{"key":"e_1_2_1_59_1","unstructured":"Doris Jung-Lin Lee Dixin Tang Kunal Agarwal Thyne Boonmark Caitlyn Chen Jake Kang Ujjaini Mukhopadhyay Jerry Song Micah Yong Marti A Hearst et al. 2021. Lux: always-on visualization recommendations for exploratory dataframe workflows. arXiv preprint arXiv:2105.00121 (2021)."},{"key":"e_1_2_1_60_1","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive nlp tasks","volume":"33","author":"Lewis Patrick","year":"2020","unstructured":"Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-tau Yih, Tim Rockt\u00e4schel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems 33 (2020), 9459\u20139474.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452824"},{"key":"e_1_2_1_62_1","unstructured":"Peng Li Yeye He Cong Yan Yue Wang and Surajit Chaudhuri. 2023. Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples. arXiv:2307.14565 [cs.DB]"},{"key":"e_1_2_1_63_1","volume-title":"Dongmei Zhang, and Surajit Chaudhuri.","author":"Li Peng","year":"2023","unstructured":"Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, and Surajit Chaudhuri. 2023. Table-gpt: Table-tuned gpt for diverse table tasks. arXiv preprint arXiv:2310.09263 (2023)."},{"key":"e_1_2_1_64_1","unstructured":"Yiming Lin Yeye He and Surajit Chaudhuri. 2023. Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph. arXiv:2306.12515 [cs.DB]"},{"key":"e_1_2_1_65_1","volume-title":"Show me: Automatic presentation for visual analysis","author":"Mackinlay Jock","year":"2007","unstructured":"Jock Mackinlay, Pat Hanrahan, and Chris Stolte. 2007. Show me: Automatic presentation for visual analysis. IEEE transactions on visualization and computer graphics 13, 6 (2007), 1137\u20131144."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407801"},{"key":"e_1_2_1_67_1","volume-title":"Proceedings of the 2019 International Conference on Management of Data. 865\u2013882","author":"Mahdavi Mohammad","year":"2019","unstructured":"Mohammad Mahdavi, Ziawasch Abedjan, Raul Castro Fernandez, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, and Nan Tang. 2019. Raha: A configuration-free error detection system. In Proceedings of the 2019 International Conference on Management of Data. 865\u2013882."},{"key":"e_1_2_1_68_1","volume-title":"Can foundation models wrangle your data? arXiv preprint arXiv:2205.09911","author":"Narayan Avanika","year":"2022","unstructured":"Avanika Narayan, Ines Chami, Laurel Orr, Simran Arora, and Christopher R\u00e9. 2022. Can foundation models wrangle your data? arXiv preprint arXiv:2205.09911 (2022)."},{"key":"e_1_2_1_69_1","volume-title":"Business intelligence. Handbook on decision support systems 2","author":"Negash Solomon","year":"2008","unstructured":"Solomon Negash and Paul Gray. 2008. Business intelligence. Handbook on decision support systems 2 (2008), 175\u2013193."},{"key":"e_1_2_1_70_1","volume-title":"TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations. arXiv preprint arXiv:2411.17110","author":"Nobari Arash Dargahi","year":"2024","unstructured":"Arash Dargahi Nobari and Davood Rafiei. 2024. TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations. arXiv preprint arXiv:2411.17110 (2024)."},{"key":"e_1_2_1_71_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]"},{"key":"e_1_2_1_72_1","doi-asserted-by":"crossref","unstructured":"John Platt et al. 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers 10 3 (1999) 61\u201374.","DOI":"10.7551\/mitpress\/1113.003.0008"},{"key":"e_1_2_1_73_1","unstructured":"Alexandra Rostin Oliver Albrecht Jana Bauckmann Felix Naumann and Ulf Leser. 2009. A machine learning approach to foreign key discovery.. In WebDB."},{"key":"e_1_2_1_74_1","volume-title":"Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. 185\u2013196","author":"Shoshani Arie","year":"1997","unstructured":"Arie Shoshani. 1997. OLAP and statistical databases: Similarities and differences. In Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. 185\u2013196."},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447873"},{"key":"e_1_2_1_76_1","volume-title":"21st International Conference on Data Engineering (ICDE'05)","author":"Simitsis Alkis","year":"2005","unstructured":"Alkis Simitsis, Panos Vassiliadis, and Timos Sellis. 2005. Optimizing ETL processes in data warehouses. In 21st International Conference on Data Engineering (ICDE'05). Ieee, 564\u2013575."},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.14778\/2977797.2977807"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2019.09.223"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559902"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.4018\/jdwm.2009070101"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/344816.344869"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/3062341.3062365"},{"key":"e_1_2_1_83_1","volume-title":"Proceedings of the 2019 International Conference on Management of Data. 811\u2013828","author":"Wang Pei","year":"2019","unstructured":"Pei Wang and Yeye He. 2019. Uni-detect: A unified approach to automated error detection in tables. In Proceedings of the 2019 International Conference on Management of Data. 811\u2013828."},{"key":"e_1_2_1_84_1","volume-title":"Denny Zhou, et al.","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824\u201324837."},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/221270.221319"},{"key":"e_1_2_1_86_1","volume-title":"Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks. In International Conference on Management of Data (SIGMOD). ACM, 1539\u20131554","author":"Yan Cong","year":"2020","unstructured":"Cong Yan and Yeye He. 2020. Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks. In International Conference on Management of Data (SIGMOD). ACM, 1539\u20131554. https:\/\/www.microsoft.com\/en-us\/research\/publication\/auto-suggest-learning-to-recommend-data-preparation-steps-using-data-science-notebooks\/"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380568"},{"key":"e_1_2_1_88_1","volume-title":"Auto-pipeline: synthesizing complex data pipelines by-target using reinforcement learning and search. arXiv preprint arXiv:2106.13861","author":"Yang Junwen","year":"2021","unstructured":"Junwen Yang, Yeye He, and Surajit Chaudhuri. 2021. Auto-pipeline: synthesizing complex data pipelines by-target using reinforcement learning and search. arXiv preprint arXiv:2106.13861 (2021)."},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920944"},{"key":"e_1_2_1_90_1","volume-title":"2013 28th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 224\u2013234","author":"Zhang Sai","year":"2013","unstructured":"Sai Zhang and Yuyin Sun. 2013. Automatically synthesizing sql queries from input-output examples. In 2013 28th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 224\u2013234."},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300065"},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115409"},{"key":"e_1_2_1_93_1","volume-title":"Proceedings of the 1st international conference on very large data bases. 1\u201324","author":"Zloof Mosh\u00e9 M","year":"1975","unstructured":"Mosh\u00e9 M Zloof. 1975. Query-by-example: the invocation and definition of tables and forms. In Proceedings of the 1st international conference on very large data bases. 1\u201324."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3734839.3734856","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:02:31Z","timestamp":1756483351000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3734839.3734856"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3]]},"references-count":93,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["10.14778\/3734839.3734856"],"URL":"https:\/\/doi.org\/10.14778\/3734839.3734856","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,3]]},"assertion":[{"value":"2025-08-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}