{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T14:11:59Z","timestamp":1774534319017,"version":"3.50.1"},"reference-count":105,"publisher":"Association for Computing Machinery (ACM)","issue":"CSCW2","license":[{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"vor","delay-in-days":5,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1761812"],"award-info":[{"award-number":["1761812"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2021,10,13]]},"abstract":"<jats:p>While the open-source software development model has led to successful large-scale collaborations in building software systems, data science projects are frequently developed by individuals or small teams. We describe challenges to scaling data science collaborations and present a conceptual framework and ML programming model to address them. We instantiate these ideas in Ballet, the first lightweight framework for collaborative, open-source data science through a focus on feature engineering, and an accompanying cloud-based development environment. Using our framework, collaborators incrementally propose feature definitions to a repository which are each subjected to software and ML performance validation and can be automatically merged into an executable feature engineering pipeline. 
We leverage Ballet to conduct a case study analysis of an income prediction problem with 27 collaborators, and discuss implications for future designers of collaborative projects.<\/jats:p>","DOI":"10.1145\/3479575","type":"journal-article","created":{"date-parts":[[2021,10,19]],"date-time":"2021-10-19T02:46:19Z","timestamp":1634611579000},"page":"1-39","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Enabling Collaborative Data Science Development with the Ballet Framework"],"prefix":"10.1145","volume":"5","author":[{"given":"Micah J.","family":"Smith","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA, USA"}]},{"given":"J\u00fcrgen","family":"Cito","sequence":"additional","affiliation":[{"name":"TU Wien, Vienna, Austria"}]},{"given":"Kelvin","family":"Lu","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA, USA"}]},{"given":"Kalyan","family":"Veeramachaneni","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,10,18]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"The Modernization of Statistical Disclosure Limitation at the U","author":"Abowd John M.","unstructured":"John M. Abowd, Gary L. Benedetto, Simson L. Garfinkel, Scot A. Dahl, Aref N. Dajani, Matthew Graham, Michael B. Hawes, Vishesh Karwa, Daniel Kifer, Hang Kim, Philip Leclerc, Ashwin Machanavajjhala, Jerome P. Reiter, Rolando Rodriguez, Ian M. Schmutte, William N. Sexton, Phyllis E. Singer, and Lars Vilhuber. 2020. The Modernization of Statistical Disclosure Limitation at the U.S. Census Bureau. Working Paper. U.S. Census Bureau."},{"key":"e_1_2_2_2_1","unstructured":"American Community Survey Office. 2019. American Community Survey 2018 ACS 1-Year PUMS Files ReadMe. 
https:\/\/www2.census.gov\/programs-surveys\/acs\/tech_docs\/pums\/ACS2018_PUMS_README.pdf. Accessed 2021-08-21."},{"key":"e_1_2_2_3_1","volume-title":"Brainwash: A Data System for Feature Engineering. In 6th Biennial Conference on Innovative Data Systems Research. 1--4.","author":"Anderson Michael","year":"2013","unstructured":"Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher R\u00e9, and Ce Zhang. 2013. Brainwash: A Data System for Feature Engineering. In 6th Biennial Conference on Innovative Data Systems Research. 1--4."},{"key":"e_1_2_2_4_1","unstructured":"Peter Bailis. 2020. Humans Not Machines Are the Main Bottleneck in Modern Analytics. https:\/\/sisudata.com\/blog\/humans-not-machines-are-the-bottleneck-in-modern-analytics."},{"key":"e_1_2_2_5_1","unstructured":"Adam Baldwin. 2018. Details about the event-stream incident - The npm Blog. https:\/\/blog.npmjs.org\/post\/180565383195\/details-about-the-event-stream-incident. Accessed 2018-11-30."},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/2594008.2594009"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_2_2_8_1","volume-title":"Proceedings of KDD Cup and Workshop","author":"Bennett James","year":"2007","unstructured":"James Bennett and Stan Lanning. 2007. The Netflix Prize. In Proceedings of KDD Cup and Workshop 2007. 1--4."},{"key":"e_1_2_2_9_1","volume-title":"Organization in open source communities: At the crossroads of the gift and market economies","author":"Berdou Evangelia","unstructured":"Evangelia Berdou. 2010. Organization in open source communities: At the crossroads of the gift and market economies. Routledge."},{"key":"e_1_2_2_10_1","volume-title":"Theoretical Coding: Text Analysis in. A companion to qualitative research 1","author":"B\u00f6hm Andreas","year":"2004","unstructured":"Andreas B\u00f6hm. 2004. 
Theoretical Coding: Text Analysis in. A companion to qualitative research 1 (2004)."},{"key":"e_1_2_2_11_1","unstructured":"Tolga Bolukbasi Kai-Wei Chang James Y. Zou Venkatesh Saligrama and A. Kalai. 2016. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In NIPS."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1083-6101.2007.00343.x"},{"key":"e_1_2_2_13_1","volume-title":"Proceedings of the 2nd SysML Conference. 1--14","author":"Breck Eric","year":"2019","unstructured":"Eric Breck, Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, and Martin Zinkevich. 2019. Data Validation for Machine Learning. In Proceedings of the 2nd SysML Conference. 1--14."},{"key":"e_1_2_2_14_1","unstructured":"Frederick P. Brooks Jr. 1995. The mythical man-month: essays on software engineering. Pearson Education."},{"key":"e_1_2_2_15_1","volume-title":"ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 108--122","author":"Buitinck Lars","year":"2013","unstructured":"Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Ga\u00ebl Varoquaux. 2013. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 108--122."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409700"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376729"},{"key":"e_1_2_2_18_1","volume-title":"Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices. In 33rd Conference on Neural Information Processing Systems. 1--11","author":"Chen Vincent","year":"2019","unstructured":"Vincent Chen, Sen Wu, Alexander J. Ratner, Jen Weng, and Christopher R\u00e9. 2019. 
Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices. In 33rd Conference on Neural Information Processing Systems. 1--11."},{"key":"e_1_2_2_19_1","volume-title":"Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing - CSCW '15","author":"Cheng Justin","year":"2015","unstructured":"Justin Cheng and Michael S. Bernstein. 2015. Flock: Hybrid Crowd-Machine Learning Classifiers. Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing - CSCW '15 (2015), 600--611."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2998181.2998265"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.03.067"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/APSEC.2005.22"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3359219"},{"key":"e_1_2_2_24_1","article-title":"Ames, Iowa: Alternative to the Boston housing data as an end of semester regression project","volume":"19","author":"Cock Dean De","year":"2011","unstructured":"Dean De Cock. 2011. Ames, Iowa: Alternative to the Boston housing data as an end of semester regression project. Journal of Statistics Education 19, 3 (2011).","journal-title":"Journal of Statistics Education"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2993412.3003382"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2347736.2347755"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/1791834.1791836"},{"key":"e_1_2_2_28_1","unstructured":"Epidemic Prediction Initiative [n.d.]. Dengue Forecasting Project. https:\/\/web.archive.org\/web\/20190916180225\/https:\/\/predict.phiresearchlab.org\/post\/5a4fcc3e2c1b1669c22aa261. Accessed 2018-04-30."},{"key":"e_1_2_2_29_1","volume-title":"AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. 
arXiv:2003.06505 [cs, stat] (March","author":"Erickson Nick","year":"2020","unstructured":"Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. 2020. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv:2003.06505 [cs, stat] (March 2020). arXiv:2003.06505 [cs, stat]"},{"key":"e_1_2_2_30_1","volume-title":"Fabrik: An Online Collaborative Neural Network Editor","author":"Garg Utsav","year":"2018","unstructured":"Utsav Garg, Viraj Prabhu, Deshraj Yadav, Ram Ramrakhya, Harsh Agrawal, and Dhruv Batra. 2018. Fabrik: An Online Collaborative Neural Network Editor. arXiv e-prints, Article arXiv:1810.11649 (2018). arXiv:1810.11649"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3320269.3384745"},{"key":"e_1_2_2_32_1","unstructured":"GNU [n.d.]. The GNU Operating System. https:\/\/www.gnu.org."},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2568225.2568260"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884826"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2015.55"},{"key":"e_1_2_2_36_1","volume-title":"2014 NIPS Workshop on Software Engineering for Machine Learning. 1--8.","author":"Roger","unstructured":"Roger B. Grosse and David K. Duvenaud. 2014. Testing MCMC code. In 2014 NIPS Workshop on Software Engineering for Machine Learning. 1--8."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944968"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/s0166-4115(08)62386-9"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2010.05.008"},{"key":"e_1_2_2_40_1","volume-title":"Meet Michelangelo: Uber's Machine Learning Platform. https:\/\/eng.uber.com\/michelangelo-machine-learning-platform\/. Accessed 2019-07-01","author":"Hermann Jeremy","year":"2017","unstructured":"Jeremy Hermann and Mike Del Balso. 2017. 
Meet Michelangelo: Uber's Machine Learning Platform. https:\/\/eng.uber.com\/michelangelo-machine-learning-platform\/. Accessed 2019-07-01."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3134688"},{"key":"e_1_2_2_42_1","unstructured":"Jez Humble and David Farley. 2010. Continuous delivery: reliable software releases through build test and deployment automation. Pearson Education."},{"key":"e_1_2_2_43_1","volume-title":"Automated Sanity Checking for ML Data Sets. Workshop on ML Systems at NIPS 2017","author":"Hynes Nick","year":"2017","unstructured":"Nick Hynes, D Sculley, and Michael Terry. 2017. The Data Linter: Lightweight, Automated Sanity Checking for ML Data Sets. Workshop on ML Systems at NIPS 2017 (2017)."},{"key":"e_1_2_2_44_1","unstructured":"Insight Lane 2019. Crash Model. https:\/\/github.com\/insight-lane\/crash-model."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.infoecopol.2006.07.001"},{"key":"e_1_2_2_46_1","volume-title":"Model Assertions for Monitoring and Improving ML Models. arXiv:2003.01668 [cs] (March","author":"Kang Daniel","year":"2020","unstructured":"Daniel Kang, Deepti Raghavan, Peter Bailis, and Matei Zaharia. 2020. Model Assertions for Monitoring and Improving ML Models. arXiv:2003.01668 [cs] (March 2020). 
arXiv:2003.01668 [cs]"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSAA.2015.7344858"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403290"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2016.0123"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173748"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2016.0190"},{"key":"e_1_2_2_52_1","volume-title":"Positioning and Power in Academic Publishing: Players, Agents and Agendas","author":"Kluyver Thomas","unstructured":"Thomas Kluyver, Benjamin Ragan-Kelley, Fernando P\u00e9rez, Brian Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica Hamrick, Jason Grout, Sylvain Corlay, Paul Ivanov, Dami\u00e1n Avila, Safia Abdalla, and Carol Willing. 2016. Jupyter Notebooks -- a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas, F. Loizides and B. Schmidt (Eds.). IOS Press, 87--90."},{"key":"e_1_2_2_53_1","unstructured":"Ron Kohavi. 1996. Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid. In KDD. 1--6."},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.69.066138"},{"key":"e_1_2_2_55_1","volume-title":"Latoza and Andr\u00e9 Van Der Hoek","author":"Thomas","year":"2016","unstructured":"Thomas D. Latoza and Andr\u00e9 Van Der Hoek. 2016. Crowdsourcing in Software Engineering: Models, Opportunities, and Challenges. IEEE Software (2016), 1--13."},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2013.137"},{"key":"e_1_2_2_57_1","unstructured":"Linux [n.d.]. The Linux Kernel Organization. 
https:\/\/www.kernel.org."},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3361118"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2019.23418"},{"key":"e_1_2_2_60_1","unstructured":"Meta Kaggle 2021. Meta Kaggle: Kaggle's public data on competitions users submission scores and kernels. https:\/\/www.kaggle.com\/kaggle\/meta-kaggle. Version 539."},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3392859"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300356"},{"key":"e_1_2_2_63_1","volume-title":"A conceptual framework for the design of organizational control mechanisms. Management science 25, 9","author":"Ouchi William G.","year":"1979","unstructured":"William G. Ouchi. 1979. A conceptual framework for the design of organizational control mechanisms. Management science 25, 9 (1979), 833--848."},{"key":"e_1_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSAA.2016.49"},{"key":"e_1_2_2_65_1","volume-title":"On the security of open source software. Information systems journal 12, 1","author":"Payne Christian","year":"2002","unstructured":"Christian Payne. 2002. On the security of open source software. Information systems journal 12, 1 (2002), 61--78."},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3202667.3202694"},{"key":"e_1_2_2_67_1","volume-title":"Sen Wu, Daniel Selsam, and Christopher R\u00e9.","author":"Ratner Alexander","year":"2016","unstructured":"Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, and Christopher R\u00e9. 2016. Data Programming: Creating Large Training Sets, Quickly. Advances in neural information processing systems 29 (2016), 3567--3575."},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12130-999-1026-0"},{"key":"e_1_2_2_69_1","volume-title":"Proceedings of the 2nd SysML Conference. 
1--12","author":"Renggli Cedric","year":"2019","unstructured":"Cedric Renggli, Bojan Karla\u0161, Bolin Ding, Feng Liu, Kevin Schawinski, Wentao Wu, and Ce Zhang. 2019. Continuous Integration of Machine Learning Models With ease.ml\/ci: Towards a Rigorous Yet Practical Treatment. In Proceedings of the 2nd SysML Conference. 1--12."},{"key":"e_1_2_2_70_1","volume-title":"Slaughter","author":"Roberts Jeffrey A.","year":"2006","unstructured":"Jeffrey A. Roberts, Il-Horn Hann, and Sandra A. Slaughter. 2006. Understanding the motivations, participation, and performance of open source software developers: A longitudinal study of the Apache projects. Management science 52, 7 (2006), 984--999."},{"key":"e_1_2_2_71_1","volume-title":"Refactoring Machine Learning. In Workshop on Critiquing and Correcting Trends in Machine Learning at NeurIPS","author":"Ross Andrew Slavin","year":"2018","unstructured":"Andrew Slavin Ross and Jessica Zosa Forde. 2018. Refactoring Machine Learning. In Workshop on Critiquing and Correcting Trends in Machine Learning at NeurIPS 2018. 1--6."},{"key":"e_1_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173606"},{"key":"e_1_2_2_73_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-008-9102-8"},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1915006117"},{"key":"e_1_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3202710.3203148"},{"key":"e_1_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPC.2016.7503737"},{"key":"e_1_2_2_77_1","volume-title":"Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems","author":"Sculley D","year":"2015","unstructured":"D Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Fran\u00e7ois Crespo, and Dan Dennison. 2015. Hidden Technical Debt in Machine Learning Systems. 
Advances in Neural Information Processing Systems (2015), 2494--2502."},{"key":"e_1_2_2_78_1","unstructured":"Ben Shneiderman Catherine Plaisant Maxine Cohen Steven Jacobs Niklas Elmqvist and Nicholas Diakopoulos. 2016. Designing the user interface: strategies for effective human-computer interaction. Pearson."},{"key":"e_1_2_2_79_1","unstructured":"Micah J. Smith. 2021. Collaborative Open and Automated Data Science. Ph.D. Thesis. Massachusetts Institute of Technology."},{"key":"e_1_2_2_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3386146"},{"key":"e_1_2_2_81_1","volume-title":"FeatureHub: Towards Collaborative Data Science. In 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA). 590--600","author":"Smith Micah J.","year":"2017","unstructured":"Micah J. Smith, Roy Wedge, and Kalyan Veeramachaneni. 2017. FeatureHub: Towards Collaborative Data Science. In 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA). 590--600."},{"key":"e_1_2_2_82_1","unstructured":"Stockfish [n.d.]. Stockfish: A strong open source chess engine. https:\/\/stockfishchess.org. Accessed 2019-09-05."},{"key":"e_1_2_2_83_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1355"},{"key":"e_1_2_2_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/VL\/HCC50065.2020.9127207"},{"key":"e_1_2_2_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/2641190.2641198"},{"key":"e_1_2_2_86_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2014.62"},{"key":"e_1_2_2_87_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786805.2786850"},{"key":"e_1_2_2_88_1","volume-title":"Towards Feature Engineering at Scale for Data from Massive Open Online Courses. arXiv:1407.5238 [cs]","author":"Veeramachaneni Kalyan","year":"2014","unstructured":"Kalyan Veeramachaneni, Una-May O'Reilly, and Colin Taylor. 2014. Towards Feature Engineering at Scale for Data from Massive Open Online Courses. arXiv:1407.5238 [cs] (2014). 
arXiv:1407.5238"},{"key":"e_1_2_2_89_1","volume-title":"Proceedings of the 29th International Conference on Machine Learning","author":"Wagstaff Kiri L.","year":"2012","unstructured":"Kiri L. Wagstaff. 2012. Machine Learning That Matters. In Proceedings of the 29th International Conference on Machine Learning. Edinburgh, Scotland, UK, 1--6."},{"key":"e_1_2_2_90_1","doi-asserted-by":"publisher","DOI":"10.1145\/3359141"},{"key":"e_1_2_2_91_1","volume-title":"How Much Automation Does a Data Scientist Want? arXiv:2101.03970 [cs] (Jan","author":"Wang Dakuo","year":"2021","unstructured":"Dakuo Wang, Q. Vera Liao, Yunfeng Zhang, Udayan Khurana, Horst Samulowitz, Soya Park, Michael Muller, and Lisa Amini. 2021. How Much Automation Does a Data Scientist Want? arXiv:2101.03970 [cs] (Jan. 2021). arXiv:2101.03970 [cs]"},{"key":"e_1_2_2_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/3359313"},{"key":"e_1_2_2_93_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2441716"},{"key":"e_1_2_2_94_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300911"},{"key":"e_1_2_2_95_1","volume-title":"Gonzalez","author":"Wooders Sarah","year":"2021","unstructured":"Sarah Wooders, Peter Schafhalter, and Joseph E. Gonzalez. 2021. Feature Stores: The Data Side of ML Pipelines. https:\/\/medium.com\/riselab\/feature-stores-the-data-side-of-ml-pipelines-7083d69bff1c."},{"key":"e_1_2_2_96_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.197"},{"key":"e_1_2_2_97_1","volume-title":"Doris Jung-Lin Lee, Niloufar Salehi, and Aditya Parameswaran.","author":"Xin Doris","year":"2021","unstructured":"Doris Xin, Eva Yiwei Wu, Doris Jung-Lin Lee, Niloufar Salehi, and Aditya Parameswaran. 2021. Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows. arXiv:2101.04834 [cs] (Jan. 2021). arXiv:2101.04834 [cs]"},{"key":"e_1_2_2_98_1","unstructured":"Lei Xu Maria Skoularidou Alfredo Cuesta-Infante and Kalyan Veeramachaneni. 2019. 
Modeling Tabular data using Conditional GAN. In NeurIPS."},{"key":"e_1_2_2_99_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196709.3196729"},{"key":"e_1_2_2_100_1","volume-title":"Taking Human out of Learning Applications: A Survey on Automated Machine Learning. arXiv:1810.13306 [cs, stat] (Dec","author":"Yao Quanming","year":"2019","unstructured":"Quanming Yao, Mengshuo Wang, Yuqiang Chen, Wenyuan Dai, Yu-Feng Li, Wei-Wei Tu, Qiang Yang, and Yang Yu. 2019. Taking Human out of Learning Applications: A Survey on Automated Machine Learning. arXiv:1810.13306 [cs, stat] (Dec. 2019). arXiv:1810.13306 [cs, stat]"},{"key":"e_1_2_2_101_1","volume-title":"Scalable and Accurate Online Feature Selection for Big Data. TKDD 11","author":"Yu Kui","year":"2016","unstructured":"Kui Yu, Xindong Wu, Wei Ding, and Jian Pei. 2016. Scalable and Accurate Online Feature Selection for Big Data. TKDD 11 (2016), 16:1--16:39."},{"key":"e_1_2_2_102_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSR.2007.19"},{"key":"e_1_2_2_103_1","doi-asserted-by":"publisher","DOI":"10.1145\/3392826"},{"key":"e_1_2_2_104_1","doi-asserted-by":"publisher","unstructured":"Yangyang Zhao Alexander Serebrenik Yuming Zhou Vladimir Filkov and Bogdan Vasilescu. 2017. The Impact of Continuous Integration on Other Software Development Practices: A Large-Scale Empirical Study. In 2017 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE). 60--71. https:\/\/doi.org\/10.1109\/ASE.2017.8115619","DOI":"10.1109\/ASE.2017.8115619"},{"key":"e_1_2_2_105_1","volume-title":"Streaming feature selection using alpha-investing. Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05","author":"Zhou Jing","year":"2005","unstructured":"Jing Zhou, Dean Foster, Robert Stine, and Lyle Ungar. 2005. Streaming feature selection using alpha-investing. 
Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05 (2005), 384--393."}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3479575","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3479575","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3479575","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T05:05:40Z","timestamp":1752469540000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3479575"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,13]]},"references-count":105,"journal-issue":{"issue":"CSCW2","published-print":{"date-parts":[[2021,10,13]]}},"alternative-id":["10.1145\/3479575"],"URL":"https:\/\/doi.org\/10.1145\/3479575","relation":{},"ISSN":["2573-0142"],"issn-type":[{"value":"2573-0142","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,13]]},"assertion":[{"value":"2021-10-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}