{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T18:37:34Z","timestamp":1772044654697,"version":"3.50.1"},"reference-count":91,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Programming is an essential activity in data science (DS). Unlike regular software developers, DS programmers often use Jupyter notebooks instead of conventional IDEs. Moreover, DS programmers focus on statistics, data analytics, and modeling rather than writing production-ready code following best practices in software engineering. Thus, in order to provide effective tool support to improve their productivity, it is important to understand what kinds of errors they make and how they fix them. Previous studies have analyzed DS code from public code-sharing platforms such as GitHub and Kaggle. However, they only accounted for code changes committed to the version history, omitting many programming mistakes that are resolved before code commits. To bridge the gap, we present an in-depth analysis of the fine-grained logs of a DS competition, which includes 390 Jupyter Notebooks written by participants over six weeks. In addition, we conducted semi-structured interviews with 10 DS programmers from different domains to understand the reasons behind their programming mistakes. We identified several unique programming mistakes and fix patterns that had not been reported before, highlighting opportunities for designing new tool support for DS programming.<\/jats:p>","DOI":"10.1145\/3729352","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"1824-1846","source":"Crossref","is-referenced-by-count":1,"title":["Towards Understanding Fine-Grained Programming Mistakes and Fixing Patterns in Data Science"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-9108-8473","authenticated-orcid":false,"given":"Wei-Hao","family":"Chen","sequence":"first","affiliation":[{"name":"Purdue University, West Lafayette, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4626-5509","authenticated-orcid":false,"given":"Jia Lin","family":"Cheoh","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3582-5091","authenticated-orcid":false,"given":"Manthan","family":"Keim","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8631-0955","authenticated-orcid":false,"given":"Sabine","family":"Brunswicker","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5468-9347","authenticated-orcid":false,"given":"Tianyi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2019. The Data Scientist Profile 2019 - Skills Experience Education Of 1 001 Data Scientists. https:\/\/365datascience.com\/career-advice\/career-guides\/data-scientist-profile\/."},{"key":"e_1_2_1_2_1","unstructured":"2024. IPython. https:\/\/ipython.readthedocs.io\/."},{"key":"e_1_2_1_3_1","unstructured":"2024. Jupyter Notebook. https:\/\/jupyter.org\/."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2642937.2642990"},{"key":"e_1_2_1_5_1","volume-title":"Breno Dantas Cruz, and Hridesh Rajan","author":"Ahmed Shibbir","year":"2023","unstructured":"Shibbir Ahmed, Mohammad Wardat, Hamid Bagheri, Breno Dantas Cruz, and Hridesh Rajan. 2023. Characterizing Bugs in Python and R Data Analytics Programs. arXiv preprint arXiv:2306.08632."},{"key":"e_1_2_1_6_1","volume-title":"2019 34th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 293\u2013304","author":"Rahat Tamjid Al","year":"2019","unstructured":"Tamjid Al Rahat, Yu Feng, and Yuan Tian. 2019. Oauthlint: An empirical study on oauth bugs in android applications. In 2019 34th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 293\u2013304."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1007\/s10664-023-10352-5","article-title":"What constitutes debugging? An exploratory study of debugging episodes","volume":"28","author":"Alaboudi Abdulaziz","year":"2023","unstructured":"Abdulaziz Alaboudi and Thomas D LaToza. 2023. What constitutes debugging? An exploratory study of debugging episodes. Empirical Software Engineering, 28, 5 (2023), 117.","journal-title":"Empirical Software Engineering"},{"key":"e_1_2_1_8_1","volume-title":"2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 129\u2013141","author":"Bavishi Rohan","year":"2021","unstructured":"Rohan Bavishi, Shadaj Laddad, Hiroaki Yoshida, Mukul R Prasad, and Koushik Sen. 2021. Vizsmith: Automated visualization synthesis by mining data-science notebooks. In 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 129\u2013141."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180175"},{"key":"e_1_2_1_10_1","unstructured":"Bruce Lawrence Berg. 2001. Qualitative research methods for the social sciences. Allyn & Bacon."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CSMR.2013.23"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510057"},{"key":"e_1_2_1_13_1","volume-title":"Amelia McNamara, Philipp Burckhardt, Allison Theobold, Amal Abdel-Ghani, and Greg Wilson.","author":"Bodwin Kelly Nicole","year":"2022","unstructured":"Kelly Nicole Bodwin, Ian Flores Siaca, Amelia McNamara, Philipp Burckhardt, Allison Theobold, Amal Abdel-Ghani, and Greg Wilson. 2022. \"Looks okay to me\": A study of best practice in data analysis code review. ICOTS."},{"key":"e_1_2_1_14_1","volume-title":"Using thematic analysis in psychology. Qualitative research in psychology, 3, 2","author":"Braun Virginia","year":"2006","unstructured":"Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative research in psychology, 3, 2 (2006), 77\u2013101."},{"key":"e_1_2_1_15_1","volume-title":"2019 IEEE symposium on visual languages and human-centric computing (VL\/HCC). 25\u201334","author":"Cai Carrie J","year":"2019","unstructured":"Carrie J Cai and Philip J Guo. 2019. Software developers learning machine learning: Motivations, hurdles, and desires. In 2019 IEEE symposium on visual languages and human-centric computing (VL\/HCC). 25\u201334."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142574"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376729"},{"key":"e_1_2_1_18_1","volume-title":"CoWrangler: Recommender System for Data-Wrangling Scripts. In Companion of the 2023 International Conference on Management of Data. 147\u2013150","author":"Chopra Bhavya","year":"2023","unstructured":"Bhavya Chopra, Anna Fariha, Sumit Gulwani, Austin Z Henley, Daniel Perelman, Mohammad Raza, Sherry Shi, Danny Simmons, and Ashish Tiwari. 2023. CoWrangler: Recommender System for Data-Wrangling Scripts. In Companion of the 2023 International Conference on Management of Data. 147\u2013150."},{"key":"e_1_2_1_19_1","volume-title":"Exploratory data mining and data cleaning","author":"Dasu Tamraparni","unstructured":"Tamraparni Dasu and Theodore Johnson. 2003. Exploratory data mining and data cleaning. John Wiley & Sons."},{"key":"e_1_2_1_20_1","volume-title":"Eduardo Santana de Almeida","author":"de Santana Taijara Loiola","year":"2022","unstructured":"Taijara Loiola de Santana, Paulo Anselmo da Mota Silveira Neto, Eduardo Santana de Almeida, and Iftekhar Ahmed. 2022. Bug Analysis in Jupyter Notebook Projects: An Empirical Study. ACM Transactions on Software Engineering and Methodology."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510457.3513042"},{"key":"e_1_2_1_22_1","volume-title":"The 6th Joint Meeting on European software engineering conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering: Companion Papers. 549\u2013552","author":"Evans Robert B","year":"2007","unstructured":"Robert B Evans and Alberto Savoia. 2007. Differential testing: a new approach to change detection. In The 6th Joint Meeting on European software engineering conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering: Companion Papers. 549\u2013552."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2642937.2642982"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3524842.3528447"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581099"},{"key":"e_1_2_1_26_1","volume-title":"Dependable Software Systems Engineering","author":"Gulwani Sumit","unstructured":"Sumit Gulwani. 2016. Programming by examples-and its applications in data wrangling. In Dependable Software Systems Engineering. IOS Press, 137\u2013158."},{"key":"e_1_2_1_27_1","volume-title":"2019 IEEE\/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 71\u201380","author":"Gulzar Muhammad Ali","year":"2019","unstructured":"Muhammad Ali Gulzar, Yongkang Zhu, and Xiaofeng Han. 2019. Perception and practices of differential testing. In 2019 IEEE\/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 71\u201380."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering. 1282\u20131294","author":"Huang Junjie","year":"2024","unstructured":"Junjie Huang, Daya Guo, Chenglong Wang, Jiazhen Gu, Shuai Lu, Jeevana Priya Inala, Cong Yan, Jianfeng Gao, Nan Duan, and Michael R Lyu. 2024. Contextualized Data-Wrangling Code Generation in Computational Notebooks. In Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering. 1282\u20131294."},{"key":"e_1_2_1_29_1","volume-title":"On the Executability of R Markdown Files. In 2024 IEEE\/ACM 21st International Conference on Mining Software Repositories (MSR). 254\u2013264","author":"Islam Md Anaytul","year":"2024","unstructured":"Md Anaytul Islam, Muhammad Asaduzzman, and Shaowei Wang. 2024. On the Executability of R Markdown Files. In 2024 IEEE\/ACM 21st International Conference on Mining Software Repositories (MSR). 254\u2013264."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338955"},{"key":"e_1_2_1_31_1","volume-title":"2016 23rd Asia-Pacific Software Engineering Conference (APSEC). 105\u2013112","author":"Jimenez Matthieu","year":"2016","unstructured":"Matthieu Jimenez, Mike Papadakis, and Yves Le Traon. 2016. An empirical analysis of vulnerabilities in openssl and the linux kernel. In 2016 23rd Asia-Pacific Software Engineering Conference (APSEC). 105\u2013112."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2610384.2628055"},{"key":"e_1_2_1_33_1","volume-title":"Jupyter notebooks on github: characteristics and code clones. The Art, Science, and Engineering of Programming, 5, 3","author":"K\u00e4ll\u00e9n Malin","year":"2021","unstructured":"Malin K\u00e4ll\u00e9n, Ulf Sigvardsson, and Tobias Wrigstad. 2021. Jupyter notebooks on github: characteristics and code clones. The Art, Science, and Engineering of Programming, 5, 3 (2021)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1978942.1979444"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379597.3387491"},{"key":"e_1_2_1_36_1","unstructured":"Staffs Keele. 2007. Guidelines for performing systematic literature reviews in software engineering. Technical report ver. 2.3 ebse technical report. ebse."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLHCC.2017.8103446"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884783"},{"key":"e_1_2_1_39_1","first-page":"87","article-title":"Jupyter Notebooks-a publishing format for reproducible computational workflows","volume":"2016","author":"Kluyver Thomas","year":"2016","unstructured":"Thomas Kluyver, Benjamin Ragan-Kelley, Fernando P\u00e9rez, Brian E Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica B Hamrick, Jason Grout, and Sylvain Corlay. 2016. Jupyter Notebooks-a publishing format for reproducible computational workflows.. Elpub, 2016 (2016), 87\u201390.","journal-title":"Elpub"},{"key":"e_1_2_1_40_1","volume-title":"Literate programming. The computer journal, 27, 2","author":"Knuth Donald Ervin","year":"1984","unstructured":"Donald Ervin Knuth. 1984. Literate programming. The computer journal, 27, 2 (1984), 97\u2013111."},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","first-page":"2020","DOI":"10.1109\/TSE.2022.3208210","article-title":"Impact of software engineering research in practice: A patent and author survey analysis","volume":"49","author":"Kotti Zoe","year":"2022","unstructured":"Zoe Kotti, Georgios Gousios, and Diomidis Spinellis. 2022. Impact of software engineering research in practice: A patent and author survey analysis. IEEE Transactions on Software Engineering, 49, 4 (2022), 2020\u20132038.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_2_1_42_1","volume-title":"Duet: Helping data analysis novices conduct pairwise comparisons by minimal specification","author":"Law Po-Ming","year":"2018","unstructured":"Po-Ming Law, Rahul C Basole, and Yanhong Wu. 2018. Duet: Helping data analysis novices conduct pairwise comparisons by minimal specification. IEEE transactions on visualization and computer graphics, 25, 1 (2018), 427\u2013437."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2015.2454513"},{"key":"e_1_2_1_44_1","volume-title":"2022 IEEE Symposium on Security and Privacy (SP). 2078\u20132095","author":"Lin Zhenpeng","year":"2022","unstructured":"Zhenpeng Lin, Yueqi Chen, Yuhang Wu, Dongliang Mu, Chensheng Yu, Xinyu Xing, and Kang Li. 2022. GREBE: Unveiling exploitation potential for Linux kernel bugs. In 2022 IEEE Symposium on Security and Privacy (SP). 2078\u20132095."},{"key":"e_1_2_1_45_1","volume-title":"2017 IEEE\/ACM 39th International Conference on Software Engineering (ICSE). 381\u2013392","author":"Ma Wanwangying","year":"2017","unstructured":"Wanwangying Ma, Lin Chen, Xiangyu Zhang, Yuming Zhou, and Baowen Xu. 2017. How do developers fix cross-project correlated bugs? a case study on the github scientific python ecosystem. In 2017 IEEE\/ACM 39th International Conference on Software Engineering (ICSE). 381\u2013392."},{"key":"e_1_2_1_46_1","first-page":"100","article-title":"Differential testing for software","volume":"10","author":"McKeeman William M","year":"1998","unstructured":"William M McKeeman. 1998. Differential testing for software. Digital Technical Journal, 10, 1 (1998), 100\u2013107.","journal-title":"Digital Technical Journal"},{"key":"e_1_2_1_47_1","volume-title":"pandas: a foundational Python library for data analysis and statistics. Python for high performance and scientific computing, 14, 9","author":"McKinney Wes","year":"2011","unstructured":"Wes McKinney. 2011. pandas: a foundational Python library for data analysis and statistics. Python for high performance and scientific computing, 14, 9 (2011), 1\u20139."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3559507"},{"key":"e_1_2_1_49_1","unstructured":"Paul Timothy Mooney. 2022. Kaggle Survey 2022: All Results. https:\/\/www.kaggle.com\/code\/paultimothymooney\/kaggle-survey-2022-all-results Accessed: 2025"},{"key":"e_1_2_1_50_1","volume-title":"2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE). 112\u2013124","author":"Ni Ansong","year":"2021","unstructured":"Ansong Ni, Daniel Ramos, Aidan ZH Yang, In\u00eas Lynce, Vasco Manquinho, Ruben Martins, and Claire Le Goues. 2021. Soar: a synthesis approach for data science api refactoring. In 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE). 112\u2013124."},{"key":"e_1_2_1_51_1","volume-title":"2013 ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement. 55\u201364","author":"Ocariza Frolin","year":"2013","unstructured":"Frolin Ocariza, Kartik Bajaj, Karthik Pattabiraman, and Ali Mesbah. 2013. An empirical study of client-side JavaScript bugs. In 2013 ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement. 55\u201364."},{"key":"e_1_2_1_52_1","volume-title":"2011 IEEE 22nd International Symposium on Software Reliability Engineering. 100\u2013109","author":"Ocariza Frolin S","year":"2011","unstructured":"Frolin S Ocariza Jr, Karthik Pattabiraman, and Benjamin Zorn. 2011. JavaScript errors in the wild: An empirical study. In 2011 IEEE 22nd International Symposium on Software Reliability Engineering. 100\u2013109."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549130"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3608132"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Deepthi Raghunandan Aayushi Roy Shenzhi Shi Niklas Elmqvist and Leilani Battle. 2022. Code code evolution: Understanding how people change data science notebooks over time. arXiv preprint arXiv:2209.02851.","DOI":"10.1145\/3544548.3580997"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-022-10229-z"},{"key":"e_1_2_1_57_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER). 72\u201377","author":"Reimann Lars","year":"2023","unstructured":"Lars Reimann and G\u00fcnter Kniesel-W\u00fcnsche. 2023. Safe-DS: A Domain Specific Language to Make Data Science Safe. In 2023 IEEE\/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER). 72\u201377."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3524610.3529156"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173606"},{"key":"e_1_2_1_60_1","volume-title":"Guidelines for conducting and reporting case study research in software engineering. Empirical software engineering, 14","author":"Runeson Per","year":"2009","unstructured":"Per Runeson and Martin H\u00f6st. 2009. Guidelines for conducting and reporting case study research in software engineering. Empirical software engineering, 14 (2009), 131\u2013164."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196398.3196473"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884829"},{"key":"e_1_2_1_63_1","volume-title":"Guide to advanced empirical software engineering. 93","author":"Shull Forrest","unstructured":"Forrest Shull, Janice Singer, and Dag IK Sj\u00f8berg. 2008. Guide to advanced empirical software engineering. 93, Springer."},{"key":"e_1_2_1_64_1","volume-title":"2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 1033\u20131037","author":"Sivasothy Shangeetha","year":"2021","unstructured":"Shangeetha Sivasothy. 2021. DSInfoSearch: supporting experimentation process of data scientists. In 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 1033\u20131037."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173878"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2865024"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510457.3513032"},{"key":"e_1_2_1_68_1","volume-title":"2019 IEEE 35th International Conference on Data Engineering (ICDE). 1964\u20131967","author":"Tang MingJie","year":"2019","unstructured":"MingJie Tang, Saisai Shao, Weiqing Yang, Yanbo Liang, Yongyang Yu, Bikas Saha, and Dongjoon Hyun. 2019. Sac: A system for big data lineage tracking. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). 1964\u20131967."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE.2012.22"},{"key":"e_1_2_1_70_1","volume-title":"Exploratory data analysis. 2","author":"Tukey John Wilder","unstructured":"John Wilder Tukey. 1977. Exploratory data analysis. 2, Springer."},{"key":"e_1_2_1_71_1","volume-title":"Data science in action","author":"Der Aalst Wil Van","unstructured":"Wil Van Der Aalst and Wil van der Aalst. 2016. Data science in action. Springer."},{"key":"e_1_2_1_72_1","volume-title":"The NumPy array: a structure for efficient numerical computation. Computing in science & engineering, 13, 2","author":"Der Walt Stefan Van","year":"2011","unstructured":"Stefan Van Der Walt, S Chris Colbert, and Gael Varoquaux. 2011. The NumPy array: a structure for efficient numerical computation. Computing in science & engineering, 13, 2 (2011), 22\u201330."},{"key":"e_1_2_1_73_1","volume-title":"2021 IEEE\/ACM 18th International Conference on Mining Software Repositories (MSR). 179\u2013189","author":"Vidoni Melina","year":"2021","unstructured":"Melina Vidoni. 2021. Self-admitted technical debt in r packages: An exploratory study. In 2021 IEEE\/ACM 18th International Conference on Mining Software Repositories (MSR). 179\u2013189."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3559558"},{"key":"e_1_2_1_75_1","volume-title":"2017 IEEE\/ACM 14th International Conference on Mining Software Repositories (MSR). 413\u2013424","author":"Wan Zhiyuan","year":"2017","unstructured":"Zhiyuan Wan, David Lo, Xin Xia, and Liang Cai. 2017. Bug characteristics in blockchain systems: a large-scale empirical study. In 2017 IEEE\/ACM 14th International Conference on Mining Software Repositories (MSR). 413\u2013424."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3502123"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416585"},{"key":"e_1_2_1_78_1","volume-title":"Proceedings of the ACM\/IEEE 42nd international conference on software engineering: new ideas and emerging results. 53\u201356","author":"Wang Jiawei","year":"2020","unstructured":"Jiawei Wang, Li Li, and Andreas Zeller. 2020. Better code, better sharing: on the need of analyzing jupyter notebooks. In Proceedings of the ACM\/IEEE 42nd international conference on software engineering: new ideas and emerging results. 53\u201356."},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445527"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3417943"},{"key":"e_1_2_1_81_1","volume-title":"International management review, 15, 1","author":"Williams Michael","year":"2019","unstructured":"Michael Williams and Tami Moser. 2019. The art of coding and thematic exploration in qualitative research. International management review, 15, 1 (2019), 45\u201355."},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00129"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549101"},{"key":"e_1_2_1_84_1","volume-title":"R markdown: The definitive guide","author":"Xie Yihui","unstructured":"Yihui Xie, Joseph J Allaire, and Garrett Grolemund. 2018. R markdown: The definitive guide. CRC Press."},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556918"},{"key":"e_1_2_1_86_1","volume-title":"2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 304\u2013316","author":"Yang Chenyang","year":"2021","unstructured":"Chenyang Yang, Shurui Zhou, Jin LC Guo, and Christian K\u00e4stner. 2021. Subtle bugs everywhere: Generating documentation for data wrangling code. In 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 304\u2013316."},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-021-10087-1"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/1134285.1134333"},{"key":"e_1_2_1_89_1","volume-title":"Proceedings of the ACM on Human-Computer Interaction, 4, CSCW1","author":"Zhang Amy X","year":"2020","unstructured":"Amy X Zhang, Michael Muller, and Dakuo Wang. 2020. How do data science workers collaborate? roles, workflows, and tools. Proceedings of the ACM on Human-Computer Interaction, 4, CSCW1 (2020), 1\u201323."},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1145\/3213846.3213866"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1145\/2745802.2745808"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729352","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:26:11Z","timestamp":1750346771000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729352"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":91,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3729352"],"URL":"https:\/\/doi.org\/10.1145\/3729352","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}