{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T18:10:30Z","timestamp":1776103830767,"version":"3.50.1"},"reference-count":167,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:p>\n            Computational notebooks (e.g., Jupyter, Google Colab) are widely used by data scientists. A key feature of notebooks is the interactive computing model of iteratively executing\n            <jats:italic>cells<\/jats:italic>\n            (i.e., a set of statements) and observing the result (e.g., model or plot). Unfortunately, existing notebook systems do not offer\n            <jats:italic>time-traveling to past states<\/jats:italic>\n            : when the user executes a cell, the notebook\n            <jats:italic>session state<\/jats:italic>\n            consisting of user-defined variables can be\n            <jats:italic>irreversibly modified<\/jats:italic>\n            \u2014e.g., the user cannot 'un-drop' a dataframe column. This is because, unlike DBMS, existing notebook systems do not keep track of the session state. Existing techniques for checkpointing and restoring session states, such as OS-level memory snapshot or application-level session dump, are insufficient: checkpointing can incur prohibitive storage costs and may fail, while restoration can only be inefficiently performed from scratch by fully loading checkpoint files.\n          <\/jats:p>\n          <jats:p>\n            In this paper, we introduce a new notebook system, Kishu, that offers time-traveling to and from arbitrary notebook states using an efficient and fault-tolerant incremental checkpoint and checkout mechanism. 
Kishu creates incremental checkpoints that are small and correctly preserve complex inter-variable dependencies at a novel\n            <jats:italic>Co-variable<\/jats:italic>\n            granularity. Then, to return to a previous state, Kishu accurately identifies the\n            <jats:italic>state difference<\/jats:italic>\n            between the current and target states to perform incremental checkout at sub-second latency with minimal data loading. Kishu is compatible with 146 object classes from popular data science libraries (e.g., Ray, Spark, PyTorch), and reduces checkpoint size and checkout time by up to 4.55\u00d7 and 9.02\u00d7, respectively, on a variety of notebooks.\n          <\/jats:p>","DOI":"10.14778\/3717755.3717759","type":"journal-article","created":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T15:51:49Z","timestamp":1747756309000},"page":"970-985","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Kishu: Time-Traveling for Computational Notebooks"],"prefix":"10.14778","volume":"18","author":[{"given":"Zhaoheng","family":"Li","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}]},{"given":"Supawit","family":"Chockchowwat","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}]},{"given":"Ribhav","family":"Sahu","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}]},{"given":"Areet","family":"Sheth","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}]},{"given":"Yongjoo","family":"Park","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}]}],"member":"320","published-online":{"date-parts":[[2025,5,20]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. CRIU CUDA Support. 
https:\/\/criu.org\/What_cannot_be_checkpointed#Devices."},{"key":"e_1_2_1_2_1","unstructured":"Oxnurl. 2024. shelve \u2014 Python object persistence. https:\/\/github.com\/0xnurl\/redis-shelve."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/eScience55777.2022.00015"},{"key":"e_1_2_1_4_1","unstructured":"Amazon. 2024. How Amazon Aurora helps you protect your data from mistakes. https:\/\/d1.awsstatic.com\/events\/reinvent\/2020\/How_Amazon_Aurora_helps_you_protect_your_data_from_mistakes_DAT403.pdf."},{"key":"e_1_2_1_5_1","unstructured":"Amazon. 2024. Streamlining Point-in-Time Recovery (PITR) for Amazon Aurora with AWS Backup. https:\/\/aws.amazon.com\/blogs\/storage\/streamlining-point-in-time-recovery-pitr-for-amazon-aurora-with-aws-backup\/."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-SEIP.2019.00042"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161063"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352131"},{"key":"e_1_2_1_9_1","unstructured":"Anyscale. 2024. Ray - Effortlessly Scale Your Most Complex Workloads. https:\/\/www.ray.io\/."},{"key":"e_1_2_1_10_1","unstructured":"Apache. 2024. Managing Large State in Apache Flink: An Intro to Incremental Checkpointing. https:\/\/flink.apache.org\/2018\/01\/30\/managing-large-state-in-apache-flink-an-intro-to-incremental-checkpointing\/."},{"key":"e_1_2_1_11_1","unstructured":"Apache Arrow. 2024. PyArrow - Apache Arrow Python bindings. https:\/\/arrow.apache.org\/docs\/python\/index.html."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3025111.3025116"},{"key":"e_1_2_1_13_1","unstructured":"Microsoft Azure. 2024. Change feed support in Azure Blob Storage. https:\/\/learn.microsoft.com\/en-us\/azure\/storage\/blobs\/storage-blob-change-feed?source=recommendations&tabs=azure-portal."},{"key":"e_1_2_1_14_1","unstructured":"Microsoft Azure. 2024. 
Point-in-time restore for block blobs. https:\/\/learn.microsoft.com\/en-us\/azure\/storage\/blobs\/point-in-time-restore-overview?source=recommendations."},{"key":"e_1_2_1_15_1","volume-title":"NCSA Qiskit Demo","year":"2023","unstructured":"babreu ncsa. 2023. NCSA Qiskit Demo May 2023. https:\/\/github.com\/babreuncsa\/qiskit\/blob\/main\/demo\/QiskitDemo_NCSA_May2023.ipynb."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2714064.2660209"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3502102"},{"key":"e_1_2_1_18_1","unstructured":"Ekrem Bayar. 2022. Store Sales TS Forecasting - A Comprehensive Guide. https:\/\/www.kaggle.com\/code\/ekrembayar\/store-sales-ts-forecasting-a-comprehensive-guide\/notebook."},{"key":"e_1_2_1_19_1","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra James","year":"2012","unstructured":"James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. Journal of machine learning research 13, 2 (2012).","journal-title":"Journal of machine learning research"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/319996.319998"},{"key":"e_1_2_1_21_1","unstructured":"Bokeh. 2024. bokeh.figure. https:\/\/docs.bokeh.org\/en\/latest\/docs\/reference\/plotting\/figure.html."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3320246"},{"key":"e_1_2_1_23_1","volume-title":"Conference on Innovative Data Systems Research (CIDR).","author":"Brachmann Michael","year":"2020","unstructured":"Michael Brachmann and William Spoth. 2020. Your notebook is not crumby enough, REPLace it. 
In Conference on Innovative Data Systems Research (CIDR)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1345206.1345253"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376729"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/509593.509626"},{"key":"e_1_2_1_27_1","unstructured":"chest. 2024. chest - Simple on-disk dictionary. https:\/\/pypi.org\/project\/chest\/."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3595360.3595855"},{"key":"e_1_2_1_29_1","volume-title":"Automatically finding optimal index structure. arXiv preprint arXiv:2208.03823","author":"Chockchowwat Supawit","year":"2022","unstructured":"Supawit Chockchowwat, Wenjie Liu, and Yongjoo Park. 2022. Automatically finding optimal index structure. arXiv preprint arXiv:2208.03823 (2022)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3617308"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00107"},{"key":"e_1_2_1_32_1","unstructured":"cloudpipe. 2024. CloudPickle. https:\/\/github.com\/cloudpipe\/cloudpickle."},{"key":"e_1_2_1_33_1","volume-title":"Optuna: A hyperparameter optimization framework. https:\/\/optuna.readthedocs.io\/en\/stable\/.","author":"Optuna","year":"2024","unstructured":"Optuna contributors. 2024. Optuna: A hyperparameter optimization framework. https:\/\/optuna.readthedocs.io\/en\/stable\/."},{"key":"e_1_2_1_34_1","unstructured":"Torch Contributors. 2024. torchvision - Image Transformers. https:\/\/pytorch.org\/vision\/stable\/index.html."},{"key":"e_1_2_1_35_1","unstructured":"CRIU. 2023. Linux CRIU. https:\/\/criu.org\/Main_Page."},{"key":"e_1_2_1_36_1","unstructured":"CRIU. 2024. CRIU - What cannot be checkpointed. 
https:\/\/criu.org\/What_cannot_be_checkpointed."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824127"},{"key":"e_1_2_1_38_1","volume-title":"Context-aware Execution Migration Tool for Data Science Jupyter Notebooks on Hybrid Clouds. In 2021 IEEE 17th International Conference on eScience (eScience). IEEE, 30\u201339","author":"Cunha Renato LF","year":"2021","unstructured":"Renato LF Cunha, Lucas C Villa Real, Renan Souza, Bruno Silva, and Marco AS Netto. 2021. Context-aware Execution Migration Tool for Data Science Jupyter Notebooks on Hybrid Clouds. In 2021 IEEE 17th International Conference on eScience (eScience). IEEE, 30\u201339."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3530800.3534535"},{"key":"e_1_2_1_40_1","unstructured":"The Devastator. 2023. Bruteforce Clustering. https:\/\/www.kaggle.com\/code\/thedevastator\/bruteforce-clustering."},{"key":"e_1_2_1_41_1","unstructured":"Photutils Developers. 2024. An Astropy Package for Photometry. https:\/\/photutils.readthedocs.io\/en\/stable\/."},{"key":"e_1_2_1_42_1","unstructured":"Matplotlib development team. 2024. Matplotlib - Figure. https:\/\/matplotlib.org\/stable\/api\/_as_gen\/matplotlib.pyplot.figure.html."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208293"},{"key":"e_1_2_1_44_1","unstructured":"Clark DuVall. 2015. serpy: ridiculously fast object serialization. https:\/\/serpy.readthedocs.io\/en\/latest\/."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380574"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/568522.568525"},{"key":"e_1_2_1_47_1","unstructured":"Lightning AI et al. 2018. PyTorch ModelCheckpoint. https:\/\/pytorch-lightning.readthedocs.io\/en\/stable\/api\/pytorch_lightning.callbacks.ModelCheckpoint.html."},{"key":"e_1_2_1_48_1","unstructured":"Hugging Face. 2024. Hugging Face - The AI community building the future. 
https:\/\/huggingface.co\/."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-24449-0_31"},{"key":"e_1_2_1_50_1","unstructured":"Python Software Foundation. 2023. Python - AST. https:\/\/docs.python.org\/3\/library\/ast.html."},{"key":"e_1_2_1_51_1","unstructured":"Python Software Foundation. 2023. Python - Generators. https:\/\/wiki.python.org\/moin\/Generators."},{"key":"e_1_2_1_52_1","unstructured":"Python Software Foundation. 2023. Python Hashlib. https:\/\/docs.python.org\/3\/library\/hashlib.html."},{"key":"e_1_2_1_53_1","unstructured":"Python Software Foundation. 2023. Python JSON. https:\/\/docs.python.org\/3\/library\/json.html."},{"key":"e_1_2_1_54_1","unstructured":"Python Software Foundation. 2023. Python Marshal. https:\/\/docs.python.org\/3\/library\/marshal.html."},{"key":"e_1_2_1_55_1","unstructured":"Python Software Foundation. 2023. Python Pickle Documentation. https:\/\/docs.python.org\/3\/library\/pickle.html."},{"key":"e_1_2_1_56_1","unstructured":"The Linux Foundation. 2024. PyTorch. https:\/\/pytorch.org\/."},{"key":"e_1_2_1_57_1","unstructured":"The Uncertainty Quantification Foundation. 2023. Dill - PyPi. https:\/\/pypi.org\/project\/dill\/."},{"key":"e_1_2_1_58_1","unstructured":"The Uncertainty Quantification Foundation. 2023. Dill dump session. https:\/\/dill.readthedocs.io\/en\/latest\/dill.html."},{"key":"e_1_2_1_59_1","unstructured":"Zope Foundation. 2024. ZODB programming guide. https:\/\/zodb.org\/en\/latest\/guide\/index.html."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2018.00047"},{"key":"e_1_2_1_61_1","unstructured":"GDB. 2024. GDB - Running programs backward. https:\/\/sourceware.org\/gdb\/current\/onlinedocs\/gdb.html\/Reverse-Execution.html."},{"key":"e_1_2_1_62_1","unstructured":"Aur\u00e9lien Geron. 2023. Chapter 4 - Training Models. github.com\/ageron\/handson-ml3\/blob\/main\/04_training_linear_models.ipynb."},{"key":"e_1_2_1_63_1","unstructured":"Git. 2024. 
Git - Commit Graph. https:\/\/git-scm.com\/docs\/commit-graph."},{"key":"e_1_2_1_64_1","unstructured":"Git. 2024. git -fast-version-control. https:\/\/git-scm.com\/."},{"key":"e_1_2_1_65_1","volume-title":"Code like a pythonista: Idiomatic python. Archived from the original on 27","author":"Goodger David","year":"2014","unstructured":"David Goodger. 2014. Code like a pythonista: Idiomatic python. Archived from the original on 27 (2014)."},{"key":"e_1_2_1_66_1","unstructured":"Google. 2023. Keras. https:\/\/keras.io\/."},{"key":"e_1_2_1_67_1","unstructured":"Google. 2024. Google Colab FAQ - Memory size. https:\/\/research.google.com\/colaboratory\/faq.html#available-memory."},{"key":"e_1_2_1_68_1","unstructured":"Google and X. 2022. Google AI4Code - Understand Code in Python Notebooks. https:\/\/www.kaggle.com\/competitions\/AI4Code."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/356842.356847"},{"key":"e_1_2_1_70_1","unstructured":"The PostgreSQL Global Development Group. 1996. PostgreSQL - Continuous Archiving and Point-in-Time Recovery (PITR). https:\/\/www.postgresql.org\/docs\/current\/continuous-archiving.html."},{"key":"e_1_2_1_71_1","unstructured":"The PostgreSQL Global Development Group. 1996. PostgreSQL: The World's Most Advanced Open Source Relational Database. https:\/\/www.postgresql.org\/."},{"key":"e_1_2_1_72_1","unstructured":"Google Groups. 2024. Time Travel Analysis or Undo in Jupyter. https:\/\/groups.google.com\/g\/jupyter\/c\/hMPDL7Iw_BQ\/m\/MWYv1d5cAwAJ."},{"key":"e_1_2_1_73_1","volume-title":"Burrito: Wrapping your lab notebook in computational infrastructure.","author":"Guo Philip J","year":"2012","unstructured":"Philip J Guo and Margo I Seltzer. 2012. Burrito: Wrapping your lab notebook in computational infrastructure. (2012)."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3714141"},{"key":"e_1_2_1_75_1","unstructured":"HuggingFace. 2024. HuggingFace - BERT Tokenizer. 
https:\/\/huggingface.co\/docs\/transformers\/en\/main_classes\/tokenizer."},{"key":"e_1_2_1_76_1","unstructured":"HuggingFace. 2024. HuggingFace - Pipelines. https:\/\/huggingface.co\/docs\/transformers\/en\/main_classes\/pipelines."},{"key":"e_1_2_1_77_1","unstructured":"IPython. 2024. IPython Class. https:\/\/ipython.org\/ipython-doc\/3\/api\/generated\/IPython.html."},{"key":"e_1_2_1_78_1","unstructured":"IPython. 2024. IPython Events. https:\/\/ipython.readthedocs.io\/en\/stable\/config\/callbacks.html."},{"key":"e_1_2_1_79_1","volume-title":"SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1\u201315","author":"Jain Twinkle","year":"2020","unstructured":"Twinkle Jain and Gene Cooperman. 2020. Crac: Checkpoint-restart architecture for cuda with streams and uvm. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1\u201315."},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368308.3415397"},{"key":"e_1_2_1_81_1","unstructured":"Project Jupyter. 2023. Jupyter Notebook. https:\/\/jupyter.org\/."},{"key":"e_1_2_1_82_1","volume-title":"arXiv preprint arXiv:2101.05782","author":"Juric Mario","year":"2021","unstructured":"Mario Juric, Steven Stetzler, and Colin T Slater. 2021. Checkpoint, Restore, and Live Migration for Science Platforms. arXiv preprint arXiv:2101.05782 (2021)."},{"key":"e_1_2_1_83_1","unstructured":"Kaggle. 2024. Kaggle Session Memory Size. https:\/\/www.kaggle.com\/discussions\/questions-and-answers\/405504."},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2019.00015"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173748"},{"key":"e_1_2_1_86_1","volume-title":"9th USENIX Workshop on the Theory and Practice of Provenance (TaPP","author":"Koop David","year":"2017","unstructured":"David Koop and Jay Patel. 2017. 
Dataflow notebooks: encoding and tracking dependencies of cells. In 9th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2017)."},{"key":"e_1_2_1_87_1","volume-title":"Northstar: An interactive data science system.","author":"Kraska Tim","year":"2021","unstructured":"Tim Kraska. 2021. Northstar: An interactive data science system. (2021)."},{"key":"e_1_2_1_88_1","unstructured":"ROUNAK KUMBHAKAR. 2024. pytorch resnet34 96.6 accuracy. https:\/\/www.kaggle.com\/code\/rounakkumbhakar\/pytorch-resnet34-96-6-accuracy\/notebook."},{"key":"e_1_2_1_89_1","unstructured":"Doris Jung-Lin Lee Dixin Tang Kunal Agarwal Thyne Boonmark Caitlyn Chen Jake Kang Ujjaini Mukhopadhyay Jerry Song Micah Yong Marti A Hearst et al. 2021. Lux: always-on visualization recommendations for exploratory dataframe workflows. arXiv preprint arXiv:2105.00121 (2021)."},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1145\/3545995"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2010.129"},{"key":"e_1_2_1_92_1","volume-title":"Demonstration of ElasticNotebook: Migrating Live Computational Notebook States. In Companion of the 2024 International Conference on Management of Data. 540\u2013543","author":"Li Zhaoheng","year":"2024","unstructured":"Zhaoheng Li, Supawit Chockchowwat, Hanxi Fang, Ribhav Sahu, Sumay Thakurdesai, Kantanat Pridaphatrakun, and Yongjoo Park. 2024. Demonstration of ElasticNotebook: Migrating Live Computational Notebook States. In Companion of the 2024 International Conference on Management of Data. 540\u2013543."},{"key":"e_1_2_1_93_1","volume-title":"Kishu: Time-Traveling for Computational Notebooks (Technical Report). arXiv preprint arXiv:2406.13856","author":"Li Zhaoheng","year":"2024","unstructured":"Zhaoheng Li, Supawit Chockchowwat, Ribhav Sahu, Areet Sheth, and Yongjoo Park. 2024. Kishu: Time-Traveling for Computational Notebooks (Technical Report). 
arXiv preprint arXiv:2406.13856 (2024)."},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.14778\/3626292.3626296"},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1109\/TETC.2020.2986487"},{"key":"e_1_2_1_96_1","unstructured":"LITEREPLICA. 2024. SQLite - Point in time recovery. https:\/\/litereplica.io\/sqlite-point-in-time-recovery.html."},{"key":"e_1_2_1_97_1","unstructured":"Steven Loria. 2024. TextBlob: Simplified Text Processing. https:\/\/textblob.readthedocs.io\/en\/dev\/."},{"key":"e_1_2_1_98_1","volume-title":"Andrew Head, Doris Xin, and Aditya Parameswaran.","author":"Macke Stephen","year":"2020","unstructured":"Stephen Macke, Hongpu Gong, Doris Jung-Lin Lee, Andrew Head, Doris Xin, and Aditya Parameswaran. 2020. Fine-grained lineage for safer notebook interactions. arXiv preprint arXiv:2012.06981 (2020)."},{"key":"e_1_2_1_99_1","volume-title":"CHEX: Multiversion Replay with Ordered Checkpoints. arXiv preprint arXiv:2202.08429","author":"Manne Naga Nithin","year":"2022","unstructured":"Naga Nithin Manne, Shilvi Satpati, Tanu Malik, Amitabha Bagchi, Ashish Gehani, and Amitabh Chaudhary. 2022. CHEX: Multiversion Replay with Ordered Checkpoints. arXiv preprint arXiv:2202.08429 (2022)."},{"key":"e_1_2_1_100_1","unstructured":"MessagePack. 2024. MessagePack. https:\/\/github.com\/msgpack."},{"key":"e_1_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.1145\/128765.128770"},{"key":"e_1_2_1_102_1","first-page":"4","article-title":"Recovery protocol for nested transactions using write-ahead logging","volume":"31","author":"Mohan C","year":"1988","unstructured":"C Mohan and K Rothermel. 1988. Recovery protocol for nested transactions using write-ahead logging. IBM Tech. Disclosure Bull. 31, 4 (Sept 1988) (1988).","journal-title":"IBM Tech. Disclosure Bull."},{"key":"e_1_2_1_103_1","unstructured":"Inc. MongoDB. 2023. BSON. 
https:\/\/pymongo.readthedocs.io\/en\/stable\/api\/bson\/index.html."},{"key":"e_1_2_1_104_1","volume-title":"Ray: A distributed framework for emerging {AI} applications. In 13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 561\u2013577.","author":"Moritz Philipp","year":"2018","unstructured":"Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I Jordan, et al. 2018. Ray: A distributed framework for emerging {AI} applications. In 13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 561\u2013577."},{"key":"e_1_2_1_105_1","volume-title":"Proceedings. IEEE, 241\u2013253","author":"Morrey CB","year":"2003","unstructured":"CB Morrey and Dirk Grunwald. 2003. Peabody: The time travelling disk. In 20th IEEE\/11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003.(MSST 2003). Proceedings. IEEE, 241\u2013253."},{"key":"e_1_2_1_106_1","unstructured":"Andreas Mueller. 2024. WordCloud for Python documentation. https:\/\/amueller.github.io\/word_cloud\/."},{"key":"e_1_2_1_107_1","first-page":"293","article-title":"Method and system for providing transparent incremental and multiprocess checkpointing to computer applications","volume":"7","author":"Neary Michael Oliver","year":"2007","unstructured":"Michael Oliver Neary, Ashwani Wason, Shvetima Gulati, and Fabrice Ferval. 2007. Method and system for providing transparent incremental and multiprocess checkpointing to computer applications. US Patent 7,293,200.","journal-title":"US Patent"},{"key":"e_1_2_1_108_1","unstructured":"IBM Netezza. 2024. Non-deterministic SQL. https:\/\/www.ibm.com\/docs\/en\/netezza?topic=environment-non-deterministic-sql."},{"key":"e_1_2_1_109_1","unstructured":"Inc. NumFOCUS. 2023. Pandas. https:\/\/pandas.pydata.org\/docs\/index.html."},{"key":"e_1_2_1_110_1","unstructured":"Inc. NumFOCUS. 2024. Pandas - DataFrame. 
https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.html."},{"key":"e_1_2_1_111_1","unstructured":"Oracle. 2024. MySQL. https:\/\/www.mysql.com\/."},{"key":"e_1_2_1_112_1","unstructured":"Oracle. 2024. MySQL - Point-in-Time (Incremental) Recovery. https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/point-in-time-recovery.html."},{"key":"e_1_2_1_113_1","unstructured":"Jim Ormond. 2018. ACM Recognizes Innovators Who Have Shaped the Digital Revolution."},{"key":"e_1_2_1_114_1","volume-title":"Why Jupyter is data scientists' computational notebook of choice. Nature 563, 7732","author":"Perkel Jeffrey M","year":"2018","unstructured":"Jeffrey M Perkel. 2018. Why Jupyter is data scientists' computational notebook of choice. Nature 563, 7732 (2018), 145\u2013147."},{"key":"e_1_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2013.6606581"},{"key":"e_1_2_1_116_1","unstructured":"photutils. 2024. photutils - ImageDepth. https:\/\/photutils.readthedocs.io\/en\/stable\/api\/photutils.utils.ImageDepth.html."},{"key":"e_1_2_1_117_1","doi-asserted-by":"publisher","DOI":"10.5555\/1267411.1267429"},{"key":"e_1_2_1_118_1","unstructured":"Plotly. 2024. Plotly - Low-Code Python Data Apps. https:\/\/plotly.com\/."},{"key":"e_1_2_1_119_1","unstructured":"Polars. 2024. Polars - DataFrames for the new era. https:\/\/pola.rs\/."},{"key":"e_1_2_1_120_1","unstructured":"Polars. 2024. Polars - LazyFrame. https:\/\/docs.pola.rs\/py-polars\/html\/reference\/lazyframe\/index.html."},{"key":"e_1_2_1_121_1","volume-title":"PBC formerly RStudio","author":"Posit Software PBC","year":"2023","unstructured":"PBC Posit Software, PBC formerly RStudio. 2023. Posit RStudio. https:\/\/posit.co\/."},{"key":"e_1_2_1_122_1","unstructured":"NLTK Project. 2024. NLTK - Natural Language Toolkit. https:\/\/www.nltk.org\/."},{"key":"e_1_2_1_123_1","unstructured":"Python. 2024. shelve \u2014 Python object persistence. 
https:\/\/docs.python.org\/3\/library\/shelve.html."},{"key":"e_1_2_1_124_1","unstructured":"PyTorch. 2024. torch.tensor. https:\/\/pytorch.org\/docs\/stable\/tensors.html."},{"key":"e_1_2_1_125_1","unstructured":"Qiskit. 2024. Qiskit - An open-source SDK for working with quantum computers at the level of extended quantum circuits operators and primitives. https:\/\/pypi.org\/project\/qiskit\/."},{"key":"e_1_2_1_126_1","unstructured":"ray project. 2024. Overview of Ray. https:\/\/github.com\/ray-project\/ray-educational-materials\/blob\/main\/Introductory_modules\/Overview_of_Ray.ipynb."},{"key":"e_1_2_1_127_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173606"},{"key":"e_1_2_1_128_1","doi-asserted-by":"crossref","unstructured":"Kenneth Salem and Hector Garcia-Molina. 1989. Checkpointing memory-resident databases. In ICDE. 452\u2013462.","DOI":"10.1109\/ICDE.1989.47249"},{"key":"e_1_2_1_129_1","doi-asserted-by":"publisher","DOI":"10.1145\/3335783.3335792"},{"key":"e_1_2_1_130_1","unstructured":"scikit learn. 2024. scikit-learn - Machine Learning in Python. https:\/\/scikit-learn.org\/stable\/."},{"key":"e_1_2_1_131_1","unstructured":"scikit-learn developers. 2024. sklearn - GaussianMixture. https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.mixture.GaussianMixture.html."},{"key":"e_1_2_1_132_1","unstructured":"scikit-learn intelex. 2024. Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application. https:\/\/pypi.org\/project\/scikit-learn-intelex\/."},{"key":"e_1_2_1_133_1","unstructured":"SciPy. 2024. SciPy - Fundamental algorithms for scientific computing in Python. https:\/\/scipy.org\/."},{"key":"e_1_2_1_134_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565838.3565855"},{"key":"e_1_2_1_135_1","unstructured":"shove. 2024. shove - https:\/\/pypi.org\/project\/shove\/. 
https:\/\/pypi.org\/project\/shove\/."},{"key":"e_1_2_1_136_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544817"},{"key":"e_1_2_1_137_1","unstructured":"Apache Spark. 2024. PySpark Documentation. https:\/\/spark.apache.org\/docs\/3.3.1\/api\/python\/index.html."},{"key":"e_1_2_1_138_1","unstructured":"Apache Spark. 2024. pyspark.sql module. https:\/\/spark.apache.org\/docs\/2.4.0\/api\/python\/pyspark.sql.html."},{"key":"e_1_2_1_139_1","unstructured":"SQLite. 2024. SQLite. https:\/\/www.sqlite.org\/."},{"key":"e_1_2_1_140_1","unstructured":"StackOverflow. 2024. Undo Pandas Dataframe Column Drop - StackOverflow. https:\/\/stackoverflow.com\/questions\/54284994\/how-to-get-columnsseries-back-from-dropped-table."},{"key":"e_1_2_1_141_1","unstructured":"statsmodels developers. 2024. statsmodels - statistical models hypothesis tests and data exploration. https:\/\/www.statsmodels.org\/stable\/index.html."},{"key":"e_1_2_1_142_1","unstructured":"NumPy Team. 2024. NumPy - the fundamental package for scientific computing with Python. https:\/\/numpy.org\/."},{"key":"e_1_2_1_143_1","unstructured":"Ray Team. 2024. ray.data.Dataset. https:\/\/docs.ray.io\/en\/latest\/data\/api\/doc\/ray.data.Dataset.html."},{"key":"e_1_2_1_144_1","unstructured":"The IPython Development Team. 2023. IPython Interactive Computing. https:\/\/ipython.org\/."},{"key":"e_1_2_1_145_1","unstructured":"The IPython Development Team. 2023. Jupyter checkpoint. https:\/\/jupyter-server.readthedocs.io\/en\/latest\/developers\/contents.html."},{"key":"e_1_2_1_146_1","unstructured":"The IPython Development Team. 2023. Jupyter store magic. https:\/\/ipython.readthedocs.io\/en\/stable\/config\/extensions\/storemagic.html."},{"key":"e_1_2_1_147_1","unstructured":"The Matplotlib Development Team. 2023. Matplotlib. https:\/\/matplotlib.org\/."},{"key":"e_1_2_1_148_1","unstructured":"TensorFlow. 2024. TensorFlow - An end-to-end platform for machine learning. 
https:\/\/www.tensorflow.org\/."},{"key":"e_1_2_1_149_1","unstructured":"TensorFlow. 2024. tf.Tensor. https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/Tensor."},{"key":"e_1_2_1_150_1","unstructured":"transformers. 2024. transformers - State-of-the-art Machine Learning for JAX PyTorch and TensorFlow. https:\/\/pypi.org\/project\/transformers\/."},{"key":"e_1_2_1_151_1","unstructured":"Cornell University. 2021. SKLearn Tweet Classification. https:\/\/github.com\/CornellCAC\/CVW_PyDataSci2\/blob\/master\/code\/sklearn_tweet_classification.ipynb."},{"key":"e_1_2_1_152_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056101"},{"key":"e_1_2_1_153_1","doi-asserted-by":"publisher","DOI":"10.1145\/356725.356730"},{"key":"e_1_2_1_154_1","unstructured":"Devlikamov Vlad. 2022. [TPS-Mar] Fast workflow using scikit-learn-intelex. https:\/\/www.kaggle.com\/code\/lordozvlad\/tps-mar-fast-workflow-using-scikit-learn-intelex\/notebook."},{"key":"e_1_2_1_155_1","volume-title":"AIC model selection using Akaike weights. Psychonomic bulletin & review 11, 1","author":"Wagenmakers Eric-Jan","year":"2004","unstructured":"Eric-Jan Wagenmakers and Simon Farrell. 2004. AIC model selection using Akaike weights. Psychonomic bulletin & review 11, 1 (2004), 192\u2013196."},{"key":"e_1_2_1_156_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3502123"},{"key":"e_1_2_1_157_1","unstructured":"Michael Waskom. 2024. seaborn: statistical data visualization. https:\/\/seaborn.pydata.org\/."},{"key":"e_1_2_1_158_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445527"},{"key":"e_1_2_1_159_1","unstructured":"Wikipedia. 2024. Lowest Common Ancestor. https:\/\/en.wikipedia.org\/wiki\/Lowest_common_ancestor."},{"key":"e_1_2_1_160_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379337.3415851"},{"key":"e_1_2_1_161_1","unstructured":"xgboost developers. 2024. XGBoost Documentation. 
https:\/\/xgboost.readthedocs.io\/en\/stable\/."},{"key":"e_1_2_1_162_1","volume-title":"Enhancing the interactivity of dataframe queries by leveraging think time. arXiv preprint arXiv:2103.02145","author":"Xin Doris","year":"2021","unstructured":"Doris Xin, Devin Petersohn, Dixin Tang, Yifan Wu, Joseph E Gonzalez, Joseph M Hellerstein, Anthony D Joseph, and Aditya G Parameswaran. 2021. Enhancing the interactivity of dataframe queries by leveraging think time. arXiv preprint arXiv:2103.02145 (2021)."},{"key":"e_1_2_1_163_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3058744"},{"key":"e_1_2_1_164_1","unstructured":"xxHash. 2023. xxHash - Extremely fast non-cryptographic hash algorithm. https:\/\/github.com\/Cyan4973\/xxHash."},{"key":"e_1_2_1_165_1","volume-title":"2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10)","author":"Zaharia Matei","year":"2010","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10)."},{"key":"e_1_2_1_166_1","volume-title":"PanoramicData: Data analysis through pen & touch","author":"Zgraggen Emanuel","year":"2014","unstructured":"Emanuel Zgraggen, Robert Zeleznik, and Steven M Drucker. 2014. PanoramicData: Data analysis through pen & touch. 
IEEE transactions on visualization and computer graphics 20, 12 (2014), 2112\u20132121."},{"key":"e_1_2_1_167_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685085"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3717755.3717759","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T16:18:45Z","timestamp":1747757925000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3717755.3717759"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12]]},"references-count":167,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["10.14778\/3717755.3717759"],"URL":"https:\/\/doi.org\/10.14778\/3717755.3717759","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,12]]},"assertion":[{"value":"2025-05-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}