{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T09:02:34Z","timestamp":1775638954527,"version":"3.50.1"},"reference-count":124,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T00:00:00Z","timestamp":1739145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006374","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["200021_204620"],"award-info":[{"award-number":["200021_204620"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,2,10]]},"abstract":"<jats:p>In real-world machine learning (ML) pipelines, datasets are continuously growing. Models must incorporate this new training data to improve generalization and adapt to potential distribution shifts. The cost of model retraining is proportional to how frequently the model is retrained and how much data it is trained on, which makes the naive approach of retraining from scratch each time impractical. We present Modyn, a data-centric end-to-end machine learning platform. Modyn's ML pipeline abstraction enables users to declaratively describe policies for continuously training a model on a growing dataset. Modyn pipelines allow users to apply data selection policies (to reduce the number of data points) and triggering policies (to reduce the number of trainings). Modyn executes and orchestrates these continuous ML training pipelines. The system is open-source and comes with an ecosystem of benchmark datasets, models, and tooling. 
We formally discuss how to measure the performance of ML pipelines by introducing the concept of composite models, enabling fair comparison of pipelines with different data selection and triggering policies. We empirically analyze how various data selection and triggering policies impact model accuracy, and also show that Modyn enables high throughput training with sample-level data selection.<\/jats:p>","DOI":"10.1145\/3709705","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T15:45:06Z","timestamp":1739288706000},"page":"1-30","source":"Crossref","is-referenced-by-count":3,"title":["Modyn: Data-Centric Machine Learning Pipeline Orchestration"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4093-4361","authenticated-orcid":false,"given":"Maximilian","family":"B\u00f6ther","sequence":"first","affiliation":[{"name":"ETH Zurich, Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-3451-5602","authenticated-orcid":false,"given":"Ties","family":"Robroek","sequence":"additional","affiliation":[{"name":"IT University of Copenhagen, Copenhagen, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6750-5500","authenticated-orcid":false,"given":"Viktor","family":"Gsteiger","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7505-0673","authenticated-orcid":false,"given":"Robin","family":"Holzinger","sequence":"additional","affiliation":[{"name":"Technical University of Munich, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-1400-6735","authenticated-orcid":false,"given":"Xianzhe","family":"Ma","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich, 
Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6838-4854","authenticated-orcid":false,"given":"P\u0131nar","family":"T\u00f6z\u00fcn","sequence":"additional","affiliation":[{"name":"IT University of Copenhagen, Copenhagen, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8559-0529","authenticated-orcid":false,"given":"Ana","family":"Klimovic","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485462"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2024.111535"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NeurIPS).","author":"Aljundi Rahaf","year":"2019","unstructured":"Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, and Lucas Page-Caccia. 2019a. Online Continual Learning with Maximal Interfered Retrieval. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NeurIPS).","author":"Aljundi Rahaf","year":"2019","unstructured":"Rahaf Aljundi, Min Lin, Baptiste Goujaud, and Yoshua Bengio. 2019b. Gradient based sample selection for online continual learning. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_5_1","unstructured":"Amazon. 2023. Amazon SageMaker. https:\/\/docs.aws.amazon.com\/sagemaker\/index.html."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","unstructured":"arXiv.org submitters. 2024. arXiv Kaggle Dataset. 
https:\/\/doi.org\/10.34740\/KAGGLE\/DSV\/7548853","DOI":"10.34740\/KAGGLE\/DSV\/7548853"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr46437.2021.00812"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2018.04.014"},{"key":"e_1_2_1_9_1","unstructured":"Michael Bayer. 2012. SQLAlchemy. In The Architecture of Open Source Applications Volume II: Structure Scale and a Few More Fearless Hacks. aosabook.org. http:\/\/aosabook.org\/en\/sqlalchemy.html"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the USENIX Conference on Operational Machine Learning (OpML).","author":"Baylor Denis","year":"2019","unstructured":"Denis Baylor, Kevin Haas, Konstantinos Katsiapis, Sammy Leong, Rose Liu, Clemens Mewald, Hui Miao, Neoklis Polyzotis, Mitchell Trott, and Martin Zinkevich. 2019. Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform. In Proceedings of the USENIX Conference on Operational Machine Learning (OpML)."},{"key":"e_1_2_1_11_1","unstructured":"BentoML. 2023. BentoML: Github Organization. https:\/\/github.com\/bentoml\/. Accessed: 2023-11-28."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI).","author":"Bhardwaj Romil","year":"2022","unstructured":"Romil Bhardwaj, Zhengxu Xia, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao Shu, Nikolaos Karianakis, Kevin Hsieh, Paramvir Bahl, and Ion Stoica. 2022. Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI)."},{"key":"e_1_2_1_13_1","unstructured":"Lukas Biewald. 2020. Experiment Tracking with Weights and Biases. 
https:\/\/www.wandb.com\/."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3578356.3592585"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/sp40001.2021.00019"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/iccv48922.2021.00817"},{"key":"e_1_2_1_17_1","volume-title":"Bulletin of the Technical Committee on Data Engineering","volume":"38","author":"Carbone Paris","year":"2015","unstructured":"Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink\u2122: Stream and Batch Processing in a Single Engine. Bulletin of the Technical Committee on Data Engineering, Vol. 38, 4 (2015)."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NeurIPS).","author":"Cauwenberghs Gert","unstructured":"Gert Cauwenberghs and Tomaso A. Poggio. 2000. Incremental and Decremental Support Vector Machine Learning. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3436905.3436911"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3399579.3399867"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620678.3624661"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR).","author":"Coleman Cody","year":"2020","unstructured":"Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, and Matei Zaharia. 2020. Selection via Proxy: Efficient Data Selection for Deep Learning. In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_23_1","unstructured":"Criteo. 2013. Download Terabyte Click Logs. 
https:\/\/labs.criteo.com\/2013\/12\/download-terabyte-click-logs\/."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5441\/002\/EDBT.2019.35"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1903.05202"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/cidue.2011.5948491"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460231.3474599"},{"key":"e_1_2_1_29_1","unstructured":"European Union. 2016. Art. 17 GDPR: Right to erasure ('right to be forgotten'). https:\/\/gdpr.eu\/article-17-right-to-be-forgotten\/."},{"key":"e_1_2_1_30_1","volume-title":"Evidently: Collaborative AI observability platform. https:\/\/www.evidentlyai.com\/. Accessed: 2024-06-26.","author":"Evidently","year":"2024","unstructured":"Evidently AI. 2024. Evidently: Collaborative AI observability platform. https:\/\/www.evidentlyai.com\/. Accessed: 2024-06-26."},{"key":"e_1_2_1_31_1","unstructured":"Clement Farabet and Nicolas Koumchatzky. 2020. Presentation: Inside NVIDIA's AI Infrastructure for Self-driving Cars. In Presentations of the USENIX Conference on Operational Machine Learning (OpML). https:\/\/www.usenix.org\/conference\/opml20\/presentation\/farabet"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr52729.2023.01144"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589273"},{"key":"e_1_2_1_34_1","article-title":"A Kernel Two-Sample Test","volume":"13","author":"Gretton Arthur","year":"2012","unstructured":"Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Sch\u00f6lkopf, and Alexander Smola. 2012. A Kernel Two-Sample Test. Journal of Machine Learning Research, Vol. 
13, 25 (2012).","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/iccv51070.2023.01728"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2307.07507"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00059"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr.2016.90"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2648584.2648589"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106622"},{"key":"e_1_2_1_41_1","unstructured":"Hopsworks AB. 2024. Hopsworks Feature Monitoring. https:\/\/www.hopsworks.ai\/dictionary\/feature-monitoring."},{"key":"e_1_2_1_42_1","unstructured":"Chip Huyen. 2020. Machine learning is going real-time. https:\/\/huyenchip.com\/2020\/12\/27\/real-time-machine-learning.html."},{"key":"e_1_2_1_43_1","volume-title":"Designing Machine Learning Systems","author":"Huyen Chip","unstructured":"Chip Huyen. 2022a. Designing Machine Learning Systems. O'Reilly Media, Inc."},{"key":"e_1_2_1_44_1","unstructured":"Chip Huyen. 2022b. Real-time machine learning: challenges and solutions. https:\/\/huyenchip.com\/2022\/01\/02\/real-time-machine-learning-challenges-and-solutions.html."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr46437.2021.00814"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2311.09930"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML).","author":"Katharopoulos Angelos","year":"2018","unstructured":"Angelos Katharopoulos and Fran\u00e7ois Fleuret. 2018. Not All Samples Are Created Equal: Deep Learning with Importance Sampling. 
In Proceedings of the International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML).","author":"Killamsetty KrishnaTeja","unstructured":"KrishnaTeja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Abir De, and Rishabh K. Iyer. 2021. GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training. In Proceedings of the International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_49_1","volume-title":"Does 'Deep Learning on a Data Diet' reproduce? Overall yes, but GraNd at Initialization does not. Transactions on Machine Learning Research","author":"Kirsch Andreas","year":"2023","unstructured":"Andreas Kirsch. 2023. Does 'Deep Learning on a Data Diet' reproduce? Overall yes, but GraNd at Initialization does not. Transactions on Machine Learning Research (2023)."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR).","author":"Koh Hyunseo","year":"2022","unstructured":"Hyunseo Koh, Dahyun Kim, Jung-Woo Ha, and Jonghyun Choi. 2022. Online Continual Learning on Class Incremental Blurry Task Configuration with Anytime Inference. In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_51_1","unstructured":"Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Technical Report. University of Toronto Toronto Ontario. https:\/\/www.cs.toronto.edu\/~kriz\/learning-features-2009-TR.pdf"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the Conference on Machine Learning and Systems (MLSys).","author":"Kuchnik Michael","year":"2022","unstructured":"Michael Kuchnik, Ana Klimovic, Jiri Simsa, Virginia Smith, and George Amvrosiadis. 2022. Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines. 
In Proceedings of the Conference on Machine Learning and Systems (MLSys)."},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NeurIPS).","author":"Lazaridou Angeliki","year":"2021","unstructured":"Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tom\u00e1s Kocisk\u00fd, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, and Phil Blunsom. 2021. Mind the Gap: Assessing Temporal Generalization in Neural Language Models. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NeurIPS).","author":"Li Aodong","year":"2021","unstructured":"Aodong Li, Alex Boyd, Padhraic Smyth, and Stephan Mandt. 2021. Detecting and Adapting to Irregular Distribution Shifts in Bayesian Online Learning. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588698"},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR).","author":"Lopez-Paz David","year":"2017","unstructured":"David Lopez-Paz and Maxime Oquab. 2017. Revisiting Classifier Two-Sample Tests. In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NeurIPS).","author":"Lopez-Paz David","year":"2017","unstructured":"David Lopez-Paz and Marc'Aurelio Ranzato. 2017. Gradient Episodic Memory for Continual Learning. 
In Proceedings of Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/tkde.2018.2876857"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2310.04216"},{"key":"e_1_2_1_61_1","volume-title":"Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI).","author":"Matam Kiran Kumar","year":"2024","unstructured":"Kiran Kumar Matam, Hani Ramezani, Fan Wang, Zeliang Chen, Yue Dong, Maomao Ding, Zhiwei Zhao, Zhengyu Zhang, Ellie Wen, and Assaf Eisenman. 2024. QuickUpdate: a Real-Time Personalization System for Large-Scale Recommendation Models. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI)."},{"key":"e_1_2_1_62_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML).","author":"Mindermann S\u00f6ren","year":"2022","unstructured":"S\u00f6ren Mindermann, Jan Markus Brauner, Muhammed Razzak, Mrinank Sharma, Andreas Kirsch, Winnie Xu, Benedikt H\u00f6ltgen, Aidan N. Gomez, Adrien Morisot, Sebastian Farquhar, and Yarin Gal. 2022. Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt. In Proceedings of the International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_63_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML).","author":"Mirzasoleiman Baharan","year":"2020","unstructured":"Baharan Mirzasoleiman, Jeff A. Bilmes, and Jure Leskovec. 2020. Coresets for Data-efficient Training of Machine Learning Models. In Proceedings of the International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_64_1","volume-title":"News Category Dataset. arXiv","author":"Misra Rishabh","year":"2022","unstructured":"Rishabh Misra. 2022. News Category Dataset. 
arXiv (2022)."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098021"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3533727"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476374"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403205"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","unstructured":"Maxim Naumov Dheevatsa Mudigere Hao-Jun Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Misha Smelyanskiy. 2019. Deep Learning Recommendation Model for Personalization and Recommendation Systems. (2019). https:\/\/doi.org\/10.48550\/ARXIV.1906.00091","DOI":"10.48550\/ARXIV.1906.00091"},{"key":"e_1_2_1_70_1","unstructured":"Neptune. 2023. Neptune.ai ML Metadata Store. https:\/\/neptune.ai\/."},{"key":"e_1_2_1_71_1","unstructured":"NVIDIA. 2023. NVIDIA Triton Inference Server. https:\/\/developer.nvidia.com\/nvidia-triton-inference-server. Accessed: 2023-11-28."},{"key":"e_1_2_1_72_1","unstructured":"NVIDIA. 2024. NVIDIA DLRM Example Implementation. https:\/\/github.com\/NVIDIA\/DeepLearningExamples\/tree\/master\/PyTorch\/Recommendation\/DLRM. Accessed: 2024-06-26."},{"key":"e_1_2_1_73_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR).","author":"Okanovic Patrik","year":"2023","unstructured":"Patrik Okanovic, Roger Waleffe, Vasilis Mageirakos, Konstantinos E. Nikolakakis, Amin Karbasi, Dionysis Kalogerias, Nezihe Merve G\u00fcrel, and Theodoros Rekatsinas. 2023. Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning. 
In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1712.06139"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3533378"},{"key":"e_1_2_1_76_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NeurIPS).","author":"Paul Mansheej","year":"2021","unstructured":"Mansheej Paul, Surya Ganguli, and Gintare Karolina Dziugaite. 2021. Deep Learning on a Data Diet: Finding Important Examples Early in Training. In Proceedings of Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/3543873.3587561"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/5326.983933"},{"key":"e_1_2_1_79_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML). https:\/\/proceedings.mlr.press\/v162\/pooladzandi22a.html","author":"Pooladzandi Omead","year":"2022","unstructured":"Omead Pooladzandi, David Davini, and Baharan Mirzasoleiman. 2022. Adaptive Second Order Coresets for Data-efficient Machine Learning. In Proceedings of the International Conference on Machine Learning (ICML). https:\/\/proceedings.mlr.press\/v162\/pooladzandi22a.html"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.09253"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58536-5_31"},{"key":"e_1_2_1_82_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NeurIPS).","author":"Pruthi Garima","year":"2020","unstructured":"Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. 2020. Estimating Training Data Influence by Tracing Gradient Descent. In Proceedings of Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_83_1","unstructured":"Pydantic Contributors. 2024. Pydantic Documentation. 
https:\/\/docs.pydantic.dev\/latest\/. Accessed: 2024-07-07."},{"key":"e_1_2_1_84_1","unstructured":"PyTorch Serve Contributors. 2020. TorchServe: Docs. https:\/\/pytorch.org\/serve\/. Accessed: 2023-11-28."},{"key":"e_1_2_1_85_1","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NeurIPS).","author":"Rabanser Stephan","unstructured":"Stephan Rabanser, Stephan G\u00fcnnemann, and Zachary C. Lipton. 2019. Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","unstructured":"Srikumar Ramalingam Daniel Glasner Kaushal Patel Raviteja Vemulapalli Sadeep Jayasumana and Sanjiv Kumar. 2021. Less is more: Selecting informative and diverse subsets with balancing constraints. (2021). https:\/\/doi.org\/10.48550\/arXiv.2104.12835","DOI":"10.48550\/arXiv.2104.12835"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2309.08250"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.5441\/002\/EDBT.2021.07"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-024-00835-2"},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2011.08.019"},{"key":"e_1_2_1_91_1","volume-title":"Proceedings of the Workshop on Energy Efficient Machine Learning and Cognitive Computing at NeurIPS.","author":"Sanh Victor","year":"2020","unstructured":"Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. 
In Proceedings of the Workshop on Energy Efficient Machine Learning and Cognitive Computing at NeurIPS."},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/3555041.3589682"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3229867"},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380604"},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1145\/3653697"},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2205.11473"},{"key":"e_1_2_1_97_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565838.3565853"},{"key":"e_1_2_1_98_1","volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Sima Chijun","year":"2022","unstructured":"Chijun Sima, Yao Fu, Man-Kit Sit, Liyi Guo, Xuri Gong, Feng Lin, Junyu Wu, Yongsheng Li, Haidong Rong, Pierre-Louis Aublin, and Luo Mai. 2022. Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},{"key":"e_1_2_1_99_1","unstructured":"Maciej Sobczak and GitHub Contributors. 2023. SOCI - The C++ Database Access Library. https:\/\/github.com\/SOCI\/soci. Accessed: 2023-11-28."},{"key":"e_1_2_1_100_1","unstructured":"State of California USA. 2018. Section 1798.130 CCPA. 
https:\/\/ccpa-info.com\/california-consumer-privacy-act-full-text\/."},{"key":"e_1_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2023.111615"},{"key":"e_1_2_1_102_1","doi-asserted-by":"publisher","DOI":"10.1145\/16856.16888"},{"key":"e_1_2_1_103_1","doi-asserted-by":"publisher","DOI":"10.48786\/edbt.2023.37"},{"key":"e_1_2_1_104_1","doi-asserted-by":"publisher","DOI":"10.48786\/EDBT.2022.12"},{"key":"e_1_2_1_105_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML).","author":"Tahmasbi Ashraf","unstructured":"Ashraf Tahmasbi, Ellango Jothimurugesan, Srikanta Tirthapura, and Phillip B. Gibbons. 2021. DriftSurf: Stable-State \/ Reactive-State Learning under Concept Drift. In Proceedings of the International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_106_1","unstructured":"Tesla. 2019. Tesla Autonomy Day. https:\/\/www.youtube.com\/watch?v=Ucp0TTmvqOE&t=6678s."},{"key":"e_1_2_1_107_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267809.3267817"},{"key":"e_1_2_1_108_1","unstructured":"Josh Tobin. 2021. Toward continual learning systems. https:\/\/gantry.io\/blog\/toward-continual-learning-systems\/."},{"key":"e_1_2_1_109_1","volume-title":"Alibi Detect: Algorithms for outlier, adversarial and drift detection. https:\/\/github.com\/SeldonIO\/alibi-detect","author":"Looveren Arnaud Van","year":"2019","unstructured":"Arnaud Van Looveren, Janis Klaise, Giovanni Vacanti, Oliver Cobb, Ashley Scillitoe, Robert Samoilescu, and Alex Athorne. 2019. Alibi Detect: Algorithms for outlier, adversarial and drift detection. 
https:\/\/github.com\/SeldonIO\/alibi-detect"},{"key":"e_1_2_1_110_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-022-15245-z"},{"key":"e_1_2_1_111_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2312.01700"},{"key":"e_1_2_1_112_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2023.23087"},{"key":"e_1_2_1_113_1","doi-asserted-by":"publisher","DOI":"10.5220\/0012758600003756"},{"key":"e_1_2_1_114_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr42600.2020.00265"},{"key":"e_1_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_2_1_116_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2022.05.014"},{"key":"e_1_2_1_117_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr.2019.00046"},{"key":"e_1_2_1_118_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380571"},{"key":"e_1_2_1_119_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457566"},{"key":"e_1_2_1_120_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3615456"},{"key":"e_1_2_1_121_1","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NeurIPS) (Benchmark Track).","author":"Yao Huaxiu","year":"2022","unstructured":"Huaxiu Yao, Caroline Choi, Bochuan Cao, Yoonho Lee, Pang Wei Koh, and Chelsea Finn. 2022. Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. 
In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS) (Benchmark Track)."},{"key":"e_1_2_1_122_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2022\/788"},{"key":"e_1_2_1_123_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934664"},{"key":"e_1_2_1_124_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3533044"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709705","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3709705","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:20:11Z","timestamp":1774981211000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709705"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,10]]},"references-count":124,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2,10]]}},"alternative-id":["10.1145\/3709705"],"URL":"https:\/\/doi.org\/10.1145\/3709705","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,10]]}}}