{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T09:02:34Z","timestamp":1775638954538,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":79,"publisher":"ACM","funder":[{"name":"Samsung Electronics"},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["RS-2024-00340099"],"award-info":[{"award-number":["RS-2024-00340099"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Institute for Information & communication Technology Planning & evaluation","award":["RS-2023-00215700"],"award-info":[{"award-number":["RS-2023-00215700"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,13]]},"DOI":"10.1145\/3731569.3764847","type":"proceedings-article","created":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T12:43:24Z","timestamp":1759322604000},"page":"589-605","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["SAND: A New Programming Abstraction for Video-based Deep Learning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-7673-1279","authenticated-orcid":false,"given":"Juncheol","family":"Ye","sequence":"first","affiliation":[{"name":"KAIST, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-5570-9814","authenticated-orcid":false,"given":"Seungkook","family":"Lee","sequence":"additional","affiliation":[{"name":"KAIST, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9872-6234","authenticated-orcid":false,"given":"Hwijoon","family":"Lim","sequence":"additional","affiliation":[{"name":"KAIST, Daejeon, Republic of 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0986-213X","authenticated-orcid":false,"given":"Jihyuk","family":"Lee","sequence":"additional","affiliation":[{"name":"Chung-Ang University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8526-404X","authenticated-orcid":false,"given":"Uitaek","family":"Hong","sequence":"additional","affiliation":[{"name":"KAIST, Daejeon, Republic of Korea"},{"name":"Maum.AI, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5602-2397","authenticated-orcid":false,"given":"Youngjin","family":"Kwon","sequence":"additional","affiliation":[{"name":"KAIST, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6922-7244","authenticated-orcid":false,"given":"Dongsu","family":"Han","sequence":"additional","affiliation":[{"name":"KAIST, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"2023. Runway Gen-2: text-to-video generation. Runway Research Release. Describes Gen-2 as a multimodal AI system generating video from text image or video input."},{"key":"e_1_3_2_2_2_1","unstructured":"Amazon AWS. 2025. Amazon S3 Official Webpage. https:\/\/aws.amazon.com\/s3\/."},{"key":"e_1_3_2_2_3_1","unstructured":"Amazon AWS. 2025. AWS P3 Instance Official Webpage. https:\/\/aws.amazon.com\/ec2\/instance-types\/p3\/?nc1=h_ls."},{"key":"e_1_3_2_2_4_1","unstructured":"Amazon Kinesis. 2025. Amazon Kinesis Video Streams official webpage. https:\/\/aws.amazon.com\/pm\/kinesis\/?nc1=h_ls."},{"key":"e_1_3_2_2_5_1","unstructured":"Amazon SageMaker. 2025. Amazon SageMaker official webpage. 
https:\/\/aws.amazon.com\/sagemaker\/."},{"key":"e_1_3_2_2_6_1","unstructured":"Apache Kafka. 2025. Apache Kafka official webpage. https:\/\/kafka.apache.org\/."},{"key":"e_1_3_2_2_7_1","volume-title":"2021 IEEE international symposium on multimedia (ISM). IEEE, 226\u2013234","author":"Apostolidis Evlampios","year":"2021","unstructured":"Evlampios Apostolidis, Georgios Balaouras, Vasileios Mezaris, and Ioannis Patras. 2021. Combining global and local attention with positional encoding for video summarization. In 2021 IEEE international symposium on multimedia (ISM). IEEE, 226\u2013234."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4842-4850-8_4"},{"key":"e_1_3_2_2_9_1","unstructured":"Tim Brooks Bill Peebles Connor Holmes Will DePue Yufei Guo et al. 2024. Video generation models as world simulators. OpenAI Technical Report. Describes training of Sora on videos and images for text-conditional video generation (latent diffusion transformer)."},{"key":"e_1_3_2_2_10_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition.","author":"Chan Kelvin C.K.","year":"2022","unstructured":"Kelvin C.K. Chan, Shangchen Zhou, Xiangyu Xu, and Chen Change Loy. 2022. BasicVSR++: Improving video super-resolution with enhanced propagation and alignment. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_2_11_1","volume-title":"LiFteR: Unleash Learned Codecs in Video Streaming with Loose Frame Referencing. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)","author":"Chen Bo","year":"2024","unstructured":"Bo Chen, Zhisheng Yan, Yinjie Zhang, Zhe Yang, and Klara Nahrstedt. 2024. LiFteR: Unleash Learned Codecs in Video Streaming with Loose Frame Referencing. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA, 533\u2013548. 
https:\/\/www.usenix.org\/conference\/nsdi24\/presentation\/chen-bo"},{"key":"e_1_3_2_2_12_1","volume-title":"Video mamba suite: State space model as a versatile alternative for video understanding. arXiv preprint arXiv:2403.09626","author":"Chen Guo","year":"2024","unstructured":"Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, and Limin Wang. 2024. Video mamba suite: State space model as a versatile alternative for video understanding. arXiv preprint arXiv:2403.09626 (2024)."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3047966"},{"key":"e_1_3_2_2_14_1","unstructured":"Cisco. 2025. openh264 github repository. https:\/\/github.com\/cisco\/openh264."},{"key":"e_1_3_2_2_15_1","unstructured":"NVIDIA Corporation. Accessed: 2022. NVIDIA DALI. https:\/\/developer.nvidia.com\/DALI."},{"key":"e_1_3_2_2_16_1","unstructured":"DMLC Contributors. 2025. Decord github repository. https:\/\/github.com\/dmlc\/decord."},{"key":"e_1_3_2_2_17_1","volume-title":"Revisiting Skeleton-based Action Recognition. 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Jun 2022","author":"Duan Haodong","year":"2022","unstructured":"Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, and Bo Dai. 2022. Revisiting Skeleton-based Action Recognition. 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Jun 2022). 10.1109\/cvpr52688.2022.00298"},{"key":"e_1_3_2_2_18_1","unstructured":"Haoqi Fan Yanghao Li Bo Xiong Wan-Yen Lo and Christoph Feichtenhofer. 2020. PySlowFast. https:\/\/github.com\/facebookresearch\/slowfast."},{"key":"e_1_3_2_2_19_1","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision. IEEE, 6202\u20136211","author":"Feichtenhofer Christoph","year":"2019","unstructured":"Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. 2019. Slowfast networks for video recognition. 
In Proceedings of the IEEE\/CVF international conference on computer vision. IEEE, 6202\u20136211. https:\/\/openaccess.thecvf.com\/content_ICCV_2019\/html\/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.html"},{"key":"e_1_3_2_2_20_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 3299\u20133309","author":"Feichtenhofer Christoph","year":"2021","unstructured":"Christoph Feichtenhofer, Haoqi Fan, Bo Xiong, Ross Girshick, and Kaiming He. 2021. A large-scale study on unsupervised spatiotemporal representation learning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 3299\u20133309."},{"key":"e_1_3_2_2_21_1","unstructured":"Google Cloud Platform. 2025. GCP A2 Instance Official Webpage. https:\/\/cloud.google.com\/compute\/docs\/gpus#a100-gpus"},{"key":"e_1_3_2_2_22_1","unstructured":"Google Cloud Platform. 2025. GCP Filestore Official Webpage. https:\/\/cloud.google.com\/filestore."},{"key":"e_1_3_2_2_23_1","unstructured":"Google Cloud Platform. 2025. Google Cloud AutoML official webpage. https:\/\/cloud.google.com\/automl."},{"key":"e_1_3_2_2_24_1","unstructured":"Google WebM Project. 2025. libvpx github repository. 
https:\/\/github.com\/webmproject\/libvpx."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-020-01008-z"},{"key":"e_1_3_2_2_26_1","first-page":"7873","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper John","year":"2021","unstructured":"John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin \u017d\u00eddek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A A Kohl, Andrew J Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W Senior, Koray Kavukcuoglu, Pushmeet Kohli, and Demis Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596, 7873 (Aug. 2021), 583\u2013589.","journal-title":"Nature"},{"key":"e_1_3_2_2_27_1","volume-title":"The Kinetics Human Action Video Dataset. CoRR abs\/1705.06950","author":"Kay Will","year":"2017","unstructured":"Will Kay, Jo\u00e3o Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, and Andrew Zisserman. 2017. The Kinetics Human Action Video Dataset. CoRR abs\/1705.06950 (2017). arXiv:1705.06950 http:\/\/arxiv.org\/abs\/1705.06950"},{"key":"e_1_3_2_2_28_1","volume-title":"Proceedings of ACM SIGCOMM. Association for Computing Machinery","author":"Kim Jaehong","year":"2020","unstructured":"Jaehong Kim, Youngmok Jung, Hyunho Yeo, Juncheol Ye, and Dongsu Han. 2020. Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning. In Proceedings of ACM SIGCOMM. Association for Computing Machinery, New York, NY, USA, 107\u2013125. 
10.1145\/3387514.3405856"},{"key":"e_1_3_2_2_29_1","unstructured":"Jinhyung Kim Taeoh Kim Minho Shim Dongyoon Han Dongyoon Wee and Junmo Kim. 2022. Frequency Selective Augmentation for Video Representation Learning. arXiv:2204.03865 [cs.CV]"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"crossref","first-page":"863","DOI":"10.14778\/3636218.3636238","article-title":"FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation","volume":"17","author":"Kim Taeyoon","year":"2023","unstructured":"Taeyoon Kim, ChanHo Park, Mansur Mukimbekov, Heelim Hong, Minseok Kim, Ze Jin, Changdae Kim, Ji-Yong Shin, and Myeongjae Jeon. 2023. FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation. Proceedings of the VLDB Endowment 17, 4 (2023), 863\u2013876.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_2_2_31_1","series-title":"Lecture Notes in Computer Science","volume-title":"MotionSqueeze: Neural Motion Feature Learning for Video Understanding","author":"Kwon Heeseung","year":"2020","unstructured":"Heeseung Kwon, Manjin Kim, Suha Kwak, and Minsu Cho. 2020. MotionSqueeze: Neural Motion Feature Learning for Video Understanding. Lecture Notes in Computer Science (2020), 345\u2013362. 10.1007\/978-3-030-58517-4_21"},{"key":"e_1_3_2_2_32_1","volume-title":"ByteDance's TikTok cuts hundreds of jobs in shift towards AI content moderation","author":"Latiff Rozanna","year":"2024","unstructured":"Rozanna Latiff. 2024. ByteDance's TikTok cuts hundreds of jobs in shift towards AI content moderation. Reuters (11 October 2024). https:\/\/www.reuters.com\/technology\/bytedance-cuts-over-700-jobs-malaysia-shift-towards-ai-moderation-sources-say-2024-10-11\/"},{"key":"e_1_3_2_2_33_1","volume-title":"Proceedings of Machine Learning and Systems, I. Dhillon, D. Papailiopoulos, and V. 
Sze (Eds.)","volume":"2","author":"Li Liam","year":"2020","unstructured":"Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Jonathan Ben-tzur, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. 2020. A System for Massively Parallel Hyperparameter Tuning. In Proceedings of Machine Learning and Systems, I. Dhillon, D. Papailiopoulos, and V. Sze (Eds.), Vol. 2. 230\u2013246. https:\/\/proceedings.mlsys.org\/paper_files\/paper\/2020\/file\/a06f20b349c6cf09a6b171c71b88bbfc-Paper.pdf"},{"key":"e_1_3_2_2_34_1","volume-title":"Recurrent Video Restoration Transformer with Guided Deformable Attention. arXiv preprint arXiv:2206.02146","author":"Liang Jingyun","year":"2022","unstructured":"Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, and Luc Van Gool. 2022. Recurrent Video Restoration Transformer with Guided Deformable Attention. arXiv preprint arXiv:2206.02146 (2022)."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.ade2574"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2017.07.005"},{"key":"e_1_3_2_2_37_1","unstructured":"Bei Liu and Jianlong Fu. 2022. XPretrain. https:\/\/github.com\/microsoft\/XPretrain\/tree\/main."},{"key":"e_1_3_2_2_38_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 18591\u201318601","author":"Liu Shuming","year":"2024","unstructured":"Shuming Liu, Chen-Lin Zhang, Chen Zhao, and Bernard Ghanem. 2024. End-to-end temporal action detection with 1b parameters across 1000 frames. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 18591\u201318601."},{"key":"e_1_3_2_2_39_1","unstructured":"Microsoft Azure. 2025. Microsoft Azure Machine Learning. https:\/\/azure.microsoft.com\/en-us\/products\/machine-learning."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"crossref","unstructured":"Ishan Misra C. 
Lawrence Zitnick and Martial Hebert. 2016. Shuffle and Learn: Unsupervised Learning using Temporal Order Verification. arXiv:1603.08561 [cs.CV]","DOI":"10.1007\/978-3-319-46448-0_32"},{"key":"e_1_3_2_2_41_1","volume-title":"Analyzing and mitigating data stalls in DNN training. arXiv preprint arXiv:2007.06775","author":"Mohan Jayashree","year":"2020","unstructured":"Jayashree Mohan, Amar Phanishayee, Ashish Raniwala, and Vijay Chidambaram. 2020. Analyzing and mitigating data stalls in DNN training. arXiv preprint arXiv:2007.06775 (2020)."},{"key":"e_1_3_2_2_42_1","volume-title":"Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation","author":"Moritz Philipp","year":"2018","unstructured":"Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. 2018. Ray: a distributed framework for emerging AI applications. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (Carlsbad, CA, USA). USENIX Association, USA, 561\u2013577."},{"key":"e_1_3_2_2_43_1","unstructured":"NVIDIA. 2025. NVIDIA A100 Official Webpage. https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/."},{"key":"e_1_3_2_2_44_1","volume-title":"accessed","author":"NVIDIA Corporation","year":"2022","unstructured":"NVIDIA Corporation. accessed 2022. NVIDIA Video Codec SDK. https:\/\/developer.nvidia.com\/video-codec-sdk."},{"key":"e_1_3_2_2_45_1","unstructured":"Jennifer Flannery O'Connor and Emily Moxley. 2023. Our approach to responsible AI innovation. YouTube Official Blog. https:\/\/blog.youtube\/inside-youtube\/our-approach-to-responsible-ai-innovation\/ Announces AI-based moderation tools and disclosure of synthetic content."},{"key":"e_1_3_2_2_46_1","unstructured":"OpenCV. 2025. opencv github repository. 
https:\/\/github.com\/opencv\/opencv."},{"key":"e_1_3_2_2_47_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Curran Associates Inc., Red Hook, NY, USA."},{"key":"e_1_3_2_2_48_1","unstructured":"Png Group. 2025. libpng github repository. https:\/\/github.com\/pnggroup\/libpng."},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201394"},{"key":"e_1_3_2_2_50_1","unstructured":"PyAV Contributors. 2025. PyAV github repository. https:\/\/github.com\/PyAV-Org\/PyAV."},{"key":"e_1_3_2_2_51_1","unstructured":"PyTorchVideo Team. 2025. PyTorchVideo Official Webpage. https:\/\/pytorchvideo.org\/."},{"key":"e_1_3_2_2_52_1","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017","author":"Qi Charles Ruizhongtai","year":"2017","unstructured":"Charles Ruizhongtai Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. IEEE Computer Society, 77\u201385. 10.1109\/CVPR.2017.16"},{"key":"e_1_3_2_2_53_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Qi Charles Ruizhongtai","year":"2017","unstructured":"Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. 
In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5099\u20135108. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/d8bf84be3800d12f74d8b05e9b89836f-Abstract.html"},{"key":"e_1_3_2_2_54_1","volume-title":"Exploring Temporal Granularity in Self-Supervised Video Representation Learning. ArXiv abs\/2112.04480","author":"Qian Rui","year":"2021","unstructured":"Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge J. Belongie, Ming-Hsuan Yang, Hartwig Adam, and Yin Cui. 2021. Exploring Temporal Granularity in Self-Supervised Video Representation Learning. ArXiv abs\/2112.04480 (2021)."},{"key":"e_1_3_2_2_55_1","volume-title":"Spatiotemporal Contrastive Video Representation Learning. 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Jun 2021","author":"Qian Rui","year":"2021","unstructured":"Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, and Yin Cui. 2021. Spatiotemporal Contrastive Video Representation Learning. 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Jun 2021). 10.1109\/cvpr46437.2021.00689"},{"key":"e_1_3_2_2_56_1","volume-title":"DeepDecision: A Mobile Deep Learning Framework for Edge Video Analytics. In IEEE INFOCOM 2018 - IEEE Conference on Computer Communications. 1421\u20131429","author":"Ran Xukan","year":"2018","unstructured":"Xukan Ran, Haolianz Chen, Xiaodan Zhu, Zhenming Liu, and Jiasi Chen. 2018. DeepDecision: A Mobile Deep Learning Framework for Edge Video Analytics. In IEEE INFOCOM 2018 - IEEE Conference on Computer Communications. 1421\u20131429. 10.1109\/INFOCOM.2018.8485905"},{"key":"e_1_3_2_2_57_1","volume-title":"Broaden Your Views for Self-Supervised Video Learning. 
2021 IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Recasens Adri\u00e0","year":"2021","unstructured":"Adri\u00e0 Recasens, Pauline Luc, Jean-Baptiste Alayrac, Luyu Wang, Florian Strub, Corentin Tallec, Mateusz Malinowski, Viorica Patraucean, Florent Altch\u00e9, Michael Valko, Jean-Bastien Grill, A\u00e4ron van den Oord, and Andrew Zisserman. 2021. Broaden Your Views for Self-Supervised Video Learning. 2021 IEEE\/CVF International Conference on Computer Vision (ICCV) (2021), 1235\u20131245."},{"key":"e_1_3_2_2_58_1","unstructured":"Richard Gooch. 1999. Linux Virtual Filesystem Overview. https:\/\/www.kernel.org\/doc\/html\/latest\/filesystems\/vfs.html."},{"key":"e_1_3_2_2_59_1","first-page":"15","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives Alexander","year":"2021","unstructured":"Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C Lawrence Zitnick, Jerry Ma, and Rob Fergus. 2021. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. U. S. A. 118, 15 (April 2021), e2016239118.","journal-title":"Proc. Natl. Acad. Sci. U. S. A."},{"key":"e_1_3_2_2_60_1","volume-title":"Medical Image Computing and Computer-Assisted Intervention \u2013 MICCAI","author":"Ronneberger Olaf","year":"2015","unstructured":"Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention \u2013 MICCAI 2015, Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi (Eds.). Springer International Publishing, Cham, 234\u2013241."},{"key":"e_1_3_2_2_61_1","volume-title":"Garnett (Eds.)","volume":"31","author":"Sener Ozan","year":"2018","unstructured":"Ozan Sener and Vladlen Koltun. 2018. 
Multi-Task Learning as Multi-Objective Optimization. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2018\/file\/432aca3a1e345e339f35a30c8f65edce-Paper.pdf"},{"key":"e_1_3_2_2_62_1","unstructured":"Abraham Silberschatz Henry F. Korth and S. Sudarshan. 1998. Database System Concepts (3rd ed.). McGraw-Hill Inc. USA."},{"key":"e_1_3_2_2_63_1","unstructured":"Snowflake Delta Lake. 2025. Delta Lake official webpage. https:\/\/delta.io\/."},{"key":"e_1_3_2_2_64_1","unstructured":"Snowflake Delta Lake. 2025. Snowflake Delta Lake support. https:\/\/docs.snowflake.com\/en\/user-guide\/tables-external-intro#delta-lake-support"},{"key":"e_1_3_2_2_65_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research","author":"Standley Trevor","year":"2020","unstructured":"Trevor Standley, Amir Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, and Silvio Savarese. 2020. Which Tasks Should Be Learned Together in Multi-task Learning?. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119), Hal Daum\u00e9 III and Aarti Singh (Eds.). PMLR, 9120\u20139132. https:\/\/proceedings.mlr.press\/v119\/standley20a.html"},{"key":"e_1_3_2_2_66_1","unstructured":"Stephen Lacey Nicola Phillips. 2024. Now the power is the major bottleneck of AI. https:\/\/www.latitudemedia.com\/news\/energy-is-now-the-primary-bottleneck-for-ai\/."},{"key":"e_1_3_2_2_67_1","volume-title":"Filesystem in Userspace","author":"Szeredi Miklos","unstructured":"Miklos Szeredi. 2005. Filesystem in Userspace. http:\/\/fuse.sourceforge.net. Accessed: 2024-09-09."},{"key":"e_1_3_2_2_68_1","volume-title":"Gemini: A Family of Highly Capable Multimodal Models. 
arXiv:2312.11805 [cs.CL] https:\/\/arxiv.org\/abs\/2312.11805 Begins to incorporate video understanding as part of multimodal capabilities.","author":"Team Gemini","year":"2024","unstructured":"Gemini Team. 2024. Gemini: A Family of Highly Capable Multimodal Models. arXiv:2312.11805 [cs.CL] https:\/\/arxiv.org\/abs\/2312.11805 Begins to incorporate video understanding as part of multimodal capabilities."},{"key":"e_1_3_2_2_69_1","unstructured":"OpenAI Team. 2024. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL] https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_2_2_70_1","unstructured":"Zhan Tong Yibing Song Jue Wang and Limin Wang. 2022. Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In Advances in Neural Information Processing Systems (NeurIPS). https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/hash\/416f9cb3276121c42eebb86352a4354a-Abstract-Conference.html"},{"key":"e_1_3_2_2_71_1","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.14778\/3579075.3579083","article-title":"Fastflow: Accelerating deep learning model training with smart offloading of input data pipeline","volume":"16","author":"Um Taegeon","year":"2023","unstructured":"Taegeon Um, Byungsoo Oh, Byeongchan Seo, Minhyeok Kweun, Goeun Kim, and Woo-Yeon Lee. 2023. Fastflow: Accelerating deep learning model training with smart offloading of input data pipeline. Proceedings of the VLDB Endowment 16, 5 (2023), 1086\u20131099.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_2_2_72_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/html\/Wang_Non-Local_Neural_Networks_CVPR_2018_paper.html","author":"Wang Xiaolong","year":"2018","unstructured":"Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018. Non-Local Neural Networks. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/html\/Wang_Non-Local_Neural_Networks_CVPR_2018_paper.html"},{"key":"e_1_3_2_2_73_1","volume-title":"InternVideo: General Video Foundation Models via Generative and Discriminative Learning. arXiv preprint arXiv:2212.03191","author":"Wang Yi","year":"2022","unstructured":"Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, Limin Wang, and Yu Qiao. 2022. InternVideo: General Video Foundation Models via Generative and Discriminative Learning. arXiv preprint arXiv:2212.03191 (2022). https:\/\/arxiv.org\/abs\/2212.03191"},{"key":"e_1_3_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.544"},{"key":"e_1_3_2_2_75_1","volume-title":"Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions. In International Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Xue Hongwei","year":"2022","unstructured":"Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, and Baining Guo. 2022. Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions. In International Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_2_76_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3589773","article-title":"Goldminer: Elastic scaling of training data pre-processing pipelines for deep learning","volume":"1","author":"Zhao Hanyu","year":"2023","unstructured":"Hanyu Zhao, Zhi Yang, Yu Cheng, Chao Tian, Shiru Ren, Wencong Xiao, Man Yuan, Langshi Chen, Kaibo Liu, Yang Zhang, et al. 2023. Goldminer: Elastic scaling of training data pre-processing pipelines for deep learning. 
Proceedings of the ACM on Management of Data 1, 2 (2023), 1\u201325.","journal-title":"Proceedings of the ACM on Management of Data"},{"key":"e_1_3_2_2_77_1","volume-title":"Proceedings of the 49th annual international symposium on computer architecture. 1042\u20131057","author":"Zhao Mark","year":"2022","unstructured":"Mark Zhao, Niket Agarwal, Aarti Basant, Bu\u011fra Gedik, Satadru Pan, Mustafa Ozdal, Rakesh Komuravelli, Jerry Pan, Tianshu Bao, Haowei Lu, et al. 2022. Understanding data storage and ingestion for large-scale deep recommendation model training: Industrial product. In Proceedings of the 49th annual international symposium on computer architecture. 1042\u20131057."},{"key":"e_1_3_2_2_78_1","volume-title":"2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018","author":"Zhou Yin","year":"2018","unstructured":"Yin Zhou and Oncel Tuzel. 2018. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. Computer Vision Foundation \/ IEEE Computer Society, 4490\u20134499. 10.1109\/CVPR.2018.00472"},{"key":"e_1_3_2_2_79_1","first-page":"948","article-title":"Dsnet: A flexible detect-to-summarize network for video summarization","volume":"30","author":"Zhu Wencheng","year":"2020","unstructured":"Wencheng Zhu, Jiwen Lu, Jiahao Li, and Jie Zhou. 2020. Dsnet: A flexible detect-to-summarize network for video summarization. 
IEEE Transactions on Image Processing 30 (2020), 948\u2013962.","journal-title":"IEEE Transactions on Image Processing"}],"event":{"name":"SOSP '25: ACM SIGOPS 31st Symposium on Operating Systems Principles","location":"Lotte Hotel World Seoul Republic of Korea","acronym":"SOSP '25","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems","USENIX"]},"container-title":["Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles"],"original-title":[],"deposited":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T12:46:56Z","timestamp":1759322816000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3731569.3764847"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,12]]},"references-count":79,"alternative-id":["10.1145\/3731569.3764847","10.1145\/3731569"],"URL":"https:\/\/doi.org\/10.1145\/3731569.3764847","relation":{},"subject":[],"published":{"date-parts":[[2025,10,12]]},"assertion":[{"value":"2025-10-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}