{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T13:14:52Z","timestamp":1776950092318,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,5,4]]},"DOI":"10.1145\/3777884.3797001","type":"proceedings-article","created":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T12:27:26Z","timestamp":1776947246000},"page":"108-119","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["ORION: Integrated Runtime Modelling for Predicting Deep Learning Training Time"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-9387-2099","authenticated-orcid":false,"given":"Alireza","family":"Pourali","sequence":"first","affiliation":[{"name":"York University, Toronto, Ontario, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5439-8024","authenticated-orcid":false,"given":"Hamzeh","family":"Khazaei","sequence":"additional","affiliation":[{"name":"York University, Toronto, Ontario, Canada"}]}],"member":"320","published-online":{"date-parts":[[2026,5,3]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings of Machine Learning and Systems","volume":"7","author":"Arfeen Daiyaan","year":"2025","unstructured":"Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, Gregory Ganger, and Yida Wang. 2025. Pipefill: Using gpus during bubbles in pipeline-parallel llm training. Proceedings of Machine Learning and Systems, Vol. 7 (2025)."},{"key":"e_1_3_2_1_2_1","volume-title":"Findings of the 2014 workshop on statistical machine translation. In Proceedings of the ninth workshop on statistical machine translation. 12-58","author":"Bojar Ond\u0159ej","year":"2014","unstructured":"Ond\u0159ej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, et al., 2014. Findings of the 2014 workshop on statistical machine translation. In Proceedings of the ninth workshop on statistical machine translation. 12-58."},{"key":"e_1_3_2_1_3_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in neural information processing systems Vol. 33 (2020) 1877-1901."},{"key":"e_1_3_2_1_4_1","first-page":"1","article-title":"Palm: Scaling language modeling with pathways","volume":"24","author":"Chowdhery Aakanksha","year":"2023","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al., 2023. Palm: Scaling language modeling with pathways. Journal of Machine Learning Research, Vol. 24, 240 (2023), 1-113.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the fourteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 215-223","author":"Coates Adam","year":"2011","unstructured":"Adam Coates, Andrew Ng, and Honglak Lee. 2011. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 215-223."},{"key":"e_1_3_2_1_6_1","first-page":"102","article-title":"Dawnbench: An end-to-end deep learning benchmark and competition","volume":"100","author":"Coleman Cody","year":"2017","unstructured":"Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris R\u00e9, and Matei Zaharia. 2017. Dawnbench: An end-to-end deep learning benchmark and competition. Training, Vol. 100, 101 (2017), 102.","journal-title":"Training"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_8_1","volume-title":"BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL-HLT","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL-HLT (2019)."},{"key":"e_1_3_2_1_9_1","volume-title":"An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929","author":"Dosovitskiy Alexey","year":"2020","unstructured":"Alexey Dosovitskiy. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)."},{"key":"e_1_3_2_1_10_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 368-380","author":"Gao Yanjie","year":"2023","unstructured":"Yanjie Gao, Xianyu Gu, Hongyu Zhang, Haoxiang Lin, and Mao Yang. 2023. Runtime performance prediction for deep learning models with graph neural network. In 2023 IEEE\/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 368-380."},{"key":"e_1_3_2_1_11_1","first-page":"503","volume-title":"2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Geoffrey X Yu","year":"2021","unstructured":"X Yu Geoffrey, Yubo Gao, Pavel Golikov, and Gennady Pekhimenko. 2021. Habitat: A computational performance predictor for deep neural network training. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). 503-521."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_13_1","volume-title":"Getting in the zone for successful scalability. arXiv preprint arXiv:0809.2541","author":"Holtman Jim","year":"2008","unstructured":"Jim Holtman and Neil J Gunther. 2008. Getting in the zone for successful scalability. arXiv preprint arXiv:0809.2541 (2008)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_1_15_1","first-page":"46","article-title":"JIT-Q: Just-in-time Quantization with Processing-In-Memory for Efficient ML Training","volume":"6","author":"Ibrahim Mohamed A","year":"2024","unstructured":"Mohamed A Ibrahim, Shaizeen Aga, Ada Li, Suchita Pati, and Mahzabeen Islam. 2024. JIT-Q: Just-in-time Quantization with Processing-In-Memory for Efficient ML Training. Proceedings of Machine Learning and Systems, Vol. 6 (2024), 46-59.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_2_1_17_1","first-page":"14","article-title":"Restructuring batch normalization to accelerate CNN training","volume":"1","author":"Jung Wonkyung","year":"2019","unstructured":"Wonkyung Jung, Daejin Jung, Byeongho Kim, Sunjung Lee, Wonjong Rhee, and Jung Ho Ahn. 2019. Restructuring batch normalization to accelerate CNN training. Proceedings of machine learning and systems, Vol. 1 (2019), 14-26.","journal-title":"Proceedings of machine learning and systems"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2018.8622396"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/3357034.3357049"},{"key":"e_1_3_2_1_20_1","unstructured":"Alex Krizhevsky Geoffrey Hinton et al. 2009. Learning multiple layers of features from tiny images. (2009)."},{"key":"e_1_3_2_1_21_1","first-page":"1097","article-title":"ImageNet Classification with Deep Convolutional Neural Networks","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems (NeurIPS). 1097-1105.","journal-title":"Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_2_1_22_1","volume-title":"ICLR 2022-10th International Conference on Learning Representations.","author":"Lian Dongze","year":"2022","unstructured":"Dongze Lian, Zehao Yu, Xing Sun, and Shenghua Gao. 2022. AS-MLP: An axial shifted mlp architecture for vision. In ICLR 2022-10th International Conference on Learning Representations."},{"key":"e_1_3_2_1_23_1","volume-title":"Proceedings of Machine Learning and Systems","volume":"7","author":"Liang Mingyu","year":"2025","unstructured":"Mingyu Liang, Hiwot T Kassa, Wenyin Fu, Brian Coutinho, Louis Feng, and Christina Delimitrou. 2025. Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training. Proceedings of Machine Learning and Systems, Vol. 7 (2025)."},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the 15th ACM\/SPEC International Conference on Performance Engineering. 190-200","author":"Ma Xiaolong","year":"2024","unstructured":"Xiaolong Ma, Feng Yan, Lei Yang, Ian Foster, Michael E Papka, Zhengchun Liu, and Rajkumar Kettimuthu. 2024. MalleTrain: Deep Neural Networks Training on Unfillable Supercomputer Nodes. In Proceedings of the 15th ACM\/SPEC International Conference on Performance Engineering. 190-200."},{"key":"e_1_3_2_1_25_1","first-page":"336","article-title":"Mlperf training benchmark","volume":"2","author":"Mattson Peter","year":"2020","unstructured":"Peter Mattson, Christine Cheng, Gregory Diamos, Cody Coleman, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, et al., 2020. Mlperf training benchmark. Proceedings of Machine Learning and Systems, Vol. 2 (2020), 336-349.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3446095.3446100"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K16-1028"},{"key":"e_1_3_2_1_28_1","volume-title":"An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458","author":"O'shea Keiron","year":"2015","unstructured":"Keiron O'shea and Ryan Nash. 2015. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458 (2015)."},{"key":"e_1_3_2_1_29_1","first-page":"307","volume-title":"2020 USENIX Annual Technical Conference (USENIX ATC 20)","author":"Park Jay H","year":"2020","unstructured":"Jay H Park, Gyeongchan Yun, M Yi Chang, Nguyen T Nguyen, Seungmin Lee, Jaesik Choi, Sam H Noh, and Young-ri Choi. 2020. HetPipe: Enabling large DNN training on (whimpy) heterogeneous GPU clusters through integration of pipelined model parallelism and data parallelism. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). 307-321."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2916550"},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the 15th ACM\/SPEC International Conference on Performance Engineering. 130-141","author":"Pfister Benjamin JJ","year":"2024","unstructured":"Benjamin JJ Pfister, Dominik Scheinert, Morgan K Geldenhuys, and Odej Kao. 2024. Daedalus: Self-Adaptive Horizontal Autoscaling for Resource Efficiency of Distributed Stream Processing Systems. In Proceedings of the 15th ACM\/SPEC International Conference on Performance Engineering. 130-141."},{"key":"e_1_3_2_1_32_1","volume-title":"Hoard: A Distributed Data Caching System to Accelerate Deep Learning Training on the Cloud. CoRR","author":"Pinto Christian","year":"2018","unstructured":"Christian Pinto, Yiannis Gkoufas, Andrea Reale, Seetharami Seelam, and Steven Eliuk. 2018. Hoard: A Distributed Data Caching System to Accelerate Deep Learning Training on the Cloud. CoRR (2018)."},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings of the 16th ACM\/SPEC International Conference on Performance Engineering. 81-91","author":"Pourali Alireza","year":"2025","unstructured":"Alireza Pourali, Arian Boukani, and Hamzeh Khazaei. 2025. PreNeT: Leveraging Computational Features to Predict Deep Neural Network Training Time. In Proceedings of the 16th ACM\/SPEC International Conference on Performance Engineering. 81-91."},{"key":"e_1_3_2_1_34_1","volume-title":"NeurIPS Workshop.","author":"Sanh Victor","year":"2019","unstructured":"Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. In NeurIPS Workshop."},{"key":"e_1_3_2_1_35_1","volume-title":"On Challenges in Machine Learning Model Management. Data Engineering","author":"Schelter Sebastian","year":"2018","unstructured":"Sebastian Schelter, Felix Biessmann, Tim Januschowski, David Salinas, Stephan Seufert, and Gyuri Szarvas. 2018. On Challenges in Machine Learning Model Management. Data Engineering (2018), 5."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Christoph Schuhmann Romain Beaumont Richard Vencu Cade Gordon Ross Wightman Mehdi Cherti Theo Coombes Aarush Katta Clayton Mullis Mitchell Wortsman et al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in neural information processing systems Vol. 35 (2022) 25278-25294.","DOI":"10.52202\/068431-1833"},{"key":"e_1_3_2_1_37_1","volume-title":"3rd International Conference on Learning Representations (ICLR","author":"Simonyan K","year":"2015","unstructured":"K Simonyan and A Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations (ICLR 2015). Computational and Biological Learning Society."},{"key":"e_1_3_2_1_38_1","volume-title":"Proceedings of the 15th ACM\/SPEC International Conference on Performance Engineering. 201-210","author":"Singh Ravi Kumar","year":"2024","unstructured":"Ravi Kumar Singh, Likhith Bandamudi, Shruti Kunde, Mayank Mishra, and Rekha Singhal. 2024. Leftovers for LLaMA. In Proceedings of the 15th ACM\/SPEC International Conference on Performance Engineering. 201-210."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"crossref","first-page":"1631","DOI":"10.18653\/v1\/D13-1170","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics","author":"Socher Richard","year":"2013","unstructured":"Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Seattle, Washington, USA, 1631-1642. https:\/\/nlp.stanford.edu\/sentiment\/"},{"key":"e_1_3_2_1_40_1","volume-title":"Sequence to sequence learning with neural networks. Advances in neural information processing systems","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. Advances in neural information processing systems, Vol. 27 (2014)."},{"key":"e_1_3_2_1_41_1","volume-title":"Mlp-mixer: An all-mlp architecture for vision. Advances in neural information processing systems","author":"Tolstikhin Ilya O","year":"2021","unstructured":"Ilya O Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, et al., 2021. Mlp-mixer: An all-mlp architecture for vision. Advances in neural information processing systems, Vol. 34 (2021), 24261-24272."},{"key":"e_1_3_2_1_42_1","volume-title":"Resmlp: Feedforward networks for image classification with data-efficient training","author":"Touvron Hugo","year":"2022","unstructured":"Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, et al., 2022. Resmlp: Feedforward networks for image classification with data-efficient training. IEEE transactions on pattern analysis and machine intelligence, Vol. 45, 4 (2022), 5314-5321."},{"key":"e_1_3_2_1_43_1","volume-title":"International conference on machine learning. PMLR, 10347-10357","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In International conference on machine learning. PMLR, 10347-10357."},{"key":"e_1_3_2_1_44_1","volume-title":"Attention is all you need. Advances in neural information processing systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems, Vol. 30 (2017)."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477133.3477137"},{"key":"e_1_3_2_1_46_1","first-page":"1","article-title":"The effectiveness of data augmentation in image classification using deep learning","volume":"11","author":"Wang Jason","year":"2017","unstructured":"Jason Wang, Luis Perez, et al., 2017. The effectiveness of data augmentation in image classification using deep learning. Convolutional Neural Networks Vis. Recognit, Vol. 11, 2017 (2017), 1-8.","journal-title":"Convolutional Neural Networks Vis. Recognit"},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of the Sixth Conference on Machine Learning and Systems (MLSys' 23)","author":"Wang Zhuang","year":"2023","unstructured":"Zhuang Wang, Xinyu Crystal Wu, Zhaozhuo Xu, and TS Eugene Ng. 2023. Cupcake: a compression optimizer for scalable communication-efficient distributed training. In Proceedings of the Sixth Conference on Machine Learning and Systems (MLSys' 23)."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3530895"},{"key":"e_1_3_2_1_49_1","first-page":"6136","article-title":"Predicting training time without training","volume":"33","author":"Zancato Luca","year":"2020","unstructured":"Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto. 2020. Predicting training time without training. Advances in Neural Information Processing Systems, Vol. 33 (2020), 6136-6146.","journal-title":"Advances in Neural Information Processing Systems"}],"event":{"name":"ICPE '26: 17th ACM\/SPEC International Conference on Performance Engineering","location":"Florence Italy","sponsor":["SIGSOFT ACM Special Interest Group on Software Engineering","SIGMETRICS ACM Special Interest Group on Measurement and Evaluation","SPEC"]},"container-title":["Proceedings of the 17th ACM\/SPEC International Conference on Performance Engineering"],"original-title":[],"deposited":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T12:30:03Z","timestamp":1776947403000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3777884.3797001"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,3]]},"references-count":49,"alternative-id":["10.1145\/3777884.3797001","10.1145\/3777884"],"URL":"https:\/\/doi.org\/10.1145\/3777884.3797001","relation":{},"subject":[],"published":{"date-parts":[[2026,5,3]]},"assertion":[{"value":"2026-05-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}