{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T07:12:01Z","timestamp":1779174721374,"version":"3.51.4"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:p>Artificial intelligence (AI) and machine learning (ML) techniques have existed for years, but new hardware trends and advances in model training and inference have radically improved their performance. With an ever increasing amount of algorithms, systems, and hardware solutions, it is challenging to identify good deployments even for experts. Researchers and industry experts have observed this challenge and have created several benchmark suites for AI and ML applications and systems. While they are helpful in comparing several aspects of AI applications, none of the existing benchmarks measures end-to-end performance of ML deployments. Many have been rigorously developed in collaboration between academia and industry, but no existing benchmark is standardized.<\/jats:p>\n          <jats:p>In this paper, we introduce the TPC Express Benchmark for Artificial Intelligence (TPCx-AI), the first industry standard benchmark for end-to-end machine learning deployments. TPCx-AI is the first AI benchmark that represents the pipelines typically found in common ML and AI workloads. TPCx-AI provides a full software kit, which includes data generator, driver, and two full workload implementations, one based on Python libraries and one based on Apache Spark. We describe the complete benchmark and show benchmark results for various scale factors. TPCx-AI's core contributions are a novel unified data set covering structured and unstructured data; a fully scalable data generator that can generate realistic data from GB up to PB scale; and a diverse and representative workload using different data types and algorithms, covering a wide range of aspects of real ML workloads such as data integration, data processing, training, and inference.<\/jats:p>","DOI":"10.14778\/3611540.3611554","type":"journal-article","created":{"date-parts":[[2023,9,15]],"date-time":"2023-09-15T11:32:37Z","timestamp":1694777557000},"page":"3649-3661","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["TPCx-AI - An Industry Standard Benchmark for Artificial Intelligence and Machine Learning Systems"],"prefix":"10.14778","volume":"16","author":[{"given":"Christoph","family":"Br\u00fccke","sequence":"first","affiliation":[{"name":"bankmark, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Philipp","family":"H\u00e4rtling","sequence":"additional","affiliation":[{"name":"bankmark, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rodrigo D Escobar","family":"Palacios","sequence":"additional","affiliation":[{"name":"Intel, Hillsboro, Oregon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hamesh","family":"Patel","sequence":"additional","affiliation":[{"name":"Intel, Hillsboro, Oregon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tilmann","family":"Rabl","sequence":"additional","affiliation":[{"name":"Hasso Plattner Institute, University of Potsdam, bankmark, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"McKinsey Analytics. 2016. The age of analytics: competing in a data-driven world. Technical Report. McKinsey Global Institute."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098021"},{"key":"e_1_2_1_3_1","volume-title":"Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark. CoRR abs\/1806.01427","author":"Coleman Cody","year":"2018","unstructured":"Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Christopher R\u00e9, and Matei Zaharia. 2018. Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark. CoRR abs\/1806.01427 (2018)."},{"key":"e_1_2_1_4_1","unstructured":"Transaction Processing Performance Council. 2022. TPCx-AI. https:\/\/tpc.org\/tpcx-ai\/default5.asp"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_6_1","volume-title":"Revisiting Issues in Benchmark Metric Selection","author":"Elford Christopher","unstructured":"Christopher Elford, Dippy Aggarwal, and Shreyas Shekhar. 2021. Revisiting Issues in Benchmark Metric Selection. In Performance Evaluation and Benchmarking, Raghunath Nambiar and Meikel Poess (Eds.). Springer International Publishing, Cham, 35--47."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3533028.3533310"},{"key":"e_1_2_1_8_1","volume-title":"Bigdatabench: A scalable and unified big data and ai benchmark suite. arXiv preprint arXiv:1802.08254","author":"Gao Wanling","year":"2018","unstructured":"Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Xu Wen, Rui Ren, Chen Zheng, Xiwen He, Hainan Ye, et al. 2018. Bigdatabench: A scalable and unified big data and ai benchmark suite. arXiv preprint arXiv:1802.08254 (2018)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463712"},{"key":"e_1_2_1_10_1","volume-title":"An Open Source AutoML Benchmark. CoRR abs\/1907.00909","author":"Gijsbers Pieter","year":"2019","unstructured":"Pieter Gijsbers, Erin LeDell, Janek Thomas, S\u00e9bastien Poirier, Bernd Bischl, and Joaquin Vanschoren. 2019. An Open Source AutoML Benchmark. CoRR abs\/1907.00909 (2019). http:\/\/arxiv.org\/abs\/1907.00909"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-020-2766-y"},{"key":"e_1_2_1_12_1","volume-title":"Ng","author":"Hannun Awni Y.","year":"2014","unstructured":"Awni Y. Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng. 2014. Deep Speech: Scaling up end-to-end speech recognition. CoRR abs\/1412.5567 (2014)."},{"key":"e_1_2_1_13_1","volume-title":"Performance Evaluation, Measurement and Characterization of Complex Systems","author":"Huppler Karl","unstructured":"Karl Huppler. 2011. Price and the TPC. In Performance Evaluation, Measurement and Characterization of Complex Systems, Raghunath Nambiar and Meikel Poess (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 73--84."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-04936-6_4"},{"key":"e_1_2_1_15_1","volume-title":"Performance Evaluation and Benchmarking: 13th TPC Technology Conference, TPCTC 2021","author":"Ihde Nina","year":"2021","unstructured":"Nina Ihde, Paula Marten, Ahmed Eleliemy, Gabrielle Poerwawinata, Pedro Silva, Ilin Tolovski, Florina M. Ciorba, and Tilmann Rabl. 2021. A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks. In Performance Evaluation and Benchmarking: 13th TPC Technology Conference, TPCTC 2021, Copenhagen, Denmark, August 20, 2021, Revised Selected Papers (Copenhagen, Denmark). Springer-Verlag, Berlin, Heidelberg, 98--118."},{"key":"e_1_2_1_16_1","volume-title":"CleanML: A Benchmark for Joint Data Cleaning and Machine Learning [Experiments and Analysis]. CoRR abs\/1904.09483","author":"Li Peng","year":"2019","unstructured":"Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, and Ce Zhang. 2019. CleanML: A Benchmark for Joint Data Cleaning and Machine Learning [Experiments and Analysis]. CoRR abs\/1904.09483 (2019). http:\/\/arxiv.org\/abs\/1904.09483"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3231751.3231770"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2974843"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/2946645.2946679"},{"key":"e_1_2_1_20_1","first-page":"39","article-title":"WordNet","volume":"38","author":"Miller George A.","year":"1995","unstructured":"George A. Miller. 1995. WordNet: A Lexical Database for English. 38, 11 (1995), 39--41.","journal-title":"A Lexical Database for English."},{"key":"e_1_2_1_21_1","unstructured":"Sharan Narang. [n.d.]. DeepBench. https:\/\/svail.github.io\/DeepBench\/. Accessed: 2021-07-03."},{"key":"e_1_2_1_22_1","unstructured":"Sharan Narang and Greg Diamos. [n.d.]. An update to DeepBench with a focus on deep learning inference. https:\/\/svail.github.io\/DeepBench-update\/. Accessed: 2021-07-03."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the International Conference on Very Large Data Bases","volume":"32","author":"Othayoth Raghunath","year":"2006","unstructured":"Raghunath Othayoth and Meikel Poess. 2006. The making of tpc-ds. In Proceedings of the International Conference on Very Large Data Bases, Vol. 32. 1049."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/369275.369291"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1791314.1791336"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733009"},{"key":"e_1_2_1_27_1","volume-title":"Steven Euijong Whang, and Martin Zinkevich","author":"Polyzotis Neoklis","year":"2017","unstructured":"Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, and Martin Zinkevich. 2017. Data management challenges in production machine learning. In SIGMOD. 1723--1726."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/MPOT.2020.3016280"},{"key":"e_1_2_1_29_1","volume-title":"Hamesh Patel, Satyam Srivastava, Christoph Boden, Jens Meiners, and Sebastian Schelter.","author":"Rabl Tilmann","year":"2020","unstructured":"Tilmann Rabl, Christoph Br\u00fccke, Philipp H\u00e4rtling, Stella Stars, Rodrigo Escobar Palacios, Hamesh Patel, Satyam Srivastava, Christoph Boden, Jens Meiners, and Sebastian Schelter. 2020. ADABench - Towards an Industry Standard Benchmark for Advanced Analytics. In Performance Evaluation and Benchmarking for the Era of Cloud(s), Raghunath Nambiar and Meikel Poess (Eds.). Springer International Publishing, Cham, 47--63."},{"key":"e_1_2_1_30_1","volume-title":"Hatem Mousselly Sergieh, and Harald Kosch","author":"Rabl Tilmann","year":"2011","unstructured":"Tilmann Rabl, Michael Frank, Hatem Mousselly Sergieh, and Harald Kosch. 2011. A Data Generator for Cloud-Scale Benchmarking. In Performance Evaluation, Measurement and Characterization of Complex Systems, Raghunath Nambiar and Meikel Poess (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 41--56."},{"key":"e_1_2_1_31_1","volume-title":"John T Gregg, Daniel J Goldberg, Praneel Chakraborty, Natasha L Ray, Daniel Himmelstein, Weixuan Fu, and Jason H Moore.","author":"Romano Joseph D","year":"2021","unstructured":"Joseph D Romano, Trang T Le, William La Cava, John T Gregg, Daniel J Goldberg, Praneel Chakraborty, Natasha L Ray, Daniel Himmelstein, Weixuan Fu, and Jason H Moore. 2021. PMLB v1.0: an open source dataset collection for benchmarking machine learning methods. arXiv preprint arXiv:2012.00058v2 (2021)."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems -","volume":"2","author":"Sculley D.","year":"2015","unstructured":"D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. Hidden Technical Debt in Machine Learning Systems. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (Montreal, Canada) (NIPS'15). MIT Press, Cambridge, MA, USA, 2503--2511."},{"key":"e_1_2_1_33_1","volume-title":"Horovod: fast and easy distributed deep learning in TensorFlow. CoRR abs\/1802.05799","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike Del Balso. 2018. Horovod: fast and easy distributed deep learning in TensorFlow. CoRR abs\/1802.05799 (2018)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Chen Sun Abhinav Shrivastava Saurabh Singh and Abhinav Gupta. 2017. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. In ICCV.","DOI":"10.1109\/ICCV.2017.97"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3275366.3284963"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2641190.2641198"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457566"},{"key":"e_1_2_1_38_1","volume-title":"Spark: Cluster computing with working sets.. In HotCloud. 10--10.","author":"Zaharia Matei","year":"2010","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets.. In HotCloud. 10--10."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3611540.3611554","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T22:32:30Z","timestamp":1757543550000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3611540.3611554"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8]]},"references-count":38,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["10.14778\/3611540.3611554"],"URL":"https:\/\/doi.org\/10.14778\/3611540.3611554","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,8]]},"assertion":[{"value":"2023-08-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}