{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T11:40:15Z","timestamp":1755776415868,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":61,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,3,30]],"date-time":"2025-03-30T00:00:00Z","timestamp":1743292800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006374","name":"Carl-Zeiss-Stiftung","doi-asserted-by":"publisher","award":["Interactive Inference"],"award-info":[{"award-number":["Interactive Inference"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,3,30]]},"DOI":"10.1145\/3676641.3716254","type":"proceedings-article","created":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T16:47:32Z","timestamp":1743094052000},"page":"275-292","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Einsum Trees: An Abstraction for Optimizing the Execution of Tensor Expressions"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8145-3877","authenticated-orcid":false,"given":"Alexander","family":"Breuer","sequence":"first","affiliation":[{"name":"Friedrich Schiller University Jena, Jena, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2009-7996","authenticated-orcid":false,"given":"Mark","family":"Blacher","sequence":"additional","affiliation":[{"name":"Friedrich Schiller University Jena, Jena, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-3417-9535","authenticated-orcid":false,"given":"Max","family":"Engel","sequence":"additional","affiliation":[{"name":"Friedrich Schiller University Jena, Jena, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6598-6833","authenticated-orcid":false,"given":"Joachim","family":"Giesen","sequence":"additional","affiliation":[{"name":"Friedrich Schiller University Jena, Jena, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0947-5394","authenticated-orcid":false,"given":"Alexander","family":"Heinecke","sequence":"additional","affiliation":[{"name":"Intel Corporation, Santa Clara, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1498-2653","authenticated-orcid":false,"given":"Julien","family":"Klaus","sequence":"additional","affiliation":[{"name":"Friedrich Schiller University Jena, Jena, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-6815-6217","authenticated-orcid":false,"given":"Stefan","family":"Remke","sequence":"additional","affiliation":[{"name":"Friedrich Schiller University Jena, Jena, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,30]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. 
TensorFlow: a system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (Savannah, GA, USA) (OSDI'16). USENIX Association, USA, 265--283."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00972"},{"key":"e_1_3_2_1_3_1","unstructured":"Anonymous. 2024. ThunderKittens: Simple Fast and Adorable Kernels. In Submitted to The Thirteenth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=0fJfVOSUra under review."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","unstructured":"Edoardo Apr\u00e0 Michael Klemm and Karol Kowalski. 2014. Efficient Implementation of Many-Body Quantum Chemical Methods on the Intel\u00ae Xeon Phi Coprocessor. In SC '14: Proceedings of the International Conference for High Performance Computing Networking Storage and Analysis. 674--684. https:\/\/doi.org\/10.1109\/SC.2014.60","DOI":"10.1109\/SC.2014.60"},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the Annual Conference on Neural Information Processing Systems, Track on Datasets and Benchmarks (NeurIPS).","author":"Blacher Mark","year":"2024","unstructured":"Mark Blacher, Christoph Staudt, Julien Klaus, Maurice Wenig, Niklas Merk, Alexander Breuer, Max Engel, S\u00f6ren Laue, and Joachim Giesen. 2024. Einsum Benchmark: Enabling the Development of Next-Generation Tensor Execution Engines. In Proceedings of the Annual Conference on Neural Information Processing Systems, Track on Datasets and Benchmarks (NeurIPS)."},{"key":"e_1_3_2_1_6_1","unstructured":"Liu Chao. 2022. AMD Composable Kernel library: efficient fused kernels for AI apps with just a few lines of code. Technical Report. Advanced Micro Devices Inc."},{"key":"e_1_3_2_1_7_1","unstructured":"NVIDIA Corporation. 2024. cuQuantum SDK: A High-Performance Library for Accelerating Quantum Science. 
https:\/\/docs.nvidia.com\/cuda\/cuquantum\/latest\/index.html."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.00753"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.amc.2014.02.051"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/77626.79170"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1002\/jcc.23377"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3576933"},{"key":"e_1_3_2_1_13_1","volume-title":"The ITensor software library for tensor network calculations. SciPost Physics Codebases","author":"Fishman Matthew","year":"2022","unstructured":"Matthew Fishman, Steven White, and Edwin Stoudenmire. 2022. The ITensor software library for tensor network calculations. SciPost Physics Codebases (2022), 004."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS47924.2020.00032"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS57955.2024.00089"},{"key":"e_1_3_2_1_16_1","volume-title":"Towards a high-performance AI compiler with upstream MLIR. arXiv preprint arXiv:2404.15204","author":"Golin Renato","year":"2024","unstructured":"Renato Golin, Lorenzo Chelini, Adam Siemieniuk, Kavitha Madhu, Niranjan Hasabnis, Hans Pabst, Evangelos Georganas, and Alexander Heinecke. 2024. Towards a high-performance AI compiler with upstream MLIR. arXiv preprint arXiv:2404.15204 (2024)."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1356052.1356053"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.22331\/q-2021-03-15-410"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Charles R. Harris K. Jarrod Millman St\u00e9fan J. van der Walt Ralf Gommers Pauli Virtanen David Cournapeau Eric Wieser Julian Taylor Sebastian Berg Nathaniel J. Smith Robert Kern et al. 2020. Array programming with NumPy. 
Nature (2020).","DOI":"10.1038\/s41586-020-2649-2"},{"key":"e_1_3_2_1_20_1","volume-title":"Junchen Pei, Laura E. Ratcliff, Matthew G. Reuter, Adam C. Richie-Halford, Nichols A.","author":"Harrison Robert J.","year":"2016","unstructured":"Robert J. Harrison, Gregory Beylkin, Florian A. Bischoff, Justus A. Calvin, George I. Fann, Jacob Fosso-Tande, Diego Galindo, Jeff R. Hammond, Rebecca Hartman-Baker, Judith C. Hill, Jun Jia, Jakob S. Kottmann, Miao-Jung Yvonne Ou, Junchen Pei, Laura E. Ratcliff, Matthew G. Reuter, Adam C. Richie-Halford, Nichols A. Romero, Hideo Sekino, William A. Shelton, Bryan E. Sundahl, W. Scott Thornton, Edward F. Valeev, \u00c1lvaro V\u00e1zquez-Mayagoitia, Nicholas Vence, Takeshi Yanai, and Yukina Yokoi. 2016. MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation. SIAM J. Sci. Comput., Vol. 38, 5 (2016)."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1021\/jp9051215"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.83"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620666.3651383"},{"key":"e_1_3_2_1_24_1","unstructured":"Gian Marco Iodice. 2024. Arm KleidiAI: Helping AI frameworks elevate their performance on Arm CPUs. Technical Report. Arm."},{"key":"e_1_3_2_1_25_1","volume-title":"Cutlass: Fast linear algebra in cuda c. Technical Report. NVIDIA.","author":"Kerr Andrew","year":"2017","unstructured":"Andrew Kerr, Duane Merrill, Julien Demouth, and John Tran. 2017. Cutlass: Fast linear algebra in cuda c. Technical Report. NVIDIA."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1137\/07070111X"},{"key":"e_1_3_2_1_27_1","article-title":"Tensor Regression Networks","volume":"21","author":"Kossaifi Jean","year":"2020","unstructured":"Jean Kossaifi, Zachary C. Lipton, Arinbj\u00f6rn Kolbeinsson, Aran Khanna, Tommaso Furlanello, and Anima Anandkumar. 2020. Tensor Regression Networks. Journal of Machine Learning Research, Vol. 
21 (2020), 123:1--123:21.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Chi-Chung Lam P. Sadayappan and Rephael Wenger. 1997. On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution. Parallel Process. Lett. (1997).","DOI":"10.1142\/S0129626497000176"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5881"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807671"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO57630.2024.10444871"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3030548"},{"volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Michela Taufer, Pavan Balaji, and Antonio J. Pe\u00f1a (Eds.). ACM, 74:1--74:13","author":"Li Rui","key":"e_1_3_2_1_33_1","unstructured":"Rui Li, Aravind Sukumaran-Rajam, Richard Veras, Tze Meng Low, Fabrice Rastello, Atanas Rountev, and P. Sadayappan. 2019. Analytical cache modeling and tilesize optimization for tensor contractions. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Michela Taufer, Pavan Balaji, and Antonio J. Pe\u00f1a (Eds.). 
ACM, 74:1--74:13."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380188"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1021\/ct1007247"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1137\/050644756"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1137\/16M108968X"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2015.106"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620666.3651384"},{"key":"e_1_3_2_1_40_1","volume-title":"Vetrov","author":"Novikov Alexander","year":"2015","unstructured":"Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, and Dmitry P. Vetrov. 2015. Tensorizing Neural Networks. In Advances in Neural Information Processing Systems (NeurIPS). 442--450."},{"key":"e_1_3_2_1_41_1","volume-title":"Tensor networks for complex quantum systems. Nature Reviews Physics","author":"Or\u00fas Rom\u00e1n","year":"2019","unstructured":"Rom\u00e1n Or\u00fas. 2019. Tensor networks for complex quantum systems. Nature Reviews Physics (2019)."},{"key":"e_1_3_2_1_42_1","unstructured":"Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury Gregory Chanan Trevor Killeen Zeming Lin Natalia Gimelshein Luca Antiga et al. 2019. PyTorch: An Imperative Style High-Performance Deep Learning Library. In Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_2_1_43_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML). 7563--7574","author":"Peharz Robert","year":"2020","unstructured":"Robert Peharz, Steven Lang, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Guy Van den Broeck, Kristian Kersting, and Zoubin Ghahramani. 2020. Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits. In Proceedings of the International Conference on Machine Learning (ICML). 7563--7574."},{"key":"e_1_3_2_1_44_1","unstructured":"PyTorch Developers. 2024. 
ATen PyTorch's tensor library. https:\/\/github.com\/pytorch\/pytorch\/tree\/main\/aten\/src\/ATen."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-7b98e3ed-013"},{"key":"e_1_3_2_1_46_1","unstructured":"S. K. Nayar S. A. Nene and H. Murase. 1996. Columbia Object Image Library (COIL-100). Technical Report. Columbia University."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"crossref","unstructured":"Yang Shi U. N. Niranjan Animashree Anandkumar and Cris Cecka. 2016. Tensor Contractions with Extended BLAS Kernels on CPU and GPU. (2016) 193--202.","DOI":"10.1109\/HiPC.2016.031"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.06.002"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3157733"},{"key":"e_1_3_2_1_50_1","volume-title":"Proceedings of the International Symposium on Experimental Algorithms (SEA). 27:1--27:19","author":"Staudt Christoph","year":"2024","unstructured":"Christoph Staudt, Mark Blacher, Julien Klaus, Farin Lippmann, and Joachim Giesen. 2024. Improved Cut Strategy for Tensor Network Contraction Orders. In Proceedings of the International Symposium on Experimental Algorithms (SEA). 27:1--27:19."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.101"},{"key":"e_1_3_2_1_52_1","volume-title":"Schwab","author":"Stoudenmire Edwin Miles","year":"2016","unstructured":"Edwin Miles Stoudenmire and David J. Schwab. 2016. Supervised Learning with Tensor Networks. In Advances in Neural Information Processing Systems (NeurIPS). 4799--4807."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3315508.3329973"},{"key":"e_1_3_2_1_54_1","first-page":"1","article-title":"tntorch: Tensor Network Learning with PyTorch","volume":"23","author":"Usvyatsov Mikhail","year":"2022","unstructured":"Mikhail Usvyatsov, Rafael Ballester-Ripoll, and Konrad Schindler. 2022. tntorch: Tensor Network Learning with PyTorch. 
Journal of Machine Learning Research, Vol. 23, 208 (2022), 1--6. http:\/\/jmlr.org\/papers\/v23\/21-1197.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2764454"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.5555\/3600270.3602228"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3617232.3624858"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2012.97"},{"key":"e_1_3_2_1_59_1","volume-title":"XNNPACK: A Highly Optimized Solution for Neural Network Inference on ARM, x86, WebAssembly, and RISC-V Platforms. https:\/\/github.com\/google\/XNNPACK.","author":"Developers XNNPACK","year":"2024","unstructured":"XNNPACK Developers. 2024. XNNPACK: A Highly Optimized Solution for Neural Network Inference on ARM, x86, WebAssembly, and RISC-V Platforms. https:\/\/github.com\/google\/XNNPACK."},{"key":"e_1_3_2_1_60_1","volume-title":"Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (OSDI'20)","author":"Zheng Lianmin","year":"2020","unstructured":"Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, and Ion Stoica. 2020. Ansor: generating high-performance tensor programs for deep learning. In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (OSDI'20). 
USENIX Association, USA, Article 49, 17 pages."},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i12.17321"}],"event":{"name":"ASPLOS '25: 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages","SIGOPS ACM Special Interest Group on Operating Systems","SIGARCH ACM Special Interest Group on Computer Architecture"],"location":"Rotterdam Netherlands","acronym":"ASPLOS '25"},"container-title":["Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3676641.3716254","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3676641.3716254","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T11:10:47Z","timestamp":1755774647000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3676641.3716254"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,30]]},"references-count":61,"alternative-id":["10.1145\/3676641.3716254","10.1145\/3676641"],"URL":"https:\/\/doi.org\/10.1145\/3676641.3716254","relation":{},"subject":[],"published":{"date-parts":[[2025,3,30]]},"assertion":[{"value":"2025-03-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}