{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T08:48:01Z","timestamp":1780994881401,"version":"3.54.1"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"PLDI","license":[{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"funder":[{"DOI":"10.13039\/100018699","name":"HORIZON EUROPE Digital, Industry and Space","doi-asserted-by":"publisher","award":["101070374"],"award-info":[{"award-number":["101070374"]}],"id":[{"id":"10.13039\/100018699","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,6,20]]},"abstract":"<jats:p>Compilers often use performance models to decide how to optimize code. This is often preferred over using hardware performance measurements, since hardware measurements can be expensive, limited by hardware availability, and makes the output of compilation non-deterministic. Analytical models, on the other hand, serve as efficient and noise-free performance indicators. Since many optimizations focus on improving memory performance, memory cache miss rate estimations can serve as an effective and noise-free performance indicator for superoptimizers, worst-case execution time analyses, manual program optimization, and many other performance-focused use cases. Existing methods to model the cache behavior of affine programs work on small programs such as those in the Polybench benchmark but do not scale to the larger programs we would like to optimize in production, which can be orders of magnitude bigger by lines of code. These analytical approaches hand off the whole program to a Presburger solver and perform expensive mathematical operations on the huge resulting formulas. We develop a scalable cache model for affine programs that splits the computation into smaller pieces that do not trigger the worst-case asymptotic behavior of these solvers. We evaluate our approach on 46 TorchVision neural networks, finding that our model has a geomean runtime of 44.9 seconds compared to over 32 minutes for the state-of-the-art prior cache model, and the latter is actually smaller than the true value because the prior model reached our four-hour time limit on 54% of the networks, and this limit was never reached by our tool. Our model exploits parallelism effectively: running it on sixteen cores is 8.2x faster than running it single-threaded. While the state-of-the-art model takes over four hours to analyze a majority of the benchmark programs, Falcon produces results in at most 3 minutes and 3 seconds; moreover, after a local modification to the program being analyzed, our model efficiently updates the predictions in 513 ms on average (geomean). Thus, we provide the first scalable analytical cache model.<\/jats:p>\n                  <jats:p>\n                    CCS Concepts: \u2022\n                    <jats:bold>Software and its engineering \u2192 Compilers<\/jats:bold>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3656452","type":"journal-article","created":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T12:27:20Z","timestamp":1718886440000},"page":"1854-1878","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Falcon: A Scalable Analytical Cache Model"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7301-2307","authenticated-orcid":false,"given":"Arjun","family":"Pitchanathan","sequence":"first","affiliation":[{"name":"University of Edinburgh, Edinburgh, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-9915-2885","authenticated-orcid":false,"given":"Kunwar","family":"Grover","sequence":"additional","affiliation":[{"name":"Advanced Micro Devices, Edinburgh, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3874-6003","authenticated-orcid":false,"given":"Tobias","family":"Grosser","sequence":"additional","affiliation":[{"name":"University of Cambridge, Cambridge, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,6,20]]},"reference":[{"key":"e_1_3_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322967"},{"key":"e_1_3_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3158120"},{"key":"e_1_3_2_4_1","first-page":"769","article-title":"A Polynomial Time Algorithm for Counting Integral Points in Polyhedra When the Dimension Is Fixed","author":"Barvinok Alexander I.","year":"1994","unstructured":"Alexander I. Barvinok. 1994. A Polynomial Time Algorithm for Counting Integral Points in Polyhedra When the Dimension Is Fixed. Mathematics of OperationsResearch 19, 4 (1994), 769-779. http:\/\/www.jstor.org\/stable\/3690312","journal-title":"Mathematics of OperationsResearch 19, 4 (1994)"},{"key":"e_1_3_2_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2004.09.004"},{"key":"e_1_3_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/381694.378859"},{"key":"e_1_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192402"},{"key":"e_1_3_2_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327144.3327258"},{"key":"e_1_3_2_9_1","unstructured":"Jan Edler and Mark D. Hill. 1999. Dinero IV Trace-Driven Uniprocessor Cache Simulator. (1999). https:\/\/pages.cs.wisc.edu\/\u1fc0markhill\/DineroIV\/"},{"key":"e_1_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1051\/ro\/1988220302431"},{"key":"e_1_3_2_11_1","first-page":"122","article-title":"Super-Exponential Complexity of Presburger Arithmetic","author":"Fischer Michael J.","year":"1998","unstructured":"Michael J. Fischer and Michael O. Rabin. 1998. Super-Exponential Complexity of Presburger Arithmetic. In Quantifier Elimination and Cylindrical Algebraic Decomposition, Bob F. Caviness and Jeremy R. Johnson (Eds.). Springer Vienna, Vienna, 122-135. https:\/\/link.springer.com\/chapter\/10.1007\/978-3-7091-9459-1_5","journal-title":"In Quantifier Elimination and Cylindrical Algebraic Decomposition, Bob F. Caviness and Jeremy R. Johnson (Eds.). Springer Vienna, Vienna"},{"key":"e_1_3_2_12_1","unstructured":"GCC Contributors. [n. d.]. Auto-Vectorization in GCC. ([n. d.]). https:\/\/gcc.gnu.org\/projects\/tree-ssa\/vectorization.html Accessed: 2023-11-12"},{"key":"e_1_3_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314606"},{"key":"e_1_3_2_14_1","doi-asserted-by":"publisher","unstructured":"Christoph Haase. 2018. A Survival Guide to Presburger Arithmetic. ACM SIGLOG News 5 3 (July 2018) 67-82. https:\/\/doi.org\/10.1145\/3242953.3242964 10.1145\/3242953.3242964","DOI":"10.1145\/3242953.3242964"},{"key":"e_1_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_16_1","article-title":"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications","author":"Howard Andrew G.","year":"2017","unstructured":"Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs\/1704.04861 (2017). arXiv:1704.04861 http:\/\/arxiv.org\/abs\/1704.04861","journal-title":"CoRR abs\/1704.04861 (2017). arXiv:1704.04861"},{"key":"e_1_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOT.2003.1240655"},{"key":"e_1_3_2_18_1","article-title":"Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc","author":"Jia Zhihao","year":"2020","unstructured":"Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. 2020. Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc. In Proceedings of Machine Learning and Systems 2020, MLSys 2020, Austin, TX, USA, March 2-4, 2020, Inderjit S. Dhillon, Dimitris S. Papailiopoulos, and Vivienne Sze (Eds.). mlsys.org. https:\/\/proceedings.mlsys.org\/book\/300.pdf","journal-title":"In Proceedings of Machine Learning and Systems 2020, MLSys 2020, Austin, TX, USA, March 2-4, 2020, Inderjit S. Dhillon, Dimitris S. Papailiopoulos, and Vivienne Sze (Eds.). mlsys.org"},{"key":"e_1_3_2_19_1","article-title":"A Learned Performance Model for Tensor Processing Units","author":"Kaufman Samuel J.","year":"2021","unstructured":"Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, and Mike Burrows. 2021. A Learned Performance Model for Tensor Processing Units. In Proceedings of Machine Learning and Systems 2021, MLSys 2021, virtual, April 5-9, 2021, Alex Smola, Alex Dimakis, and Ion Stoica (Eds.). mlsys.org. https:\/\/arxiv.org\/abs\/2008.01040","journal-title":"In Proceedings of Machine Learning and Systems 2021, MLSys 2021, virtual, April 5-9, 2021, Alex Smola, Alex Dimakis, and Ion Stoica (Eds.). mlsys.org"},{"key":"e_1_3_2_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999257"},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO51591.2021.9370308"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"e_1_3_2_23_1","unstructured":"LLVM Contributors. [n. d.]. Auto-Vectorization in LLVM. ([n. d.]). https:\/\/llvm.org\/docs\/Vectorizers.html Accessed: 2023-11-12."},{"key":"e_1_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1287\/moor.17.3.751"},{"key":"e_1_3_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874254"},{"key":"e_1_3_2_26_1","first-page":"311","article-title":"Lazy Array Data-Flow Dependence Analysis","author":"Maslov Vadim","year":"1994","unstructured":"Vadim Maslov. 1994. Lazy Array Data-Flow Dependence Analysis. In Proceedings ofthe 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL \u201894). Association for Computing Machinery, New York, NY, USA, 311-325. https:\/\/doi.org\/10.1145\/174675.177911","journal-title":"In Proceedings ofthe 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL \u201894). Association for Computing Machinery, New York, NY, USA"},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523714"},{"key":"e_1_3_2_28_1","article-title":"Polygeist: Raising C to Polyhedral MLIR","author":"Moses William S.","year":"2021","unstructured":"William S. Moses, Lorenzo Chelini, Ruizhe Zhao, and Oleksandr Zinenko. 2021. Polygeist: Raising C to Polyhedral MLIR. In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques (PACT \u201821). Association for Computing Machinery, New York, NY, USA, 12.","journal-title":"In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques (PACT \u201821). Association for Computing Machinery, New York, NY, USA, 12"},{"key":"e_1_3_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_3_2_30_1","article-title":"Computer Organization and Design, Fifth Edition: The Hardware\/Software Interface (5th ed.)","author":"Patterson David A.","year":"2013","unstructured":"David A. Patterson and John L. Hennessy. 2013. Computer Organization and Design, Fifth Edition: The Hardware\/Software Interface (5th ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.","journal-title":"Morgan Kaufmann Publishers Inc., San Francisco, CA, USA"},{"key":"e_1_3_2_31_1","doi-asserted-by":"publisher","unstructured":"Arjun Pitchanathan Kunwar Grover and Tobias Grosser. 2024. Artifact for \u201cFalcon: A Scalable Analytical Cache Model\u201d.(2024). https:\/\/doi.org\/10.5281\/zenodo.10972076 10.5281\/zenodo.10972076","DOI":"10.5281\/zenodo.10972076"},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485539"},{"key":"e_1_3_2_33_1","unstructured":"Louis-Noel Pouchet. 2012. Polybench: The polyhedral benchmark suite. (2012). https:\/\/sourceforge.net\/projects\/polybench\/"},{"key":"e_1_3_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3558003"},{"key":"e_1_3_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.29"},{"key":"e_1_3_2_36_1","article-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. http:\/\/arxiv.org\/abs\/1409.1556","journal-title":"In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings"},{"key":"e_1_3_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_2_39_1","unstructured":"Sven Verdoolaege. 2007. barvinok library. (2007). https:\/\/barvinok.sourceforge.io\/"},{"key":"e_1_3_2_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15582-6_49"},{"key":"e_1_3_2_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-006-1231-0"},{"key":"e_1_3_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2004.1275296"},{"key":"e_1_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3134437"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656452","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3656452","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,8]],"date-time":"2026-06-08T22:31:49Z","timestamp":1780957909000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656452"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,20]]},"references-count":42,"journal-issue":{"issue":"PLDI","published-print":{"date-parts":[[2024,6,20]]}},"alternative-id":["10.1145\/3656452"],"URL":"https:\/\/doi.org\/10.1145\/3656452","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,20]]},"assertion":[{"value":"2023-11-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-31","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}