{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T16:50:56Z","timestamp":1771951856424,"version":"3.50.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA2","funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2023YFB4503204"],"award-info":[{"award-number":["2023YFB4503204"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2025,10,9]]},"abstract":"<jats:p>Practical encrypted neural network inference under the CKKS fully homomorphic encryption (FHE) scheme relies heavily on accelerating two key kernel operations: Matrix-Vector Multiplication (MVM) and Convolution (Conv). However, existing solutions\u2014such as expert-tuned libraries and domain-specific languages\u2014are designed in an ad hoc manner, leading to significant inefficiencies caused by excessive rotations.<\/jats:p>\n          <jats:p>We introduce MKR, a novel composition-based compiler approach that optimizes MVM and Conv kernel operations for DNN models under CKKS within a unified framework. MKR decomposes each kernel into composable units, called MetaKernels, to enhance SIMD parallelism within ciphertexts (via horizontal batching) and computational parallelism across them (via vertical batching). Our approach tackles previously unaddressed challenges, including reducing rotation overhead through a rotation-aware cost model for data packing, while also ensuring high slot utilization, uniform handling of inputs with arbitrary sizes, and compatibility with the output tensor layout. Implemented in a production-quality FHE compiler, MKR achieves inference time speedups of 10.08\u00d7\u2013185.60\u00d7 for individual MVM and Conv kernels and 1.75\u00d7\u201311.84\u00d7 for end-to-end inference compared to a state-of-the-art FHE compiler. Moreover, MKR enables homomorphic execution of large DNN models, where prior methods fail, significantly advancing the practicality of FHE compilers.<\/jats:p>","DOI":"10.1145\/3763095","type":"journal-article","created":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T08:51:31Z","timestamp":1759999891000},"page":"1261-1288","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["MetaKernel: Enabling Efficient Encrypted Neural Network Inference through Unified MVM and Convolution"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-7070-6828","authenticated-orcid":false,"given":"Peng","family":"Yuan","sequence":"first","affiliation":[{"name":"Ant Group, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6845-7864","authenticated-orcid":false,"given":"Yan","family":"Liu","sequence":"additional","affiliation":[{"name":"Ant Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-5782-0454","authenticated-orcid":false,"given":"JianXin","family":"Lai","sequence":"additional","affiliation":[{"name":"Ant Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2098-2193","authenticated-orcid":false,"given":"Long","family":"Li","sequence":"additional","affiliation":[{"name":"Ant Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-9692-5578","authenticated-orcid":false,"given":"Tianxiang","family":"Sui","sequence":"additional","affiliation":[{"name":"Ant Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-2778-7100","authenticated-orcid":false,"given":"Linjie","family":"Xiao","sequence":"additional","affiliation":[{"name":"Ant Group, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8443-3803","authenticated-orcid":false,"given":"Xiaojing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Ant Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5144-5442","authenticated-orcid":false,"given":"Qing","family":"Zhu","sequence":"additional","affiliation":[{"name":"Ant Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0380-3506","authenticated-orcid":false,"given":"Jingling","family":"Xue","sequence":"additional","affiliation":[{"name":"UNSW, Sydney, Australia"},{"name":"Ant Group, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2025,10,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3560827.3563379"},{"key":"e_1_2_1_2_1","unstructured":"Martin R. Albrecht Melissa Chase Hao Chen Jintai Ding Shafi Goldwasser Sergey Gorbunov Shai Halevi Jeffrey Hoffstein Kim Laine Kristin E. Lauter Satya Lokam Daniele Micciancio Dustin Moody Travis Morrison Amit Sahai and Vinod Vaikuntanathan. 2019. Homomorphic Encryption Standard. IACR Cryptol. ePrint Arch. 939. https:\/\/eprint.iacr.org\/2019\/939"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3338469.3358944"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Jean-Philippe Bossuat Rosario Cammarota Jung Hee Cheon Ilaria Chillotti Benjamin R. Curtis Wei Dai Huijing Gong Erin Hales Duhyeong Kim Bryan Kumara Changmin Lee Xianhui Lu Carsten Maple Alberto Pedrouzo-Ulloa Rachel Player Luis Antonio Ruiz Lopez Yongsoo Song Donggeon Yhee and Bahattin Yildiz. 2024. Security Guidelines for Implementing Homomorphic Encryption. Cryptology ePrint Archive Paper 2024\/463. https:\/\/eprint.iacr.org\/2024\/463","DOI":"10.62056\/anxra69p1"},{"key":"e_1_2_1_5_1","unstructured":"Zvika Brakerski Craig Gentry and Vinod Vaikuntanathan. 2011. Fully Homomorphic Encryption without Bootstrapping. Cryptology ePrint Archive Paper 2011\/277. https:\/\/eprint.iacr.org\/2011\/277"},{"key":"e_1_2_1_6_1","volume-title":"Selected Areas in Cryptography \u2013 SAC","author":"Cheon Jung Hee","year":"2018","unstructured":"Jung Hee Cheon, Kyoohyung Han, Andrey Kim, Miran Kim, and Yongsoo Song. 2019. A Full RNS Variant of Approximate Homomorphic Encryption. In Selected Areas in Cryptography \u2013 SAC 2018, Carlos Cid and Michael J. Jacobson Jr. (Eds.). Springer International Publishing, Cham. 347\u2013368."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-70694-8_15"},{"key":"e_1_2_1_8_1","volume-title":"DaCapo: Automatic Bootstrapping Management for Efficient Fully Homomorphic Encryption. In 33rd USENIX Security Symposium (USENIX Security 24)","author":"Cheon Seonyoung","year":"2024","unstructured":"Seonyoung Cheon, Yongwoo Lee, Dongkwan Kim, Ju Min Lee, Sunchul Jung, Taekyung Kim, Dongyoon Lee, and Hanjun Kim. 2024. DaCapo: Automatic Bootstrapping Management for Efficient Fully Homomorphic Encryption. In 33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA. 6993\u20137010. isbn:978-1-939133-44-1 https:\/\/www.usenix.org\/conference\/usenixsecurity24\/presentation\/cheon"},{"key":"e_1_2_1_9_1","volume-title":"TFHE: Fast Fully Homomorphic Encryption over the Torus. Cryptology ePrint Archive, Paper 2018\/421. https:\/\/eprint.iacr.org\/2018\/421","author":"Chillotti Ilaria","year":"2018","unstructured":"Ilaria Chillotti, Nicolas Gama, Mariya Georgieva, and Malika Izabach\u00e8ne. 2018. TFHE: Fast Fully Homomorphic Encryption over the Torus. Cryptology ePrint Archive, Paper 2018\/421. https:\/\/eprint.iacr.org\/2018\/421"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3386023"},{"key":"e_1_2_1_11_1","volume-title":"CHET: An Optimizing Compiler for Fully-Homomorphic Neural-Network Inferencing. In PLDI","author":"Dathathri Roshan","year":"2019","unstructured":"Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madan Musuvathi, and Todd Mytkowicz. 2019. CHET: An Optimizing Compiler for Fully-Homomorphic Neural-Network Inferencing. In PLDI 2019. ACM, 142\u2013156. https:\/\/www.microsoft.com\/en-us\/research\/publication\/chet-an-optimizing-compiler-for-fully-homomorphic-neural-network-inferencing\/"},{"key":"e_1_2_1_12_1","volume-title":"FHEW: Bootstrapping Homomorphic Encryption in Less Than a Second. In Advances in Cryptology \u2013 EUROCRYPT","author":"Ducas L\u00e9o","year":"2015","unstructured":"L\u00e9o Ducas and Daniele Micciancio. 2015. FHEW: Bootstrapping Homomorphic Encryption in Less Than a Second. In Advances in Cryptology \u2013 EUROCRYPT 2015, Elisabeth Oswald and Marc Fischlin (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg. 617\u2013640. isbn:978-3-662-46800-5"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3676641.3716008"},{"key":"e_1_2_1_14_1","unstructured":"Junfeng Fan and Frederik Vercauteren. 2012. Somewhat Practical Fully Homomorphic Encryption. Cryptology ePrint Archive Paper 2012\/144. https:\/\/eprint.iacr.org\/2012\/144"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071017"},{"key":"e_1_2_1_16_1","volume-title":"A fully homomorphic encryption scheme. Ph. D. Dissertation","author":"Gentry Craig","unstructured":"Craig Gentry. 2009. A fully homomorphic encryption scheme. Ph. D. Dissertation. Stanford University."},{"key":"e_1_2_1_17_1","volume-title":"Asymptotically-Faster, Attribute-Based. In Advances in Cryptology \u2013 CRYPTO","author":"Gentry Craig","year":"2013","unstructured":"Craig Gentry, Amit Sahai, and Brent Waters. 2013. Homomorphic Encryption from Learning with Errors: Conceptually-Simpler, Asymptotically-Faster, Attribute-Based. In Advances in Cryptology \u2013 CRYPTO 2013, Ran Canetti and Juan A. Garay (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg. 75\u201392. isbn:978-3-642-40041-4"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2402.07901"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 33nd International Conference on Machine Learning, ICML","author":"Gilad-Bachrach Ran","year":"2016","unstructured":"Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin E. Lauter, Michael Naehrig, and John Wernsing. 2016. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, Maria-Florina Balcan and Kilian Q. Weinberger (Eds.) (JMLR Workshop and Conference Proceedings, Vol. 48). JMLR.org, 201\u2013210. http:\/\/proceedings.mlr.press\/v48\/gilad-bachrach16.html"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","unstructured":"Gamze G\u00fcrsoy Eduardo Chielle Charlotte M. Brannon Michail Maniatakos and Mark Gerstein. 2020. Privacy-preserving genotype imputation with fully homomorphic encryption. bioRxiv https:\/\/doi.org\/10.1101\/2020.05.29.124412 10.1101\/2020.05.29.124412","DOI":"10.1101\/2020.05.29.124412"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-96884-1_4"},{"key":"e_1_2_1_22_1","volume-title":"Deep Residual Learning for Image Recognition. CoRR, abs\/1512.03385","author":"He Kaiming","year":"2015","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR, abs\/1512.03385 (2015), arXiv:1512.03385. arxiv:1512.03385"},{"key":"e_1_2_1_23_1","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arxiv:1704.04861. arxiv:1704.04861"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3669940.3707260"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_1_26_1","volume-title":"2024 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). 193\u2013204","author":"Jianhui Li Zhennan Qin","year":"2024","unstructured":"Zhennan Qin Jianhui Li and Dan Lavery. 2024. oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation. In 2024 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). 193\u2013204."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 27th USENIX Conference on Security Symposium (SEC\u201918)","author":"Juvekar Chiraag","year":"2018","unstructured":"Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. 2018. GAZELLE: a low latency framework for secure neural network inference. In Proceedings of the 27th USENIX Conference on Security Symposium (SEC\u201918). USENIX Association, USA. 1651\u20131668. isbn:9781931971461"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3656382"},{"key":"e_1_2_1_29_1","unstructured":"Eunsang Lee Joon-Woo Lee Junghyun Lee Young-Sik Kim Yongjune Kim Jong-Seon No and Woosuk Choi. 2022. Low-Complexity Deep Convolutional Neural Networks on Fully Homomorphic Encryption Using Multiplexed Parallel Convolutions. In Proceedings of the 39th International Conference on Machine Learning Kamalika Chaudhuri Stefanie Jegelka Le Song Csaba Szepesvari Gang Niu and Sivan Sabato (Eds.) (Proceedings of Machine Learning Research Vol. 162). PMLR 12403\u201312422. https:\/\/proceedings.mlr.press\/v162\/lee22e.html"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2021.3105111"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3159694"},{"key":"e_1_2_1_32_1","volume-title":"ANT-ACE: An FHE Compiler Framework for Automating Neural Network Inference. In 2025 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO).","author":"Li Long","year":"2025","unstructured":"Long Li, Jianxin Lai, Peng Yuan, Tianxiang Sui, Yan Liu, Qing Zhu, Xiaojing Zhang, Linjie Xiao, Wenguang Chen, and Jingling Xue. 2025. ANT-ACE: An FHE Compiler Framework for Automating Neural Network Inference. In 2025 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3669940.3707276"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.46586\/TCHES.V2023.I2.358-380"},{"key":"e_1_2_1_35_1","unstructured":"Meta. 2024. Build the future of AI with Meta Llama 3. https:\/\/llama.meta.com\/llama3\/"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480070"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527393"},{"key":"e_1_2_1_38_1","volume-title":"https:\/\/github.com\/Microsoft\/SEAL Microsoft Research","author":"Microsoft SEAL","unstructured":"2020. Microsoft SEAL (release 3.6). https:\/\/github.com\/Microsoft\/SEAL Microsoft Research, Redmond, WA."},{"key":"e_1_2_1_39_1","volume-title":"Advances in Cryptology \u2013 CRYPTO","author":"Shai Halevi","year":"2014","unstructured":"Halevi Shai and Shoup Victor. 2014. Algorithms in HElib. In Advances in Cryptology \u2013 CRYPTO 2014, Juan A. Garay and Rosario Gennaro (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg. 554\u2013571. isbn:978-3-662-44371-2"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/3049832.3049846"},{"key":"e_1_2_1_41_1","volume-title":"HECO: Automatic Code Optimizations for Efficient Fully Homomorphic Encryption. arxiv:2202.01649","author":"Viand Alexander","year":"2022","unstructured":"Alexander Viand, Patrick Jattke, Miro Haller, and Anwar Hithnawi. 2022. HECO: Automatic Code Optimizations for Efficient Fully Homomorphic Encryption. arxiv:2202.01649"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO53902.2022.9741265"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640390"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","unstructured":"Peng Yuan Yan Liu JianXin Lai Long Li Tianxiang Sui Linjie Xiao Xiaojing Zhang Qing Zhu and Jingling Xue. 2025. MetaKernel Artifact. https:\/\/doi.org\/10.5281\/zenodo.16911192 10.5281\/zenodo.16911192","DOI":"10.5281\/zenodo.16911192"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS53621.2022.00074"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3696443.3708917"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3763095","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:40:19Z","timestamp":1760031619000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3763095"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,9]]},"references-count":46,"journal-issue":{"issue":"OOPSLA2","published-print":{"date-parts":[[2025,10,9]]}},"alternative-id":["10.1145\/3763095"],"URL":"https:\/\/doi.org\/10.1145\/3763095","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,9]]},"assertion":[{"value":"2025-03-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}