{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T05:47:56Z","timestamp":1751348876707,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,28]],"date-time":"2022-06-28T00:00:00Z","timestamp":1656374400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China (NSFC)","award":["62022057, 61832006, 61872240"],"award-info":[{"award-number":["62022057, 61832006, 61872240"]}]},{"name":"Shanghai international science and technology collaboration project","award":["21510713600"],"award-info":[{"award-number":["21510713600"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,28]]},"DOI":"10.1145\/3524059.3532366","type":"proceedings-article","created":{"date-parts":[[2022,6,16]],"date-time":"2022-06-16T16:13:11Z","timestamp":1655395991000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["PAME"],"prefix":"10.1145","author":[{"given":"Shulai","family":"Zhang","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Weihao","family":"Cui","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Quan","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Zhengnian","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Yue","family":"Guan","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Jingwen","family":"Leng","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Chao","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Minyi","family":"Guo","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]}],"member":"320","published-online":{"date-parts":[[2022,6,28]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Amazon rekognition. https:\/\/aws.amazon.com\/rekognition\/.  Amazon rekognition. https:\/\/aws.amazon.com\/rekognition\/."},{"key":"e_1_3_2_1_2_1","unstructured":"Amazon translate. https:\/\/aws.amazon.com\/translate\/.  Amazon translate. https:\/\/aws.amazon.com\/translate\/."},{"key":"e_1_3_2_1_3_1","unstructured":"Google translate. https:\/\/translate.google.com.  Google translate. https:\/\/translate.google.com."},{"key":"e_1_3_2_1_4_1","unstructured":"Huggingface pre-trained models. https:\/\/huggingface.co\/transformers\/v3.3.1\/pretrained_models.html.  Huggingface pre-trained models. https:\/\/huggingface.co\/transformers\/v3.3.1\/pretrained_models.html."},{"key":"e_1_3_2_1_5_1","unstructured":"Imagenette and imagewoof. https:\/\/github.com\/fastai\/imagenette.  Imagenette and imagewoof. https:\/\/github.com\/fastai\/imagenette."},{"key":"e_1_3_2_1_6_1","unstructured":"Nvidia triton inference server. https:\/\/github.com\/NVIDIA\/triton-inference-server.  Nvidia triton inference server. https:\/\/github.com\/NVIDIA\/triton-inference-server."},{"key":"e_1_3_2_1_7_1","unstructured":"Onnx. https:\/\/github.com\/onnx\/onnx.  Onnx. https:\/\/github.com\/onnx\/onnx."},{"key":"e_1_3_2_1_8_1","unstructured":"Pytorch models and pre-trained weights. https:\/\/pytorch.org\/vision\/stable\/models.html.  Pytorch models and pre-trained weights. https:\/\/pytorch.org\/vision\/stable\/models.html."},{"volume-title":"https:\/\/developer.nvidia.com\/tensorrt","year":"2021","key":"e_1_3_2_1_9_1","unstructured":"Tensorrt. https:\/\/developer.nvidia.com\/tensorrt , 2021 . Tensorrt. https:\/\/developer.nvidia.com\/tensorrt, 2021."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.471"},{"key":"e_1_3_2_1_11_1","volume-title":"Multi-exit vision transformer for dynamic inference. arXiv preprint arXiv:2106.15183","author":"Bakhtiarnia Arian","year":"2021","unstructured":"Arian Bakhtiarnia , Qi Zhang , and Alexandros Iosifidis . Multi-exit vision transformer for dynamic inference. arXiv preprint arXiv:2106.15183 , 2021 . Arian Bakhtiarnia, Qi Zhang, and Alexandros Iosifidis. Multi-exit vision transformer for dynamic inference. arXiv preprint arXiv:2106.15183, 2021."},{"key":"e_1_3_2_1_12_1","first-page":"527","volume-title":"International Conference on Machine Learning","author":"Bolukbasi Tolga","year":"2017","unstructured":"Tolga Bolukbasi , Joseph Wang , Ofer Dekel , and Venkatesh Saligrama . Adaptive neural networks for efficient inference . In International Conference on Machine Learning , pages 527 -- 536 . PMLR, 2017 . Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. Adaptive neural networks for efficient inference. In International Conference on Machine Learning, pages 527--536. PMLR, 2017."},{"key":"e_1_3_2_1_13_1","volume-title":"Redunet: A white-box deep network from the principle of maximizing rate reduction. arXiv preprint arXiv:2105.10446","author":"Ryan Chan Kwan Ho","year":"2021","unstructured":"Kwan Ho Ryan Chan , Yaodong Yu , Chong You , Haozhi Qi , John Wright , and Yi Ma . Redunet: A white-box deep network from the principle of maximizing rate reduction. arXiv preprint arXiv:2105.10446 , 2021 . Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, and Yi Ma. Redunet: A white-box deep network from the principle of maximizing rate reduction. arXiv preprint arXiv:2105.10446, 2021."},{"key":"e_1_3_2_1_14_1","first-page":"578","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen , Thierry Moreau , Ziheng Jiang , Lianmin Zheng , Eddie Yan , Haichen Shen , Meghan Cowan , Leyuan Wang , Yuwei Hu , Luis Ceze , : An automated end-to-end optimizing compiler for deep learning . In 13th USENIX Symposium on Operating Systems Design and Implementation , pages 578 -- 594 , 2018 . Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. {TVM}: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation, pages 578--594, 2018."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.350"},{"key":"e_1_3_2_1_16_1","first-page":"613","volume-title":"14th USENIX Symposium on Networked Systems Design and Implementation","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw , Xin Wang , Guilio Zhou , Michael J Franklin , Joseph E Gonzalez , and Ion Stoica . Clipper : A low-latency online prediction serving system . In 14th USENIX Symposium on Networked Systems Design and Implementation , pages 613 -- 627 , 2017 . Daniel Crankshaw, Xin Wang, Guilio Zhou, Michael J Franklin, Joseph E Gonzalez, and Ion Stoica. Clipper: A low-latency online prediction serving system. In 14th USENIX Symposium on Networked Systems Design and Implementation, pages 613--627, 2017."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD46524.2019.00075"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476143"},{"key":"e_1_3_2_1_19_1","volume-title":"Intel ngraph: An intermediate representation, compiler, and executor for deep learning. arXiv preprint arXiv:1801.08058","author":"Cyphers Scott","year":"2018","unstructured":"Scott Cyphers , Arjun K Bansal , Anahita Bhiwandiwalla , Jayaram Bobba , Matthew Brookhart , Avijit Chakraborty , Will Constable , Christian Convey , Leona Cook , Omar Kanawi , Intel ngraph: An intermediate representation, compiler, and executor for deep learning. arXiv preprint arXiv:1801.08058 , 2018 . Scott Cyphers, Arjun K Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, et al. Intel ngraph: An intermediate representation, compiler, and executor for deep learning. arXiv preprint arXiv:1801.08058, 2018."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_21_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 , 2018 . Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018."},{"key":"e_1_3_2_1_22_1","volume-title":"Duen Horng Chau, and Jimeng Sun. Elf: An early-exiting framework for long-tailed classification. arXiv preprint arXiv:2006.11979","author":"Duggal Rahul","year":"2020","unstructured":"Rahul Duggal , Scott Freitas , Sunny Dhamnani , Duen Horng Chau, and Jimeng Sun. Elf: An early-exiting framework for long-tailed classification. arXiv preprint arXiv:2006.11979 , 2020 . Rahul Duggal, Scott Freitas, Sunny Dhamnani, Duen Horng Chau, and Jimeng Sun. Elf: An early-exiting framework for long-tailed classification. arXiv preprint arXiv:2006.11979, 2020."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/SEC50012.2020.00014"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190541"},{"key":"e_1_3_2_1_25_1","volume-title":"Block-skim: Efficient question answering for transformer. arXiv preprint arXiv:2112.08560","author":"Guan Yue","year":"2021","unstructured":"Yue Guan , Zhengyi Li , Jingwen Leng , Zhouhan Lin , Minyi Guo , and Yuhao Zhu . Block-skim: Efficient question answering for transformer. arXiv preprint arXiv:2112.08560 , 2021 . Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo, and Yuhao Zhu. Block-skim: Efficient question answering for transformer. arXiv preprint arXiv:2112.08560, 2021."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.239"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_28_1","first-page":"76","volume-title":"Proceedings, Part X 16","author":"Harry Hsu Tzu-Ming","year":"2020","unstructured":"Tzu-Ming Harry Hsu , Hang Qi , and Matthew Brown . Federated visual classification with real-world data distribution. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020 , Proceedings, Part X 16 , pages 76 -- 92 . Springer , 2020 . Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. Federated visual classification with real-world data distribution. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part X 16, pages 76--92. Springer, 2020."},{"key":"e_1_3_2_1_29_1","volume-title":"International Conference on Learning Representations","author":"Hu Ting-Kuei","year":"2019","unstructured":"Ting-Kuei Hu , Tianlong Chen , Haotao Wang , and Zhangyang Wang . Triple wins : Boosting accuracy, robustness and efficiency together by enabling input-adaptive inference . In International Conference on Learning Representations , 2019 . Ting-Kuei Hu, Tianlong Chen, Haotao Wang, and Zhangyang Wang. Triple wins: Boosting accuracy, robustness and efficiency together by enabling input-adaptive inference. In International Conference on Learning Representations, 2019."},{"key":"e_1_3_2_1_30_1","volume-title":"International Conference on Learning Representations","author":"Huang Gao","year":"2018","unstructured":"Gao Huang , Danlu Chen , Tianhong Li , Felix Wu , Laurens van der Maaten, and Kilian Weinberger. Multi-scale dense networks for resource efficient image classification . In International Conference on Learning Representations , 2018 . Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Weinberger. Multi-scale dense networks for resource efficient image classification. In International Conference on Learning Representations, 2018."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482335"},{"key":"e_1_3_2_1_32_1","first-page":"3301","volume-title":"International Conference on Machine Learning","author":"Kaya Yigitcan","year":"2019","unstructured":"Yigitcan Kaya , Sanghyun Hong , and Tudor Dumitras . Shallow-deep networks : Understanding and mitigating network overthinking . In International Conference on Machine Learning , pages 3301 -- 3310 . PMLR, 2019 . Yigitcan Kaya, Sanghyun Hong, and Tudor Dumitras. Shallow-deep networks: Understanding and mitigating network overthinking. In International Conference on Machine Learning, pages 3301--3310. PMLR, 2019."},{"key":"e_1_3_2_1_33_1","volume-title":"Multi-exit semantic segmentation networks. arXiv preprint arXiv:2106.03527","author":"Kouris Alexandros","year":"2021","unstructured":"Alexandros Kouris , Stylianos I Venieris , Stefanos Laskaridis , and Nicholas D Lane . Multi-exit semantic segmentation networks. arXiv preprint arXiv:2106.03527 , 2021 . Alexandros Kouris, Stylianos I Venieris, Stefanos Laskaridis, and Nicholas D Lane. Multi-exit semantic segmentation networks. arXiv preprint arXiv:2106.03527, 2021."},{"key":"e_1_3_2_1_34_1","first-page":"1","volume-title":"2020 IEEE\/ACM International Conference On Computer Aided Design","author":"Laskaridis Stefanos","year":"2020","unstructured":"Stefanos Laskaridis , Stylianos I Venieris , Hyeji Kim , and Nicholas D Lane . Hapi : hardware-aware progressive inference . In 2020 IEEE\/ACM International Conference On Computer Aided Design , pages 1 -- 9 . IEEE, 2020 . Stefanos Laskaridis, Stylianos I Venieris, Hyeji Kim, and Nicholas D Lane. Hapi: hardware-aware progressive inference. In 2020 IEEE\/ACM International Conference On Computer Aided Design, pages 1--9. IEEE, 2020."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00198"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.537"},{"key":"e_1_3_2_1_38_1","volume-title":"Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030","author":"Liu Ze","year":"2021","unstructured":"Ze Liu , Yutong Lin , Yue Cao , Han Hu , Yixuan Wei , Zheng Zhang , Stephen Lin , and Baining Guo . Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030 , 2021 . Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030, 2021."},{"key":"e_1_3_2_1_39_1","volume-title":"Deep learning with dynamic computation graphs. arXiv preprint arXiv:1702.02181","author":"Looks Moshe","year":"2017","unstructured":"Moshe Looks , Marcello Herreshoff , DeLesley Hutchins , and Peter Norvig . Deep learning with dynamic computation graphs. arXiv preprint arXiv:1702.02181 , 2017 . Moshe Looks, Marcello Herreshoff, DeLesley Hutchins, and Peter Norvig. Deep learning with dynamic computation graphs. arXiv preprint arXiv:1702.02181, 2017."},{"key":"e_1_3_2_1_40_1","volume-title":"Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, et al. Dynet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980","author":"Neubig Graham","year":"2017","unstructured":"Graham Neubig , Chris Dyer , Yoav Goldberg , Austin Matthews , Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, et al. Dynet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980 , 2017 . Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, et al. Dynet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980, 2017."},{"key":"e_1_3_2_1_41_1","volume-title":"Nvidia cuda c programming guide","author":"Nvidia CUDA","year":"2011","unstructured":"CUDA Nvidia . Nvidia cuda c programming guide . Nvidia Corporation , 120(18):8, 2011 . CUDA Nvidia. Nvidia cuda c programming guide. Nvidia Corporation, 120(18):8, 2011."},{"key":"e_1_3_2_1_42_1","volume-title":"Flexible, high-performance ml serving. arXiv preprint arXiv:1712.06139","author":"Olston Christopher","year":"2017","unstructured":"Christopher Olston , Noah Fiedel , Kiril Gorovoy , Jeremiah Harmsen , Li Lao , Fangwei Li , Vinu Rajashekhar , Sukriti Ramesh , and Jordan Soyke . Tensorflow-serving : Flexible, high-performance ml serving. arXiv preprint arXiv:1712.06139 , 2017 . Christopher Olston, Noah Fiedel, Kiril Gorovoy, Jeremiah Harmsen, Li Lao, Fangwei Li, Vinu Rajashekhar, Sukriti Ramesh, and Jordan Soyke. Tensorflow-serving: Flexible, high-performance ml serving. arXiv preprint arXiv:1712.06139, 2017."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2016.7900006"},{"key":"e_1_3_2_1_44_1","volume-title":"Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461","author":"Wang Alex","year":"2018","unstructured":"Alex Wang , Amanpreet Singh , Julian Michael , Felix Hill , Omer Levy , and Samuel R Bowman . Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 , 2018 . Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461, 2018."},{"key":"e_1_3_2_1_45_1","volume-title":"Deep high-resolution representation learning for visual recognition","author":"Wang Jingdong","year":"2020","unstructured":"Jingdong Wang , Ke Sun , Tianheng Cheng , Borui Jiang , Chaorui Deng , Yang Zhao , Dong Liu , Yadong Mu , Mingkui Tan , Xinggang Wang , Deep high-resolution representation learning for visual recognition . IEEE transactions on pattern analysis and machine intelligence, 2020 . Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, et al. Deep high-resolution representation learning for visual recognition. IEEE transactions on pattern analysis and machine intelligence, 2020."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_25"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00919"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_29"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.204"},{"key":"e_1_3_2_1_50_1","first-page":"937","volume-title":"2018 USENIX Annual Technical Conference","author":"Xu Shizhen","year":"2018","unstructured":"Shizhen Xu , Hao Zhang , Graham Neubig , Wei Dai , Jin Kyu Kim , Zhijie Deng , Qirong Ho , Guangwen Yang , and Eric P Xing . Cavs : An efficient runtime system for dynamic neural networks . In 2018 USENIX Annual Technical Conference , pages 937 -- 950 , 2018 . Shizhen Xu, Hao Zhang, Graham Neubig, Wei Dai, Jin Kyu Kim, Zhijie Deng, Qirong Ho, Guangwen Yang, and Eric P Xing. Cavs: An efficient runtime system for dynamic neural networks. In 2018 USENIX Annual Technical Conference, pages 937--950, 2018."},{"key":"e_1_3_2_1_51_1","first-page":"173","volume-title":"Proceedings, Part VI 16","author":"Yuan Yuhui","year":"2020","unstructured":"Yuhui Yuan , Xilin Chen , and Jingdong Wang . Object-contextual representations for semantic segmentation. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020 , Proceedings, Part VI 16 , pages 173 -- 190 . Springer , 2020 . Yuhui Yuan, Xilin Chen, and Jingdong Wang. Object-contextual representations for semantic segmentation. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part VI 16, pages 173--190. Springer, 2020."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472456.3473513"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507723"},{"key":"e_1_3_2_1_54_1","first-page":"33","article-title":"Bert loses patience: Fast and robust inference with early exit","author":"Zhou Wangchunshu","year":"2020","unstructured":"Wangchunshu Zhou , Canwen Xu , Tao Ge , Julian McAuley , Ke Xu , and Furu Wei . Bert loses patience: Fast and robust inference with early exit . Advances in Neural Information Processing Systems , 33 , 2020 . Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, and Furu Wei. Bert loses patience: Fast and robust inference with early exit. Advances in Neural Information Processing Systems, 33, 2020.","journal-title":"Advances in Neural Information Processing Systems"}],"event":{"name":"ICS '22: 2022 International Conference on Supercomputing","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"],"location":"Virtual Event","acronym":"ICS '22"},"container-title":["Proceedings of the 36th ACM International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524059.3532366","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524059.3532366","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:37Z","timestamp":1750188637000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524059.3532366"}},"subtitle":["precision-aware multi-exit DNN serving for reducing latencies of batched inferences"],"short-title":[],"issued":{"date-parts":[[2022,6,28]]},"references-count":54,"alternative-id":["10.1145\/3524059.3532366","10.1145\/3524059"],"URL":"https:\/\/doi.org\/10.1145\/3524059.3532366","relation":{},"subject":[],"published":{"date-parts":[[2022,6,28]]},"assertion":[{"value":"2022-06-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}