{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T14:08:31Z","timestamp":1762092511841,"version":"build-2065373602"},"publisher-location":"New York, NY, USA","reference-count":17,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T00:00:00Z","timestamp":1692835200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,8,24]]},"DOI":"10.1145\/3609510.3609815","type":"proceedings-article","created":{"date-parts":[[2023,7,31]],"date-time":"2023-07-31T16:08:44Z","timestamp":1690819724000},"page":"37-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Improving Throughput-oriented Generative Inference with CPUs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2312-3049","authenticated-orcid":false,"given":"Daon","family":"Park","sequence":"first","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4844-5378","authenticated-orcid":false,"given":"Sungbin","family":"Jo","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6645-6161","authenticated-orcid":false,"given":"Bernhard","family":"Egger","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2023,8,24]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"[n. d.]. HuggingFace Accelerate. https:\/\/huggingface.co\/docs\/accelerate\/index  [n. d.]. HuggingFace Accelerate. https:\/\/huggingface.co\/docs\/accelerate\/index"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.3390\/s21144758"},{"key":"e_1_3_2_1_3_1","volume-title":"Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, et al.","author":"Aminabadi Reza Yazdani","year":"2022","unstructured":"Reza Yazdani Aminabadi , Samyam Rajbhandari , Minjia Zhang , Ammar Ahmad Awan , Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, et al. 2022 . Deepspeed inference: Enabling efficient inference of transformer models at unprecedented scale. arXiv preprint arXiv:2207.00032 (2022). Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, et al. 2022. Deepspeed inference: Enabling efficient inference of transformer models at unprecedented scale. arXiv preprint arXiv:2207.00032 (2022)."},{"key":"e_1_3_2_1_4_1","volume-title":"Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934","author":"Bochkovskiy Alexey","year":"2020","unstructured":"Alexey Bochkovskiy , Chien-Yao Wang , and Hong-Yuan Mark Liao . 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 ( 2020 ). Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)."},{"key":"e_1_3_2_1_5_1","unstructured":"Rishi Bommasani Drew A Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill etal 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021).  Rishi Bommasani Drew A Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)."},{"key":"e_1_3_2_1_6_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell etal 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877--1901.  Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877--1901."},{"key":"e_1_3_2_1_7_1","volume-title":"Transformerxl: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860","author":"Dai Zihang","year":"2019","unstructured":"Zihang Dai , Zhilin Yang , Yiming Yang , Jaime Carbonell , Quoc V Le , and Ruslan Salakhutdinov . 2019 . Transformerxl: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860 (2019). Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V Le, and Ruslan Salakhutdinov. 2019. Transformerxl: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860 (2019)."},{"key":"e_1_3_2_1_8_1","volume-title":"Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et al.","author":"Driess Danny","year":"2023","unstructured":"Danny Driess , Fei Xia , Mehdi SM Sajjadi , Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et al. 2023 . Palm-e : An embodied multimodal language model. arXiv preprint arXiv:2303.03378 (2023). Danny Driess, Fei Xia, Mehdi SM Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et al. 2023. Palm-e: An embodied multimodal language model. arXiv preprint arXiv:2303.03378 (2023)."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.3390\/s21041399"},{"key":"e_1_3_2_1_10_1","first-page":"1","article-title":"Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing","volume":"55","author":"Liu Pengfei","year":"2023","unstructured":"Pengfei Liu , Weizhe Yuan , Jinlan Fu , Zhengbao Jiang , Hiroaki Hayashi , and Graham Neubig . 2023 . Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing . Comput. Surveys 55 , 9 (2023), 1 -- 35 . Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2023. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. Comput. Surveys 55, 9 (2023), 1--35.","journal-title":"Comput. Surveys"},{"key":"e_1_3_2_1_12_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , 2019 . Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019). Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_2_1_13_1","unstructured":"Ying Sheng Lianmin Zheng Binhang Yuan Zhuohan Li Max Ryabinin Daniel Y Fu Zhiqiang Xie Beidi Chen Clark Barrett Joseph E Gonzalez etal 2023. High-throughput generative inference of large language models with a single gpu. arXiv preprint arXiv:2303.06865 (2023).  Ying Sheng Lianmin Zheng Binhang Yuan Zhuohan Li Max Ryabinin Daniel Y Fu Zhiqiang Xie Beidi Chen Clark Barrett Joseph E Gonzalez et al. 2023. High-throughput generative inference of large language models with a single gpu. arXiv preprint arXiv:2303.06865 (2023)."},{"key":"e_1_3_2_1_14_1","volume-title":"Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi , Mostofa Patwary , Raul Puri , Patrick LeGresley , Jared Casper , and Bryan Catanzaro . 2019 . Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053 (2019). Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053 (2019)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3357895"},{"key":"e_1_3_2_1_16_1","volume-title":"Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron , Thibaut Lavril , Gautier Izacard , Xavier Martinet , Marie-Anne Lachaux , Timoth\u00e9e Lacroix , Baptiste Rozi\u00e8re , Naman Goyal , Eric Hambro , Faisal Azhar , 2023 . Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023). Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)."},{"key":"e_1_3_2_1_17_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems 30 ( 2017 ). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_18_1","volume-title":"Xi Victoria Lin, et al","author":"Zhang Susan","year":"2022","unstructured":"Susan Zhang , Stephen Roller , Naman Goyal , Mikel Artetxe , Moya Chen , Shuohui Chen , Christopher Dewan , Mona Diab , Xian Li , Xi Victoria Lin, et al . 2022 . Opt : Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022). Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022)."}],"event":{"name":"APSys '23: 14th ACM SIGOPS Asia-Pacific Workshop on Systems","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems"],"location":"Seoul Republic of Korea","acronym":"APSys '23"},"container-title":["Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609510.3609815","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3609510.3609815","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:25Z","timestamp":1750178785000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609510.3609815"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,24]]},"references-count":17,"alternative-id":["10.1145\/3609510.3609815","10.1145\/3609510"],"URL":"https:\/\/doi.org\/10.1145\/3609510.3609815","relation":{},"subject":[],"published":{"date-parts":[[2023,8,24]]},"assertion":[{"value":"2023-08-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}