{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T15:19:56Z","timestamp":1774365596513,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":57,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,4,17]],"date-time":"2021-04-17T00:00:00Z","timestamp":1618617600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Institute for Information & communications Technology Promotion (IITP)","award":["2014-0-00035"],"award-info":[{"award-number":["2014-0-00035"]}]},{"name":"SK Hynix"},{"name":"National Research Foundation of Korea (NRF)","award":["2020R1A2C3010663"],"award-info":[{"award-number":["2020R1A2C3010663"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,4,19]]},"DOI":"10.1145\/3445814.3446717","type":"proceedings-article","created":{"date-parts":[[2021,4,11]],"date-time":"2021-04-11T17:06:26Z","timestamp":1618160786000},"page":"302-313","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["MERCI: efficient embedding reduction on commodity hardware via sub-query memoization"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9532-1621","authenticated-orcid":false,"given":"Yejin","family":"Lee","sequence":"first","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Seong Hoon","family":"Seo","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Hyunji","family":"Choi","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Hyoung Uk","family":"Sul","sequence":"additional","affiliation":[{"name":"Seoul National University, South 
Korea"}]},{"given":"Soosung","family":"Kim","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4266-4919","authenticated-orcid":false,"given":"Jae W.","family":"Lee","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2669-6849","authenticated-orcid":false,"given":"Tae Jun","family":"Ham","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]}],"member":"320","published-online":{"date-parts":[[2021,4,17]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dan Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org.  Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dan Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 
Software available from tensorflow.org."},{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the International Conference on Very Large Data Bases (VLDB).","author":"Agrawal Rakesh","year":"1994","unstructured":"Rakesh Agrawal and Ramakrishnan Srikant . 1994 . Fast Algorithms for Mining Association Rules in Large Databases . Proceedings of the International Conference on Very Large Data Bases (VLDB). Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast Algorithms for Mining Association Rules in Large Databases. Proceedings of the International Conference on Very Large Data Bases (VLDB)."},{"key":"e_1_3_2_1_3_1","unstructured":"Amazon. Amazon EC2 M5 Instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/m5\/.  Amazon. Amazon EC2 M5 Instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/m5\/."},{"key":"e_1_3_2_1_4_1","volume-title":"Hierarchy-Based Image Embeddings for Semantic Image Retrieval. In proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV).","author":"Barz B.","unstructured":"B. Barz and J. Denzler . 2019 . Hierarchy-Based Image Embeddings for Semantic Image Retrieval. In proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV). B. Barz and J. Denzler. 2019. Hierarchy-Based Image Embeddings for Semantic Image Retrieval. In proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV)."},{"key":"e_1_3_2_1_5_1","volume-title":"The Million Song Dataset. In proceedings of the International Conference on Music Information Retrieval (ISMIR).","author":"Bertin-Mahieux Thierry","year":"2011","unstructured":"Thierry Bertin-Mahieux , Daniel P.W. Ellis , Brian Whitman , and Paul Lamere . 2011 . The Million Song Dataset. In proceedings of the International Conference on Music Information Retrieval (ISMIR). Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. 
In proceedings of the International Conference on Music Information Retrieval (ISMIR)."},{"key":"e_1_3_2_1_6_1","volume-title":"ATM: Approximate Task Memoization in the Runtime System. In proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS).","author":"Brumar I.","unstructured":"I. Brumar , M. Casas , M. Moreto , M. Valero , and G. S. Sohi . 2017 . ATM: Approximate Task Memoization in the Runtime System. In proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS). I. Brumar, M. Casas, M. Moreto, M. Valero, and G. S. Sohi. 2017. ATM: Approximate Task Memoization in the Runtime System. In proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS)."},{"key":"e_1_3_2_1_7_1","volume-title":"Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation. CoRR abs\/","author":"Campo Miguel","year":"2018","unstructured":"Miguel Campo , Cheng-Kang Hsieh , Matt Nickens , J. J. Espinoza , Abhinav Taliyan , Julie Rieger , Jean Ho , and Bettina Sherick . 2018. Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation. CoRR abs\/ 1807 .04465 ( 2018 ). Miguel Campo, Cheng-Kang Hsieh, Matt Nickens, J. J. Espinoza, Abhinav Taliyan, Julie Rieger, Jean Ho, and Bettina Sherick. 2018. Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video Representation. CoRR abs\/ 1807.04465 ( 2018 )."},{"key":"e_1_3_2_1_8_1","first-page":"1479","article-title":"PaToH (Partitioning Tool for Hypergraphs)","author":"\u00c7ataly\u00fcrek \u00dcmit V.","year":"2011","unstructured":"\u00dcmit V. \u00c7ataly\u00fcrek and Cevdet Aykanat . 2011 . PaToH (Partitioning Tool for Hypergraphs) . In Encyclopedia of Parallel Computing. 1479 - 1487 . \u00dcmit V. \u00c7ataly\u00fcrek and Cevdet Aykanat. 2011. PaToH (Partitioning Tool for Hypergraphs). In Encyclopedia of Parallel Computing. 
1479-1487.","journal-title":"Encyclopedia of Parallel Computing."},{"key":"e_1_3_2_1_9_1","unstructured":"Tianshi Chen Zidong Du Ninghui Sun Jia Wang Chengyong Wu Yunji Chen and Olivier Temam. 2014. DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning. In proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).  Tianshi Chen Zidong Du Ninghui Sun Jia Wang Chengyong Wu Yunji Chen and Olivier Temam. 2014. DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning. In proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)."},{"key":"e_1_3_2_1_10_1","volume-title":"Wide & Deep Learning for Recommender Systems. In proceedings of Workshop on Deep Learning for Recommender Systems (DLRS).","author":"Cheng Heng-Tze","year":"2016","unstructured":"Heng-Tze Cheng , Levent Koc , Jeremiah Harmsen , Tal Shaked , Tushar Chandra , Hrishi Aradhye , Glen Anderson , Greg Corrado , Wei Chai , Mustafa Ispir , Rohan Anil , Zakaria Haque , Lichan Hong , Vihan Jain , Xiaobing Liu , and Hemal Shah . 2016 . Wide & Deep Learning for Recommender Systems. In proceedings of Workshop on Deep Learning for Recommender Systems (DLRS). Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. In proceedings of Workshop on Deep Learning for Recommender Systems (DLRS)."},{"key":"e_1_3_2_1_11_1","volume-title":"proceedings of the Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO).","author":"Conners D. A.","unstructured":"D. A. Conners and W. W. Hwu . 1999. Compiler-directed dynamic computation reuse: rationale and initial results . 
In proceedings of the Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO). D. A. Conners and W. W. Hwu. 1999. Compiler-directed dynamic computation reuse: rationale and initial results. In proceedings of the Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_12_1","unstructured":"Intel Corporation. Intel VTune Profiler. https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/tools\/vtune-profiler.html.  Intel Corporation. Intel VTune Profiler. https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/tools\/vtune-profiler.html."},{"key":"e_1_3_2_1_13_1","volume-title":"Deep Neural Networks for YouTube Recommendations. In proceedings of the ACM Conference on Recommender Systems (RecSys).","author":"Covington Paul","year":"2016","unstructured":"Paul Covington , Jay Adams , and Emre Sargin . 2016 . Deep Neural Networks for YouTube Recommendations. In proceedings of the ACM Conference on Recommender Systems (RecSys). Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In proceedings of the ACM Conference on Recommender Systems (RecSys)."},{"key":"e_1_3_2_1_14_1","volume-title":"Search engines: information retrieval in practice","author":"Croft W. Bruce","unstructured":"W. Bruce Croft , Donald Metzler , and Trevor Strohman . 2010. Search engines: information retrieval in practice ( 1 st ed.). Addison-Wesley , Boston . W. Bruce Croft, Donald Metzler, and Trevor Strohman. 2010. Search engines: information retrieval in practice (1st ed.). Addison-Wesley, Boston.","edition":"1"},{"key":"e_1_3_2_1_15_1","volume-title":"ACM\/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).","author":"David H.","unstructured":"H. David , E. Gorbatov , U. R. Hanebutte , R. Khanna , and C. Le . 2010. RAPL: Memory power estimation and capping . ACM\/IEEE International Symposium on Low-Power Electronics and Design (ISLPED). H. David, E. Gorbatov, U. R. 
Hanebutte, R. Khanna, and C. Le. 2010. RAPL: Memory power estimation and capping. ACM\/IEEE International Symposium on Low-Power Electronics and Design (ISLPED)."},{"key":"e_1_3_2_1_16_1","volume-title":"Computer Architecture: Empowering the Machine-Learning Revolution","author":"Dean J.","year":"2018","unstructured":"J. Dean , D. Patterson , and C. Young . 2018 . A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution . IEEE Micro 38, 2 ( 2018 ). J. Dean, D. Patterson, and C. Young. 2018. A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution. IEEE Micro 38, 2 ( 2018 )."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2006.1639359"},{"key":"e_1_3_2_1_18_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In proceedings of the Conference of the North American","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 ( Long and Short Papers) (NAACL) . Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 ( Long and Short Papers) (NAACL)."},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of Machine Learning and Systems (MLSys).","author":"Eisenman Assaf","year":"2019","unstructured":"Assaf Eisenman , Maxim Naumov , Darryl Gardner , Misha Smelyanskiy , Sergey Pupyrev , Kim M. Hazelwood , Asaf Cidon , and Sachin Katti . 2019 . 
Bandana: Using Non-Volatile Memory for Storing Deep Learning Models . Proceedings of Machine Learning and Systems (MLSys). Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim M. Hazelwood, Asaf Cidon, and Sachin Katti. 2019. Bandana: Using Non-Volatile Memory for Storing Deep Learning Models. Proceedings of Machine Learning and Systems (MLSys)."},{"key":"e_1_3_2_1_20_1","unstructured":"Facebook. Caffe2. https:\/\/caffe2.ai.  Facebook. Caffe2. https:\/\/caffe2.ai."},{"key":"e_1_3_2_1_21_1","volume-title":"Scaling Datacenter Accelerators with Compute-Reuse Architectures. In proceedings of the Annual International Symposium on Computer Architecture (ISCA).","author":"Fuchs Adi","year":"2018","unstructured":"Adi Fuchs and David Wentzlaff . 2018 . Scaling Datacenter Accelerators with Compute-Reuse Architectures. In proceedings of the Annual International Symposium on Computer Architecture (ISCA). Adi Fuchs and David Wentzlaff. 2018. Scaling Datacenter Accelerators with Compute-Reuse Architectures. In proceedings of the Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/239"},{"key":"e_1_3_2_1_23_1","volume-title":"DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference. In proceedings of ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Gupta Udit","year":"2020","unstructured":"Udit Gupta , Samuel Hsia , Vikram Saraph , Xiaodong Wang , Brandon Reagen , Gu-Yeon Wei , Hsien-Hsin S. Lee , David Brooks , and Carole-Jean Wu . 2020 . DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference. In proceedings of ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, and Carole-Jean Wu. 2020. 
DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference. In proceedings of ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_24_1","volume-title":"proceedings of IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Gupta U.","unstructured":"U. Gupta , C. Wu , X. Wang , M. Naumov , B. Reagen , D. Brooks , B. Cottel , K. Hazelwood , M. Hempstead , B. Jia , H. S. Lee , A. Malevich , D. Mudigere , M. Smelyanskiy , L. Xiong , and X. Zhang . 2020. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation . In proceedings of IEEE International Symposium on High Performance Computer Architecture (HPCA). U. Gupta, C. Wu, X. Wang, M. Naumov, B. Reagen, D. Brooks, B. Cottel, K. Hazelwood, M. Hempstead, B. Jia, H. S. Lee, A. Malevich, D. Mudigere, M. Smelyanskiy, L. Xiong, and X. Zhang. 2020. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation. In proceedings of IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335372"},{"key":"e_1_3_2_1_26_1","volume-title":"EIE: Efficient Inference Engine on Compressed Deep Neural Network. In proceedings of the International Symposium on Computer Architecture (ISCA).","author":"Han Song","unstructured":"Song Han , Xingyu Liu , Huizi Mao , Jing Pu , Ardavan Pedram , Mark A. Horowitz , and William J. Dally . 2016 . EIE: Efficient Inference Engine on Compressed Deep Neural Network. In proceedings of the International Symposium on Computer Architecture (ISCA). Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. 
In proceedings of the International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_27_1","volume-title":"2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Hazelwood K.","unstructured":"K. Hazelwood , S. Bird , D. Brooks , S. Chintala , U. Diril , D. Dzhulgakov , M. Fawzy , B. Jia , Y. Jia , A. Kalro , J. Law , K. Lee , J. Lu , P. Noordhuis , M. Smelyanskiy , L. Xiong , and X. Wang . 2018. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective . 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). K. Hazelwood, S. Bird, D. Brooks, S. Chintala, U. Diril, D. Dzhulgakov, M. Fawzy, B. Jia, Y. Jia, A. Kalro, J. Law, K. Lee, J. Lu, P. Noordhuis, M. Smelyanskiy, L. Xiong, and X. Wang. 2018. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_28_1","volume-title":"Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In proceedings of the International Conference on World Wide Web (WWW).","author":"He Ruining","year":"2016","unstructured":"Ruining He and Julian McAuley . 2016 . Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In proceedings of the International Conference on World Wide Web (WWW). Ruining He and Julian McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In proceedings of the International Conference on World Wide Web (WWW)."},{"key":"e_1_3_2_1_29_1","volume-title":"Hybrid Sparse-Dense Accelerator for Personalized Recommendations. In proceedings of ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Hwang R.","unstructured":"R. Hwang , T. Kim , Y. Kwon , and M. Rhu . 2020. 
Centaur: A Chiplet-based , Hybrid Sparse-Dense Accelerator for Personalized Recommendations. In proceedings of ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). R. Hwang, T. Kim, Y. Kwon, and M. Rhu. 2020. Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations. In proceedings of ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137650"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/92.748202"},{"key":"e_1_3_2_1_32_1","volume-title":"RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. In proceedings of ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Ke Liu","year":"2020","unstructured":"Liu Ke , Udit Gupta , Benjamin Youngjae Cho , David Brooks , Vikas Chandra , Utku Diril , Amin Firoozshahian , Kim M. Hazelwood , Bill Jia , Hsien-Hsin S. Lee , Meng Li , Bert Maher , Dheevatsa Mudigere , Maxim Naumov , Martin Schatz , Mikhail Smelyanskiy , Xiaodong Wang , Brandon Reagen , Carole-Jean Wu , Mark Hempstead , and Xuan Zhang . 2020 . RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. In proceedings of ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Liu Ke, Udit Gupta, Benjamin Youngjae Cho, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim M. Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang, Brandon Reagen, Carole-Jean Wu, Mark Hempstead, and Xuan Zhang. 2020. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. 
In proceedings of ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_33_1","volume-title":"proceedings of the International Conference on Machine Learning (ICML).","author":"Kusner Matt J.","unstructured":"Matt J. Kusner , Yu Sun , Nicholas I. Kolkin , and Kilian Q. Weinberger . 2015. From Word Embeddings to Document Distances . In proceedings of the International Conference on Machine Learning (ICML). Matt J. Kusner, Yu Sun, Nicholas I. Kolkin, and Kilian Q. Weinberger. 2015. From Word Embeddings to Document Distances. In proceedings of the International Conference on Machine Learning (ICML)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358284"},{"key":"e_1_3_2_1_35_1","volume-title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs\/","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu , Myle Ott , Naman Goyal , Jingfei Du , Mandar Joshi , Danqi Chen , Omer Levy , Mike Lewis , Luke Zettlemoyer , and Veselin Stoyanov . 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs\/ 1907 .11692 ( 2019 ). Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs\/ 1907.11692 ( 2019 )."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322215"},{"key":"e_1_3_2_1_37_1","unstructured":"Michael Lui Yavuz Yetim \u00d6zg\u00fcr \u00d6zkan Zhuoran Zhao Shin-Yeh Tsai Carole-Jean Wu and Mark Hempstead. Understanding Capacity-Driven Scale-Out Neural Recommendation Inference.  Michael Lui Yavuz Yetim \u00d6zg\u00fcr \u00d6zkan Zhuoran Zhao Shin-Yeh Tsai Carole-Jean Wu and Mark Hempstead. Understanding Capacity-Driven Scale-Out Neural Recommendation Inference."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Donald Michie. 1968. \u201cMemo\u201d 
functions and machine learning. Nature 218 5136 ( 1968 ).  Donald Michie. 1968. \u201cMemo\u201d functions and machine learning. Nature 218 5136 ( 1968 ).","DOI":"10.1038\/218019a0"},{"key":"e_1_3_2_1_39_1","volume-title":"Distributed Representations of Words and Phrases and Their Compositionality. In proceedings of the International Conference on Neural Information Processing Systems (NIPS).","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov , Ilya Sutskever , Kai Chen , Greg Corrado , and Jeffrey Dean . 2013 . Distributed Representations of Words and Phrases and Their Compositionality. In proceedings of the International Conference on Neural Information Processing Systems (NIPS). Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In proceedings of the International Conference on Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_1_40_1","volume-title":"On the Dimensionality of Embeddings for Sparse Features and Data. CoRR abs\/","author":"Naumov Maxim","year":"2019","unstructured":"Maxim Naumov . 2019. On the Dimensionality of Embeddings for Sparse Features and Data. CoRR abs\/ 1901 .02103 ( 2019 ). Maxim Naumov. 2019. On the Dimensionality of Embeddings for Sparse Features and Data. CoRR abs\/ 1901.02103 ( 2019 )."},{"key":"e_1_3_2_1_41_1","unstructured":"Maxim Naumov Dheevatsa Mudigere Hao-Jun Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Misha Smelyanskiy. 2019. Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR abs\/ 1906.00091 ( 2019 ).  
Maxim Naumov Dheevatsa Mudigere Hao-Jun Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Misha Smelyanskiy. 2019. Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR abs\/ 1906.00091 ( 2019 )."},{"key":"e_1_3_2_1_42_1","unstructured":"Jongsoo Park Maxim Naumov Protonu Basu Summer Deng Aravind Kalaiah Daya Shanker Khudia James Law Parth Malani Andrey Malevich Nadathur Satish Juan Pino Martin Schatz Alexander Sidorov Viswanath Sivakumar Andrew Tulloch Xiaodong Wang Yiming Wu Hector Yuen Utku Diril Dmytro Dzhulgakov Kim M. Hazelwood Bill Jia Yangqing Jia Lin Qiao Vijay Rao Nadav Rotem Sungjoo Yoo and Mikhail Smelyanskiy. 2018. Deep Learning Inference in Facebook Data Centers: Characterization Performance Optimizations and Hardware Implications. CoRR abs\/ 1811.09886 ( 2018 ).  Jongsoo Park Maxim Naumov Protonu Basu Summer Deng Aravind Kalaiah Daya Shanker Khudia James Law Parth Malani Andrey Malevich Nadathur Satish Juan Pino Martin Schatz Alexander Sidorov Viswanath Sivakumar Andrew Tulloch Xiaodong Wang Yiming Wu Hector Yuen Utku Diril Dmytro Dzhulgakov Kim M. Hazelwood Bill Jia Yangqing Jia Lin Qiao Vijay Rao Nadav Rotem Sungjoo Yoo and Mikhail Smelyanskiy. 2018. Deep Learning Inference in Facebook Data Centers: Characterization Performance Optimizations and Hardware Implications. CoRR abs\/ 1811.09886 ( 2018 )."},{"key":"e_1_3_2_1_43_1","volume-title":"An Effective Hash-Based Algorithm for Mining Association Rules. In proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD).","author":"Park Jong Soo","unstructured":"Jong Soo Park , Ming-Syan Chen , and Philip S. Yu . 1995 . An Effective Hash-Based Algorithm for Mining Association Rules. 
In proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). Jong Soo Park, Ming-Syan Chen, and Philip S. Yu. 1995. An Effective Hash-Based Algorithm for Mining Association Rules. In proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD)."},{"key":"e_1_3_2_1_44_1","first-page":"8024","article-title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","volume":"32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , Alban Desmaison , Andreas Kopf , Edward Yang , Zachary DeVito , Martin Raison , Alykhan Tejani , Sasank Chilamkurthy , Benoit Steiner , Lu Fang , Junjie Bai , and Soumith Chintala . 2019 . PyTorch: An Imperative Style, High-Performance Deep Learning Library . In Advances in Neural Information Processing Systems 32. 8024 - 8035 . Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32. 8024-8035.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_46_1","volume-title":"Ahmed","author":"Rossi Ryan A.","year":"2015","unstructured":"Ryan A. Rossi and Nesreen K . Ahmed . 2015 . The Network Data Repository with Interactive Graph Analytics and Visualization. AAAI. Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. 
AAAI."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611974317.5"},{"key":"e_1_3_2_1_48_1","volume-title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. CoRR abs\/","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi , Mostofa Patwary , Raul Puri , Patrick LeGresley , Jared Casper , and Bryan Catanzaro . 2019. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. CoRR abs\/ 1909 .08053 ( 2019 ). Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. CoRR abs\/ 1909.08053 ( 2019 )."},{"key":"e_1_3_2_1_49_1","volume-title":"BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. CoRR abs\/","author":"Sun Fei","year":"2019","unstructured":"Fei Sun , Jun Liu , Jian Wu , Changhua Pei , Xiao Lin , Wenwu Ou , and Peng Jiang . 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. CoRR abs\/ 1904 .06690 ( 2019 ). Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. CoRR abs\/ 1904.06690 ( 2019 )."},{"key":"e_1_3_2_1_50_1","volume-title":"proceedings of the IASTED International MultiConference: Parallel and Distributed Computing and Networks (PDCN).","author":"Tsumura Tomoaki","year":"2007","unstructured":"Tomoaki Tsumura , Ikuma Suzuki , Yasuki Ikeuchi , Hiroshi Matsuo , Hiroshi Nakashima , and Yasuhiko Nakashima . 2007 . Design and Evaluation of an Auto-Memoization Processor . In proceedings of the IASTED International MultiConference: Parallel and Distributed Computing and Networks (PDCN). Tomoaki Tsumura, Ikuma Suzuki, Yasuki Ikeuchi, Hiroshi Matsuo, Hiroshi Nakashima, and Yasuhiko Nakashima. 
2007. Design and Evaluation of an Auto-Memoization Processor. In proceedings of the IASTED International MultiConference: Parallel and Distributed Computing and Networks (PDCN)."},{"key":"e_1_3_2_1_51_1","volume-title":"Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. CoRR abs\/","author":"Wang Jizhe","year":"2018","unstructured":"Jizhe Wang , Pipei Huang , Huan Zhao , Zhibo Zhang , Binqiang Zhao , and Dik Lun Lee . 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. CoRR abs\/ 1803 .02349 ( 2018 ). Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. CoRR abs\/ 1803.02349 ( 2018 )."},{"key":"e_1_3_2_1_52_1","volume-title":"proceedings of the International Joint Conference on Neural Networks (IJCNN).","author":"Wang P.","unstructured":"P. Wang , Z. Liu , H. Wang , and D. Wang . 2017. Data-centric computation mode for convolution in deep neural networks . In proceedings of the International Joint Conference on Neural Networks (IJCNN). P. Wang, Z. Liu, H. Wang, and D. Wang. 2017. Data-centric computation mode for convolution in deep neural networks. In proceedings of the International Joint Conference on Neural Networks (IJCNN)."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Ruoxi Wang Bin Fu Gang Fu and Mingliang Wang. 2017. Deep & Cross Network for Ad Click Predictions. CoRR abs\/1708.05123 ( 2017 ).  Ruoxi Wang Bin Fu Gang Fu and Mingliang Wang. 2017. Deep & Cross Network for Ad Click Predictions. CoRR abs\/1708.05123 ( 2017 ).","DOI":"10.1145\/3124749.3124754"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1987.10478385"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358272"},{"key":"e_1_3_2_1_56_1","volume-title":"Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. 
arXiv preprint arXiv","author":"Zhao Weijie","year":"2020","unstructured":"Weijie Zhao , Deping Xie , Ronglai Jia , Yulei Qian , Ruiquan Ding , Mingming Sun , and Ping Li. 2020. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. arXiv preprint arXiv : 2003 . 05622 ( 2020 ). Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, and Ping Li. 2020. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. arXiv preprint arXiv: 2003. 05622 ( 2020 )."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3358045"},{"key":"e_1_3_2_1_58_1","volume-title":"Deep Interest Network for Click-Through Rate Prediction. In proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD).","author":"Zhou Guorui","year":"2018","unstructured":"Guorui Zhou , Xiaoqiang Zhu , Chenru Song , Ying Fan , Han Zhu , Xiao Ma , Yanghui Yan , Junqi Jin , Han Li , and Kun Gai . 2018 . Deep Interest Network for Click-Through Rate Prediction. In proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep Interest Network for Click-Through Rate Prediction. 
In proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD)."}],"event":{"name":"ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems","location":"Virtual USA","acronym":"ASPLOS '21","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3445814.3446717","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3445814.3446717","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:14Z","timestamp":1750195694000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3445814.3446717"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,17]]},"references-count":57,"alternative-id":["10.1145\/3445814.3446717","10.1145\/3445814"],"URL":"https:\/\/doi.org\/10.1145\/3445814.3446717","relation":{},"subject":[],"published":{"date-parts":[[2021,4,17]]},"assertion":[{"value":"2021-04-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}