{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T07:40:29Z","timestamp":1763106029782,"version":"build-2065373602"},"reference-count":38,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2024,10,21]],"date-time":"2024-10-21T00:00:00Z","timestamp":1729468800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62176206"],"award-info":[{"award-number":["62176206"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>As deep learning has produced dramatic breakthroughs in many areas, it has motivated emerging studies on the combination between neural networks and cache replacement algorithms. However, deep learning is a poor fit for performing cache replacement in hardware implementation because its neural network models are impractically large and slow. Many studies have tried to use the guidance of the Belady algorithm to speed up the prediction of cache replacement. But it is still impractical to accurately predict the characteristics of future access addresses, introducing inaccuracy in the discrimination of complex access patterns. Therefore, this paper presents the LSTM-CRP algorithm as well as its efficient hardware implementation, which employs the long short-term memory (LSTM) for access pattern identification at run-time to guide cache replacement algorithm. LSTM-CRP first converts the address into a novel key according to the frequency of the access address and a virtual capacity of the cache, which has the advantages of low information redundancy and high timeliness. Using the key as the inputs of four offline-trained LSTM network-based predictors, LSTM-CRP can accurately classify different access patterns and identify current cache characteristics in a timely manner via an online set dueling mechanism on sampling caches. For efficient implementation, heterogeneous lightweight LSTM networks are dedicatedly constructed in LSTM-CRP to lower hardware overhead and inference delay. The experimental results show that LSTM-CRP was able to averagely improve the cache hit rate by 20.10%, 15.35%, 12.11% and 8.49% compared with LRU, RRIP, Hawkeye and Glider, respectively. Implemented on Xilinx XCVU9P FPGA at the cost of 15,973 LUTs and 1610 FF registers, LSTM-CRP was running at a 200 MHz frequency with 2.74 W power consumption.<\/jats:p>","DOI":"10.3390\/bdcc8100140","type":"journal-article","created":{"date-parts":[[2024,10,21]],"date-time":"2024-10-21T08:53:11Z","timestamp":1729500791000},"page":"140","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["LSTM-CRP: Algorithm-Hardware Co-Design and Implementation of Cache Replacement Policy Using Long Short-Term Memory"],"prefix":"10.3390","volume":"8","author":[{"given":"Yizhou","family":"Wang","sequence":"first","affiliation":[{"name":"School of Microelectronics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Yishuo","family":"Meng","sequence":"additional","affiliation":[{"name":"School of Microelectronics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Jiaxing","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Microelectronics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8221-7670","authenticated-orcid":false,"given":"Chen","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Microelectronics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,10,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1109\/JPROC.2008.2007472","article-title":"Mitigating Memory Wall Effects in High-Clock-Rate and Multicore CMOS 3-D Processor Memory Stacks","volume":"97","author":"Jacob","year":"2009","journal-title":"Proc. IEEE"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3627","DOI":"10.1109\/TCAD.2020.3012213","article-title":"Hardware Memory Management for Future Mobile Hybrid Memory Systems","volume":"39","author":"Wen","year":"2020","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Duong, N., Zhao, D., Kim, T., Cammarota, R., Valero, M., and Veidenbaum, A.V. (2012, January 1\u20135). Improving Cache Management Policies Using Dynamic Reuse Distances. Proceedings of the 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture, Vancouver, BC, Canada.","DOI":"10.1109\/MICRO.2012.43"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Jaleel, A., Theobald, K.B., Steely, S.C., and Emer, J. (2010, January 19\u201323). High performance cache replacement using re-reference interval prediction (RRIP). Proceedings of the 37th Annual International Symposium on Computer Architecture, Saint-Malo, France.","DOI":"10.1145\/1815961.1815971"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Qureshi, M.K., Jaleel, A., Patt, Y.N., Steely, S.C., and Emer, J. (2007, January 9\u201313). Adaptive insertion policies for high performance caching. Proceedings of the 34th Annual International Symposium on Computer Architecture, San Diego, CA, USA.","DOI":"10.1145\/1250662.1250709"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Khan, S.M., Tian, Y., and Jimenez, D.A. (2010, January 4\u20138). Sampling Dead Block Prediction for Last-Level Caches. Proceedings of the 2010 43rd Annual IEEE\/ACM International Symposium on Microarchitecture, Atlanta, GA, USA.","DOI":"10.1109\/MICRO.2010.24"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Jain, A., and Lin, C. (2018, January 1\u20136). Rethinking belady\u2019s algorithm to accommodate prefetching. Proceedings of the 45th Annual International Symposium on Computer Architecture, Los Angeles, CA, USA.","DOI":"10.1109\/ISCA.2018.00020"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Jain, A., and Lin, C. (2016, January 18\u201322). Back to the future: Leveraging Belady\u2019s algorithm for improved cache replacement. Proceedings of the 43rd International Symposium on Computer Architecture, Seoul, Republic of Korea.","DOI":"10.1109\/ISCA.2016.17"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1147\/sj.52.0078","article-title":"A study of replacement algorithms for a virtual-storage computer","volume":"5","author":"Belady","year":"1966","journal-title":"IBM Syst. J."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Shi, Z., Huang, X., Jain, A., and Lin, C. (2019, January 12\u201316). Applying Deep Learning to the Cache Replacement Problem. Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture, Columbus, OH, USA.","DOI":"10.1145\/3352460.3358319"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Jim\u00e9nez, D.A., and Teran, E. (2017, January 14\u201318). Multiperspective reuse prediction. Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture, Cambridge, MA, USA.","DOI":"10.1145\/3123939.3123942"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Teran, E., Wang, Z., and Jim\u00e9nez, D.A. (2016, January 15\u201319). Perceptron learning for reuse prediction. Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture, Taipei, Taiwan.","DOI":"10.1109\/MICRO.2016.7783705"},{"key":"ref_13","unstructured":"Liu, E.Z., Hashemi, M., Swersky, K., Ranganathan, P., and Ahn, J. (2020, January 13\u201318). An imitation learning approach for cache replacement. Proceedings of the 37th International Conference on Machine Learning, Virtual Event."},{"key":"ref_14","unstructured":"Fu, J.W.C., Patel, J.H., and Janssens, B.L. (1992, January 1\u20134). Stride directed prefetching in scalar processors. Proceedings of the 25th Annual International Symposium on Microarchitecture, Portland, OR, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Joseph, D., and Grunwald, D. (1997, January 2\u20134). Prefetching using Markov predictors. Proceedings of the 24th Annual International Symposium on Computer Architecture, Denver, CO, USA.","DOI":"10.1145\/264107.264207"},{"key":"ref_16","unstructured":"Kandiraju, G.B., and Sivasubramaniam, A. (2002, January 25\u201329). Going the distance for TLB prefetching: An application-driven study. Proceedings of the 29th Annual International Symposium on Computer Architecture, Anchorage, Alaska."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/MM.2005.6","article-title":"Data Cache Prefetching Using a Global History Buffer","volume":"25","author":"Nesbit","year":"2005","journal-title":"IEEE Micro"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3312740","article-title":"Evaluation of Hardware Data Prefetchers on Server Processors","volume":"52","author":"Bakhshalipour","year":"2019","journal-title":"ACM Comput. Surv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wu, H., Nathella, K., Sunwoo, D., Jain, A., and Lin, C. (2019, January 22\u201326). Efficient metadata management for irregular data prefetching. Proceedings of the 46th International Symposium on Computer Architecture, Phoenix, AZ, USA.","DOI":"10.1145\/3307650.3322225"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhang, C., Zeng, Y., Shalf, J., and Guo, X. (2020, January 17\u201321). RnR: A Software-Assisted Record-and-Replay Hardware Prefetcher. Proceedings of the 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece.","DOI":"10.1109\/MICRO50266.2020.00057"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1145\/3345000","article-title":"A Neural Network Prefetcher for Arbitrary Memory Access Patterns","volume":"16","author":"Peled","year":"2019","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1145\/1070838.1070856","article-title":"The locality principle","volume":"48","author":"Denning","year":"2005","journal-title":"Commun. ACM"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Noh, H., Hong, S., and Han, B. (2015, January 7\u201313). Learning Deconvolution Network for Semantic Segmentation. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.178"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"572","DOI":"10.1109\/TASLP.2018.2888814","article-title":"Recurrent Neural Network Language Model Adaptation for Multi-Genre Broadcast Speech Recognition and Alignment","volume":"27","author":"Deena","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Proc."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1109\/TNNLS.2020.2979670","article-title":"A Survey of the Usages of Deep Learning for Natural Language Processing","volume":"32","author":"Otter","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lu, X., Najafi, H., Liu, J., and Sun, X.H. (2024, January 2\u20136). CHROME: Concurrency-Aware Holistic Cache Management Framework with Online Reinforcement Learning. Proceedings of the 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Edinburgh, UK.","DOI":"10.1109\/HPCA57654.2024.00090"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1109\/TC.2023.3325625","article-title":"An Efficient Deep Reinforcement Learning-Based Automatic Cache Replacement Policy in Cloud Block Storage Systems","volume":"73","author":"Zhou","year":"2024","journal-title":"IEEE Trans. Comput."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Sethumurugan, S., Yin, J., and Sartori, J. (March, January 27). Designing a Cost-Effective Cache Replacement Policy using Machine Learning. Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Republic of Korea.","DOI":"10.1109\/HPCA51647.2021.00033"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3311","DOI":"10.1109\/TCAD.2020.3012173","article-title":"DeepPrefetcher: A Deep Learning Framework for Data Prefetching in Flash Storage Devices","volume":"39","author":"Ganfure","year":"2020","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sarwar, S., Zia-ul-Qayyum, Z.-u.-Q., Malik, O.A., Rizvi, B., Ahmed, H.F., and Takahashi, H. (2010, January 28\u201330). Performance comparison of case retrieval between case based reasoning and neural networks in predictive prefetching. Proceedings of the 6th International Conference on High Capacity Optical Networks and Enabling Technologies, Alexandria, Egypt.","DOI":"10.1109\/HONET.2009.5423052"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liu, W., Cui, J., Liu, J., and Yang, L.T. (2020, January 24\u201327). MLCache: A space-efficient cache scheme based on reuse distance and machine learning for NVMe SSDs. Proceedings of the 39th International Conference on Computer-Aided Design, Virtual Event.","DOI":"10.1145\/3400302.3415652"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1109\/MM.2008.14","article-title":"Set-Dueling-Controlled Adaptive Insertion for High-Performance Caching","volume":"28","author":"Qureshi","year":"2008","journal-title":"IEEE Micro"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zeng, Y., and Guo, X. (2017, January 2\u20135). Long short term memory based hardware prefetcher: A case study. Proceedings of the International Symposium on Memory Systems, Alexandria, VA, USA.","DOI":"10.1145\/3132402.3132405"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chen, K., Huang, L., Li, M., Zeng, X., and Fan, Y. (2018, January 7\u201310). A Compact and Configurable Long Short-Term Memory Neural Network Hardware Architecture. Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece.","DOI":"10.1109\/ICIP.2018.8451053"},{"key":"ref_36","unstructured":"Migacz, S. (2024, March 01). 8-Bit Inference with TensorRT [EB\/OL]. Available online: http:\/\/on-demand.gputechconf.com\/gtc\/2017\/presentation\/s7310-8-bit-inference-with-tensorrt.pdf."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yang, C., Hou, J., Wang, Y., and Geng, L. (2020, January 23\u201325). CRP: Context-directed Replacement Policy to Improve Cache Performance for Coarse-Grained Reconfigurable Arrays. Proceedings of the 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Glasgow, UK.","DOI":"10.1109\/ICECS49266.2020.9294864"},{"key":"ref_38","unstructured":"Rodriguez, L.V., Yusuf, F., Lyons, S., Paz, E., Rangaswami, R., Liu, J., Zhao, M., and Narasimhan, G. (2021, January 23\u201325). Learning Cache Replacement with CACHEUS. Proceedings of the 19th USENIX Conference on File and Storage Technologies, Santa Clara, CA, USA."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/10\/140\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:17:29Z","timestamp":1760113049000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/10\/140"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,21]]},"references-count":38,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2024,10]]}},"alternative-id":["bdcc8100140"],"URL":"https:\/\/doi.org\/10.3390\/bdcc8100140","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2024,10,21]]}}}