{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:10:03Z","timestamp":1750194603484,"version":"3.41.0"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,5,10]],"date-time":"2021-05-10T00:00:00Z","timestamp":1620604800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,9,30]]},"abstract":"<jats:p>Sparse memory accesses, which are scattered accesses to single elements of a large data structure, are a challenge for current processor architectures. Their lack of spatial and temporal locality and their irregularity makes caches and traditional stream prefetchers useless. Furthermore, performing standard caching and prefetching on sparse accesses wastes precious memory bandwidth and thrashes caches, deteriorating performance for regular accesses. Bypassing prefetchers and caches for sparse accesses, and fetching only a single element (e.g., 8\u00a0B) from main memory (subline access), can solve these issues.<\/jats:p>\n          <jats:p>Deciding which accesses to handle as sparse accesses and which as regular cached accesses, is a challenging task, with a large potential impact on performance. Not only is performance reduced by treating sparse accesses as regular accesses, not caching accesses that do have locality also negatively impacts performance by significantly increasing their latency and bandwidth consumption. Furthermore, this decision depends on the dynamic environment, such as input set characteristics and system load, making a static decision by the programmer or compiler suboptimal.<\/jats:p>\n          <jats:p>\n            We propose the\n            <jats:bold>Instruction Spatial Locality Estimator<\/jats:bold>\n            (\n            <jats:bold>ISLE<\/jats:bold>\n            ), a hardware detector that finds instructions that access isolated words in a sea of unused data. These sparse accesses are dynamically converted into uncached subline accesses, while keeping regular accesses cached. ISLE does not require modifying source code or binaries, and adapts automatically to a changing environment (input data, available bandwidth, etc.). We apply ISLE to a graph analytics processor running sparse graph workloads, and show that ISLE outperforms the performance of no subline accesses, manual sublining, and prior work on detecting sparse accesses.\n          <\/jats:p>","DOI":"10.1145\/3452141","type":"journal-article","created":{"date-parts":[[2021,5,10]],"date-time":"2021-05-10T22:15:17Z","timestamp":1620684917000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Automatic Sublining for Efficient Sparse Memory Accesses"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2286-1525","authenticated-orcid":false,"given":"Wim","family":"Heirman","sequence":"first","affiliation":[{"name":"Intel Corporation, Belgium"}]},{"given":"Stijn","family":"Eyerman","sequence":"additional","affiliation":[{"name":"Intel Corporation, Belgium"}]},{"given":"Kristof Du","family":"Bois","sequence":"additional","affiliation":[{"name":"Intel Corporation, Belgium"}]},{"given":"Ibrahim","family":"Hur","sequence":"additional","affiliation":[{"name":"Intel Corporation, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,5,10]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Sriram Aananthakrishnan Nesreen K. Ahmed Vincent Cave Marcelo Cintra Yigit Demir Kristof Du Bois Stijn Eyerman Joshua B. Fryman Ivan Ganev Wim Heirman Hans-Christian Hoppe Jason Howard Ibrahim Hur MidhunChandra Kodiyath Samkit Jain Daniel S. Klowden Marek M. Landowski Laurent Montigny Ankit More Przemyslaw Ossowski Robert Pawlowski Nick Pepperling Fabrizio Petrini Mariusz Sikora Balasubramanian Seshasayee Shaden Smith Sebastian Szkoda Sanjaya Tayal2020. PIUMA: Programmable Integrated Unified Memory Architecture. arxiv:cs.AR\/2010.06277  Sriram Aananthakrishnan Nesreen K. Ahmed Vincent Cave Marcelo Cintra Yigit Demir Kristof Du Bois Stijn Eyerman Joshua B. Fryman Ivan Ganev Wim Heirman Hans-Christian Hoppe Jason Howard Ibrahim Hur MidhunChandra Kodiyath Samkit Jain Daniel S. Klowden Marek M. Landowski Laurent Montigny Ankit More Przemyslaw Ossowski Robert Pawlowski Nick Pepperling Fabrizio Petrini Mariusz Sikora Balasubramanian Seshasayee Shaden Smith Sebastian Szkoda Sanjaya Tayal2020. PIUMA: Programmable Integrated Unified Memory Architecture. arxiv:cs.AR\/2010.06277"},{"key":"e_1_2_1_2_1","unstructured":"Advanced Micro Devices Inc.2013. High Bandwidth Memory (HBM) DRAM.  Advanced Micro Devices Inc.2013. High Bandwidth Memory (HBM) DRAM."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173189"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240302.3240416"},{"key":"e_1_2_1_5_1","volume-title":"Patterson","author":"Beamer Scott","year":"2015","unstructured":"Scott Beamer , Krste Asanovic , and David A . Patterson . 2015 . The GAP Benchmark Suite . arxiv:cs.DC\/1508.03619 Scott Beamer, Krste Asanovic, and David A. Patterson. 2015. The GAP Benchmark Suite. arxiv:cs.DC\/1508.03619"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063454"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.30"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972740.43"},{"key":"e_1_2_1_9_1","unstructured":"Hybrid Memory Cube Consortium. 2015. Hybrid Memory Cube Specification 2.1.  Hybrid Memory Cube Consortium. 2015. Hybrid Memory Cube Specification 2.1."},{"key":"e_1_2_1_10_1","unstructured":"Cray. 2014. Urika GD. Retrieved from https:\/\/www.cray.com\/sites\/default\/files\/resources\/Urika-GD-TechSpecs.pdf.  Cray. 2014. Urika GD. Retrieved from https:\/\/www.cray.com\/sites\/default\/files\/resources\/Urika-GD-TechSpecs.pdf."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IA3.2016.007"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00025"},{"key":"e_1_2_1_13_1","volume-title":"Ibrahim Hur, and Joshua B. Fryman.","author":"Eyerman Stijn","year":"2020","unstructured":"Stijn Eyerman , Wim Heirman , Kristof Du Bois , Ibrahim Hur, and Joshua B. Fryman. 2020 . Indirect memory fetcher. US patent 10,684,858. Stijn Eyerman, Wim Heirman, Kristof Du Bois, Ibrahim Hur, and Joshua B. Fryman. 2020. Indirect memory fetcher. US patent 10,684,858."},{"key":"e_1_2_1_14_1","volume-title":"Damla Senol Cali, and Onur Mutlu","author":"Ghose Saugata","year":"2019","unstructured":"Saugata Ghose , Tianshi Li , Nastaran Hajinazar , Damla Senol Cali, and Onur Mutlu . 2019 . Understanding the interactions of workloads and DRAM types: A comprehensive experimental study. arxiv:cs.AR\/1902.07609 Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu. 2019. Understanding the interactions of workloads and DRAM types: A comprehensive experimental study. arxiv:cs.AR\/1902.07609"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2008.45"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830799"},{"key":"e_1_2_1_17_1","volume-title":"Stijn Eyerman, Joshua B. Fryman, and Ibrahim Hur.","author":"Heirman Wim","year":"2018","unstructured":"Wim Heirman , Kristof Du Bois , Stijn Eyerman, Joshua B. Fryman, and Ibrahim Hur. 2018 . System, apparatus and method for dynamic automatic sub-cacheline granularity memory access control. US patent application 16\/203,891. Wim Heirman, Kristof Du Bois, Stijn Eyerman, Joshua B. Fryman, and Ibrahim Hur. 2018. System, apparatus and method for dynamic automatic sub-cacheline granularity memory access control. US patent application 16\/203,891."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243181"},{"key":"e_1_2_1_19_1","unstructured":"SK Hynix. 2018. SK Hynix Inc. Announces 1Ynm 16Gb DDR5 DRAM.  SK Hynix. 2018. SK Hynix Inc. Announces 1Ynm 16Gb DDR5 DRAM."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815971"},{"key":"e_1_2_1_21_1","volume-title":"Towards Dataflow-Based Graph Accelerator. In International Conference on Distributed Computing Systems (ICDCS \u201917)","author":"Jin Hai","year":"2017","unstructured":"Hai Jin , Pengcheng Yao , Xiaofei Liao , Long Zheng , and Xianliang Li . 2017 . Towards Dataflow-Based Graph Accelerator. In International Conference on Distributed Computing Systems (ICDCS \u201917) . Hai Jin, Pengcheng Yao, Xiaofei Liao, Long Zheng, and Xianliang Li. 2017. Towards Dataflow-Based Graph Accelerator. In International Conference on Distributed Computing Systems (ICDCS \u201917)."},{"key":"e_1_2_1_22_1","volume-title":"Cray User Group Proceedings.","author":"Kopser Andrew","year":"2011","unstructured":"Andrew Kopser and Dennis Vollrath . 2011 . Overview of the next generation Cray XMT . In Cray User Group Proceedings. Andrew Kopser and Dennis Vollrath. 2011. Overview of the next generation Cray XMT. In Cray User Group Proceedings."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.42"},{"key":"e_1_2_1_24_1","volume-title":"International SoC Design Conference.","author":"Kwon Sanghyuk","year":"2014","unstructured":"Sanghyuk Kwon , Young Hoon Son , and Jung Ho Ahn . 2014 . Understanding DDR4 in pursuit of In-DRAM ECC . In International SoC Design Conference. Sanghyuk Kwon, Young Hoon Son, and Jung Ho Ahn. 2014. Understanding DDR4 in pursuit of In-DRAM ECC. In International SoC Design Conference."},{"key":"e_1_2_1_25_1","volume-title":"Yookun Cho, and Chong Sang Kim.","author":"Lee Donghee","year":"2001","unstructured":"Donghee Lee , Jongmoo Choi , Jong-Hun Kim , Sam H. Noh , Sang Lyul Min , Yookun Cho, and Chong Sang Kim. 2001 . LRFU : A spectrum of policies that subsumes the least recently used and least frequently used policies. IEEE Transactions on Computers 12 (2001), 1352--1361. Donghee Lee, Jongmoo Choi, Jong-Hun Kim, Sam H. Noh, Sang Lyul Min, Yookun Cho, and Chong Sang Kim. 2001. LRFU: A spectrum of policies that subsumes the least recently used and least frequently used policies. IEEE Transactions on Computers12 (2001), 1352--1361."},{"key":"e_1_2_1_26_1","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. Retrieved from http:\/\/snap.stanford.edu\/data.  Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. Retrieved from http:\/\/snap.stanford.edu\/data."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751237"},{"volume-title":"International Conference on High Performance Computing, Networking, Storage and Analysis (SC).","author":"Li Sheng","key":"e_1_2_1_28_1","unstructured":"Sheng Li , Doe Hyun Yoon , Ke Chen , Jishen Zhao , Jung Ho Ahn , Jay B. Brockman , Yuan Xie , and Norman P. Jouppi . 2012. MAGE: Adaptive granularity and ECC for resilient and power efficient memory systems . In International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Sheng Li, Doe Hyun Yoon, Ke Chen, Jishen Zhao, Jung Ho Ahn, Jay B. Brockman, Yuan Xie, and Norman P. Jouppi. 2012. MAGE: Adaptive granularity and ECC for resilient and power efficient memory systems. In International Conference on High Performance Computing, Networking, Storage and Analysis (SC)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626407002843"},{"volume-title":"IEEE International Congress on Big Data.","author":"Nisar M. Usman","key":"e_1_2_1_30_1","unstructured":"M. Usman Nisar , Arash Fard , and John A. Miller . 2013. Techniques for graph analytics on big data . In IEEE International Congress on Big Data. M. Usman Nisar, Arash Fard, and John A. Miller. 2013. Techniques for graph analytics on big data. In IEEE International Congress on Big Data."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.24"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00067"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250709"},{"volume-title":"International Symposium on High Performance Computer Architecture (HPCA \u201907)","author":"Qureshi Moinuddin K.","key":"e_1_2_1_34_1","unstructured":"Moinuddin K. Qureshi , M. Aater Suleman , and Yale N. Patt . 2007. Line distillation: Increasing cache capacity by filtering unused words in cache lines . In International Symposium on High Performance Computer Architecture (HPCA \u201907) . Moinuddin K. Qureshi, M. Aater Suleman, and Yale N. Patt. 2007. Line distillation: Increasing cache capacity by filtering unused words in cache lines. In International Symposium on High Performance Computer Architecture (HPCA \u201907)."},{"volume-title":"International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.","author":"Jeffrey","key":"e_1_2_1_35_1","unstructured":"Jeffrey B. Rothman and Alan Jay Smith. 2000. Sector cache design and performance . In International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems. Jeffrey B. Rothman and Alan Jay Smith. 2000. Sector cache design and performance. In International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems."},{"key":"e_1_2_1_36_1","volume-title":"IEEE\/ACM International Symposium on Microarchitecture (MICRO \u201916)","author":"Sodani Avinash","year":"2016","unstructured":"Avinash Sodani , Roger Gramunt , Jesus Corbal , Ho-Seop Kim , Krishna Vinod , Sundaram Chinthamani , Steven Hutsell , Rajat Agarwal , and Yen-Chen Liu . 2016 . Knights landing (KNL): Second-generation Intel Xeon Phi product . IEEE\/ACM International Symposium on Microarchitecture (MICRO \u201916) . Avinash Sodani, Roger Gramunt, Jesus Corbal, Ho-Seop Kim, Krishna Vinod, Sundaram Chinthamani, Steven Hutsell, Rajat Agarwal, and Yen-Chen Liu. 2016. Knights landing (KNL): Second-generation Intel Xeon Phi product. IEEE\/ACM International Symposium on Microarchitecture (MICRO \u201916)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2016.7761635"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00068"},{"volume-title":"Workshop on General Purpose Processing Using GPUs (GPGPU \u201915)","author":"Tian Yingying","key":"e_1_2_1_39_1","unstructured":"Yingying Tian , Sooraj Puthoor , Joseph L. Greathouse , Bradford M. Beckmann , and Daniel A. Jim\u00e9nez . 2015. Adaptive GPU cache bypassing . In Workshop on General Purpose Processing Using GPUs (GPGPU \u201915) . Yingying Tian, Sooraj Puthoor, Joseph L. Greathouse, Bradford M. Beckmann, and Daniel A. Jim\u00e9nez. 2015. Adaptive GPU cache bypassing. In Workshop on General Purpose Processing Using GPUs (GPGPU \u201915)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000100"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2012.6237047"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830807"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173197"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452141","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3452141","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:07Z","timestamp":1750193287000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452141"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,10]]},"references-count":43,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,30]]}},"alternative-id":["10.1145\/3452141"],"URL":"https:\/\/doi.org\/10.1145\/3452141","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2021,5,10]]},"assertion":[{"value":"2020-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}