{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T01:52:30Z","timestamp":1773193950833,"version":"3.50.1"},"reference-count":177,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,11,17]],"date-time":"2022-11-17T00:00:00Z","timestamp":1668643200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>\n            Commodity DRAM-based processing-using-memory (PuM) techniques that are supported by off-the-shelf DRAM chips present an opportunity for alleviating the data movement bottleneck at low cost. However, system integration of these techniques imposes non-trivial challenges that are yet to\n            <jats:styled-content style=\"black\">be<\/jats:styled-content>\n            solve<jats:styled-content style=\"black\">d<\/jats:styled-content>. Potential solutions to the integration challenges require appropriate tools to develop any necessary hardware and software components. Unfortunately, current proprietary computing systems, specialized DRAM-testing platforms, or system simulators do not provide the flexibility and\/or the holistic system view that is necessary to properly evaluate and deal with the integration challenges of commodity DRAM-based PuM techniques.\n          <\/jats:p>\n          <jats:p>\n            We design and develop Processing-in-DRAM (PiDRAM),\n            <jats:styled-content style=\"black\">the first<\/jats:styled-content>\n            flexible end-to-end framework that enables system integration studies and evaluation of real, commodity DRAM-based PuM techniques. 
PiDRAM provides software and hardware\n            <jats:styled-content style=\"black\">components<\/jats:styled-content>\n            to rapidly integrate PuM techniques across the whole system software and hardware stack. We implement PiDRAM on an FPGA-based RISC-V system.\n            <jats:styled-content style=\"black\">To demonstrate the flexibility and ease of use of PiDRAM, we implement and evaluate two state-of-the-art commodity DRAM-based PuM techniques: (i) in-DRAM copy and initialization (RowClone) and (ii) in-DRAM true random number generation (D-RaNGe)<\/jats:styled-content>. We describe how we solve key integration challenges to make such techniques work and be effective on a real-system prototype, including memory allocation, alignment, and coherence. We observe that end-to-end RowClone speeds up bulk copy and initialization operations by 14.6\u00d7 and 12.6\u00d7, respectively, over conventional CPU copy, even when coherence is supported with inefficient cache flush operations. 
Over PiDRAM\u2019s extensible codebase, integrating both RowClone and D-RaNGe end-to-end on a real RISC-V system prototype takes only 388 lines of Verilog code and 643 lines of C++ code.\n          <\/jats:p>","DOI":"10.1145\/3563697","type":"journal-article","created":{"date-parts":[[2022,9,14]],"date-time":"2022-09-14T13:23:18Z","timestamp":1663161798000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":30,"title":["PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5333-5726","authenticated-orcid":false,"given":"Ataberk","family":"Olgun","sequence":"first","affiliation":[{"name":"ETH Zurich, Gloriastrasse, Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6514-1571","authenticated-orcid":false,"given":"Juan G\u00f3mez","family":"Luna","sequence":"additional","affiliation":[{"name":"ETH Zurich, Gloriastrasse, Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2375-7490","authenticated-orcid":false,"given":"Konstantinos","family":"Kanellopoulos","sequence":"additional","affiliation":[{"name":"ETH Zurich, Gloriastrasse, Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4043-5044","authenticated-orcid":false,"given":"Behzad","family":"Salami","sequence":"additional","affiliation":[{"name":"SAFARI Research Group, Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9766-007X","authenticated-orcid":false,"given":"Hasan","family":"Hassan","sequence":"additional","affiliation":[{"name":"ETH Zurich, Gloriastrasse, Zurich, 
Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2701-3787","authenticated-orcid":false,"given":"Oguz","family":"Ergin","sequence":"additional","affiliation":[{"name":"TOBB University of Economics and Technology, Ankara, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0075-2312","authenticated-orcid":false,"given":"Onur","family":"Mutlu","sequence":"additional","affiliation":[{"name":"ETH Zurich, Gloriastrasse, Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,11,17]]},"reference":[{"key":"e_1_3_2_2_2","volume-title":"HPCA","author":"Aga Shaizeen","year":"2017","unstructured":"Shaizeen Aga, Supreet Jeloka, Arun Subramaniyan, Satish Narayanasamy, David Blaauw, and Reetuparna Das. 2017. Compute caches. In HPCA."},{"key":"e_1_3_2_3_2","volume-title":"ISCA","author":"Ahn Junwhan","year":"2015","unstructured":"Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In ISCA."},{"key":"e_1_3_2_4_2","volume-title":"ISCA","author":"Ahn Junwhan","year":"2015","unstructured":"Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture. In ISCA."},{"key":"e_1_3_2_5_2","volume-title":"ISCA","author":"Akin Berkin","year":"2015","unstructured":"Berkin Akin, Franz Franchetti, and James C. Hoe. 2015. Data reorganization in memory using 3D-stacked DRAM. In ISCA."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2019.2945617"},{"key":"e_1_3_2_7_2","volume-title":"GLSVLSI","author":"Angizi Shaahin","year":"2019","unstructured":"Shaahin Angizi and Deliang Fan. 2019. Graphide: A graph processing accelerator leveraging in-dram-computing. 
In GLSVLSI."},{"key":"e_1_3_2_8_2","volume-title":"DAC","author":"Angizi S.","year":"2018","unstructured":"S. Angizi, Z. He, and D. Fan. 2018. PIMA-logic: A novel processing-in-memory architecture for highly flexible and energy-efficient logic computation. In DAC."},{"key":"e_1_3_2_9_2","volume-title":"DAC","author":"Angizi S.","year":"2018","unstructured":"S. Angizi, A. S. Rakin, and D. Fan. 2018. CMP-PIM: An energy-efficient comparator-based processing-in-memory neural network accelerator. In DAC."},{"key":"e_1_3_2_10_2","volume-title":"DAC","author":"Angizi S.","year":"2019","unstructured":"S. Angizi, J. Sun, W. Zhang, and D. Fan. 2019. AlignS: A processing-in-memory accelerator for DNA short read alignment leveraging SOT-MRAM. In DAC."},{"key":"e_1_3_2_11_2","unstructured":"ARM. 2021. Cache Maintenance Operations. Retrieved from https:\/\/developer.arm.com\/documentation\/ddi0246\/h\/programmers-model\/register-descriptions\/cache-maintenance-operations."},{"key":"e_1_3_2_12_2","unstructured":"Krste Asanovi\u0107, Rimas Avizienis, Jonathan Bachrach, Scott Beamer, David Biancolin, Christopher Celio, Henry Cook, Palmer Dabbelt, John R. Hauser, Adam M. Izraelevitz, Sagar Karandikar, Benjamin Keller, Donggyu Kim, John Koenig, Yunsup Lee, Eric Love, Martin Maas, Albert Magyar, Howard Mao, Miquel Moret\u00f3, Albert Ou, David A. Patterson, B. H. Richards, Colin Schmidt, Stephen M. Twigg, Huy Vo, and Andrew Waterman. 2016. The rocket chip generator. Technical Report No. UCB\/EECS-2016-17."},{"key":"e_1_3_2_13_2","volume-title":"MICRO","author":"Asghari-Moghaddam Hadi","year":"2016","unstructured":"Hadi Asghari-Moghaddam, Young Hoon Son, Jung Ho Ahn, and Nam Sung Kim. 2016. Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems. In MICRO."},{"key":"e_1_3_2_14_2","volume-title":"ICCE","author":"Talukder B. M. S. Bahar","year":"2019","unstructured":"B. M. S. Bahar Talukder, J. Kerns, B. Ray, T. Morris, and M. T. Rahman. 2019. 
Exploiting DRAM latency variations for generating true random numbers. In ICCE."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2923174"},{"key":"e_1_3_2_16_2","volume-title":"IVSW","author":"Barenghi Alessandro","year":"2018","unstructured":"Alessandro Barenghi, Luca Breveglieri, Niccol\u00f2 Izzo, and Gerardo Pelosi. 2018. Software-only reverse engineering of physical DRAM mappings for rowhammer attacks. In IVSW."},{"key":"e_1_3_2_17_2","volume-title":"MICRO","author":"Besta Maciej","year":"2021","unstructured":"Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Ber\u00e1nek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Ioana Stefan, Juan G\u00f3mez Luna, Jakub Golinowski, Marcin Copik, Lukas Kapp-Schwoerer, Salvatore Di Girolamo, Nils Blach, Marek Konieczny, Onur Mutlu, and Torsten Hoefler. 2021. SISA: Set-centric instruction set architecture for graph mining on processing-in-memory systems. In MICRO."},{"key":"e_1_3_2_18_2","volume-title":"DATE","author":"Bhattacharjee D.","year":"2017","unstructured":"D. Bhattacharjee, R. Devadoss, and A. Chattopadhyay. 2017. ReVAMP: ReRAM based VLIW architecture for in-memory computing. In DATE."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature08940"},{"key":"e_1_3_2_21_2","volume-title":"ASPLOS","author":"Boroumand A.","year":"2018","unstructured":"A. Boroumand, S. Ghose, Y. Kim, R. Ausavarungnirun, E. Shiu, R. Thakur, D. Kim, A. Kuusela, A. Knies, P. Ranganathan, and O. Mutlu. 2018. Google workloads for consumer devices: Mitigating data movement bottlenecks. 
In ASPLOS."},{"key":"e_1_3_2_22_2","volume-title":"ISCA","author":"Boroumand Amirali","year":"2019","unstructured":"Amirali Boroumand, Saugata Ghose, Minesh Patel, Hasan Hassan, Brandon Lucia, Rachata Ausavarungnirun, Kevin Hsieh, Nastaran Hajinazar, Krishna T. Malladi, Hongzhong Zheng, and Onur Mutlu. 2019. CoNDA: Efficient cache coherence support for near-data accelerators. In ISCA."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2016.2577557"},{"key":"e_1_3_2_24_2","volume-title":"HPCA","author":"Bostanci F.","year":"2022","unstructured":"F. Bostanci, A. Olgun, L. Orosa, A. Yaglikci, J. S. Kim, H. Hassan, O. Ergin, and O. Mutlu. 2022. DR-STRaNGe: End-to-end system design for DRAM-based true random number generators. In HPCA."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1080\/23746149.2016.1259585"},{"key":"e_1_3_2_26_2","volume-title":"MICRO","author":"Cali Damla Senol","year":"2020","unstructured":"Damla Senol Cali, Gurpreet S. Kalsi, Z\u00fclal Bing\u00f6l, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu. 2020. GenASM: A high-performance, low-power approximate string matching acceleration framework for genome sequence analysis. In MICRO."},{"key":"e_1_3_2_27_2","volume-title":"Understanding and Improving the Latency of DRAM-based Memory Systems","author":"Chang K.","year":"2017","unstructured":"K. Chang. 2017. Understanding and Improving the Latency of DRAM-based Memory Systems. Ph. D. Dissertation. Carnegie Mellon University."},{"key":"e_1_3_2_28_2","volume-title":"SIGMETRICS","author":"Chang Kevin K.","year":"2016","unstructured":"Kevin K. Chang, Abhijith Kashyap, Hasan Hassan, Saugata Ghose, Kevin Hsieh, Donghyuk Lee, Tianshi Li, Gennady Pekhimenko, Samira Khan, and Onur Mutlu. 2016. 
Understanding latency variation in modern DRAM chips: Experimental characterization, analysis, and optimization. In SIGMETRICS."},{"key":"e_1_3_2_29_2","volume-title":"HPCA","author":"Chang Kevin K.","year":"2016","unstructured":"Kevin K. Chang, Prashant J. Nair, Donghyuk Lee, Saugata Ghose, Moinuddin K. Qureshi, and Onur Mutlu. 2016. Low-cost inter-linked subarrays (LISA): Enabling fast inter-subarray data movement in DRAM. In HPCA."},{"key":"e_1_3_2_30_2","volume-title":"SIGMETRICS","author":"Chang Kevin K.","year":"2017","unstructured":"Kevin K. Chang, Abdullah Giray Ya\u011fl\u0131k\u00e7\u0131, Saugata Ghose, Aditya Agrawal, Niladrish Chatterjee, Abhijith Kashyap, Donghyuk Lee, Mike O\u2019Connor, Hasan Hassan, and Onur Mutlu. 2017. Understanding reduced-voltage operation in modern DRAM devices: Experimental characterization, analysis, and mechanisms. In SIGMETRICS."},{"key":"e_1_3_2_31_2","volume-title":"S&P","author":"Cojocar Lucian","year":"2020","unstructured":"Lucian Cojocar, Jeremie Kim, Minesh Patel, Lillian Tsai, Stefan Saroiu, Alec Wolman, and Onur Mutlu. 2020. Are we susceptible to rowhammer? An end-to-end methodology for cloud providers. In S&P."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2821565"},{"key":"e_1_3_2_33_2","volume-title":"DAC","author":"Deng Q.","year":"2018","unstructured":"Q. Deng, L. Jiang, Y. Zhang, M. Zhang, and J. Yang. 2018. DrAcc: A DRAM based accelerator for accurate CNN inference. In DAC."},{"key":"e_1_3_2_34_2","volume-title":"DFI 5.0 Specification","author":"Group DFI","year":"2018","unstructured":"DFI Group. 2018. DFI 5.0 Specification. https:\/\/www.ddr-phy.org\/."},{"key":"e_1_3_2_35_2","volume-title":"ISCA","author":"Oliveira Mario Paulo Drumond Lages De","year":"2017","unstructured":"Mario Paulo Drumond Lages De Oliveira, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, Javier Picorel Obando, Babak Falsafi, Boris Grot, and Dionisios Pnevmatikatos. 2017. 
The mondrian data engine. In ISCA."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2022.102528"},{"key":"e_1_3_2_37_2","volume-title":"ISCA","author":"Eckert Charles","year":"2018","unstructured":"Charles Eckert, Xiaowei Wang, Jingcheng Wang, Arun Subramaniyan, Ravi Iyer, Dennis Sylvester, David Blaauw, and Reetuparna Das. 2018. Neural cache: Bit-serial in-cache acceleration of deep neural networks. In ISCA."},{"key":"e_1_3_2_38_2","volume-title":"HPCA","author":"Farmahini-Farahani Amin","year":"2015","unstructured":"Amin Farmahini-Farahani, Jung Ho Ahn, Katherine Morrow, and Nam Sung Kim. 2015. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules. In HPCA."},{"key":"e_1_3_2_39_2","volume-title":"ETS","author":"Farmani Mohammad","year":"2021","unstructured":"Mohammad Farmani, Mark Tehranipoor, and Fahim Rahman. 2021. RHAT: Efficient rowhammer-aware test for modern DRAM modules. In ETS."},{"key":"e_1_3_2_40_2","volume-title":"ICCD","author":"Fernandez Ivan","year":"2020","unstructured":"Ivan Fernandez, Ricardo Quislant, Christina Giannoula, Mohammed Alser, Juan Gomez-Luna, Eladio Gutierrez, Oscar Plata, and Onur Mutlu. 2020. NATSA: A near-data processing accelerator for time series analysis. In ICCD."},{"key":"e_1_3_2_41_2","article-title":"pLUTo: In-DRAM lookup tables to enable massively parallel general-purpose computation","author":"Ferreira Jo\u00e3o Dinis","year":"2021","unstructured":"Jo\u00e3o Dinis Ferreira, Gabriel Falcao, Juan G\u00f3mez-Luna, Mohammed Alser, Lois Orosa, Mohammad Sadrosadati, Jeremie S. Kim, Geraldo F. Oliveira, Taha Shahroodi, Anant Nori, et\u00a0al. 2021. pLUTo: In-DRAM lookup tables to enable massively parallel general-purpose computation. arXiv:2104.07699. 
Retrieved from https:\/\/arxiv.org\/abs\/2104.07699.","journal-title":"arXiv:2104.07699"},{"key":"e_1_3_2_42_2","volume-title":"S&P","author":"Frigo Pietro","year":"2020","unstructured":"Pietro Frigo, Emanuele Vannacci, Hasan Hassan, Victor van der Veen, Onur Mutlu, Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi. 2020. TRRespass: Exploiting the many sides of target row refresh. In S&P."},{"key":"e_1_3_2_43_2","volume-title":"ISCA","author":"Fujiki Daichi","year":"2019","unstructured":"Daichi Fujiki, Scott Mahlke, and Reetuparna Das. 2019. Duality cache for data parallel acceleration. In ISCA."},{"key":"e_1_3_2_44_2","volume-title":"DATE","author":"Gaillardon Pierre-Emmanuel","year":"2016","unstructured":"Pierre-Emmanuel Gaillardon, Luca Amar\u00fa, Anne Siemon, Eike Linn, Rainer Waser, Anupam Chattopadhyay, and Giovanni De Micheli. 2016. The programmable logic-in-memory (PLiM) computer. In DATE."},{"key":"e_1_3_2_45_2","volume-title":"MICRO","author":"Gao Fei","year":"2019","unstructured":"Fei Gao, Georgios Tziantzioulis, and David Wentzlaff. 2019. ComputeDRAM: In-memory compute using off-the-shelf DRAMs. In MICRO."},{"key":"e_1_3_2_46_2","volume-title":"PACT","author":"Gao Mingyu","year":"2015","unstructured":"Mingyu Gao, Grant Ayers, and Christos Kozyrakis. 2015. Practical near-data processing for in-memory analytics frameworks. In PACT."},{"key":"e_1_3_2_47_2","volume-title":"HPCA","author":"Gao Mingyu","year":"2016","unstructured":"Mingyu Gao and Christos Kozyrakis. 2016. HRL: Efficient and flexible reconfigurable logic for near-data processing. In HPCA."},{"key":"e_1_3_2_48_2","volume-title":"ASPLOS","author":"Gao Mingyu","year":"2017","unstructured":"Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. Tetris: Scalable and efficient neural network acceleration with 3D memory. 
In ASPLOS."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2019.2934048"},{"key":"e_1_3_2_50_2","volume-title":"SIGMETRICS","author":"Ghose Saugata","year":"2019","unstructured":"Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu. 2019. Demystifying complex workload-DRAM interactions: An experimental study. In SIGMETRICS."},{"key":"e_1_3_2_51_2","volume-title":"SIGMETRICS","author":"Ghose Saugata","year":"2018","unstructured":"Saugata Ghose, Abdullah Giray Yaglik\u00e7i, Raghav Gupta, Donghyuk Lee, Kais Kudrolli, William X. Liu, Hasan Hassan, Kevin K. Chang, Niladrish Chatterjee, Aditya Agrawal, Mike O\u2019Connor, and Onur Mutlu. 2018. What your DRAM power models are not telling you: Lessons from a detailed experimental study. In SIGMETRICS."},{"key":"e_1_3_2_52_2","volume-title":"HPCA","author":"Giannoula Christina","year":"2021","unstructured":"Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou, Vasileios Karakostas, Ivan Fernandez, Juan G\u00f3mez-Luna, Lois Orosa, Nectarios Koziris, Georgios Goumas, and Onur Mutlu. 2021. SynCron: Efficient synchronization support for near-data-processing architectures. In HPCA."},{"key":"e_1_3_2_53_2","unstructured":"SAFARI Research Group. 2021. SoftMC v1.0\u2014GitHub Repository. Retrieved from https:\/\/github.com\/CMU-SAFARI\/SoftMC."},{"key":"e_1_3_2_54_2","volume-title":"ISCA","author":"Gu Boncheol","year":"2016","unstructured":"Boncheol Gu, A. S. Yoon, D.-H. Bae, I. Jo, J. Lee, J. Yoon, J.-U. Kang, M. Kwon, C. Yoon, S. Cho, J. Jeong, and D. Chang. 2016. Biscuit: A framework for near-data processing of big data workloads. In ISCA."},{"key":"e_1_3_2_55_2","volume-title":"ASPLOS","author":"Hajinazar Nastaran","year":"2021","unstructured":"Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, Jo\u00e3o Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan G\u00f3mez-Luna, and Onur Mutlu. 2021. 
SIMDRAM: A framework for bit-serial SIMD processing using DRAM. In ASPLOS."},{"key":"e_1_3_2_56_2","volume-title":"DATE","author":"Hamdioui S.","year":"2017","unstructured":"S. Hamdioui, S. Kvatinsky, G. Cauwenberghs, et al. 2017. Memristor for computing: Myth or reality? In DATE."},{"key":"e_1_3_2_57_2","volume-title":"DATE","author":"Hamdioui Said","year":"2015","unstructured":"Said Hamdioui, Lei Xie, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Koen Bertels, Henk Corporaal, Hailong Jiao, Francky Catthoor, Dirk Wouters, Eike Linn, and Jan van Lunteren. 2015. Memristor based computation-in-memory architecture for data-intensive applications. In DATE."},{"key":"e_1_3_2_58_2","volume-title":"ISCA","author":"Hashemi Milad","year":"2016","unstructured":"Milad Hashemi, Khubaib, Eiman Ebrahimi, Onur Mutlu, and Yale N. Patt. 2016. Accelerating dependent cache misses with an enhanced memory controller. In ISCA."},{"key":"e_1_3_2_59_2","volume-title":"MICRO","author":"Hashemi M.","year":"2016","unstructured":"M. Hashemi, O. Mutlu, and Y. N. Patt. 2016. Continuous runahead: Transparent hardware acceleration for memory intensive workloads. In MICRO."},{"key":"e_1_3_2_60_2","volume-title":"MICRO","author":"Hassan Hasan","year":"2021","unstructured":"Hasan Hassan, Yahya Can Tugrul, Jeremie S. Kim, Victor van der Veen, Kaveh Razavi, and Onur Mutlu. 2021. Uncovering In-DRAM rowhammer protection mechanisms: A new methodology, custom rowhammer patterns, and implications. In MICRO."},{"key":"e_1_3_2_61_2","volume-title":"HPCA","author":"Hassan Hasan","year":"2017","unstructured":"Hasan Hassan, Nandita Vijaykumar, Samira Khan, Saugata Ghose, Kevin Chang, Gennady Pekhimenko, Donghyuk Lee, Oguz Ergin, and Onur Mutlu. 2017. SoftMC: A flexible and practical open-source infrastructure for enabling experimental DRAM studies. In HPCA."},{"key":"e_1_3_2_62_2","volume-title":"MASCOTS","author":"Helm C.","year":"2020","unstructured":"C. Helm, S. Akiyama, and K. Taura. 2020. 
Reliable reverse engineering of Intel DRAM addressing using performance counters. In MASCOTS."},{"key":"e_1_3_2_63_2","unstructured":"Marius Hillenbrand. 2017. Physical Address Decoding in Intel Xeon v3\/v4 CPUs: A Supplemental Datasheet."},{"key":"e_1_3_2_64_2","volume-title":"ISIS","author":"Horiguchi M.","year":"1997","unstructured":"M. Horiguchi. 1997. Redundancy techniques for high-density DRAMs. In ISIS."},{"key":"e_1_3_2_65_2","unstructured":"HPS Research Group. 2022. Scarab\u2013Github Repository. Retrieved from https:\/\/github.com\/hpsresearchgroup\/scarab."},{"key":"e_1_3_2_66_2","volume-title":"ISCA","author":"Hsieh Kevin","year":"2016","unstructured":"Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O\u2019Connor, Nandita Vijaykumar, Onur Mutlu, and Stephen Keckler. 2016. Transparent offloading and mapping (TOM): Enabling programmer-transparent near-data processing in GPU systems. In ISCA."},{"key":"e_1_3_2_67_2","volume-title":"ICCD","author":"Hsieh K.","year":"2016","unstructured":"K. Hsieh, S. Khan, N. Vijaykumar, K. K. Chang, A. Boroumand, S. Ghose, and O. Mutlu. 2016. Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation. In ICCD."},{"key":"e_1_3_2_68_2","volume-title":"IPDPS","author":"Huang Yu","year":"2020","unstructured":"Yu Huang, Long Zheng, Pengcheng Yao, Jieshan Zhao, Xiaofei Liao, Hai Jin, and Jingling Xue. 2020. A heterogeneous PIM hardware-software co-design for energy-efficient graph processing. In IPDPS."},{"key":"e_1_3_2_69_2","unstructured":"Intel. 2011. Intel 64 and IA-32 Architectures Software Developer Manuals. Retrieved from http:\/\/www.intel.com\/content\/www\/us\/en\/processors\/architectures-software-developer-manuals.html."},{"key":"e_1_3_2_70_2","article-title":"Taking Neuromorphic Computing to the Next Level with Loihi 2","year":"2022","unstructured":"Intel. 2022. Taking Neuromorphic Computing to the Next Level with Loihi 2. 
Technology Brief.","journal-title":"Technology Brief"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-04478-0"},{"key":"e_1_3_2_72_2","article-title":"DDR4","year":"2012","unstructured":"JEDEC. 2012. DDR4. JEDEC Standard JESD79\u20134 (2012).","journal-title":"JEDEC Standard JESD79\u20134"},{"key":"e_1_3_2_73_2","article-title":"One-Transistor Type DRAM","author":"Kang Hee Bok","year":"2009","unstructured":"Hee Bok Kang and Suk Kyoung Hong. 2009. One-Transistor Type DRAM. US Patent 7701751.","journal-title":"US Patent 7701751"},{"key":"e_1_3_2_74_2","volume-title":"ICASSP","author":"Kang Mingu","year":"2014","unstructured":"Mingu Kang, Min-Sun Keel, Naresh R. Shanbhag, Sean Eilert, and Ken Curewitz. 2014. An energy-efficient VLSI architecture for pattern recognition via deep embedding of computation in SRAM. In ICASSP."},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3097700"},{"key":"e_1_3_2_76_2","volume-title":"DRAM Circuit Design: A Tutorial","author":"Keeth B.","year":"2001","unstructured":"B. Keeth and R. J. Baker. 2001. DRAM Circuit Design: A Tutorial. Wiley."},{"key":"e_1_3_2_77_2","volume-title":"DSN","author":"Khan Samira","year":"2016","unstructured":"Samira Khan, Donghyuk Lee, and Onur Mutlu. 2016. PARBOR: An efficient system-level technique to detect data dependent failures in DRAM. In DSN."},{"key":"e_1_3_2_78_2","volume-title":"MICRO","author":"Khan Samira","year":"2017","unstructured":"Samira Khan, Chris Wilkerson, Z. Wang, Alaa Alameldeen, Donghyuk Lee, and Onur Mutlu. 2017. Detecting and mitigating data-dependent DRAM failures by exploiting current memory content. In MICRO."},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3065365"},{"key":"e_1_3_2_80_2","volume-title":"ISCA","author":"Kim Duckhwan","year":"2016","unstructured":"Duckhwan Kim, J. Kung, S. Chai, S. Yalamanchili, and S. Mukhopadhyay. 2016. 
Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory. In ISCA."},{"key":"e_1_3_2_81_2","volume-title":"SC","author":"Kim G.","year":"2017","unstructured":"G. Kim, N. Chatterjee, M. O\u2019Connor, and K. Hsieh. 2017. Toward standardized near-data processing with unrestricted data placement for GPUs. In SC."},{"key":"e_1_3_2_82_2","volume-title":"ICCD","author":"Kim J.","year":"2018","unstructured":"J. Kim, M. Patel, H. Hassan, and O. Mutlu. 2018. Solar-DRAM: Reducing DRAM access latency by exploiting the variation in local bitlines. In ICCD."},{"key":"e_1_3_2_83_2","volume-title":"HPCA","author":"Kim J.","year":"2018","unstructured":"J. Kim, M. Patel, H. Hassan, and O. Mutlu. 2018. The DRAM latency PUF: Quickly evaluating physical unclonable functions by exploiting the latency\u2013reliability tradeoff in modern DRAM devices. In HPCA."},{"key":"e_1_3_2_84_2","volume-title":"HPCA","author":"Kim J.","year":"2019","unstructured":"J. Kim, M. Patel, H. Hassan, L. Orosa, and O. Mutlu. 2019. D-RaNGe: Using commodity DRAM devices to generate true random numbers with low latency and high throughput. In HPCA."},{"key":"e_1_3_2_85_2","volume-title":"Hot Chips","author":"Kim Jin Hyun","year":"2021","unstructured":"Jin Hyun Kim, Shin-haeng Kang, Sukhan Lee, Hyeonsu Kim, Woongjae Song, Yuhwan Ro, Seungwon Lee, David Wang, Hyunsung Shin, Bengseng Phuah, et\u00a0al. 2021. Aquabolt-XL: Samsung HBM2-PIM with in-memory processing for ML accelerators and beyond. In Hot Chips."},{"key":"e_1_3_2_86_2","volume-title":"ISCA","author":"Kim Jeremie S.","year":"2020","unstructured":"Jeremie S. Kim, Minesh Patel, A. Giray Ya\u011fl\u0131k\u00e7\u0131, Hasan Hassan, Roknoddin Azizi, Lois Orosa, and Onur Mutlu. 2020. Revisiting RowHammer: An experimental analysis of modern DRAM devices and mitigation techniques. In ISCA."},{"key":"e_1_3_2_87_2","doi-asserted-by":"crossref","unstructured":"J. S. Kim, D. Senol, H. Xin, D. Lee, S. Ghose, M. Alser, H. Hassan, O. Ergin, C. 
Alkan, and O. Mutlu. 2018. GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies. 19 2 (2018) 89.","DOI":"10.1186\/s12864-018-4460-0"},{"key":"e_1_3_2_88_2","volume-title":"Architectural Techniques to Enhance DRAM Scaling","author":"Kim Yoongu","year":"2015","unstructured":"Yoongu Kim. 2015. Architectural Techniques to Enhance DRAM Scaling. Ph. D. Dissertation. Carnegie Mellon University."},{"key":"e_1_3_2_89_2","volume-title":"ISCA","author":"Kim Yoongu","year":"2014","unstructured":"Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu. 2014. Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors. In ISCA."},{"key":"e_1_3_2_90_2","volume-title":"ISCA","author":"Kim Yoongu","year":"2012","unstructured":"Yoongu Kim, Vivek Seshadri, Donghyuk Lee, Jamie Liu, and Onur Mutlu. 2012. A case for exploiting subarray-level parallelism (SALP) in DRAM. In ISCA."},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2015.2414456"},{"key":"e_1_3_2_92_2","volume-title":"IEEE TCAS II: Express Briefs","author":"Kvatinsky S.","year":"2014","unstructured":"S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, and U. C. Weiser. 2014. MAGIC\u2014memristor-aided logic. In IEEE TCAS II: Express Briefs. 61 (2014), 895\u2013899."},{"key":"e_1_3_2_93_2","volume-title":"ICCD","author":"Kvatinsky S.","year":"2011","unstructured":"S. Kvatinsky, A. Kolodny, U. C. Weiser, and E. G. Friedman. 2011. Memristor-based IMPLY logic design procedure. In ICCD."},{"key":"e_1_3_2_94_2","doi-asserted-by":"crossref","unstructured":"S. Kvatinsky, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, and U. C. Weiser. 2014. Memristor-based material implication (IMPLY) logic: Design principles and methodologies. 
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22 (2014) 2054\u20132066.","DOI":"10.1109\/TVLSI.2013.2282132"},{"key":"e_1_3_2_95_2","volume-title":"ISSCC","author":"Kwon Y.-C.","year":"2021","unstructured":"Y.-C. Kwon, S. H. Lee, J. Lee, S.-H. Kwon, J. M. Ryu, J.-P. Son, O. Seongil, H.-S. Yu, H. Lee, S. Y. Kim, Y. Cho, J. G. Kim, J. Choi, H.-S. Shin, J. Kim, B. Phuah, H. Kim, M. J. Song, A. Choi, D. Kim, S. Kim, E.-B. Kim, D. Wang, S. Kang, Y. Ro, S. Seo, J. Song, J. Youn, K. Sohn, and N. S. Kim. 2021. 25.4 A 20nm 6GB function-in-memory dram, based on HBM2 with a 1.2TFLOPS programmable computing unit using bank-level parallelism, for machine learning applications. In ISSCC."},{"key":"e_1_3_2_96_2","volume-title":"Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity","author":"Lee D.","year":"2016","unstructured":"D. Lee. 2016. Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Ph. D. Dissertation. Carnegie Mellon University."},{"key":"e_1_3_2_97_2","volume-title":"SIGMETRICS","author":"Lee D.","year":"2017","unstructured":"D. Lee, S. Khan, L. Subramanian, S. Ghose, R. Ausavarungnirun, G. Pekhimenko, V. Seshadri, and O. Mutlu. 2017. Design-induced latency variation in modern DRAM chips: Characterization, analysis, and latency reduction mechanisms. In SIGMETRICS."},{"key":"e_1_3_2_98_2","volume-title":"HPCA","author":"Lee Donghyuk","year":"2015","unstructured":"Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, and Onur Mutlu. 2015. Adaptive-latency DRAM: Optimizing DRAM timing for the common-case. In HPCA."},{"key":"e_1_3_2_99_2","volume-title":"HPCA","author":"Lee Donghyuk","year":"2013","unstructured":"Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, and Onur Mutlu. 2013. Tiered-latency DRAM: A low latency and low cost DRAM architecture. 
In HPCA."},{"key":"e_1_3_2_100_2","volume-title":"DaMoN","author":"Lee Donghun","year":"2022","unstructured":"Donghun Lee, Jinin So, Minseon Ahn, Jong-Geon Lee, Jungmin Kim, Jeonghyeon Cho, Rebholz Oliver, Vishnu Charan Thummala, Ravi shankar JV, Sachin Suresh Upadhya, et\u00a0al. 2022. Improving in-memory database operations with acceleration DIMM (AxDIMM). In DaMoN."},{"key":"e_1_3_2_101_2","volume-title":"PACT","author":"Lee Donghyuk","year":"2015","unstructured":"Donghyuk Lee, Lavanya Subramanian, Rachata Ausavarungnirun, Jongmoo Choi, and Onur Mutlu. 2015. Decoupled direct memory access: Isolating CPU and IO traffic by leveraging a dual-data-port DRAM. In PACT."},{"key":"e_1_3_2_102_2","volume-title":"ISSCC","author":"Lee Seongju","year":"2022","unstructured":"Seongju Lee, Kyuyoung Kim, Sanghoon Oh, Joonhong Park, Gimoon Hong, Dongyoon Ka, Kyudong Hwang, Jeongje Park, Kyeongpil Kang, Jungyeon Kim, Junyeol Jeon, Nahsung Kim, Yongkee Kwon, Kornijcuk Vladimir, Woojae Shin, Jongsoon Won, Minkyu Lee, Hyunha Joo, Haerang Choi, Jaewook Lee, Donguc Ko, Younggun Jun, Keewon Cho, Ilwoong Kim, Choungki Song, Chunseok Jeong, Daehan Kwon, Jieun Jang, Il Park, Junhyun Chun, and Joohwan Cho. 2022. A 1ynm 1.25V 8Gb, 16Gb\/s\/pin GDDR6-based accelerator-in-memory supporting 1TFLOPS MAC operation and various activation functions for deep-learning applications. In ISSCC."},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.mejo.2014.06.006"},{"key":"e_1_3_2_104_2","volume-title":"MICRO","author":"Li S.","year":"2017","unstructured":"S. Li, D. Niu, K. T. Malladi, H. Zheng, B. Brennan, and Y. Xie. 2017. DRISA: A DRAM-based reconfigurable in-situ accelerator. In MICRO."},{"key":"e_1_3_2_105_2","volume-title":"DAC","author":"Li Shuangchen","year":"2016","unstructured":"Shuangchen Li, Cong Xu, Qiaosha Zou, Jishen Zhao, Yu Lu, and Yuan Xie. 2016. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories. 
In DAC."},{"key":"e_1_3_2_106_2","article-title":"calloc(3p)\u2013Linux Manual Page","author":"Project Linux man-pages","year":"2022","unstructured":"Linux man-pages Project. 2022. calloc(3p)\u2013Linux Manual Page. Retrieved from https:\/\/man7.org\/linux\/man-pages\/man3\/calloc.3p.html.","journal-title":"https:\/\/man7.org\/linux\/man-pages\/man3\/calloc.3p.html."},{"key":"e_1_3_2_107_2","article-title":"malloc(3)\u2013Linux Manual Page","author":"Project Linux man-pages","year":"2022","unstructured":"Linux man-pages Project. 2022. malloc(3)\u2013Linux Manual Page. Retrieved from https:\/\/man7.org\/linux\/man-pages\/man3\/malloc.3.html.","journal-title":"https:\/\/man7.org\/linux\/man-pages\/man3\/malloc.3.html."},{"key":"e_1_3_2_108_2","article-title":"memcpy(3)\u2013Linux Manual Page","author":"Project Linux man-pages","year":"2022","unstructured":"Linux man-pages Project. 2022. memcpy(3)\u2013Linux Manual Page. Retrieved from https:\/\/man7.org\/linux\/man-pages\/man3\/memcpy.3.html.","journal-title":"https:\/\/man7.org\/linux\/man-pages\/man3\/memcpy.3.html."},{"key":"e_1_3_2_109_2","article-title":"posix_memalign(3)\u2013Linux Manual Page","author":"Project Linux man-pages","year":"2022","unstructured":"Linux man-pages Project. 2022. posix_memalign(3)\u2013Linux Manual Page. Retrieved from https:\/\/man7.org\/linux\/man-pages\/man3\/posix_memalign.3.html.","journal-title":"https:\/\/man7.org\/linux\/man-pages\/man3\/posix_memalign.3.html."},{"key":"e_1_3_2_110_2","article-title":"perf: Linux Profiling with Performance Counters","author":"Wiki Linux","year":"2021","unstructured":"Linux Wiki. 2021. perf: Linux Profiling with Performance Counters. Retrieved from https:\/\/perf.wiki.kernel.org\/index.php\/Main_Page.","journal-title":"https:\/\/perf.wiki.kernel.org\/index.php\/Main_Page."},{"key":"e_1_3_2_111_2","volume-title":"ISCA","author":"Liu Jamie","year":"2013","unstructured":"Jamie Liu, Ben Jaiyen, Yoongu Kim, Chris Wilkerson, and Onur Mutlu. 2013. 
An experimental study of data retention behavior in modern DRAM devices: Implications for retention time profiling mechanisms. In ISCA."},{"key":"e_1_3_2_112_2","volume-title":"SPAA","author":"Liu Zhiyu","year":"2017","unstructured":"Zhiyu Liu, Irina Calciu, Maurice Herlihy, and Onur Mutlu. 2017. Concurrent data structures for near-memory computing. In SPAA."},{"key":"e_1_3_2_113_2","volume-title":"MICRO","author":"Lu Shih-Lien","year":"2015","unstructured":"Shih-Lien Lu, Ying-Chen Lin, and Chia-Lin Yang. 2015. Improving DRAM latency with dynamic asymmetric subarray. In MICRO."},{"key":"e_1_3_2_114_2","volume-title":"ISCA","author":"Luo Haocong","year":"2020","unstructured":"Haocong Luo, Taha Shahroodi, Hasan Hassan, Minesh Patel, A. Giray Yaglikci, Lois Orosa, Jisung Park, and Onur Mutlu. 2020. CLR-DRAM: A low-cost DRAM architecture enabling dynamic capacity-latency trade-off. In ISCA."},{"key":"e_1_3_2_115_2","doi-asserted-by":"publisher","DOI":"10.1147\/rd.462.0187"},{"key":"e_1_3_2_116_2","article-title":"FT20X","year":"2022","unstructured":"Maxwell. 2022. FT20X. Retrieved from https:\/\/www.maxwell-fa.com\/upload\/files\/base\/8\/m\/311.pdf.","journal-title":"https:\/\/www.maxwell-fa.com\/upload\/files\/base\/8\/m\/311.pdf."},{"key":"e_1_3_2_117_2","unstructured":"Micron. 2016. DDR4 SDRAM Datasheet."},{"key":"e_1_3_2_118_2","article-title":"DDR3 SDRAM: MT41J128M8","year":"2018","unstructured":"Micron. 2018. DDR3 SDRAM: MT41J128M8. Data Sheet.","journal-title":"Data Sheet"},{"key":"e_1_3_2_119_2","doi-asserted-by":"publisher","DOI":"10.1145\/2686875"},{"key":"e_1_3_2_120_2","volume-title":"DATE","author":"Mosanu Sergiu","year":"2022","unstructured":"Sergiu Mosanu, Mohammad Nazmus Sakib, Tommy II, Ersin Cukurtas, Alif Ahmed, Preslav Ivanov, Samira Khan, Kevin Skadron, and Mircea Stan. 2022. PiMulator: A fast and flexible processing-in-memory emulation platform. 
In DATE."},{"key":"e_1_3_2_121_2","volume-title":"Emerging Computing: From Devices to Systems\u2014Looking Beyond Moore and Von Neumann","author":"Mutlu Onur","year":"2021","unstructured":"Onur Mutlu, Saugata Ghose, Juan G\u00f3mez-Luna, and Rachata Ausavarungnirun. 2021. A modern primer on processing in memory. In Emerging Computing: From Devices to Systems\u2014Looking Beyond Moore and Von Neumann. Springer, Singapore, 171\u2013243."},{"key":"e_1_3_2_122_2","volume-title":"HPCA","author":"Nai Lifeng","year":"2017","unstructured":"Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, and Hyesoon Kim. 2017. GraphPIM: Enabling instruction-level PIM offloading in graph computing frameworks. In HPCA."},{"key":"e_1_3_2_123_2","volume-title":"ISSCC","author":"Niu Dimin","year":"2022","unstructured":"Dimin Niu, Shuangchen Li, Yuhao Wang, Wei Han, Zhe Zhang, Yijin Guan, Tianchan Guan, Fei Sun, Fei Xue, Lide Duan, et\u00a0al. 2022. 184QPS\/W 64Mb\/mm\u00b2 3D Logic-to-DRAM hybrid bonding with process-near-memory engine for recommendation system. In ISSCC."},{"key":"e_1_3_2_124_2","article-title":"QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips","author":"Olgun Ataberk","year":"2021","unstructured":"Ataberk Olgun, Minesh Patel, A. Giray Yaglikci, Haocong Luo, Jeremie S. Kim, Nisa Bostanci, Nandita Vijaykumar, Oguz Ergin, and Onur Mutlu. 2021. QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips. arXiv:2105.08955. Retrieved from https:\/\/arxiv.org\/abs\/2105.08955.","journal-title":"arXiv:2105.08955"},{"key":"e_1_3_2_125_2","volume-title":"ISCA","author":"Olgun Ataberk","year":"2021","unstructured":"Ataberk Olgun, Minesh Patel, A. Giray Ya\u011fl\u0131k\u00e7\u0131, Haocong Luo, Jeremie S. Kim, F. Nisa Bostanc\u0131, Nandita Vijaykumar, O\u011fuz Ergin, and Onur Mutlu. 2021. 
QUAC-TRNG: High-throughput true random number generation using quadruple row activation in commodity DRAM chips. In ISCA."},{"key":"e_1_3_2_126_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3110993"},{"key":"e_1_3_2_127_2","volume-title":"ISCA","author":"Orosa Lois","year":"2021","unstructured":"Lois Orosa, Yaohua Wang, Mohammad Sadrosadati, Jeremie S. Kim, Minesh Patel, Ivan Puddu, Haocong Luo, Kaveh Razavi, Juan G\u00f3mez-Luna, Hasan Hassan, Nika Mansouri-Ghiasi, Saugata Ghose, and Onur Mutlu. 2021. CODIC: A low-cost substrate for enabling custom In-DRAM functionalities and optimizations. In ISCA."},{"key":"e_1_3_2_128_2","volume-title":"MICRO","author":"Orosa Lois","year":"2021","unstructured":"Lois Orosa, Abdullah Giray Yaglikci, Haocong Luo, Ataberk Olgun, Jisung Park, Hasan Hassan, Minesh Patel, Jeremie S. Kim, and Onur Mutlu. 2021. A deeper look into RowHammer\u2019s sensitivities: Experimental analysis of real DRAM chips and implications on future attacks and defenses. In MICRO."},{"key":"e_1_3_2_129_2","volume-title":"ISCA","author":"Patel Minesh","year":"2017","unstructured":"Minesh Patel, Jeremie S. Kim, and Onur Mutlu. 2017. The reach profiler (REAPER): Enabling the mitigation of DRAM retention failures via profiling at aggressive conditions. In ISCA."},{"key":"e_1_3_2_130_2","volume-title":"MICRO","author":"Patel Minesh","year":"2020","unstructured":"Minesh Patel, Jeremie S. Kim, Taha Shahroodi, Hasan Hassan, and Onur Mutlu. 2020. Bit-exact ECC recovery (BEER): Determining DRAM On-Die ECC functions by exploiting DRAM data retention characteristics. In MICRO."},{"key":"e_1_3_2_131_2","article-title":"A case for transparent reliability in DRAM systems","author":"Patel Minesh","year":"2022","unstructured":"Minesh Patel, Taha Shahroodi, Aditya Manglik, A. Giray Yaglikci, Ataberk Olgun, Haocong Luo, and Onur Mutlu. 2022. A case for transparent reliability in DRAM systems. arXiv:2204.10378. 
Retrieved from https:\/\/arxiv.org\/abs\/2204.10378.","journal-title":"arXiv:2204.10378"},{"key":"e_1_3_2_132_2","volume-title":"PACT","author":"Pattnaik Ashutosh","year":"2016","unstructured":"Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, and Chita R. Das. 2016. Scheduling techniques for GPU architectures with processing-in-memory capabilities. In PACT."},{"key":"e_1_3_2_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2014.2299539"},{"key":"e_1_3_2_134_2","volume-title":"ISPASS","author":"Pugsley Seth H.","year":"2014","unstructured":"Seth H. Pugsley, Jeffrey Jestes, Huihui Zhang, Rajeev Balasubramonian, Vijayalakshmi Srinivasan, Alper Buyuktosunoglu, Al Davis, and Feifei Li. 2014. NDC: Analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads. In ISPASS."},{"key":"e_1_3_2_135_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.2990599"},{"key":"e_1_3_2_136_2","unstructured":"RISC-V. 2022. RISC-V Proxy Kernel. Retrieved from https:\/\/github.com\/riscv\/riscv-pk."},{"key":"e_1_3_2_137_2","doi-asserted-by":"publisher","DOI":"10.1145\/3465371"},{"key":"e_1_3_2_138_2","article-title":"Ramulator: A DRAM Simulator\u2013GitHub Repository","author":"Group SAFARI Research","year":"2015","unstructured":"SAFARI Research Group. 2015. Ramulator: A DRAM Simulator\u2013GitHub Repository. Retrieved from https:\/\/github.com\/CMU-SAFARI\/ramulator\/.","journal-title":"https:\/\/github.com\/CMU-SAFARI\/ramulator\/."},{"key":"e_1_3_2_139_2","article-title":"DAMOV\u2013GitHub Repository","author":"Group SAFARI Research","year":"2021","unstructured":"SAFARI Research Group. 2021. DAMOV\u2013GitHub Repository. Retrieved from https:\/\/github.com\/CMU-SAFARI\/DAMOV.","journal-title":"https:\/\/github.com\/CMU-SAFARI\/DAMOV."},{"key":"e_1_3_2_140_2","unstructured":"SAFARI Research Group. 2021. Ramulator-PIM: A Processing-in-Memory Simulation Framework\u2013GitHub Repository. 
Retrieved from https:\/\/github.com\/CMU-SAFARI\/ramulator-pim."},{"key":"e_1_3_2_141_2","volume-title":"ISCA","author":"Sanchez Daniel","year":"2013","unstructured":"Daniel Sanchez and Christos Kozyrakis. 2013. ZSim: Fast and accurate microarchitectural simulation of thousand-core systems. In ISCA."},{"key":"e_1_3_2_142_2","volume-title":"IRPS","author":"Saroiu Stefan","year":"2022","unstructured":"Stefan Saroiu, Alec Wolman, and Lucian Cojocar. 2022. The price of secrecy: How hiding internal DRAM topologies hurts rowhammer defenses. In IRPS."},{"key":"e_1_3_2_143_2","volume-title":"Simple DRAM and Virtual Memory Abstractions to Enable Highly Efficient Memory Systems","author":"Seshadri V.","year":"2016","unstructured":"V. Seshadri. 2016. Simple DRAM and Virtual Memory Abstractions to Enable Highly Efficient Memory Systems. Ph. D. Dissertation. Carnegie Mellon University."},{"key":"e_1_3_2_144_2","volume-title":"ISCA","author":"Seshadri Vivek","year":"2014","unstructured":"Vivek Seshadri, Abhishek Bhowmick, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. 2014. The dirty-block index. In ISCA."},{"key":"e_1_3_2_145_2","volume-title":"CAL","author":"Seshadri Vivek","year":"2015","unstructured":"Vivek Seshadri, K. Hsieh, A. Boroumand, D. Lee, M. A. Kozuch, O. Mutlu, P. B. Gibbons, and T. C. Mowry. 2015. Fast bulk bitwise AND and OR in DRAM. In CAL."},{"key":"e_1_3_2_146_2","volume-title":"MICRO","author":"Seshadri Vivek","year":"2013","unstructured":"Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Michael A. Kozuch, Phillip B. Gibbons, and Todd C. Mowry. 2013. RowClone: Fast and energy-efficient In-DRAM bulk data copy and initialization. In MICRO."},{"key":"e_1_3_2_147_2","article-title":"Buddy-RAM: Improving the performance and efficiency of bulk bitwise operations using DRAM","author":"Seshadri V.","year":"2016","unstructured":"V. Seshadri, D. Lee, T. Mullins, H. 
Hassan, A. Boroumand, J. Kim, M. A. Kozuch, O. Mutlu, P. B. Gibbons, and T. C. Mowry. 2016. Buddy-RAM: Improving the performance and efficiency of bulk bitwise operations using DRAM. arXiv:1611.09988. Retrieved from https:\/\/arxiv.org\/abs\/1611.09988.","journal-title":"arXiv:1611.09988"},{"key":"e_1_3_2_148_2","volume-title":"MICRO","author":"Seshadri V.","year":"2017","unstructured":"V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M. A. Kozuch, O. Mutlu, P. B. Gibbons, and T. C. Mowry. 2017. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In MICRO."},{"key":"e_1_3_2_149_2","article-title":"The processing using memory paradigm: In-DRAM bulk copy, initialization, bitwise AND and OR","author":"Seshadri Vivek","year":"2016","unstructured":"Vivek Seshadri and Onur Mutlu. 2016. The processing using memory paradigm: In-DRAM bulk copy, initialization, bitwise AND and OR. arXiv:1610.09603. Retrieved from https:\/\/arxiv.org\/abs\/1610.09603.","journal-title":"arXiv:1610.09603"},{"key":"e_1_3_2_150_2","volume-title":"Advances in Computers, Volume 106","author":"Seshadri Vivek","year":"2017","unstructured":"Vivek Seshadri and Onur Mutlu. 2017. Simple operations in memory to reduce data movement. In Advances in Computers, Volume 106."},{"key":"e_1_3_2_151_2","article-title":"In-DRAM bulk bitwise execution engine","author":"Seshadri Vivek","year":"2020","unstructured":"Vivek Seshadri and Onur Mutlu. 2020. In-DRAM bulk bitwise execution engine. arXiv:1905.09822. Retrieved from https:\/\/arxiv.org\/abs\/1905.09822.","journal-title":"arXiv:1905.09822"},{"key":"e_1_3_2_152_2","volume-title":"ISCA","author":"Shafiee Ali","year":"2016","unstructured":"Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar. 2016. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. 
In ISCA."},{"key":"e_1_3_2_153_2","volume-title":"DAC","author":"Singh Gagandeep","year":"2019","unstructured":"Gagandeep Singh, Juan Gomez-Luna, Giovanni Mariani, Geraldo F. Oliveira, Stefano Corda, Sander Stujik, Onur Mutlu, and Henk Corporaal. 2019. NAPEL: Near-memory computing application performance prediction via ensemble learning. In DAC."},{"key":"e_1_3_2_154_2","unstructured":"Standard Performance Evaluation Corp. 2006. SPEC CPU 2006. Retrieved from http:\/\/www.spec.org\/cpu2006."},{"key":"e_1_3_2_155_2","volume-title":"HOST","author":"Talukder B. S. Bahar","year":"2020","unstructured":"B. S. Bahar Talukder, V. Menon, B. Ray, T. Neal, and M. Rahman. 2020. Towards the avoidance of counterfeit memory: Identifying the DRAM origin. In HOST."},{"key":"e_1_3_2_156_2","volume-title":"NANOARCH","author":"Testa Eleonora","year":"2016","unstructured":"Eleonora Testa, Mathias Soeken, Odysseas Zografos, Luca Amaru, Praveen Raghavan, Rudy Lauwereins, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli. 2016. Inversion optimization in majority-inverter graphs. In NANOARCH."},{"key":"e_1_3_2_157_2","article-title":"PL & PL-P Series DC Power Supplies Data Sheet\u2013Issue 5","year":"2022","unstructured":"TTi. 2022. PL & PL-P Series DC Power Supplies Data Sheet\u2013Issue 5. Retrieved from https:\/\/resources.aimtti.com\/datasheets\/AIM-PL+PL-P_series_DC_power_supplies_data_sheet-Iss5.pdf.","journal-title":"https:\/\/resources.aimtti.com\/datasheets\/AIM-PL+PL-P_series_DC_power_supplies_data_sheet-Iss5.pdf"},{"key":"e_1_3_2_158_2","unstructured":"UPMEM. 2018. Introduction to UPMEM PIM. Processing-in-memory (PIM) on DRAM Accelerator."},{"key":"e_1_3_2_159_2","unstructured":"A. J. van de Goor and I. Schanstra. 2002. Address and data scrambling: Causes and impact on memory tests. 
In IEEE International Workshop on Electronic Design Test and Applications."},{"key":"e_1_3_2_160_2","volume-title":"MICRO","author":"Wang Yaohua","year":"2020","unstructured":"Yaohua Wang, Lois Orosa, Xiangjun Peng, Yang Guo, Saugata Ghose, Minesh Patel, Jeremie S. Kim, Juan G\u00f3mez Luna, Mohammad Sadrosadati, Nika Mansouri Ghiasi, and Onur Mutlu. 2020. FIGARO: Improving system performance via fine-grained In-DRAM data relocation and caching. In MICRO."},{"key":"e_1_3_2_161_2","unstructured":"Andrew Waterman and Krste Asanovic. 2021. The RISC-V Instruction Set Manual. Retrieved from https:\/\/riscv.org\/wp-content\/uploads\/2019\/06\/riscv-spec.pdf."},{"key":"e_1_3_2_162_2","volume-title":"DaMoN","author":"Xi Sam (Likun)","year":"2015","unstructured":"Sam (Likun) Xi, Oreoluwa Babarinsa, Manos Athanassoulis, and Stratos Idreos. 2015. Beyond the wall: Near-data processing for databases. In DaMoN."},{"key":"e_1_3_2_163_2","volume-title":"ICCD","author":"Xie Lei","year":"2015","unstructured":"Lei Xie, Hoang Anh Du Nguyen, Mottaqiallah Taouil, Said Hamdioui, and Koen Bertels. 2015. Fast boolean logic mapped on memristor crossbar. In ICCD."},{"key":"e_1_3_2_164_2","volume-title":"7 Series FPGAs Memory Interface Solutions","year":"2011","unstructured":"Xilinx. 2011. 7 Series FPGAs Memory Interface Solutions."},{"key":"e_1_3_2_165_2","volume-title":"Vivado Design Suite: Using Constraints","year":"2021","unstructured":"Xilinx. 2021. Vivado Design Suite: Using Constraints."},{"key":"e_1_3_2_166_2","unstructured":"Xilinx. 2021. Xilinx Ultrascale+ MPSoC. Retrieved from https:\/\/www.xilinx.com\/products\/silicon-devices\/soc\/zynq-ultrascale-mpsoc.html."},{"key":"e_1_3_2_167_2","unstructured":"Xilinx. 2021. Xilinx Zynq-7000 SoC ZC706 Evaluation Kit. Retrieved from https:\/\/www.xilinx.com\/products\/boards-and-kits\/ek-z7-zc706-g.html."},{"key":"e_1_3_2_168_2","volume-title":"HPCA","author":"Xin Xin","year":"2020","unstructured":"Xin Xin, Youtao Zhang, and Jun Yang. 2020. 
ELP2IM: Efficient and low power bitwise operation processing in DRAM. In HPCA."},{"key":"e_1_3_2_169_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2018.2885752"},{"key":"e_1_3_2_170_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2021.3061905"},{"key":"e_1_3_2_171_2","volume-title":"DATE","author":"Yu Jintao","year":"2018","unstructured":"Jintao Yu, Hoang Anh Du Nguyen, Lei Xie, Mottaqiallah Taouil, and Said Hamdioui. 2018. Memristive devices for computation-in-memory. In DATE."},{"key":"e_1_3_2_172_2","volume-title":"VLSIC","author":"Zha Yue","year":"2019","unstructured":"Yue Zha, Etienne Nowak, and Jing Li. 2019. Liquid silicon: A nonvolatile fully programmable processing-in-memory processor with monolithically integrated ReRAM for big data\/machine learning applications. In VLSIC."},{"key":"e_1_3_2_173_2","volume-title":"HPDC","author":"Zhang D. P.","year":"2014","unstructured":"D. P. Zhang, N. Jayasena, A. Lyashevsky, J. L. Greathouse, L. Xu, and M. Ignatowski. 2014. TOP-PIM: Throughput-oriented programmable processing in memory. In HPDC."},{"key":"e_1_3_2_174_2","doi-asserted-by":"publisher","DOI":"10.1145\/3409114"},{"key":"e_1_3_2_175_2","volume-title":"CICA","author":"Zhang Liang","year":"2022","unstructured":"Liang Zhang and Li Shen. 2022. PIM-HBMSim: A processing in memory simulator based on high bandwidth memory. In CICA."},{"key":"e_1_3_2_176_2","volume-title":"HPCA","author":"Zhang Mingxing","year":"2018","unstructured":"Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, and Xuehai Qian. 2018. GraphP: Reducing communication for PIM-based graph processing with efficient data partition. In HPCA."},{"key":"e_1_3_2_177_2","volume-title":"HPEC","author":"Zhu Qiuling","year":"2013","unstructured":"Qiuling Zhu, Tobias Graf, H Ekin Sumbul, Larry Pileggi, and Franz Franchetti. 2013. Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware. 
In HPEC."},{"key":"e_1_3_2_178_2","volume-title":"MICRO","author":"Zhuo Youwei","year":"2019","unstructured":"Youwei Zhuo, Chao Wang, Mingxing Zhang, Rui Wang, Dimin Niu, Yanzhi Wang, and Xuehai Qian. 2019. GraphQ: Scalable PIM-based graph processing. In MICRO."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3563697","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3563697","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:38:09Z","timestamp":1750178289000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3563697"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,17]]},"references-count":177,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3563697"],"URL":"https:\/\/doi.org\/10.1145\/3563697","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,17]]},"assertion":[{"value":"2021-12-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-07-14","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}