{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:25:42Z","timestamp":1750220742405,"version":"3.41.0"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T00:00:00Z","timestamp":1601424000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100003696","name":"ETRI","doi-asserted-by":"crossref","award":["IITP\/KEIT[2014-3-00035]"],"award-info":[{"award-number":["IITP\/KEIT[2014-3-00035]"]}],"id":[{"id":"10.13039\/501100003696","id-type":"DOI","asserted-by":"crossref"}]},{"name":"NSF","award":["1253700, 1916817, 1337147, CNS-1749711"],"award-info":[{"award-number":["1253700, 1916817, 1337147, CNS-1749711"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2020,12,31]]},"abstract":"<jats:p>\n            We propose\n            <jats:sc>ecoTLB<\/jats:sc>\n            \u2014software-based eventual translation lookaside buffer (TLB) coherence\u2014which eliminates the overhead of the synchronous TLB shootdown mechanism in operating systems that use address space identifiers (ASIDs). With an eventual TLB coherence,\n            <jats:sc>ecoTLB<\/jats:sc>\n            improves the performance of\n            <jats:italic>free<\/jats:italic>\n            and\n            <jats:italic>page swap<\/jats:italic>\n            operations by removing the inter-processor interrupt (IPI) overheads incurred to invalidate TLB entries. We show that the TLB shootdown has implications for page swapping in particular in emerging, disaggregated data centers and demonstrate that\n            <jats:sc>ecoTLB<\/jats:sc>\n            can improve both the performance and the specific swapping policy decisions using\n            <jats:sc>ecoTLB<\/jats:sc>\n            \u2019s asynchronous mechanism. We demonstrate that\n            <jats:sc>ecoTLB<\/jats:sc>\n            improves the performance of real-world applications, such as Memcached and Make, that perform page swapping using\n            <jats:sc>Infiniswap<\/jats:sc>\n            , a solution for next generation data centers that use disaggregated memory, by up to 17.2%. Moreover,\n            <jats:sc>ecoTLB<\/jats:sc>\n            improves the 99th percentile tail latency of Memcached by up to 70.8% due to its asynchronous scheme and improved policy decisions. Furthermore, we show that recent features to improve security in the Linux kernel, like kernel page table isolation (KPTI), can result in significant performance overheads on architectures without support for specific instructions to clear single entries in tagged TLBs, falling back to full TLB flushes. In this scenario,\n            <jats:sc>ecoTLB<\/jats:sc>\n            is able to recover the performance lost for supporting KPTI due to its asynchronous shootdown scheme and its support for tagged TLBs. Finally, we demonstrate that\n            <jats:sc>ecoTLB<\/jats:sc>\n            improves the performance of free operations by up to 59.1% on a 120-core machine and improves the performance of Apache on a 16-core machine by up to 13.7% compared to baseline Linux, and by up to 48.2% compared to ABIS, a recent state-of-the-art research prototype that reduces the number of IPIs.\n          <\/jats:p>","DOI":"10.1145\/3409454","type":"journal-article","created":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T11:23:50Z","timestamp":1601465030000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["<scp>ECO<\/scp>\n            TLB"],"prefix":"10.1145","volume":"17","author":[{"given":"Steffen","family":"Maass","sequence":"first","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, USA"}]},{"given":"Mohan Kumar","family":"Kumar","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, USA"}]},{"given":"Taesoo","family":"Kim","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, USA"}]},{"given":"Tushar","family":"Krishna","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, USA"}]},{"given":"Abhishek","family":"Bhattacharjee","sequence":"additional","affiliation":[{"name":"Yale University, New Haven, CT, USA"}]}],"member":"320","published-online":{"date-parts":[[2020,9,30]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201917)","author":"Amit Nadav","year":"2017","unstructured":"Nadav Amit . 2017 . Optimizing the TLB shootdown algorithm with page access tracking . In Proceedings of the USENIX Annual Technical Conference (ATC\u201917) . 27--39. Nadav Amit. 2017. Optimizing the TLB shootdown algorithm with page access tracking. In Proceedings of the USENIX Annual Technical Conference (ATC\u201917). 27--39."},{"volume-title":"Don\u2019t shoot down TLB shootdowns! In Proceedings of the 15th European Conference on Computer Systems (EuroSys\u201920). 1--14","author":"Amit Nadav","unstructured":"Nadav Amit , Amy Tai , and Michael Wei . 2020. Don\u2019t shoot down TLB shootdowns! In Proceedings of the 15th European Conference on Computer Systems (EuroSys\u201920). 1--14 . Nadav Amit, Amy Tai, and Michael Wei. 2020. Don\u2019t shoot down TLB shootdowns! In Proceedings of the 15th European Conference on Computer Systems (EuroSys\u201920). 1--14.","key":"e_1_2_1_2_1"},{"unstructured":"Lukasz Anaczkowski. 2016. Linux VM workaround for Knights Landing A\/D leak. Retrieved from https:\/\/lkml.org\/lkml\/2016\/6\/14\/505.  Lukasz Anaczkowski. 2016. Linux VM workaround for Knights Landing A\/D leak. Retrieved from https:\/\/lkml.org\/lkml\/2016\/6\/14\/505.","key":"e_1_2_1_3_1"},{"unstructured":"Ravi Arimilli Guy Guthrie and Kirk Livingston. 2004. Multiprocessor system supporting multiple outstanding TLBI operations per partition. Retrieved from https:\/\/www.google.com\/patents\/US20040215898 US Patent App. 10\/425 425.  Ravi Arimilli Guy Guthrie and Kirk Livingston. 2004. Multiprocessor system supporting multiple outstanding TLBI operations per partition. Retrieved from https:\/\/www.google.com\/patents\/US20040215898 US Patent App. 10\/425 425.","key":"e_1_2_1_4_1"},{"unstructured":"ARM. 2014. ARM Compiler Reference Guide: TLBI. Retrieved from http:\/\/infocenter.arm.com\/help\/index.jsp?topic=\/com.arm.doc.dui0802b\/TLBI_SYS.html.  ARM. 2014. ARM Compiler Reference Guide: TLBI. Retrieved from http:\/\/infocenter.arm.com\/help\/index.jsp?topic=\/com.arm.doc.dui0802b\/TLBI_SYS.html.","key":"e_1_2_1_5_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.1145\/2254756.2254766"},{"volume-title":"Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201917)","author":"Awad Amro","unstructured":"Amro Awad , Arkaprava Basu , Sergey Blagodurov , Yan Solihin , and Gabriel H. Loh . 2017. Avoiding TLB shootdowns through self-invalidating TLB entries . In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201917) . 273--287. Amro Awad, Arkaprava Basu, Sergey Blagodurov, Yan Solihin, and Gabriel H. Loh. 2017. Avoiding TLB shootdowns through self-invalidating TLB entries. In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201917). 273--287.","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201992)","author":"Balan Ramesh","year":"1992","unstructured":"Ramesh Balan and Kurt Gollhard . 1992 . A scalable implementation of virtual memory HAT layer for shared memory multiprocessor machine . In Proceedings of the USENIX Annual Technical Conference (ATC\u201992) . 107--115. Ramesh Balan and Kurt Gollhard. 1992. A scalable implementation of virtual memory HAT layer for shared memory multiprocessor machine. In Proceedings of the USENIX Annual Technical Conference (ATC\u201992). 107--115."},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1145\/3015146"},{"volume-title":"Proceedings of the 26th IEEE Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Baruah T.","unstructured":"T. Baruah , Y. Sun , A. T. Din\u00e7er , S. A. Mojumder , J. L. Abell\u00e1n , Y. Ukidave , A. Joshi , N. Rubin , J. Kim , and D. Kaeli . 2020. Griffin: Hardware-software support for efficient page migration in multi-GPU systems . In Proceedings of the 26th IEEE Symposium on High Performance Computer Architecture (HPCA\u201920) . 596--609. T. Baruah, Y. Sun, A. T. Din\u00e7er, S. A. Mojumder, J. L. Abell\u00e1n, Y. Ukidave, A. Joshi, N. Rubin, J. Kim, and D. Kaeli. 2020. Griffin: Hardware-software support for efficient page migration in multi-GPU systems. In Proceedings of the 26th IEEE Symposium on High Performance Computer Architecture (HPCA\u201920). 596--609.","key":"e_1_2_1_10_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_11_1","DOI":"10.1145\/1629575.1629579"},{"volume-title":"Proceedings of the 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201918)","author":"Bharadwaj S.","unstructured":"S. Bharadwaj , G. Cox , T. Krishna , and A. Bhattacharjee . 2018. Scalable distributed last-level TLBs using low-latency interconnects . In Proceedings of the 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201918) . 271--284. S. Bharadwaj, G. Cox, T. Krishna, and A. Bhattacharjee. 2018. Scalable distributed last-level TLBs using low-latency interconnects. In Proceedings of the 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201918). 271--284.","key":"e_1_2_1_12_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_13_1","DOI":"10.1145\/3037697.3037705"},{"doi-asserted-by":"publisher","key":"e_1_2_1_14_1","DOI":"10.1109\/HPCA.2011.5749717"},{"volume-title":"Proceedings of the 3rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201989)","author":"Black David L.","unstructured":"David L. Black , Richard F. Rashid , David B. Golub , Charles R. Hill , and Robert V. Baron . 1989. Translation lookaside buffer consistency: A software approach . In Proceedings of the 3rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201989) . 113--122. David L. Black, Richard F. Rashid, David B. Golub, Charles R. Hill, and Robert V. Baron. 1989. Translation lookaside buffer consistency: A software approach. In Proceedings of the 3rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201989). 113--122.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201908)","author":"Boyd-Wickizer Silas","year":"2008","unstructured":"Silas Boyd-Wickizer , Haibo Chen , Rong Chen , Yandong Mao , Frans Kaashoek , Robert Morris , Aleksey Pesterev , Lex Stein , Ming Wu , Yuehua Dai , Yang Zhang , and Zheng Zhang . 2008 . Corey: An operating system for many cores . In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201908) . 43--57. Silas Boyd-Wickizer, Haibo Chen, Rong Chen, Yandong Mao, Frans Kaashoek, Robert Morris, Aleksey Pesterev, Lex Stein, Ming Wu, Yuehua Dai, Yang Zhang, and Zheng Zhang. 2008. Corey: An operating system for many cores. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201908). 43--57."},{"doi-asserted-by":"publisher","key":"e_1_2_1_17_1","DOI":"10.1145\/2465351.2465373"},{"unstructured":"Jonathan Corbet. 2017. The current state of kernel page-table isolation. Retrieved from https:\/\/lwn.net\/Articles\/741878\/.  Jonathan Corbet. 2017. The current state of kernel page-table isolation. Retrieved from https:\/\/lwn.net\/Articles\/741878\/.","key":"e_1_2_1_18_1"},{"unstructured":"Christopher Covington. 2016. arm64: Work around Falkor erratum 1003. Retrieved from https:\/\/lkml.org\/lkml\/2016\/12\/29\/267.  Christopher Covington. 2016. arm64: Work around Falkor erratum 1003. Retrieved from https:\/\/lkml.org\/lkml\/2016\/12\/29\/267.","key":"e_1_2_1_19_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_20_1","DOI":"10.1145\/3037697.3037704"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 26th USENIX Security Symposium (USENIX Security\u201917)","author":"Dang Thurston H. Y.","year":"2017","unstructured":"Thurston H. Y. Dang , Petros Maniatis , and David Wagner . 2017 . Oscar: A practical page-permissions-based scheme for thwarting dangling pointers . In Proceedings of the 26th USENIX Security Symposium (USENIX Security\u201917) . 815--832. Thurston H. Y. Dang, Petros Maniatis, and David Wagner. 2017. Oscar: A practical page-permissions-based scheme for thwarting dangling pointers. In Proceedings of the 26th USENIX Security Symposium (USENIX Security\u201917). 815--832."},{"unstructured":"Linux Kernel Driver Database. 2017. CONFIG_ARM_ERRATA_720789. Retrieved from http:\/\/cateee.net\/lkddb\/web-lkddb\/ARM_ERRATA_720789.html.  Linux Kernel Driver Database. 2017. CONFIG_ARM_ERRATA_720789. Retrieved from http:\/\/cateee.net\/lkddb\/web-lkddb\/ARM_ERRATA_720789.html.","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","author":"Gao Peter X.","year":"2016","unstructured":"Peter X. Gao , Akshay Narayan , Sagar Karandikar , Joao Carreira , Sangjin Han , Rachit Agarwal , Sylvia Ratnasamy , and Scott Shenker . 2016 . Network requirements for resource disaggregation . In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916) . 249--264. Peter X. Gao, Akshay Narayan, Sagar Karandikar, Joao Carreira, Sangjin Han, Rachit Agarwal, Sylvia Ratnasamy, and Scott Shenker. 2016. Network requirements for resource disaggregation. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916). 249--264."},{"unstructured":"Will Glozer. 2015. wrk - a HTTP benchmarking tool. Retrieved from https:\/\/github.com\/wg\/wrk.  Will Glozer. 2015. wrk - a HTTP benchmarking tool. Retrieved from https:\/\/github.com\/wg\/wrk.","key":"e_1_2_1_24_1"},{"unstructured":"Google. 2018. CPU Platforms. Retrieved from https:\/\/cloud.google.com\/compute\/docs\/cpu-platforms.  Google. 2018. CPU Platforms. Retrieved from https:\/\/cloud.google.com\/compute\/docs\/cpu-platforms.","key":"e_1_2_1_25_1"},{"unstructured":"Intel. 2010. Intel 64 Architecture x2APIC Specification. Retrieved from https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/download\/intel-64-architecture-x2apic-specification.html.  Intel. 2010. Intel 64 Architecture x2APIC Specification. Retrieved from https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/download\/intel-64-architecture-x2apic-specification.html.","key":"e_1_2_1_26_1"},{"unstructured":"Intel 2017. 5-Level Paging and 5-Level EPT. Retrieved from https:\/\/software.intel.com\/sites\/default\/files\/managed\/2b\/80\/5-level_paging_white_paper.pdf.  Intel 2017. 5-Level Paging and 5-Level EPT. Retrieved from https:\/\/software.intel.com\/sites\/default\/files\/managed\/2b\/80\/5-level_paging_white_paper.pdf.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201917)","author":"Juncheng Gu","year":"2017","unstructured":"Gu Juncheng , Lee Youngmoon , Zhang Yiwen , Chowdhury Mosharaf , and Shin Kang . 2017 . Efficient memory disaggregation with INFINISWAP . In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201917) . Gu Juncheng, Lee Youngmoon, Zhang Yiwen, Chowdhury Mosharaf, and Shin Kang. 2017. Efficient memory disaggregation with INFINISWAP. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201917)."},{"volume-title":"Proceedings of the 22nd IEEE Symposium on High Performance Computer Architecture (HPCA\u201916)","author":"Karakostas Vasileios","unstructured":"Vasileios Karakostas , Jayneel Gandhi , Adri\u00e1n Cristal , Mark D. Hill , Kathryn S. McKinley , Mario Nemirovsky , Michael M. Swift , and Osman S . \u00dcnsal. 2016. Energy-efficient address translation . In Proceedings of the 22nd IEEE Symposium on High Performance Computer Architecture (HPCA\u201916) . 631--643. Vasileios Karakostas, Jayneel Gandhi, Adri\u00e1n Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, and Osman S. \u00dcnsal. 2016. Energy-efficient address translation. In Proceedings of the 22nd IEEE Symposium on High Performance Computer Architecture (HPCA\u201916). 631--643.","key":"e_1_2_1_29_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.1145\/2901318.2901337"},{"doi-asserted-by":"publisher","key":"e_1_2_1_31_1","DOI":"10.1145\/3173162.3173198"},{"doi-asserted-by":"publisher","key":"e_1_2_1_32_1","DOI":"10.1145\/1772690.1772751"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","author":"Kwon Youngjin","year":"2016","unstructured":"Youngjin Kwon , Hangchen Yu , Simon Peter , Christopher J. Rossbach , and Emmett Witchel . 2016 . Coordinated and efficient huge page management with Ingens . In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916) . 705--721. Youngjin Kwon, Hangchen Yu, Simon Peter, Christopher J. Rossbach, and Emmett Witchel. 2016. Coordinated and efficient huge page management with Ingens. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916). 705--721."},{"key":"e_1_2_1_34_1","volume-title":"Mutilate: High-Performance Memcached Load Generator.","author":"Leverich Jacob","year":"2017","unstructured":"Jacob Leverich . 2017 . Mutilate: High-Performance Memcached Load Generator. Retrieved from https:\/\/github.com\/leverich\/mutilate. Jacob Leverich. 2017. Mutilate: High-Performance Memcached Load Generator. Retrieved from https:\/\/github.com\/leverich\/mutilate."},{"key":"e_1_2_1_35_1","volume-title":"ArXiv e-prints (Jan","author":"Lipp Moritz","year":"2018","unstructured":"Moritz Lipp , Michael Schwarz , Daniel Gruss , Thomas Prescher , Werner Haas , Stefan Mangard , Paul Kocher , Daniel Genkin , Yuval Yarom , and Mike Hamburg . 2018. Meltdown. ArXiv e-prints (Jan . 2018 ). arxiv:1801.01207. Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. 2018. Meltdown. ArXiv e-prints (Jan. 2018). arxiv:1801.01207."},{"doi-asserted-by":"publisher","key":"e_1_2_1_36_1","DOI":"10.1145\/2445572.2445574"},{"doi-asserted-by":"publisher","key":"e_1_2_1_37_1","DOI":"10.1145\/2872362.2872399"},{"doi-asserted-by":"publisher","key":"e_1_2_1_38_1","DOI":"10.1145\/3064176.3064191"},{"unstructured":"Mellanox 2017. ConnectX-3 Single\/Dual-Port Adapter with VPI. Retrieved from http:\/\/www.mellanox.com\/page\/products_dyn?product_family=1198mtag=connectx_3_vpi.  Mellanox 2017. ConnectX-3 Single\/Dual-Port Adapter with VPI. Retrieved from http:\/\/www.mellanox.com\/page\/products_dyn?product_family=1198mtag=connectx_3_vpi.","key":"e_1_2_1_39_1"},{"unstructured":"Memcached 2017. A high-performance distributed memory object caching system. Retrieved from http:\/\/memcached.org\/.  Memcached 2017. A high-performance distributed memory object caching system. Retrieved from http:\/\/memcached.org\/.","key":"e_1_2_1_40_1"},{"unstructured":"Timothy Prickett Morgan. 2017. AMD Disrupts the Two-Socket Server Status Quo. Retrieved from https:\/\/www.nextplatform.com\/2017\/05\/17\/amd-disrupts-two-socket-server-status-quo\/.  Timothy Prickett Morgan. 2017. AMD Disrupts the Two-Socket Server Status Quo. Retrieved from https:\/\/www.nextplatform.com\/2017\/05\/17\/amd-disrupts-two-socket-server-status-quo\/.","key":"e_1_2_1_41_1"},{"volume-title":"Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201915)","author":"Oskin Mark","unstructured":"Mark Oskin and Gabriel H. Loh . 2015. A software-managed approach to die-stacked DRAM . In Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201915) . 188--200. Mark Oskin and Gabriel H. Loh. 2015. A software-managed approach to die-stacked DRAM. In Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201915). 188--200.","key":"e_1_2_1_42_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_43_1","DOI":"10.1109\/ISCA45697.2020.00079"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS\u201992)","author":"Peacock J. Kent","year":"1992","unstructured":"J. Kent Peacock , Sunil Saxena , Dean Thomas , Fred Yang , and Wilfred Yu . 1992 . Experiences from multithreading system V release 4 . In Proceedings of the Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS\u201992) . 77--91. J. Kent Peacock, Sunil Saxena, Dean Thomas, Fred Yang, and Wilfred Yu. 1992. Experiences from multithreading system V release 4. In Proceedings of the Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS\u201992). 77--91."},{"doi-asserted-by":"publisher","key":"e_1_2_1_45_1","DOI":"10.1109\/MICRO.2012.32"},{"volume-title":"Proceedings of the 20th IEEE Symposium on High Performance Computer Architecture (HPCA). 558--567","author":"Pham Binh","unstructured":"Binh Pham , Abhishek Bhattacharjee , Yasuko Eckert , and Gabriel H. Loh . 2014. Increasing TLB reach by exploiting clustering in page translations . In Proceedings of the 20th IEEE Symposium on High Performance Computer Architecture (HPCA). 558--567 . Binh Pham, Abhishek Bhattacharjee, Yasuko Eckert, and Gabriel H. Loh. 2014. Increasing TLB reach by exploiting clustering in page translations. In Proceedings of the 20th IEEE Symposium on High Performance Computer Architecture (HPCA). 558--567.","key":"e_1_2_1_46_1"},{"key":"e_1_2_1_47_1","first-page":"1","article-title":"TLB shootdown mitigation for low-power, many-core servers with L1 virtual caches","volume":"17","author":"Pham Binh","year":"2017","unstructured":"Binh Pham , Derek Hower , Abhishek Bhattacharjee , and Trey Cain . 2017 . TLB shootdown mitigation for low-power, many-core servers with L1 virtual caches . IEEE Comput. Archit. Lett. 17 , 1 (June 2017). Binh Pham, Derek Hower, Abhishek Bhattacharjee, and Trey Cain. 2017. TLB shootdown mitigation for low-power, many-core servers with L1 virtual caches. IEEE Comput. Archit. Lett. 17, 1 (June 2017).","journal-title":"IEEE Comput. Archit. Lett."},{"doi-asserted-by":"publisher","key":"e_1_2_1_48_1","DOI":"10.1145\/2541940.2541942"},{"volume-title":"Proceedings of the 20th IEEE Symposium on High Performance Computer Architecture (HPCA\u201914)","author":"Power Jason","unstructured":"Jason Power , Mark D. Hill , and David A. Wood . 2014. Supporting x86-64 address translation for 100s of GPU lanes . In Proceedings of the 20th IEEE Symposium on High Performance Computer Architecture (HPCA\u201914) . 568--578. Jason Power, Mark D. Hill, and David A. Wood. 2014. Supporting x86-64 address translation for 100s of GPU lanes. In Proceedings of the 20th IEEE Symposium on High Performance Computer Architecture (HPCA\u201914). 568--578.","key":"e_1_2_1_49_1"},{"volume-title":"Proceedings of the 15th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201910)","author":"Romanescu Bogdan F.","unstructured":"Bogdan F. Romanescu , Alvin R. Lebeck , and Daniel J. Sorin . 2010. Specifying and dynamically verifying address translation-aware memory consistency . In Proceedings of the 15th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201910) . 323--334. Bogdan F. Romanescu, Alvin R. Lebeck, and Daniel J. Sorin. 2010. Specifying and dynamically verifying address translation-aware memory consistency. In Proceedings of the 15th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201910). 323--334.","key":"e_1_2_1_50_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_51_1","DOI":"10.1109\/HPCA.2010.5416643"},{"unstructured":"ScyllaDB. 2015. Memcached Benchmark. Retrieved from https:\/\/github.com\/scylladb\/seastar\/wiki\/Memcached-Benchmark.  ScyllaDB. 2015. Memcached Benchmark. Retrieved from https:\/\/github.com\/scylladb\/seastar\/wiki\/Memcached-Benchmark.","key":"e_1_2_1_52_1"},{"unstructured":"Anand Lal Shimpi. 2008. AMD\u2019s B3 stepping Phenom previewed TLB hardware fix tested. Retrieved from http:\/\/www.anandtech.com\/show\/2477\/2.  Anand Lal Shimpi. 2008. AMD\u2019s B3 stepping Phenom previewed TLB hardware fix tested. Retrieved from http:\/\/www.anandtech.com\/show\/2477\/2.","key":"e_1_2_1_53_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_54_1","DOI":"10.1109\/2.55498"},{"doi-asserted-by":"publisher","key":"e_1_2_1_55_1","DOI":"10.1109\/HICSS.1988.11765"},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of the 38th ACM\/IEEE International Symposium on Computer Architecture (ISCA\u201911)","author":"Rixner Scott","year":"2011","unstructured":"Scott Rixner , Thomas Barr , and Alan Cox . 2011 . SpecTLB: A mechanism for speculative address translation . In Proceedings of the 38th ACM\/IEEE International Symposium on Computer Architecture (ISCA\u201911) . 307--318. Scott Rixner, Thomas Barr, and Alan Cox. 2011. SpecTLB: A mechanism for speculative address translation. In Proceedings of the 38th ACM\/IEEE International Symposium on Computer Architecture (ISCA\u201911). 307--318."},{"volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201988)","author":"Thompson Michael Y.","unstructured":"Michael Y. Thompson , J. M. Barton , T. A. Jermoluk , and J. C. Wagner . 1988. Translation lookaside buffer synchronization in a multiprocessor system . In Proceedings of the USENIX Annual Technical Conference (ATC\u201988) . Michael Y. Thompson, J. M. Barton, T. A. Jermoluk, and J. C. Wagner. 1988. Translation lookaside buffer synchronization in a multiprocessor system. In Proceedings of the USENIX Annual Technical Conference (ATC\u201988).","key":"e_1_2_1_57_1"},{"unstructured":"Linus Torvalds. 2017. Linux Kernel. Retrieved from https:\/\/github.com\/torvalds\/linux.  Linus Torvalds. 2017. Linux Kernel. Retrieved from https:\/\/github.com\/torvalds\/linux.","key":"e_1_2_1_58_1"},{"unstructured":"Theo Valich. 2007. Intel explains the Core 2 CPU errata. Retrieved from http:\/\/www.theinquirer.net\/inquirer\/news\/1031406\/intel-explains-core-cpu-errata.  Theo Valich. 2007. Intel explains the Core 2 CPU errata. Retrieved from http:\/\/www.theinquirer.net\/inquirer\/news\/1031406\/intel-explains-core-cpu-errata.","key":"e_1_2_1_59_1"},{"volume-title":"Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201911)","author":"Villavieja Carlos","unstructured":"Carlos Villavieja , Vasileios Karakostas , Lluis Vilanova , Yoav Etsion , Alex Ramirez , Avi Mendelson , Nacho Navarro , Adri\u00e1n Cristal , and Osman S . \u00dcnsal. 2011. DiDi: Mitigating the performance impact of TLB shootdowns using a shared TLB directory . In Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201911) . 340--349. Carlos Villavieja, Vasileios Karakostas, Lluis Vilanova, Yoav Etsion, Alex Ramirez, Avi Mendelson, Nacho Navarro, Adri\u00e1n Cristal, and Osman S. \u00dcnsal. 2011. DiDi: Mitigating the performance impact of TLB shootdowns using a shared TLB directory. In Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201911). 340--349.","key":"e_1_2_1_60_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_61_1","DOI":"10.1145\/3310133"},{"doi-asserted-by":"publisher","key":"e_1_2_1_62_1","DOI":"10.1145\/3079856.3080211"},{"doi-asserted-by":"publisher","key":"e_1_2_1_63_1","DOI":"10.1145\/3297858.3304024"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3409454","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3409454","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3409454","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:38:40Z","timestamp":1750199920000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3409454"}},"subtitle":["Eventually Consistent TLBs"],"short-title":[],"issued":{"date-parts":[[2020,9,30]]},"references-count":63,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,12,31]]}},"alternative-id":["10.1145\/3409454"],"URL":"https:\/\/doi.org\/10.1145\/3409454","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2020,9,30]]},"assertion":[{"value":"2019-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}