{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:35:33Z","timestamp":1772724933815,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":69,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,17]],"date-time":"2023-06-17T00:00:00Z","timestamp":1686960000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100011033","name":"Agencia Estatal de Investigaci\u00f3n","doi-asserted-by":"publisher","award":["IJCI-2017-33945"],"award-info":[{"award-number":["IJCI-2017-33945"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011033","name":"Agencia Estatal de Investigaci\u00f3n","doi-asserted-by":"publisher","award":["PID2019-105660RB-C21"],"award-info":[{"award-number":["PID2019-105660RB-C21"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010067","name":"Gobierno de Arag\u00f3n","doi-asserted-by":"publisher","award":["T58_20R"],"award-info":[{"award-number":["T58_20R"]}],"id":[{"id":"10.13039\/501100010067","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,17]]},"DOI":"10.1145\/3579371.3589065","type":"proceedings-article","created":{"date-parts":[[2023,6,16]],"date-time":"2023-06-16T20:25:28Z","timestamp":1686947128000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory 
Operations"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8337-6326","authenticated-orcid":false,"given":"V\u00edctor","family":"Soria-Pardos","sequence":"first","affiliation":[{"name":"Barcelona Supercomputing Center, Barcelona, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2869-668X","authenticated-orcid":false,"given":"Adri\u00e0","family":"Armejach","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center, Barcelona, Spain"},{"name":"Universitat Polit\u00e8cnica de Catalunya, Barcelona, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6515-0312","authenticated-orcid":false,"given":"Tiago","family":"M\u00fcck","sequence":"additional","affiliation":[{"name":"Arm, Austin, Texas, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7490-4067","authenticated-orcid":false,"given":"Dario","family":"Su\u00e1rez-Gracia","sequence":"additional","affiliation":[{"name":"Universidad de Zaragoza, Zaragoza, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3571-5562","authenticated-orcid":false,"given":"Jos\u00e9","family":"Joao","sequence":"additional","affiliation":[{"name":"Arm, Austin, Texas, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1282-8887","authenticated-orcid":false,"given":"Alejandro","family":"Rico","sequence":"additional","affiliation":[{"name":"AMD, Austin, Texas, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9848-8758","authenticated-orcid":false,"given":"Miquel","family":"Moret\u00f3","sequence":"additional","affiliation":[{"name":"Universitat Polit\u00e8cnica de Catalunya, Barcelona, Spain"},{"name":"Barcelona Supercomputing Center, Barcelona, 
Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,6,17]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"2","article-title":"WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication","volume":"44","author":"Abadal Sergi","year":"2016","unstructured":"Sergi Abadal , Albert Cabellos-Aparicio , Eduard Alarcon , and Josep Torrellas . 2016 . WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication . SIGARCH Comput. Archit. News 44 , 2 (mar 2016), 3--17. Sergi Abadal, Albert Cabellos-Aparicio, Eduard Alarcon, and Josep Torrellas. 2016. WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication. SIGARCH Comput. Archit. News 44, 2 (mar 2016), 3--17.","journal-title":"SIGARCH Comput. Archit. News"},{"key":"e_1_3_2_1_2_1","volume-title":"2010 39th International Conference on Parallel Processing. 267--276","author":"Abell\u00e1n Jose L.","unstructured":"Jose L. Abell\u00e1n , Juan Fern\u00e1ndez , and Manuel E. Acacio . 2010. A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs . In 2010 39th International Conference on Parallel Processing. 267--276 . Jose L. Abell\u00e1n, Juan Fern\u00e1ndez, and Manuel E. Acacio. 2010. A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs. In 2010 39th International Conference on Parallel Processing. 267--276."},{"key":"e_1_3_2_1_3_1","volume-title":"GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs. In 2011 IEEE International Parallel Distributed Processing Symposium. 893--905","author":"Abell\u00e1n Jose L.","unstructured":"Jose L. Abell\u00e1n , Juan Fern\u00e1ndez , and Manuel E. Acacio . 2011 . GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs. In 2011 IEEE International Parallel Distributed Processing Symposium. 893--905 . Jose L. 
Abell\u00e1n, Juan Fern\u00e1ndez, and Manuel E. Acacio. 2011. GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs. In 2011 IEEE International Parallel Distributed Processing Symposium. 893--905."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2011.304"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/225830.223985"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1019751632622"},{"key":"e_1_3_2_1_7_1","volume-title":"Powered by AWS Graviton3 Processors. https:\/\/aws.amazon.com\/blogs\/aws\/new-amazon-ec2-c7g-instances-powered-by-aws-graviton3-processors\/. [Online","author":"Services Amazon Web","year":"2022","unstructured":"Amazon Web Services . 2021. New - Amazon EC2 C7g Instances , Powered by AWS Graviton3 Processors. https:\/\/aws.amazon.com\/blogs\/aws\/new-amazon-ec2-c7g-instances-powered-by-aws-graviton3-processors\/. [Online ; accessed 30- July - 2022 ]. Amazon Web Services. 2021. New - Amazon EC2 C7g Instances, Powered by AWS Graviton3 Processors. https:\/\/aws.amazon.com\/blogs\/aws\/new-amazon-ec2-c7g-instances-powered-by-aws-graviton3-processors\/. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_8_1","volume-title":"Arm Neoverse N2 Core Technical Reference Manual. https:\/\/developer.arm.com\/documentation\/102099\/0001\/The-Neoverse-N2--core. [Online","author":"Arm","year":"2021","unstructured":"Arm holdings. 2020. Arm Neoverse N2 Core Technical Reference Manual. https:\/\/developer.arm.com\/documentation\/102099\/0001\/The-Neoverse-N2--core. [Online ; accessed 2- December - 2021 ]. Arm holdings. 2020. Arm Neoverse N2 Core Technical Reference Manual. https:\/\/developer.arm.com\/documentation\/102099\/0001\/The-Neoverse-N2--core. [Online; accessed 2-December-2021]."},{"key":"e_1_3_2_1_9_1","volume-title":"AMBA 5 CHI Architecture Specification. https:\/\/developer.arm.com\/architectures\/system-architectures\/amba\/amba-5. 
[Online","author":"Holdings Arm","year":"2022","unstructured":"Arm Holdings . 2021. AMBA 5 CHI Architecture Specification. https:\/\/developer.arm.com\/architectures\/system-architectures\/amba\/amba-5. [Online ; accessed 30- July - 2022 ]. Arm Holdings. 2021. AMBA 5 CHI Architecture Specification. https:\/\/developer.arm.com\/architectures\/system-architectures\/amba\/amba-5. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the 49th Annual International Symposium on Computer Architecture","author":"Asgharzadeh Ashkan","year":"2022","unstructured":"Ashkan Asgharzadeh , Juan M. Cebrian , Arthur Perais , Stefanos Kaxiras , and Alberto Ros . 2022 . Free Atomics: Hardware Atomic Operations without Fences . In Proceedings of the 49th Annual International Symposium on Computer Architecture ( New York, New York) (ISCA '22). Association for Computing Machinery, New York, NY, USA, 14--26. Ashkan Asgharzadeh, Juan M. Cebrian, Arthur Perais, Stefanos Kaxiras, and Alberto Ros. 2022. Free Atomics: Hardware Atomic Operations without Fences. In Proceedings of the 49th Annual International Symposium on Computer Architecture (New York, New York) (ISCA '22). Association for Computing Machinery, New York, NY, USA, 14--26."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306792"},{"key":"e_1_3_2_1_12_1","unstructured":"Scott Beamer Krste Asanovi\u0107 and David Patterson. 2015. The GAP Benchmark Suite. Scott Beamer Krste Asanovi\u0107 and David Patterson. 2015. The GAP Benchmark Suite."},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the 1990 ACM\/IEEE Conference on Supercomputing","author":"Carl","unstructured":"Carl J. Beckmann and Constantine D. Polychronopoulos. 1990. Fast Barrier Synchronization Hardware . In Proceedings of the 1990 ACM\/IEEE Conference on Supercomputing ( New York, New York, USA) (Supercomputing '90). IEEE Computer Society Press, Washington, DC, USA, 180--189. Carl J. 
Beckmann and Constantine D. Polychronopoulos. 1990. Fast Barrier Synchronization Hardware. In Proceedings of the 1990 ACM\/IEEE Conference on Supercomputing (New York, New York, USA) (Supercomputing '90). IEEE Computer Society Press, Washington, DC, USA, 180--189."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454128"},{"key":"e_1_3_2_1_15_1","volume-title":"AMD Releases Milan-X CPUs With 3D V-Cache: EPYC 7003 Up to 64 Cores and 768 MB L3 Cache. https:\/\/www.anandtech.com\/show\/17323\/amd-releases-milan-x-cpus-with-3d-vcache-epyc-7003. [Online","author":"Bonshor Gavin","year":"2022","unstructured":"Gavin Bonshor . 2021. AMD Releases Milan-X CPUs With 3D V-Cache: EPYC 7003 Up to 64 Cores and 768 MB L3 Cache. https:\/\/www.anandtech.com\/show\/17323\/amd-releases-milan-x-cpus-with-3d-vcache-epyc-7003. [Online ; accessed 30- July - 2022 ]. Gavin Bonshor. 2021. AMD Releases Milan-X CPUs With 3D V-Cache: EPYC 7003 Up to 64 Cores and 768 MB L3 Cache. https:\/\/www.anandtech.com\/show\/17323\/amd-releases-milan-x-cpus-with-3d-vcache-epyc-7003. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_16_1","unstructured":"Gary Bradski and Adrian Kaehler. 2008. Learning OpenCV: Computervision with the OpenCV library. O'Reilly. Gary Bradski and Adrian Kaehler. 2008. Learning OpenCV: Computervision with the OpenCV library. O'Reilly."},{"key":"e_1_3_2_1_17_1","volume-title":"PNG Files Florida State University. https:\/\/people.sc.fsu.edu\/~jburkardt\/data\/png\/bmp_24.png. [Online","author":"Burkardt John","year":"2022","unstructured":"John Burkardt . 2021. PNG Files Florida State University. https:\/\/people.sc.fsu.edu\/~jburkardt\/data\/png\/bmp_24.png. [Online ; accessed 30- July - 2022 ]. John Burkardt. 2021. PNG Files Florida State University. https:\/\/people.sc.fsu.edu\/~jburkardt\/data\/png\/bmp_24.png. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_18_1","volume-title":"std::Atomic Library. 
https:\/\/en.cppreference.com\/w\/cpp\/header\/atomic. [Online","author":"The C++ Standards Committee","year":"2023","unstructured":"The C++ Standards Committee . 2023. std::Atomic Library. https:\/\/en.cppreference.com\/w\/cpp\/header\/atomic. [Online ; accessed 12- April - 2023 ]. The C++ Standards Committee. 2023. std::Atomic Library. https:\/\/en.cppreference.com\/w\/cpp\/header\/atomic. [Online; accessed 12-April-2023]."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"A. de Dios B. Sahelices P. Ib\u00e1\u00f1ez V. Vi\u00f1als and J. M. Llaber\u00eda. 2006. Speeding-Up Synchronizations in DSM Multiprocessors. In Euro-Par 2006 Parallel Processing Wolfgang E. Nagel Wolfgang V. Walter and Wolfgang Lehner (Eds.). Springer Berlin Heidelberg Berlin Heidelberg 473--484. A. de Dios B. Sahelices P. Ib\u00e1\u00f1ez V. Vi\u00f1als and J. M. Llaber\u00eda. 2006. Speeding-Up Synchronizations in DSM Multiprocessors. In Euro-Par 2006 Parallel Processing Wolfgang E. Nagel Wolfgang V. Walter and Wolfgang Lehner (Eds.). Springer Berlin Heidelberg Berlin Heidelberg 473--484.","DOI":"10.1007\/11823285_49"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.1974.1050511"},{"key":"e_1_3_2_1_22_1","unstructured":"DIMACS. 2006. The Ninth DIMACS challenge on shortest paths. http:\/\/www.dis.uniroma1.it\/challenge9\/.. [Online; accessed 30-July-2022]. DIMACS. 2006. The Ninth DIMACS challenge on shortest paths. http:\/\/www.dis.uniroma1.it\/challenge9\/.. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_23_1","volume-title":"Proceedings of the 34th ACM International Conference on Supercomputing","author":"Dimi\u0107 V.","unstructured":"V. Dimi\u0107 , M. Moret\u00f3 , M. Casas , J. Ciesko , and M. Valero . 2020. RICH: Implementing Reductions in the Cache Hierarchy . 
In Proceedings of the 34th ACM International Conference on Supercomputing ( Barcelona, Spain) (ICS '20). Association for Computing Machinery, New York, NY, USA, Article 16, 13 pages. V. Dimi\u0107, M. Moret\u00f3, M. Casas, J. Ciesko, and M. Valero. 2020. RICH: Implementing Reductions in the Cache Hierarchy. In Proceedings of the 34th ACM International Conference on Supercomputing (Barcelona, Spain) (ICS '20). Association for Computing Machinery, New York, NY, USA, Article 16, 13 pages."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1274971.1275004"},{"key":"e_1_3_2_1_25_1","volume-title":"WiDir: A Wireless-Enabled Directory Cache Coherence Protocol. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 304--317","author":"Franques Antonio","year":"2021","unstructured":"Antonio Franques , Apostolos Kokolis , Sergi Abadal , Vimuth Fernando , Sasa Misailovic , and Josep Torrellas . 2021 . WiDir: A Wireless-Enabled Directory Cache Coherence Protocol. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 304--317 . Antonio Franques, Apostolos Kokolis, Sergi Abadal, Vimuth Fernando, Sasa Misailovic, and Josep Torrellas. 2021. WiDir: A Wireless-Enabled Directory Cache Coherence Protocol. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 304--317."},{"key":"e_1_3_2_1_26_1","volume-title":"The Ampere Altra Max Review: Pushing it to 128 Cores per Socket. https:\/\/www.anandtech.com\/show\/16979\/the-ampere-altra-max-review-pushing-it-to-128-cores-per-socket. [Online","author":"Frumusanu Andrei","year":"2022","unstructured":"Andrei Frumusanu . 2021. The Ampere Altra Max Review: Pushing it to 128 Cores per Socket. https:\/\/www.anandtech.com\/show\/16979\/the-ampere-altra-max-review-pushing-it-to-128-cores-per-socket. [Online ; accessed 30- July - 2022 ]. Andrei Frumusanu. 2021. The Ampere Altra Max Review: Pushing it to 128 Cores per Socket. 
https:\/\/www.anandtech.com\/show\/16979\/the-ampere-altra-max-review-pushing-it-to-128-cores-per-socket. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_27_1","volume-title":"Fujitsu A64FX Datasheet. https:\/\/www.fujitsu.com\/downloads\/SUPER\/a64fx\/a64fx_datasheet_en.pdf. [Online","year":"2022","unstructured":"Fujitsu. 2021. Fujitsu A64FX Datasheet. https:\/\/www.fujitsu.com\/downloads\/SUPER\/a64fx\/a64fx_datasheet_en.pdf. [Online ; accessed 30- July - 2022 ]. Fujitsu. 2021. Fujitsu A64FX Datasheet. https:\/\/www.fujitsu.com\/downloads\/SUPER\/a64fx\/a64fx_datasheet_en.pdf. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_28_1","volume-title":"gem5 CHI Protocol. https:\/\/www.gem5.org\/documentation\/general_docs\/ruby\/CHI\/. [Online","author":"Holdings Arm","year":"2022","unstructured":"gem5 and Arm Holdings . 2021. gem5 CHI Protocol. https:\/\/www.gem5.org\/documentation\/general_docs\/ruby\/CHI\/. [Online ; accessed 30- July - 2022 ]. gem5 and Arm Holdings. 2021. gem5 CHI Protocol. https:\/\/www.gem5.org\/documentation\/general_docs\/ruby\/CHI\/. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_29_1","volume-title":"MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture","author":"G\u00f3mez-Hern\u00e1ndez E. J.","unstructured":"E. J. G\u00f3mez-Hern\u00e1ndez , J. M. Cebrian , R. Titos-Gil , S. Kaxiras , and A. Ros . 2021. Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations . In MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture ( Virtual Event, Greece) (MICRO '21). Association for Computing Machinery, New York, NY, USA, 337--349. E. J. G\u00f3mez-Hern\u00e1ndez, J. M. Cebrian, R. Titos-Gil, S. Kaxiras, and A. Ros. 2021. Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations. In MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture (Virtual Event, Greece) (MICRO '21). 
Association for Computing Machinery, New York, NY, USA, 337--349."},{"key":"e_1_3_2_1_30_1","volume-title":"Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems","author":"Goodman James R.","unstructured":"James R. Goodman , Mary K. Vernon , and Philip J. Woest . 1989. Efficient Synchronization Primitives for Large-Scale Cache-Coherent Multiprocessors . In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems ( Boston, Massachusetts, USA) (ASPLOS III). Association for Computing Machinery, New York, NY, USA, 64--75. James R. Goodman, Mary K. Vernon, and Philip J. Woest. 1989. Efficient Synchronization Primitives for Large-Scale Cache-Coherent Multiprocessors. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (Boston, Massachusetts, USA) (ASPLOS III). Association for Computing Machinery, New York, NY, USA, 64--75."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/285930.285983"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/165123.165164"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"H. Hoffmann D. Wentzlaff and A. Agarwal. 2010. Remote Store Programming. In High Performance Embedded Architectures and Compilers. Springer Berlin Heidelberg Berlin Heidelberg 3--17. H. Hoffmann D. Wentzlaff and A. Agarwal. 2010. Remote Store Programming. In High Performance Embedded Architectures and Compilers. Springer Berlin Heidelberg Berlin Heidelberg 3--17.","DOI":"10.1007\/978-3-642-11515-8_3"},{"key":"e_1_3_2_1_34_1","volume-title":"Armv8 Architecture Reference Manual for A-profile architecture. https:\/\/developer.arm.com\/documentation\/ddi0487\/ha\/?lang=en. [Online","author":"Holdings Arm","year":"2022","unstructured":"Arm Holdings . 2022. 
Armv8 Architecture Reference Manual for A-profile architecture. https:\/\/developer.arm.com\/documentation\/ddi0487\/ha\/?lang=en. [Online ; accessed 30- July - 2022 ]. Arm Holdings. 2022. Armv8 Architecture Reference Manual for A-profile architecture. https:\/\/developer.arm.com\/documentation\/ddi0487\/ha\/?lang=en. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_35_1","volume-title":"Do near or far atomics give the best performance on Neoverse systems? https:\/\/developer.arm.com\/documentation\/ka004706\/latest\/. [Online","author":"Holdings Arm","year":"2022","unstructured":"Arm Holdings . 2022. Do near or far atomics give the best performance on Neoverse systems? https:\/\/developer.arm.com\/documentation\/ka004706\/latest\/. [Online ; accessed 30- July - 2022 ]. Arm Holdings. 2022. Do near or far atomics give the best performance on Neoverse systems? https:\/\/developer.arm.com\/documentation\/ka004706\/latest\/. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_36_1","volume-title":"AWS Graviton2-Powered EC2 Instances Now Available. https:\/\/www.hpcwire.com\/2020\/06\/12\/aws-graviton2-powered-ec2-instances\/. [Online","author":"Wire HPC","year":"2021","unstructured":"HPC Wire . 2020. AWS Graviton2-Powered EC2 Instances Now Available. https:\/\/www.hpcwire.com\/2020\/06\/12\/aws-graviton2-powered-ec2-instances\/. [Online ; accessed 2- December - 2021 ]. HPC Wire. 2020. AWS Graviton2-Powered EC2 Instances Now Available. https:\/\/www.hpcwire.com\/2020\/06\/12\/aws-graviton2-powered-ec2-instances\/. [Online; accessed 2-December-2021]."},{"key":"e_1_3_2_1_37_1","volume-title":"Proceedings of the 25th Annual International Symposium on Computer Architecture","author":"Keckler Stephen W.","unstructured":"Stephen W. Keckler , William J. Dally , Daniel Maskit , Nicholas P. Carter , Andrew Chang , and Whay S. Lee . 1998. Exploiting Fine-Grain Thread Level Parallelism on the MIT Multi-ALU Processor . 
In Proceedings of the 25th Annual International Symposium on Computer Architecture ( Barcelona, Spain) (ISCA '98). IEEE Computer Society, USA, 306--317. Stephen W. Keckler, William J. Dally, Daniel Maskit, Nicholas P. Carter, Andrew Chang, and Whay S. Lee. 1998. Exploiting Fine-Grain Thread Level Parallelism on the MIT Multi-ALU Processor. In Proceedings of the 25th Annual International Symposium on Computer Architecture (Barcelona, Spain) (ISCA '98). IEEE Computer Society, USA, 306--317."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"R.E. Kessler and J.L. Schwarzmeier. 1993. Cray T3D: a new dimension for Cray Research. In Digest of Papers. Compcon Spring. 176--182. R.E. Kessler and J.L. Schwarzmeier. 1993. Cray T3D: a new dimension for Cray Research. In Digest of Papers. Compcon Spring. 176--182.","DOI":"10.1109\/CMPCON.1993.289660"},{"key":"e_1_3_2_1_39_1","volume-title":"2009 IEEE International Symposium on Performance Analysis of Systems and Software. 65--76","author":"Kulkarni Milind","year":"2009","unstructured":"Milind Kulkarni , Martin Burtscher , Calin Cascaval , and Keshav Pingali . 2009 . Lonestar: A suite of parallel irregular programs . In 2009 IEEE International Symposium on Performance Analysis of Systems and Software. 65--76 . Milind Kulkarni, Martin Burtscher, Calin Cascaval, and Keshav Pingali. 2009. Lonestar: A suite of parallel irregular programs. In 2009 IEEE International Symposium on Performance Analysis of Systems and Software. 65--76."},{"key":"e_1_3_2_1_40_1","volume-title":"Proceedings of the 34th Annual International Symposium on Computer Architecture","author":"Kumar Sanjeev","year":"2007","unstructured":"Sanjeev Kumar , Christopher J. Hughes , and Anthony Nguyen . 2007 . Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors . In Proceedings of the 34th Annual International Symposium on Computer Architecture ( San Diego, California, USA) (ISCA '07). 
Association for Computing Machinery, New York, NY, USA, 162--173. Sanjeev Kumar, Christopher J. Hughes, and Anthony Nguyen. 2007. Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors. In Proceedings of the 34th Annual International Symposium on Computer Architecture (San Diego, California, USA) (ISCA '07). Association for Computing Machinery, New York, NY, USA, 162--173."},{"key":"e_1_3_2_1_41_1","volume-title":"2020 57th ACM\/IEEE Design Automation Conference (DAC). 1--6.","author":"Kurth Andreas","year":"2020","unstructured":"Andreas Kurth , Samuel Riedel , Florian Zaruba , Torsten Hoefler , and Luca Benini . 2020 . ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor . In 2020 57th ACM\/IEEE Design Automation Conference (DAC). 1--6. Andreas Kurth, Samuel Riedel, Florian Zaruba, Torsten Hoefler, and Luca Benini. 2020. ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor. In 2020 57th ACM\/IEEE Design Automation Conference (DAC). 1--6."},{"key":"e_1_3_2_1_42_1","volume-title":"Proceedings of the 24th Annual International Symposium on Computer Architecture","author":"Laudon J.","unstructured":"J. Laudon and D. Lenoski . 1997. The SGI Origin: A CcNUMA Highly Scalable Server . In Proceedings of the 24th Annual International Symposium on Computer Architecture ( Denver, Colorado, USA) (ISCA '97). Association for Computing Machinery, New York, NY, USA, 241--251. J. Laudon and D. Lenoski. 1997. The SGI Origin: A CcNUMA Highly Scalable Server. In Proceedings of the 24th Annual International Symposium on Computer Architecture (Denver, Colorado, USA) (ISCA '97). Association for Computing Machinery, New York, NY, USA, 241--251."},{"key":"e_1_3_2_1_43_1","article-title":"IBM POWER9 processor core","volume":"62","author":"Le H. Q.","year":"2018","unstructured":"H. Q. Le , J. A. Van Norstrand , B. W. Thompto , J. E. Moreira , D. Q. Nguyen , D. Hrusecky , M. J. 
Genden , and M. Kroener . 2018 . IBM POWER9 processor core . IBM Journal of Research and Development 62 , 4\/5 (2018), 2:1--2:12. H. Q. Le, J. A. Van Norstrand, B. W. Thompto, J. E. Moreira, D. Q. Nguyen, D. Hrusecky, M. J. Genden, and M. Kroener. 2018. IBM POWER9 processor core. IBM Journal of Research and Development 62, 4\/5 (2018), 2:1--2:12.","journal-title":"IBM Journal of Research and Development"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2001.970573"},{"key":"e_1_3_2_1_45_1","volume-title":"Using Kronecker Multiplication. In Knowledge Discovery in Databases: PKDD","author":"Leskovec Jurij","year":"2005","unstructured":"Jurij Leskovec , Deepayan Chakrabarti , Jon Kleinberg , and Christos Faloutsos . 2005. Realistic , Mathematically Tractable Graph Generation and Evolution , Using Kronecker Multiplication. In Knowledge Discovery in Databases: PKDD 2005 , Al\u00edpio M\u00e1rio Jorge, Lu\u00eds Torgo , Pavel Brazdil, Rui Camacho, and Jo\u00e3o Gama (Eds.). Springer Berlin Heidelberg , Berlin, Heidelberg, 133--145. Jurij Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and Christos Faloutsos. 2005. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. In Knowledge Discovery in Databases: PKDD 2005, Al\u00edpio M\u00e1rio Jorge, Lu\u00eds Torgo, Pavel Brazdil, Rui Camacho, and Jo\u00e3o Gama (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 133--145."},{"key":"e_1_3_2_1_46_1","volume-title":"2009 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 469--480","author":"Li Sheng","unstructured":"Sheng Li , Jung Ho Ahn , Richard D. Strong , Jay B. Brockman , Dean M. Tullsen , and Norman P. Jouppi . 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures . In 2009 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 469--480 . Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. 
Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In 2009 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 469--480."},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of the 42nd Annual International Symposium on Computer Architecture","author":"Liang C.","unstructured":"C. Liang and M. Prvulovic . 2015. MiSAR: Minimalistic Synchronization Accelerator with Resource Overflow Management . In Proceedings of the 42nd Annual International Symposium on Computer Architecture ( Portland, Oregon) (ISCA '15). Association for Computing Machinery, New York, NY, USA, 414--426. C. Liang and M. Prvulovic. 2015. MiSAR: Minimalistic Synchronization Accelerator with Resource Overflow Management. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (Portland, Oregon) (ISCA '15). Association for Computing Machinery, New York, NY, USA, 414--426."},{"key":"e_1_3_2_1_48_1","unstructured":"J. Lowe-Power A. Mutaal Ahmad A. Akram M. Alian R. Amslinger M. Andreozzi A. Armejach N. Asmussen B. Beckmann S. Bharadwaj G. Black G. Bloom B. R. Bruce D. Rodrigues Carvalho J. Castrillon L. Chen N. Derumigny S. Diestelhorst W. Elsasser C. Escuin M. Fariborz A. Farmahini-Farahani P. Fotouhi R. Gambord J. Gandhi D. Gope T. Grass A. Gutierrez B. Hanindhito A. Hansson S. Haria A. Harris T. Hayes A. Herrera M. Horsnell S. A. R. Jafri R. Jagtap H. Jang R. Jeyapaul T. M. Jones M. Jung S. Kannoth H. Khaleghzadeh Y. Kodama T. Krishna T. Marinelli C. Menard A. Mondelli M. Moreto T. M\u00fcck O. Naji K. Nathella H. Nguyen N. Nikoleris L. E. Olson M. Orr B. Pham P. Prieto T. Reddy A. Roelke M. Samani A. Sandberg J. Setoain B. Shingarov M. D. Sinclair T. Ta R. Thakur G. Travaglini M. Upton N. Vaish I. Vougioukas W. Wang Z. Wang N. Wehn C. Weis D. A. Wood H. Yoon and \u00c9. F. Zulian. 2020. The gem5 Simulator: Version 20.0+. 
arXiv:2007.03152 [cs.AR] J. Lowe-Power A. Mutaal Ahmad A. Akram M. Alian R. Amslinger M. Andreozzi A. Armejach N. Asmussen B. Beckmann S. Bharadwaj G. Black G. Bloom B. R. Bruce D. Rodrigues Carvalho J. Castrillon L. Chen N. Derumigny S. Diestelhorst W. Elsasser C. Escuin M. Fariborz A. Farmahini-Farahani P. Fotouhi R. Gambord J. Gandhi D. Gope T. Grass A. Gutierrez B. Hanindhito A. Hansson S. Haria A. Harris T. Hayes A. Herrera M. Horsnell S. A. R. Jafri R. Jagtap H. Jang R. Jeyapaul T. M. Jones M. Jung S. Kannoth H. Khaleghzadeh Y. Kodama T. Krishna T. Marinelli C. Menard A. Mondelli M. Moreto T. M\u00fcck O. Naji K. Nathella H. Nguyen N. Nikoleris L. E. Olson M. Orr B. Pham P. Prieto T. Reddy A. Roelke M. Samani A. Sandberg J. Setoain B. Shingarov M. D. Sinclair T. Ta R. Thakur G. Travaglini M. Upton N. Vaish I. Vougioukas W. Wang Z. Wang N. Wehn C. Weis D. A. Wood H. Yoon and \u00c9. F. Zulian. 2020. The gem5 Simulator: Version 20.0+. arXiv:2007.03152 [cs.AR]"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.1998.658762"},{"key":"e_1_3_2_1_50_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Musleh M.","unstructured":"M. Musleh and V. S. Pai . 2015. Automatic Sharing Classification and Timely Push for Cache-Coherent Systems . In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis ( Austin, Texas) (SC '15). Association for Computing Machinery, New York, NY, USA, Article 13, 12 pages. M. Musleh and V. S. Pai. 2015. Automatic Sharing Classification and Timely Push for Cache-Coherent Systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, Texas) (SC '15). Association for Computing Machinery, New York, NY, USA, Article 13, 12 pages."},{"key":"e_1_3_2_1_51_1","volume-title":"NASA Video and Image Library. 
https:\/\/images.nasa.gov\/. [Online","author":"NASA.","year":"2022","unstructured":"NASA. 2021. NASA Video and Image Library. https:\/\/images.nasa.gov\/. [Online ; accessed 30- July - 2022 ]. NASA. 2021. NASA Video and Image Library. https:\/\/images.nasa.gov\/. [Online; accessed 30-July-2022]."},{"key":"e_1_3_2_1_52_1","volume-title":"2011 38th Annual International Symposium on Computer Architecture (ISCA). 105--115","author":"Oh Jungju","year":"2011","unstructured":"Jungju Oh , Milos Prvulovic , and Alenka Zajic . 2011 . TLSync: Support for multiple fast barriers using on-chip transmission lines . In 2011 38th Annual International Symposium on Computer Architecture (ISCA). 105--115 . Jungju Oh, Milos Prvulovic, and Alenka Zajic. 2011. TLSync: Support for multiple fast barriers using on-chip transmission lines. In 2011 38th Annual International Symposium on Computer Architecture (ISCA). 105--115."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42614.2022.9731562"},{"key":"e_1_3_2_1_54_1","volume-title":"https:\/\/gcc.gnu.org\/onlinedocs\/gcc\/_005f_005fatomic-Builtins.html. [Online","author":"Project GNU","year":"2023","unstructured":"GNU Project . 2023. GCC __atomic Builtins . https:\/\/gcc.gnu.org\/onlinedocs\/gcc\/_005f_005fatomic-Builtins.html. [Online ; accessed 12- April - 2023 ]. GNU Project. 2023. GCC __atomic Builtins. https:\/\/gcc.gnu.org\/onlinedocs\/gcc\/_005f_005fatomic-Builtins.html. [Online; accessed 12-April-2023]."},{"key":"e_1_3_2_1_55_1","first-page":"4","article-title":"A Fast General-Purpose Hardware Synchronization Mechanism","volume":"14","author":"Robinson John T.","year":"1985","unstructured":"John T. Robinson . 1985 . A Fast General-Purpose Hardware Synchronization Mechanism . SIGMOD Rec. 14 , 4 (may 1985), 122--130. John T. Robinson. 1985. A Fast General-Purpose Hardware Synchronization Mechanism. SIGMOD Rec. 
14, 4 (may 1985), 122--130.","journal-title":"SIGMOD Rec."},{"key":"e_1_3_2_1_56_1","volume-title":"2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 101--111","author":"Sakalis C.","unstructured":"C. Sakalis, C. Leonardsson, S. Kaxiras, and A. Ros. 2016. Splash-3: A properly synchronized benchmark suite for contemporary research. In 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 101--111."},{"key":"e_1_3_2_1_57_1","volume-title":"Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers. In 2006 39th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'06)","author":"Sampson Jack","year":"2006","unstructured":"Jack Sampson, Ruben Gonzalez, Jean-francois Collard, Norman P. Jouppi, Mike Schlansker, and Brad Calder. 2006. Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers. In 2006 39th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'06). 
235--246."},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/237090.237144"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.388040"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/277830.277903"},{"key":"e_1_3_2_1_61_1","volume-title":"Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems","author":"Tang X.","unstructured":"X. Tang, J. Zhai, X. Qian, and W. Chen. 2019. PLock: A Fast Lock for Architectures with Explicit Inter-Core Message Passing. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (Providence, RI, USA) (ASPLOS '19). Association for Computing Machinery, New York, NY, USA, 765--778."},{"key":"e_1_3_2_1_62_1","volume-title":"Architectural Support for Fair Reader-Writer Locking. In 2010 43rd Annual IEEE\/ACM International Symposium on Microarchitecture. 275--286","author":"Vallejo Enrique","year":"2010","unstructured":"Enrique Vallejo, Ramon Beivide, Adrian Cristal, Tim Harris, Fernando Vallejo, Osman Unsal, and Mateo Valero. 2010. Architectural Support for Fair Reader-Writer Locking. In 2010 43rd Annual IEEE\/ACM International Symposium on Microarchitecture. 
275--286."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.5555\/1320302.1320834"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.24"},{"key":"e_1_3_2_1_65_1","volume-title":"Proceedings 22nd Annual International Symposium on Computer Architecture. 24--36","author":"Woo S.C.","unstructured":"S.C. Woo, M. Ohara, E. Torrie, J.P. Singh, and A. Gupta. 1995. The SPLASH-2 programs: characterization and methodological considerations. In Proceedings 22nd Annual International Symposium on Computer Architecture. 24--36."},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056064"},{"key":"e_1_3_2_1_67_1","volume-title":"Proceedings of the 48th International Symposium on Microarchitecture","author":"Zhang G.","unstructured":"G. Zhang, W. Horn, and D. Sanchez. 2015. Exploiting Commutativity to Reduce the Cost of Updates to Shared Data in Cache-Coherent Systems. In Proceedings of the 48th International Symposium on Microarchitecture (Waikiki, Hawaii) (MICRO-48). Association for Computing Machinery, New York, NY, USA, 13--25."},{"key":"e_1_3_2_1_68_1","volume-title":"18th International Parallel and Distributed Processing Symposium, 2004. Proceedings. 58--.","author":"Zhang L.","unstructured":"L. Zhang, Z. Fang, and J.B. Carter. 2004. Highly efficient synchronization based on active memory operations. 
In 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings. 58--."},{"key":"e_1_3_2_1_69_1","first-page":"2","article-title":"Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization on Many-Core Architectures","volume":"35","author":"Zhu Weirong","year":"2007","unstructured":"Weirong Zhu, Vugranam C Sreedhar, Ziang Hu, and Guang R. Gao. 2007. Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization on Many-Core Architectures. SIGARCH Comput. Archit. News 35, 2 (jun 2007), 35--45.","journal-title":"SIGARCH Comput. Archit. 
News"}],"event":{"name":"ISCA '23: 50th Annual International Symposium on Computer Architecture","location":"Orlando FL USA","acronym":"ISCA '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","IEEE"]},"container-title":["Proceedings of the 50th Annual International Symposium on Computer Architecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579371.3589065","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:39Z","timestamp":1750178799000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579371.3589065"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,17]]},"references-count":69,"alternative-id":["10.1145\/3579371.3589065","10.1145\/3579371"],"URL":"https:\/\/doi.org\/10.1145\/3579371.3589065","relation":{},"subject":[],"published":{"date-parts":[[2023,6,17]]},"assertion":[{"value":"2023-06-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}