{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,1]],"date-time":"2025-09-01T13:10:14Z","timestamp":1756732214769,"version":"3.44.0"},"reference-count":118,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>\n            Parallel applications often use MPI processes and OpenMP threads. These two parallel execution models, multi-process and multi-thread, were invented to improve efficiency on uniprocessor systems. In the multi-process approach, each process\u2019s isolated address space can make communication expensive; in the multi-thread approach, shared variables can cause access conflicts that stall execution. As the number of CPU cores increases, processes or threads interact and exchange information more often, and the traditional execution models can become bottlenecks. The paradigm shift from uniprocessor to many-core systems therefore calls for new parallel execution models that address the challenges posed by these two models. What happens when processes share an address space? What if threads do not share static variables? Sharing an address space and privatizing static variables reduce both the cost of information exchange and the cost of mutual exclusion on shared static variables. This survey investigates\n            <jats:italic toggle=\"yes\">Shared Address Space with Privatized Static Variables (SAS-PSV)<\/jats:italic>\n            , a new execution architecture that combines a shared address space with static variable privatization. This concept is implemented by MPC, SMARTMAP, PVAS, PiP, and AMPI, each with a different approach and implementation. This article analyzes the concepts, details, and hidden defects of these implementations. 
We also present SAS-PSV applications and issues that need to be solved.\n          <\/jats:p>","DOI":"10.1145\/3746169","type":"journal-article","created":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T07:19:51Z","timestamp":1751613591000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Survey on the Shared Address Space with Privatized Static Variables (SAS-PSV) Execution Model for the Many-Core Era"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7010-8098","authenticated-orcid":false,"given":"Atsushi","family":"Hori","sequence":"first","affiliation":[{"name":"National Institute of Informatics","place":["Chiyoda-ku, Japan"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4775-1835","authenticated-orcid":false,"given":"Kaiming","family":"Ouyang","sequence":"additional","affiliation":[{"name":"NVIDIA Corporation","place":["Santa Clara, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0208-096X","authenticated-orcid":false,"given":"Min","family":"Si","sequence":"additional","affiliation":[{"name":"Meta Platforms","place":["Menlo Park, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7830-0001","authenticated-orcid":false,"given":"Pavan","family":"Balaji","sequence":"additional","affiliation":[{"name":"Meta Platforms","place":["Menlo Park, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0084-1574","authenticated-orcid":false,"given":"Julien","family":"Jaeger","sequence":"additional","affiliation":[{"name":"CEA\/DAM\/DIF","place":["Arpajon, France"]},{"name":"CEA\/DAM\/DIF\/LRC DIGIT","place":["Arpajon, France"]},{"name":"Universit\u00e9 Paris-Saclay","place":["Arpajon, France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1615-2749","authenticated-orcid":false,"given":"Marc","family":"P\u00e9rache","sequence":"additional","affiliation":[{"name":"CEA\/DAM\/DIF","place":["Arpajon, 
France"]},{"name":"CEA\/DAM\/DIF\/LRC DIGIT","place":["Arpajon, France"]},{"name":"Universit\u00e9 Paris-Saclay","place":["Arpajon, France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6019-8763","authenticated-orcid":false,"given":"Sam","family":"White","sequence":"additional","affiliation":[{"name":"Intel Corporation","place":["Champaign, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6559-5634","authenticated-orcid":false,"given":"Evan","family":"Ramos","sequence":"additional","affiliation":[{"name":"NVIDIA Corporation","place":["Santa Clara, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9673-8445","authenticated-orcid":false,"given":"Laxmikant","family":"Kale","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign","place":["Urbana, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1261-0178","authenticated-orcid":false,"given":"Kevin","family":"Pedretti","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories","place":["Albuquerque, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8186-222X","authenticated-orcid":false,"given":"Ron","family":"Brightwell","sequence":"additional","affiliation":[{"name":"Scalable System Software, Sandia National Laboratories","place":["Albuquerque, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8585-6031","authenticated-orcid":false,"given":"Balazs","family":"Gerofi","sequence":"additional","affiliation":[{"name":"Intel Corporation","place":["Portland, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2286-9770","authenticated-orcid":false,"given":"Yutaka","family":"Ishikawa","sequence":"additional","affiliation":[{"name":"National Institute of Informatics","place":["Chiyoda-ku, Japan"]},{"name":"Otsuma Women's University","place":["Chiyoda-ku, Japan"]}]}],"member":"320","published-online":{"date-parts":[[2025,9]]},"reference":[{"volume-title":"Debugging Multiple Inferiors Connections and 
Programs","author":"Free Software Foundation","key":"e_1_3_2_2_2","unstructured":"Free Software Foundation. 1998. Debugging Multiple Inferiors Connections and Programs. Free Software Foundation. Retrieved from https:\/\/sourceware.org\/gdb\/current\/onlinedocs\/gdb.html\/Inferiors-Connections-and-Programs.html"},{"key":"e_1_3_2_3_2","unstructured":"2014. Rust. Retrieved September 2, 2022 from https:\/\/www.rust-lang.org"},{"key":"e_1_3_2_4_2","volume-title":"The GNU C Library","author":"Free Software Foundation","year":"2023","unstructured":"Free Software Foundation. 2023. The GNU C Library. Free Software Foundation. Retrieved August 19, 2024 from https:\/\/www.gnu.org\/software\/libc\/"},{"key":"e_1_3_2_5_2","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201921)","author":"Allen Tyler","year":"2021","unstructured":"Tyler Allen and Rong Ge. 2021. In-depth analyses of unified virtual memory system for GPU accelerated computing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201921). ACM, New York, NY, USA, Article 64, 15 pages. DOI:10.1145\/3458817.3480855"},{"key":"e_1_3_2_6_2","first-page":"239","volume-title":"Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2015)","author":"Amer Abdelhalim","year":"2015","unstructured":"Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, and Satoshi Matsuoka. 2015. MPI+Threads: Runtime contention and remedies. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2015). ACM, New York, NY, USA, 239\u2013248. DOI:10.1145\/2688500.2688522"},{"key":"e_1_3_2_7_2","first-page":"95","volume-title":"Proceedings of the 13th ACM Symposium on Operating Systems Principles (SOSP\u201991)","author":"Anderson Thomas E.","year":"1991","unstructured":"Thomas E. Anderson, Brian N. 
Bershad, Edward D. Lazowska, and Henry M. Levy. 1991. Scheduler activations: Effective kernel support for the user-level management of parallelism. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (SOSP\u201991). ACM, New York, NY, USA, 95\u2013109."},{"key":"e_1_3_2_8_2","first-page":"496","volume-title":"Proceedings of the 11 IPPS\/SPDP\u201999 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing","author":"Antoniu Gabriel","year":"1999","unstructured":"Gabriel Antoniu, Luc Boug\u00e9, and Raymond Namyst. 1999. An efficient and transparent thread migration scheme in the PM2 runtime system. In Proceedings of the 11 IPPS\/SPDP\u201999 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. Springer-Verlag, 496\u2013510. Retrieved from https:\/\/hal.inria.fr\/inria-00565361"},{"key":"e_1_3_2_9_2","volume-title":"Explore the New System Architecture of Apple Silicon Macs","author":"Barraclough Gavin","year":"2020","unstructured":"Gavin Barraclough and Anand Dalal. 2020. Explore the New System Architecture of Apple Silicon Macs. Retrieved December 8, 2024 from https:\/\/developer.apple.com\/videos\/play\/wwdc2020\/10686\/"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2966884.2966910"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/850657.850659"},{"key":"e_1_3_2_12_2","volume-title":"USENIX Experiences with Distributed and Multiprocessor Systems (SEDMS IV)","author":"Brecht Tim","year":"1993","unstructured":"Tim Brecht. 1993. On the importance of parallel application placement in NUMA multiprocessors. In USENIX Experiences with Distributed and Multiprocessor Systems (SEDMS IV). USENIX Association, San Diego, CA. 
Retrieved from https:\/\/www.usenix.org\/conference\/sedms-iv\/importance-parallel-application-placement-numa-multiprocessors"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87475-1_18"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009359014"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2009.65"},{"key":"e_1_3_2_16_2","volume-title":"Proceedings of the 5th Partitioned Global Address Space Conference","author":"Brightwell Ron","year":"2011","unstructured":"Ron Brightwell and Kevin Pedretti. 2011. An intra-node implementation of OpenSHMEM using virtual address space mapping. In Proceedings of the 5th Partitioned Global Address Space Conference."},{"key":"e_1_3_2_17_2","volume-title":"Proceedings of the 2008 ACM\/IEEE Conference on Supercomputing (SC\u201908)","author":"Brightwell Ron","year":"2008","unstructured":"Ron Brightwell, Kevin Pedretti, and Trammell Hudson. 2008. SMARTMAP: Operating system support for efficient data sharing among processes on a multi-core processor. In Proceedings of the 2008 ACM\/IEEE Conference on Supercomputing (SC\u201908). IEEE Press, Piscataway, NJ, USA, Article 25, 12 pages. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=1413370.1413396"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1007\/978-3-642-02303-3_7","volume-title":"Evolving OpenMP in an Age of Extreme Parallelism","author":"Broquedis Fran\u00e7ois","year":"2009","unstructured":"Fran\u00e7ois Broquedis, Nathalie Furmento, Brice Goglin, Raymond Namyst, and Pierre-Andr\u00e9 Wacrenier. 2009. Dynamic task and data placement over NUMA architectures: An OpenMP runtime perspective. In Evolving OpenMP in an Age of Extreme Parallelism. Matthias S. M\u00fcller, Bronis R. de Supinski, and Barbara M. 
Chapman (Eds.), Springer Berlin Heidelberg, Berlin, 79\u201392."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2006.31"},{"key":"e_1_3_2_20_2","first-page":"25","volume-title":"Proceedings of the 2009 Conference on USENIX Annual Technical Conference (USENIX\u201909)","author":"Burtsev Anton","year":"2009","unstructured":"Anton Burtsev, Kiran Srinivasan, Prashanth Radhakrishnan, Lakshmi N. Bairavasundaram, Kaladhar Voruganti, and Garth R. Goodson. 2009. Fido: Fast inter-virtual-machine communication for enterprise appliances. In Proceedings of the 2009 Conference on USENIX Annual Technical Conference (USENIX\u201909). USENIX Association, USA, 25."},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1109\/SC.2000.10001","volume-title":"Proceedings of the 2000 ACM\/IEEE Conference on Supercomputing (SC\u201900)","author":"Cappello Franck","year":"2000","unstructured":"Franck Cappello and Daniel Etiemble. 2000. MPI versus MPI+OpenMP on IBM SP for the NAS benchmarks. In Proceedings of the 2000 ACM\/IEEE Conference on Supercomputing (SC\u201900). IEEE Computer Society, USA, 12\u201312."},{"key":"e_1_3_2_22_2","first-page":"1","volume-title":"Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More","author":"Carribault Patrick","year":"2010","unstructured":"Patrick Carribault, Marc P\u00e9rache, and Herv\u00e9 Jourdren. 2010. Enabling low-overhead hybrid MPI\/OpenMP parallelism with MPC. In Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More. Mitsuhisa Sato, Toshihiro Hanawa, Matthias S. M\u00fcller, Barbara M. Chapman, and Bronis R. 
de Supinski (Eds.), Springer Berlin Heidelberg, Berlin, 1\u201314."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21487-5_7"},{"key":"e_1_3_2_24_2","volume-title":"Proceedings of the 4th Conference on Partitioned Global Address Space Programming Model (PGAS\u201910)","author":"Chapman Barbara","year":"2010","unstructured":"Barbara Chapman, Tony Curtis, Swaroop Pophale, Stephen Poole, Jeff Kuehn, Chuck Koelbel, and Lauren Smith. 2010. Introducing OpenSHMEM: SHMEM for the PGAS community. In Proceedings of the 4th Conference on Partitioned Global Address Space Programming Model (PGAS\u201910). ACM, New York, NY, USA, Article 2, 3 pages. DOI:10.1145\/2020373.2020375"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/195792.195795"},{"key":"e_1_3_2_26_2","volume-title":"Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201912)","author":"Chen Guancheng","year":"2012","unstructured":"Guancheng Chen and Per Stenstrom. 2012. Critical lock analysis: Diagnosing critical section bottlenecks in multithreaded applications. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201912). IEEE Computer Society Press, Washington, DC, USA, Article 71, 11 pages."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/40.653035"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-09873-9_50"},{"volume-title":"OpenMP","author":"Community OpenMP","key":"e_1_3_2_29_2","unstructured":"OpenMP Community. 2012. OpenMP. Retrieved September 2, 2022 from https:\/\/www.openmp.org"},{"key":"e_1_3_2_30_2","volume-title":"Multi-Processor Computing","author":"Computing Multi-Processor","year":"2022","unstructured":"Multi-Processor Computing. 2022. Multi-Processor Computing. 
Retrieved September 2, 2022 from https:\/\/mpc.hpcframework.com"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2006.4380784"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/2716320"},{"key":"e_1_3_2_33_2","first-page":"23","volume-title":"Proceedings of the Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV\u201919)","author":"Dorier Matthieu","year":"2019","unstructured":"Matthieu Dorier, Orcun Yildiz, Tom Peterka, and Robert Ross. 2019. The challenges of elastic in situ analysis and visualization. In Proceedings of the Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV\u201919). ACM, New York, NY, USA, 23\u201328. DOI:10.1145\/3364228.3364234"},{"key":"e_1_3_2_34_2","volume-title":"ELF Handling For Thread Local Storage","author":"Drepper Ulrich","year":"2013","unstructured":"Ulrich Drepper. 2013. ELF Handling For Thread Local Storage. Retrieved September 2, 2022 from http:\/\/people.redhat.com\/drepper\/tls.pdf"},{"key":"e_1_3_2_35_2","volume-title":"The Native POSIX Thread Library for Linux","author":"Drepper Ulrich","year":"2003","unstructured":"Ulrich Drepper and Ingo Molnar. 2003. The Native POSIX Thread Library for Linux. Technical Report. RedHat, Inc."},{"volume-title":"MPI Forum","author":"Forum MPI","key":"e_1_3_2_36_2","unstructured":"MPI Forum. 2018. MPI Forum. Retrieved September 2, 2022 from https:\/\/www.mpi-forum.org"},{"key":"e_1_3_2_37_2","volume-title":"Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201913)","author":"Friedley Andrew","year":"2013","unstructured":"Andrew Friedley, Greg Bronevetsky, Torsten Hoefler, and Andrew Lumsdaine. 2013. Hybrid MPI: Efficient message passing for multi-core systems. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201913). 
ACM, New York, NY, USA, Article 18, 11 pages. DOI:10.1145\/2503210.2503294"},{"key":"e_1_3_2_38_2","first-page":"49","volume-title":"Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201919)","author":"Garg Rohan","year":"2019","unstructured":"Rohan Garg, Gregory Price, and Gene Cooperman. 2019. MANA for MPI: MPI-agnostic network-agnostic transparent checkpointing. In Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201919). ACM, New York, NY, USA, 49\u201360. DOI:10.1145\/3307681.3325962"},{"key":"e_1_3_2_39_2","first-page":"231","volume-title":"Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 14)","author":"Gaud Fabien","year":"2014","unstructured":"Fabien Gaud, Baptiste Lepers, Jeremie Decouchant, Justin Funston, Alexandra Fedorova, and Vivien Quema. 2014. Large pages may be harmful on NUMA systems. In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 14). USENIX Association, Philadelphia, PA, 231\u2013242. Retrieved from https:\/\/www.usenix.org\/conference\/atc14\/technical-sessions\/presentation\/gaud"},{"key":"e_1_3_2_40_2","unstructured":"Robert A. Gingell, Meng Lee, and Xuong T. Dang. 1987. Shared libraries in SunOS. Retrieved September 2, 2022 from https:\/\/api.semanticscholar.org\/CorpusID:12881742"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2012.09.016"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1007\/978-3-642-24449-0_17","volume-title":"Recent Advances in the Message Passing Interface","author":"Goodell David","year":"2011","unstructured":"David Goodell, William Gropp, Xin Zhao, and Rajeev Thakur. 2011. Scalable memory use in MPI: A case study with MPICH2. In Recent Advances in the Message Passing Interface. Yiannis Cotronis, Anthony Danalis, Dimitrios S. 
Nikolopoulos, and Jack Dongarra (Eds.), Springer Berlin Heidelberg, Berlin, 140\u2013149."},{"key":"e_1_3_2_43_2","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1007\/978-3-540-87475-1_21","volume-title":"Recent Advances in Parallel Virtual Machine and Message Passing Interface","author":"Graham Richard L.","year":"2008","unstructured":"Richard L. Graham and Galen Shipman. 2008. MPI support for multi-core architectures: Optimized shared memory collectives. In Recent Advances in Parallel Virtual Machine and Message Passing Interface. Alexey Lastovetsky, Tahar Kechadi, and Jack Dongarra (Eds.), Springer Berlin Heidelberg, Berlin, 130\u2013140."},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1007\/11846802_11","volume-title":"Recent Advances in Parallel Virtual Machine and Message Passing Interface","author":"Gropp William","year":"2006","unstructured":"William Gropp and Rajeev Thakur. 2006. Issues in developing a thread-safe MPI implementation. In Recent Advances in Parallel Virtual Machine and Message Passing Interface. Bernd Mohr, Jesper Larsson Tr\u00e4ff, Joachim Worringen, and Jack Dongarra (Eds.), Springer Berlin Heidelberg, Berlin, 12\u201321."},{"key":"e_1_3_2_45_2","volume-title":"TLS Performance Overhead and Cost on GNU\/Linux","author":"Gross David","year":"2016","unstructured":"David Gross. 2016. TLS Performance Overhead and Cost on GNU\/Linux. Retrieved September 2, 2022 from http:\/\/david-grs.github.io\/tls_performance_overhead_cost_linux\/"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.2197\/ipsjjip.20.89"},{"key":"e_1_3_2_47_2","volume-title":"Designing High Performance Shared-Address-Space and Adaptive Communication Middlewares for Next-Generation HPC Systems","author":"Hashmi Jahanzeb Maqbool","year":"2020","unstructured":"Jahanzeb Maqbool Hashmi. 2020. Designing High Performance Shared-Address-Space and Adaptive Communication Middlewares for Next-Generation HPC Systems. Ph. D. 
Dissertation. The Ohio State University."},{"key":"e_1_3_2_48_2","article-title":"Implementation and performance of the Mungi single-address-space operating system","author":"Heiser Gernot","year":"1997","unstructured":"Gernot Heiser. 1997. Implementation and performance of the Mungi single-address-space operating system. Software - Practice and Experience (1997).","journal-title":"Software - Practice and Experience"},{"volume-title":"Linux Cross-Memory Attach","author":"Hjelm Nathan","key":"e_1_3_2_49_2","unstructured":"Nathan Hjelm. 2014. Linux Cross-Memory Attach. Retrieved September 2, 2022 from https:\/\/github.com\/hjelmn\/xpmem"},{"key":"e_1_3_2_50_2","first-page":"123","volume-title":"Proceedings of the 2021 IEEE\/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","author":"Hobson Tanner","year":"2021","unstructured":"Tanner Hobson, Orcun Yildiz, Bogdan Nicolae, Jian Huang, and Tom Peterka. 2021. Shared-memory communication for containerized workflows. In Proceedings of the 2021 IEEE\/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid). 123\u2013132. DOI:10.1109\/CCGrid51090.2021.00022"},{"key":"e_1_3_2_51_2","first-page":"213","volume-title":"Proceedings of the 2008 IEEE International Conference on Cluster Computing","author":"Hoefler Torsten","year":"2008","unstructured":"Torsten Hoefler and Andrew Lumsdaine. 2008. Message progression in parallel computing - to thread or not to thread? In Proceedings of the 2008 IEEE International Conference on Cluster Computing. 213\u2013222. DOI:10.1109\/CLUSTR.2008.4663774"},{"key":"e_1_3_2_52_2","volume-title":"A Study on Efficient Time-Sharing Scheduling for Distributed Memory Parallel Machines","author":"Hori Atsushi","year":"1999","unstructured":"Atsushi Hori. 1999. A Study on Efficient Time-Sharing Scheduling for Distributed Memory Parallel Machines. Ph. D. Dissertation. The University of Tokyo. 
(in Japanese)."},{"key":"e_1_3_2_53_2","volume-title":"Great Experiences with PiP (Process-in-Process)","author":"Hori Atsushi","year":"2022","unstructured":"Atsushi Hori. 2022. Great Experiences with PiP (Process-in-Process). Retrieved September 2, 2022 from https:\/\/procinproc.github.io\/PiP-Tutorial.pdf"},{"key":"e_1_3_2_54_2","first-page":"976","volume-title":"Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","author":"Hori Atsushi","year":"2020","unstructured":"Atsushi Hori, Balazs Gerofi, and Yutaka Ishikawa. 2020. An implementation of user-level processes using address space sharing. In Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 976\u2013984. DOI:10.1109\/IPDPSW50202.2020.00161"},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/978-3-031-10419-0_5","volume-title":"Supercomputing Frontiers","author":"Hori Atsushi","year":"2022","unstructured":"Atsushi Hori, Kaiming Ouyang, Balazs Gerofi, and Yutaka Ishikawa. 2022. On the difference between shared memory and shared address space in HPC communication. In Supercomputing Frontiers. Dhabaleswar K. Panda and Michael Sullivan (Eds.), Springer International Publishing, Cham, 59\u201378."},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3208040.3208045"},{"key":"e_1_3_2_57_2","first-page":"43","volume-title":"SC\u201998: Proceedings of the 1998 ACM\/IEEE Conference on Supercomputing","author":"Hori Atsushi","year":"1998","unstructured":"Atsushi Hori, Hiroshi Tezuka, and Yutaka Ishikawa. 1998. Highly efficient gang scheduling implementation. In SC\u201998: Proceedings of the 1998 ACM\/IEEE Conference on Supercomputing. 43\u201343. DOI:10.1109\/SC.1998.10007"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","unstructured":"C. Huang, O. Lawlor, and L. V. Kal\u00e9. 2004. Adaptive MPI. 
In Languages and Compilers for Parallel Computing (LCPC 2003), L. Rauchwerger (Ed.). Lecture Notes in Computer Science, Vol. 2958. Springer Berlin Heidelberg. DOI:10.1007\/978-3-540-24644-2_20","DOI":"10.1007\/978-3-540-24644-2_20"},{"key":"e_1_3_2_59_2","volume-title":"Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2006","author":"Huang Chao","year":"2006","unstructured":"Chao Huang, Gengbin Zheng, Sameer Kumar, and Laxmikant V. Kale. 2006. Performance evaluation of adaptive MPI. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2006."},{"volume-title":"POSIX\u2122 Certification","key":"e_1_3_2_60_2","unstructured":"IEEE. 2002. POSIX\u2122 Certification. Retrieved August 20, 2024 from https:\/\/posix.opengroup.org"},{"volume-title":"Intel\u00ae Xeon Phi\u2122 Processors","key":"e_1_3_2_61_2","unstructured":"Intel. [n. d.]. Intel\u00ae Xeon Phi\u2122 Processors. Retrieved August 16, 2024 from https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/series\/75557\/intel-xeon-phi-processors.html"},{"key":"e_1_3_2_62_2","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1109\/PACT.2019.00011","volume-title":"Proceedings of the 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)","author":"Iwasaki Shintaro","year":"2019","unstructured":"Shintaro Iwasaki, Abdelhalim Amer, Kenjiro Taura, Sangmin Seo, and Pavan Balaji. 2019. BOLT: Optimizing OpenMP parallel regions with user-level threads. In Proceedings of the 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). 29\u201342. DOI:10.1109\/PACT.2019.00011"},{"key":"e_1_3_2_63_2","first-page":"184","volume-title":"Proceedings of the 2005 International Conference on Parallel Processing (ICPP\u201905)","author":"Jin H. W.","year":"2005","unstructured":"H. W. Jin, S. Sur, L. Chai, and D. K. Panda. 2005. 
LiMIC: Support for high-performance MPI intra-node communication on Linux cluster. In Proceedings of the 2005 International Conference on Parallel Processing (ICPP\u201905). 184\u2013191. DOI:10.1109\/ICPP.2005.48"},{"key":"e_1_3_2_64_2","volume-title":"Proceedings of the Los Alamos Computer Science Institute Symposium (LACSI 2002)","author":"Kale Laxmikant V.","year":"2002","unstructured":"Laxmikant V. Kale. 2002. The virtualization model of parallel programming: Runtime optimizations and the state of art. In Proceedings of the Los Alamos Computer Science Institute Symposium (LACSI 2002)."},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1109\/IPPS.1996.508060","volume-title":"Proceedings of the International Conference on Parallel Processing","author":"Kale Laxmikant V.","year":"1996","unstructured":"Laxmikant V. Kale, Milind Bhandarkar, Narain Jagathesan, Sanjeev Krishnan, and Josh Yelon. 1996. Converse: An interoperable framework for parallel programming. In Proceedings of the International Conference on Parallel Processing. IEEE, 212\u2013217."},{"key":"e_1_3_2_66_2","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1145\/2749246.2749274","volume-title":"Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201915)","author":"Kocoloski Brian","year":"2015","unstructured":"Brian Kocoloski and John Lange. 2015. XEMEM: Efficient shared memory for composed applications on Multi-OS\/R exascale systems. In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201915). ACM, New York, NY, USA, 89\u2013100. 
DOI:10.1145\/2749246.2749274"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/143371.143508"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508834.2513149"},{"key":"e_1_3_2_69_2","first-page":"196","volume-title":"Proceedings of the 2011 IEEE International Conference on Cluster Computing","author":"Ma Teng","year":"2011","unstructured":"Teng Ma, Thomas Herault, George Bosilca, and Jack J. Dongarra. 2011. Process distance-aware adaptive MPI collective communications. In Proceedings of the 2011 IEEE International Conference on Cluster Computing. 196\u2013204. DOI:10.1109\/CLUSTER.2011.30"},{"key":"e_1_3_2_70_2","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1145\/1993478.1993481","volume-title":"Proceedings of the International Symposium on Memory Management (ISMM\u201911)","author":"Majo Zoltan","year":"2011","unstructured":"Zoltan Majo and Thomas R. Gross. 2011. Memory management in NUMA multicore systems: Trapped between cache contention and interconnect overhead. In Proceedings of the International Symposium on Memory Management (ISMM\u201911). ACM, New York, NY, USA, 11\u201320. DOI:10.1145\/1993478.1993481"},{"key":"e_1_3_2_71_2","first-page":"130","volume-title":"Proceedings of the 2008 8th IEEE International Symposium on Cluster Computing and the Grid (CCGRID)","author":"Mamidala Amith R.","year":"2008","unstructured":"Amith R. Mamidala, Rahul Kumar, Debraj De, and D. K. Panda. 2008. MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics. In Proceedings of the 2008 8th IEEE International Symposium on Cluster Computing and the Grid (CCGRID). 130\u2013137. 
DOI:10.1109\/CCGRID.2008.87"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2009.22"},{"key":"e_1_3_2_73_2","volume-title":"Proceedings of the USENIX Microkernels and Other Architectures Symposium (USENIX Microkernels and Other Architectures Symposium)","author":"Murray Kevin","year":"1993","unstructured":"Kevin Murray, Tim Wilkinson, Peter Osmon, Ashley Saulsbury, Tom Stiemerling, and Paul Kelly. 1993. Design and implementation of an object-oriented 64-bit single address space microkernel. In Proceedings of the USENIX Microkernels and Other Architectures Symposium (USENIX Microkernels and Other Architectures Symposium). USENIX Association, San Diego, CA. Retrieved from https:\/\/www.usenix.org\/conference\/usenix-microkernels-and-other-architectures-symposium\/design-and-implementation-object"},{"key":"e_1_3_2_74_2","volume-title":"Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 02)","author":"Navarro Juan","year":"2002","unstructured":"Juan Navarro, Sitaram Iyer, and Alan Cox. 2002. Practical, transparent operating system support for superpages. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 02). USENIX Association, Boston, MA. Retrieved from https:\/\/www.usenix.org\/conference\/osdi-02\/practical-transparent-operating-system-support-superpages"},{"key":"e_1_3_2_75_2","volume-title":"Exploring Interprocess Techniques for High-Performance MPI Communication","author":"Ouyang Kaiming","year":"2022","unstructured":"Kaiming Ouyang. 2022. Exploring Interprocess Techniques for High-Performance MPI Communication. PhD thesis. University of California, Riverside."},{"key":"e_1_3_2_76_2","first-page":"1","volume-title":"SC20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Ouyang Kaiming","year":"2020","unstructured":"Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, and Pavan Balaji. 2020. 
CAB-MPI: Exploring interprocess work-stealing towards balanced MPI communication. In SC20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1\u201315."},{"key":"e_1_3_2_77_2","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1109\/Cluster48925.2021.00027","volume-title":"Proceedings of the 2021 IEEE International Conference on Cluster Computing (CLUSTER)","author":"Ouyang Kaiming","year":"2021","unstructured":"Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, and Pavan Balaji. 2021. DAPS: A dynamic asynchronous progress stealing model for MPI communication. In Proceedings of the 2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 516\u2013527."},{"key":"e_1_3_2_78_2","first-page":"55a\u201355a","volume-title":"Supercomputing\u201995: Proceedings of the 1995 ACM\/IEEE Conference on Supercomputing","author":"Pakin S.","year":"1995","unstructured":"S. Pakin, M. Lauria, and A. Chien. 1995. High performance messaging on workstations: Illinois fast messages (FM) for Myrinet. In Supercomputing\u201995: Proceedings of the 1995 ACM\/IEEE Conference on Supercomputing. 55a\u201355a. DOI:10.1145\/224170.1039010"},{"key":"e_1_3_2_79_2","volume-title":"Parallel Languages\/Paradigms: AMPI - Adaptive Message Passing Interface","author":"Science University of Illinois at Urbana-Champaign Parallel Programming Laboratory, Department of Computer","year":"2022","unstructured":"University of Illinois at Urbana-Champaign Parallel Programming Laboratory, Department of Computer Science. 2022. Parallel Languages\/Paradigms: AMPI - Adaptive Message Passing Interface. 
Retrieved September 2, 2022 from http:\/\/charm.cs.uiuc.edu\/research\/ampi\/"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.3929\/ethz-a-007316742"},{"key":"e_1_3_2_81_2","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1007\/978-3-642-03770-2_16","volume-title":"Recent Advances in Parallel Virtual Machine and Message Passing Interface","author":"P\u00e9rache Marc","year":"2009","unstructured":"Marc P\u00e9rache, Patrick Carribault, and Herv\u00e9 Jourdren. 2009. MPC-MPI: An MPI implementation reducing the overall memory consumption. In Recent Advances in Parallel Virtual Machine and Message Passing Interface. Matti Ropo, Jan Westerholm, and Jack Dongarra (Eds.), Springer Berlin Heidelberg, Berlin, 94\u2013103."},{"key":"e_1_3_2_82_2","first-page":"78","volume-title":"Proceedings of the 14th International Euro-Par Conference on Parallel Processing (Euro-Par\u201908)","author":"P\u00e9rache Marc","year":"2008","unstructured":"Marc P\u00e9rache, Herv\u00e9 Jourdren, and Raymond Namyst. 2008. MPC: A unified parallel runtime for clusters of NUMA machines. In Proceedings of the 14th International Euro-Par Conference on Parallel Processing (Euro-Par\u201908). Springer-Verlag, Berlin, 78\u201388. DOI:10.1007\/978-3-540-85451-7_9"},{"key":"e_1_3_2_83_2","first-page":"1","volume-title":"Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","author":"Pouget Kevin","year":"2010","unstructured":"Kevin Pouget, Marc P\u00e9rache, Patrick Carribault, and Herv\u00e9 Jourdren. 2010. User level DB: A debugging API for user-level thread libraries. In Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW). 1\u20137. 
DOI:10.1109\/IPDPSW.2010.5470815"},{"key":"e_1_3_2_84_2","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1145\/3385412.3386036","volume-title":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020)","author":"Qin Boqin","year":"2020","unstructured":"Boqin Qin, Yilun Chen, Zeming Yu, Linhai Song, and Yiying Zhang. 2020. Understanding memory and thread safety practices and issues in real-world rust programs. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020). ACM, New York, NY, USA, 763\u2013779. DOI:10.1145\/3385412.3386036"},{"key":"e_1_3_2_85_2","volume-title":"Workshop Proceedings of the 51st International Conference on Parallel Processing (ICPP Workshops\u201922)","author":"Ramos Evan","year":"2023","unstructured":"Evan Ramos, Sam White, Aditya Bhosale, and Laxmikant Kale. 2023. Runtime techniques for automatic process virtualization. In Workshop Proceedings of the 51st International Conference on Parallel Processing (ICPP Workshops\u201922). ACM, New York, NY, USA, Article 26, 10 pages. DOI:10.1145\/3547276.3548522"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847562"},{"volume-title":"Maximizing Unified Memory Performance in CUDA","author":"Sakharnykh Nikolay","key":"e_1_3_2_87_2","unstructured":"Nikolay Sakharnykh. 2017. Maximizing Unified Memory Performance in CUDA. Retrieved August 9, 2024 from https:\/\/developer.nvidia.com\/blog\/maximizing-unified-memory-performance-cuda\/"},{"key":"e_1_3_2_88_2","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1145\/2642769.2642795","volume-title":"Proceedings of the 21st European MPI Users\u2019 Group Meeting (EuroMPI\/ASIA\u201914)","author":"Sato Mikiko","year":"2014","unstructured":"Mikiko Sato, Go Fukazawa, Akio Shimada, Atsushi Hori, Yutaka Ishikawa, and Mitaro Namiki. 2014. 
Design of multiple PVAS on InfiniBand cluster system consisting of many-core and multi-core. In Proceedings of the 21st European MPI Users\u2019 Group Meeting (EuroMPI\/ASIA\u201914). ACM, New York, NY, USA, 133\u2013138. DOI:10.1145\/2642769.2642795"},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2766062"},{"key":"e_1_3_2_90_2","first-page":"298","volume-title":"Proceedings of the 11th ACM Conference on Computer and Communications Security (CCS\u201904)","author":"Shacham Hovav","year":"2004","unstructured":"Hovav Shacham, Matthew Page, Ben Pfaff, Eu-Jin Goh, Nagendra Modadugu, and Dan Boneh. 2004. On the effectiveness of address-space randomization. In Proceedings of the 11th ACM Conference on Computer and Communications Security (CCS\u201904). ACM, New York, NY, USA, 298\u2013307. DOI:10.1145\/1030083.1030124"},{"key":"e_1_3_2_91_2","volume-title":"A Study on Task Models for High-Performance and Efficient Intra-Node Communication in Many-Core","author":"Shimada Akio","year":"2017","unstructured":"Akio Shimada. 2017. A Study on Task Models for High-Performance and Efficient Intra-Node Communication in Many-Core. Ph. D. Dissertation. Keio University. (in Japanese)."},{"key":"e_1_3_2_92_2","article-title":"Implementing many-core friendly MPI intra-node communication with new task model","author":"Shimada Akio","year":"2015","unstructured":"Akio Shimada, Balazs Gerofi, Atsushi Hori, and Yutaka Ishikawa. 2015. Implementing many-core friendly MPI intra-node communication with new task model. IPSJ SIG Notes 8, 2 (July 2015), 36\u201354.","journal-title":"IPSJ SIG Notes"},{"key":"e_1_3_2_93_2","volume-title":"PGAS 2012: Proceedings of the 6th Conference on Partitioned Global Address Space Programming Model (PGAS\u201912)","author":"Shimada Akio","year":"2012","unstructured":"Akio Shimada, Balazs Gerofi, Atsushi Hori, and Yutaka Ishikawa. 2012. PGAS intra-node communication towards many-core architecture. 
In PGAS 2012: Proceedings of the 6th Conference on Partitioned Global Address Space Programming Model (PGAS\u201912)."},{"key":"e_1_3_2_94_2","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1145\/2489068.2489075","volume-title":"Proceedings of the 1st International Workshop on Many-Core Embedded Systems (MES\u201913)","author":"Shimada Akio","year":"2013","unstructured":"Akio Shimada, Balazs Gerofi, Atsushi Hori, and Yutaka Ishikawa. 2013. Proposing a new task model towards many-core architecture. In Proceedings of the 1st International Workshop on Many-Core Embedded Systems (MES\u201913). ACM, New York, NY, USA, 45\u201348. DOI:10.1145\/2489068.2489075"},{"key":"e_1_3_2_95_2","volume-title":"Proceedings of the 21st European MPI Users\u2019 Group Meeting (EuroMPI\/ASIA\u201914)","author":"Shimada Akio","year":"2014","unstructured":"Akio Shimada, Atsushi Hori, and Yutaka Ishikawa. 2014. Eliminating costs for crossing process boundary from MPI intra-node communication. In Proceedings of the 21st European MPI Users\u2019 Group Meeting (EuroMPI\/ASIA\u201914). ACM, New York, NY, USA, Article 119, 2 pages. DOI:10.1145\/2642769.2642790"},{"issue":"22","key":"e_1_3_2_96_2","first-page":"1","article-title":"User-level process towards exascale systems","volume":"2014","author":"Shimada Akio","year":"2014","unstructured":"Akio Shimada, Atsushi Hori, Yutaka Ishikawa, and Pavan Balaji. 2014. User-level process towards exascale systems. IPSJ SIG Notes 2014, 22 (Dec. 2014), 1\u20137. Retrieved from https:\/\/cir.nii.ac.jp\/crid\/1571417127882324224","journal-title":"IPSJ SIG Notes"},{"issue":"2","key":"e_1_3_2_97_2","first-page":"46","article-title":"Accelerating MPI intra-node communication using derived data types on many-core","volume":"9","author":"Shimada Akio","year":"2016","unstructured":"Akio Shimada, Atsushi Suto, Atsushi Hori, Yutaka Ishikawa, and Kenji Kono. 2016. Accelerating MPI intra-node communication using derived data types on many-core. 
IPSJ-ACS 9, 2 (July 2016), 46\u201363. (In Japanese).","journal-title":"IPSJ-ACS"},{"key":"e_1_3_2_98_2","first-page":"665","volume-title":"Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium","author":"Si Min","year":"2015","unstructured":"Min Si, Antonio J. Pe\u00f1a, Jeff Hammond, Pavan Balaji, Masamichi Takagi, and Yutaka Ishikawa. 2015. Casper: An asynchronous progress model for MPI RMA on many-core architectures. In Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium. 665\u2013676. DOI:10.1109\/IPDPS.2015.35"},{"key":"e_1_3_2_99_2","volume-title":"Operating System Concepts","author":"Silberschatz Abraham","year":"2018","unstructured":"Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. 2018. Operating System Concepts (10th ed.). Wiley."},{"key":"e_1_3_2_100_2","volume-title":"Demand-Based Coscheduling of Parallel Jobs on Multiprogrammed Multiprocessors","author":"Sobalvarro Patrick Gregory","year":"1997","unstructured":"Patrick Gregory Sobalvarro. 1997. Demand-Based Coscheduling of Parallel Jobs on Multiprogrammed Multiprocessors. Ph. D. Dissertation. Massachusetts Institute of Technology."},{"key":"e_1_3_2_101_2","doi-asserted-by":"publisher","DOI":"10.1145\/1281700.1281722"},{"volume-title":"Debugging with gdb (10th edition, for gdb version 14.0.50.20230118-git ed.)","author":"Stallman Richard","key":"e_1_3_2_102_2","unstructured":"Richard Stallman, Roland Pesch, Stan Shebs, et al. 2017. Debugging with gdb (10th edition, for gdb version 14.0.50.20230118-git ed.). Free Software Foundation. Retrieved from https:\/\/sourceware.org\/gdb\/current\/onlinedocs\/gdb.pdf"},{"issue":"3","key":"e_1_3_2_103_2","first-page":"202","article-title":"The free lunch is over: A fundamental turn toward concurrency in software","volume":"30","author":"Sutter Herb","year":"2005","unstructured":"Herb Sutter. 2005. The free lunch is over: A fundamental turn toward concurrency in software. Dr. 
Dobb\u2019s Journal 30, 3 (2005), 202\u2013210. Retrieved from http:\/\/www.gotw.ca\/publications\/concurrency-ddj.htm","journal-title":"Dr. Dobb\u2019s Journal"},{"key":"e_1_3_2_104_2","doi-asserted-by":"crossref","first-page":"1178","DOI":"10.1007\/BFb0098001","volume-title":"Parallel and Distributed Processing","author":"Takahashi Toshiyuki","year":"1999","unstructured":"Toshiyuki Takahashi, Francis O\u2019Carroll, Hiroshi Tezuka, Atsushi Hori, Shinji Sumimoto, Hiroshi Harada, Yutaka Ishikawa, and Peter H. Beckman. 1999. Implementation and evaluation of MPI on an SMP cluster. In Parallel and Distributed Processing. Jos\u00e9 Rolim, Frank Mueller, Albert Y. Zomaya, Fikret Ercal, Stephan Olariu, Binoy Ravindran, Jan Gustafsson, Hiroaki Takada, Ron Olsson, Laxmikant V. Kale, Pete Beckman, Matthew Haines, Hossam ElGindy, Denis Caromel, Serge Chaumette, Geoffrey Fox, Yi Pan, Keqin Li, Tao Yang, G. Chiola, G. Conte, L. V. Mancini, Domenique M\u00e9ry, Beverly Sanders, Devesh Bhatt, and Viktor Prasanna (Eds.), Springer Berlin Heidelberg, Berlin, 1178\u20131192."},{"key":"e_1_3_2_105_2","volume-title":"Modern Operating Systems (4th ed.)","author":"Tanenbaum Andrew S.","year":"2014","unstructured":"Andrew S. Tanenbaum and Herbert Bos. 2014. Modern Operating Systems (4th ed.). Pearson, Boston, MA."},{"key":"e_1_3_2_106_2","first-page":"366","volume-title":"Proceedings of the 2012 IEEE International Parallel Distributed Processing Symposium (IPDPS)","author":"Tchiboukdjian M.","year":"2012","unstructured":"M. Tchiboukdjian, P. Carribault, and M. P\u00e9rache. 2012. Hierarchical local storage: Exploiting flexible user-data sharing between MPI tasks. In Proceedings of the 2012 IEEE International Parallel Distributed Processing Symposium (IPDPS). 366\u2013377. 
DOI:10.1109\/IPDPS.2012.42"},{"key":"e_1_3_2_107_2","first-page":"366","volume-title":"Proceedings of the 2012 IEEE 26th International Parallel Distributed Processing Symposium (IPDPS)","author":"Tchiboukdjian M.","year":"2012","unstructured":"M. Tchiboukdjian, P. Carribault, and M. P\u00e9rache. 2012. Hierarchical local storage: Exploiting flexible user-data sharing between MPI tasks. In Proceedings of the 2012 IEEE 26th International Parallel Distributed Processing Symposium (IPDPS). 366\u2013377. DOI:10.1109\/IPDPS.2012.42"},{"volume-title":"OpenSHMEM","author":"Team OpenSHMEM","key":"e_1_3_2_108_2","unstructured":"OpenSHMEM Team. 2010. OpenSHMEM. Retrieved September 2, 2022 from http:\/\/www.openshmem.org\/site\/"},{"key":"e_1_3_2_109_2","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1007\/BFb0031642","volume-title":"High-Performance Computing and Networking","author":"Tezuka Hiroshi","year":"1997","unstructured":"Hiroshi Tezuka, Atsushi Hori, Yutaka Ishikawa, and Mitsuhisa Sato. 1997. PM: An operating system coordinated high performance communication library. In High-Performance Computing and Networking. Bob Hertzberger and Peter Sloot (Eds.), Springer Berlin Heidelberg, Berlin, 708\u2013717."},{"key":"e_1_3_2_110_2","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1007\/978-3-540-75416-9_13","volume-title":"Recent Advances in Parallel Virtual Machine and Message Passing Interface","author":"Thakur Rajeev","year":"2007","unstructured":"Rajeev Thakur and William Gropp. 2007. Test suite for evaluating performance of MPI implementations that support MPI_THREAD_MULTIPLE. In Recent Advances in Parallel Virtual Machine and Message Passing Interface. 
Franck Cappello, Thomas Herault, and Jack Dongarra (Eds.), Springer Berlin Heidelberg, Berlin, 46\u201355."},{"key":"e_1_3_2_111_2","doi-asserted-by":"publisher","DOI":"10.1145\/3547142"},{"key":"e_1_3_2_112_2","doi-asserted-by":"publisher","DOI":"10.1260\/1748-3018.5.2.199"},{"key":"e_1_3_2_113_2","doi-asserted-by":"publisher","DOI":"10.1145\/224057.224061"},{"key":"e_1_3_2_114_2","first-page":"1","volume-title":"Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing","author":"Wheeler Kyle B.","year":"2008","unstructured":"Kyle B. Wheeler, Richard C. Murphy, and Douglas Thain. 2008. Qthreads: An API for programming with millions of lightweight threads. In Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing. 1\u20138. DOI:10.1109\/IPDPS.2008.4536359"},{"key":"e_1_3_2_115_2","volume-title":"Runtime Techniques for Efficient Execution of Virtualized, Migratable MPI Ranks","author":"White Sam","year":"2022","unstructured":"Sam White. 2022. Runtime Techniques for Efficient Execution of Virtualized, Migratable MPI Ranks. Ph. D. Dissertation. University of Illinois at Urbana-Champaign."},{"key":"e_1_3_2_116_2","volume-title":"Single Address Space Operating Systems","author":"Wilkinson Tim","year":"1995","unstructured":"Tim Wilkinson, Kevin Murray, Stephen Russel, Gernot Heiser, and Jochen Liedtke. 1995. Single Address Space Operating Systems. Technical Report UNSW-CSE-TR-9504. School of Computer Science and Engineering."},{"key":"e_1_3_2_117_2","article-title":"The SGI Altix 3000 global shared-memory architecture","author":"Woodacre Michael","year":"2003","unstructured":"Michael Woodacre. 2003. The SGI Altix 3000 global shared-memory architecture. 
SGI White Paper (2003).","journal-title":"SGI White Paper"},{"key":"e_1_3_2_118_2","first-page":"220","volume-title":"Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS)","author":"Zheng G.","year":"2011","unstructured":"G. Zheng, S. Negara, C. L. Mendes, L. V. Kale, and E. R. Rodrigues. 2011. Automatic handling of global variables for multi-threaded MPI programs. In Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS). 220\u2013227. DOI:10.1109\/ICPADS.2011.33"},{"key":"e_1_3_2_119_2","doi-asserted-by":"publisher","DOI":"10.1145\/2996190"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3746169","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,1]],"date-time":"2025-09-01T12:53:44Z","timestamp":1756731224000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3746169"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9]]},"references-count":118,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3746169"],"URL":"https:\/\/doi.org\/10.1145\/3746169","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"type":"print","value":"0360-0300"},{"type":"electronic","value":"1557-7341"}],"subject":[],"published":{"date-parts":[[2025,9]]},"assertion":[{"value":"2023-07-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-28","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}