{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T04:11:13Z","timestamp":1751429473178,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"name":"National Key Research and Development Program of China","award":["2023YFB3001504"],"award-info":[{"award-number":["2023YFB3001504"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62302302, and 62232011"],"award-info":[{"award-number":["62302302, and 62232011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>\n            In high-performance computing (HPC), parallelization is essential for improving computational efficiency as data and computation scales exceed single-node capacity. Existing methods, such as the polyhedral model used in\n            <jats:sc>Pluto<\/jats:sc>\n            -Distmem, focus on loop and array optimizations within shared memory but struggle with high communication overheads and inflexibility in distributed environments. These methods often fail to effectively partition computation and manage data across nodes, leading to suboptimal performance.\n          <\/jats:p>\n          <jats:p>\n            This paper presents\n            <jats:sc>Arachne<\/jats:sc>\n            , an innovative system designed to address these shortcomings by generating distributed parallel code with minimized communication overhead. The system introduces a dynamic programming algorithm to optimally distribute computational tasks across multiple processes, ensuring minimal communication costs. It also incorporates user-friendly compiler directives, allowing programmers to influence code generation easily and accommodate a broader range of parallelization scenarios without needing in-depth knowledge of parallel architectures.\n            <jats:sc>Arachne<\/jats:sc>\n            significantly reduces the learning curve and need for extensive code modifications, making parallel programming more accessible and efficient. Evaluation of various HPC benchmarks demonstrates that\n            <jats:sc>Arachne<\/jats:sc>\n            outperforms existing methods by reducing communication overhead, lowering memory requirements, and supporting more complex parallel logic, thus enhancing the overall scalability and efficiency of HPC applications.\n          <\/jats:p>","DOI":"10.1145\/3716871","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T11:27:26Z","timestamp":1739273246000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["ARACHNE: Optimizing Distributed Parallel Applications with Reduced Inter-Process Communication"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-8257-1230","authenticated-orcid":false,"given":"Yifu","family":"He","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1561-5329","authenticated-orcid":false,"given":"Han","family":"Zhao","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6646-5260","authenticated-orcid":false,"given":"Weihao","family":"Cui","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0802-7203","authenticated-orcid":false,"given":"Shulai","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5832-0347","authenticated-orcid":false,"given":"Quan","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0034-2302","authenticated-orcid":false,"given":"Minyi","family":"Guo","sequence":"additional","affiliation":[{"name":"Computer Science, Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,30]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"2012. PolyBench. https:\/\/web.cs.ucla.edu\/pouchet\/software\/polybench\/. (2012)."},{"key":"e_1_3_1_3_2","unstructured":"2022. NPB-CPP. https:\/\/github.com\/GMAP\/NPB-CPP. (2022)."},{"key":"e_1_3_1_4_2","unstructured":"2023. Integer Set Library. https:\/\/libisl.sourceforge.io\/. (2023)."},{"key":"e_1_3_1_5_2","unstructured":"2023. NAS Parallel Benchmark. https:\/\/www.nas.nasa.gov\/software\/npb.html. (2023)."},{"key":"e_1_3_1_6_2","unstructured":"2024. Clang LibTooling. https:\/\/clang.llvm.org\/docs\/LibTooling.html. (2024)."},{"key":"e_1_3_1_7_2","unstructured":"2024. LLVM Pass. https:\/\/llvm.org\/docs\/WritingAnLLVMPass.html. (2024)."},{"key":"e_1_3_1_8_2","volume-title":"Optimizing Compilers for Modern Architectures: A Dependence-based Approach","author":"Allen Randy","year":"2001","unstructured":"Randy Allen and Ken Kennedy. 2001. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann."},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/155090.155102"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/209936.209954"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/155090.155101"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661197"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2023.12.025"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088174"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503289"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/1229428.1229446"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.5555\/1614191"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.5555\/556139"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2013.6618833"},{"key":"e_1_3_1_21_2","volume-title":"A Polyhedral Approach for Auto-Parallelization using a Distributed Virtual Machine","author":"Montis Damien de","year":"2021","unstructured":"Damien de Montis, Jean-Baptiste Besnard, and Christophe Alias. 2021. A Polyhedral Approach for Auto-Parallelization using a Distributed Virtual Machine. Ph.D. Dissertation. INRIA, LIP-ENS Lyon; Paratools."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/SNPD.2013.38"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01407931"},{"key":"e_1_3_1_24_2","volume-title":"Designing and Building Parallel Programs - Concepts and Tools for Parallel Software Engineering","author":"Foster Ian T.","year":"1995","unstructured":"Ian T. Foster. 1995. Designing and Building Parallel Programs - Concepts and Tools for Parallel Software Engineering. Addison-Wesley."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/J.PARCO.2006.07.003"},{"key":"e_1_3_1_26_2","volume-title":"Automatic Parallelization of Loop Programs for Distributed Memory Architectures","author":"Griebl Martin","year":"2004","unstructured":"Martin Griebl et\u00a0al. 2004. Automatic Parallelization of Loop Programs for Distributed Memory Architectures. Univ. Passau, Passau, Germany."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/2743016"},{"key":"e_1_3_1_28_2","volume-title":"Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques, Sanjay Rajopadhye and Sven Verdoolaege (Eds.). Vienna, Austria","author":"Guo Jing","year":"2014","unstructured":"Jing Guo, Robert Bernecky, Jeyarajan Thiyagalingam, and Sven-Bodo Scholz. 2014. Polyhedral methods for improving parallel update-in-place. In Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques, Sanjay Rajopadhye and Sven Verdoolaege (Eds.). Vienna, Austria."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.5555\/2048577.2048584"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-019-00640-3"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/73560.73588"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","unstructured":"Caigui Jiang Chengcheng Tang Amir Vaxman Peter Wonka and Helmut Pottmann. 2015. Polyhedral patterns. 34 6 Article 172 (Nov.2015) 12 pages. 10.1145\/2816795.2818077","DOI":"10.1145\/2816795.2818077"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607096"},{"key":"e_1_3_1_34_2","first-page":"1","volume-title":"GROW 2009: 1st International Workshop on GCC Research Opportunities","author":"Kouadri-Mostefaoui Abdellah","year":"2009","unstructured":"Abdellah Kouadri-Mostefaoui, Daniel Millot, Christian Parrot, and Fr\u00e9d\u00e9rique Silber-Chaussumier. 2009. Prototyping the automatic generation of MPI code from OpenMP programs in GCC. In GROW 2009: 1st International Workshop on GCC Research Opportunities. 1\u201311."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.5555\/156619"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-16-1483-5_9"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2401.02180"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.14"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","unstructured":"Mahesh Ravishankar Roshan Dathathri Venmugil Elango Louis-No\u00ebl Pouchet J. Ramanujam Atanas Rountev and P. Sadayappan. 2015. Distributed memory code generation for mixed irregular\/regular computations(PPoPP 2015). Association for Computing Machinery New York NY USA 65\u201375. 10.1145\/2688500.2688515","DOI":"10.1145\/2688500.2688515"},{"key":"e_1_3_1_40_2","article-title":"OMP2MPI: Automatic MPI code generation from OpenMP programs","volume":"1502","author":"Sa\u00e0-Garriga Albert","year":"2015","unstructured":"Albert Sa\u00e0-Garriga, David Castells-Rufas, and Jordi Carrabina. 2015. OMP2MPI: Automatic MPI code generation from OpenMP programs. CoRR abs\/1502.02921 (2015). arXiv:1502.02921http:\/\/arxiv.org\/abs\/1502.02921","journal-title":"CoRR"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3624062.3624063"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPDC51135.2020.00016"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/181181.181261"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.5555\/353939"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523437"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00044"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","unstructured":"Wei Zuo Yun Liang Peng Li Kyle Rupnow Deming Chen and Jason Cong. 2013. Improving high level synthesis optimization opportunity through polyhedral transformations(FPGA\u201913). Association for Computing Machinery New York NY USA 9\u201318. 10.1145\/2435264.2435271","DOI":"10.1145\/2435264.2435271"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3716871","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T11:11:13Z","timestamp":1751368273000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716871"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":46,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3716871"],"URL":"https:\/\/doi.org\/10.1145\/3716871","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,6,30]]},"assertion":[{"value":"2024-05-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}