{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T16:30:21Z","timestamp":1773246621459,"version":"3.50.1"},"reference-count":65,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,21]],"date-time":"2025-03-21T00:00:00Z","timestamp":1742515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["#62302416"],"award-info":[{"award-number":["#62302416"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Guangzhou-HKUST(GZ) Joint Funding Program","award":["#2023A03J0138"],"award-info":[{"award-number":["#2023A03J0138"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>\n            Hypergraph partitioning finds practical applications in various fields, such as high-performance computing and circuit partitioning in VLSI physical design, where high-performance solutions often demand substantial parallelism beyond what existing CPU-based solutions can offer. While GPUs are promising in this regard, their potential in hypergraph partitioning remains unexplored. In this work, we first develop an end-to-end deterministic hypergraph partitioner on GPUs, ported from state-of-the-art multi-threaded CPU work, and identify three major performance challenges by characterizing its performance. We propose the first end-to-end solution,\n            <jats:sc>gHyPart<\/jats:sc>\n            , to unleash the potentials of hypergraph partitioning on GPUs. 
To overcome the challenges of GPU thread underutilization due to imbalanced workload, long critical path, and high work complexity due to excessive operations, we redesign GPU algorithms with diverse parallelization strategies thus expanding optimization space; to address the challenge of no one-size-fits-all implementation for various input hypergraphs, we propose a decision tree-based strategy to choose a suitable parallelization strategy for each kernel. Evaluation on 500 hypergraphs shows up to 125.7\u00d7 (17.5\u00d7 on average), 640.0\u00d7 (24.2\u00d7 on average), and 171.6\u00d7 (1.4\u00d7 on average) speedups over two CPU partitioners and our GPU baseline\n            <jats:sc>gHyPart-B<\/jats:sc>\n            , respectively.\n          <\/jats:p>\n          <jats:p\/>","DOI":"10.1145\/3711925","type":"journal-article","created":{"date-parts":[[2025,1,10]],"date-time":"2025-01-10T11:21:26Z","timestamp":1736508086000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["gHyPart: GPU-friendly End-to-End Hypergraph Partitioner"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-6348-5211","authenticated-orcid":false,"given":"Zhenlin","family":"Wu","sequence":"first","affiliation":[{"name":"The Hong Kong University of Science and Technology - Guangzhou Campus, Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4735-4015","authenticated-orcid":false,"given":"Haosong","family":"Zhao","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology - Guangzhou Campus, Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6961-6394","authenticated-orcid":false,"given":"Hongyuan","family":"Liu","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology - Guangzhou Campus, Guangzhou, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3440-1905","authenticated-orcid":false,"given":"Wujie","family":"Wen","sequence":"additional","affiliation":[{"name":"North Carolina State University at Raleigh, Raleigh, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1270-4147","authenticated-orcid":false,"given":"Jiajia","family":"Li","sequence":"additional","affiliation":[{"name":"North Carolina State University at Raleigh, Raleigh, United States"}]}],"member":"320","published-online":{"date-parts":[[2025,3,21]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/274535.274546"},{"key":"e_1_3_3_3_2","volume-title":"Proceedings of SAT Competition 2014: Solver and Benchmark Descriptions","author":"J\u00e4rvisalo Anton Belov, Daniel Diepold, Marijn J. H. Heule, and Matti","year":"2014","unstructured":"Anton Belov, Daniel Diepold, Marijn J. H. Heule, and Matti J\u00e4rvisalo. 2014. The 2014 sat competition benchmarks. In Proceedings of SAT Competition 2014: Solver and Benchmark Descriptions."},{"key":"e_1_3_3_4_2","volume-title":"International Conference on High Performance Computing & Simulation (HPCS\u201919)","author":"Goswami Bahareh Goodarzi, Farzad Khorasani, Vivek Sarkar, and Dhrubajyoti","year":"2019","unstructured":"Bahareh Goodarzi, Farzad Khorasani, Vivek Sarkar, and Dhrubajyoti Goswami. 2019. High performance multilevel graph partitioning on GPU. In International Conference on High Performance Computing & Simulation (HPCS\u201919)."},{"key":"e_1_3_3_5_2","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW\u201916)","author":"Goswami Bahareh Goodarzi, Martin Burtscher, and Dhrubajyoti","year":"2016","unstructured":"Bahareh Goodarzi, Martin Burtscher, and Dhrubajyoti Goswami. 2016. Parallel graph partitioning on a CPU-GPU architecture. 
In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW\u201916)."},{"key":"e_1_3_3_6_2","doi-asserted-by":"crossref","DOI":"10.1145\/227234.227246","article-title":"Programming parallel algorithms","author":"Blelloch Guy E.","year":"1996","unstructured":"Guy E. Blelloch. 1996. Programming parallel algorithms. Communications of the ACM 39, 3 (1996), 85\u201397.","journal-title":"Communications of the ACM"},{"key":"e_1_3_3_7_2","volume-title":"Proceedings of the 1995 ACM\/IEEE Conference on Supercomputing (SC\u201995)","author":"Leland Bruce Hendrickson and Robert","year":"1995","unstructured":"Bruce Hendrickson and Robert Leland. 1995. A multilevel algorithm for partitioning graphs. In Proceedings of the 1995 ACM\/IEEE Conference on Supercomputing (SC\u201995)."},{"key":"e_1_3_3_8_2","volume-title":"IEEE\/ACM International Conference on Computer Aided Design (ICCAD\u201923)","author":"Bustany Ismail","year":"2023","unstructured":"Ismail Bustany, Grigor Gasparyan, Andrew B. Kahng, Ioannis Koutis, Bodhisatta Pramanik, and Zhiang Wang. 2023. An open-source constraints-driven general partitioning multi-tool for VLSI physical design. In IEEE\/ACM International Conference on Computer Aided Design (ICCAD\u201923)."},{"key":"e_1_3_3_9_2","volume-title":"IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201917)","author":"Catalyurek Umit V.","year":"2007","unstructured":"Umit V. Catalyurek, Erik G. Boman, Karen D. Devine, Doruk Bozdag, Robert Heaphy, and Lee Ann Riesen. 2007. Hypergraph-based dynamic load balancing for adaptive scientific computations. 
In IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201917)."},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.5555\/832285.835618"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/157485.165119"},{"key":"e_1_3_3_12_2","volume-title":"IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201906)","author":"Devine Karen D.","year":"2006","unstructured":"Karen D. Devine, Erik G. Boman, Robert T. Heaphy, Rob H. Bisseling, and Umit V. Catalyurek. 2006. Parallel hypergraph partitioning for scientific computing. In IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201906)."},{"key":"e_1_3_3_13_2","volume-title":"Technical Manual, Department of Computer Science, University of Minnesota","author":"Kumar George Karypis and Vipin","year":"1998","unstructured":"George Karypis and Vipin Kumar. 1998. hMetis: A hypergraph partitioning package, version 1.5.3. In Technical Manual, Department of Computer Science, University of Minnesota."},{"key":"e_1_3_3_14_2","volume-title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","author":"Shekhar George Karypis, Rajat Aggarwal, Vipin Kumar, and Shashi","year":"1999","unstructured":"George Karypis, Rajat Aggarwal, Vipin Kumar, and Shashi Shekhar. 1999. Multilevel hypergraph partitioning applications in VLSI domain. In IEEE Transactions on Very Large Scale Integration (VLSI) Systems."},{"key":"e_1_3_3_15_2","volume-title":"IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201921)","author":"Gilbert Michael S.","year":"2021","unstructured":"Michael S. Gilbert, Seher Acer, Erik G. Boman, Kamesh Madduri, and Sivasankaran Rajamanickam. 2021. Performance-portable graph coarsening for efficient multilevel graph analysis. 
In IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201921)."},{"key":"e_1_3_3_16_2","article-title":"Scalable high-quality hypergraph partitioning","author":"Gottesb\u00fcren Lars","year":"2023","unstructured":"Lars Gottesb\u00fcren, Tobias Heuer, Nikolai Maas, Peter Sanders, and Sebastian Schlag. 2023. Scalable high-quality hypergraph partitioning. ACM Transactions on Algorithms 20, 1, Article No.: 9 (2023), 1\u201354.","journal-title":"ACM Transactions on Algorithms"},{"key":"e_1_3_3_17_2","volume-title":"28th International Conference on Parallel and Distributed Computing, Proceedings (Euro-Par\u201922)","author":"Gottesb\u00fcren Lars","year":"2022","unstructured":"Lars Gottesb\u00fcren and Michael Hamann. 2022. Deterministic parallel hypergraph partitioning. In 28th International Conference on Parallel and Distributed Computing, Proceedings (Euro-Par\u201922)."},{"key":"e_1_3_3_18_2","doi-asserted-by":"crossref","unstructured":"Johnnie Gray and Stefanos Kourtis. 2021. Hyper-optimized tensor network contraction. Quantum 5 (2021) 410.","DOI":"10.22331\/q-2021-03-15-410"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3400302.3415631"},{"key":"e_1_3_3_20_2","volume-title":"Communications of the ACM","author":"Blelloch Guy E.","year":"1996","unstructured":"Guy E. Blelloch. 1996. Programming parallel algorithms. In Communications of the ACM."},{"key":"e_1_3_3_21_2","volume-title":"Hypergraph-Based Optimisations for Scalable Graph Analytics and Learning","author":"Haldar Aparajita","year":"2022","unstructured":"Aparajita Haldar. 2022. Hypergraph-Based Optimisations for Scalable Graph Analytics and Learning. Ph. D. Dissertation. University of Warwick."},{"key":"e_1_3_3_22_2","volume-title":"IEEE Transactions on Very Large Scale Integration Systems","author":"Kim Hyunchul Shin and Chunghee","year":"1993","unstructured":"Hyunchul Shin and Chunghee Kim. 1993. A simple yet effective technique for partitioning. 
In IEEE Transactions on Very Large Scale Integration Systems."},{"key":"e_1_3_3_23_2","article-title":"HyperX: A scalable hypergraph framework","author":"Jiang Wenkai","year":"2018","unstructured":"Wenkai Jiang, Jianzhong Qi, Jeffrey Xu Yu, Jin Huang, and Rui Zhang. 2018. HyperX: A scalable hypergraph framework. IEEE Transactions on Knowledge and Data Engineering (TKDE) 31, 5 (2018), 909\u2013922.","journal-title":"IEEE Transactions on Knowledge and Data Engineering (TKDE)"},{"key":"e_1_3_3_24_2","volume-title":"Addison-Wesley","author":"J\u00e1J\u00e1 Joseph","year":"1992","unstructured":"Joseph J\u00e1J\u00e1. 1992. An introduction to parallel algorithms. In Addison-Wesley."},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3332466.3374527"},{"key":"e_1_3_3_26_2","volume-title":"Proceedings of the 20th International Conference on Parallel and Distributed Processing (IPDPS\u201906)","author":"\u00c7ataly\u00fcrek Karen D. Devine, Erik G. Boman, Robert T. Heaphy, Rob H. Bisseling, and \u00dcmit V.","year":"2006","unstructured":"Karen D. Devine, Erik G. Boman, Robert T. Heaphy, Rob H. Bisseling, and \u00dcmit V. \u00c7ataly\u00fcrek. 2006. Parallel hypergraph partitioning for scientific computing. In Proceedings of the 20th International Conference on Parallel and Distributed Processing (IPDPS\u201906)."},{"key":"e_1_3_3_27_2","volume-title":"Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT\u201913)","author":"Kay\u0131ran Onur","year":"2013","unstructured":"Onur Kay\u0131ran, Adwait Jog, Mahmut T. Kandemir, and Chita R. Das. 2013. Neither more nor less: Optimizing thread-level parallelism for GPGPUs. 
In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT\u201913)."},{"key":"e_1_3_3_28_2","volume-title":"Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP\u201919)","author":"Sun Ke Meng, Jiajia Li, Guangming Tan, and Ninghui","year":"2019","unstructured":"Ke Meng, Jiajia Li, Guangming Tan, and Ninghui Sun. 2019. A pattern based algorithmic autotuner for graph processing on GPUs. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP\u201919)."},{"key":"e_1_3_3_29_2","volume-title":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM\u201916)","author":"Mirrokni Kevin Aydin, MohammadHossein Bateni, and Vahab","year":"2016","unstructured":"Kevin Aydin, MohammadHossein Bateni, and Vahab Mirrokni. 2016. Distributed balanced partitioning via linear embedding. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM\u201916)."},{"key":"e_1_3_3_30_2","volume-title":"Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX\u201921)","author":"Schlag Lars Gottesb\u00fcren, Tobias Heuer, Peter Sanders, and Sebastian","year":"2021","unstructured":"Lars Gottesb\u00fcren, Tobias Heuer, Peter Sanders, and Sebastian Schlag. 2021. Scalable shared-memory hypergraph partitioning. In Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX\u201921)."},{"key":"e_1_3_3_31_2","volume-title":"IEEE\/ACM International Conference on Computer-Aided Design (ICCAD\u201992)","author":"Kahng Lars Hagen and Andrew B.","year":"1992","unstructured":"Lars Hagen and Andrew B. Kahng. 1992. A new approach to effective circuit clustering. 
In IEEE\/ACM International Conference on Computer-Aided Design (ICCAD\u201992)."},{"key":"e_1_3_3_32_2","volume-title":"IEEE\/ACM Asia and South Pacific Design Automation Conference (ASP-DAC\u201925)","author":"Lee Wan-Luan","year":"2025","unstructured":"Wan-Luan Lee, Dian-Lun Lin, Cheng-Hsiang Chiu, Ulf Schlichtmann, and Tsung-Wei Huang. 2025. HyperG: Multilevel GPU-Accelerated k-way hypergraph partitioner. In IEEE\/ACM Asia and South Pacific Design Automation Conference (ASP-DAC\u201925)."},{"key":"e_1_3_3_33_2","doi-asserted-by":"crossref","unstructured":"Wan Luan Lee Dian-Lun Lin Tsung-Wei Huang Shui Jiang Tsung-Yi Ho Yibo Lin and Bei Yu. 2024. G-kway: Multilevel GPU-accelerated k-way graph partitioner. In Proceedings of the 61st ACM\/IEEE Design Automation Conference. 1\u20136.","DOI":"10.1145\/3649329.3656238"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626184.3633319"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD51958.2021.9643563"},{"key":"e_1_3_3_36_2","doi-asserted-by":"crossref","unstructured":"Yibo Lin Shounak Dhar Wuxi Li Haoxing Ren Brucek Khailany and David Z. Pan. 2019. Dreamplace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement. In Proceedings of the 56th Annual Design Automation Conference. 2019 1\u20136.","DOI":"10.1145\/3316781.3317803"},{"key":"e_1_3_3_37_2","volume-title":"Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC\u201915)","author":"Yoon Lin Cheng, Hyunsu Cho, and Peter","year":"2015","unstructured":"Lin Cheng, Hyunsu Cho, and Peter Yoon. 2015. An accelerated procedure for hypergraph coarsening on the GPU. In Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC\u201915)."},{"key":"e_1_3_3_38_2","volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC\u201919)","author":"Liu Hang","year":"2019","unstructured":"Hang Liu and H Howie Huang. 2019. 
SIMD-X: Programming and processing of graph algorithms on GPUs. In 2019 USENIX Annual Technical Conference (USENIX ATC\u201919)."},{"key":"e_1_3_3_39_2","article-title":"NCTU-GR 2.0: Multithreaded collision-aware global routing with bounded-length maze routing","author":"Liu Wen-Hao","year":"2013","unstructured":"Wen-Hao Liu, Wei-Chun Kao, Yih-Lang Li, and Kai-Yuan Chao. 2013. NCTU-GR 2.0: Multithreaded collision-aware global routing with bounded-length maze routing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 5 (2013), 709\u201372.","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems"},{"key":"e_1_3_3_40_2","volume-title":"Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201921)","author":"Maleki Sepideh","year":"2021","unstructured":"Sepideh Maleki, Udit Agarwal, Martin Burtscher, and Keshav Pingali. 2021. BiPart: A parallel and deterministic hypergraph partitioner. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201921)."},{"key":"e_1_3_3_41_2","unstructured":"Maven Silicon. 2023. Introduction to Machine Learning in VLSI. Retrieved from https:\/\/www.maven-silicon.com\/blog\/machine-learning-in-vlsi\/"},{"key":"e_1_3_3_42_2","first-page":"117","volume-title":"Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201912)","author":"Merrill Duane","year":"2012","unstructured":"Duane Merrill, Michael Garland, and Andrew Grimshaw. 2012. Scalable GPU graph traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201912). 
117\u2013128."},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3673038.3673152"},{"key":"e_1_3_3_44_2","volume-title":"Proceedings of the 49th Annual Design Automation Conference (DAC\u201912)","author":"Wei Natarajan Viswanathan, Charles Alpert, Cliff Sze, Zhuo Li, and Yaoguang","year":"2012","unstructured":"Natarajan Viswanathan, Charles Alpert, Cliff Sze, Zhuo Li, and Yaoguang Wei. 2012. The DAC 2012 routability-driven placement contest and benchmark suite. In Proceedings of the 49th Annual Design Automation Conference (DAC\u201912)."},{"key":"e_1_3_3_45_2","volume-title":"NVIDIA Technical Report","author":"Naumov Maxim","year":"2016","unstructured":"Maxim Naumov and Timothy Moon. 2016. Parallel spectral graph partitioning. In NVIDIA Technical Report."},{"key":"e_1_3_3_46_2","unstructured":"NVIDIA Corporation. 2021. NVIDIA AMPERE GA102 GPU ARCHITECTURE. Retrieved from https:\/\/www.nvidia.com\/content\/PDF\/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.1.pdf"},{"key":"e_1_3_3_47_2","unstructured":"NVIDIA Corporation. 2023. NVIDIA ADA GPU ARCHITECTURE. Retrieved from https:\/\/images.nvidia.com\/aem-dam\/Solutions\/Data-Center\/l4\/nvidia-ada-gpu-architecture-whitepaper-v2.1.pdf"},{"key":"e_1_3_3_48_2","unstructured":"NVIDIA Research. 2023. CUB: Cooperative primitives for CUDA C++. Release: 2.0.1.. In https:\/\/github.com\/NVIDIA\/cub"},{"key":"e_1_3_3_49_2","volume-title":"Proceedings of the International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA\u201916)","author":"Pai Sreepathi","year":"2016","unstructured":"Sreepathi Pai and Keshav Pingali. 2016. A compiler for throughput optimization of graph algorithms on GPUs. 
In Proceedings of the International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA\u201916)."},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.291466"},{"key":"e_1_3_3_51_2","volume-title":"ACM Journal of Experimental Algorithmics","author":"Sanders Sebastian Schlag, Tobias Heuer, Lars Gottesb\u00fcren, Yaroslav Akhremtsev, Christian Schulz, and Peter","year":"2023","unstructured":"Sebastian Schlag, Tobias Heuer, Lars Gottesb\u00fcren, Yaroslav Akhremtsev, Christian Schulz, and Peter Sanders. 2023. High-quality hypergraph partitioning. In ACM Journal of Experimental Algorithmics."},{"key":"e_1_3_3_52_2","volume-title":"Proceedings of the 18th Workshop on Algorithm Engineering and Experiments (ALENEX\u201916)","author":"Schulz Sebastian Schlag, Vitali Henne, Tobias Heuer, Henning Meyerhenke, Peter Sanders, and Christian","year":"2016","unstructured":"Sebastian Schlag, Vitali Henne, Tobias Heuer, Henning Meyerhenke, Peter Sanders, and Christian Schulz. 2016. k-way hypergraph partitioning via n-Level recursive bisection. In Proceedings of the 18th Workshop on Algorithm Engineering and Experiments (ALENEX\u201916)."},{"key":"e_1_3_3_53_2","volume-title":"IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW\u201920)","author":"Rajamanickam Seher Acer, Erik G. Boman, and Sivasankaran","year":"2020","unstructured":"Seher Acer, Erik G. Boman, and Sivasankaran Rajamanickam. 2020. SPHYNX: Spectral partitioning for HYbrid aNd aXelerator-enabled systems. In IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW\u201920)."},{"key":"e_1_3_3_54_2","volume-title":"Parallel Computing","author":"Rajamanickam Seher Acer, Erik G. Boman, Christian A. Glusa, and Sivasankaran","year":"2021","unstructured":"Seher Acer, Erik G. Boman, Christian A. Glusa, and Sivasankaran Rajamanickam. 2021. 
Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems. In Parallel Computing."},{"key":"e_1_3_3_55_2","volume-title":"Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201913)","author":"Shun Julian","year":"2013","unstructured":"Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201913)."},{"key":"e_1_3_3_56_2","article-title":"Parallel local graph clustering","author":"Shun Julian","year":"2016","unstructured":"Julian Shun, Farbod Roosta-Khorasani, Kimon Fountoulakis, and Michael W. Mahoney. 2016. Parallel local graph clustering. Proceedings of the VLDB Endowment 9, 12 (2016), 1041\u20131052.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_3_57_2","volume-title":"IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201915)","author":"Smith Shaden","year":"2015","unstructured":"Shaden Smith, Niranjay Ravindran, Nicholas D Sidiropoulos, and George Karypis. 2015. SPLATT: Efficient and parallel sparse tensor-matrix multiplication. In IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201915)."},{"key":"e_1_3_3_58_2","volume-title":"ACM Transactions on Mathematical Software (TOMS\u201911)","author":"Hu Timothy A. Davis and Yifan","year":"2011","unstructured":"Timothy A. Davis and Yifan Hu. 2011. The university of florida sparse matrix collection. In ACM Transactions on Mathematical Software (TOMS\u201911)."},{"key":"e_1_3_3_59_2","volume-title":"16th International Symposium on Experimental Algorithms (SEA\u201917)","author":"Schlag Tobias Heuer and Sebastian","year":"2017","unstructured":"Tobias Heuer and Sebastian Schlag. 2017. Improving coarsening schemes for hypergraph partitioning by exploiting community structure. 
In 16th International Symposium on Experimental Algorithms (SEA\u201917)."},{"key":"e_1_3_3_60_2","volume-title":"Proceedings of the 9th IEEE International High-Level Design Validation and Test Workshop","author":"Kalla Vijay Durairaj and Priyank","year":"2004","unstructured":"Vijay Durairaj and Priyank Kalla. 2004. Exploiting hypergraph partitioning for efficient boolean satisfiability. In Proceedings of the 9th IEEE International High-Level Design Validation and Test Workshop."},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295733"},{"key":"e_1_3_3_62_2","volume-title":"Proceedings of the 19th Workshop on Algorithm Engineering and Experiments (ALENEX\u201917)","author":"Schlag Yaroslav Akhremtsev, Tobias Heuer, Peter Sanders, and Sebastian","year":"2017","unstructured":"Yaroslav Akhremtsev, Tobias Heuer, Peter Sanders, and Sebastian Schlag. 2017. Engineering a direct k-way hypergraph partitioning algorithm. In Proceedings of the 19th Workshop on Algorithm Engineering and Experiments (ALENEX\u201917)."},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.5555\/AAI28240362"},{"key":"e_1_3_3_64_2","volume-title":"IEEE Transactions on Parallel and Distributed Systems (TPDS\u201999)","author":"Aykanat \u00dcmit V. \u00c7ataly\u00fcrek and Cevdet","year":"1999","unstructured":"\u00dcmit V. \u00c7ataly\u00fcrek and Cevdet Aykanat. 1999. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. In IEEE Transactions on Parallel and Distributed Systems (TPDS\u201999)."},{"key":"e_1_3_3_65_2","volume-title":"Technical Manual","author":"Aykanat \u00dcmit V. \u00c7ataly\u00fcrek and Cevdet","year":"1999","unstructured":"\u00dcmit V. \u00c7ataly\u00fcrek and Cevdet Aykanat. 1999. PaToH: Partitioning tool for hypergraphs. 
In Technical Manual."},{"key":"e_1_3_3_66_2","volume-title":"ACM Computing Surveys","author":"Wagner \u00dcmit \u00c7ataly\u00fcrek, Karen Devine, Marcelo Faraj, Lars Gottesb\u00fcren, Tobias Heuer, Henning Meyerhenke, Peter Sanders, Sebastian Schlag, Christian Schulz, Daniel Seemaier, and Dorothea","year":"2023","unstructured":"\u00dcmit \u00c7ataly\u00fcrek, Karen Devine, Marcelo Faraj, Lars Gottesb\u00fcren, Tobias Heuer, Henning Meyerhenke, Peter Sanders, Sebastian Schlag, Christian Schulz, Daniel Seemaier, and Dorothea Wagner. 2023. More recent advances in (Hyper)Graph partitioning. In ACM Computing Surveys."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3711925","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3711925","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:10Z","timestamp":1750295890000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3711925"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,21]]},"references-count":65,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3711925"],"URL":"https:\/\/doi.org\/10.1145\/3711925","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,21]]},"assertion":[{"value":"2024-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-25","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-03-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}