{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T07:42:10Z","timestamp":1774942930026,"version":"3.50.1"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","license":[{"start":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T00:00:00Z","timestamp":1601424000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100002428","name":"Austrian Science Fund","doi-asserted-by":"publisher","award":["P 31763-N31"],"award-info":[{"award-number":["P 31763-N31"]}],"id":[{"id":"10.13039\/501100002428","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["SA 933\/11-1"],"award-info":[{"award-number":["SA 933\/11-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ACM J. Exp. Algorithmics"],"published-print":{"date-parts":[[2020,12,6]]},"abstract":"<jats:p>Communication and topology-aware process mapping is a powerful approach to reduce communication time in parallel applications with known communication patterns on large, distributed memory systems. We address the problem as a quadratic assignment problem (QAP) and present algorithms to construct initial mappings of processes to processors and fast local search algorithms to further improve the mappings. By exploiting assumptions that typically hold for applications and modern supercomputer systems such as sparse communication patterns and hierarchically organized communication systems, we obtain significantly more powerful algorithms for these special QAPs. Our multilevel construction algorithms employ perfectly balanced graph partitioning techniques and exploit the given communication system hierarchy in significant ways. We present improvements to a local search algorithm of Brandfass et\u00a0al. (2013) and further decrease the running time by reducing the time needed to perform swaps in the assignment as well as by carefully constraining local search neighborhoods. We also investigate different algorithms to create the communication graph that is mapped onto the processor network. Experiments indicate that our algorithms not only dramatically speed up local search but also, due to the multilevel approach, find much better solutions in practice.<\/jats:p>","DOI":"10.1145\/3409667","type":"journal-article","created":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T11:13:41Z","timestamp":1601464421000},"page":"1-19","source":"Crossref","is-referenced-by-count":3,"title":["Better Process Mapping and Sparse Quadratic Assignment"],"prefix":"10.1145","volume":"25","author":[{"given":"Konrad Von","family":"Kirchbach","sequence":"first","affiliation":[{"name":"TU Wien, Faculty of Informatics, Vienna, Austria"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2823-3506","authenticated-orcid":false,"given":"Christian","family":"Schulz","sequence":"additional","affiliation":[{"name":"University of Vienna, Faculty of Computer Science, Vienna, Austria"}]},{"given":"Jesper Larsson","family":"Tr\u00e4ff","sequence":"additional","affiliation":[{"name":"TU Wien, Faculty of Informatics, Vienna, Austria"}]}],"member":"320","published-online":{"date-parts":[[2020,9,30]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201914)","author":"Abdel-Gawad A. H."},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"D. A. Bader H. Meyerhenke P. Sanders C. Schulz A. Kappes and D. Wagner. 2014. Benchmarking for graph clustering and partitioning. In Encyclopedia of Social Network Analysis and Mining. Springer 73--82.  D. A. Bader H. Meyerhenke P. Sanders C. Schulz A. Kappes and D. Wagner. 2014. Benchmarking for graph clustering and partitioning. In Encyclopedia of Social Network Analysis and Mining. Springer 73--82.","DOI":"10.1007\/978-1-4614-6170-8_23"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"M. A.\n      Bender\n     and \n      M.\n      Farach-Colton\n  . \n  2000\n  . The LCA problem revisited. In Proceedings of the Latin American Symposium on Theoretical Informatics Lecture Notes in Computer Science Vol. \n  1776\n  . \n  Springer 88--94.  M. A. Bender and M. Farach-Colton. 2000. The LCA problem revisited. In Proceedings of the Latin American Symposium on Theoretical Informatics Lecture Notes in Computer Science Vol. 1776. Springer 88--94.","DOI":"10.1007\/10719839_9"},{"key":"e_1_2_1_4_1","unstructured":"C. Bichot and P. Siarry (Eds.). 2011. Graph Partitioning. Wiley.  C. Bichot and P. Siarry (Eds.). 2011. Graph Partitioning. Wiley."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compfluid.2012.01.019"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"A.\n      Bulu\u00e7 H.\n      Meyerhenke I.\n      Safro P.\n      Sanders and \n      C.\n      Schulz\n  . \n  2016\n  . Recent advances in graph partitioning. \n  In Algorithm\u00a0Engineering\u2014Selected Results and Surveys Lecture Notes in Computer Science Vol. \n  9220\n  . 117--158. DOI:https:\/\/doi.org\/10.1007\/978-3-319-49487-6_4  A. Bulu\u00e7 H. Meyerhenke I. Safro P. Sanders and C. Schulz. 2016. Recent advances in graph partitioning. In Algorithm\u00a0Engineering\u2014Selected Results and Surveys Lecture Notes in Computer Science Vol. 9220. 117--158. DOI:https:\/\/doi.org\/10.1007\/978-3-319-49487-6_4","DOI":"10.1007\/978-3-319-49487-6_4"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"R. E Burkard E. Cela P. M. Pardalos and L. S. Pitsoulis. 1998. The quadratic assignment problem. In Handbook of Combinatorial Optimization. Springer 1713--1809.  R. E Burkard E. Cela P. M. Pardalos and L. S. Pitsoulis. 1998. The quadratic assignment problem. In Handbook of Combinatorial Optimization. Springer 1713--1809.","DOI":"10.1007\/978-1-4613-0303-9_27"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"\u00dc. \n      V.\n      \u00c7ataly\u00fcrek\n     and \n      C.\n      Aykanat\n  . \n  1996\n  . Decomposing irregularly sparse matrices for parallel matrix-vector multiplication. In Proceedings of the 3rd International Workshop on Parallel Algorithms for Irregularly Structured Problems Lecture Notes in Computer Science Vol. \n  1117\n  . \n  Springer 75--86.  \u00dc. V. \u00c7ataly\u00fcrek and C. Aykanat. 1996. Decomposing irregularly sparse matrices for parallel matrix-vector multiplication. In Proceedings of the 3rd International Workshop on Parallel Algorithms for Irregularly Structured Problems Lecture Notes in Computer Science Vol. 1117. Springer 75--86.","DOI":"10.1007\/BFb0030098"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_2_1_10_1","volume-title":"Algorithmics of Large and Complex Networks. LNCS State-of-the-Art Survey","volume":"5515","author":"Delling D."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 19th Conference on Design Automation. 175--181","author":"Fiduccia C. M."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"J.\n      Fietz M.\n      Krause C.\n      Schulz P.\n      Sanders and \n      V.\n      Heuveline\n  . \n  2012\n  . Optimized hybrid parallel lattice boltzmann fluid flow simulations on complex geometries. In Proceedings of the European Conference on Parallel Processing (Euro-Par\u201912) Lecture Notes in Computer Science Vol. \n  7484\n  . \n  Springer 818--829.  J. Fietz M. Krause C. Schulz P. Sanders and V. Heuveline. 2012. Optimized hybrid parallel lattice boltzmann fluid flow simulations on complex geometries. In Proceedings of the European Conference on Parallel Processing (Euro-Par\u201912) Lecture Notes in Computer Science Vol. 7484. Springer 818--829.","DOI":"10.1007\/978-3-642-32820-6_81"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 23rd Euromicro Intl. Conference on Parallel, Distributed, and Network-Based Processing. 236--243","author":"Glantz R.","year":"2015"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0056575"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 25th International Conference on Supercomputing (ICS\u201911)","author":"Hoefler T."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1997.1404"},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"G.\n      Mercier\n     and \n      J.\n      Clet-Ortega\n  . \n  2009\n  . Towards an efficient process placement policy for MPI applications in multicore environments. In Proceedings of the European Parallel Virtual Machine\/Message Passing Interface Users\u2019 Group Meeting Lecture Notes in Computer Science Vol. \n  5759\n  . \n  Springer 104--115.  G. Mercier and J. Clet-Ortega. 2009. Towards an efficient process placement policy for MPI applications in multicore environments. In Proceedings of the European Parallel Virtual Machine\/Message Passing Interface Users\u2019 Group Meeting Lecture Notes in Computer Science Vol. 5759. Springer 104--115.","DOI":"10.1007\/978-3-642-03770-2_17"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"G.\n      Mercier\n     and \n      E.\n      Jeannot\n  . \n  2011\n  . Improving MPI applications performance on multicore clusters with rank reordering. In Proceedings of the 18th \n  European MPI Users\u2019 Group Meeting Lecture Notes in Computer Science Vol. \n  6960\n  . 39--49.  G. Mercier and E. Jeannot. 2011. Improving MPI applications performance on multicore clusters with rank reordering. In Proceedings of the 18th European MPI Users\u2019 Group Meeting Lecture Notes in Computer Science Vol. 6960. 39--49.","DOI":"10.1007\/978-3-642-24449-0_7"},{"key":"e_1_2_1_21_1","volume-title":"\u00d6konometrie und Unternehmensforschung","author":"M\u00fcller-Merbach H."},{"key":"e_1_2_1_22_1","unstructured":"F. Pellegrini. [n.d.]. Scotch Home Page. Retrieved from http:\/\/www. labri.fr\/pelegrin\/scotch.  F. Pellegrini. [n.d.]. Scotch Home Page. Retrieved from http:\/\/www. labri.fr\/pelegrin\/scotch."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/321958.321975"},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","unstructured":"P.\n      Sanders\n     and \n      C.\n      Schulz\n  . \n  2011\n  . Engineering multilevel graph partitioning algorithms. In Proceedings of the 19th European Symposium on Algorithms Lecture Notes in Computer Science Vol. \n  6942\n  . \n  Springer 469--480.  P. Sanders and C. Schulz. 2011. Engineering multilevel graph partitioning algorithms. In Proceedings of the 19th European Symposium on Algorithms Lecture Notes in Computer Science Vol. 6942. Springer 469--480.","DOI":"10.1007\/978-3-642-23719-5_40"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 12th International Symposium on Experimental Algorithms (SEA\u201913)","author":"Sanders P."},{"key":"e_1_2_1_26_1","unstructured":"K. Schloegel G. Karypis and V. Kumar. 2003. Graph partitioning for high performance scientific simulations. In The Sourcebook of Parallel Computing. 491--541.  K. Schloegel G. Karypis and V. Kumar. 2003. Graph partitioning for high performance scientific simulations. In The Sourcebook of Parallel Computing. 491--541."},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"C. Schulz and D. Strash. 2019. Graph partitioning: Formulations and applications to big data. In Encyclopedia of Big Data Technologies S. Sakr and A. Y. Zomaya (Eds.). Springer. DOI:https:\/\/doi.org\/10.1007\/978-3-319-63962-8_312-2  C. Schulz and D. Strash. 2019. Graph partitioning: Formulations and applications to big data. In Encyclopedia of Big Data Technologies S. Sakr and A. Y. Zomaya (Eds.). Springer. DOI:https:\/\/doi.org\/10.1007\/978-3-319-63962-8_312-2","DOI":"10.1007\/978-3-319-63962-8_312-2"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:JOGO.0000042115.44455.f3"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1098\/rspa.1935.0134"},{"key":"e_1_2_1_30_1","volume-title":"Implementing the MPI process topology mechanism","author":"Tr\u00e4ff J. L."},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"J. T. Vogelstein J. M. Conroy V. Lyzinski L. J. Podrazik S. G. Kratzer E. T. Harley D. E. Fishkind R. J. Vogelstein and C. E. Priebe. 2015. Fast Approximate Quadratic Programming for Graph Matching. PLoS ONE 10 4 (2015).  J. T. Vogelstein J. M. Conroy V. Lyzinski L. J. Podrazik S. G. Kratzer E. T. Harley D. E. Fishkind R. J. Vogelstein and C. E. Priebe. 2015. Fast Approximate Quadratic Programming for Graph Matching. PLoS ONE 10 4 (2015).","DOI":"10.1371\/journal.pone.0121002"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827598337373"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the ACM\/IEEE Supercomputing. ACM Press, 116","author":"Yu H."}],"container-title":["ACM Journal of Experimental Algorithmics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3409667","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3409667","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:38:41Z","timestamp":1750199921000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3409667"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,30]]},"references-count":32,"alternative-id":["10.1145\/3409667"],"URL":"https:\/\/doi.org\/10.1145\/3409667","relation":{},"ISSN":["1084-6654","1084-6654"],"issn-type":[{"value":"1084-6654","type":"print"},{"value":"1084-6654","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,30]]}}}