{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T21:14:04Z","timestamp":1775942044285,"version":"3.50.1"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2014,10,27]],"date-time":"2014-10-27T00:00:00Z","timestamp":1414368000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100007523","name":"Advanced Cyberinfrastructure","doi-asserted-by":"publisher","award":["0904907, ACI-1148125\/1340293"],"award-info":[{"award-number":["0904907, ACI-1148125\/1340293"]}],"id":[{"id":"10.13039\/100007523","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2014,10,27]]},"abstract":"<jats:p>\n            We present a parallel sparse direct solver for multicore architectures based on Directed Acyclic Graph (DAG) scheduling. Recently, DAG scheduling has become popular in advanced Dense Linear Algebra libraries due to its efficient asynchronous parallel execution of tasks. However, its application to sparse matrix problems is more challenging as it has to deal with an enormous number of highly irregular tasks. This typically results in substantial scheduling overhead both in time and space, which causes overall parallel performance to be suboptimal. We describe a parallel solver based on two-level task parallelism: tasks are first generated from a parallel tree traversal on the assembly tree; next, those tasks are further refined by using\n            <jats:italic>algorithms<\/jats:italic>\n            -\n            <jats:italic>by<\/jats:italic>\n            -\n            <jats:italic>blocks<\/jats:italic>\n            to gain fine-grained parallelism. The resulting fine-grained tasks are asynchronously executed after their dependencies are analyzed. Our approach is distinct from others in that we adopt two-level task scheduling to mirror the two-level parallelism. As a result, we reduce scheduling overhead, and increase efficiency and flexibility. The proposed parallel sparse direct solver is evaluated for the particular problems arising from the\n            <jats:italic>hp<\/jats:italic>\n            -Finite Element Method where conventional sparse direct solvers do not scale well.\n          <\/jats:p>","DOI":"10.1145\/2629641","type":"journal-article","created":{"date-parts":[[2014,10,28]],"date-time":"2014-10-28T12:40:29Z","timestamp":1414500029000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling"],"prefix":"10.1145","volume":"41","author":[{"given":"Kyungjoo","family":"Kim","sequence":"first","affiliation":[{"name":"The University of Texas at Austin"}]},{"given":"Victor","family":"Eijkhout","sequence":"additional","affiliation":[{"name":"The University of Texas at Austin"}]}],"member":"320","published-online":{"date-parts":[[2014,10,27]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479899358194"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2005.07.004"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/76909.76910"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1002\/nme.1620280813"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1137\/1036141"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1137\/0718033"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cma.2009.07.012"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/209936.209958"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of PMAA\u20192012","author":"Bosilca G."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/110846427"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.10.002"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1002\/nme.1620362202"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1810479.1810520"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTR.2007.4629221"},{"key":"e_1_2_1_15_1","volume-title":"Tech. Rep. RAL-TR-96-010, Computing and Information Systems Department, Rutherford Appleton Laboratory.","author":"Damhaug A. C.","year":"1996"},{"key":"e_1_2_1_16_1","volume-title":"Direct Methods for Sparse Linear Systems (Fundamentals of Algorithms 2). Number 0898716136","author":"Davis T. A."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1462173.1462176"},{"key":"e_1_2_1_18_1","volume-title":"Frontiers Three Dimensional Elliptic and Maxwell Problems with Applications","author":"Demkowicz L.","year":"2007"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479895291765"},{"key":"e_1_2_1_20_1","unstructured":"Dongarra J. Faverge M. Ltaief H. and Luszczek P. 2011. Achieving numerical accuracy and high performance using recursive tile LU factorization. Tech. Rep. LAPACK Working Note 259.  Dongarra J. Faverge M. Ltaief H. and Luszczek P. 2011. Achieving numerical accuracy and high performance using recursive tile LU factorization. Tech. Rep. LAPACK Working Note 259."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/77626.79170"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(86)90019-0"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1137\/0713056"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/356044.356047"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 4th International Conference on OpenMP in a New Era of Parallelism. Springer-Verlag, 100--110","author":"Duran A."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01436966"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01407861"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01396660"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1236463.1236465"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(03)00099-1"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing. ACM, 793--802","author":"Gupta A."},{"key":"e_1_2_1_32_1","unstructured":"Gustavson F. G. Jonsson I. K\u00e5gstr\u00f6m B. and Ling P. 1999. Towards peak performance on hierarchical SMP memory architectures - new recursive blocked data formats and BLAS. In Parallel Processing for Scientific Computing 1--4.  Gustavson F. G. Jonsson I. K\u00e5gstr\u00f6m B. and Ling P. 1999. Towards peak performance on hierarchical SMP memory architectures - new recursive blocked data formats and BLAS. In Parallel Processing for Scientific Computing 1--4."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1137\/090757216"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2003.07.007"},{"key":"e_1_2_1_36_1","unstructured":"Lacoste X. Ramet P. Faverge M. Yamazaki I. and Dongarra J. 2012. Sparse direct solvers with accelerators over DAG runtimes. Tech. rep.  Lacoste X. Ramet P. Faverge M. Yamazaki I. and Dongarra J. 2012. Sparse direct solvers with accelerators over DAG runtimes. Tech. rep."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1137\/1034004"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1137\/0614019"},{"key":"e_1_2_1_40_1","article-title":"Memory bandwidth and machine balance in current high performance computers. IEEE","author":"McCalpin J. D.","year":"1995","journal-title":"Comput. Soc. Tech. Comm. Comput. Arch. (TCCA) Newsletter, 19--25."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-010-0140-7"},{"key":"e_1_2_1_42_1","unstructured":"OpenMP Architecture Review Board. 2008. OpenMP application program interface Version 3.0. http:\/\/www.openmp.org.  OpenMP Architecture Review Board. 2008. OpenMP application program interface Version 3.0 . http:\/\/www.openmp.org."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1137\/0914074"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1377612.1377615"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1527286.1527288"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the International Conference on Computational Science-Part II. Springer-Verlag, 335--363","author":"Schenk O."},{"key":"e_1_2_1_47_1","first-page":"158","article-title":"On fast factorization pivoting methods for sparse symmetric indefinite systems","volume":"23","author":"Schenk O.","year":"2006","journal-title":"Electron. Trans. Numer. Anal."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1002\/1096-9128(200010)12:12<1219::AID-CPE530>3.0.CO;2-0"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1016\/0045-7825(90)90022-E"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1366219.1366222"},{"key":"e_1_2_1_51_1","unstructured":"Yarkhan A. Kurzak J. and Dongarra J. 2011. QUARK Users\u2019 Guide. Tech. Rep. Electrical Engineering and Computer Science Innovative Computing Laboratory University of Tenessee.  Yarkhan A. Kurzak J. and Dongarra J. 2011. QUARK Users\u2019 Guide. Tech. Rep. Electrical Engineering and Computer Science Innovative Computing Laboratory University of Tenessee."}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629641","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2629641","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:30Z","timestamp":1750227210000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629641"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,10,27]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,10,27]]}},"alternative-id":["10.1145\/2629641"],"URL":"https:\/\/doi.org\/10.1145\/2629641","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"value":"0098-3500","type":"print"},{"value":"1557-7295","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,10,27]]},"assertion":[{"value":"2012-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}