{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T13:07:26Z","timestamp":1775912846091,"version":"3.50.1"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2014,8,25]],"date-time":"2014-08-25T00:00:00Z","timestamp":1408924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004963","name":"Seventh Framework Programme","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004963","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2014,10,27]]},"abstract":"<jats:p>We present a joint scheduling and memory allocation algorithm for efficient execution of task-parallel programs on non-uniform memory architecture (NUMA) systems. Task and data placement decisions are based on a static description of the memory hierarchy and on runtime information about intertask communication. Existing locality-aware scheduling strategies for fine-grained tasks have strong limitations: they are specific to some class of machines or applications, they do not handle task dependences, they require manual program annotations, or they rely on fragile profiling schemes. By contrast, our solution makes no assumption on the structure of programs or on the layout of data in memory. 
Experimental results, based on the OpenStream language, show that locality of accesses to main memory of scientific applications can be increased significantly on a 64-core machine, resulting in a speedup of up to 1.63\u00d7 compared to a state-of-the-art work-stealing scheduler.<\/jats:p>","DOI":"10.1145\/2641764","type":"journal-article","created":{"date-parts":[[2014,8,29]],"date-time":"2014-08-29T13:03:31Z","timestamp":1409317411000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages"],"prefix":"10.1145","volume":"11","author":[{"given":"Andi","family":"Drebes","sequence":"first","affiliation":[{"name":"Sorbonne Universit\u00e9s, UPMC Univ Paris 06, CNRS, UMR 7606, LIP6, France"}]},{"given":"Karine","family":"Heydemann","sequence":"additional","affiliation":[{"name":"Sorbonne Universit\u00e9s, UPMC Univ Paris 06, CNRS, UMR 7606, LIP6, France"}]},{"given":"Nathalie","family":"Drach","sequence":"additional","affiliation":[{"name":"Sorbonne Universit\u00e9s, UPMC Univ Paris 06, CNRS, UMR 7606, LIP6, France"}]},{"given":"Antoniu","family":"Pop","sequence":"additional","affiliation":[{"name":"University of Manchester, School of Computer Science, United Kingdom"}]},{"given":"Albert","family":"Cohen","sequence":"additional","affiliation":[{"name":"INRIA and \u00c9cole Normale Sup\u00e9rieure, Paris, France"}]}],"member":"320","published-online":{"date-parts":[[2014,8,25]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/341800.341801"},{"key":"e_1_2_1_2_1","volume-title":"Anne Greenbaum, Sven Hammarling, A. McKenney, and Danny Sorensen.","author":"Anderson Edward","year":"1999","unstructured":"Edward Anderson, Zhaojun Bai, Christian Bischof, Laura Blackford, James Demmel, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum, Sven Hammarling, A. 
McKenney, and Danny Sorensen. 1999. LAPACK Users\u2019 Guide (3rd ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1631"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1468075.1468121"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 3rd USENIX Workshop on Hot Topics on Parallelism (HotPar\u201911)","author":"Best Micah","year":"2011","unstructured":"Micah Best, Shane Mottishaw, Craig Mustard, Mark Roth, Parsiad Azimzadeh, Alexandra Fedorova, and Andrew Brownsword. 2011. Schedule data, not code. 
In Proceedings of the 3rd USENIX Workshop on Hot Topics on Parallelism (HotPar\u201911)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567807"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1880018.1880019"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/209936.209958"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/324133.324234"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-010-0136-3"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-30961-8_8"},{"key":"e_1_2_1_12_1","first-page":"3","article-title":"Concurrent collections","volume":"18","author":"Budimli\u0107 Zoran","year":"2010","unstructured":"Zoran Budimli\u0107, Michael Burke, Vincent Cav\u00e9, Kathleen Knobe, Geoff Lowney, Ryan Newton, Jens Palsberg, David Peixotto, Vivek Sarkar, Frank Schlimbach, and Sagnak Ta\u015firlar. 2010. Concurrent collections. Scientific Programming 18, 3--4, 203--217. 
http:\/\/portal.acm.org\/citation.cfm?id=1938482.1938486","journal-title":"Scientific Programming"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2093157.2093165"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342007078442"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/155332.155358"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1094811.1094852"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1073970.1073974"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304599"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2451116.2451157"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 7th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG\u201914)","author":"Drebes Andi","year":"2014","unstructured":"Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, and Nathalie Drach-Temam. 2014. Aftermath: A graphical tool for performance analysis and debugging of fine-grained task-parallel programs and run-time systems. In Proceedings of the 7th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG\u201914)."},{"key":"e_1_2_1_21_1","volume-title":"OpenMP in a New Era of Parallelism, Rudolf Eigenmann and Bronis R","author":"Duran Alejandro","unstructured":"Alejandro Duran, Julita Corbal\u00e1n, and Eduard Ayguad\u00e9. 2008. Evaluation of OpenMP task scheduling strategies. In OpenMP in a New Era of Parallelism, Rudolf Eigenmann and Bronis R. Supinski (Eds.). Lecture Notes in Computer Science, Vol. 5004. 
Springer-Verlag, Berlin, Heidelberg, 100--110. DOI: http:\/\/dx.doi.org\/10.1007\/978-3-540-79561-2_9"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/277650.277725"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2010.55"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1278177.1278182"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1693453.1693504"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/564870.564900"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454146"},{"key":"e_1_2_1_28_1","volume-title":"Retrieved","author":"Kleen Andreas","year":"2005","unstructured":"Andreas Kleen. 2005. A NUMA API for Linux. Retrieved July 25, 2014, from http:\/\/halobates.de\/numaapi3.pdf."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442524"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 5th Conference on Partitioned Global Address Space Programming Models (PGAS\u201911)","author":"Min Seung-Jai","year":"2011","unstructured":"Seung-Jai Min, Costin Iancu, and Katherine Yelick. 2011. Hierarchical work stealing on manycore clusters. 
In Proceedings of the 5th Conference on Partitioned Global Address Space Programming Models (PGAS\u201911)."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009106195"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400712"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/113379.113401"},{"key":"e_1_2_1_34_1","volume-title":"Tools for High Performance Computing","author":"Terpstra Dan","year":"2009","unstructured":"Dan Terpstra, Heike Jagode, Haihang You, and Jack Dongarra. 2010. Collecting performance data with PAPI-C. In Tools for High Performance Computing 2009, Matthias S. M\u00fcller, Michael M. Resch, Alexander Schulz, and Wolfgang E. Nagel (Eds.). Springer-Verlag, Berlin, Heidelberg, 157--173."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321211.1321241"},{"key":"e_1_2_1_36_1","series-title":"Lecture Notes in Computer Science","volume-title":"Euro-Par 2007 Parallel Processing, Anne-Marie Kermarrec, Luc Boug\u00e9, and Thierry Priol (Eds.)","author":"Thibault Samuel","unstructured":"Samuel Thibault, Raymond Namyst, and Pierre-Andr\u00e9 Wacrenier. 2007. 
Building portable thread schedulers for hierarchical multiprocessors: The bubblesched framework. In Euro-Par 2007 Parallel Processing, Anne-Marie Kermarrec, Luc Boug\u00e9, and Thierry Priol (Eds.). Lecture Notes in Computer Science, Vol. 4641. Springer-Verlag, Berlin, Heidelberg, 42--51. DOI: http:\/\/dx.doi.org\/10.1007\/978-3-540-74466-5_6"},{"key":"e_1_2_1_37_1","volume-title":"UPC Language Specifications, v1.2","author":"UPC Consortium","unstructured":"UPC Consortium. 2005. UPC Language Specifications, v1.2. Technical Report LBNL-59208. Lawrence Berkeley National Lab. Available at http:\/\/www.gwu.edu\/~upc\/publications\/LBNL-59208.pdf."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2486159.2486175"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 2003 USENIX Annual Technical Conference (USENIX\u201903)","author":"Zeldovich Nickolai","year":"2003","unstructured":"Nickolai Zeldovich, Alexander Yip, Frank Dabek, Robert Morris, David Mazi\u00e8res, and Frans Kaashoek. 2003. Multiprocessor support for event-driven programs. 
In Proceedings of the 2003 USENIX Annual Technical Conference (USENIX\u201903)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2379776.2379780"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2641764","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2641764","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:56:19Z","timestamp":1750229779000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2641764"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,8,25]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2014,10,27]]}},"alternative-id":["10.1145\/2641764"],"URL":"https:\/\/doi.org\/10.1145\/2641764","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,8,25]]},"assertion":[{"value":"2013-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-08-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}