{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T06:11:02Z","timestamp":1741068662354,"version":"3.38.0"},"reference-count":32,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2014,3,31]],"date-time":"2014-03-31T00:00:00Z","timestamp":1396224000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2014,8]]},"abstract":"<jats:p> Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a uniform runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource oversubscription, undesirable cache flushes and memory bus contention. <\/jats:p><jats:p> This paper presents an extension of StarPU, a runtime system specifically designed for heterogeneous architectures, that allows multiple parallel codes to run concurrently with minimal interference. Such parallel codes run within scheduling contexts that provide confined execution environments which can be used to partition computing resources. Scheduling contexts can be dynamically resized to optimize the allocation of computing resources among concurrently running libraries. We introduce a hypervisor that automatically expands or shrinks contexts using feedback from the runtime system (e.g. resource utilization). We demonstrate the relevance of our approach using benchmarks invoking multiple high performance linear algebra kernels simultaneously on top of heterogeneous multicore machines. We show that our mechanism can dramatically improve the overall application run time (\u2212 34%), most notably by reducing the average cache miss ratio (\u2212 50%). <\/jats:p>","DOI":"10.1177\/1094342014527575","type":"journal-article","created":{"date-parts":[[2014,4,1]],"date-time":"2014-04-01T03:29:48Z","timestamp":1396322988000},"page":"285-300","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":4,"title":["Composing multiple StarPU applications over heterogeneous machines: A supervised approach"],"prefix":"10.1177","volume":"28","author":[{"given":"Andra","family":"Hugo","sequence":"first","affiliation":[{"name":"Inria, LaBRI, University of Bordeaux, France"}]},{"given":"Abdou","family":"Guermouche","sequence":"additional","affiliation":[{"name":"Inria, LaBRI, University of Bordeaux, France"}]},{"given":"Pierre-Andr\u00e9","family":"Wacrenier","sequence":"additional","affiliation":[{"name":"Inria, LaBRI, University of Bordeaux, France"}]},{"given":"Raymond","family":"Namyst","sequence":"additional","affiliation":[{"name":"Inria, LaBRI, University of Bordeaux, France"}]}],"member":"179","published-online":{"date-parts":[[2014,3,31]]},"reference":[{"key":"bibr1-1094342014527575","first-page":"100","volume-title":"Proceedings of the annual ACM SIGPLAN symposium on principles and practice of parallel programming (PPoPP)","author":"Agrawal K","year":"2006"},{"key":"bibr2-1094342014527575","first-page":"473","volume":"2","author":"Agullo E","year":"2011","journal-title":"GPU Computing Gems, Jade Edition"},{"key":"bibr3-1094342014527575","doi-asserted-by":"crossref","unstructured":"Agullo E, Demmel J, Dongarra J, (2009) Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects.","DOI":"10.1088\/1742-6596\/180\/1\/012037"},{"key":"bibr4-1094342014527575","unstructured":"ARB TO (2012) The OpenMP\u00ae API specification for parallel programming. Available at: http:\/\/openmp.org\/."},{"key":"bibr5-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03869-3_80"},{"key":"bibr6-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1631"},{"key":"bibr7-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03869-3_79"},{"key":"bibr8-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.10.003"},{"key":"bibr9-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"bibr10-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1145\/1383422.1383447"},{"key":"bibr11-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840301"},{"key":"bibr12-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1145\/277652.277725"},{"key":"bibr13-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1016\/j.crme.2010.12.003"},{"key":"bibr14-1094342014527575","unstructured":"Group TK (2011) OpenCL\u2014The open standard for parallel programming of heterogeneous systems. Available at: http:\/\/khronos.org\/opencl\/."},{"key":"bibr15-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2012.03.005"},{"key":"bibr16-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15291-7_23"},{"key":"bibr17-1094342014527575","unstructured":"Intel Corporation (n.d.) MKL reference manual. Available at: http:\/\/software.intel.com\/en-us\/articles\/intel-mkl."},{"key":"bibr18-1094342014527575","unstructured":"Intel Corporation (n.d.) TBB reference manual. Available at: http:\/\/threadingbuildingblocks.org."},{"key":"bibr19-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.49"},{"key":"bibr20-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669121"},{"key":"bibr21-1094342014527575","unstructured":"Marochko A (2012) TBB 3.0 task scheduler improves composability of TBB based solutions. Available at: http:\/\/software.intel.com\/en-us\/blogs\/2010\/05\/13\/tbb-30-task-scheduler-improves-composability-of-tbb-based-solutions-part-1\/."},{"key":"bibr22-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1145\/1809028.1806639"},{"journal-title":"FLAME Working Notes","year":"2008","author":"Quintana-Orti G","key":"bibr23-1094342014527575"},{"key":"bibr24-1094342014527575","unstructured":"Sabahi M (2012) Getting code ready for parallel execution with Intel parallel composer. Available at: http:\/\/software.intel.com\/en-us\/articles\/getting-code-ready-for-parallel-execution-with-intel-parallel-composer."},{"key":"bibr25-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2010.121"},{"key":"bibr26-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536546"},{"key":"bibr27-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1007\/s10586-010-0151-6"},{"key":"bibr28-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTR.2009.5289193"},{"key":"bibr29-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35867-8_7"},{"key":"bibr30-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2010.5470941"},{"key":"bibr31-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1109\/71.993206"},{"key":"bibr32-1094342014527575","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2009.154"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014527575","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342014527575","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014527575","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T02:47:37Z","timestamp":1741056457000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342014527575"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,3,31]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.1177\/1094342014527575"],"URL":"https:\/\/doi.org\/10.1177\/1094342014527575","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2014,3,31]]}}}