{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"institution":[{"id":[{"id":"https:\/\/ror.org\/03mb6wj31","id-type":"ROR","asserted-by":"publisher"},{"id":"https:\/\/www.isni.org\/000000041937028X","id-type":"ISNI","asserted-by":"publisher"},{"id":"https:\/\/www.wikidata.org\/entity\/Q1640731","id-type":"wikidata","asserted-by":"publisher"}],"name":"Universitat Polit\u00e8cnica de Catalunya","acronym":["UPC"]}],"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T12:53:32Z","timestamp":1768308812044,"version":"3.49.0"},"reference-count":0,"publisher":"Universitat Polit\u00e8cnica de Catalunya","license":[{"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/3.0\/es\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"<jats:p>There is a clear trend nowadays to use heterogeneous high-performance computers, as they offer considerably greater computing power than homogeneous CPU systems. Extending traditional CPU systems with specialized units (accelerators such as GPGPUs) has become a revolution in the HPC world. Both the traditional performance-per-Watt and the performance-per-Euro ratios have been increased with the use of such systems.\r\nHeterogeneous machines can adapt better to different application requirements, as each architecture type offers different characteristics. Thus, in order to maximize application performance in these platforms, applications should be divided into several portions according to their execution requirements. These portions should then be scheduled to the device that better fits their requirements.\r\nHence, heterogeneity introduces complexity in application development, up to the point of reaching the programming wall: on the one hand, source codes must be adapted to fit new architectures and, on the other, resource management becomes more complicated. 
For example, multiple memory spaces require explicit data movements, and code portions that run on different units need additional synchronizations. For all these reasons, efficient programming and code maintenance in heterogeneous systems are extremely complex and expensive.\r\nAlthough several approaches have been proposed for accelerator programming, like CUDA or OpenCL, these models do not solve the aforementioned programming challenges, as they expose low-level hardware characteristics to the programmer. Therefore, programming models should be able to hide all this accelerator programming complexity by providing a homogeneous development environment.\r\nIn this context, this thesis contributes in two key aspects: first, it proposes a general design to efficiently manage the execution of heterogeneous applications and second, it presents several scheduling mechanisms to spread application execution among all the units of the system to maximize performance and resource utilization.\r\nThe first contribution proposes an asynchronous design to manage execution, data movements and synchronizations on accelerators. This approach has been developed in two steps: first, a semi-asynchronous proposal and then, a fully-asynchronous proposal in order to fit contemporary hardware restrictions. The experimental results on different multi-accelerator systems showed that these approaches could reach the maximum expected performance. Even when compared to native, hand-tuned codes, they could get the same results and outperform native versions in selected cases.\r\nThe second contribution presents four different scheduling strategies. They focus on and combine different aspects related to heterogeneous programming to minimize the application's execution time. Examples include minimizing the amount of data shared between memory spaces and maximizing resource utilization by scheduling each portion of code on the unit that fits it best. 
The experiments were performed on different heterogeneous platforms, including CPUs, GPGPUs and Intel Xeon Phi devices. As shown in these tests, it is particularly interesting to analyze how all these scheduling strategies can impact application performance.\r\nThree general conclusions can be extracted: first, application performance is not guaranteed across new hardware generations. Therefore, source codes must be periodically updated as hardware evolves. Second, the most efficient way to run an application on a heterogeneous platform is to divide it into smaller portions and pick the unit that best fits each portion. Hence, system resources can cooperate to execute the application. Finally, and probably most importantly, the requirements derived from the first and second conclusions can be implemented inside runtime frameworks, so the complexity of programming heterogeneous architectures is completely hidden from the programmer.<\/jats:p>\n                <jats:p>Actualment, hi ha una clara tend\u00e8ncia per l'\u00fas de sistemes heterogenis d'alt rendiment, ja que ofereixen una major pot\u00e8ncia de c\u00e0lcul que els sistemes homogenis amb CPUs tradicionals. L'addici\u00f3 d'unitats especialitzades (acceleradors com ara GPGPUs) als sistemes amb CPUs s'ha convertit en una revoluci\u00f3 en el m\u00f3n de la computaci\u00f3 d'alt rendiment. Els sistemes heterogenis poden adaptar-se millor a les diferents necessitats de les aplicacions, ja que cada tipus d'arquitectura ofereix diferents caracter\u00edstiques. Per tant, per maximitzar el rendiment, les aplicacions s'han de dividir en diverses parts d'acord amb els seus requeriments computacionals. Llavors, aquestes parts s'han d'executar al dispositiu que s'adapti millor a les seves necessitats. 
Per tant, l'heterogene\u00eftat introdueix una complexitat addicional en el desenvolupament d'aplicacions: d'una banda, els codis font s'han d'adaptar a les noves arquitectures i, de l'altra, la gesti\u00f3 de recursos es fa m\u00e9s complicada. Per exemple, m\u00faltiples espais de mem\u00f2ria que requereixen moviments expl\u00edcits de dades o sincronitzacions addicionals entre diferents parts de codi que s'executen en diferents unitats. Per aix\u00f2, la programaci\u00f3 i el manteniment del codi en sistemes heterogenis s\u00f3n extremadament complexos i cars. Tot i que hi ha diverses propostes per a la programaci\u00f3 d'acceleradors, com CUDA o OpenCL, aquests models no resolen els reptes de programaci\u00f3 descrits anteriorment, ja que exposen les caracter\u00edstiques de baix nivell del hardware al programador. Per tant, els models de programaci\u00f3 han de poder ocultar les complexitats dels acceleradors de cara al programador, proporcionant un entorn de desenvolupament homogeni. En aquest context, la tesi contribueix en dos aspectes fonamentals: primer, proposa un disseny per a gestionar de manera eficient l'execuci\u00f3 d'aplicacions heterog\u00e8nies i, segon, presenta diversos mecanismes de planificaci\u00f3 per dividir l'execuci\u00f3 d'aplicacions entre totes les unitats del sistema, per tal de maximitzar el rendiment i la utilitzaci\u00f3 de recursos. La primera contribuci\u00f3 proposa un disseny d'execuci\u00f3 as\u00edncron per gestionar els moviments de dades i sincronitzacions en acceleradors. Aquest enfocament s'ha desenvolupat en dos passos: primer, una proposta semi-as\u00edncrona i despr\u00e9s, una proposta totalment as\u00edncrona per tal d'adaptar-se a les restriccions del hardware contemporani. Els resultats en sistemes multi-accelerador mostren que aquests enfocaments poden assolir el m\u00e0xim rendiment esperat. Fins i tot, en determinats casos, poden superar el rendiment de codis nadius altament optimitzats. 
La segona contribuci\u00f3 presenta quatre mecanismes de planificaci\u00f3 diferents, enfocats a la programaci\u00f3 heterog\u00e8nia, per minimitzar el temps d'execuci\u00f3 de les aplicacions. Per exemple, minimitzar la quantitat de dades compartides entre espais de mem\u00f2ria, o maximitzar la utilitzaci\u00f3 de recursos mitjan\u00e7ant l'execuci\u00f3 de cada porci\u00f3 de codi a la unitat que s'adapta millor. Els experiments s'han realitzat en diferents plataformes heterog\u00e8nies, incloent CPUs, GPGPUs i dispositius Intel Xeon Phi. \u00c9s particularment interessant analitzar com totes aquestes estrat\u00e8gies de planificaci\u00f3 poden afectar el rendiment de l'aplicaci\u00f3. Com a resultat, es poden extreure tres conclusions generals: en primer lloc, el rendiment de l'aplicaci\u00f3 no est\u00e0 garantit en les noves generacions de hardware. Per tant, els codis s'han d'actualitzar peri\u00f2dicament a mesura que el hardware evoluciona. En segon lloc, la forma m\u00e9s eficient d'executar una aplicaci\u00f3 en una plataforma heterog\u00e8nia \u00e9s dividir-la en porcions m\u00e9s petites i escollir la unitat que millor s'adapta per executar cada porci\u00f3. 
Finalment, i probablement la conclusi\u00f3 m\u00e9s important, \u00e9s que les exig\u00e8ncies derivades de les dues primeres conclusions poden ser implementades dins de llibreries de sistema, de manera que la complexitat de programaci\u00f3 d'arquitectures heterog\u00e8nies quedi completament oculta per al programador.<\/jats:p>","DOI":"10.5821\/dissertation-2117-95988","type":"dissertation","created":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T03:52:01Z","timestamp":1689738721000},"approved":{"date-parts":[[2015,11,3]]},"source":"Crossref","is-referenced-by-count":0,"title":["Programming models and scheduling techniques for heterogeneous architectures"],"prefix":"10.5821","author":[{"sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Judit","family":"Planas Carbonell","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"3865","container-title":[],"original-title":[],"deposited":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T06:41:20Z","timestamp":1768286480000},"score":1,"resource":{"primary":{"URL":"https:\/\/hdl.handle.net\/2117\/95988"}},"subtitle":[],"editor":[{"given":"Eduard","family":"Ayguad\u00e9 Parra","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Rosa Maria","family":"Badia Sala","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[null]]},"references-count":0,"URL":"https:\/\/doi.org\/10.5821\/dissertation-2117-95988","relation":{},"subject":[]}}