{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"institution":[{"id":[{"id":"https:\/\/ror.org\/03mb6wj31","id-type":"ROR","asserted-by":"publisher"},{"id":"https:\/\/www.isni.org\/000000041937028X","id-type":"ISNI","asserted-by":"publisher"},{"id":"https:\/\/www.wikidata.org\/entity\/Q1640731","id-type":"wikidata","asserted-by":"publisher"}],"name":"Universitat Polit\u00e8cnica de Catalunya","acronym":["UPC"]}],"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T14:04:05Z","timestamp":1769263445067,"version":"3.49.0"},"reference-count":0,"publisher":"Universitat Polit\u00e8cnica de Catalunya","license":[{"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0\/es\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"<jats:p>Gone are the days when engineers and scientists conducted most of their experiments empirically. During those decades, actual tests were carried out in order to assess the robustness and reliability of forthcoming product designs and prove theoretical models. With the advent of the computational era, scientific computing has definitely become a feasible solution compared with empirical methods, in terms of effort, cost and reliability. Large and massively parallel computational resources have reduced the simulation execution times and have improved their numerical results due to the refinement of the sampled domain. Several numerical methods coexist for solving Partial Differential Equations (PDEs). Methods such as the Finite Element (FE) and the Finite Volume (FV) are especially well suited for dealing with problems where unstructured meshes are frequent. Unfortunately, this flexibility is not bestowed for free. These schemes entail higher memory latencies due to the handling of irregular data accesses. 
Conversely, the Finite Difference (FD) scheme has proven to be an efficient solution for problems where structured meshes suit the domain requirements. Many scientific areas use this scheme due to its higher performance.\r\nThis thesis focuses on improving FD schemes to leverage the performance of large scientific computing simulations. Different techniques are proposed such as the Semi-stencil, a novel algorithm that increases the FLOP\/Byte ratio for medium- and high-order stencil operators by reducing the accesses and promoting data reuse. The algorithm is orthogonal and can be combined with techniques such as spatial- or time-blocking, adding further improvement. New trends in Symmetric Multi-Processing (SMP) systems -where tens of cores are replicated on the same die- pose new challenges due to the exacerbation of the memory wall problem. In order to alleviate this issue, our research is focused on different strategies to reduce pressure on the cache hierarchy, particularly when different threads are sharing resources due to Simultaneous Multi-Threading (SMT). Several domain decomposition schedulers for work-load balance are introduced ensuring quasi-optimal results without jeopardizing the overall performance. We combine these schedulers with spatial-blocking and auto-tuning techniques, exploring the parametric space and reducing misses in the last-level cache.\r\nAs an alternative to brute-force methods used in auto-tuning, where a huge parametric space must be traversed to find a suboptimal candidate, performance models are a feasible solution. Performance models can predict the performance on different architectures, selecting suboptimal parameters almost instantly. In this thesis, we devise a flexible and extensible performance model for stencils. The proposed model is capable of supporting multi- and many-core architectures including complex features such as hardware prefetchers, SMT context and algorithmic optimizations. 
Our model can be used not only to forecast execution time, but also to make decisions about the best algorithmic parameters. Moreover, it can be included in run-time optimizers to decide the best SMT configuration based on the execution environment.\r\nSome industries rely heavily on FD-based techniques for their codes. Nevertheless, many cumbersome aspects arising in industry are still scarcely considered in academic research. In this regard, we have collaborated in the implementation of an FD framework which covers the most important features that an HPC industrial application must include. Some of the node-level optimization techniques devised in this thesis have been included in the framework in order to contribute to the overall application performance. We show results for a couple of strategic applications in industry: an atmospheric transport model that simulates the dispersal of volcanic ash and a seismic imaging model used in the Oil &amp; Gas industry to identify hydrocarbon-rich reservoirs.<\/jats:p>\n                <jats:p>Atr\u00e1s quedaron los d\u00edas en los que ingenieros y cient\u00edficos realizaban sus experimentos emp\u00edricamente. Durante esas d\u00e9cadas, se llevaban a cabo ensayos reales para verificar la robustez y fiabilidad de productos venideros y probar modelos te\u00f3ricos. Con la llegada de la era computacional, la computaci\u00f3n cient\u00edfica se ha convertido en una soluci\u00f3n factible comparada con m\u00e9todos emp\u00edricos, en t\u00e9rminos de esfuerzo, coste y fiabilidad. Los supercomputadores han reducido el tiempo de las simulaciones y han mejorado los resultados num\u00e9ricos gracias al refinamiento del dominio. Diversos m\u00e9todos num\u00e9ricos coexisten para resolver las Ecuaciones Diferenciales Parciales (EDPs). M\u00e9todos como Elementos Finitos (EF) y Vol\u00famenes Finitos (VF) est\u00e1n bien adaptados para tratar problemas donde las mallas no estructuradas son frecuentes. 
Desafortunadamente, esta flexibilidad no se confiere de forma gratuita. Estos esquemas conllevan latencias m\u00e1s altas debido al acceso irregular de datos. En cambio, el esquema de Diferencias Finitas (DF) ha demostrado ser una soluci\u00f3n eficiente cuando las mallas estructuradas se adaptan a los requerimientos. Esta tesis se enfoca en mejorar los esquemas DF para impulsar el rendimiento de las simulaciones en la computaci\u00f3n cient\u00edfica. Se proponen diferentes t\u00e9cnicas, como el Semi-stencil, un nuevo algoritmo que incrementa el ratio de FLOP\/Byte para operadores de stencil de orden medio y alto reduciendo los accesos y promoviendo el reuso de datos. El algoritmo es ortogonal y puede ser combinado con t\u00e9cnicas como spatial- o time-blocking, a\u00f1adiendo mejoras adicionales. Las nuevas tendencias hacia sistemas con procesadores multi-sim\u00e9tricos (SMP) -donde decenas de cores son replicados en el mismo procesador- plantean nuevos retos debido a la exacerbaci\u00f3n del problema del ancho de memoria. Para paliar este problema, nuestra investigaci\u00f3n se centra en estrategias para reducir la presi\u00f3n en la jerarqu\u00eda de cache, particularmente cuando diversos threads comparten recursos debido a Simultaneous Multi-Threading (SMT). Introducimos diversos planificadores de descomposici\u00f3n de dominios para balancear la carga asegurando resultados casi \u00f3ptimos sin poner en riesgo el rendimiento global. Combinamos estos planificadores con t\u00e9cnicas de spatial-blocking y auto-tuning, explorando el espacio param\u00e9trico y reduciendo los fallos en la cache de \u00faltimo nivel. Como alternativa a los m\u00e9todos de fuerza bruta usados en auto-tuning donde un espacio param\u00e9trico se debe recorrer para encontrar un candidato, los modelos de rendimiento son una soluci\u00f3n factible. 
Los modelos de rendimiento pueden predecir el rendimiento en diferentes arquitecturas, seleccionando par\u00e1metros suboptimos casi de forma instant\u00e1nea. En esta tesis, ideamos un modelo de rendimiento para stencils flexible y extensible. El modelo es capaz de soportar arquitecturas multi-core incluyendo caracter\u00edsticas complejas como prefetchers, SMT y optimizaciones algor\u00edtmicas. Nuestro modelo puede ser usado no solo para predecir los tiempos de ejecuci\u00f3n, sino tambi\u00e9n para tomar decisiones de los mejores par\u00e1metros algor\u00edtmicos. Adem\u00e1s, puede ser incluido en optimizadores run-time para decidir la mejor configuraci\u00f3n SMT. Algunas industrias conf\u00edan en t\u00e9cnicas DF para sus c\u00f3digos. Sin embargo, no todos los aspectos que aparecen en la industria han sido sometidos a investigaci\u00f3n. En este aspecto, hemos dise\u00f1ado e implementado desde cero una infraestructura DF que cubre las caracter\u00edsticas m\u00e1s importantes que una aplicaci\u00f3n industrial debe incluir. Algunas de las t\u00e9cnicas de optimizaci\u00f3n propuestas en esta tesis han sido incluidas para contribuir en el rendimiento global a nivel industrial. 
Mostramos resultados de un par de aplicaciones estrat\u00e9gicas para la industria: un modelo de transporte atmosf\u00e9rico que simula la dispersi\u00f3n de ceniza volc\u00e1nica y un modelo de imagen s\u00edsmica usado en la industria del petroleo y gas para identificar reservas ricas en hidrocarburos<\/jats:p>","DOI":"10.5821\/dissertation-2117-95958","type":"dissertation","created":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T01:47:50Z","timestamp":1689731270000},"approved":{"date-parts":[[2015,11,30]]},"source":"Crossref","is-referenced-by-count":0,"title":["Leveraging performance of 3D finite difference schemes in large scientific computing simulations"],"prefix":"10.5821","author":[{"sequence":"additional","affiliation":[]},{"given":"Ra\u00fal","family":"De la Cruz","sequence":"first","affiliation":[]}],"member":"3865","container-title":[],"original-title":[],"deposited":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T06:49:28Z","timestamp":1769150968000},"score":1,"resource":{"primary":{"URL":"https:\/\/hdl.handle.net\/2117\/95958"}},"subtitle":[],"editor":[{"given":"Jos\u00e9 M.","family":"Cela Esp\u00edn","sequence":"first","affiliation":[]},{"given":"Mauricio","family":"Araya Polo","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[null]]},"references-count":0,"URL":"https:\/\/doi.org\/10.5821\/dissertation-2117-95958","relation":{},"subject":[]}}