{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:09:53Z","timestamp":1750306193182,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2016,9,17]],"date-time":"2016-09-17T00:00:00Z","timestamp":1474070400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100003545","name":"Minist\u00e9rio da Ci\u00eancia, Tecnologia e Inova\u00e7\u00e3o","doi-asserted-by":"publisher","award":["RNP-689772"],"award-info":[{"award-number":["RNP-689772"]}],"id":[{"id":"10.13039\/501100003545","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100007601","name":"Horizon 2020","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100007601","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"Conselho Nacional de Desenvolvimento Cient\u00edfico e Tecnol\u00f3gico","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002322","name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2016,9,17]]},"abstract":"<jats:p>The performance and energy efficiency of modern architectures depend on memory locality, which can be improved by thread and data mappings considering the memory access behavior of parallel applications. 
In this article, we propose intense pages mapping, a mechanism that analyzes the memory access behavior using information about the time the entry of each page resides in the translation lookaside buffer. It provides accurate information with a very low overhead. We present experimental results with simulation and real machines, with average performance improvements of 13.7% and energy savings of 4.4%, which come from reductions in cache misses and interconnection traffic.<\/jats:p>","DOI":"10.1145\/2975587","type":"journal-article","created":{"date-parts":[[2016,9,19]],"date-time":"2016-09-19T20:11:45Z","timestamp":1474315905000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures"],"prefix":"10.1145","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8413-795X","authenticated-orcid":false,"given":"Eduardo H. M.","family":"Cruz","sequence":"first","affiliation":[{"name":"Federal University of Rio Grande do Sul, RS, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9064-7806","authenticated-orcid":false,"given":"Matthias","family":"Diener","sequence":"additional","affiliation":[{"name":"Federal University of Rio Grande do Sul, RS, Brazil"}]},{"given":"La\u00e9rcio L.","family":"Pilla","sequence":"additional","affiliation":[{"name":"Federal University of Santa Catarina, SC, Brazil"}]},{"given":"Philippe O. A.","family":"Navaux","sequence":"additional","affiliation":[{"name":"Federal University of Rio Grande do Sul, RS, Brazil"}]}],"member":"320","published-online":{"date-parts":[[2016,9,17]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201909)","author":"Agarwal Niket","year":"2009","unstructured":"Niket Agarwal, Tushar Krishna, Li-Shiuan Peh, and Niraj K. Jha. 2009. 
GARNET: A detailed on-chip network model inside a full-system simulator. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201909). 33--42. DOI:http:\/\/dx.doi.org\/10.1109\/ISPASS.2009.4919636"},{"doi-asserted-by":"publisher","key":"e_1_2_1_2_1","DOI":"10.1145\/1531793.1531803"},{"doi-asserted-by":"publisher","key":"e_1_2_1_3_1","DOI":"10.1109\/MC.2010.60"},{"doi-asserted-by":"publisher","key":"e_1_2_1_4_1","DOI":"10.1109\/IISWC.2009.5306792"},{"doi-asserted-by":"publisher","key":"e_1_2_1_5_1","DOI":"10.1145\/1454115.1454128"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.1145\/1941487.1941507"},{"doi-asserted-by":"publisher","key":"e_1_2_1_7_1","DOI":"10.1109\/IPDPS.2010.5470442"},{"doi-asserted-by":"publisher","key":"e_1_2_1_8_1","DOI":"10.1109\/PDP.2010.67"},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1145\/1080695.1070001"},{"key":"e_1_2_1_10_1","volume-title":"Retrieved","author":"Corbet Jonathan","year":"2012","unstructured":"Jonathan Corbet. 2012a. AutoNUMA: The Other Approach to NUMA Scheduling. Retrieved August 20, 2016, from http:\/\/lwn.net\/Articles\/488709\/."},{"key":"e_1_2_1_11_1","volume-title":"Retrieved","author":"Corbet Jonathan","year":"2012","unstructured":"Jonathan Corbet. 2012b. Toward Better NUMA Scheduling. 
Retrieved August 20, 2016, from http:\/\/lwn.net\/Articles\/486858\/."},{"doi-asserted-by":"publisher","key":"e_1_2_1_12_1","DOI":"10.1147\/JRD.2011.2163967"},{"doi-asserted-by":"publisher","key":"e_1_2_1_13_1","DOI":"10.1109\/PDP.2015.25"},{"doi-asserted-by":"publisher","key":"e_1_2_1_14_1","DOI":"10.1109\/TC.2011.241"},{"doi-asserted-by":"publisher","key":"e_1_2_1_15_1","DOI":"10.1145\/2451116.2451157"},{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.5555\/1898953.1899056"},{"doi-asserted-by":"publisher","key":"e_1_2_1_17_1","DOI":"10.1145\/2628071.2628085"},{"doi-asserted-by":"publisher","key":"e_1_2_1_18_1","DOI":"10.1016\/j.parco.2015.01.005"},{"doi-asserted-by":"publisher","key":"e_1_2_1_19_1","DOI":"10.1109\/CSE.2008.51"},{"volume-title":"Parallel Computing: From Multicores and GPU\u2019s to Petascale","author":"Dupros Fabrice","unstructured":"Fabrice Dupros, Christiane Pousa, Alexandre Carissimi, and Jean-Fran\u00e7ois M\u00e9haut. 2010. Parallel simulations of seismic wave propagation on NUMA architectures. In Parallel Computing: From Multicores and GPU\u2019s to Petascale, B. Chapman, F. Desprez, G. R. Joubert, A. Lichnewsky, F. Peters, and T. Priol (Eds.). IOS Press, Amsterdam, Netherlands, 67--74.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the Linux Symposium.","author":"Eranian Stephane","year":"2006","unstructured":"Stephane Eranian. 2006. Perfmon2: A flexible performance monitoring interface for Linux. 
In Proceedings of the Linux Symposium."},{"doi-asserted-by":"publisher","key":"e_1_2_1_22_1","DOI":"10.1109\/IPDPS.2012.54"},{"doi-asserted-by":"publisher","key":"e_1_2_1_23_1","DOI":"10.1109\/CCGrid.2016.91"},{"volume-title":"Retrieved","year":"2012","unstructured":"Intel. 2012b. Intel Performance Counter Monitor\u2014A Better Way to Measure CPU Utilization. Retrieved August 20, 2016, from http:\/\/www.intel.com\/software\/pcm.","key":"e_1_2_1_26_1"},{"key":"e_1_2_1_27_1","series-title":"Lecture Notes in Computer Science","volume-title":"Euro-Par 2010\u2014Parallel Processing","author":"Jeannot Emmanuel","unstructured":"Emmanuel Jeannot and Guillaume Mercier. 2010. Near-optimal placement of MPI processes on hierarchical NUMA architectures. In Euro-Par 2010\u2014Parallel Processing. Lecture Notes in Computer Science, Vol. 6272. Springer, 199--210."},{"key":"e_1_2_1_28_1","volume-title":"Retrieved","author":"JEDEC.","year":"2012","unstructured":"JEDEC. 2012. DDR3 SDRAM Standard. Retrieved August 20, 2016, from https:\/\/www.jedec.org\/standards-documents\/docs\/jesd-79-3d."},{"unstructured":"H. Jin, M. Frumkin, and J. Yan. 1999. The OpenMP Implementation of NAS Parallel Benchmarks and Its Performance. Technical Report. 
NASA.","key":"e_1_2_1_29_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_2_1_31_1","series-title":"Lecture Notes in Computer Science","volume-title":"Transactions on High-Performance Embedded Architectures and Compilers","author":"Klug Tobias","unstructured":"Tobias Klug, Michael Ott, Josef Weidendorfer, and Carsten Trinitis. 2008. autopin\u2014automated optimization of thread-to-core pinning on multicore systems. In Transactions on High-Performance Embedded Architectures and Compilers. Lecture Notes in Computer Science, Vol. 6590. Springer, 219--235."},{"doi-asserted-by":"publisher","key":"e_1_2_1_32_1","DOI":"10.1145\/149439.133082"},{"doi-asserted-by":"publisher","key":"e_1_2_1_33_1","DOI":"10.1145\/1088149.1088201"},{"doi-asserted-by":"publisher","key":"e_1_2_1_34_1","DOI":"10.1109\/2.982916"},{"doi-asserted-by":"publisher","key":"e_1_2_1_35_1","DOI":"10.1145\/1122971.1122987"},{"doi-asserted-by":"publisher","key":"e_1_2_1_36_1","DOI":"10.1016\/j.jpdc.2010.08.015"},{"doi-asserted-by":"publisher","key":"e_1_2_1_37_1","DOI":"10.1145\/2209249.2209269"},{"doi-asserted-by":"publisher","key":"e_1_2_1_38_1","DOI":"10.1145\/1105734.1105747"},{"doi-asserted-by":"publisher","key":"e_1_2_1_39_1","DOI":"10.1145\/1639949.1640117"},{"doi-asserted-by":"publisher","key":"e_1_2_1_40_1","DOI":"10.1145\/2628071.2628077"},{"doi-asserted-by":"publisher","key":"e_1_2_1_41_1","DOI":"10.1109\/TPDS.2012.311"},{"doi-asserted-by":"publisher","key":"e_1_2_1_42_1","DOI":"10.1109\/SBAC-PAD.2009.16"},{"doi-asserted-by":"publisher","key":"e_1_2_1_43_1","DOI":"10.1145\/1366219.1366222"},{"key":"e_1_2_1_44_1","volume-title":"Norman P. 
Jouppi, and Palo Alto.","author":"Thoziyoor Shyamkumar","year":"2008","unstructured":"Shyamkumar Thoziyoor, Naveen Muralimanohar, Jung Ho Ahn, Norman P. Jouppi, and Palo Alto. 2008. Cacti 5.1. Technical Report. HP Labs."},{"doi-asserted-by":"publisher","key":"e_1_2_1_45_1","DOI":"10.1016\/j.jpdc.2008.05.006"},{"doi-asserted-by":"publisher","key":"e_1_2_1_46_1","DOI":"10.1109\/MC.2009.341"},{"doi-asserted-by":"publisher","key":"e_1_2_1_48_1","DOI":"10.1145\/1736020.1736036"},{"doi-asserted-by":"publisher","key":"e_1_2_1_49_1","DOI":"10.1145\/2379776.2379780"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2975587","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2975587","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:50:18Z","timestamp":1750218618000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2975587"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,9,17]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2016,9,17]]}},"alternative-id":["10.1145\/2975587"],"URL":"https:\/\/doi.org\/10.1145\/2975587","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2016,9,17]]},"assertion":[{"value":"2016-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2016-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-09-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}