{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T12:16:53Z","timestamp":1767183413744,"version":"3.41.0"},"reference-count":91,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,12,13]],"date-time":"2022-12-13T00:00:00Z","timestamp":1670889600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"R&D program of MOTIE\/KEIT","award":["10077609"],"award-info":[{"award-number":["10077609"]}]},{"name":"Engineering Research Center Program through the National Research Foundation of Korea"},{"name":"Korean Government MSIT","award":["NRF-2018R1A5A1059921"],"award-info":[{"award-number":["NRF-2018R1A5A1059921"]}]},{"name":"Inha University Research"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,1,31]]},"abstract":"<jats:p>\n            Hardware performance monitoring units (PMUs) are a standard feature in modern microprocessors, providing a rich set of microarchitectural event samplers. Recently, numerous profile-guided optimization (PGO) frameworks have exploited them to feature much lower profiling overhead compared to conventional instrumentation-based frameworks. However, existing PGO frameworks mainly focus on optimizing the layout of binaries; they overlook rich information provided by the PMU about data access behaviors over the memory hierarchy. 
Thus, we propose MaPHeA, a lightweight <jats:bold><jats:underline>M<\/jats:underline><\/jats:bold>emory hierarchy-<jats:bold><jats:underline>a<\/jats:underline><\/jats:bold>ware <jats:bold><jats:underline>P<\/jats:underline><\/jats:bold>rofile-guided <jats:bold><jats:underline>He<\/jats:underline><\/jats:bold>ap <jats:bold><jats:underline>A<\/jats:underline><\/jats:bold>llocation framework applicable to both HPC and embedded systems. MaPHeA guides and applies the optimized allocation of dynamically allocated heap objects, with very low profiling overhead and without additional user intervention, to improve application performance. To demonstrate the effectiveness of MaPHeA, we apply it to optimizing heap object allocation in an emerging DRAM-NVM heterogeneous memory system (HMS), selective huge-page utilization, and controlling the cacheability of objects with low temporal locality. In an HMS, by identifying frequently accessed heap objects and placing them in the fast DRAM region, MaPHeA improves the performance of memory-intensive graph-processing and Redis workloads by 56.0% on average over the default configuration that uses DRAM as a hardware-managed cache of slow NVM. By identifying large heap objects that cause frequent TLB misses and allocating them to huge pages, MaPHeA increases the performance of the read and update operations of Redis by 10.6% over the transparent huge-page implementation of Linux. 
Also, by distinguishing the objects that cause cache pollution due to their low temporal locality and applying write-combining to them, MaPHeA improves the performance of STREAM and RADIX workloads by 20.0% on average over the system without cacheability control.\n          <\/jats:p>","DOI":"10.1145\/3527853","type":"journal-article","created":{"date-parts":[[2022,3,31]],"date-time":"2022-03-31T12:06:24Z","timestamp":1648728384000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["MaPHeA: A Framework for Lightweight Memory Hierarchy-aware Profile-guided Heap Allocation"],"prefix":"10.1145","volume":"22","author":[{"given":"Deok-Jae","family":"Oh","sequence":"first","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Yaebin","family":"Moon","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Do Kyu","family":"Ham","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Tae Jun","family":"Ham","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Yongjun","family":"Park","sequence":"additional","affiliation":[{"name":"Hanyang University, South Korea"}]},{"given":"Jae W.","family":"Lee","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Jung Ho","family":"Ahn","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Eojin","family":"Lee","sequence":"additional","affiliation":[{"name":"Inha University, South 
Korea"}]}],"member":"320","published-online":{"date-parts":[[2022,12,13]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996873"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037706"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3296979.3192392"},{"key":"e_1_3_1_5_2","unstructured":"AMD. 2017. AMD64 Architecture Programmer\u2019s Manual Volume 2: System Programming. Retrieved from https:\/\/www.amd.com\/system\/files\/TechDocs\/24593.pdf."},{"key":"e_1_3_1_6_2","unstructured":"J. A. Ang B. W. Barrett K. B. Wheeler and R. C. Murphy. 2010. Introducing the Graph 500. DOI:https:\/\/www.osti.gov\/biblio\/1014641"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2019.2899330"},{"key":"e_1_3_1_8_2","unstructured":"ARM. 2019. ARM\u00ae ARM Architecture Reference Manual Armv8 for Armv8-A Architecture Profile. Retrieved from https:\/\/documentation-service.arm.com\/static\/60119835773bb020e3de6fee?token=."},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00061"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2024723.2000101"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485943"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2015.12"},{"key":"e_1_3_1_13_2","unstructured":"S. Beamer K. Asanovi\u0107 and D. Patterson. 2017. The GAP Benchmark Suite. arXiv:1508.03619 [cs.DC]."},{"key":"e_1_3_1_14_2","unstructured":"C. Cantalupo V. Venkatesan J. Hammond K. Czurlyo and S. D. Hammond. 2015. 
memkind: An extensible heap memory manager for heterogeneous memory platforms and mixed memory policies.DOI:https:\/\/www.osti.gov\/biblio\/1245908"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/2854038.2854044"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772963"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.233"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2014.7004353"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368826.3377922"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/301631.301633"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1807128.1807152"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_1_23_2","volume-title":"Proceedings of the Linux Plumbers Conference","author":"Melo A. C. de","year":"2009","unstructured":"A. C. de Melo. 2009. Performance counters on Linux. In Proceedings of the Linux Plumbers Conference."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/2901318.2901344"},{"key":"e_1_3_1_25_2","unstructured":"GNU. 2016. GCC. Retrieved from https:\/\/github.com\/gcc-mirror\/gcc."},{"key":"e_1_3_1_26_2","unstructured":"Google. 2019. AutoFDO. Retrieved from: https:\/\/github.com\/google\/autofdo."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357526.3357534"},{"key":"e_1_3_1_28_2","unstructured":"T. Hirofuchi and R. Takano. 2019. The Preliminary Evaluation of a Hypervisor-Based Virtualization Mechanism for Intel Optane DC Persistent Memory Module. arXiv:1907.12014 [cs.OS]."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2014.2321571"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.7873\/DATE.2013.131"},{"key":"e_1_3_1_31_2","volume-title":"Proceedings of the GCC Summit","author":"Hubicka J.","year":"2005","unstructured":"J. 
Hubicka. 2005. Profile driven optimisations in GCC. In Proceedings of the GCC Summit."},{"key":"e_1_3_1_32_2","unstructured":"IBM. 2018. POWER9 Performance Monitor Unit User\u2019s Guide. Retrieved from https:\/\/wiki.raptorcs.com\/w\/images\/6\/6b\/POWER9_PMU_UG_v12_28NOV2018_pub.pdf."},{"key":"e_1_3_1_33_2","unstructured":"Intel. 2018. Memory Optimizer. Retrieved from https:\/\/github.com\/intel\/memory-optimizer."},{"key":"e_1_3_1_34_2","unstructured":"Intel. 2018. Persistent Memory Documentation. Retrieved from https:\/\/docs.pmem.io\/persistent-memory\/."},{"key":"e_1_3_1_35_2","unstructured":"Intel. 2019. MEMKIND. Retrieved from https:\/\/github.com\/memkind\/memkind."},{"key":"e_1_3_1_36_2","unstructured":"Intel. 2021. Intel\u00ae 64 and IA-32 Architectures Optimization Reference Manual. Retrieved from https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/download\/intel-64-and-ia-32-architectures-optimization-reference-manual."},{"key":"e_1_3_1_37_2","unstructured":"Intel. 2021. Intel\u00ae 64 and IA-32 Architectures Software Developer\u2019s Manual Combined Volumes 3B: System Programming Guide. Retrieved from https:\/\/software.intel.com\/en-us\/download\/intel-64-and-ia-32-architectures-sdm-volume-3b-system-programming-guide-part-2."},{"key":"e_1_3_1_38_2","unstructured":"JEDEC. 2012. JEDEC Standard: DDR4 SDRAM."},{"key":"e_1_3_1_39_2","unstructured":"JEDEC. 2015. 
High Bandwidth Memory (HBM) DRAM."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2015.2495103"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750392"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037736"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080245"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.5555\/3018869.3018871"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2013.231"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772751"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.5555\/3026877.3026931"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2019.8662367"},{"key":"e_1_3_1_49_2","unstructured":"J. Leidel and R. C. Murphy. 2015. Hybrid Memory Cube System Interconnect Directory-Based Cache Coherence Methodology. US Patent App. 14\/706 516."},{"key":"e_1_3_1_50_2","unstructured":"Linux. 2009. Transparent Hugepages. Retrieved from https:\/\/lwn.net\/Articles\/359158."},{"key":"e_1_3_1_51_2","unstructured":"Linux. 2018. PMEM NUMA Node and Hotness Accounting\/Migration. Retrieved from https:\/\/lkml.org\/lkml\/2018\/12\/26\/138."},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2019.8875668"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281660"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378525"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/1566445.1566553"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2015.7357128"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/1122971.1122987"},{"key":"e_1_3_1_58_2","first-page":"19","article-title":"Memory bandwidth and machine balance in current high performance computers","volume":"84","author":"McCalpin J. 
D.","year":"1995","unstructured":"J. D. McCalpin. 1995. Memory bandwidth and machine balance in current high performance computers. IEEE Comput. Societ. Technic. Committ. Comput. Archit. Newsl. 84 (1995), 19\u201325.","journal-title":"IEEE Comput. Societ. Technic. Committ. Comput. Archit. Newsl."},{"key":"e_1_3_1_59_2","volume-title":"Proceedings of the GCC Summit","author":"Merrill J.","year":"2003","unstructured":"J. Merrill. 2003. GENERIC and GIMPLE: A new tree representation for entire functions. In Proceedings of the GCC Summit."},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056027"},{"key":"e_1_3_1_61_2","unstructured":"Micron. 2016. 3D XPoint Technology. Retrieved from https:\/\/www.micron.com\/products\/advanced-solutions\/3d-xpoint-technology."},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807626"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2018.00042"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/844128.844138"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3461648.3463844"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3196886"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2017.7863743"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661201"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/2708463.2709068"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304064"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/93542.93550"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.30"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555760"},{"key":"e_1_3_1_74_2","unstructured":"Redis. 2020. redis.io. 
Retrieved from https:\/\/redis.io."},{"key":"e_1_3_1_75_2","first-page":"34","article-title":"Persistent memory programming","volume":"42","author":"Rudoff A.","year":"2017","unstructured":"A. Rudoff. 2017. Persistent memory programming. Login: Usenix Mag. 42 (2017), 34\u201340.","journal-title":"Login: Usenix Mag."},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.44"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368826.3377914"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/384265.291012"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2017.50"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/2716282.2716283"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314650"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835958"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1145\/3205289.3205320"},{"key":"e_1_3_1_84_2","unstructured":"B. Wicht R. A. Vitillo D. Chen and D. Levinthal. 2014. Hardware Counted Profile-Guided Optimization. 
arXiv:1411.6361 [cs.PL]."},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.1995.524546"},{"key":"e_1_3_1_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126923"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00034"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2013.6691165"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056023"},{"key":"e_1_3_1_90_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304024"},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385977"},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTCSA.2014.6910524"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3527853","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3527853","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:02Z","timestamp":1750186802000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3527853"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,13]]},"references-count":91,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,31]]}},"alternative-id":["10.1145\/3527853"],"URL":"https:\/\/doi.org\/10.1145\/3527853","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2022,12,13]]},"assertion":[{"value":"2021-10-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-03-20","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}