{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T02:10:05Z","timestamp":1750299005776,"version":"3.41.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA2","license":[{"start":{"date-parts":[[2023,10,16]],"date-time":"2023-10-16T00:00:00Z","timestamp":1697414400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DUE-2215193, CCF-2024253, CNS-1750760"],"award-info":[{"award-number":["DUE-2215193, CCF-2024253, CNS-1750760"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2023,10,16]]},"abstract":"<jats:p>The memory allocator plays a key role in the performance of applications, but none of the existing profilers can pinpoint performance slowdowns caused by a memory allocator. Consequently, programmers may spend time improving application code incorrectly or unnecessarily, achieving low or no performance improvement. This paper designs the first profiler\u2014MemPerf\u2014to identify allocator-induced performance slowdowns without comparing against another allocator. Based on the key observation that an allocator may impact the whole life-cycle of heap objects, including the accesses (or uses) of these objects, MemPerf proposes a life-cycle based detection to identify slowdowns caused by slow memory management operations and slow accesses separately. For the prior one, MemPerf proposes a thread-aware and type-aware performance modeling to identify slow management operations. 
For slow memory accesses, MemPerf utilizes a top-down approach to identify all possible reasons for slow memory accesses introduced by the allocator, mainly due to cache and TLB misses, and further proposes a unified method to identify them correctly and efficiently. Based on our extensive evaluation, MemPerf reports 98% of medium and large allocator-induced slowdowns (larger than 5%) correctly without reporting any false positives. MemPerf also pinpoints multiple known and unknown design issues in widely-used allocators.<\/jats:p>","DOI":"10.1145\/3622848","type":"journal-article","created":{"date-parts":[[2023,10,16]],"date-time":"2023-10-16T15:41:29Z","timestamp":1697470889000},"page":"1418-1441","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["MemPerf: Profiling Allocator-Induced Performance Slowdowns"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1201-7806","authenticated-orcid":false,"given":"Jin","family":"Zhou","sequence":"first","affiliation":[{"name":"University of Massachusetts at Amherst, Amherst, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9470-6439","authenticated-orcid":false,"given":"Sam","family":"Silvestro","sequence":"additional","affiliation":[{"name":"University of Texas at San Antonio, San Antonio, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1854-6426","authenticated-orcid":false,"given":"Steven (Jiaxun)","family":"Tang","sequence":"additional","affiliation":[{"name":"University of Massachusetts at Amherst, Amherst, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2817-7839","authenticated-orcid":false,"given":"Hanmei","family":"Yang","sequence":"additional","affiliation":[{"name":"University of Massachusetts at Amherst, Amherst, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-9198-2615","authenticated-orcid":false,"given":"Hongyu","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Texas at San Antonio, 
San Antonio, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8694-7020","authenticated-orcid":false,"given":"Guangming","family":"Zeng","sequence":"additional","affiliation":[{"name":"Synopsys, Sunnyvale, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1696-4272","authenticated-orcid":false,"given":"Bo","family":"Wu","sequence":"additional","affiliation":[{"name":"Colorado School of Mines, Golden, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1190-522X","authenticated-orcid":false,"given":"Cong","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Texas at Dallas, Dallas, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1968-4081","authenticated-orcid":false,"given":"Tongping","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Massachusetts at Amherst, Amherst, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,10,16]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2814270.2814294"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3064176.3064186"},{"key":"e_1_2_1_3_1","unstructured":"Android Community. 2020. View the Java heap and memory allocations with citep. https:\/\/developer.android.com\/studio\/profile\/memory-profiler \t\t\t\t  Android Community. 2020. View the Java heap and memory allocations with citep. https:\/\/developer.android.com\/studio\/profile\/memory-profiler"},{"key":"e_1_2_1_4_1","unstructured":"The OpenMP ARB. 2022. The OpenMP API Specification For Parallel Programming. https:\/\/www.openmp.org\/ \t\t\t\t  The OpenMP ARB. 2022. The OpenMP API Specification For Parallel Programming. https:\/\/www.openmp.org\/"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.1998.694758"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/378993.379232"},{"key":"e_1_2_1_7_1","unstructured":"Christian Bienia and Kai Li. 2009. PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors. 
http:\/\/www-mount.ece.umn.edu\/ jjyi\/MoBS\/2009\/program\/02E-Bienia.pdf \t\t\t\t  Christian Bienia and Kai Li. 2009. PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors. http:\/\/www-mount.ece.umn.edu\/ jjyi\/MoBS\/2009\/program\/02E-Bienia.pdf"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178499"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/GCCE.2014.7031343"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2004.1291361"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1806799.1806874"},{"key":"e_1_2_1_12_1","unstructured":"Jon Coppeard. 2017. Allocate all JS data in a separate jemalloc arena. https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1410132 \t\t\t\t  Jon Coppeard. 2017. Allocate all JS data in a separate jemalloc arena. https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1410132"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2815400.2815409"},{"key":"e_1_2_1_14_1","unstructured":"Stephane Eranian Eric Gouriou Tipp Moseley and Willem de Bruijn. 2015. Linux kernel profiling with perf. https:\/\/perf.wiki.kernel.org\/index.php\/Tutorial \t\t\t\t  Stephane Eranian Eric Gouriou Tipp Moseley and Willem de Bruijn. 2015. Linux kernel profiling with perf. https:\/\/perf.wiki.kernel.org\/index.php\/Tutorial"},{"key":"e_1_2_1_15_1","unstructured":"Jason Evans. 2016. Scalable memory allocation using jemalloc. https:\/\/krebsonsecurity.com\/2016\/10\/ddos-on-dyn-impacts-twitter-spotify-reddit\/ \t\t\t\t  Jason Evans. 2016. Scalable memory allocation using jemalloc. https:\/\/krebsonsecurity.com\/2016\/10\/ddos-on-dyn-impacts-twitter-spotify-reddit\/"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/SBESC.2011.29"},{"key":"e_1_2_1_17_1","unstructured":"Free Software Foundation Inc.. 2015. The GNU C Library: Allocation Debugging. 
http:\/\/www.gnu.org\/software\/libc\/manual\/html_node\/Allocation-Debugging.html \t\t\t\t  Free Software Foundation Inc.. 2015. The GNU C Library: Allocation Debugging. http:\/\/www.gnu.org\/software\/libc\/manual\/html_node\/Allocation-Debugging.html"},{"key":"e_1_2_1_18_1","unstructured":"Sanjay Ghemawat. 2005. Profiling heap usage. http:\/\/goog-perftools.sourceforge.net\/doc\/heap_profiler.html \t\t\t\t  Sanjay Ghemawat. 2005. Profiling heap usage. http:\/\/goog-perftools.sourceforge.net\/doc\/heap_profiler.html"},{"key":"e_1_2_1_19_1","volume-title":"TCMalloc: Thread-caching malloc","author":"Ghemawat Sanjay","year":"2007","unstructured":"Sanjay Ghemawat and Paul Menage . 2007. TCMalloc: Thread-caching malloc , 2007 . http:\/\/goog-perftools.sourceforge.net\/doc\/tcmalloc.html Sanjay Ghemawat and Paul Menage. 2007. TCMalloc: Thread-caching malloc, 2007. http:\/\/goog-perftools.sourceforge.net\/doc\/tcmalloc.html"},{"key":"e_1_2_1_20_1","unstructured":"Mel Gorman. 2015. malloc: Reduce worst-case behaviour with madvise and refault overhead. https:\/\/patchwork.ozlabs.org\/project\/glibc\/patch\/20150209140608.GD2395@suse.de\/ \t\t\t\t  Mel Gorman. 2015. malloc: Reduce worst-case behaviour with madvise and refault overhead. https:\/\/patchwork.ozlabs.org\/project\/glibc\/patch\/20150209140608.GD2395@suse.de\/"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/872726.806987"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314644"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2754169.2754178"},{"key":"e_1_2_1_25_1","first-page":"L","volume":"201","unstructured":"Lawrence Livermore National Laboratory. 201 8. CORA L - 2 Benchmarks. https:\/\/asc.llnl.gov\/coral-2-benchmarks Lawrence Livermore National Laboratory. 2018. CORAL-2 Benchmarks. 
https:\/\/asc.llnl.gov\/coral-2-benchmarks","journal-title":"Lawrence Livermore National Laboratory."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASSET.2000.888070"},{"key":"e_1_2_1_27_1","unstructured":"Daan Leijen. 2020. mimalloc. https:\/\/github.com\/microsoft\/mimalloc \t\t\t\t  Daan Leijen. 2020. mimalloc. https:\/\/github.com\/microsoft\/mimalloc"},{"key":"e_1_2_1_28_1","volume-title":"Oprofile: A system profiler for linux. https:\/\/oprofile.sourceforge.io\/news\/","author":"Levon John","year":"2004","unstructured":"John Levon and Philippe Elie . 2004 . Oprofile: A system profiler for linux. https:\/\/oprofile.sourceforge.io\/news\/ John Levon and Philippe Elie. 2004. Oprofile: A system profiler for linux. https:\/\/oprofile.sourceforge.io\/news\/"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2048066.2048070"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2854038.2854039"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2692916.2555244"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503297"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628102"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628102"},{"key":"e_1_2_1_35_1","unstructured":"Brandon Lucia. [n. d.]. MultiCacheSim: A coherent multiprocessor cache simulator. https:\/\/courses.cs.washington.edu\/courses\/cse471\/11sp\/sim.html. \t\t\t\t  Brandon Lucia. [n. d.]. MultiCacheSim: A coherent multiprocessor cache simulator. https:\/\/courses.cs.washington.edu\/courses\/cse471\/11sp\/sim.html."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065034"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446070"},{"key":"e_1_2_1_38_1","unstructured":"Adrian LUPASC and Viorica Popoiu. 2014. Dynamic Memory Allocation\u2013Clr Profiler. 118 pages. 
\t\t\t\t  Adrian LUPASC and Viorica Popoiu. 2014. Dynamic Memory Allocation\u2013Clr Profiler. 118 pages."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSMC.2011.6084042"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-24749-1_12"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1755913.1755947"},{"key":"e_1_2_1_42_1","unstructured":"The Open MPI Project. 2022. Open MPI: Open Source High Performance Computing. https:\/\/www.open-mpi.org\/ \t\t\t\t  The Open MPI Project. 2022. Open MPI: Open Source High Performance Computing. https:\/\/www.open-mpi.org\/"},{"key":"e_1_2_1_43_1","unstructured":"Kirill Rogozhin. 2014. Controlling memory consumption with Intel\u00ae Threading Building Blocks (Intel\u00ae TBB) scalable allocator. https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/articles\/controlling-memory-consumption-with-intel-threading-building-blocks-intel-tbb-scalable.html \t\t\t\t  Kirill Rogozhin. 2014. Controlling memory consumption with Intel\u00ae Threading Building Blocks (Intel\u00ae TBB) scalable allocator. https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/articles\/controlling-memory-consumption-with-intel-threading-building-blocks-intel-tbb-scalable.html"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3168819"},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the 2012 USENIX conference on Annual Technical Conference (USENIX ATC\u201912)","author":"Serebryany Konstantin","year":"2012","unstructured":"Konstantin Serebryany , Derek Bruening , Alexander Potapenko , and Dmitry Vyukov . 2012 . AddressSanitizer: a fast address sanity checker . In Proceedings of the 2012 USENIX conference on Annual Technical Conference (USENIX ATC\u201912) . USENIX Association, Berkeley, CA, USA. 28\u201328. http:\/\/dl.acm.org\/citation.cfm?id=2342821.2342849 Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. 2012. 
AddressSanitizer: a fast address sanity checker. In Proceedings of the 2012 USENIX conference on Annual Technical Conference (USENIX ATC\u201912). USENIX Association, Berkeley, CA, USA. 28\u201328. http:\/\/dl.acm.org\/citation.cfm?id=2342821.2342849"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1985793.1985848"},{"key":"e_1_2_1_47_1","unstructured":"Oliver Yang. 2015. Pitfalls of TSC usage. http:\/\/oliveryang.net\/2015\/09\/pitfalls-of-TSC-usage\/ \t\t\t\t  Oliver Yang. 2015. Pitfalls of TSC usage. http:\/\/oliveryang.net\/2015\/09\/pitfalls-of-TSC-usage\/"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2931037.2931070"},{"key":"e_1_2_1_49_1","unstructured":"Matej Zavrtanik and Jurij Mihelic. [n. d.]. Experimental Evaluation and Comparison of Memory Allocators in the GNU\/Linux Operating System. http:\/\/ipsitransactions.org\/journals\/papers\/tar\/2017jan\/p10.pdf \t\t\t\t  Matej Zavrtanik and Jurij Mihelic. [n. d.]. Experimental Evaluation and Comparison of Memory Allocators in the GNU\/Linux Operating System. http:\/\/ipsitransactions.org\/journals\/papers\/tar\/2017jan\/p10.pdf"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1952682.1952688"},{"key":"e_1_2_1_51_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018","author":"Zhou Fang","year":"2018","unstructured":"Fang Zhou , Yifan Gan , Sixiang Ma , and Yang Wang . 2018 . wPerf: Generic Off-CPU Analysis to Identify Bottleneck Waiting Events . In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018 , Carlsbad, CA, USA , October 8-10, 2018, Andrea C. Arpaci-Dusseau and Geoff Voelker (Eds.). USENIX Association, 527\u2013543. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/zhou Fang Zhou, Yifan Gan, Sixiang Ma, and Yang Wang. 2018. wPerf: Generic Off-CPU Analysis to Identify Bottleneck Waiting Events. 
In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018, Carlsbad, CA, USA, October 8-10, 2018, Andrea C. Arpaci-Dusseau and Geoff Voelker (Eds.). USENIX Association, 527\u2013543. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/zhou"},{"key":"e_1_2_1_52_1","volume-title":"Hilfinger","author":"Zorn Benjamin","year":"1988","unstructured":"Benjamin Zorn and Paul N . Hilfinger . 1988 . A Memory Allocation Profiler for C and Lisp Programs. EECS Department, University of California , Berkeley. http:\/\/www2.eecs.berkeley.edu\/Pubs\/TechRpts\/1988\/5382.html Benjamin Zorn and Paul N. Hilfinger. 1988. A Memory Allocation Profiler for C and Lisp Programs. EECS Department, University of California, Berkeley. http:\/\/www2.eecs.berkeley.edu\/Pubs\/TechRpts\/1988\/5382.html"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3622848","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3622848","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:27Z","timestamp":1750298247000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3622848"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,16]]},"references-count":51,"journal-issue":{"issue":"OOPSLA2","published-print":{"date-parts":[[2023,10,16]]}},"alternative-id":["10.1145\/3622848"],"URL":"https:\/\/doi.org\/10.1145\/3622848","relation":{},"ISSN":["2475-1421"],"issn-type":[{"type":"electronic","value":"2475-1421"}],"subject":[],"published":{"date-parts":[[2023,10,16]]},"assertion":[{"value":"2023-10-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}