{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,30]],"date-time":"2025-12-30T23:24:42Z","timestamp":1767137082190,"version":"build-2238731810"},"publisher-location":"Cham","reference-count":38,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783030507428","type":"print"},{"value":"9783030507435","type":"electronic"}],"license":[{"start":{"date-parts":[[2020,1,1]],"date-time":"2020-01-01T00:00:00Z","timestamp":1577836800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,6,15]],"date-time":"2020-06-15T00:00:00Z","timestamp":1592179200000},"content-version":"vor","delay-in-days":166,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    The ever increasing demand for higher memory performance and\u2014at the same time\u2014larger memory capacity is leading the industry towards hybrid main memory designs, i.e., memory systems that consist of multiple different memory technologies. This trend, however, naturally leads to one important question: how can we efficiently utilize such hybrid memories? Our paper proposes a software-based approach to solve this challenge by deploying a\n                    <jats:italic>pattern-aware<\/jats:italic>\n                    staging technique. Our work is based on the following observations: (a)\u00a0the high-bandwidth fast memory outperforms the large memory for memory intensive tasks; (b)\u00a0but those tasks can run for much longer than a bulk data copy to\/from the fast memory, especially when the access pattern is more\n                    <jats:italic>irregular\/sparse<\/jats:italic>\n                    . 
We exploit these observations by applying the following staging technique\n                    <jats:italic>if the accesses are irregular and sparse<\/jats:italic>\n                    : (1)\u00a0copying a chunk (few GB of sequential data) from large to fast memory; (2)\u00a0performing a memory intensive task on the chunk; and (3)\u00a0writing it back to the large memory. To check the\n                    <jats:italic>regularity\/sparseness<\/jats:italic>\n                    of the accesses\n                    <jats:italic>at runtime<\/jats:italic>\n                    with negligible performance impact, we develop a lightweight pattern detection mechanism using a\n                    <jats:italic>helper threading<\/jats:italic>\n                    inspired approach with two different\n                    <jats:bold>\n                      <jats:italic>Bloom filters<\/jats:italic>\n                    <\/jats:bold>\n                    . Our case study using various scientific codes on a real system shows that our approach achieves significant speed-ups compared to executions using only the large memory or hardware caching: 3\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\times $$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    or 41% speedups in the best case, respectively.\n                  <\/jats:p>","DOI":"10.1007\/978-3-030-50743-5_24","type":"book-chapter","created":{"date-parts":[[2020,6,15]],"date-time":"2020-06-15T15:03:45Z","timestamp":1592233425000},"page":"474-495","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Pattern-Aware Staging for Hybrid Memory 
Systems"],"prefix":"10.1007","author":[{"given":"Eishi","family":"Arima","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"Schulz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,6,15]]},"reference":[{"key":"24_CR1","doi-asserted-by":"crossref","unstructured":"Alvarez, L., et al.: Runtime-guided management of stacked DRAM memories in task parallel programs. In: ICS, pp. 218\u2013228 (2018)","DOI":"10.1145\/3205289.3205312"},{"key":"24_CR2","doi-asserted-by":"crossref","unstructured":"Bell, N., et al.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC, pp. 18:1\u201318:11 (2009)","DOI":"10.1145\/1654059.1654078"},{"key":"24_CR3","doi-asserted-by":"crossref","unstructured":"Benoit, A., et al.: A performance model to execute workflows on high-bandwidth-memory architectures. In: ICPP, pp. 36:1\u201336:10 (2018)","DOI":"10.1145\/3225058.3225110"},{"issue":"7","key":"24_CR4","doi-asserted-by":"publisher","first-page":"422","DOI":"10.1145\/362686.362692","volume":"13","author":"BH Bloom","year":"1970","unstructured":"Bloom, B.H.: Space\/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422\u2013426 (1970)","journal-title":"Commun. ACM"},{"key":"24_CR5","doi-asserted-by":"crossref","unstructured":"Burnett, G.J., et al.: A study of interleaved memory systems. In: AFIPS 1970 (Spring), pp. 467\u2013474 (1970)","DOI":"10.1145\/1476936.1477008"},{"key":"24_CR6","doi-asserted-by":"crossref","unstructured":"Butcher, N., et al.: Optimizing for KNL usage modes when data doesn\u2019t fit in MCDRAM. In: ICPP, pp. 
37:1\u201337:10 (2018)","DOI":"10.1145\/3225058.3225116"},{"key":"24_CR7","unstructured":"Cantalupo, C., et al.: User Extensible Heap Manager for Heterogeneous Memory Platforms and Mixed Memory Policies (2015)"},{"key":"24_CR8","doi-asserted-by":"crossref","unstructured":"Chatterjee, N., et al.: Architecting an energy-efficient dram system for GPUs. In: HPCA, pp. 73\u201384 (2017)","DOI":"10.1109\/HPCA.2017.58"},{"key":"24_CR9","unstructured":"Consortium, H.M.C.: Hybrid Memory Cube Specification 2.1. Last Revision January (2015)"},{"issue":"1","key":"24_CR10","doi-asserted-by":"publisher","first-page":"1:1","DOI":"10.1145\/2049662.2049663","volume":"38","author":"TA Davis","year":"2011","unstructured":"Davis, T.A., et al.: The university of Florida sparse matrix collection. ACM TOMS 38(1), 1:1\u20131:25 (2011)","journal-title":"ACM TOMS"},{"key":"24_CR11","doi-asserted-by":"crossref","unstructured":"Dhiman, G., et al.: PDRAM: a Hybrid PRAM and DRAM main memory system. In: DAC, pp. 664\u2013669 (2009)","DOI":"10.1145\/1629911.1630086"},{"key":"24_CR12","doi-asserted-by":"crossref","unstructured":"Doudali, T.D., et al.: Kleio: a hybrid memory page scheduler with machine intelligence. In: HPDC, pp. 37\u201348 (2019)","DOI":"10.1145\/3307681.3325398"},{"key":"24_CR13","doi-asserted-by":"crossref","unstructured":"Dulloor, S.R., et al.: Data tiering in heterogeneous memory systems. In: EuroSys (2016)","DOI":"10.1145\/2901318.2901344"},{"key":"24_CR14","doi-asserted-by":"crossref","unstructured":"He, B., et al.: Efficient gather and scatter operations on graphics processors. In: SC, pp. 1\u201312 (2007)","DOI":"10.1145\/1362622.1362684"},{"key":"24_CR15","unstructured":"Intel: Intel\u00aeAgilex\u2122FPGA Advanced Information Brief: (Device Overview). INTEL (2019)"},{"key":"24_CR16","unstructured":"Izraelevitz, J., et al.: Basic performance measurements of the intel Optane DC persistent memory module. 
arXiv preprint arXiv:1903.05714 (2019)"},{"key":"24_CR17","volume-title":"Intel Xeon Phi Processor High Performance Programming: Knights","author":"J Jeffers","year":"2016","unstructured":"Jeffers, J., et al.: Intel Xeon Phi Processor High Performance Programming: Knights, Landing edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)","edition":"Landing"},{"key":"24_CR18","unstructured":"Jun, H.: HBM (High Bandwidth Memory) for 2.5D. SEMICON Taiwan (2015)"},{"key":"24_CR19","doi-asserted-by":"crossref","unstructured":"Kamruzzaman, M., et al.: Inter-core prefetching for multicore processors using migrating helper threads. In: ASPLOS, pp. 393\u2013404 (2011)","DOI":"10.1145\/1961296.1950411"},{"key":"24_CR20","doi-asserted-by":"crossref","unstructured":"Kaseridis, D., et al.: Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era. In: MICRO, pp. 24\u201335 (2011)","DOI":"10.1145\/2155620.2155624"},{"key":"24_CR21","doi-asserted-by":"crossref","unstructured":"Khaldi, D., et al.: Towards automatic HBM allocation using LLVM: a case study with knights landing. In: LLVM-HPC 2016, pp. 12\u201320 (2016)","DOI":"10.1109\/LLVM-HPC.2016.007"},{"key":"24_CR22","unstructured":"Kim, D., et al.: Physical Experimentation with Prefetching Helper Threads on Intel\u2019s Hyper-Threaded Processors. In: CGO, p. 27 (2004)"},{"key":"24_CR23","unstructured":"Kim, J., et al.: HBM: memory solution for bandwidth-hungry processors. In: Hot Chips 26 Symposium (HCS), pp. 1\u201324 (2014)"},{"key":"24_CR24","unstructured":"Lattner, C., et al.: LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO, p. 75 (2004)"},{"issue":"9","key":"24_CR25","first-page":"1309","volume":"20","author":"J Lee","year":"2009","unstructured":"Lee, J., et al.: Prefetching with helper threads for loosely coupled multiprocessor systems. 
IEEE TPDS 20(9), 1309\u20131324 (2009)","journal-title":"IEEE TPDS"},{"key":"24_CR26","doi-asserted-by":"crossref","unstructured":"Matsuoka, S., et al.: From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era. In: CF, pp. 274\u2013281 (2016)","DOI":"10.1145\/2903150.2906830"},{"key":"24_CR27","doi-asserted-by":"crossref","unstructured":"Miao, H., et al.: StreamBox-HBM: stream analytics on high bandwidth hybrid memory. In: ASPLOS, pp. 167\u2013181 (2019)","DOI":"10.1145\/3297858.3304031"},{"key":"24_CR28","doi-asserted-by":"crossref","unstructured":"Mokhtari, R., et al.: BigKernel - high performance CPU-GPU communication pipelining for big data-style applications. In: IPDPS, pp. 819\u2013828 (2014)","DOI":"10.1109\/IPDPS.2014.89"},{"issue":"3","key":"24_CR29","first-page":"19","volume":"1","author":"O Mutlu","year":"2014","unstructured":"Mutlu, O., et al.: Research problems and opportunities in memory systems. SUPERFRI 1(3), 19\u201355 (2014)","journal-title":"SUPERFRI"},{"issue":"02n03","key":"24_CR30","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1142\/S0129626400000214","volume":"10","author":"D Quinlan","year":"2000","unstructured":"Quinlan, D.: ROSE: compiler support for object-oriented frameworks. Parallel Process. Lett. 10(02n03), 215\u2013226 (2000)","journal-title":"Parallel Process. Lett."},{"key":"24_CR31","unstructured":"Song, Y., et al.: Design and implementation of a compiler framework for helper threading on multi-core processors. In: PACT, pp. 99\u2013109 (2005)"},{"key":"24_CR32","doi-asserted-by":"crossref","unstructured":"Stanley-Marbell, P., et al.: Pinned to the walls: impact of packaging and application properties on the memory and power walls. In: ISLPED, pp. 51\u201356 (2011)","DOI":"10.1109\/ISLPED.2011.5993603"},{"key":"24_CR33","doi-asserted-by":"crossref","unstructured":"Tang, X., et al.: Improving bank-level parallelism for irregular applications. In: MICRO, pp. 
1\u201312 (2016)","DOI":"10.1109\/MICRO.2016.7783760"},{"issue":"2","key":"24_CR34","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1109\/MCSE.2015.4","volume":"17","author":"JS Vetter","year":"2015","unstructured":"Vetter, J.S., et al.: Opportunities for nonvolatile memory systems in extreme-scale high-performance computing. Comput. Sci. Eng. 17(2), 73\u201382 (2015)","journal-title":"Comput. Sci. Eng."},{"key":"24_CR35","doi-asserted-by":"crossref","unstructured":"Vijayaraghavan, T., et al.: Design and analysis of an APU for exascale computing. In: HPCA, pp. 85\u201396 (2017)","DOI":"10.1109\/HPCA.2017.42"},{"key":"24_CR36","doi-asserted-by":"crossref","unstructured":"Wu, K., et al.: UNIMEM: runtime data management on non-volatile memory-based heterogeneous main memory. In: SC, pp. 58:1\u201358:14 (2017)","DOI":"10.1145\/3126908.3126923"},{"key":"24_CR37","doi-asserted-by":"crossref","unstructured":"Wu, K., et al.: Runtime data management on non-volatile memory-based heterogeneous memory for task-parallel programs. In: SC (2018)","DOI":"10.1109\/SC.2018.00034"},{"key":"24_CR38","doi-asserted-by":"crossref","unstructured":"Yoon, H., et al.: Row buffer locality aware caching policies for hybrid memories. In: ICCD, pp. 
337\u2013344 (2012)","DOI":"10.1109\/ICCD.2012.6378661"}],"updated-by":[{"DOI":"10.1007\/978-3-030-50743-5_28","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2020,6,15]],"date-time":"2020-06-15T00:00:00Z","timestamp":1592179200000}}],"container-title":["Lecture Notes in Computer Science","High Performance Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-50743-5_24","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,18]],"date-time":"2023-12-18T15:05:50Z","timestamp":1702911950000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-50743-5_24"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020]]},"ISBN":["9783030507428","9783030507435"],"references-count":38,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-50743-5_24","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020]]},"assertion":[{"value":"15 June 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"15 June 2020","order":2,"name":"change_date","label":"Change Date","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"Correction","order":3,"name":"change_type","label":"Change Type","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"The original version of chapters 17 and 24 were previously published non-open access. 
They have now been made open access under a CC BY 4.0 license and the copyright holder has been changed to \u2018The Author(s).\u2019 The book has also been updated with the change.","order":4,"name":"change_details","label":"Change Details","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"The chapters 19 and 25 were inadvertently published open access. This has been corrected and the chapters are now non-open access.","order":5,"name":"change_details","label":"Change Details","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ISC High Performance","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on High Performance Computing","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Frankfurt am Main","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Germany","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2020","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"22 June 2020","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"25 June 2020","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"35","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"supercomputing2020","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference 
Information"}},{"value":"https:\/\/www.isc-hpc.com\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Linklings","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"87","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"27","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"31% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.73","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference 
organizers)"}},{"value":"4.33","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"The conference was held virtually due to the COVID-19 pandemic.","order":10,"name":"additional_info_on_review_process","label":"Additional Info on Review Process","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}