{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T03:10:03Z","timestamp":1739157003402,"version":"3.37.0"},"reference-count":19,"publisher":"Wiley","issue":"14","license":[{"start":{"date-parts":[[2009,6,10]],"date-time":"2009-06-10T00:00:00Z","timestamp":1244592000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2009,9,25]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Dynamic optimizers modify the binary code of programs at runtime by profiling and optimizing certain aspects of the execution. We present a completely software\u2010based framework that dynamically optimizes programs for object\u2010based distributed shared memory (DSM) systems on clusters. In DSM systems, reducing the number of messages between cluster nodes is crucial. Prefetching transfers data in advance from the storage node to the local node so that communication is minimized. Our framework uses a profiler and a dynamic binary rewriter that monitor the access behavior of the application and place prefetches where they are beneficial to speed up the application. In addition, we use two distinct predictors to handle different types of access patterns. A meta\u2010predictor analyzes the memory access behavior and dynamically enables one of the predictors. Our system also adapts the number of prefetches per request to best fit the application's behavior. The evaluation shows that the performance of our system is better than the manual prefetching. The number of messages sent decreases by up to 90%. Performance gains of up to 80% can be observed on benchmarks. Copyright \u00a9 2009 John Wiley &amp; Sons, Ltd.<\/jats:p>","DOI":"10.1002\/cpe.1443","type":"journal-article","created":{"date-parts":[[2009,6,10]],"date-time":"2009-06-10T13:43:11Z","timestamp":1244641391000},"page":"1789-1803","source":"Crossref","is-referenced-by-count":0,"title":["A meta\u2010predictor framework for prefetching in object\u2010based DSMs"],"prefix":"10.1002","volume":"21","author":[{"given":"Jean Christophe","family":"Beyler","sequence":"first","affiliation":[]},{"given":"Michael","family":"Klemm","sequence":"additional","affiliation":[]},{"given":"Philippe","family":"Clauss","sequence":"additional","affiliation":[]},{"given":"Michael","family":"Philippsen","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2009,6,10]]},"reference":[{"key":"e_1_2_8_2_2","unstructured":"TOP500 List. Available at:http:\/\/www.top500.org\/[8 March2009]."},{"key":"e_1_2_8_3_2","unstructured":"MPI Forum.MPI\u20102: Extensions to the message\u2010passing interface. Technical Report MPI Forum July1997."},{"key":"e_1_2_8_4_2","unstructured":"LiuH HuW.A comparison of two strategies of dynamic data prefetching in software DSM. Proceedings of the 15th International Parallel and Distributed Processing Symposium San Francisco CA April 2001;62\u201367."},{"key":"e_1_2_8_5_2","unstructured":"SpeightE BurtscherM.Delphi: Prediction\u2010based page prefetching to improve the performance of shared virtual memory systems. Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications Las Vegas NV June 2002;49\u201355."},{"key":"e_1_2_8_6_2","doi-asserted-by":"crossref","unstructured":"VeldemaR HofmanRFH BhoedjangRAF BalHE.Runtime optimizations for a Java DSM implementation. Proceedings of the ACM\u2010ISCOPE Conference on Java Grande Palo Alto CA June 2001;153\u2013162.","DOI":"10.1145\/376656.376842"},{"key":"e_1_2_8_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/144965.145006"},{"key":"e_1_2_8_8_2","doi-asserted-by":"crossref","unstructured":"KlemmM BeylerJC LampertRT PhilippsenM ClaussP.Esodyp+: Prefetching in the Jackal software DSM. Proceedings of Euro\u2010Par 2007 Rennes France August 2007;563\u2013573.","DOI":"10.1007\/978-3-540-74466-5_60"},{"key":"e_1_2_8_9_2","doi-asserted-by":"crossref","unstructured":"BianchiniR PintoR AmorimCL.Data prefetching for software DSMs. Proceedings of the International Conference on Supercomputing Melbourne Australia July 1998;385\u2013392.","DOI":"10.1145\/277830.277925"},{"key":"e_1_2_8_10_2","unstructured":"JeunW\u2010C KeeY\u2010S HaS.Improving performance of OpenMP for SMP clusters through overlapping page migrations. Proceedings of the International Workshop on OpenMP Reims France June 2006 CD\u2010ROM."},{"key":"e_1_2_8_11_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1491"},{"key":"e_1_2_8_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.752653"},{"key":"e_1_2_8_13_2","unstructured":"BeylerJC ClaussP.ESODYP: An entirely software and dynamic data prefetcher based on a Markov model. Proceedings of the 12th Workshop on Compilers for Parallel Computers A Coruna Spain January 2006;118\u2013132."},{"key":"e_1_2_8_14_2","doi-asserted-by":"crossref","unstructured":"WooSC OharaM TorrieE SinghJP GuptaA.The SPLASH\u20102 programs: Characterization and methodological considerations. Proceedings of the 22nd International Symposium on Computer Architecture St. Margherita Ligure Italy June 1995;24\u201336.","DOI":"10.1145\/223982.223990"},{"key":"e_1_2_8_15_2","unstructured":"LuJ ChenH FuR HsuW OthmerB YewP ChenD.The performance of runtime data cache prefetching in a dynamic optimization system. Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture San Diego CA December 2003;180\u2013190."},{"key":"e_1_2_8_16_2","first-page":"1","article-title":"Design and implementation of a lightweight dynamic optimization system","volume":"6","author":"Lu J","year":"2004","journal-title":"Journal of Instruction\u2010Level Parallelism"},{"key":"e_1_2_8_17_2","unstructured":"BrueningD GarnettT AmarasingheS.An infrastructure for adaptive dynamic optimization. International Symposium on Code Generation and Optimization San Francisco CA March 2003;265\u2013275."},{"key":"e_1_2_8_18_2","doi-asserted-by":"crossref","unstructured":"ZhaoQ RabbahR AmarasingheS RudolphL WongW\u2010F.Ubiquitous memory introspection. Proceedings of the International Symposium on Code Generation and Optimization San Jose CA March 2007;299\u2013311.","DOI":"10.1109\/CGO.2007.12"},{"key":"e_1_2_8_19_2","doi-asserted-by":"crossref","unstructured":"ChilimbiTM HirzelM.Dynamic hot data stream prefetching for general\u2010purpose programs. Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation Berlin Germany June 2002;199\u2013209.","DOI":"10.1145\/512529.512554"},{"key":"e_1_2_8_20_2","unstructured":"SrivastavaA EdwardsA VoH.Vulcan binary transformation in a distributed environment. Technical Report MSR\u2010TR\u20102001\u201050 Microsoft Research April2001."}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.1443","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.1443","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T02:30:57Z","timestamp":1739154657000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.1443"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,6,10]]},"references-count":19,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2009,9,25]]}},"alternative-id":["10.1002\/cpe.1443"],"URL":"https:\/\/doi.org\/10.1002\/cpe.1443","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"type":"print","value":"1532-0626"},{"type":"electronic","value":"1532-0634"}],"subject":[],"published":{"date-parts":[[2009,6,10]]}}}