{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,10,25]],"date-time":"2023-10-25T05:49:36Z","timestamp":1698212976696},"reference-count":17,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2006,10,24]],"date-time":"2006-10-24T00:00:00Z","timestamp":1161648000000},"content-version":"vor","delay-in-days":5379,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency: Pract. Exper."],"published-print":{"date-parts":[[1992,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We study here the behavior of two numerical algorithms (matrix multiplication and finite difference method) on a three\u2010level memory hierarchy multi\u2010processor RP3. Using different versions of these algorithms, which differ on data placement (global, local, global and cacheable, local and cacheable) and on data access (blocked or non\u2010blocked), we study the impact of these parameters on the performance of the program. This performance analysis is done using a very accurate monitoring system (VPMC) which records instructions, memory requests, cache requests and misses. We perform also a theoretical performance analysis of these programs using a model of computation and communication. Good agreement is found between theoretical and experimental results. As a conclusion we discuss the use of local memory on such a machine and show that it is ineffective with RP3 cache, local and global memory communication speed ratios. We also discuss optimal use of cache and show that the optima can only be realized under some cache properties (private store\u2010in cache with user control of write\u2010back) and show that blocked optimal algorithms are to be used to find it. Comparing programming of shared and distributed memory multi\u2010processors, we remark that optimized algorithms for shared memory systems utilize the same blocking techniques used for programming distributed memory systems, leading to a common programming paradigm.<\/jats:p>","DOI":"10.1002\/cpe.4330040106","type":"journal-article","created":{"date-parts":[[2006,11,17]],"date-time":"2006-11-17T16:22:00Z","timestamp":1163780520000},"page":"79-106","source":"Crossref","is-referenced-by-count":2,"title":["Designing Algorithms on RP3"],"prefix":"10.1002","volume":"4","author":[{"given":"Luigi","family":"Brochard","sequence":"first","affiliation":[]},{"given":"Alex","family":"Freau","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2006,10,24]]},"reference":[{"key":"e_1_2_1_2_2","doi-asserted-by":"crossref","unstructured":"K. Gallivan, W. Jalby, U. Meier and A. Sameh \u2018The impact of hierarchical memory systems on linear algebra algorithm design\u2019 Int. J. Supercomput. Appl. 12\u201348 (1988).","DOI":"10.1177\/109434208800200103"},{"key":"e_1_2_1_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(89)90004-5"},{"key":"e_1_2_1_4_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.4330040105"},{"key":"e_1_2_1_5_2","unstructured":"G. F. Pfister, W. C. Brantley, D. A. George, S. L. Harvey, W. J. Kleinfelder, K. P. McAuliffe, E. A. Melton, V. A. Norton and J. Weiss \u2018The IBM Research parallel processor prototype (RP3): introduction and architecture\u2019 Proceedings of the 1985 International Conference on Parallel Processing 1985 pp. 764\u2013771."},{"key":"e_1_2_1_6_2","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1145\/75705.75707","volume-title":"Instrumentation for Parallel Computer Systems","author":"Brantley W. C.","year":"1989"},{"key":"e_1_2_1_7_2","volume-title":"Performance Instrumentation and Visualization for Parallel Computer Systems","author":"Brantley W. C.","year":"1990"},{"key":"e_1_2_1_8_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0129053389000299"},{"key":"e_1_2_1_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(88)90094-4"},{"key":"e_1_2_1_10_2","doi-asserted-by":"publisher","DOI":"10.1137\/0909041"},{"key":"e_1_2_1_11_2","unstructured":"C. Moler \u2018Matrix computation on distributed memory multiprocessors\u2019 Proceedings of the First Conference on Hypercube Multiprocessors 1985 pp. 181\u2013195."},{"key":"e_1_2_1_12_2","unstructured":"L. Brochard \u2018Scalability granularity and parallelism of numerical algorithms\u2019. IBM Research Report RC 14786 1989."},{"key":"e_1_2_1_13_2","volume-title":"\u2018Solving problems on concurrent processors. Vol 1\u2019","author":"Fox G.","year":"1988"},{"key":"e_1_2_1_14_2","first-page":"1042","volume-title":"Proceedings of Supercomputing 87, Papatheodorou","author":"Fox G.","year":"1988"},{"key":"e_1_2_1_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(88)90053-1"},{"key":"e_1_2_1_16_2","unstructured":"L. Brochard and J. P. Prost \u2018Synchronization and load unbalance effects of parallel iterative algorithms\u2019. Proceedings of the 1989 International Conference on Parallel Processing III 1989 pp. 153\u2013160."},{"key":"e_1_2_1_17_2","unstructured":"D. Marinescu and J. Rice \u2018On the effects of synchronization in parallel computing\u2019 Report CSD\u2010TR\u2010750 Purdue University 1988."},{"key":"e_1_2_1_18_2","unstructured":"W. C. Brantley, K. P. McAuliffe and J. Weiss \u2018RP3 processor\u2010memory element\u2019 Proceedings of the 1985 International Conference on Parallel Processing 1985 pp. 782\u2013789."}],"container-title":["Concurrency: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.4330040106","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.4330040106","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,24]],"date-time":"2023-10-24T13:47:15Z","timestamp":1698155235000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.4330040106"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1992,2]]},"references-count":17,"journal-issue":{"issue":"1","published-print":{"date-parts":[[1992,2]]}},"alternative-id":["10.1002\/cpe.4330040106"],"URL":"https:\/\/doi.org\/10.1002\/cpe.4330040106","archive":["Portico"],"relation":{},"ISSN":["1040-3108","1096-9128"],"issn-type":[{"value":"1040-3108","type":"print"},{"value":"1096-9128","type":"electronic"}],"subject":[],"published":{"date-parts":[[1992,2]]}}}