{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:51:08Z","timestamp":1750308668495,"version":"3.41.0"},"reference-count":9,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2012,10,8]],"date-time":"2012-10-08T00:00:00Z","timestamp":1349654400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMETRICS Perform. Eval. Rev."],"published-print":{"date-parts":[[2012,10,8]]},"abstract":"<jats:p>We consider the problem of efficiently computing matrix transposes on the POWER7 architecture. We develop a matrix transpose algorithm that uses cache blocking, cache prefetching and data alignment. We model the POWER7 data cache and memory concurrency and use the model to predict the memory throughput of the proposed matrix transpose algorithm. The performance of our matrix transpose algorithm is up to five times higher than that of the dgetmo routine of the Engineering and Scientific Subroutine Library and is 2.5 times higher than that of the code generated by compiler-inserted prefetching. 
Numerical experiments indicate a good agreement between the predicted and the measured memory throughput.<\/jats:p>","DOI":"10.1145\/2381056.2381073","type":"journal-article","created":{"date-parts":[[2012,10,11]],"date-time":"2012-10-11T14:55:16Z","timestamp":1349967316000},"page":"68-73","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Optimizing matrix transposes using a POWER7 cache model and explicit prefetching"],"prefix":"10.1145","volume":"40","author":[{"given":"Gabriel","family":"Mateescu","sequence":"first","affiliation":[{"name":"Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gregory H.","family":"Bauer","sequence":"additional","affiliation":[{"name":"National Center for Supercomputing Applications, Urbana, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert A.","family":"Fiedler","sequence":"additional","affiliation":[{"name":"National Center for Supercomputing Applications, Urbana, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2012,10,8]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"195","volume-title":"Proceedings of the Sixth International Symposium on High-Performance Computer Architecture (HPCA-6)","author":"Chatterjee S.","year":"2000","unstructured":"Chatterjee, S., and Sen, S. Cache-efficient matrix transposition. In Proceedings of the Sixth International Symposium on High-Performance Computer Architecture (HPCA-6) (2000), pp. 
195--205."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/795665.796479"},{"key":"e_1_2_1_3_1","volume-title":"Computer Architecture: A Quantitative Approach","author":"Hennessy J. L.","year":"2007","unstructured":"Hennessy, J. L., and Patterson, D. A. Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann, 2007.","edition":"4"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.38"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1122018.1122054"},{"key":"e_1_2_1_6_1","volume-title":"STREAM: Sustainable memory bandwidth in high performance computers. Tech. rep","author":"McCalpin J. D.","year":"2012","unstructured":"McCalpin, J. D. STREAM: Sustainable memory bandwidth in high performance computers. Tech. rep., University of Virginia, Charlottesville, Virginia, 2012. http:\/\/www.cs.virginia.edu\/stream\/."},{"key":"e_1_2_1_7_1","unstructured":"Power.org. Power ISA Version 2.06. 
http:\/\/www.power.org\/resources\/downloads."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2011.2109230"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2011.2127330"}],"container-title":["ACM SIGMETRICS Performance Evaluation Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2381056.2381073","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2381056.2381073","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:00:39Z","timestamp":1750276839000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2381056.2381073"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,10,8]]},"references-count":9,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2012,10,8]]}},"alternative-id":["10.1145\/2381056.2381073"],"URL":"https:\/\/doi.org\/10.1145\/2381056.2381073","relation":{},"ISSN":["0163-5999"],"issn-type":[{"type":"print","value":"0163-5999"}],"subject":[],"published":{"date-parts":[[2012,10,8]]},"assertion":[{"value":"2012-10-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}