{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T14:57:54Z","timestamp":1770994674854,"version":"3.50.1"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2012,4,1]],"date-time":"2012-04-01T00:00:00Z","timestamp":1333238400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004359","name":"Vetenskapsr\u00e4det","doi-asserted-by":"publisher","award":["2008-5243"],"award-info":[{"award-number":["2008-5243"]}],"id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2012,4]]},"abstract":"<jats:p>Techniques and algorithms for efficient in-place conversion to and from standard and blocked matrix storage formats are described. Such functionality is required by numerical libraries that use different data layouts internally. Parallel algorithms and a software package for in-place matrix storage format conversion based on in-place matrix transposition are presented and evaluated. A new algorithm for in-place transposition which efficiently determines the structure of the transposition permutation a priori is one of the key ingredients. 
It enables effective load balancing in a parallel environment.<\/jats:p>","DOI":"10.1145\/2168773.2168775","type":"journal-article","created":{"date-parts":[[2012,5,7]],"date-time":"2012-05-07T18:47:42Z","timestamp":1336416462000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":56,"title":["Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion"],"prefix":"10.1145","volume":"38","author":[{"given":"Fred","family":"Gustavson","sequence":"first","affiliation":[{"name":"IBM T.J. Watson Research Center, Emeritus, and Ume\u00e5 University"}]},{"given":"Lars","family":"Karlsson","sequence":"additional","affiliation":[{"name":"Ume\u00e5 University"}]},{"given":"Bo","family":"K\u00e5gstr\u00f6m","sequence":"additional","affiliation":[{"name":"Ume\u00e5 University"}]}],"member":"320","published-online":{"date-parts":[[2012,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1975.224124"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.laa.2006.03.018"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1366219.1366223"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/320941.320952"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/363282.363304"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/362349.362368"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/355611.355612"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/355719.355729"},{"key":"e_1_2_1_9_1","volume-title":"Department of Computer Sciences","author":"Chan E.","unstructured":"Chan , E. , van de Geijn , R. , Quintana-Ort , E. S. , Quintana-Ort , G. , and van Zee , F. G. 2008. Programming algorithms-by-blocks for matrix computations on multithreaded architectures. Tech. rep. TR-08-04 , Department of Computer Sciences , University of Texas as Austin. 
Chan, E., van de Geijn, R., Quintana-Ort\u00ed, E. S., Quintana-Ort\u00ed, G., and van Zee, F. G. 2008. Programming algorithms-by-blocks for matrix computations on multithreaded architectures. Tech. rep. TR-08-04, Department of Computer Sciences, University of Texas at Austin."},{"key":"e_1_2_1_10_1","first-page":"31","article-title":"QR factorization for the CELL Broadband Engine","volume":"17","author":"Dongarra J.","year":"2009","unstructured":"Dongarra, J. and Kurzak, J. 2009. QR factorization for the CELL Broadband Engine. Sci. Progr. 17, 1--2, 31--42.","journal-title":"Sci. Progr."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(95)00050-X"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0036144503428693"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539792238649"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/321941.321949"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1177\/109434208800200103"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/647102.717430"},{"key":"e_1_2_1_17_1","unstructured":"Gustavson F. 2008. The relevance of new data structure approaches for dense linear algebra in the new multicore\/manycore environments. Tech. rep. RC24599 IBM Research."},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Gustavson F. 
and Swirszcz T. 2007. In-place transposition of rectangular matrices. In Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing Conference. B. K\u00e5gstr\u00f6m et al. Eds. Lecture Notes in Computer Science vol. 4699 Springer 560--569.","DOI":"10.1007\/978-3-540-75755-9_68"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1499096.1499100"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/645781.666659"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2003.1214317"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/169627.169814"},{"key":"e_1_2_1_23_1","unstructured":"IBM. 1986. IBM Engineering and scientific subroutine library. ESSL Guide and Reference SA22-7272-00."},{"key":"e_1_2_1_24_1","volume-title":"Matrix transposition","author":"Johnson J. R.","year":"1995","unstructured":"Johnson, J. R. 1995. Matrix transposition. Department of Mathematics and Computer Science, Drexel University, Philadelphia, PA 19104. Manuscript. http:\/\/cricket.cs.drexel\/edu\/&tilde;jjohnson\/2007-08\/fell\/cs680\/paper\/transpose.pdf"},{"key":"e_1_2_1_25_1","volume-title":"Blocked in-place transposition with application to storage format conversion","author":"Karlsson L.","unstructured":"Karlsson, L. 2009. Blocked in-place transposition with application to storage format conversion. Tech. rep. UMINF 09.01. 
ISSN 0348-0542, Department of Computing Science, Ume\u00e5 University, Ume\u00e5, Sweden."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of IFIP Congress. North-Holland, 19--27","author":"Knuth D. E.","year":"1971","unstructured":"Knuth, D. E. 1971. Mathematical analysis of algorithms. In Proceedings of IFIP Congress. North-Holland, 19--27."},{"key":"e_1_2_1_27_1","volume-title":"The Art of Computer Programming. Vols. 1 and 2","author":"Knuth D.","unstructured":"Knuth, D. 1998. The Art of Computer Programming. Vols. 1 and 2, 3rd Ed. Addison-Wesley.","edition":"3"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/106975.106981"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-1960-0112246-3"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/2.1.47"}],"container-title":["ACM Transactions on Mathematical 
Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2168773.2168775","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2168773.2168775","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:54:45Z","timestamp":1750240485000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2168773.2168775"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4]]},"references-count":30,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2012,4]]}},"alternative-id":["10.1145\/2168773.2168775"],"URL":"https:\/\/doi.org\/10.1145\/2168773.2168775","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"value":"0098-3500","type":"print"},{"value":"1557-7295","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,4]]},"assertion":[{"value":"2010-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-04-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}