{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,6,26]],"date-time":"2022-06-26T00:28:30Z","timestamp":1656203310423},"reference-count":17,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/D079543\/1"]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2010,1]]},"abstract":"\n We describe an efficient implementation of a hierarchy of algorithms for multiplication of dense matrices over the field with two elements (F\n 2<\/jats:sub>\n ). In particular we present our implementation\u2014in the M4RI library\u2014of Strassen-Winograd matrix multiplication and the \u201cMethod of the Four Russians for Multiplication\u201d (M4RM) and compare it against other available implementations. Good performance is demonstrated on AMD's Opteron processor and particulary good performance on Intel's Core 2 uo processor. The open-source M4RI library is available as a stand-alone package as well as part of the Sage mathematics system.\n <\/jats:p>\n \n In machine terms, addition in F\n 2<\/jats:sub>\n is logical-XOR, and multiplication is logical-AND, thus a machine word of 64 bits allows one to operate on 64 elements of F\n 2<\/jats:sub>\n in parallel: at most one CPU cycle for 64 parallel additions or multiplications. As such, element-wise operations over F\n 2<\/jats:sub>\n are relatively cheap. In fact, in this paper, we conclude that the actual bottlenecks are memory reads and writes and issues of data locality. We present our empirical findings in relation to minimizing these and give an analysis thereof.\n <\/jats:p>","DOI":"10.1145\/1644001.1644010","type":"journal-article","created":{"date-parts":[[2010,1,19]],"date-time":"2010-01-19T20:14:58Z","timestamp":1263932098000},"page":"1-14","source":"Crossref","is-referenced-by-count":13,"title":["Algorithm 898"],"prefix":"10.1145","volume":"37","author":[{"given":"Martin","family":"Albrecht","sequence":"first","affiliation":[{"name":"Royal Holloway, University of London, United Kingdom"}]},{"given":"Gregory","family":"Bard","sequence":"additional","affiliation":[{"name":"Fordham University, Bronx, NY"}]},{"given":"William","family":"Hart","sequence":"additional","affiliation":[{"name":"University of Warwick, United Kingdom"}]}],"member":"320","reference":[{"key":"e_1_2_2_1_1","volume-title":"The Design and Analysis of Computer Algorithms","author":"Aho A."},{"key":"e_1_2_2_2_1","first-page":"11","article-title":"On economical construction of the transitive closure of a directed graph","volume":"194","author":"Arlazarov V.","year":"1970","journal-title":"Dokl. Akad. Nauk."},{"key":"e_1_2_2_3_1","unstructured":"Bard G. 2008. Matrix inversion (or LUP-factorization) via the Method of Four Russians in &thetas;(n3\/log n) time. In Submission. Bard G. 2008. Matrix inversion (or LUP-factorization) via the Method of Four Russians in &thetas;(n 3 \/log n) time. In Submission."},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1006\/jsco.1996.0125"},{"key":"e_1_2_2_7_1","volume-title":"Electronic Proceedings of MEGA 2007","author":"Brickenstein M.","year":"2007"},{"key":"e_1_2_2_8_1","unstructured":"Dumas J.-G. and Pernet C. 2007. Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. (Available at http:\/\/www.citebase.org\/abstract?id=oai:arXiv.org:0707.2347.) Dumas J.-G. and Pernet C. 2007. Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. (Available at http:\/\/www.citebase.org\/abstract?id=oai:arXiv.org:0707.2347.)"},{"key":"e_1_2_2_9_1","unstructured":"Fog A. 2008. Optimizing software in C++. (Available at http:\/\/www.agner.org\/optimize.) Fog A. 2008. Optimizing software in C++. (Available at http:\/\/www.agner.org\/optimize.)"},{"issue":"2","key":"e_1_2_2_10_1","first-page":"632","article-title":"Pulse code communication","author":"Gray F.","year":"1953","journal-title":"US Patent"},{"key":"e_1_2_2_11_1","doi-asserted-by":"crossref","volume-title":"Accuracy and Stability of Numerical Algorithms","author":"Higham N.","DOI":"10.1137\/1.9780898718027"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/369028.369096"},{"key":"e_1_2_2_13_1","unstructured":"Pernet C. 2001. Implementation of Winograd's algorithm over finite Fields using ATLAS Level3 Blas. Tech. rep. ID-Laboratory. Pernet C. 2001. Implementation of Winograd's algorithm over finite Fields using ATLAS Level3 Blas. Tech. rep. ID-Laboratory."},{"key":"e_1_2_2_14_1","unstructured":"Ringe M. 2007. Meataxe 2.4.8. Available at http:\/\/www.math.rwth-aachen.de\/~MTX\/. Ringe M. 2007. Meataxe 2.4.8. Available at http:\/\/www.math.rwth-aachen.de\/~MTX\/."},{"key":"e_1_2_2_15_1","volume-title":"et al","author":"Stein W.","year":"2008"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02165411"},{"issue":"4","key":"e_1_2_2_17_1","first-page":"10","article-title":"GAP\u2014Groups, Algorithms, and Programming","volume":"4","author":"The GAP Group","year":"2007","journal-title":"Version"},{"key":"e_1_2_2_18_1","volume-title":"Addison-Wesley Longman Publishing Co","author":"Warren H. S."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/0024-3795(71)90009-7"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1644001.1644010","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,21]],"date-time":"2021-02-21T17:37:46Z","timestamp":1613929066000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1644001.1644010"}},"subtitle":["Efficient multiplication of dense matrices over GF(2)"],"short-title":[],"issued":{"date-parts":[[2010,1]]},"references-count":17,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,1]]}},"alternative-id":["10.1145\/1644001.1644010"],"URL":"http:\/\/dx.doi.org\/10.1145\/1644001.1644010","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"value":"0098-3500","type":"print"},{"value":"1557-7295","type":"electronic"}],"subject":["Applied Mathematics","Software"],"published":{"date-parts":[[2010,1]]}}}