{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T09:21:38Z","timestamp":1766136098031,"version":"3.41.0"},"reference-count":17,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2010,1,1]],"date-time":"2010-01-01T00:00:00Z","timestamp":1262304000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/D079543\/1"],"award-info":[{"award-number":["EP\/D079543\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2010,1]]},"abstract":"<jats:p>\n            We describe an efficient implementation of a hierarchy of algorithms for multiplication of dense matrices over the field with two elements (F\n            <jats:sub>2<\/jats:sub>\n            ). In particular we present our implementation\u2014in the M4RI library\u2014of Strassen-Winograd matrix multiplication and the \u201cMethod of the Four Russians for Multiplication\u201d (M4RM) and compare it against other available implementations. Good performance is demonstrated on AMD's Opteron processor and particulary good performance on Intel's Core 2 uo processor. The open-source M4RI library is available as a stand-alone package as well as part of the Sage mathematics system.\n          <\/jats:p>\n          <jats:p>\n            In machine terms, addition in F\n            <jats:sub>2<\/jats:sub>\n            is logical-XOR, and multiplication is logical-AND, thus a machine word of 64 bits allows one to operate on 64 elements of F\n            <jats:sub>2<\/jats:sub>\n            in parallel: at most one CPU cycle for 64 parallel additions or multiplications. As such, element-wise operations over F\n            <jats:sub>2<\/jats:sub>\n            are relatively cheap. In fact, in this paper, we conclude that the actual bottlenecks are memory reads and writes and issues of data locality. We present our empirical findings in relation to minimizing these and give an analysis thereof.\n          <\/jats:p>","DOI":"10.1145\/1644001.1644010","type":"journal-article","created":{"date-parts":[[2010,1,19]],"date-time":"2010-01-19T20:14:58Z","timestamp":1263932098000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":22,"title":["Algorithm 898"],"prefix":"10.1145","volume":"37","author":[{"given":"Martin","family":"Albrecht","sequence":"first","affiliation":[{"name":"Royal Holloway, University of London, United Kingdom"}]},{"given":"Gregory","family":"Bard","sequence":"additional","affiliation":[{"name":"Fordham University, Bronx, NY"}]},{"given":"William","family":"Hart","sequence":"additional","affiliation":[{"name":"University of Warwick, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2010,1,22]]},"reference":[{"volume-title":"The Design and Analysis of Computer Algorithms","author":"Aho A.","key":"e_1_2_2_1_1","unstructured":"Aho , A. , Hopcroft , J. , and Ullman ., J. 1974. The Design and Analysis of Computer Algorithms . Addison-Wesley , Reading, MA . Aho, A., Hopcroft, J., and Ullman., J. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA."},{"key":"e_1_2_2_2_1","first-page":"11","article-title":"On economical construction of the transitive closure of a directed graph","volume":"194","author":"Arlazarov V.","year":"1970","unstructured":"Arlazarov , V. , Dinic , E. , Kronrod , M. , and Faradzev , I. 1970 . On economical construction of the transitive closure of a directed graph . Dokl. Akad. Nauk. 194 , 11 . (in Russian), (English Translation in Soviet Math Dokl.) Arlazarov, V., Dinic, E., Kronrod, M., and Faradzev, I. 1970. On economical construction of the transitive closure of a directed graph. Dokl. Akad. Nauk. 194, 11. (in Russian), (English Translation in Soviet Math Dokl.)","journal-title":"Dokl. Akad. Nauk."},{"key":"e_1_2_2_3_1","unstructured":"Bard G. 2008. Matrix inversion (or LUP-factorization) via the Method of Four Russians in &thetas;(n3\/log n) time. In Submission.  Bard G. 2008. Matrix inversion (or LUP-factorization) via the Method of Four Russians in &thetas;(n 3 \/log n) time. In Submission."},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1006\/jsco.1996.0125"},{"key":"e_1_2_2_7_1","volume-title":"Electronic Proceedings of MEGA 2007","author":"Brickenstein M.","year":"2007","unstructured":"Brickenstein , M. and Dreyer , A . 2007. PolyBoRi: A framework for Gr\u00f6bner basis computations with Boolean polynomials . In Electronic Proceedings of MEGA 2007 . (Available at http:\/\/www.ricam.oeaw.ac.at\/mega 2007 \/electronic\/26.pdf.) Brickenstein, M. and Dreyer, A. 2007. PolyBoRi: A framework for Gr\u00f6bner basis computations with Boolean polynomials. In Electronic Proceedings of MEGA 2007. (Available at http:\/\/www.ricam.oeaw.ac.at\/mega2007\/electronic\/26.pdf.)"},{"key":"e_1_2_2_8_1","unstructured":"Dumas J.-G. and Pernet C. 2007. Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. (Available at http:\/\/www.citebase.org\/abstract?id=oai:arXiv.org:0707.2347.)  Dumas J.-G. and Pernet C. 2007. Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. (Available at http:\/\/www.citebase.org\/abstract?id=oai:arXiv.org:0707.2347.)"},{"key":"e_1_2_2_9_1","unstructured":"Fog A. 2008. Optimizing software in C++. (Available at http:\/\/www.agner.org\/optimize.)  Fog A. 2008. Optimizing software in C++. (Available at http:\/\/www.agner.org\/optimize.)"},{"issue":"2","key":"e_1_2_2_10_1","first-page":"632","article-title":"Pulse code communication","author":"Gray F.","year":"1953","unstructured":"Gray , F. 1953 . Pulse code communication . US Patent No. 2 , 632 ,058. Gray, F. 1953. Pulse code communication. US Patent No. 2,632,058.","journal-title":"US Patent"},{"volume-title":"Accuracy and Stability of Numerical Algorithms","author":"Higham N.","key":"e_1_2_2_11_1","unstructured":"Higham , N. 2002. Accuracy and Stability of Numerical Algorithms , second ed. Society for Industrial and Applied Mathematics . Higham, N. 2002. Accuracy and Stability of Numerical Algorithms, second ed. Society for Industrial and Applied Mathematics."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/369028.369096"},{"key":"e_1_2_2_13_1","unstructured":"Pernet C. 2001. Implementation of Winograd's algorithm over finite Fields using ATLAS Level3 Blas. Tech. rep. ID-Laboratory.  Pernet C. 2001. Implementation of Winograd's algorithm over finite Fields using ATLAS Level3 Blas. Tech. rep. ID-Laboratory."},{"key":"e_1_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Ringe M. 2007. Meataxe 2.4.8. Available at http:\/\/www.math.rwth-aachen.de\/~MTX\/.  Ringe M. 2007. Meataxe 2.4.8. Available at http:\/\/www.math.rwth-aachen.de\/~MTX\/.","DOI":"10.1016\/S1351-4180(07)70582-2"},{"key":"e_1_2_2_15_1","volume-title":"et al","author":"Stein W.","year":"2008","unstructured":"Stein , W. et al . 2008 . SAGE Mathematics Software (Version 3.3). The Sage Development Team . Available at http:\/\/www.sagemath.org. Stein, W. et al. 2008. SAGE Mathematics Software (Version 3.3). The Sage Development Team. Available at http:\/\/www.sagemath.org."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02165411"},{"issue":"4","key":"e_1_2_2_17_1","first-page":"10","article-title":"GAP\u2014Groups, Algorithms, and Programming","volume":"4","author":"The GAP Group","year":"2007","unstructured":"The GAP Group . 2007 . GAP\u2014Groups, Algorithms, and Programming , Version 4 . 4 . 10 . The GAP Group. The GAP Group. 2007. GAP\u2014Groups, Algorithms, and Programming, Version 4.4.10. The GAP Group.","journal-title":"Version"},{"volume-title":"Addison-Wesley Longman Publishing Co","author":"Warren H. S.","key":"e_1_2_2_18_1","unstructured":"Warren , H. S. 2002. Hacker's Delight . Addison-Wesley Longman Publishing Co ., Inc., Boston, MA, USA. Warren, H. S. 2002. Hacker's Delight. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/0024-3795(71)90009-7"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1644001.1644010","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1644001.1644010","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:26:43Z","timestamp":1750278403000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1644001.1644010"}},"subtitle":["Efficient multiplication of dense matrices over GF(2)"],"short-title":[],"issued":{"date-parts":[[2010,1]]},"references-count":17,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,1]]}},"alternative-id":["10.1145\/1644001.1644010"],"URL":"https:\/\/doi.org\/10.1145\/1644001.1644010","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"type":"print","value":"0098-3500"},{"type":"electronic","value":"1557-7295"}],"subject":[],"published":{"date-parts":[[2010,1]]},"assertion":[{"value":"2008-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-01-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}