{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T05:33:23Z","timestamp":1740980003052,"version":"3.38.0"},"reference-count":20,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[1996,12,1]],"date-time":"1996-12-01T00:00:00Z","timestamp":849398400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The International Journal of Supercomputer Applications and High Performance Computing"],"published-print":{"date-parts":[[1996,12]]},"abstract":"<jats:p>The Connection Machine Scientific Software Library (CMSSL) is a library of scientific routines designed for distributed memory architectures. The basic linear algebra subroutines (BLAS) of the CMSSL have been implemented as a two-level structure to exploit optimizations local to nodes and across nodes. This paper presents the implementation considerations and performance of the local BLAS, or BLAS local to each node of the system. A wide variety of loop structures and unrollings have been implemented in order to achieve a uniform and high performance, irrespective of the data layout in node memory. The CMSSL is the only existing high performance library capable of supporting both the data parallel and message-passing modes of programming a distributed memory computer. The implications of implementing BLAS on distributed memory computers are considered in this light.<\/jats:p>","DOI":"10.1177\/109434209601000403","type":"journal-article","created":{"date-parts":[[2007,3,5]],"date-time":"2007-03-05T01:17:47Z","timestamp":1173057467000},"page":"300-335","source":"Crossref","is-referenced-by-count":1,"title":["Local Basic Linear Algebra Subroutines (LBLAS) for the CM-5\/5E"],"prefix":"10.1177","volume":"10","author":[{"given":"David","family":"Kramer","sequence":"first","affiliation":[{"name":"THINKING MACHINES CORPORATION, CAMBRIDGE, MA 02138"}]},{"given":"S. Lennart","family":"Johnsson","sequence":"additional","affiliation":[{"name":"THINKING MACHINES CORPORATION, AND HARVARD UNIVERSITY, CAMBRIDGE, MA 02138"}]},{"family":"Yu Hu","sequence":"additional","affiliation":[{"name":"AIKEN COMPUTATION LAB, HARVARD UNIVERSITY, CAMBRIDGE, MA 02138"}]}],"member":"179","published-online":{"date-parts":[[1996,12,1]]},"reference":[{"volume-title":"ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers. CS-92-181","year":"1992","author":"Choi, J.","key":"atypb1"},{"issue":"10","key":"atypb2","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1109\/6.155706","volume":"29","author":"Comerford, R.","year":"1992","journal-title":"IEEE Spectrum"},{"key":"atypb3","doi-asserted-by":"publisher","DOI":"10.1145\/42288.42291"},{"issue":"1","key":"atypb4","first-page":"18","volume":"14","author":"Dongarra, J.J.","year":"1988","journal-title":"ACM TOMS"},{"key":"atypb5","doi-asserted-by":"publisher","DOI":"10.1145\/77626.79170"},{"key":"atypb6","doi-asserted-by":"publisher","DOI":"10.1145\/77626.77627"},{"key":"atypb7","doi-asserted-by":"publisher","DOI":"10.1109\/6.158637"},{"issue":"1","key":"atypb8","first-page":"1","volume":"2","author":"High Performance Fortran Forum.","year":"1993","journal-title":"Scientific Programming"},{"key":"atypb9","doi-asserted-by":"publisher","DOI":"10.1137\/0613043"},{"volume-title":"Parallel computers. Bristol, UK","year":"1981","author":"Hockney, R.W.","key":"atypb10"},{"volume-title":"Distributed BLAS","year":"1992","author":"Johnsson, S.L.","key":"atypb11"},{"key":"atypb12","doi-asserted-by":"publisher","DOI":"10.1177\/109434209200600403"},{"key":"atypb13","doi-asserted-by":"publisher","DOI":"10.1145\/355841.355848"},{"key":"atypb14","doi-asserted-by":"publisher","DOI":"10.1145\/355841.355847"},{"key":"atypb15","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(94)90011-6"},{"volume-title":"Fortran 90 explained","year":"1991","author":"Metcalf, M.","key":"atypb16"},{"key":"atypb17","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(90)90093-O"},{"volume-title":"CM-5 technical summary","year":"1991","author":"Thinking Machines Corporation.","key":"atypb18"},{"volume-title":"CMSSL for CM Fortran, version 3.1","year":"1993","author":"Thinking Machines Corporation.","key":"atypb19"},{"volume-title":"The CM run-time system (CMRTS)","year":"1994","author":"Thinking Machines Corporation.","key":"atypb20"}],"container-title":["The International Journal of Supercomputer Applications and High Performance Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/109434209601000403","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/109434209601000403","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T09:37:14Z","timestamp":1740908234000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/109434209601000403"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1996,12]]},"references-count":20,"journal-issue":{"issue":"4","published-print":{"date-parts":[[1996,12]]}},"alternative-id":["10.1177\/109434209601000403"],"URL":"https:\/\/doi.org\/10.1177\/109434209601000403","relation":{},"ISSN":["1078-3482"],"issn-type":[{"type":"print","value":"1078-3482"}],"subject":[],"published":{"date-parts":[[1996,12]]}}}