{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T19:22:25Z","timestamp":1774120945884,"version":"3.50.1"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2017,6,3]],"date-time":"2017-06-03T00:00:00Z","timestamp":1496448000000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["ACI-1148125,ACI-1340293"],"award-info":[{"award-number":["ACI-1148125,ACI-1340293"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002855","name":"Ministry of Science and Technology of the People's Republic of China","doi-asserted-by":"publisher","award":["2012AA010903"],"award-info":[{"award-number":["2012AA010903"]}],"id":[{"id":"10.13039\/501100002855","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003329","name":"Ministerio de Econom\u00eda y Competitividad","doi-asserted-by":"publisher","award":["TIN 2012-32180"],"award-info":[{"award-number":["TIN 2012-32180"]}],"id":[{"id":"10.13039\/501100003329","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61272136"],"award-info":[{"award-number":["61272136"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2016,6,3]]},"abstract":"<jats:p>BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to implement the level-3 BLAS on a variety of current architectures. The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures. We show, with very little effort, how the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the GotoBLAS), and commercial vendor implementations such as AMD\u2019s ACML, IBM\u2019s ESSL, and Intel\u2019s MKL libraries. Although most of this article focuses on single-core implementation, we also provide compelling results that suggest the framework\u2019s leverage extends to the multithreaded domain.<\/jats:p>","DOI":"10.1145\/2755561","type":"journal-article","created":{"date-parts":[[2016,6,10]],"date-time":"2016-06-10T09:00:33Z","timestamp":1465549233000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":62,"title":["The BLIS Framework"],"prefix":"10.1145","volume":"42","author":[{"given":"Field G.","family":"Van Zee","sequence":"first","affiliation":[{"name":"University of Texas at Austin, Austin, TX"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tyler M.","family":"Smith","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bryan","family":"Marker","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tze Meng","family":"Low","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert A. Van De","family":"Geijn","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Francisco D.","family":"Igual","sequence":"additional","affiliation":[{"name":"Complutense University of Madrid, Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mikhail","family":"Smelyanskiy","sequence":"additional","affiliation":[{"name":"Intel Corporation, Santa Clara, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xianyi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Kistler","sequence":"additional","affiliation":[{"name":"IBM Corporation, Austin, TX"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vernon","family":"Austel","sequence":"additional","affiliation":[{"name":"IBM Corporation, Yorktown Heights, NY"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John A.","family":"Gunnels","sequence":"additional","affiliation":[{"name":"IBM Corporation, Yorktown Heights, NY"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lee","family":"Killough","sequence":"additional","affiliation":[{"name":"Cray Inc., Seattle, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2016,6,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2012.26"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","unstructured":"E. Anderson Z. Bai C. Bischof L. S. Blackford J. Demmel Jack J. Dongarra J. Du Croz S. Hammarling A. Greenbaum A. McKenney and D. Sorensen. 1999. LAPACK Users\u2019 Guide (3rd ed.). Society for Industrial and Applied Mathematics Philadelphia PA.","DOI":"10.5555\/323215"},{"key":"e_1_2_1_3_1","unstructured":"ATLAS. 2013. ATLAS 3.8.4 ARM. Retrieved April 4 2016 from http:\/\/www.vesperix.com\/arm\/atlas-arm\/index.html."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/77626.79170"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/42288.42291"},{"key":"e_1_2_1_6_1","volume-title":"AltiVec Technology Programming Interface Manual. Retrieved","author":"Semiconductor Freescale","year":"2016","unstructured":"Freescale Semiconductor. 1999. AltiVec Technology Programming Interface Manual. Retrieved April 4, 2016, from, http:\/\/www.freescale.com\/files\/32bit\/doc\/ref_manual\/ALTIVECPIM.pdf."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1356052.1356053"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1377603.1377607"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304609"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/645455.653765"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/647882.738103"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.113"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1984.1676475"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2012.2222991"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389032"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/292395.292412"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/355841.355847"},{"key":"e_1_2_1_18_1","volume-title":"Loongson 3A Processor Manual","author":"Technology Loongson","unstructured":"Loongson Technology. 2009. Loongson 3A Processor Manual. Loongson Technology Corp. Ltd."},{"key":"e_1_2_1_19_1","volume-title":"Quintana-Ort\u00ed","author":"Low Tze Meng","year":"2014","unstructured":"Tze Meng Low, Francisco D. Igual, Tyler M. Smith, and Enrique S. Quintana-Ort\u00ed. 2014. Analytical Modeling Is Enough for High Performance BLIS. Technical Report. Department of Computer Sciences, University of Texas at Austin."},{"key":"e_1_2_1_20_1","volume-title":"OpenBLAS Home Page. Retrieved","author":"BLAS.","year":"2016","unstructured":"OpenBLAS. 2012. OpenBLAS Home Page. Retrieved April 4, 2016, from http:\/\/xianyi.github.com\/OpenBLAS\/."},{"key":"e_1_2_1_21_1","volume-title":"OpenMP Application Program Interface Version 3.0. Retrieved","author":"Architecture Review Board MP","year":"2016","unstructured":"OpenMP Architecture Review Board. 2008. OpenMP Application Program Interface Version 3.0. Retrieved April 4, 2016, from http:\/\/www.openmp.org\/mp-documents\/spec30.pdf."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2012.35"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2012.132"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2011.2127330"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.110"},{"key":"e_1_2_1_26_1","volume-title":"TMS320C66x DSP CPU and Instruction Set Reference Guide. Retrieved","author":"Instruments Texas","year":"2016","unstructured":"Texas Instruments. 2010. TMS320C66x DSP CPU and Instruction Set Reference Guide. Retrieved April 4, 2016, from http:\/\/www.ti.com\/lit\/ug\/sprugh7\/sprugh7.pdf."},{"key":"e_1_2_1_27_1","volume-title":"TMS320C6678 Multicore Fixed and Floating-Point Digital Signal Processor. Retrieved","author":"Instruments Texas","year":"2016","unstructured":"Texas Instruments. 2012. TMS320C6678 Multicore Fixed and Floating-Point Digital Signal Processor. Retrieved April 4, 2016, from http:\/\/www.ti.com.cn\/cn\/lit\/ds\/symlink\/tms320c6678.pdf."},{"key":"e_1_2_1_28_1","unstructured":"Field G. Van Zee. 2012. Libflame : The Complete Reference. www.lulu.com."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2009.207"},{"key":"e_1_2_1_30_1","volume-title":"Van Zee and Robert A. van de Geijn","author":"Field","year":"2012","unstructured":"Field G. Van Zee and Robert A. van de Geijn. 2012. BLIS: A Framework for Generating BLAS-Like Libraries. FLAME Working Note #66. Technical Report UTCS TR-12-30. Department of Computer Sciences, University of Texas at Austin."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2764454"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/509058.509096"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2012.97"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840444"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2755561","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2755561","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2755561","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T09:16:25Z","timestamp":1763457385000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2755561"}},"subtitle":["Experiments in Portability"],"short-title":[],"issued":{"date-parts":[[2016,6,3]]},"references-count":34,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,6,3]]}},"alternative-id":["10.1145\/2755561"],"URL":"https:\/\/doi.org\/10.1145\/2755561","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"value":"0098-3500","type":"print"},{"value":"1557-7295","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,6,3]]},"assertion":[{"value":"2013-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-04-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-06-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}