{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T19:29:08Z","timestamp":1725564548280},"publisher-location":"New York, NY","reference-count":26,"publisher":"Springer New York","isbn-type":[{"type":"print","value":"9781441969347"},{"type":"electronic","value":"9781441969354"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011]]},"DOI":"10.1007\/978-1-4419-6935-4_20","type":"book-chapter","created":{"date-parts":[[2010,9,8]],"date-time":"2010-09-08T18:45:19Z","timestamp":1283971519000},"page":"353-370","source":"Crossref","is-referenced-by-count":7,"title":["Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology"],"prefix":"10.1007","author":[{"given":"Jaewook","family":"Shin","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mary W.","family":"Hall","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jacqueline","family":"Chame","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chun","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paul D.","family":"Hovland","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2010,8,13]]},"reference":[{"key":"20_CR1_20","unstructured":"http:\/\/nek5000.mcs.anl.gov"},{"key":"20_CR2_20","unstructured":"http:\/\/rosecompiler.org\/"},{"key":"20_CR3_20","unstructured":"http:\/\/www.mcs.anl.gov\/~jaewook\/tune.html"},{"key":"20_CR4_20","unstructured":"http:\/\/www.netlib.org\/blas\/"},{"key":"20_CR5_20","volume-title":"Loop optimization using hierarchical compilation and kernel decomposition","author":"D 
Barthou","year":"2007","unstructured":"Barthou D, Donadio S, Carribault P, Duchateau A, Jalby W (2007) Loop optimization using hierarchical compilation and kernel decomposition. In International symposium on code generation and optimization, San Jose, CA"},{"key":"20_CR6_20","first-page":"340","volume-title":"Optimizing matrix multiply using PHiPAC: a portable, high-performance","author":"J Bilmes","year":"1997","unstructured":"Bilmes J, Asanovic K, Chin C-W, Demmel J (1997) Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In International conference on supercomputing, Vienna, Austria, pp 340\u2013347"},{"key":"20_CR7_20","volume-title":"Model-guided empirical optimization for memory hierarchy","author":"C Chen","year":"2007","unstructured":"Chen C (2007) Model-guided empirical optimization for memory hierarchy. PhD thesis, University of Southern California"},{"key":"20_CR8_20","volume-title":"Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy","author":"C Chen","year":"2005","unstructured":"Chen C, Chame J, Hall M (2005) Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. In International symposium on code generation and optimization, March 2005"},{"key":"20_CR9_20","volume-title":"CHiLL: a framework for composing high-level loop transformations","author":"C Chen","year":"2008","unstructured":"Chen C, Chame J, Hall M (2008) CHiLL: a framework for composing high-level loop transformations. Technical Report 08-897, University of Southern California, Computer Science Department"},{"issue":"9","key":"20_CR10_20","doi-asserted-by":"publisher","first-page":"1051","DOI":"10.1109\/TCAD.2002.801096","volume":"21","author":"E-Y Chung","year":"2002","unstructured":"Chung E-Y, Benini L, DeMicheli G, Luculli G, Carilli M (2002) Value-sensitive automatic code specialization for embedded software. 
IEEE Trans Comput Aided Des Integr Circuits Syst 21(9):1051\u20131067","journal-title":"IEEE Trans Comput Aided Des Integr Circuits Syst"},{"key":"20_CR11_20","doi-asserted-by":"crossref","DOI":"10.21236\/ADA479065","volume-title":"The fastest Fourier transform in the West","author":"M Frigo","year":"1997","unstructured":"Frigo M, Johnson SG (1997) The fastest Fourier transform in the West. Technical Report MIT-LCS-TR728, MIT Lab for Computer Science"},{"issue":"4","key":"20_CR12_20","doi-asserted-by":"publisher","first-page":"422","DOI":"10.1145\/504210.504213","volume":"27","author":"JA Gunnels","year":"2001","unstructured":"Gunnels JA, Gustavson FG, Henry GM, Van De Geijn RA (2001) FLAME: formal linear algebra methods environment. ACM Trans Math Software 27(4):422\u2013455","journal-title":"ACM Trans Math Software"},{"key":"20_CR13_20","volume-title":"Loop transformation recipes for code generation and auto-tuning","author":"M Hall","year":"2009","unstructured":"Hall M, Chame J, Chen C, Shin J, Rudy G, Murtaza Khan M (2009) Loop transformation recipes for code generation and auto-tuning. The 22nd international workshop on languages and compilers for parallel computing, October 8-10, 2009, University of Delaware, Newark, Delaware"},{"key":"20_CR14_20","volume-title":"Annotation-based empirical performance tuning using orio","author":"A Hartono","year":"2009","unstructured":"Hartono A, Norris B, Sadayappan P (2009) Annotation-based empirical performance tuning using orio. 
In IEEE international parallel and distributed processing symposium (IPDPS), Rome, Italy"},{"key":"20_CR15_20","volume-title":"Improving performance of hypermatrix cholesky factorization","author":"JR Herrero","year":"2003","unstructured":"Herrero JR, Navarro JJ (2003) Improving performance of hypermatrix cholesky factorization. In 9th International Euro-Par Conference, pp 461\u2013469"},{"key":"20_CR16_20","unstructured":"Intel (2008) Intel Fortran Compiler User and Reference Guides. http:\/\/www.intel.com\/cd\/software\/products\/asmo-na\/eng\/406088.htm"},{"key":"20_CR17_20","doi-asserted-by":"crossref","unstructured":"Kaushik DK, Gropp W, Minkoff M, Smith B (2008) Improving the performance of tensor matrix vector multiplication in cumulative reaction probability based quantum chemistry codes. In 15th international conference on high performance computing (HiPC 2008), vol. 5374 of Lecture Notes in Computer Science, Springer, Berlin","DOI":"10.1007\/978-3-540-89894-8_14"},{"issue":"1","key":"20_CR18_20","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1023\/A:1020989410030","volume":"24","author":"PMW Knijnenburg","year":"2003","unstructured":"Knijnenburg PMW, Kisuki T, O\u2019Boyle MFP (2003) Combined selection of tile sizes and unroll factors using iterative compilation. J Supercomput 24(1):43\u201367","journal-title":"J Supercomput"},{"key":"20_CR19_20","volume-title":"Code specialization based on value profiles","author":"R Muth","year":"2002","unstructured":"Muth R, Watterson S, Debray S (2002) Code specialization based on value profiles. 
In Proceedings of static analysis symposium, June 2000"},{"issue":"2","key":"20_CR20_20","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1109\/JPROC.2004.840306","volume":"93","author":"M P\u00fcschel","year":"2005","unstructured":"P\u00fcschel M, Moura JMF, Johnson J, Padua D, Veloso M, Singer B, Xiong J, Franchetti F, Ga\u010di\u0107 A, Voronenko Y, Chen K, Johnson RW, Rizzolo N (2005) SPIRAL: code generation for DSP transforms. Proc IEEE 93(2):232\u2013275","journal-title":"Proc IEEE"},{"key":"20_CR21_20","volume-title":"A scalable autotuning framework for compiler optimization","author":"A Tiwari","year":"2009","unstructured":"Tiwari A, Chen C, Chame J, Hall M, Hollingsworth JK (2009) A scalable autotuning framework for compiler optimization. In IPDPS, Rome, Italy"},{"key":"20_CR22_20","volume-title":"Terascale spectral element algorithms and implementations","author":"HM Tufo","year":"1999","unstructured":"Tufo HM, Fischer PF (1999) Terascale spectral element algorithms and implementations. In ACM\/IEEE conference on Supercomputing, Portland, OR"},{"issue":"1","key":"20_CR23_20","doi-asserted-by":"publisher","first-page":"521","DOI":"10.1088\/1742-6596\/16\/1\/071","volume":"16","author":"R Vuduc","year":"2005","unstructured":"Vuduc R, Demmel JW, Yelick KA (2005) Oski: a library of automatically tuned sparse matrix kernels. J Phys Conf Ser 16(1):521\u2013530","journal-title":"J Phys Conf Ser"},{"key":"20_CR24_20","doi-asserted-by":"crossref","unstructured":"Whaley RC, Dongarra JJ (1998) Automatically tuned linear algebra software. In SuperComputing","DOI":"10.1109\/SC.1998.10004"},{"key":"20_CR25_20","volume-title":"POET: parameterized optimizations for empirical tuning","author":"Q Yi","year":"2007","unstructured":"Yi Q, Seymour K, You H, Vuduc R, Quinlan D (2007) POET: parameterized optimizations for empirical tuning. 
In IPDPS, Long Beach, CA, March 2007"},{"issue":"2","key":"20_CR26_20","doi-asserted-by":"publisher","first-page":"358","DOI":"10.1109\/JPROC.2004.840444","volume":"93","author":"K Yotov","year":"2005","unstructured":"Yotov K, Li X, Ren G, Garzar\u00e1n MJ, Padua D, Pingali K, Stodghill P (2005) Is search really necessary to generate high-performance BLAS? Proc IEEE 93(2):358\u2013386","journal-title":"Proc IEEE"}],"container-title":["Software Automatic Tuning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/978-1-4419-6935-4_20","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,3,20]],"date-time":"2019-03-20T04:59:37Z","timestamp":1553057977000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/978-1-4419-6935-4_20"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,8,13]]},"ISBN":["9781441969347","9781441969354"],"references-count":26,"URL":"https:\/\/doi.org\/10.1007\/978-1-4419-6935-4_20","relation":{},"subject":[],"published":{"date-parts":[[2010,8,13]]}}}