{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,1,31]],"date-time":"2025-01-31T04:40:01Z","timestamp":1738298401756,"version":"3.35.0"},"reference-count":17,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2008,7,22]],"date-time":"2008-07-22T00:00:00Z","timestamp":1216684800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2009,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To fully exploit the instruction\u2010level parallelism offered by modern processors, compilers need the necessary information available during the execution of the program. This advocates for iterative or dynamic compilation. Unfortunately, dynamic compilation is suitable only for applications where the cost of compilation may be amortized by multiple invocations of the same code. Similarly, the cost of iterative compilation makes it impractical to be widely used for performance improvement. In this article, we suggest a novel approach for improving the performance of mathematical kernels through fast instantiations of templates. Optimized templates are generated at static compile time with a limited number of compilations. The initial instantiations of these templates are performed at static compile time, and the runtime instantiations are performed with a very small overhead through specialized data, requiring no computations at runtime. It represents an effective solution in terms of reduced overhead incurring at static compile time and dynamic compile time. The experiments have been performed on an Itanium\u2010II architecture using highly optimized kernels of<jats:styled-content>ATLAS<\/jats:styled-content>and<jats:styled-content>FFTW<\/jats:styled-content>with<jats:italic>icc<\/jats:italic>and<jats:italic>gcc<\/jats:italic>compilers. Copyright \u00a9 2008 John Wiley &amp; Sons, Ltd.<\/jats:p>","DOI":"10.1002\/cpe.1333","type":"journal-article","created":{"date-parts":[[2008,7,22]],"date-time":"2008-07-22T10:27:14Z","timestamp":1216722434000},"page":"59-70","source":"Crossref","is-referenced-by-count":4,"title":["Improving performance of optimized kernels through fast instantiations of templates"],"prefix":"10.1002","volume":"21","author":[{"given":"Minhaj Ahmad","family":"Khan","sequence":"first","affiliation":[]},{"given":"H.\u2010P.","family":"Charles","sequence":"additional","affiliation":[]},{"given":"D.","family":"Barthou","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2008,7,22]]},"reference":[{"key":"e_1_2_10_2_2","doi-asserted-by":"crossref","unstructured":"MuthR WattersonSA DebraySK.Code specialization based on value profiles. Static Analysis Symposium London U.K. 2000;340\u2013359.","DOI":"10.1007\/978-3-540-45099-3_18"},{"key":"e_1_2_10_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/289121.289140"},{"key":"e_1_2_10_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/316686.316697"},{"key":"e_1_2_10_5_2","unstructured":"LeoneM DybvigRK.Dynamo: A staged compiler architecture for dynamic program optimization. Technical Report Indiana University 1997."},{"key":"e_1_2_10_6_2","unstructured":"GrantB MockM PhiliposeM ChambersC EggersSJ.DyC: An expressive annotation\u2010directed dynamic compiler for c. Technical Report Department of Computer Science and Engineering University of Washington 1999."},{"key":"e_1_2_10_7_2","unstructured":"KhanMA CharlesHP BarthouD.Reducing code size explosion through low\u2010overhead specialization. Eleventh Annual Workshop on the Interaction Between Compilers and Computer Architecture Phoenix U.S.A. 2007."},{"key":"e_1_2_10_8_2","first-page":"1","article-title":"Design and implementation of a lightweight dynamic optimization system","volume":"6","author":"Lu J","year":"2004","journal-title":"Journal of Instruction\u2010Level Parallelism"},{"key":"e_1_2_10_9_2","unstructured":"ChildersBR DavidsonJW SoffaML.Continuous compilation: A new approach to aggressive and adaptive code transformation. NSF Workshop on Next Generation Software Nice France 2003."},{"key":"e_1_2_10_10_2","unstructured":"KhanMA CharlesH\u2010P BarthouD.An effective automated approach to specialization of code. Twentieth International Workshop on Languages and Compilers for Parallel Computing Urbana IL U.S.A. 11\u201313 October 2007."},{"key":"e_1_2_10_11_2","unstructured":"WhaleyRC DongarraJ. Automatically tuned linear algebra software. Technical Report UT\u2010CS\u201097\u2010366 University of Tennessee 1997. Available at:http:\/\/www.netlib.org\/lapack\/lawns\/lawn131.ps[December1997]."},{"key":"e_1_2_10_12_2","doi-asserted-by":"crossref","unstructured":"CalderB FellerP EustaceA.Value profiling. International Symposium on Microarchitecture Los Alamitos CA U.S.A. 1997;259\u2013269.","DOI":"10.1109\/MICRO.1997.645816"},{"key":"e_1_2_10_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840301"},{"key":"e_1_2_10_14_2","unstructured":"MakholmH.Specializing C\u2014An introduction to the principles behind C\u2010Mix. Technical Report Computer Science Department University of Copenhagen 1999."},{"key":"e_1_2_10_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/258993.259016"},{"key":"e_1_2_10_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-61580-6_4"},{"key":"e_1_2_10_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/358438.349303"},{"key":"e_1_2_10_18_2","doi-asserted-by":"crossref","unstructured":"KhanMA CharlesH\u2010P.Applying code specialization to FFT libraries for integral parameters. Nineteenth International Workshop on Languages and Compilers for Parallel Computing New Orleans U.S.A. 2\u20134 November 2006.","DOI":"10.1007\/978-3-540-72521-3_8"}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.1333","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.1333","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,31]],"date-time":"2025-01-31T03:57:54Z","timestamp":1738295874000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.1333"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,7,22]]},"references-count":17,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,1]]}},"alternative-id":["10.1002\/cpe.1333"],"URL":"https:\/\/doi.org\/10.1002\/cpe.1333","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"type":"print","value":"1532-0626"},{"type":"electronic","value":"1532-0634"}],"subject":[],"published":{"date-parts":[[2008,7,22]]}}}