{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T22:50:45Z","timestamp":1780527045929,"version":"3.54.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2015,4,16]],"date-time":"2015-04-16T00:00:00Z","timestamp":1429142400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100020875","name":"UVSQ","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100020875","id-type":"DOI","asserted-by":"crossref"}]},{"name":"CEA"},{"name":"Intel"},{"DOI":"10.13039\/501100010190","name":"GENCI","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100010190","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,4,16]]},"abstract":"<jats:p>\n            This article presents Codelet Extractor and REplayer (CERE), an open-source framework for code isolation. CERE finds and extracts the hotspots of an application as isolated fragments of code, called\n            <jats:italic>codelets<\/jats:italic>\n            . Codelets can be modified, compiled, run, and measured independently from the original application. Code isolation reduces benchmarking cost and allows piecewise optimization of an application. Unlike previous approaches, CERE isolates codes at the compiler Intermediate Representation (IR) level. Therefore CERE is language agnostic and supports many input languages such as C, C++, Fortran, and D. CERE automatically detects codelets invocations that have the same performance behavior. Then, it selects a reduced set of representative codelets and invocations, much faster to replay, which still captures accurately the original application. 
In addition, CERE supports recompiling and retargeting the extracted codelets. Therefore, CERE can be used for cross-architecture performance prediction or piecewise code optimization. On the SPEC 2006 FP benchmarks, CERE codelets cover 90.9% and accurately replay 66.3% of the execution time. We use CERE codelets in a realistic study to evaluate three different architectures on the NAS benchmarks. CERE accurately estimates each architecture performance and is 7.3 \u00d7 to 46.6 \u00d7 cheaper than running the full benchmark.\n          <\/jats:p>","DOI":"10.1145\/2724717","type":"journal-article","created":{"date-parts":[[2015,4,17]],"date-time":"2015-04-17T22:12:01Z","timestamp":1429308721000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["CERE"],"prefix":"10.1145","volume":"12","author":[{"given":"Pablo De Oliveira","family":"Castro","sequence":"first","affiliation":[{"name":"Universit\u00e9 de Versailles Saint-Quentin-en-Yvelines and Exascale Computing Research, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chadi","family":"Akel","sequence":"additional","affiliation":[{"name":"Exascale Computing Research, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eric","family":"Petit","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Versailles Saint-Quentin-en-Yvelines, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mihail","family":"Popov","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Versailles Saint-Quentin-en-Yvelines, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"William","family":"Jalby","sequence":"additional","affiliation":[{"name":"Exascale Computing Research, 
France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2015,4,16]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.116"},{"key":"e_1_2_2_2_1","unstructured":"A. Alexandrescu. 2010. The D Programming Language. Pearson Education."},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/125826.125925"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1190\/1.1441434"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541228.2555294"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2006.02.003"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/266800.266825"},{"key":"e_1_2_2_8_1","unstructured":"CAPS. 2013. Codelet Finder. Retrieved from http:\/\/www.caps-entreprise.com\/."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1176760.1176765"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1806596.1806647"},{"key":"e_1_2_2_11_1","volume-title":"Proceedings of the 1996 IEEE International Conference on Computer Design: VLSI in Computers and Processors","author":"Conte Thomas M.","year":"1996","unstructured":"Thomas M. Conte, Mary Ann Hirsch, and Kishore N. Menezes. 1996. Reducing state loss for effective trace sampling of superscalar processors. In Proceedings of the 1996 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1996 (ICCD\u201996). 
IEEE, 468--477."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2581122.2544144"},{"key":"#cr-split#-e_1_2_2_13_1.1","doi-asserted-by":"crossref","unstructured":"Pablo de Oliveira Castro, Eric Petit, Asma Farjallah, and William Jalby. 2013. Adaptive sampling for performance characterization of application kernels. Concurrency and Computation: Practice and Experience. DOI:http:\/\/dx.doi.org\/10.1002\/cpe.3097","DOI":"10.1002\/cpe.3097"},{"key":"#cr-split#-e_1_2_2_13_1.2","doi-asserted-by":"crossref","unstructured":"Pablo de Oliveira Castro, Eric Petit, Asma Farjallah, and William Jalby. 2013. Adaptive sampling for performance characterization of application kernels. Concurrency and Computation: Practice and Experience. DOI:http:\/\/dx.doi.org\/10.1002\/cpe.3097","DOI":"10.1002\/cpe.3097"},{"key":"e_1_2_2_14_1","volume-title":"Proceeding of the 4th Workshop on EPIC Architectures and Compiler Technology","author":"Djoudi Lamia","year":"2005","unstructured":"Lamia Djoudi, Denis Barthou, Patrick Carribault, Christophe Lemuet, Jean-Thomas Acquaviva, and William Jalby. 2005. Maqao: Modular assembler quality analyzer and optimizer for Itanium 2. In Proceeding of the 4th Workshop on EPIC Architectures and Compiler Technology, San Jose."},{"key":"e_1_2_2_15_1","doi-asserted-by":"crossref","unstructured":"Jason Duell. 2005. The design and implementation of Berkeley Lab\u2019s Linux checkpoint\/restart. 
Lawrence Berkeley National Laboratory.","DOI":"10.2172\/891617"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2005.1525996"},{"key":"e_1_2_2_17_1","unstructured":"gperftools v2.2.1. Google Performance Tools. Retrieved from http:\/\/code.google.com\/p\/gperftools."},{"key":"e_1_2_2_18_1","volume-title":"Proceedings of the 2005 IEEE International Workload Characterization Symposium. IEEE, 46--55","author":"Gao Xiaofeng","year":"2005","unstructured":"Xiaofeng Gao, Michael Laurenzano, Beth Simon, and Allan Snavely. 2005. Reducing overheads for acquiring dynamic memory traces. In Proceedings of the 2005 IEEE International Workload Characterization Symposium. IEEE, 46--55."},{"key":"e_1_2_2_19_1","volume-title":"Proceedings of the 27th International Conference on Languages and Compilers for Parallel Computing (LCPC\u201914)","author":"Haine Christopher","year":"2014","unstructured":"Christopher Haine, Olivier Aumage, Enguerrand Petit, and Denis Barthou. 2014. Exploring and evaluating array layout restructuration for SIMDization. In Proceedings of the 27th International Conference on Languages and Compilers for Parallel Computing (LCPC\u201914)."},{"key":"e_1_2_2_20_1","volume-title":"Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201903)","author":"John","unstructured":"John W. 
Haskins Jr and Kevin Skadron. 2003. Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation. In Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201903). IEEE, 195--203."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186736.1186737"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1816038.1815998"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2006.302732"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2007.56"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1152154.1152174"},{"key":"e_1_2_2_26_1","volume-title":"Proceedings of the 25th International Conference on Languages and Compilers for Parallel Computing (LCPC\u201912)","author":"Kashnikov Yuriy","year":"2012","unstructured":"Yuriy Kashnikov, Jean Christophe Beyler, and William Jalby. 2012. Compiler optimizations: Machine Learning versus O3. In Proceedings of the 25th International Conference on Languages and Compilers for Parallel Computing (LCPC\u201912)."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2013.6641465"},{"key":"e_1_2_2_28_1","volume-title":"Finding Groups in Data: An Introduction to Cluster Analysis","author":"Kaufman Leonard","year":"2009","unstructured":"Leonard Kaufman and Peter J. Rousseeuw. 2009. Finding Groups in Data: An Introduction to Cluster Analysis. Vol. 344. 
John Wiley & Sons."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.286300"},{"key":"e_1_2_2_30_1","volume-title":"Languages and Compilers for Parallel Computing","author":"Khan Minhaj Ahmad","unstructured":"Minhaj Ahmad Khan, H.-P. Charles, and Denis Barthou. 2008. An effective automated approach to specialization of code. In Languages and Compilers for Parallel Computing. Springer, 308--322."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380010203"},{"key":"e_1_2_2_32_1","volume-title":"Choosing representative slices of program execution for microarchitecture simulations: A preliminary application to the data stream. Workload Characterization of Emerging Computer Applications","author":"Lafage Thierry","unstructured":"Thierry Lafage and Andr\u00e9 Seznec. 2001. Choosing representative slices of program execution for microarchitecture simulations: A preliminary application to the data stream. Workload Characterization of Emerging Computer Applications. 
Springer, 145--163."},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/977395.977673"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/11532378_13"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-13374-9_21"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1012888.1005691"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1152154.1152182"},{"key":"e_1_2_2_39_1","volume-title":"Proceedings of the USENIX Annual Technical Conference. 115--126","author":"Payer Mathias","unstructured":"Mathias Payer, Enrico Kravina, and Thomas R. Gross. 2013. Lightweight memory tracing. In Proceedings of the USENIX Annual Technical Conference. 115--126."},{"key":"e_1_2_2_40_1","unstructured":"Eric Petit and Fran\u00e7ois Bodin. 2010. Code-Partitioning for a Concise Characterization of Programs for Decoupled Code Tuning. Retrieved from http:\/\/hal.archives-ouvertes.fr\/hal-00460897."},{"key":"e_1_2_2_41_1","volume-title":"Proceedings of Compilers for Parallel Computers Workshop (CPC'12)","author":"Petit Eric","year":"2012","unstructured":"Eric Petit, Pablo de Oliveira Castro, Tarek Menour, Bettina Krammer, and William Jalby. 2012. Computing-kernels performance prediction using dataflow analysis and microbenchmarking. 
In Proceedings of Compilers for Parallel Computers Workshop (CPC'12)."},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1188455.1188602"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273440.1250713"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2015.19"},{"key":"e_1_2_2_45_1","volume-title":"Third Annual LLVM Developers Meeting.","author":"Sands D","year":"2009","unstructured":"D Sands. 2009. Reimplementing llvm-gcc as a gcc plugin. In Third Annual LLVM Developers Meeting."},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/645988.674158"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/635506.605403"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2010.38"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2724717","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2724717","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:12:12Z","timestamp":1750227132000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2724717"}},"subtitle":["LLVM-Based Codelet Extractor and REplayer for Piecewise Benchmarking and 
Optimization"],"short-title":[],"issued":{"date-parts":[[2015,4,16]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,4,16]]}},"alternative-id":["10.1145\/2724717"],"URL":"https:\/\/doi.org\/10.1145\/2724717","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,4,16]]},"assertion":[{"value":"2014-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-04-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}