{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,24]],"date-time":"2025-10-24T16:42:57Z","timestamp":1761324177092,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2018,7,31]],"date-time":"2018-07-31T00:00:00Z","timestamp":1532995200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Chinese National Mega Project of Scientific Research","award":["2014ZX01030101"],"award-info":[{"award-number":["2014ZX01030101"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2018,7,31]]},"abstract":"<jats:p>Using analytical models to evaluate proposals or to guide high-level architecture decisions is becoming increasingly attractive. A number of methods addressing cache behavior and quantified insights have emerged in the last decade, such as stack distance theory and memory-level parallelism (MLP) estimation. However, prior research typically oversimplified the factors that must be considered in out-of-order processors, such as the effects of reordered memory instructions, multiple dependences among memory instructions, and accesses merged into the same MSHR entry. Ignoring these influences results in low and unstable precision in recent analytical models.<\/jats:p>\n          <jats:p>By quantifying the aforementioned effects, this article proposes a cache performance evaluation framework equipped with three analytical models, which more accurately predict cache misses, MLPs, and the average cache miss service time, respectively. 
Similar to prior studies, these analytical models are all fed with profiled software characteristics, so the architecture evaluation process can be accelerated significantly compared with cycle-accurate simulations.<\/jats:p>\n          <jats:p>\n            We evaluate the accuracy of the proposed models against\n            <jats:italic>gem5<\/jats:italic>\n            cycle-accurate simulations with 16 benchmarks chosen from Mobybench Suite 2.0, Mibench 1.0, and Mediabench II. The average root mean square errors for predicting cache misses, MLPs, and the average cache miss service time are around 4%, 5%, and 8%, respectively. Meanwhile, the average error of our framework in predicting the stall time due to cache misses is as low as 8%. The whole cache performance estimation can be sped up by about 15 times versus\n            <jats:italic>gem5<\/jats:italic>\n            cycle-accurate simulations and by 4 times compared with recent studies. Furthermore, we use our models to show and study how different performance metrics relate to the reorder buffer size. 
As an application case, we also demonstrate how to combine our framework with McPAT to find Pareto-optimal configurations in cache design space exploration.\n          <\/jats:p>","DOI":"10.1145\/3233182","type":"journal-article","created":{"date-parts":[[2018,8,13]],"date-time":"2018-08-13T12:29:41Z","timestamp":1534163381000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["An Analytical Cache Performance Evaluation Framework for Embedded Out-of-Order Processors Using Software Characteristics"],"prefix":"10.1145","volume":"17","author":[{"given":"Kecheng","family":"Ji","sequence":"first","affiliation":[{"name":"Southeast University, Nanjing, China"}]},{"given":"Ming","family":"Ling","sequence":"additional","affiliation":[{"name":"Southeast University, Nanjing, China"}]},{"given":"Longxing","family":"Shi","sequence":"additional","affiliation":[{"name":"Southeast University, Nanjing, China"}]},{"given":"Jianping","family":"Pan","sequence":"additional","affiliation":[{"name":"University of Victoria, Canada"}]}],"member":"320","published-online":{"date-parts":[[2018,8,9]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"751","article-title":"Out-of-order processor with a memory subsystem which handles speculatively dispatched load operations. (May 12, 1998)","volume":"5","author":"Abramson Jeffrey M.","year":"1998","journal-title":"US Patent"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2700100"},{"volume-title":"USENIX Annual Technical Conference, FREENIX Track. 
41--46","year":"2005","author":"Bellard Fabrice","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1153925.1154584"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2006.1620793"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2012.6189202"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/782814.782836"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993744.1993746"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/143371.143486"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1028176.1006708"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2013.10.053"},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Roeland J. Douma, Sebastian Altmeyer, and Andy D. Pimentel. 2015. Fast and precise cache performance estimation for out-of-order execution. In Design, Automation & Test in Europe Conference & Exhibition (DATE\u201915). IEEE, 1132--1137.","DOI":"10.7873\/DATE.2015.0066"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2631948.2631951"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2010.5452069"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1534909.1534910"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/2015039.2015543"},{"key":"e_1_2_1_19_1","unstructured":"Peter Greenhalgh. 2011. Big.LITTLE processing with ARM Cortex-A15 & Cortex-A7. 
ARM White Paper (2011), 1--8."},{"volume-title":"Proceedings of the 2001 IEEE International Workshop on Workload Characterization (WWC-4\u201901)","author":"Guthaus Matthew R.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844457"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/540298"},{"key":"e_1_2_1_24_1","unstructured":"John L. Hennessy and David A. Patterson. 2011. Computer Architecture: A Quantitative Approach. Elsevier."},{"key":"e_1_2_1_25_1","article-title":"The microarchitecture of the Pentium\u00ae 4 processor. In Intel Technology","author":"Hinton Glenn","year":"2001","journal-title":"Journal. Citeseer."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844460"},{"key":"e_1_2_1_27_1","first-page":"555","article-title":"Cache memory, including miss status\/information and a method using the same. (Oct. 8, 2013)","volume":"8","author":"Ishizaka Kazuhisa","year":"2013","journal-title":"US Patent"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.42"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/3130379.3130392"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2017.02.005"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 5th International Conference on Signal Processing (WCCC-ICSP\u201900)","volume":"3","author":"Jin Wen","year":"2000"},{"volume-title":"Proceedings of the 31st Annual International Symposium on Computer Architecture, 2004. Proceedings. IEEE, 338--349","author":"Tejas","key":"e_1_2_1_32_1"},{"volume-title":"Proceedings of the 30th Annual ACM\/IEEE International Symposium on Microarchitecture. 
IEEE Computer Society, 330--335","author":"Lee Chunho","key":"e_1_2_1_33_1"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2445572.2445577"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1391469.1391551"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2539036.2539039"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1064978.1065034"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2907071"},{"volume-title":"Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA-9","year":"2003","author":"Mutlu Onur","key":"e_1_2_1_39_1"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2015.7095785"},{"key":"e_1_2_1_41_1","unstructured":"Lutz Prechelt et al. 1994. Proben1: A set of neural network benchmark problems and benchmarking rules. (1994)."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.811115"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/2635509.2635559"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS.2016.7461361"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2547387"},{"key":"e_1_2_1_46_1","unstructured":"Wei Wang and Tanima Dey. 2011. A survey on ARM Cortex a processors. 
Retrieved March 2011 from http:\/\/www.cs.virginia.edu\/shadron\/cs8535s11\/armcotex.pdf."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/2485288.2485432"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3233182","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3233182","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:07:55Z","timestamp":1750212475000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3233182"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,7,31]]},"references-count":45,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,7,31]]}},"alternative-id":["10.1145\/3233182"],"URL":"https:\/\/doi.org\/10.1145\/3233182","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2018,7,31]]},"assertion":[{"value":"2017-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-08-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}