{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T13:46:18Z","timestamp":1753883178221,"version":"3.41.2"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"name":"National Key Research and Development Program of China","award":["2022YFB3105100"],"award-info":[{"award-number":["2022YFB3105100"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>\n            Thread-Level Speculation (TLS) utilizes speculative parallelization to accelerate\n            <jats:italic toggle=\"yes\">hard-to-parallelize<\/jats:italic>\n            serial codes on multi-cores. As the heterogeneous multi-core architecture is becoming ubiquitous, it presents an opportunity for TLS to reorganize little cores for the acceleration of these serial codes instead of a big core with similar or more area and power. However, previous TLS designs significantly suffer from extended hardware overhead and costly speculative forwarding.\n          <\/jats:p>\n          <jats:p>\n            We present LitTLS, a lightweight TLS design with versioning caches to eliminate significant extended hardware overhead by storing versions in caches without speculative write buffers and memory undo-logs. Additionally, LitTLS introduces the Speculative Address Table, a novel component to accelerate speculative forwarding with a central structure to trace memory dependencies. Evaluations on four little cores show that LitTLS achieves an average performance speedup of 2.87\u00d7 compared to a little core, outperforming a big core by 94% with similar area and less power. 
The extended area size is only 0.07 mm\n            <jats:sup>2<\/jats:sup>\n            , and the maximum increase in dynamic power consumption is limited to 0.3%, compared to four little cores.\n          <\/jats:p>","DOI":"10.1145\/3719655","type":"journal-article","created":{"date-parts":[[2025,2,26]],"date-time":"2025-02-26T11:24:57Z","timestamp":1740569097000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["LitTLS: Lightweight Thread-Level Speculation on Little Cores"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-9782-1258","authenticated-orcid":false,"given":"Xin","family":"Cheng","sequence":"first","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"University of Chinese Academy of Sciences","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7451-3200","authenticated-orcid":false,"given":"Jinpeng","family":"Ye","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"University of Chinese Academy of Sciences","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4276-2460","authenticated-orcid":false,"given":"Haoyu","family":"Deng","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"University of Chinese Academy of Sciences","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1724-4904","authenticated-orcid":false,"given":"Tingting","family":"Zhang","sequence":"additional","affiliation":[{"name":"Loongson Technology Co. 
Ltd.","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5341-1343","authenticated-orcid":false,"given":"Tianyi","family":"Liu","sequence":"additional","affiliation":[{"name":"Computer Science, The University of Texas at San Antonio","place":["San Antonio, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-0651-5965","authenticated-orcid":false,"given":"Jian","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"University of Chinese Academy of Sciences","place":["Beijing, China"]}]}],"member":"320","published-online":{"date-parts":[[2025,7,2]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2023.3295848"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPDC.2009.18"},{"key":"e_1_3_2_4_2","unstructured":"Apple. 2022. ml-ane-transformers. Retrieved August 19 2024 from https:\/\/github.com\/apple\/ml-ane-transformers"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2018.10.006"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2013.6799113"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2013.6618836"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454128"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-09766-4_499"},{"key":"e_1_3_2_11_2","unstructured":"Len Brown. 2024. turbostat. 
Retrieved August 23 2024 from https:\/\/manpages.ubuntu.com\/manpages\/xenial\/man8\/turbostat.8.html"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-016-1710-2"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.43"},{"key":"e_1_3_2_14_2","volume-title":"Introduction to Algorithms (4th ed.)","author":"Cormen Thomas H.","year":"2022","unstructured":"Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2022. Introduction to Algorithms (4th ed.). MIT Press. 2021037260 https:\/\/books.google.co.uk\/books?id=drZNEAAAQBAJ"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1090\/dimacs\/074"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/IA3.2016.007"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/2938369"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-016-0421-x"},{"key":"e_1_3_2_19_2","unstructured":"Unix and Linux Forums. 2019. powermetrics. Retrieved August 23 2024 from https:\/\/www.unix.com\/man-page\/osx\/1\/powermetrics\/"},{"key":"e_1_3_2_20_2","unstructured":"Andrei Frumusanu. 2021. Apple Announces M1 Pro & M1 Max: Giant New Arm SoCs with All-Out Performance. 
Retrieved August 23 2024 from https:\/\/www.anandtech.com\/show\/17019\/apple-announced-m1-pro-m1-max-giant-new-socs-with-allout-performance"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2017.21"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2018.00023"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.1998.650559"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2010.7480072"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/APCSAC.2008.4625443"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/40.848474"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2007.63"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/1186736.1186737"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/1391469.1391660"},{"key":"e_1_3_2_30_2","unstructured":"Intel. 2024. Intel\u00ae Core\u2122 Ultra Processor Datasheet Volume 1 of 2. Retrieved August 19 2024 from https:\/\/www.intel.com\/content\/www\/us\/en\/content-details\/792044\/intel-core-ultra-processor-datasheet-volume-1-of-2.html"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2016.12"},{"key":"e_1_3_2_32_2","unstructured":"Andi Kleen. 2020. tsx-tools. 
Retrieved April 15 2024 from https:\/\/github.com\/andikleen\/tsx-tools"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2014.2380199"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669172"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2023.102822"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC\/SmartCity\/DSS.2019.00325"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICISCE.2018.00263"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2007.70797"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2008.4636089"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2006.1598134"},{"key":"e_1_3_2_41_2","first-page":"29","volume-title":"Proceedings of the USENIX Winter 1993 Conference","year":"1993","unstructured":"Frank Mueller. 1993. A library implementation of POSIX threads under UNIX. In Proceedings of the USENIX Winter 1993 Conference. 29\u201342."},{"issue":"1","key":"e_1_3_2_42_2","first-page":"Article 7, 12 p","article-title":"IBM Blue Gene\/Q memory subsystem with speculative execution and transactional memory","volume":"57","author":"Ohmacht Martin","year":"2013","unstructured":"Martin Ohmacht, Amy Wang, Thomas Gooding, Ben Nathanson, Indira Nair, Geert Janssen, Marcel Schaal, and Burkhard Steinmacher-Burow. 2013. IBM Blue Gene\/Q memory subsystem with speculative execution and transactional memory. 
IBM Journal of Research and Development 57, 1-2 (2013), Article 7, 12 pages.","journal-title":"IBM Journal of Research and Development"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.26"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919640"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2001.991127"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088178"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088173"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2022.3164338"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3546591.3547532"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.84"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2752169"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/223982.224451"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.2200\/S00346ED1V01Y201104CAC016"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/1082469.1082471"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO56248.2022.00025"},{"key":"e_1_3_2_56_2","first-page":"1131","volume-title":"Proceedings of the 2012 Design Automation Conference (DAC \u201912)","author":"Taylor Michael B.","year":"2012","unstructured":"Michael B. Taylor. 2012. Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse. In Proceedings of the 2012 Design Automation Conference (DAC \u201912). IEEE, 1131\u20131136."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00012"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370836"},{"key":"e_1_3_2_59_2","unstructured":"Wikipedia Contributors. 2024. Alder Lake\u2014Wikipedia The Free Encyclopedia. 
Retrieved April 11 2024 from https:\/\/en.wikipedia.org\/w\/index.php?title=Alder_Lake&oldid=1216526350"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346204"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00024"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS51385.2021.00030"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3719655","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T12:20:23Z","timestamp":1751458823000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3719655"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":61,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3719655"],"URL":"https:\/\/doi.org\/10.1145\/3719655","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,6,30]]},"assertion":[{"value":"2024-08-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-16","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}