{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:58:58Z","timestamp":1750309138174,"version":"3.41.0"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T00:00:00Z","timestamp":1689724800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>The ideal latency for on-chip network traversal would be the delay incurred from wire traversal alone. Unfortunately, in a realistic modular network, the latency for a packet to traverse the network is significantly higher than this wire delay. The main limiter to achieving lower latency is the modular quantization of network traversal into hops. Beyond this, the physical heterogeneity in real-world systems further complicates the ability to reach ideal wire-only delay.<\/jats:p>\n          <jats:p>\n            In this work, we propose\n            <jats:bold>TNT<\/jats:bold>\n            or\n            <jats:bold>Transparent Network Traversal<\/jats:bold>\n            . TNT targets ideal network latency by attempting source to destination network traversal as a single multi-cycle \u2018long-hop\u2019, bypassing the quantization effects of intermediate routers via transparent data\/information flow. TNT is built in a modular tile-scalable manner via a novel control path performing neighbor-to-neighbor interactions but enabling end-to-end transparent flit traversal. 
Further, TNT\u2019s fine grained on-the-fly delay tracking allows it to cope with physical NOC heterogeneity across the chip.\n          <\/jats:p>\n          <jats:p>\n            Analysis on Ligra graph workloads shows that TNT can reduce NOC latency by as much as 43% compared to the state of the art and allows efficiency gains up to 38%. Further, it can achieve more than 3x the benefits of the best\/closest alternative research proposal, SMART\u00a0[\n            <jats:xref ref-type=\"bibr\">43<\/jats:xref>\n            ].\n          <\/jats:p>","DOI":"10.1145\/3597611","type":"journal-article","created":{"date-parts":[[2023,5,22]],"date-time":"2023-05-22T11:47:33Z","timestamp":1684756053000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2334-2682","authenticated-orcid":false,"given":"Gokul Subramanian","family":"Ravi","sequence":"first","affiliation":[{"name":"University of Chicago, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5738-6942","authenticated-orcid":false,"given":"Tushar","family":"Krishna","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8535-9244","authenticated-orcid":false,"given":"Mikko","family":"Lipasti","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,7,19]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555810"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919636"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00049"},{"key":"e_1_3_2_5_2","unstructured":"AMD. 2022. Zen - Microarchitectures - AMD. 
https:\/\/en.wikichip.org\/wiki\/amd\/microarchitectures\/zen."},{"key":"e_1_3_2_6_2","unstructured":"Angstrom. 2022. Angstrom. http:\/\/projects.csail.mit.edu\/angstrom."},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835953"},{"key":"e_1_3_2_8_2","unstructured":"ARM. 2022. ARM core link interconnect. https:\/\/www.arm.com\/products\/system-ip\/corelink-interconnect."},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2002.1044296"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2591635.2667187"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2954679.2872414"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/2.976921"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2005.36"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2638459"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/4.982424"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2016.2538284"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1109\/ISCA.2016.64","volume-title":"2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)","author":"Cherupalli H.","year":"2016","unstructured":"H. Cherupalli, R. Kumar, and J. Sartori. 2016. Exploiting dynamic timing slack for energy efficiency in ultra-low-power embedded systems. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 671\u2013681."},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","first-page":"729","DOI":"10.1109\/ICCAD.2015.7372642","volume-title":"2015 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD)","author":"Cherupalli H.","year":"2015","unstructured":"H. Cherupalli and J. Sartori. 2015. 
Graph-based dynamic analysis: Efficient characterization of dynamic timing and activity distributions. In 2015 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). 729\u2013735."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/640000.640040"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/995703"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC.2001.156225"},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1109\/ISCA.2014.6853232","volume-title":"2014 ACM\/IEEE 41st International Symposium on Computer Architecture (ISCA)","author":"Daya B. K.","year":"2014","unstructured":"B. K. Daya, C. O. Chen, S. Subramanian, W. Kwon, S. Park, T. Krishna, J. Holt, A. P. Chandrakasan, and L. Peh. 2014. SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering. In 2014 ACM\/IEEE 41st International Symposium on Computer Architecture (ISCA). 25\u201336."},{"key":"e_1_3_2_24_2","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1109\/ISLPED.2013.6629293","volume-title":"International Symposium on Low Power Electronics and Design (ISLPED)","author":"Drake A. J.","year":"2013","unstructured":"A. J. Drake, M. S. Floyd, R. L. Willaman, D. J. Hathaway, J. Hernandez, C. Soja, M. D. Tiner, G. D. Carpenter, and R. M. Senger. 2013. Single-cycle, pulse-shaped critical path monitor in the POWER7+ microprocessor. In International Symposium on Low Power Electronics and Design (ISLPED). 193\u2013198."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2150982"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2009.4798251"},{"key":"e_1_3_2_27_2","first-page":"401","volume-title":"2011 38th Annual International Symposium on Computer Architecture (ISCA)","author":"Grot B.","year":"2011","unstructured":"B. Grot, J. Hestness, S. W. Keckler, and O. Mutlu. 2011. 
Kilo-NOC: A heterogeneous network-on-chip architecture for scalability and service guarantees. In 2011 38th Annual International Symposium on Computer Architecture (ISCA). 401\u2013412."},{"key":"e_1_3_2_28_2","volume-title":"MICRO","author":"Gupta Meeta","year":"2009","unstructured":"Meeta Gupta, Jude A. Rivers, Pradip Bose, Gu-Yeon Wei, and David Brooks. 2009. Tribeca: Design for PVT variations with local recovery and fine-grained adaptation. In MICRO."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555779"},{"key":"e_1_3_2_30_2","volume-title":"Proceedings of the Biennial Conference on Innovative Data Systems Research","author":"Hardavellas Nikos","year":"2007","unstructured":"Nikos Hardavellas, Ippokratis Pandis, Ryan Johnson, Naju Mancheril, Anastassia Ailamaki, and Babak Falsafi. 2007. Database servers on chip multiprocessors: Limitations and opportunities. In Proceedings of the Biennial Conference on Innovative Data Systems Research."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2007.4378783"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2010.5434077"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2010.2079450"},{"key":"e_1_3_2_34_2","unstructured":"Intel. 2022. Skylake - Microarchitectures - Intel. https:\/\/en.wikichip.org\/wiki\/intel\/microarchitectures\/skylake_(server)."},{"key":"e_1_3_2_35_2","unstructured":"Intel. 2022. Sunny Cove - Microarchitectures - Intel. 
https:\/\/en.wikichip.org\/wiki\/intel\/microarchitectures\/sunny_cove."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/NOCS.2010.15"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.2200\/S00772ED1V01Y201704CAC040"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.40"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2012.2202392"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2011.2164538"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669145"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.29"},{"key":"e_1_3_2_43_2","volume-title":"Enabling Dedicated Single-cycle Connections over a Shared Network-on-chip","author":"Krishna Tushar","year":"2014","unstructured":"Tushar Krishna. 2014. Enabling Dedicated Single-cycle Connections over a Shared Network-on-chip. Ph. D. Dissertation. MIT."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522334"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2009.64"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2010.5647666"},{"issue":"3","key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1109\/43.913759","article-title":"Pattern generation for delay testing and dynamic timing analysis considering power-supply noise effects","volume":"20","author":"Krstic A.","year":"2001","unstructured":"A. Krstic, Yi-Min Jiang, and Kwang-Ting Cheng. 2001. Pattern generation for delay testing and dynamic timing analysis considering power-supply noise effects. 
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 20, 3 (2001), 416\u2013425.","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2007.4601881"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2008.4771803"},{"key":"e_1_3_2_50_2","first-page":"477","volume-title":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","author":"Kurian G.","year":"2010","unstructured":"G. Kurian, J. E. Miller, J. Psota, J. Eastep, J. Liu, J. Michel, L. C. Kimerling, and A. Agarwal. 2010. ATAC: A 1000-core cache-coherent processor with on-chip optical network. In 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT). 477\u2013488."},{"key":"e_1_3_2_51_2","first-page":"1","volume-title":"2011 44th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Lefurgy C. R.","year":"2011","unstructured":"C. R. Lefurgy, A. J. Drake, M. S. Floyd, M. S. Allen-Ware, B. Brock, J. A. Tierno, and J. B. Carter. 2011. Active management of timing guardband to save energy in POWER7. In 2011 44th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 1\u201311."},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2013.52"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2000.824340"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2009.4798274"},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","first-page":"762","DOI":"10.1109\/HPCA.2018.00070","volume-title":"2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)","author":"McKeown M.","year":"2018","unstructured":"M. McKeown, A. Lavrov, M. Shahrad, P. J. Jackson, Y. Fu, J. Balkind, T. M. Nguyen, K. Lim, Y. Zhou, and D. Wentzlaff. 2018. 
Power and energy characterization of an open source 25-core manycore processor. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). 762\u2013775."},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2004.1310774"},{"key":"e_1_3_2_57_2","volume-title":"Proc. of the 12th IFIP International Conference on Very Large Scale Integration (VLSI-SoC 2003)","author":"Pamunuwa D.","year":"2003","unstructured":"D. Pamunuwa, J. \u00d6berg, L. R. Zheng, M. Millberg, A. Jantsch, and H. Tenhunen. 2003. Layout, performance and power trade-offs in mesh-based network-on-chip architectures. In Proc. of the 12th IFIP International Conference on Very Large Scale Integration (VLSI-SoC 2003). Citeseer."},{"key":"e_1_3_2_58_2","first-page":"398","volume-title":"DAC Design Automation Conference 2012","author":"Park S.","year":"2012","unstructured":"S. Park, T. Krishna, C. Chen, B. Daya, A. Chandrakasan, and L. Peh. 2012. Approaching the theoretical limits of a mesh NoC with a 16-node chip prototype in 45nm SOI. In DAC Design Automation Conference 2012. 398\u2013405."},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228431"},{"key":"e_1_3_2_60_2","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1109\/HPCA.2019.00065","article-title":"Recycling data slack in out-of-order cores","author":"Ravi Gokul Subramanian","year":"2019","unstructured":"Gokul Subramanian Ravi and Mikko H. Lipasti. 2019. Recycling data slack in out-of-order cores. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) (2019), 545\u2013557.","journal-title":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)"},{"key":"e_1_3_2_61_2","first-page":"C30\u2013C31","volume-title":"2019 Symposium on VLSI Circuits","author":"Rovinski A.","year":"2019","unstructured":"A. Rovinski, C. Zhao, K. Al-Hawaj, P. Gao, S. Xie, C. Torng, S. Davidson, A. Amarnath, L. Vega, B. Veluri, A. 
Rao, T. Ajayi, J. Puscar, S. Dai, R. Zhao, D. Richmond, Z. Zhang, I. Galton, C. Batten, M. B. Taylor, and R. G. Dreslinski. 2019. A 1.4 GHz 695 giga risc-V Inst\/s 496-core manycore processor with mesh on-chip network and an all-digital synthesized PLL in 16nm CMOS. In 2019 Symposium on VLSI Circuits. C30\u2013C31."},{"key":"e_1_3_2_62_2","first-page":"1","volume-title":"2017 IEEE\/ACM International Symposium on Low Power Electronics and Design (ISLPED)","author":"Ryu S.","year":"2017","unstructured":"S. Ryu, J. Koo, and J. Kim. 2017. Low design overhead timing error correction scheme for elastic clock methodology. In 2017 IEEE\/ACM International Symposium on Low Power Electronics and Design (ISLPED). 1\u20136."},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSM.2007.913186"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442530"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/2312005.2312018"},{"key":"e_1_3_2_66_2","first-page":"1","volume-title":"2015 IEEE Hot Chips 27 Symposium (HCS)","author":"Sodani A.","year":"2015","unstructured":"A. Sodani. 2015. Knights landing (KNL): 2nd generation Intel\u00ae Xeon Phi processor. In 2015 IEEE Hot Chips 27 Symposium (HCS). 1\u201324."},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/NOCS.2012.31"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2005.24"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3287624.3287683"},{"key":"e_1_3_2_70_2","first-page":"6 pp.\u2013165","volume-title":"24th IEEE VLSI Test Symposium","author":"Vorisek V.","year":"2006","unstructured":"V. Vorisek, B. Swanson, Kun-Han Tsai, and D. Goswami. 2006. Improved handling of false and multicycle paths in ATPG. In 24th IEEE VLSI Test Symposium. 
6 pp.\u2013165."},{"key":"e_1_3_2_71_2","doi-asserted-by":"crossref","first-page":"276","DOI":"10.7873\/DATE.2013.069","volume-title":"2013 Design, Automation Test in Europe Conference Exhibition (DATE)","author":"Wagner M.","year":"2013","unstructured":"M. Wagner and H. Wunderlich. 2013. Efficient variation-aware statistical dynamic timing analysis for delay test applications. In 2013 Design, Automation Test in Europe Conference Exhibition (DATE). 276\u2013281."},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2007.4378780"},{"key":"e_1_3_2_73_2","first-page":"1","volume-title":"2013 23rd International Conference on Field Programmable Logic and Applications","author":"Zheng H.","year":"2013","unstructured":"H. Zheng, S. T. Gurumani, L. Yang, D. Chen, and K. Rupnow. 2013. High-level synthesis with behavioral level multi-cycle path analysis. In 2013 23rd International Conference on Field Programmable Logic and Applications. 1\u20138."}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3597611","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3597611","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:50:13Z","timestamp":1750287013000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3597611"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,19]]},"references-count":72,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3597611"],"URL":"https:\/\/doi.org\/10.1145\/3597611","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2023,7,19]]},"assertion":[{"value":"2022-02-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-04-24","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}