{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:22:23Z","timestamp":1750220543309,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,7,17]],"date-time":"2021-07-17T00:00:00Z","timestamp":1626480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2018YFB0204300"],"award-info":[{"award-number":["2018YFB0204300"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["62002368"],"award-info":[{"award-number":["62002368"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012152","name":"National Postdoctoral Program for Innovative Talents","doi-asserted-by":"publisher","award":["BX20190091"],"award-info":[{"award-number":["BX20190091"]}],"id":[{"id":"10.13039\/501100012152","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>Hierarchical organization is widely used in high-radix routers to enable efficient scaling to higher switch port count. A general-purpose hierarchical router must be symmetrically designed with the same input buffer depth, resulting in a large amount of unused input buffers due to the different link lengths. Sharing input buffers between different input ports can improve buffer utilization, but the implementation overhead also increases with the number of shared ports. Previous work allowed input buffers to be shared among all router ports, which maximizes the buffer utilization but also introduces higher implementation complexity. Moreover, such design can impair performance when faced with long packets, due to the head-of-line blocking in intermediate buffers.<\/jats:p>\n          <jats:p>In this work, we explain that sharing unused buffers between a subset of router ports is a more efficient design. Based on this observation, we propose Centralized Input Buffer Design in Hierarchical High-radix Routers (CIB-HIER), a novel centralized input buffer design for hierarchical high-radix routers. CIB-HIER integrates multiple input ports onto a single tile and organizes all unused input buffers in the tile as a centralized input buffer. CIB-HIER only allows the centralized input buffer to be shared between ports on the same tile, without introducing additional intermediate virtual channels or global scheduling circuits. Going beyond the basic design of CIB-HIER, the centralized input buffer can be used to relieve the head-of-line blocking caused by shallow intermediate buffers, by stashing long packets in the centralized input buffer. Experimental results show that CIB-HIER is highly effective and can significantly increase the throughput of high-radix routers.<\/jats:p>","DOI":"10.1145\/3468062","type":"journal-article","created":{"date-parts":[[2021,7,17]],"date-time":"2021-07-17T10:05:22Z","timestamp":1626516322000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["CIB-HIER"],"prefix":"10.1145","volume":"18","author":[{"given":"Cunlu","family":"Li","sequence":"first","affiliation":[{"name":"College of Computer, National University of Defense Technology, China"}]},{"given":"Dezun","family":"Dong","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, China"}]},{"given":"Shazhou","family":"Yang","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, China"}]},{"given":"Xiangke","family":"Liao","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, China"}]},{"given":"Guangyu","family":"Sun","sequence":"additional","affiliation":[{"name":"Peking University, China"}]},{"given":"Yongheng","family":"Liu","sequence":"additional","affiliation":[{"name":"Peng Cheng Laboratory, China"}]}],"member":"320","published-online":{"date-parts":[[2021,7,17]]},"reference":[{"volume-title":"Proceedings of the IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201918)","author":"Blumrich Matthias A.","unstructured":"Matthias A. Blumrich , Nan Jiang , and Larry R. Dennison . 2018. Exploiting idle resources in a high-radix switch for supplemental storage . In Proceedings of the IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201918) . 1\u201313. Matthias A. Blumrich, Nan Jiang, and Larry R. Dennison. 2018. Exploiting idle resources in a high-radix switch for supplemental storage. In Proceedings of the IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201918). 1\u201313.","key":"e_1_2_1_1_1"},{"volume-title":"Proceedings of The ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201909)","author":"Ahn Jung Ho","unstructured":"Jung Ho Ahn , Nathan Binkert , Al Davis , Moray McLaren , and Robert S. Schreiber . 2009. HyperX: Topology, routing, and packaging of efficient large-scale networks . In Proceedings of The ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201909) . 41:1\u201341:11. Jung Ho Ahn, Nathan Binkert, Al Davis, Moray McLaren, and Robert S. Schreiber. 2009. HyperX: Topology, routing, and packaging of efficient large-scale networks. In Proceedings of The ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201909). 41:1\u201341:11.","key":"e_1_2_1_2_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_3_1","DOI":"10.1109\/HPCA.2012.6169048"},{"doi-asserted-by":"publisher","key":"e_1_2_1_4_1","DOI":"10.1016\/j.parco.2016.01.009"},{"volume-title":"Proceedings of the ACM\/IEEE International Conference on Supercomputing. 158\u2013165","author":"Bailey D. H.","unstructured":"D. H. Bailey , E. Barszcz , H. D. Simon , V. Venkatakrishnan , S. K. Weeratunga , J. T. Barton , D. S. Browning , R. L. Carter , L. Dagum , and R. A. Fatoohi . 1991. The NAS parallel benchmarks . In Proceedings of the ACM\/IEEE International Conference on Supercomputing. 158\u2013165 . D. H. Bailey, E. Barszcz, H. D. Simon, V. Venkatakrishnan, S. K. Weeratunga, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, and R. A. Fatoohi. 1991. The NAS parallel benchmarks. In Proceedings of the ACM\/IEEE International Conference on Supercomputing. 158\u2013165.","key":"e_1_2_1_5_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.1145\/3296957.3177158"},{"doi-asserted-by":"publisher","key":"e_1_2_1_7_1","DOI":"10.1109\/SC.2014.34"},{"volume-title":"Proceedings of the 23th IEEE Annual Symposium on High-performance Interconnects (HOTI\u201915)","author":"Birrittella Mark S.","unstructured":"Mark S. Birrittella , Mark Debbage , Ram Huggahalli , James Kunz , Tom Lovett , Todd Rimmer , Keith D. Underwood , and Robert C. Zak . 2015. Intel omni-path architecture: Enabling scalable, high performance fabrics . In Proceedings of the 23th IEEE Annual Symposium on High-performance Interconnects (HOTI\u201915) . 1\u20139. Mark S. Birrittella, Mark Debbage, Ram Huggahalli, James Kunz, Tom Lovett, Todd Rimmer, Keith D. Underwood, and Robert C. Zak. 2015. Intel omni-path architecture: Enabling scalable, high performance fabrics. In Proceedings of the 23th IEEE Annual Symposium on High-performance Interconnects (HOTI\u201915). 1\u20139.","key":"e_1_2_1_8_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1109\/TPDS.2018.2873337"},{"doi-asserted-by":"publisher","key":"e_1_2_1_10_1","DOI":"10.1109\/IPDPS.2017.15"},{"volume-title":"Principles and Practices of Interconnection Networks","author":"Dally William James","unstructured":"William James Dally and Brian Patrick Towles . 2004. Principles and Practices of Interconnection Networks . Elsevier . William James Dally and Brian Patrick Towles. 2004. Principles and Practices of Interconnection Networks. Elsevier.","key":"e_1_2_1_11_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_13_1","DOI":"10.1109\/NOCS.2012.8"},{"doi-asserted-by":"publisher","key":"e_1_2_1_14_1","DOI":"10.1007\/s11432-016-5588-7"},{"doi-asserted-by":"publisher","key":"e_1_2_1_15_1","DOI":"10.1109\/IPDPS.2014.37"},{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.1109\/NoCS.2013.6558397"},{"doi-asserted-by":"publisher","key":"e_1_2_1_17_1","DOI":"10.5555\/3195638.3195674"},{"doi-asserted-by":"publisher","key":"e_1_2_1_18_1","DOI":"10.1109\/HPCA.2014.6835943"},{"doi-asserted-by":"publisher","key":"e_1_2_1_19_1","DOI":"10.1109\/SC.2006.10"},{"doi-asserted-by":"publisher","key":"e_1_2_1_20_1","DOI":"10.1145\/1273440.1250679"},{"doi-asserted-by":"publisher","key":"e_1_2_1_21_1","DOI":"10.1109\/ISCA.2008.19"},{"volume-title":"Proceedings of the 32th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA\u201905)","author":"Kim John","unstructured":"John Kim , William J. Dally , Brian Towles , and Amit K. Gupta . 2005. Microarchitecture of a high radix router . In Proceedings of the 32th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA\u201905) . 420\u2013431. John Kim, William J. Dally, Brian Towles, and Amit K. Gupta. 2005. Microarchitecture of a high radix router. In Proceedings of the 32th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA\u201905). 420\u2013431.","key":"e_1_2_1_22_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_23_1","DOI":"10.1145\/3392717.3392747"},{"doi-asserted-by":"publisher","key":"e_1_2_1_24_1","DOI":"10.1145\/3330345.3330381"},{"doi-asserted-by":"publisher","key":"e_1_2_1_25_1","DOI":"10.1007\/s11390-015-1520-7"},{"volume-title":"Proceedings of the International Conference on Parallel Processing Workshops. 1\u20139.","author":"Lu Qingda","unstructured":"Qingda Lu , Jiesheng Wu , Dhabaleswar Panda , and P. Sadayappan . 2004. Applying MPI derived datatypes to the NAS benchmarks: A case study . In Proceedings of the International Conference on Parallel Processing Workshops. 1\u20139. Qingda Lu, Jiesheng Wu, Dhabaleswar Panda, and P. Sadayappan. 2004. Applying MPI derived datatypes to the NAS benchmarks: A case study. In Proceedings of the International Conference on Parallel Processing Workshops. 1\u20139.","key":"e_1_2_1_26_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_27_1","DOI":"10.1109\/HPCA.2012.6169049"},{"doi-asserted-by":"publisher","key":"e_1_2_1_28_1","DOI":"10.1109\/NOCS.2010.17"},{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.1145\/1150019.1136488"},{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.1109\/NOCS.2012.31"},{"doi-asserted-by":"publisher","key":"e_1_2_1_31_1","DOI":"10.1109\/12.144624"},{"doi-asserted-by":"publisher","key":"e_1_2_1_32_1","DOI":"10.1145\/2377677.2377711"},{"doi-asserted-by":"publisher","key":"e_1_2_1_33_1","DOI":"10.1109\/TVLSI.2016.2536747"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3468062","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3468062","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:06Z","timestamp":1750195686000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3468062"}},"subtitle":["Centralized Input Buffer Design in Hierarchical High-radix Routers"],"short-title":[],"issued":{"date-parts":[[2021,7,17]]},"references-count":32,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3468062"],"URL":"https:\/\/doi.org\/10.1145\/3468062","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2021,7,17]]},"assertion":[{"value":"2020-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}