{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T15:57:18Z","timestamp":1767974238663,"version":"3.49.0"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2020,5,29]],"date-time":"2020-05-29T00:00:00Z","timestamp":1590710400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["1633318 and 1633412"],"award-info":[{"award-number":["1633318 and 1633412"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2020,6,30]]},"abstract":"<jats:p>\n            Remote Direct Memory Access (RDMA) fabrics such as InfiniBand and Converged Ethernet report latency shorter by a factor of 50 than TCP. As such, RDMA is a potential replacement for TCP in datacenters (DCs) running low-latency applications, such as Web search and memcached. InfiniBand\u2019s Shared Receive Queues (SRQs), which use two-sided send\/recv verbs (i.e.,\n            <jats:italic>channel semantics<\/jats:italic>\n            ), reduce the amount of pre-allocated, pinned memory (despite optimizations such as InfiniBand\u2019s on-demand paging (ODP)) for message buffers. 
However, SRQs are limited fundamentally to a single message size per queue, which incurs either memory wastage or significant programmer burden for typical DC traffic of an arbitrary number (level of burstiness) of messages of arbitrary size.\n          <\/jats:p>\n          <jats:p>\n            We propose\n            <jats:italic>remote indirect memory access (RIMA)<\/jats:italic>\n            , which avoids these pitfalls by providing (1) network interface card (NIC) microarchitecture support for novel\n            <jats:italic>queue semantics<\/jats:italic>\n            and (2) a new \u201cverb\u201d called\n            <jats:italic>append<\/jats:italic>\n            . To append a sender\u2019s message to a shared queue, the receiver NIC atomically increments the queue\u2019s tail pointer by the incoming message\u2019s size and places the message in the newly created space. As in traditional RDMA, the NIC is responsible for pointer lookup, address translation, and enforcing virtual memory protections. This\n            <jats:italic>indirection<\/jats:italic>\n            of specifying a queue (and not its tail pointer, which remains hidden from senders) handles the typical DC traffic of an arbitrary sender sending an arbitrary number of messages of arbitrary size. Because RIMA\u2019s simple hardware adds only 1\u20132 ns to the multi-\u03bcs message latency, RIMA achieves the same message latency and throughput as InfiniBand SRQ with unlimited buffering. Running memcached traffic on a 30-node InfiniBand cluster, we show that at similar, low programmer effort, RIMA achieves significantly smaller memory footprint than SRQ. However, while SRQ can be crafted to minimize memory footprint by expending significant programming effort, RIMA provides those benefits with little programmer effort. 
For memcached traffic, a high-performance key-value cache (\n            <jats:italic>FastKV<\/jats:italic>\n            ) using RIMA achieves either 3\u00d7 lower 96th-percentile latency or significantly better throughput or memory footprint than\n            <jats:italic>FastKV<\/jats:italic>\n            using RDMA.\n          <\/jats:p>","DOI":"10.1145\/3374215","type":"journal-article","created":{"date-parts":[[2020,5,30]],"date-time":"2020-05-30T04:22:06Z","timestamp":1590812526000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Network Interface Architecture for Remote Indirect Memory Access (RIMA) in Datacenters"],"prefix":"10.1145","volume":"17","author":[{"given":"Jiachen","family":"Xue","sequence":"first","affiliation":[{"name":"Nvidia, Santa Clara, CA"}]},{"given":"T. N.","family":"Vijaykumar","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, IN"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4164-4542","authenticated-orcid":false,"given":"Mithuna","family":"Thottethodi","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, IN"}]}],"member":"320","published-online":{"date-parts":[[2020,5,29]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1402958.1402967"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1851182.1851192"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects. 83--87","author":"Alverson R.","year":"2010"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing\u201998)","author":"Ang Boon S.","year":"1998"},
{"key":"e_1_2_1_5_1","unstructured":"apache [n.d.]. Apache Performance Tuning. Retrieved from https:\/\/httpd.apache.org\/docs\/2.4\/misc\/perf-tuning.html."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2254756.2254766"},{"key":"e_1_2_1_7_1","unstructured":"aws [n.d.]. Amazon EC2 Instance Types. Retrieved from https:\/\/aws.amazon.com\/ec2\/instance-types\/."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2003.1196112"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201994)","author":"Blumrich M. A.","year":"1994"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.342015"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA\u201995)","author":"Brewer Eric A."},{"key":"e_1_2_1_12_1","first-page":"04","article-title":"Analog-digital technologies for mixed-signal processing: The driving force to success for the European industry","volume":"25","author":"Casier H. J.","year":"1992","journal-title":"IEEE Micro"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the USENIX Conference on Networked Systems Design and Implementation (NSDI\u201914)","author":"Dragojevi\u0107 Aleksandar","year":"2014"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.671404"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190550"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190550"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201910)","author":"Gran E. G.","year":"2010"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201912)","author":"Gran E. 
G.","year":"2012"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the European Conference on Message Passing Interface (EuroMPI\u201913)","author":"Hassani Amin"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201996)","author":"Hill M. D.","year":"1996"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201912)","author":"Huang J."},{"key":"e_1_2_1_22_1","unstructured":"ibspec [n.d.]. Infiniband Specification. Technical Report."},{"key":"e_1_2_1_23_1","unstructured":"infiniband [n.d.]. Infiniband Performance. Retrieved from http:\/\/www.mellanox.com\/page\/performance_infiniband."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201912)","author":"Islam N. 
S."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the IEEE\/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID\u201912)","author":"Jose Jithin","year":"2012"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the International Conference on Parallel Processing (ICPP\u201911)","author":"Jose J.","year":"2011"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the International Conference on Parallel Processing (ICPP\u201911)","author":"Jose Jithin","year":"2011"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2740070.2626299"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC\u201916)","author":"Kalia Anuj"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","author":"Kalia Anuj"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201906)","author":"Scott Rixner Kim","year":"2006"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2008.19"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.35"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201914)","author":"Koop Matthew J."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 21st Annual International Conference on Supercomputing. 
180--189","author":"Koop Matthew J."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201916)","author":"Li Mingzhe"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201913)","author":"Lim Kevin"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201903)","author":"Liu Jiuxing"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the International Conference on Parallel Processing (ICPP\u201913)","author":"Lu X."},{"key":"e_1_2_1_40_1","volume-title":"The remote enqueue operation on networks of workstations","author":"Markatos Evangelos P."},{"key":"e_1_2_1_41_1","unstructured":"memcached [n.d.]. Memcached. Retrieved from http:\/\/memcached.org\/."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the ACM\/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS\u201909)","author":"Miller David J.","year":"2009"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201913)","author":"Mitchell Christopher","year":"2013"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2785956.2787510"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the Conference on Networked Systems Design and Implementation (NSDI\u201913)","author":"Nishtala Rajesh","year":"2013"},{"key":"e_1_2_1_47_1","volume-title":"Scale-out NUMA. 
In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201914)","author":"Novakovic Stanko","year":"2014"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/224170.224360"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/HIS.2001.946704"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 2015 IEEE International Conference on Big Data.","author":"Shankar D."},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD\u201916)","author":"Sriraman Akshitha"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3064176.3064189"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the IEEE Cluster. 203--212","author":"Subramoni Hari","year":"2008"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201906)","author":"Sur Sayantan","year":"2006"},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the Annual Conference on Principles and Practice of Parallel Programming (PPoPP\u201906)","author":"Sur Sayantan"},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of the Symposium on Operating Systems Principles (SOSP\u201995)","author":"von Eicken T."},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the (HPCA-11)","author":"Willmann Paul","year":"2005"},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the International Conference on Parallel Processing (ICPP\u201903)","author":"Wu Jiesheng"},{"key":"e_1_2_1_59_1","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201906)","author":"Yu Weikuan","year":"2006"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2785956.2787484"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3374215","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3374215","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3374215","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:47Z","timestamp":1750200107000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3374215"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,29]]},"references-count":59,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,6,30]]}},"alternative-id":["10.1145\/3374215"],"URL":"https:\/\/doi.org\/10.1145\/3374215","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,29]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}