{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T06:14:48Z","timestamp":1775283288372,"version":"3.50.1"},"reference-count":93,"publisher":"Association for Computing Machinery (ACM)","issue":"1-2","license":[{"start":{"date-parts":[[2020,5,31]],"date-time":"2020-05-31T00:00:00Z","timestamp":1590883200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005416","name":"Norges Forskningsr\u00e5d","doi-asserted-by":"crossref","award":["235530"],"award-info":[{"award-number":["235530"]}],"id":[{"id":"10.13039\/501100005416","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput. Syst."],"published-print":{"date-parts":[[2020,5,31]]},"abstract":"<jats:p>The large variety of compute-heavy and data-driven applications accelerate the need for a distributed I\/O solution that enables cost-effective scaling of resources between networked hosts. For example, in a cluster system, different machines may have various devices available at different times, but moving workloads to remote units over the network is often costly and introduces large overheads compared to accessing local resources. To facilitate I\/O disaggregation and device sharing among hosts connected using Peripheral Component Interconnect Express (PCIe) non-transparent bridges, we present SmartIO. NVMes, GPUs, network adapters, or any other standard PCIe device may be borrowed and accessed directly, as if they were local to the remote machines. We provide capabilities beyond existing disaggregation solutions by combining traditional I\/O with distributed shared-memory functionality, allowing devices to become part of the same global address space as cluster applications. 
Software is entirely removed from the data path, and simultaneous sharing of a device among application processes running on remote hosts is enabled. Our experimental results show that I\/O devices can be shared with remote hosts, achieving native PCIe performance. Thus, compared to existing device distribution mechanisms, SmartIO provides more efficient, low-cost resource sharing, increasing the overall system performance.<\/jats:p>","DOI":"10.1145\/3462545","type":"journal-article","created":{"date-parts":[[2021,7,8]],"date-time":"2021-07-08T13:53:05Z","timestamp":1625752385000},"page":"1-78","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["SmartIO"],"prefix":"10.1145","volume":"38","author":[{"given":"Jonas","family":"Markussen","sequence":"first","affiliation":[{"name":"Dolphin Interconnect Solutions, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lars Bj\u00f8rlykke","family":"Kristiansen","sequence":"additional","affiliation":[{"name":"Dolphin Interconnect Solutions, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P\u00e5l","family":"Halvorsen","sequence":"additional","affiliation":[{"name":"SimulaMet, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Halvor","family":"Kielland-Gyrud","sequence":"additional","affiliation":[{"name":"Dolphin Interconnect Solutions, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"H\u00e5kon Kvale","family":"Stensland","sequence":"additional","affiliation":[{"name":"Simula Research Laboratory, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carsten","family":"Griwodz","sequence":"additional","affiliation":[{"name":"University of Oslo, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,7,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Keras. [n.d.]. 
Retrieved from https:\/\/keras.io."},{"key":"e_1_2_1_2_1","unstructured":"TensorFlow. [n.d.]. Large-Scale Machine Learning on Heterogeneous Systems. Retrieved from https:\/\/www.tensorflow.org\/."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1535\/itj.1003.02"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304061"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201918)","author":"Aguilera Marcos K.","year":"2018","unstructured":"Marcos K. Aguilera, Nadav Amit, Irina Calciu, Xavier Deguillard, Jayneel Gandhi, Stanko Novakovi\u0107, Arun Ramanathan, Pratap Subrahmanyam, Lalith Suresh, Kiran Tati, Rajesh Venkatasubramanian, and Michael Wei. 2018. Remote regions: A simple abstraction for remote memory. In Proceedings of the USENIX Annual Technical Conference (ATC\u201918). 775\u2013787."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the International Conference on Computer Systems and Software Engineering (CompEuro\u201990)","author":"Aln\u00e6s Knut","year":"1990","unstructured":"Knut Aln\u00e6s, Ernst H. 
Kristiansen, David B. Gustavson, and David V. James. 1990. Scalable coherent interface. In Proceedings of the International Conference on Computer Systems and Software Engineering (CompEuro\u201990). 446\u2013453. DOI:https:\/\/doi.org\/10.1109\/CMPEUR.1990.113656"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the International Symposium on Computer Architecture(ISCA\u201910)","author":"Amit Nadav","year":"2010","unstructured":"Nadav Amit, Muli Ben-Yehuda, and Ben-Ami Yassour. 2010. IOMMU: Strategies for mitigating the IOTLB bottleneck. In Proceedings of the International Symposium on Computer Architecture(ISCA\u201910). Springer, 256\u2013274. DOI:https:\/\/doi.org\/10.1007\/978-3-642-24322-6_22"},{"key":"e_1_2_1_8_1","volume-title":"Neefe","author":"Anderson Eric A.","year":"1994","unstructured":"Eric A. Anderson and Jeanna M. Neefe. 1994. An Exploration of Network RAM. Technical Report. EECS Department, University of California. Retrieved from https:\/\/www2.eecs.berkeley.edu\/Pubs\/TechRpts\/1998\/CSD-98-1000.pdf."},{"key":"e_1_2_1_9_1","unstructured":"Jens Axboe. [n.d.]. Flexible I\/O Tester. Retrieved from https:\/\/github.com\/axboe\/fio."},{"key":"e_1_2_1_10_1","unstructured":"Stephen Bates. 2015. Project Donard. 
Retrieved from https:\/\/github.com\/sbates130272\/donard."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201917)","author":"Bergman Shai","year":"2017","unstructured":"Shai Bergman, Tanya Brokhman, Tzachi Cohen, and Mark Silberstein. 2017. SPIN: Seamless operating system integration of peer-to-peer DMA between SSDs and GPUs. In Proceedings of the USENIX Annual Technical Conference (ATC\u201917). 665\u2013676."},{"key":"#cr-split#-e_1_2_1_12_1.1","doi-asserted-by":"crossref","unstructured":"Maciej Bielski Christian Pinto Daniel Raho and Renaud Pacalet. 2016. Survey on memory and devices disaggregation solutions for HPC systems. In Proceedings of the International Conference on Computational Science and Engineering and International Conference on Embedded and Ubiquitous Computing and International Symposium on Distributed Computing and Applications for Business Engineering (CSE-EUC-DCABES'16). 197-204. DOI:https:\/\/doi.org\/10.1109\/CSE-EUC-DCABES.2016.185 10.1109\/CSE-EUC-DCABES.2016.185","DOI":"10.1109\/CSE-EUC-DCABES.2016.185"},{"key":"#cr-split#-e_1_2_1_12_1.2","doi-asserted-by":"crossref","unstructured":"Maciej Bielski Christian Pinto Daniel Raho and Renaud Pacalet. 2016. Survey on memory and devices disaggregation solutions for HPC systems. In Proceedings of the International Conference on Computational Science and Engineering and International Conference on Embedded and Ubiquitous Computing and International Symposium on Distributed Computing and Applications for Business Engineering (CSE-EUC-DCABES'16). 197-204. DOI:https:\/\/doi.org\/10.1109\/CSE-EUC-DCABES.2016.185","DOI":"10.1109\/CSE-EUC-DCABES.2016.185"},{"key":"e_1_2_1_13_1","unstructured":"Broadcom. 
2011. PEX8733 PCI Express Gen 3 Switch 32 Lanes 18 Ports. Retrieved from https:\/\/docs.broadcom.com\/docs\/12351852."},{"key":"e_1_2_1_14_1","unstructured":"Broadcom. 2012. PEX8796 PCI Express Gen 3 Switch 96 Lanes 24 Ports. Retrieved from https:\/\/docs.broadcom.com\/docs\/12351860."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3149457.3149466"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML\u201913)","author":"Coates Adam","year":"2013","unstructured":"Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Y. Ng, and Bryan Catanzaro. 2013. Deep learning with COTS HPC systems. In Proceedings of the International Conference on Machine Learning (ICML\u201913). 1337\u20131345."},{"key":"e_1_2_1_17_1","unstructured":"Intel Corporation. 2015. Intel Rack Scale Design. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/architecture-and-technology\/rack-scale-design-overview.html."},{"key":"e_1_2_1_18_1","unstructured":"Liqid Corporation. [n.d.]. Liqid Composable Infrastructure. 
Retrieved from https:\/\/www.liqid.com\/."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872887.2750415"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_21_1","unstructured":"Dolphin Interconnect Solutions. [n.d.]. SFF-8644 MiniSAS-HD PCIe Gen3 cables. Retrieved from https:\/\/www.dolphinics.com\/products\/PCI_Express_SFF-8644_cables.html."},{"key":"e_1_2_1_22_1","unstructured":"Dolphin Interconnect Solutions [n.d.]. SISCI API Documentation. Dolphin Interconnect Solutions. Retrieved from http:\/\/ww.dolphinics.no\/download\/SISCI_DOC_V2\/."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the International Conference on High Performance Computing and Simulation (HPCS\u201910)","author":"Duato Jos\u00e9","year":"2010","unstructured":"Jos\u00e9 Duato, Antonio J. Pena, Frederico Silla, Rafael Mayo, and Enrique S. Quintana-Ort\u00ed. 2010. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In Proceedings of the International Conference on High Performance Computing and Simulation (HPCS\u201910). 224\u2013231. 
DOI:https:\/\/doi.org\/10.1109\/HPCS.2010.5547126"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the ACM Symposium on Operating Systems Principles (SOSP\u201995)","author":"Feeley Michael J.","unstructured":"Michael J. Feeley, William E. Morgan, Frederic H. Pighin, Anna R. Karlin, and Henry M. Levy. 1995. Implementing global memory management in a workstation cluster. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP\u201995). 201\u2013212. DOI:https:\/\/doi.org\/10.1145\/224056.224072"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the International Conference on Accelerator and Large Experimental Physics Control Systems(ICALEPCS\u201905)","author":"Fountain Trevor","year":"2005","unstructured":"Trevor Fountain, Alexandra McCarthy, and Fangfang Peng. 2005. PCI express: An overview of PCI express, cabled PCI express and PXI express. In Proceedings of the International Conference on Accelerator and Large Experimental Physics Control Systems(ICALEPCS\u201905)."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the Symposium on Networked Systems Design and Implementation (NSDI\u201917)","author":"Gu Juncheng","unstructured":"Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdury, and Kang G. Shin. 2017. Efficient memory disaggregation with INFINISWAP. 
In Proceedings of the Symposium on Networked Systems Design and Implementation (NSDI\u201917). 649\u2013667."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCEM48484.2019.000-5"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078468.3078483"},{"key":"e_1_2_1_29_1","first-page":"4","article-title":"Performance characterization of NVMe-over-fabrics storage disaggregation","volume":"14","author":"Guz Zvika","year":"2018","unstructured":"Zvika Guz, Harry Li, Anahita Shayesteh, and Vijay Balkrishnan. 2018. Performance characterization of NVMe-over-fabrics storage disaggregation. ACM Trans. Stor. 14, 4 (Dec. 2018), 1\u201318. DOI:https:\/\/doi.org\/10.1145\/3239563","journal-title":"ACM Trans. Stor."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CBMS.2018.00070"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201913)","author":"Hou Rui","year":"2013","unstructured":"Rui Hou, Tao Jiang, Liuhang Zhang, Pengfei Qi, Jianbo Dong, Haibin Wang, Xiongli Gu, and Shujie Zhang. 2013. 
Cost effective data center servers. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201913). 179\u2013187. DOI:https:\/\/doi.org\/10.1109\/HPCA.2013.6522317"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS\u201912)","author":"Huang Jian","year":"2012","unstructured":"Jian Huang, Xiangyong Ouyang, Jithin Jose, Md Wasi-Ur-Rahman, Hao Wang, Miao Luo, Hari Subramoni, Chet Murthy, and Dhabaleswar K. Panda. 2012. High-performance design of hbase with RDMA over InfiniBand. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS\u201912). 774\u2013785. DOI:https:\/\/doi.org\/10.1109\/IPDPS.2012.74"},{"key":"e_1_2_1_33_1","unstructured":"Neo Jia and Kirti Wankhede. 2016. VFIO Mediated Devices. Retrieved from https:\/\/www.kernel.org\/doc\/Documentation\/vfio-mediated-device.txt."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the International Symposium on Cluster Computing and the Grid (CCGrid\u201904)","author":"Jiang Weihang","year":"2004","unstructured":"Weihang Jiang, Jiuxing Liu, Hyun-Wook Jin, Dhabaleswar K. Panda, William Gropp, and Rajeev Thakur. 2004. High performance MPI-2 one-sided communication over InfiniBand. In Proceedings of the International Symposium on Cluster Computing and the Grid (CCGrid\u201904). 531\u2013538. 
DOI:https:\/\/doi.org\/10.1109\/CCGrid.2004.1336648"},{"key":"e_1_2_1_35_1","unstructured":"Linux kernel development community. [n.d.]. NTB Drivers. Retrieved from https:\/\/www.kernel.org\/doc\/html\/latest\/driver-api\/ntb.html."},{"key":"e_1_2_1_36_1","unstructured":"Linux kernel development community. 2013. Linux Filesystems API. Retrieved from https:\/\/www.kernel.org\/doc\/htmldocs\/filesystems\/index.html."},{"key":"e_1_2_1_37_1","unstructured":"Linux kernel development community. 2013. VFIO\u2014\u201cVirtual Function I\/O.\u201d Retrieved from https:\/\/www.kernel.org\/doc\/Documentation\/vfio.txt."},{"key":"e_1_2_1_38_1","unstructured":"Linux kernel development community. 2019. Linux IOMMU Support. Retrieved from https:\/\/www.kernel.org\/doc\/Documentation\/Intel-IOMMU.txt."},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage\u201916)","author":"Kim Hyeong-Jun","year":"2016","unstructured":"Hyeong-Jun Kim, Young-Sik Lee, and Jin-Soo Kim. 2016. 
NVMeDirect: A user-space I\/O framework for application-specific optimization on NVMe SSDs. In Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage\u201916). 41\u201345."},{"key":"e_1_2_1_40_1","unstructured":"KaiGai Kohei. 2016. GpuScan + SSD-to-GPUDirect DMA. Retrieved from https:\/\/kaigai.hatenablog.com\/entry\/2016\/09\/08\/003556."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2910642.2910650"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the IEEE International Conference on Cluster Computing (Cluster\u201905)","author":"Liang Shuang","year":"2005","unstructured":"Shuang Liang, Ranjit Noronha, and Dhabaleswar K. Panda. 2005. Swapping to remote memory over Infiniband: An approach using a high performance network block device. In Proceedings of the IEEE International Conference on Cluster Computing (Cluster\u201905). 1\u201310. DOI:https:\/\/doi.org\/10.1109\/CLUSTR.2005.347050"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the Annual International Symposium on Computer Architecture(ISCA\u201909)","author":"Lim Kevin","unstructured":"Kevin Lim, Jichuan Chang, Trevor Mudge, Parthasarathy Ranganathan, Steven K. Reinhardt, and Thomas F. Wenisch. 
2009. Disaggregated memory for expansion and sharing in blade servers. In Proceedings of the Annual International Symposium on Computer Architecture(ISCA\u201909). 267\u2013278. DOI:https:\/\/doi.org\/10.1145\/1555754.1555789"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2019.00104"},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the International Conference on Parallel Processing (ICPP\u201913)","author":"Lu Xiaoyi","year":"2013","unstructured":"Xiaoyi Lu, Nusrat S. Islam, Md. Wasi-Ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, and Dhabaleswar K. Panda. 2013. High-performance design of Hadoop RPC with RDMA over InfiniBand. In Proceedings of the International Conference on Parallel Processing (ICPP\u201913). 641\u2013650. DOI:https:\/\/doi.org\/10.1109\/ICPP.2013.78"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201996)","author":"Evangelos","unstructured":"Evangelos P. Markatos and George Dramitinos. 1996. Implementation of a reliable remote memory pager. 
In Proceedings of the USENIX Annual Technical Conference (ATC\u201996)."},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the Network and Distributed System Security Symposium (NDSS\u201919)","author":"Markettos Athanasios Theodore","year":"2019","unstructured":"Athanasios Theodore Markettos, Colin Rothwell, Brett F. Gutstein, Allison Pearce, Peter G. Neumann, Simon W. Moore, and Robert N. M. Watson. 2019. Thunderclap: Exploring vulnerabilities in operating system IOMMU protection via DMA from untrustworthy peripherals. In Proceedings of the Network and Distributed System Security Symposium (NDSS\u201919). DOI:https:\/\/doi.org\/10.14722\/ndss.2019.23194"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10586-019-02988-0"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3229710.3229759"},{"key":"e_1_2_1_50_1","unstructured":"Vijay Meduri. 2011. A Case for PCI Express as a High-Performance Cluster Interconnect. Retrieved from https:\/\/www.hpcwire.com\/2011\/01\/24\/a_case_for_pci_express_as_a_high-performance_cluster_interconnect\/."},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the Linux Symposium. 
71\u201385","author":"Muli Ben-Yehuda","year":"2006","unstructured":"Ben-Yehuda Muli, Jon Mason, Orran Krieger, Jimi Xenidis, Leendert Van Doorn, Asit Mallick, Jun Nakijima, and Elsie Wahlig. 2006. Utilizing IOMMUs for virtualization in Linux and Xen. In Proceedings of the Linux Symposium. 71\u201385."},{"key":"e_1_2_1_53_1","volume-title":"GPUDirect RDMA Documentation","author":"NVIDIA Corporation 2019.","unstructured":"NVIDIA Corporation 2019. GPUDirect RDMA Documentation. NVIDIA Corporation. Retrieved from https:\/\/docs.nvidia.com\/cuda\/gpudirect-rdma\/index.html."},{"key":"e_1_2_1_54_1","volume-title":"CUDA Toolkit Documentation v11.0.171","author":"NVIDIA Corporation 2020.","unstructured":"NVIDIA Corporation 2020. CUDA Toolkit Documentation v11.0.171. NVIDIA Corporation. Retrieved from http:\/\/docs.nvidia.com\/cuda\/."},{"key":"e_1_2_1_55_1","volume-title":"NVM Express Base Specification","author":"NVM","year":"2019","unstructured":"NVM Express 2019. NVM Express Base Specification. NVM Express. Retrieved from https:\/\/nvmexpress.org\/wp-content\/uploads\/NVM-Express-1_3d-2019.03.20-Ratified.pdf."},{"key":"e_1_2_1_56_1","volume-title":"NVM Express Over Fabrics","author":"NVM","year":"2019","unstructured":"NVM Express 2019. NVM Express Over Fabrics. NVM Express. 
Retrieved from https:\/\/nvmexpress.org\/wp-content\/uploads\/NVMe-over-Fabrics-1.1-2019.10.22-Ratified.pdf."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201918)","author":"Peng Bo","year":"2018","unstructured":"Bo Peng, Haozhong Zhang, Jianguo Yao, Yaozu Dong, Yu Xu, and Haibing Guan. 2018. MDev-NVMe: A NVMe storage virtualization solution with mediated pass-through. In Proceedings of the USENIX Annual Technical Conference (ATC\u201918). 665\u2013676."},{"key":"e_1_2_1_59_1","unstructured":"Peripheral Component Interconnect Special Interest Group (PCI-SIG) 2008. Multi-root I\/O Virtualization and Sharing Specification. Peripheral Component Interconnect Special Interest Group (PCI-SIG). Retrieved from https:\/\/www.pcisig.com\/specifications\/iov\/multi-root\/."},{"key":"e_1_2_1_60_1","volume-title":"Address Translation Services Revision 1.1","author":"Peripheral Component Interconnect Special Interest Group (PCI-SIG) 2009.","unstructured":"Peripheral Component Interconnect Special Interest Group (PCI-SIG) 2009. Address Translation Services Revision 1.1. Peripheral Component Interconnect Special Interest Group (PCI-SIG). 
Retrieved from https:\/\/www.pcisig.com\/specifications\/iov\/ats\/."},{"key":"e_1_2_1_61_1","unstructured":"Peripheral Component Interconnect Special Interest Group (PCI-SIG) 2010. PCI Express 3.1 Base Specification. Peripheral Component Interconnect Special Interest Group (PCI-SIG). Retrieved from https:\/\/pcisig.com\/specifications."},{"key":"e_1_2_1_62_1","unstructured":"Peripheral Component Interconnect Special Interest Group (PCI-SIG) 2010. Single-root I\/O Virtualization and Sharing Specification. Peripheral Component Interconnect Special Interest Group (PCI-SIG). Retrieved from https:\/\/www.pcisig.com\/specifications\/iov\/single-root\/."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CBMS.2018.00073"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3083187.3083212"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-017-4989-y"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/2910017.2910636"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/SYSTEMS.2008.4519048"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2017.42"},{"key":"e_1_2_1_69_1","unstructured":"Jack Regula. 2004. 
Using Non-Transparent Bridging in PCI Express Systems. Whitepaper. PLX Technology\/Broadcom. Retrieved from https:\/\/www.digikey.no\/no\/pdf\/b\/broadcom\/using-non-transparent-bridging-pci."},{"key":"e_1_2_1_70_1","unstructured":"Davide Rosetti. 2014. Benchmarking GPUDirect RDMA on Modern Server Platforms. Retrieved from https:\/\/developer.nvidia.com\/blog\/benchmarking-gpudirect-rdma-on-modern-server-platforms\/."},{"key":"e_1_2_1_71_1","volume-title":"Persistent memory programming. USENIX","author":"Rudoff Andy","year":"2017","unstructured":"Andy Rudoff. 2017. Persistent memory programming. USENIX; login: 42, 2 (2017), 34\u201340. Retrieved from https:\/\/www.usenix.org\/system\/files\/login\/articles\/login_summer17_07_rudoff.pdf."},{"key":"e_1_2_1_72_1","unstructured":"Kazuo Saito, Koji Anai, Keiju Igarashi, Takeshi Nishikawa, Ryoichi Himeno, and Kazuhiro Yoguchi. 1998. ATM bus system. U.S. patent No. 5 796 741 A."},{"key":"e_1_2_1_73_1","unstructured":"Nikolay Sakharnykh. 2016. Beyond GPU Memory Limits with Unified Memory on Pascal. 
Retrieved from https:\/\/developer.nvidia.com\/blog\/beyond-gpu-memory-limits-unified-memory-pascal\/."},{"key":"e_1_2_1_74_1","volume-title":"Proceedings of the Conference on Operating Systems Design and Implementation (OSDI\u201918)","author":"Shan Yizhou","year":"2018","unstructured":"Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang. 2018. LegoOS: A disseminated, distributed OS for hardware resource disaggregation. In Proceedings of the Conference on Operating Systems Design and Implementation (OSDI\u201918). 69\u201387."},{"key":"e_1_2_1_75_1","volume-title":"Design and implementation of initial OpenSHMEM on PCIe NTB based cloud computing. Cluster Comput. 22 (Feb","author":"Shim Cheol","year":"2018","unstructured":"Cheol Shim, Kwang-Ho Cha, and Min Choi. 2018. Design and implementation of initial OpenSHMEM on PCIe NTB based cloud computing. Cluster Comput. 22 (Feb. 2018), 1815\u20131826. DOI:https:\/\/doi.org\/10.1007\/s10586-018-1707-0"},{"key":"e_1_2_1_76_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. 
Retrieved from https:\/\/arXiv:1409.1556."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2015.2513759"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/HOTI.2010.21"},{"key":"e_1_2_1_80_1","volume-title":"Future cloud system designs: Challenges and research directions","author":"Taherkordi Amir","year":"2018","unstructured":"Amir Taherkordi, Feroz Zahid, Yiannis Verginadis, and Geir Horn. 2018. Future cloud system designs: Challenges and research directions. IEEE Access 6 (2018). DOI:https:\/\/doi.org\/10.1109\/ACCESS.2018.2883149"},{"key":"e_1_2_1_81_1","unstructured":"Mellanox Technologies. [n.d.]. ConnectX-5 EN Single\/Dual-Port Adapter Supporting 100Gb\/s Ethernet. Retrieved from https:\/\/www.mellanox.com\/products\/ethernet-adapters\/connectx-5-en."},{"key":"e_1_2_1_82_1","unstructured":"PLX Technologies. 2005. Multi-Host System and Intelligent I\/O Design with PCI Express. Whitepaper. PLX Technology\/Broadcom. Retrieved from https:\/\/docs.broadcom.com\/docs-and-downloads\/pdf\/technical\/expresslane\/NTB_Brief_April-05.pdf."},{"key":"e_1_2_1_83_1","volume-title":"Newburn","author":"Thompson Adam","year":"2019","unstructured":"Adam Thompson and Chris J. Newburn. 2019. GPUDirect Storage: A Direct Path Between Storage and GPU Memory. 
Retrieved from https:\/\/developer.nvidia.com\/blog\/gpudirect-storage\/."},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1145\/2103799.2103820"},{"key":"e_1_2_1_85_1","volume-title":"Proceedings of the Workshop on Hot Topics in Cloud Computing (HotCloud\u201919)","author":"Tsai Shin-Yeh","year":"2019","unstructured":"Shin-Yeh Tsai and Yiying Zhang. 2019. A double-edged sword: Security threats and opportunities in one-sided network communication. In Proceedings of the Workshop on Hot Topics in Cloud Computing (HotCloud\u201919)."},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211890.3211895"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485932"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/2658260.2658262"},{"key":"e_1_2_1_90_1","volume-title":"Proceedings of the International Conference on Parallel Processing (ICPP\u201917)","author":"Venkatesh Akshay","year":"2017","unstructured":"Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Davide Rosettig, Ching-Hsiang Chu, and Dhabaleswar K. Panda. 2017. 
MPI-GDS: High performance MPI designs with GPUDirect-aSync for CPU-GPU control flow decoupling. In Proceedings of the International Conference on Parallel Processing (ICPP\u201917). 151\u2013160. DOI:https:\/\/doi.org\/10.1109\/ICPP.2017.24"},{"key":"e_1_2_1_91_1","volume-title":"Proceedings of the International Conference on High Performance Computing (HiPC\u201914)","author":"Venkatesh Akshay","year":"2014","unstructured":"Akshay Venkatesh, Hari Subramoni, Khaled Hamidouche, and Dhabaleswar K. Panda. 2014. A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on Infiniband clusters. In Proceedings of the International Conference on High Performance Computing (HiPC\u201914). 1\u201310. DOI:https:\/\/doi.org\/10.1109\/HiPC.2014.7116875"},{"key":"e_1_2_1_92_1","volume-title":"PCI Express Multi-Root Switch Reconfiguration During System Operation. Master\u2019s thesis","author":"Wong Heymian","unstructured":"Heymian Wong. 2011. PCI Express Multi-Root Switch Reconfiguration During System Operation. Master\u2019s thesis. Massachusetts Institute of Technology."},{"key":"e_1_2_1_93_1","volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies (FAST\u201920)","author":"Yang Jian","year":"2020","unstructured":"Jian Yang, Juno Kim, Morteza Hoseinzadeh, Joseph Izraelevitz, and Steve Swanson. 2020. 
An empirical guide to the behavior and use of scalable persistent memory. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST\u201920). 169\u2013182."},{"key":"e_1_2_1_94_1","volume-title":"Proceedings of the International Conference on Cloud Computing Technology and Science (CloudCom\u201917)","author":"Yang Ziye","year":"2017","unstructured":"Ziye Yang, James R. Harris, Benjamin Walker, Daniel Verkamp, Changpeng Liu, Cunyin Chang, Gang Cao, Jonathan Stern, Vishal Verma, and Luse E. Paul. 2017. SPDK: A development kit to build high performance storage applications. In Proceedings of the International Conference on Cloud Computing Technology and Science (CloudCom\u201917). 154\u2013161. DOI:https:\/\/doi.org\/10.1109\/CloudCom.2017.14"},{"key":"e_1_2_1_95_1","volume-title":"NTB: Add support for AMD PCI-Express Non-Transparent Bridge.","author":"Yu Xiangliang","year":"2016","unstructured":"Xiangliang Yu. 2016. NTB: Add support for AMD PCI-Express Non-Transparent Bridge. 
Retrieved from https:\/\/lwn.net\/Articles\/672752\/."}],"container-title":["ACM Transactions on Computer Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462545","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3462545","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:19:02Z","timestamp":1750191542000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462545"}},"subtitle":["Zero-overhead Device Sharing through PCIe Networking"],"short-title":[],"issued":{"date-parts":[[2020,5,31]]},"references-count":93,"journal-issue":{"issue":"1-2","published-print":{"date-parts":[[2020,5,31]]}},"alternative-id":["10.1145\/3462545"],"URL":"https:\/\/doi.org\/10.1145\/3462545","relation":{},"ISSN":["0734-2071","1557-7333"],"issn-type":[{"value":"0734-2071","type":"print"},{"value":"1557-7333","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,31]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}