{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T05:51:49Z","timestamp":1762321909688,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA","license":[{"start":{"date-parts":[[2021,10,15]],"date-time":"2021-10-15T00:00:00Z","timestamp":1634256000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/R006865\/1, EP\/L016796\/1"],"award-info":[{"award-number":["EP\/R006865\/1, EP\/L016796\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2021,10,20]]},"abstract":"<jats:p>Heterogeneous CPU\/FPGA devices, in which a CPU and an FPGA can execute together while sharing memory, are becoming popular in several computing sectors. In this paper, we study the shared-memory semantics of these devices, with a view to providing a firm foundation for reasoning about the programs that run on them. Our focus is on Intel platforms that combine an Intel FPGA with a multicore Xeon CPU. We describe the weak-memory behaviours that are allowed (and observable) on these devices when CPU threads and an FPGA thread access common memory locations in a fine-grained manner through multiple channels. Some of these behaviours are familiar from well-studied CPU and GPU concurrency; others are weaker still. We encode these behaviours in two formal memory models: one operational, one axiomatic. We develop executable implementations of both models, using the CBMC bounded model-checking tool for our operational model and the Alloy modelling language for our axiomatic model. Using these, we cross-check our models against each other via a translator that converts Alloy-generated executions into queries for the CBMC model. We also validate our models against actual hardware by translating 583 Alloy-generated executions into litmus tests that we run on CPU\/FPGA devices; when doing this, we avoid the prohibitive cost of synthesising a hardware design per litmus test by creating our own 'litmus-test processor' in hardware. We expect that our models will be useful for low-level programmers, compiler writers, and designers of analysis tools. Indeed, as a demonstration of the utility of our work, we use our operational model to reason about a producer\/consumer buffer implemented across the CPU and the FPGA. When the buffer uses insufficient synchronisation -- a situation that our model is able to detect -- we observe that its performance improves at the cost of occasional data corruption.<\/jats:p>","DOI":"10.1145\/3485497","type":"journal-article","created":{"date-parts":[[2021,10,15]],"date-time":"2021-10-15T19:18:28Z","timestamp":1634325508000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["The semantics of shared memory in Intel CPU\/FPGA systems"],"prefix":"10.1145","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2313-7910","authenticated-orcid":false,"given":"Dan","family":"Iorga","sequence":"first","affiliation":[{"name":"Imperial College London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7448-7961","authenticated-orcid":false,"given":"Alastair F.","family":"Donaldson","sequence":"additional","affiliation":[{"name":"Imperial College London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1646-7935","authenticated-orcid":false,"given":"Tyler","family":"Sorensen","sequence":"additional","affiliation":[{"name":"University of California at Santa Cruz, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6735-5533","authenticated-orcid":false,"given":"John","family":"Wickerson","sequence":"additional","affiliation":[{"name":"Imperial College London, UK"}]}],"member":"320","published-online":{"date-parts":[[2021,10,15]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2015.2469113"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694344.2694391"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19835-9_5"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2627752"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS.2019.00037"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3294054"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24730-2_15"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/AHS.2013.6604241"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851194"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2019.00-24"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2705069"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3282307"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541981"},{"volume-title":"A Uniform Specification for System-on-Chip (SoC) Verification. CoRR, abs\/1801.01114","year":"2018","author":"Huang Bo-Yuan","key":"e_1_2_2_14_1"},{"key":"e_1_2_2_15_1","unstructured":"Intel. 2019. Intel Acceleration Stack for Intel Xeon CPU with FPGAs Core Cache Interface (CCI-P) Reference Manual. https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/manual\/mnl-ias-ccip.pdf Version 2019.11.04.  Intel. 2019. Intel Acceleration Stack for Intel Xeon CPU with FPGAs Core Cache Interface (CCI-P) Reference Manual. https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/manual\/mnl-ias-ccip.pdf Version 2019.11.04."},{"key":"e_1_2_2_16_1","unstructured":"Intel. 2021. Intel Academic Compute Environment. https:\/\/wiki.intel-research.net\/  Intel. 2021. Intel Academic Compute Environment. https:\/\/wiki.intel-research.net\/"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.5468873"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS48715.2020.000-6"},{"volume-title":"Software Abstractions: Logic, Language, and Analysis","year":"2012","author":"Jackson Daniel","key":"e_1_2_2_19_1"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3428294"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1979.1675439"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304043"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872887.2750378"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037723"},{"volume-title":"Prasanna","year":"2020","author":"Meng Yuan","key":"e_1_2_2_25_1"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174258"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ReConFig.2011.4"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03359-9_27"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3158107"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314624"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2086696.2086713"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847343"},{"key":"e_1_2_2_33_1","unstructured":"Martin C. Rinard. 2012. Unsynchronized Techniques for Approximate Parallel Computing. In RACES@SPLASH. ACM. https:\/\/people.csail.mit.edu\/rinard\/paper\/races12.unsynchronized.pdf  Martin C. Rinard. 2012. Unsynchronized Techniques for Approximate Parallel Computing. In RACES@SPLASH. ACM. https:\/\/people.csail.mit.edu\/rinard\/paper\/races12.unsynchronized.pdf"},{"key":"e_1_2_2_34_1","unstructured":"Karl Rupp. 2015. 40 Years of Microprocessor Trend Data. https:\/\/www.karlrupp.net\/2015\/06\/40-years-of-microprocessor-trend-data  Karl Rupp. 2015. 40 Years of Microprocessor Trend Data. https:\/\/www.karlrupp.net\/2015\/06\/40-years-of-microprocessor-trend-data"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993520"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2908080.2908114"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3022671.2984032"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2014.2380198"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.2312\/EGGH\/HPG10\/029-037"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00028"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2017.8280127"},{"key":"e_1_2_2_42_1","unstructured":"Xilinx. 2018. Accelerating DNNs with Xilinx Alveo Accelerator Cards. https:\/\/www.xilinx.com\/support\/documentation\/white_papers\/wp504-accel-dnns.pdf  Xilinx. 2018. Accelerating DNNs with Xilinx Alveo Accelerator Cards. https:\/\/www.xilinx.com\/support\/documentation\/white_papers\/wp504-accel-dnns.pdf"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2014.43"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2016.117"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.23919\/FMCAD.2018.8603015"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2017.25"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485497","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3485497","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:40Z","timestamp":1750191520000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485497"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,15]]},"references-count":46,"journal-issue":{"issue":"OOPSLA","published-print":{"date-parts":[[2021,10,20]]}},"alternative-id":["10.1145\/3485497"],"URL":"https:\/\/doi.org\/10.1145\/3485497","relation":{},"ISSN":["2475-1421"],"issn-type":[{"type":"electronic","value":"2475-1421"}],"subject":[],"published":{"date-parts":[[2021,10,15]]},"assertion":[{"value":"2021-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}