{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:48:21Z","timestamp":1760233701801,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2021,2,7]],"date-time":"2021-02-07T00:00:00Z","timestamp":1612656000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["PTDC\/EEI-HAC\/30848\/2017"],"award-info":[{"award-number":["PTDC\/EEI-HAC\/30848\/2017"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Electronics"],"abstract":"<jats:p>Heterogeneous platforms with FPGAs have started to be employed in the High-Performance Computing (HPC) field to improve performance and overall efficiency. These platforms allow the use of specialized hardware to accelerate software applications, but require the software to be adapted in what can be a prolonged and complex process. The main goal of this work is to describe and evaluate mechanisms that can transparently transfer the control flow between CPU and FPGA within the scope of HPC. Combining such a mechanism with transparent software profiling and accelerator configuration could lead to an automatic way of accelerating regular applications. In this work, a mechanism based on the ptrace system call is proposed, and its performance on the Intel Xeon+FPGA platform is evaluated. The feasibility of the proposed approach is demonstrated by a working prototype that performs the transparent control flow transfer of any function call to a matching hardware accelerator. 
This approach is more general than shared library interposition at the cost of a small time overhead in each accelerator use (about 1.3 ms in the prototype implementation).<\/jats:p>","DOI":"10.3390\/electronics10040406","type":"journal-article","created":{"date-parts":[[2021,2,8]],"date-time":"2021-02-08T20:51:51Z","timestamp":1612817511000},"page":"406","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Transparent Control Flow Transfer between CPU and Accelerators for HPC"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9322-2758","authenticated-orcid":false,"given":"Daniel","family":"Granh\u00e3o","sequence":"first","affiliation":[{"name":"INESC TEC and Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7471-3888","authenticated-orcid":false,"given":"Jo\u00e3o","family":"Canas Ferreira","sequence":"additional","affiliation":[{"name":"INESC TEC and Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,7]]},"reference":[{"key":"ref_1","unstructured":"Cutress, I. (2019, December 16). Intel\u2019s Manufacturing Roadmap from 2019 to 2029: Back Porting, 7 nm, 5 nm, 3 nm, 2 nm, and 1.4 nm. Available online: https:\/\/web.archive.org\/web\/20191215001821\/https:\/\/www.anandtech.com\/show\/15217\/intels-manufacturing-roadmap-from-2019-to-2029."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1109\/MCSE.2017.29","article-title":"The End of Moore\u2019s Law: A New Beginning for Information Technology","volume":"19","author":"Theis","year":"2017","journal-title":"Comput. Sci. Eng."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1109\/MCSE.2017.31","article-title":"What\u2019s Next?","volume":"19","author":"Williams","year":"2017","journal-title":"Comput. Sci. 
Eng."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1109\/MM.2013.74","article-title":"Implications of the Power Wall: Dim Cores and Reconfigurable Logic","volume":"33","author":"Wang","year":"2013","journal-title":"IEEE Micro"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Hao, Y., Fang, Z., Reinman, G., and Cong, J. (2017, January 4\u20138). Supporting Address Translation for Accelerator-Centric Architectures. Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, USA.","DOI":"10.1109\/HPCA.2017.19"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1109\/MM.2015.42","article-title":"A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services","volume":"35","author":"Putnam","year":"2015","journal-title":"IEEE Micro"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Blott, M. (2016, January 18\u201322). Reconfigurable future for HPC. Proceedings of the 2016 International Conference on High Performance Computing Simulation (HPCS), Innsbruck, Austria.","DOI":"10.1109\/HPCSim.2016.7568326"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TVLSI.2016.2573640","article-title":"Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces","volume":"25","author":"Paulino","year":"2017","journal-title":"IEEE Trans. Very Large Scale Integr. Syst."},{"key":"ref_9","unstructured":"Gupta, P. (2021, February 05). Accelerating Datacenter Workloads. Presented at FPL\u201916. 
Available online: https:\/\/web.archive.org\/web\/20180903013405\/https:\/\/fpl2016.org\/slides\/Gupta%20\u2013%20Accelerating%20Datacenter%20Workloads.pdf."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1109\/MC.2008.240","article-title":"Warp processing: Dynamic translation of binaries to FPGA circuits","volume":"41","author":"Vahid","year":"2008","journal-title":"Computer"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Gupta, S., Feng, S., Ansari, A., Mahlke, S., and August, D. (2011, January 3\u20137). Bundled execution of recurring traces for energy-efficient general purpose processing. Proceedings of the 44th Annual IEEE\/ACM International Symposium on Microarchitecture\u2014MICRO-44 \u201911, Porto Alegre, Brazil.","DOI":"10.1145\/2155620.2155623"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Paulino, N., Ferreira, J.C., and Cardoso, J.M. (2013, January 19\u201320). Architecture for transparent binary acceleration of loops with memory accesses. Proceedings of the Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Cambridge, UK.","DOI":"10.1007\/978-3-642-36812-7_12"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Beisel, T., Niekamp, M., and Plessl, C. (2010, January 7\u20139). Using shared library interposing for transparent application acceleration in systems with heterogeneous hardware accelerators. Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Rennes, France.","DOI":"10.1109\/ASAP.2010.5540798"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Miyajima, T., Thomas, D., and Amano, H. (2012, January 5\u20137). A domain specific language and toolchain for OpenCV Runtime Binary Acceleration using GPU. 
Proceedings of the 2012 3rd International Conference on Networking and Computing, ICNC 2012, Okinawa, Japan.","DOI":"10.1109\/ICNC.2012.34"},{"key":"ref_15","unstructured":"Kerrisk, M. (2019, January 15). Ptrace(2)\u2014Linux Manual Page. Available online: https:\/\/web.archive.org\/web\/20181230071754\/http:\/\/man7.org\/linux\/man-pages\/man2\/ptrace.2.html."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1625","DOI":"10.1109\/TII.2012.2235844","article-title":"Transparent trace-based binary acceleration for reconfigurable HW\/SW systems","volume":"9","author":"Bispo","year":"2013","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Paulino, N.M.C., Ferreira, J.C., and Cardoso, J.M.P. (2014, January 26\u201328). Trace-based reconfigurable acceleration with data cache and external memory support. Proceedings of the 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2014, Milan, Italy.","DOI":"10.1109\/ISPA.2014.29"},{"key":"ref_18","unstructured":"(2021, February 05). IEEE\/Open Group 1003.1-2017\u2014IEEE Standard for Information Technology\u2013Portable Operating System Interface (POSIX(TM)) Base Specifications, Issue 7. Available online: https:\/\/publications.opengroup.org\/standards\/unix\/t101."},{"key":"ref_19","unstructured":"Klitzke, E. (2019, March 05). Using Ptrace for Fun and Profit. Available online: https:\/\/web.archive.org\/web\/20200215141911\/https:\/\/eklitzke.org\/ptrace."},{"key":"ref_20","unstructured":"Luebbers, E., Liu, S., and Chu, M. (2021, February 03). Simplify Software Integration for FPGA Accelerators with OPAE (White Paper). Available online: https:\/\/01.org\/sites\/default\/files\/downloads\/opae\/open-programmable-acceleration-engine-paper.pdf."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Dworkin, M.J. (2007). 
Recommendation for Block Cipher Modes of Operation.","DOI":"10.6028\/NIST.SP.800-38d"},{"key":"ref_22","unstructured":"Hsing, H. (2019, March 19). AES Core. Available online: https:\/\/web.archive.org\/web\/20200710061100if_\/https:\/\/opencores.org\/projects\/tiny_aes."},{"key":"ref_23","unstructured":"Kokke (2019, May 20). tiny-AES-c. Available online: https:\/\/web.archive.org\/web\/20190325180304\/https:\/\/github.com\/kokke\/tiny-AES-c."},{"key":"ref_24","unstructured":"(2019, May 25). Netlib BLAS. Available online: https:\/\/web.archive.org\/web\/20190407202641\/http:\/\/netlib.org\/blas\/."}],"container-title":["Electronics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-9292\/10\/4\/406\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:21:06Z","timestamp":1760160066000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-9292\/10\/4\/406"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,7]]},"references-count":24,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["electronics10040406"],"URL":"https:\/\/doi.org\/10.3390\/electronics10040406","relation":{},"ISSN":["2079-9292"],"issn-type":[{"type":"electronic","value":"2079-9292"}],"subject":[],"published":{"date-parts":[[2021,2,7]]}}}