{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T00:05:01Z","timestamp":1777334701376,"version":"3.51.4"},"reference-count":20,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T00:00:00Z","timestamp":1651190400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T00:00:00Z","timestamp":1651190400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["01IS17091B"],"award-info":[{"award-number":["01IS17091B"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005714","name":"Technische Universit\u00e4t Darmstadt","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005714","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Sign Process Syst"],"published-print":{"date-parts":[[2022,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The open-source hardware\/software framework TaPaSCo aims to make reconfigurable computing on FPGAs more accessible to non-experts. To this end, it provides an easily usable task-based programming abstraction, and combines this with powerful tool support to automatically implement the individual hardware accelerators and integrate them into usable system-on-chips. Currently, TaPaSCo relies on the host to manage task parallelism and perform the actual task launches. However, for more expressive parallel programming patterns, such as pipelines of task farms, the round trips from the hardware accelerators back to the host for launching child tasks, especially when exploiting data-dependent execution times, quickly add up. The major contribution of this work is the addition of on-chip task scheduling and launching capabilities to TaPaSCo. This enables not only low-latency <jats:italic>dynamic<\/jats:italic> task parallelism, it also encompasses the efficient on-chip exchange of parameter values and task results between parent and child accelerator tasks. For larger distributed systems, the dynamic launch capability can even be extended over the network to span multiple FPGAs. Our solution is able to handle recursive task structures, and is shown to achieve latency reductions of over 35x compared to the prior approaches.<\/jats:p>","DOI":"10.1007\/s11265-022-01759-2","type":"journal-article","created":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T09:04:00Z","timestamp":1651223040000},"page":"883-893","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["On-Chip and Distributed Dynamic Parallelism for Task-based Hardware Accelerators"],"prefix":"10.1007","volume":"94","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5927-4426","authenticated-orcid":false,"given":"Carsten","family":"Heinz","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andreas","family":"Koch","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,4,29]]},"reference":[{"issue":"6","key":"1759_CR1","doi-asserted-by":"publisher","first-page":"968","DOI":"10.1007\/s10766-013-0269-2","volume":"42","author":"S Ernsting","year":"2014","unstructured":"Ernsting, S., & Kuchen, H. (2014). A Scalable Farm Skeleton for Hybrid Parallel and Distributed Programming. International Journal of Parallel Programming, 42(6), 968\u2013987. https:\/\/doi.org\/10.1007\/s10766-013-0269-2","journal-title":"International Journal of Parallel Programming"},{"key":"1759_CR2","doi-asserted-by":"publisher","DOI":"10.1007\/s11265-021-01640-8","author":"C Heinz","year":"2021","unstructured":"Heinz, C., Hofmann, J., Korinth, J., Sommer, L., Weber, L., & Koch, A. (2021). The TaPaSCo Open-Source Toolflow. Journal of Signal Processing Systems. https:\/\/doi.org\/10.1007\/s11265-021-01640-8","journal-title":"Journal of Signal Processing Systems"},{"key":"1759_CR3","doi-asserted-by":"crossref","unstructured":"Heinz, C., & Koch, A. (2021). Supporting on-chip dynamic parallelism for task-based hardware accelerators. In S. Derrien, F. Hannig, P. C. Diniz, D. Chillet (eds), Applied reconfigurable computing. Architectures, tools, and applications (pp. 81\u201392). Springer: Cham.","DOI":"10.1007\/978-3-030-79025-7_6"},{"issue":"08","key":"1759_CR4","doi-asserted-by":"publisher","first-page":"1143","DOI":"10.1109\/TC.2020.3000118","volume":"69","author":"T Wang","year":"2020","unstructured":"Wang, T., Geng, T., Li, A., Jin, X., & Herbordt, M. (2020). FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters. IEEE Transactions on Computers, 69(08), 1143\u20131158. https:\/\/doi.org\/10.1109\/TC.2020.3000118","journal-title":"IEEE Transactions on Computers"},{"key":"1759_CR5","doi-asserted-by":"publisher","unstructured":"Heinz, C., Hofmann, J. A., Sommer, L., & Koch, A. (2020). Improving job launch rates in the TaPaSCo FPGA middleware by Hardware\/Software-Co-Design. In\u00a02020 IEEE\/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS) (pp. 22\u201330). https:\/\/doi.org\/10.1109\/ROSS51935.2020.00008","DOI":"10.1109\/ROSS51935.2020.00008"},{"key":"1759_CR6","doi-asserted-by":"publisher","unstructured":"Ruiz, M., Sidler, D., Sutter, G., Alonso, G., & L\u00f3pez-Buedo, S. (2019). Limago: an FPGA-based Open-source 100\u00a0GbE TCP\/IP Stack. In\u00a02019 29th International Conference on Field Programmable Logic and Applications (FPL) (pp. 286\u2013292). IEEE.\u00a0https:\/\/doi.org\/10.1109\/FPL.2019.00053","DOI":"10.1109\/FPL.2019.00053"},{"key":"1759_CR7","doi-asserted-by":"crossref","unstructured":"Hartmann, M., Weber, L., Wirth, J., Sommer, L., & Koch, A. (2021). Optimizing a hardware network stack to realize an in-network ML inference application. In\u00a02021 IEEE\/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC).","DOI":"10.1109\/H2RC54759.2021.00008"},{"key":"1759_CR8","unstructured":"Aurora 64B\/66B Protocol Specification (SP011 (v1.3) October 1, 2014). https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/aurora_64b66b_protocol_spec_sp011.pdf"},{"key":"1759_CR9","unstructured":"BlueAXI. https:\/\/github.com\/esa-tu-darmstadt\/BlueAXI"},{"key":"1759_CR10","unstructured":"Dubucq, T., Forlini, T., Dos Reis, V. L., & Santos, I. (2015). Matrix: Bench - benchmarking the state-of-the-art task execution frameworks of many-task computing."},{"key":"1759_CR11","unstructured":"Shah, H., Voloshin, M., & Sharma, D. (2020). MVAPICH2 on Thor: High performance MPI meets mainstream ethernet controller. 9th Annual MVAPICH User Group (MUG) Meeting."},{"key":"1759_CR12","doi-asserted-by":"crossref","unstructured":"Vin\u00e7on, T., Weber, L., Bernhardt, A., Riegger, C., Hardock, S., Knoedler, C., Stock, F., Solis-Vasquez, L., Tamimi, S., & Koch, A. (2020). nKV in Action: Accelerating KV-Stores on native computation storage with near-data processing. In Proceedings of the VLDB Endowment, Volume 13.","DOI":"10.14778\/3415478.3415524"},{"key":"1759_CR13","doi-asserted-by":"crossref","unstructured":"Kim, S., Lee, K., Cho, W., Nam, Y., Cheon, J. H., & Rutenbar, R. A. (2020). Hardware architecture of a number theoretic transform for a bootstrappable RNS-based homomorphic encryption scheme. In\u00a02020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (pp. 56\u201364). IEEE.","DOI":"10.1109\/FCCM48280.2020.00017"},{"key":"1759_CR14","doi-asserted-by":"crossref","unstructured":"Scott, M. (2017). A note on the implementation of the number theoretic transform. In IMA International Conference on Cryptography and Coding (pp. 247\u2013258). Springer.","DOI":"10.1007\/978-3-319-71045-7_13"},{"key":"1759_CR15","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2020.3017930","author":"AC Mert","year":"2020","unstructured":"Mert, A. C., Karabulut, E., Ozturk, E., Savas, E., & Aysu, A. (2020). An Extensive Study of Flexible Design Methods for the Number Theoretic Transform. IEEE Transactions on Computers. https:\/\/doi.org\/10.1109\/TC.2020.3017930","journal-title":"IEEE Transactions on Computers"},{"key":"1759_CR16","unstructured":"Xilinx, Inc. Performance and Resource Utilization for AXI4-Stream Interconnect RTL v1.1. URL https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/ru\/axis-interconnect.html#virtexuplus"},{"key":"1759_CR17","doi-asserted-by":"crossref","unstructured":"Canis, A., Choi, J., Aldham, M., Zhang, V., Kammoona, A., Anderson, J. H., Brown, S., & Czajkowski, T. (2011). LegUp: High-level synthesis for FPGA-based processor\/accelerator systems. In Proceedings of the 19th ACM\/SIGDA international symposium on Field programmable gate arrays (pp. 33\u201336).","DOI":"10.1145\/1950413.1950423"},{"issue":"4","key":"1759_CR18","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1145\/2954679.2872415","volume":"51","author":"R Prabhakar","year":"2016","unstructured":"Prabhakar, R., Koeplinger, D., Brown, K. J., Lee, H., De Sa, C., Kozyrakis, C., & Olukotun, K. (2016). Generating configurable hardware from parallel patterns. Acm Sigplan Notices, 51(4), 651\u2013665.","journal-title":"Acm Sigplan Notices"},{"key":"1759_CR19","doi-asserted-by":"crossref","unstructured":"Chen, T., Srinath, S., Batten, C., & Suh, G. E. (2018). An architectural framework for accelerating dynamic parallel algorithms on reconfigurable hardware. In\u00a02018 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO) (pp. 55\u201367). IEEE.","DOI":"10.1109\/MICRO.2018.00014"},{"key":"1759_CR20","doi-asserted-by":"publisher","unstructured":"Tarafdar, N., Lin, T., Fukuda, E., Bannazadeh, H., Leon-Garcia, A., & Chow, P. (2017). Enabling flexible network FPGA clusters in a heterogeneous cloud data center. In Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Association for Computing Machinery.\u00a0FPGA\u00a0'17\u00a0(pp. 237\u2013246). New York, NY: USA.\u00a0\u00a0https:\/\/doi.org\/10.1145\/3020078.3021742","DOI":"10.1145\/3020078.3021742"}],"container-title":["Journal of Signal Processing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-022-01759-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11265-022-01759-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-022-01759-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,27]],"date-time":"2022-08-27T04:23:21Z","timestamp":1661574201000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11265-022-01759-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,29]]},"references-count":20,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["1759"],"URL":"https:\/\/doi.org\/10.1007\/s11265-022-01759-2","relation":{},"ISSN":["1939-8018","1939-8115"],"issn-type":[{"value":"1939-8018","type":"print"},{"value":"1939-8115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,29]]},"assertion":[{"value":"2 October 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 February 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 April 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 April 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}