{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:44:35Z","timestamp":1740123875085,"version":"3.37.3"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T00:00:00Z","timestamp":1604620800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T00:00:00Z","timestamp":1604620800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Universit\u00e4t Augsburg"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Parallel Prog"],"published-print":{"date-parts":[[2021,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To improve the scalability, several many-core architectures use message passing instead of shared memory accesses for communication. Unfortunately, Direct Memory Access (DMA) transfers in a shared address space are usually used to emulate message passing, which entails a lot of overhead and thwarts the advantages of message passing. Recently proposed register-level message passing alternatives use special instructions to send the contents of a single register to another core. The reduced communication overhead and architectural simplicity lead to good many-core scalability. After investigating several other approaches in terms of hardware complexity and throughput overhead, we recommend a small instruction set extension to enable register-level message passing at minimal hardware costs and describe its integration into a classical five stage RISC-V pipeline.<\/jats:p>","DOI":"10.1007\/s10766-020-00685-9","type":"journal-article","created":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T04:33:18Z","timestamp":1604637198000},"page":"487-505","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["PIMP My Many-Core: Pipeline-Integrated Message Passing"],"prefix":"10.1007","volume":"49","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8345-2760","authenticated-orcid":false,"given":"J\u00f6rg","family":"Mische","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"Frieb","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexander","family":"Stegmeier","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Theo","family":"Ungerer","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,11,6]]},"reference":[{"issue":"3","key":"685_CR1","first-page":"63","volume":"5","author":"DH Bailey","year":"1991","unstructured":"Bailey, D.H., et al.: The NAS parallel benchmarks. Int. J. High Perform. Comput. Appl. 5(3), 63\u201373 (1991)","journal-title":"Int. J. High Perform. Comput. Appl."},{"key":"685_CR2","doi-asserted-by":"crossref","unstructured":"Bell, S., et\u00a0al.: Tile64-processor: a 64-core soc with mesh interconnect. In: International Solid-State Circuits Conference (ISSC), pp. 588\u2013598 (2008)","DOI":"10.1109\/ISSCC.2008.4523070"},{"key":"685_CR3","doi-asserted-by":"publisher","first-page":"1654","DOI":"10.1016\/j.procs.2013.05.333","volume":"18","author":"BD de Dinechin","year":"2013","unstructured":"de Dinechin, B.D., de Massas, P.G., Lager, G., L\u00e9ger, C., Orgogozo, B., Reybert, J., Strudel, T.: A distributed run-time environment for the Kalray MPPA-256 integrated manycore processor. Procedia Comput. Sci. 18, 1654\u20131663 (2013)","journal-title":"Procedia Comput. Sci."},{"key":"685_CR4","unstructured":"Duller, A., Towner, D., Panesar, G., Gray, A., Robbins, W.: Picoarray technology: the tool\u2019s story. In: Design, Automation and Test in Europe (DATE), pp. 106\u2013111 (2005)"},{"key":"685_CR5","unstructured":"Frieb, M.: Hardware Extensions for a Timing-Predictable Many-Core Processor. Ph.D. thesis, Department of Computer Science, University of Augsburg (2019)"},{"key":"685_CR6","doi-asserted-by":"crossref","unstructured":"Frieb, M., Stegmeier, A., Mische, J., Ungerer, T.: Lightweight hardware synchronization for avoiding buffer overflows in network-on-chips. In: Architecture of Computing Systems (ARCS), pp. 112\u2013126 (2018)","DOI":"10.1007\/978-3-319-77610-1_9"},{"key":"685_CR7","doi-asserted-by":"crossref","unstructured":"Gabriel, E., Fagg, G.E., Bosilca, G., Angskun, T., Dongarra, J.J., Squyres, J.M., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, A., Castain, R.H., Daniel, D.J., Graham, R.L., Woodall, T.S.: Open MPI: goals, concept, and design of a next generation MPI implementation. In: Proceedings, 11th European PVM\/MPI Users\u2019 Group Meeting, pp. 97\u2013104. Budapest, Hungary (2004)","DOI":"10.1007\/978-3-540-30218-6_19"},{"issue":"3","key":"685_CR8","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1145\/2544350.2544353","volume":"10","author":"K Goossens","year":"2013","unstructured":"Goossens, K., et al.: Virtual execution platforms for mixed-time-criticality systems: the CompSOC architecture and design flow. ACM SIGBED Rev. 10(3), 23\u201334 (2013)","journal-title":"ACM SIGBED Rev."},{"key":"685_CR9","unstructured":"INMOS Limited: Transputer Instruction set\u2014A Compiler Writer\u2019s Guide (1988)"},{"key":"685_CR10","doi-asserted-by":"crossref","unstructured":"Kumar, R., Mattson, T.G., Pokam, G., van\u00a0der Wijngaart, R.: The case for Message Passing on Many-core Chips. Tech. Rep. UILU-ENG-10-2203 (CRHC 10-01), University of Illinois at Urbana-Champaign (2010)","DOI":"10.1007\/978-1-4419-6460-1_5"},{"key":"685_CR11","doi-asserted-by":"crossref","unstructured":"Lee, Y., Waterman, A., Avizienis, R., Cook, H., Sun, C., Stojanovi\u0107, V., Asanovi\u0107, K.: A 45 nm 1.3 ghz 16.7 double-precision gflops\/w risc-v processor with vector accelerators. In: European Solid State Circuits Conference (ESSCIRC), pp. 199\u2013202 (2014)","DOI":"10.1109\/ESSCIRC.2014.6942056"},{"key":"685_CR12","doi-asserted-by":"crossref","unstructured":"Lewis, D., et\u00a0al.: The stratix ii logic and routing architecture. In: Proceedings of the 2005 ACM\/SIGDA 13th International Symposium on Field-Programmable Gate Arrays, pp. 14\u201320 (2005)","DOI":"10.1145\/1046192.1046195"},{"key":"685_CR13","doi-asserted-by":"crossref","unstructured":"Mattson, T.G., et\u00a0al.: The 48-core SCC Processor: the Programmer\u2019s View. In: High Performance Computing, Networking, Storage and Analysis (SC), pp. 1\u201311 (2010)","DOI":"10.1109\/SC.2010.53"},{"key":"685_CR14","doi-asserted-by":"crossref","unstructured":"Message Passing Interface Forum: Message-Passing Interface Standard, Version 3.1 (2015). High Performance Computing Center Stuttgart (HLRS)","DOI":"10.7551\/mitpress\/9486.003.0003"},{"key":"685_CR15","doi-asserted-by":"crossref","unstructured":"Mische, J., Frieb, M., Stegmeier, A., Ungerer, T.: Reduced complexity many-core: timing predictability due to message-passing. In: Architecture of Computing Systems (ARCS), pp. 139\u2013151 (2017)","DOI":"10.1007\/978-3-319-54999-6_11"},{"key":"685_CR16","doi-asserted-by":"crossref","unstructured":"Mische, J., Frieb, M., Stegmeier, A., Ungerer, T.: PIMP my many-core: pipeline-integrated message passing. In: Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), pp. 199\u2013211 (2019)","DOI":"10.1007\/978-3-030-27562-4_14"},{"key":"685_CR17","doi-asserted-by":"crossref","unstructured":"Schoeberl, M., Pezzarossa, L., Spars\u00f8, J.: A minimal network interface for a simple network-on-chip. In: Architecture of Computing Systems (ARCS), pp. 295\u2013307 (2019)","DOI":"10.1007\/978-3-030-18656-2_22"},{"issue":"2","key":"685_CR18","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1109\/MM.2016.25","volume":"36","author":"A Sodani","year":"2016","unstructured":"Sodani, A., Gramunt, R., Corbal, J., Kim, H.S., Vinod, K., Chinthamani, S., Hutsell, S., Agarwal, R., Liu, Y.C.: Knights landing: second-generation intel xeon phi product. IEEE Micro 36(2), 34\u201346 (2016)","journal-title":"IEEE Micro"},{"key":"685_CR19","unstructured":"Stegmeier, A.: Real-Time Analysis of MPI Programs for NoC-Based Many-Cores Using Time Division Multiplexing. Ph.D. thesis, Department of Computer Science, University of Augsburg (2019)"},{"key":"685_CR20","unstructured":"Strohmaier, E.: Highlights of the 54th TOP500 List. In: High Performance Computing, Networking, Storage and Analysis (SC) (2019)"},{"key":"685_CR21","doi-asserted-by":"crossref","unstructured":"S\u00f8rensen, R.B., Puffitsch, W., Schoeberl, M., Spars\u00f8, J.: Message passing on a time-predictable multicore processor. In: International Symposium on Real-time Distributed Computing (ISORC 2015), pp. 51\u201359 (2015)","DOI":"10.1109\/ISORC.2015.15"},{"issue":"2","key":"685_CR22","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1109\/MM.2002.997877","volume":"22","author":"MB Taylor","year":"2002","unstructured":"Taylor, M.B., et al.: The raw microprocessor: a computational fabric for software circuits and general-purpose programs. IEEE Micro 22(2), 25\u201335 (2002)","journal-title":"IEEE Micro"},{"issue":"1","key":"685_CR23","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1145\/1945023.1945033","volume":"45","author":"RF Van der Wijngaart","year":"2011","unstructured":"Van der Wijngaart, R.F., Mattson, T.G., Haas, W.: Light-weight communications on Intel\u2019s single-chip cloud computer processor. ACM SIGOPS Oper. Syst. Rev. 45(1), 73\u201383 (2011)","journal-title":"ACM SIGOPS Oper. Syst. Rev."},{"key":"685_CR24","unstructured":"Waterman, A., Asanovi\u0107, K.: The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA. Document Version 20191213 (2019)"},{"issue":"1","key":"685_CR25","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1007\/s11390-015-1510-9","volume":"30","author":"F Zheng","year":"2015","unstructured":"Zheng, F., Li, H.L., Lv, H., Guo, F., Xu, X.H., Xie, X.H.: Cooperative computing techniques for a deeply fused and heterogeneous many-core processor architecture. J. Comput. Sci. Technol. 30(1), 145\u2013162 (2015)","journal-title":"J. Comput. Sci. Technol."}],"container-title":["International Journal of Parallel Programming"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10766-020-00685-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10766-020-00685-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10766-020-00685-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,11]],"date-time":"2023-10-11T16:28:01Z","timestamp":1697041681000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10766-020-00685-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,6]]},"references-count":25,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,8]]}},"alternative-id":["685"],"URL":"https:\/\/doi.org\/10.1007\/s10766-020-00685-9","relation":{},"ISSN":["0885-7458","1573-7640"],"issn-type":[{"type":"print","value":"0885-7458"},{"type":"electronic","value":"1573-7640"}],"subject":[],"published":{"date-parts":[[2020,11,6]]},"assertion":[{"value":"1 April 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 October 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 November 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}