{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,5]],"date-time":"2026-05-05T23:34:52Z","timestamp":1778024092334,"version":"3.51.4"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2021,7,20]],"date-time":"2021-07-20T00:00:00Z","timestamp":1626739200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,7,20]],"date-time":"2021-07-20T00:00:00Z","timestamp":1626739200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100014438","name":"business finland","doi-asserted-by":"crossref","award":["1982\/31\/2021"],"award-info":[{"award-number":["1982\/31\/2021"]}],"id":[{"id":"10.13039\/501100014438","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Technical Research Centre of Finland"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2022,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Commercial multicore central processing units (CPU) integrate a number of processor cores on a single chip to support parallel execution of computational tasks. Multicore CPUs can possibly improve performance over single cores for independent parallel tasks nearly linearly as long as sufficient bandwidth is available. Ideal speedup is, however, difficult to achieve when dense intercommunication between the cores or complex memory access patterns is required. This is caused by expensive synchronization and thread switching, and insufficient latency toleration. These facts guide programmers away from straight-forward parallel processing patterns toward complex and error-prone programming techniques. To address these problems, we have introduced the Thick control flow (TCF) Processor Architecture. TCF is an abstraction of parallel computation that combines self-similar threads into computational entities. In this paper, we compare the performance and programmability of an entry-level TCF processor and two Intel Skylake multicore CPUs on commonly used parallel kernels to find out how well our architecture solves these issues that greatly reduce the productivity of parallel software development. Code examples are given and programming experiences recorded.<\/jats:p>","DOI":"10.1007\/s11227-021-03985-0","type":"journal-article","created":{"date-parts":[[2021,7,20]],"date-time":"2021-07-20T10:26:12Z","timestamp":1626776772000},"page":"3152-3183","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Performance and programmability comparison of the thick control flow architecture and current multicore processors"],"prefix":"10.1007","volume":"78","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4865-8058","authenticated-orcid":false,"given":"Martti","family":"Forsell","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sara","family":"Nikula","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jussi","family":"Roivainen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ville","family":"Lepp\u00e4nen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jesper Larsson","family":"Tr\u00e4ff","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,7,20]]},"reference":[{"key":"3985_CR1","unstructured":"International Technology Roadmap for Semiconductors, Semiconductor Industry Association, year 2003 edition. http:\/\/www.itrs.net\/"},{"key":"3985_CR2","unstructured":"Research at Intel From a Few Cores to Many (2006): A Tera-scale Computing Research Overview, White Paper, Intel,"},{"issue":"7","key":"3985_CR3","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1109\/MSPEC.2010.5491011","volume":"47","author":"D Patterson","year":"2010","unstructured":"Patterson D (2010) The trouble with multicore. IEEE Spectr 47(7):28\u201332","journal-title":"IEEE Spectr"},{"key":"3985_CR4","volume-title":"Introduction to parallel algorithms","author":"J Jaja","year":"1992","unstructured":"Jaja J (1992) Introduction to parallel algorithms. Addison-Wesley, Reading"},{"key":"3985_CR5","doi-asserted-by":"crossref","unstructured":"M\u00e4kel\u00e4 J-M, Forsell M, Lepp\u00e4nen V (2017) Towards a language framework for thick control flows. In: Proc. of the High Level Programming Models and Supporting Environments (HIPS\u201917), May 29, 2017, Orlando, FL, USA","DOI":"10.1109\/IPDPSW.2017.119"},{"key":"3985_CR6","volume-title":"Practical PRAM programming","author":"J Keller","year":"2001","unstructured":"Keller J, Kessler C, Tr\u00e4ff JL (2001) Practical PRAM programming. Wiley, New York"},{"issue":"13","key":"3985_CR7","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1016\/S1383-7621(02)00064-4","volume":"47","author":"M Forsell","year":"2002","unstructured":"Forsell M (2002) Architectural differences of efficient sequential and parallel computers. J Syst Architect 47(13):1017\u20131041","journal-title":"J Syst Architect"},{"issue":"1","key":"3985_CR8","first-page":"98","volume":"3","author":"M Forsell","year":"2013","unstructured":"Forsell M, Lepp\u00e4nen V (2013) An extended PRAM-NUMA model of computation for TCF programming. Int J Netw Comput 3(1):98\u2013115","journal-title":"Int J Netw Comput"},{"key":"3985_CR9","unstructured":"Forsell M (2018) Accelerating general purpose parallel computing with the TPA architecture. In: ScalPerf 2018, September 23\u201328, 2018, Bertinoro, Italy"},{"key":"3985_CR10","unstructured":"Leppanen V, Forsell M, Makela J-M (2011), Thick control flows: introduction and prospects. In: Proc. PDPTA 2011, July 18\u201321, Las Vegas, USA, pp 540\u2013546"},{"key":"3985_CR11","doi-asserted-by":"crossref","unstructured":"Forsell M, Roivainen J, Leppanen V (2016) Outline of a thick control flow architecture. In: Proc. MPP 2016, SBAC-PAD 2016, October 26\u201328, 2016. Marina del Rey Marriott, Los Angeles, USA","DOI":"10.1109\/SBAC-PADW.2016.9"},{"key":"3985_CR12","unstructured":"REPLICA Multiprocessor Framework, White Paper, VTT (2020)"},{"key":"3985_CR13","doi-asserted-by":"crossref","unstructured":"Forsell M, Roivainen J, Lepp\u00e4nen V, Tr\u00e4ff JL (2018) Implementation of multioperations in thick control flow processors. In: Proc. APDCM\u201918, May 21\u201325, 2018, Vancouver, Canada","DOI":"10.1109\/IPDPSW.2018.00121"},{"key":"3985_CR14","unstructured":"Forsell M (2018) Flexible fibering scheme for thick control flow processors. In: Proc. PDPTA\u201918, July 30\u2013August 2, 2018, Las Vegas, USA, pp 16-20"},{"key":"3985_CR15","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1016\/j.micpro.2018.09.013","volume":"63","author":"M Forsell","year":"2018","unstructured":"Forsell M, Roivainen J, Lepp\u00e4nen V, Tr\u00e4ff JL (2018) Supporting concurrent memory access in TCF processor architectures. Microprocess Microsyst 63:226\u2013236","journal-title":"Microprocess Microsyst"},{"key":"3985_CR16","unstructured":"Hansson E, Alnervik E, Kessler C, Forsell M (2014) A quantitative comparison of PRAM based emulated shared memory architectures to current multicore CPUs and GPUs. In: Proceedings of the 11th Workshop on Parallel Systems and Algorithms (PASA\u201914) in Conjunction with the 27th International Conference on Architecture of Computing Systems (ARCS\u201914), February 25\u201326, Luebeck, Germany, pp 1-7"},{"issue":"5","key":"3985_CR17","doi-asserted-by":"publisher","first-page":"1911","DOI":"10.1007\/s11227-017-2199-z","volume":"74","author":"M Forsell","year":"2018","unstructured":"Forsell M, Roivainen J, Lepp\u00e4nen V (2018) REPLICA MBTAC\u2014multithreaded dual mode processor. J Supercomput 74(5):1911\u20131933","journal-title":"J Supercomput"},{"issue":"4","key":"3985_CR18","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1145\/357114.357116","volume":"2","author":"J Schwartz","year":"1980","unstructured":"Schwartz J (1980) Ultracomputers. ACM Trans Program Lang Syst 2(4):484\u2013521","journal-title":"ACM Trans Program Lang Syst"},{"key":"3985_CR19","doi-asserted-by":"crossref","unstructured":"Ranade A, Bhatt S, Johnsson S (1988) The fluent abstract machine. In: Proc. Fifth MIT Conference on Advanced Research in VLSI, March 1988, pp 71\u201394; TR-573. Department of Computer Science, Yale University","DOI":"10.7551\/mitpress\/1102.003.0008"},{"issue":"5","key":"3985_CR20","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1109\/MM.2002.1044299","volume":"22","author":"M Forsell","year":"2002","unstructured":"Forsell M (2002) A scalable high-performance computing solution for network on chips. IEEE Micro 22(5):46\u201355","journal-title":"IEEE Micro"},{"issue":"1","key":"3985_CR21","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1145\/1866739.1866757","volume":"54","author":"U Vishkin","year":"2011","unstructured":"Vishkin U (2011) Using simple abstraction to reinvent computing for parallelism. Commun ACM 54(1):75\u201385","journal-title":"Commun ACM"},{"key":"3985_CR22","doi-asserted-by":"crossref","unstructured":"Vishkin U (2014) Is multicore hardware for general-purpose parallel processing broken? Commun ACM 57(4):35-39","DOI":"10.1145\/2580945"},{"issue":"2","key":"3985_CR23","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1109\/TPDS.2017.2754376","volume":"29","author":"F Ghanim","year":"2018","unstructured":"Ghanim F, Vishkin U, Barua R (2018) Easy PRAM-Based high-performance parallel programming with ICE. IEEE Trans Parallel Distrib Syst 29(2):377\u2013390","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"1","key":"3985_CR24","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1016\/j.jcss.2010.06.012","volume":"77","author":"L Valiant","year":"2011","unstructured":"Valiant L (2011) A bridging model for multi-core computing. J Comput Syst Sci 77(1):154\u2013166","journal-title":"J Comput Syst Sci"},{"key":"3985_CR25","unstructured":"Forsell M, Nikula S, Roivainen J (2021) Preliminary performance and programmability comparison of the thick control flow architecture and current multicore CPUs. In: Arabnia H, Deligiannidis L, Grimaila M, Hodson D, Joe K, Sekijima M, Tinetti F (eds) Advances in Parallel & Distributed Processing, and Applications: Proceedings from PDPTA\u201920, CSC\u201920, MSV\u201920, and GCC\u201920 (July 27\u201330, 2020, Las Vegas, Nevada, USA). Springer"},{"key":"3985_CR26","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1016\/0022-0000(91)90005-P","volume":"42","author":"A Ranade","year":"1991","unstructured":"Ranade A (1991) How to emulate shared memory. J Comput Syst Sci 42:307\u2013326","journal-title":"J Comput Syst Sci"},{"issue":"5","key":"3985_CR27","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1016\/0141-9331(96)01092-7","volume":"20","author":"M Forsell","year":"1996","unstructured":"Forsell M (1996) Minimal pipeline architecture\u2014an alternative to superscalar architecture. Microprocess Microsyst 20(5):277\u2013284","journal-title":"Microprocess Microsyst"},{"key":"3985_CR28","unstructured":"Skylake (client)\u2014Microarchitectures\u2014Intel, Wiki Chip www document at address: https:\/\/en.wikichip.org\/wiki\/intel\/microarchitectures\/skylake_(client). Accessed March 14, 2021"},{"key":"3985_CR29","unstructured":"Skylake (server)\u2014Microarchitectures\u2014Intel, Wiki Chip www document available at http:\/\/en.wikichip.org\/wiki\/intel\/microarchitectures\/skylake_(server). Accessed March 14, 2021"},{"issue":"9","key":"3985_CR30","doi-asserted-by":"publisher","first-page":"948","DOI":"10.1109\/TC.1972.5009071","volume":"21","author":"M Flynn","year":"1972","unstructured":"Flynn M (1972) Some computer organizations and their effectiveness. IEEE Trans Comput 21(9):948\u2013960","journal-title":"IEEE Trans Comput"},{"key":"3985_CR31","doi-asserted-by":"crossref","unstructured":"Fortune S, Wyllie J (1978) Parallelism in random access machines. In: Proceedings of 10th ACM STOC, Association for Computing Machinery, New York, pp 114\u2013118","DOI":"10.1145\/800133.804339"},{"key":"3985_CR32","volume-title":"Highly parallel computing","author":"G Almasi","year":"1994","unstructured":"Almasi G, Gottlieb A (1994) Highly parallel computing. Benjamin\/Cummings, Redwood City"},{"issue":"2","key":"3985_CR33","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1145\/280277.280278","volume":"30","author":"D Skillicorn","year":"1998","unstructured":"Skillicorn D, Talia D (1998) Models and languages for parallel computation. ACM Comput Surv 30(2):123\u2013169","journal-title":"ACM Comput Surv"},{"key":"3985_CR34","volume-title":"Parallel computer architecture\u2014a hardware\/ software approach","author":"D Culler","year":"1999","unstructured":"Culler D, Singh J (1999) Parallel computer architecture\u2014a hardware\/ software approach. Morgan Kaufmann Publishers Inc., San Fransisco"},{"key":"3985_CR35","doi-asserted-by":"crossref","unstructured":"Rajasekaran S, Reif J (eds) (2008) Chapman Handbook of parallel computing\u2014models algorithms and applications. Hall\/CRC","DOI":"10.1201\/9781420011296"},{"key":"3985_CR36","unstructured":"Kirk D, Hwu W-M (2010) Programming massively parallel processors\u2014a hands-on approach. Morgan Kaufmann"},{"key":"3985_CR37","unstructured":"Pacheco P (2011) An introduction to parallel programming. Morgan Kaufmann"},{"key":"3985_CR38","unstructured":"The MPI Forum, CORPORATE (November 15\u201319, 1993), MPI: a message passing interface. In: Proc. 1993 ACM\/IEEE Conference on Supercomputing"},{"key":"3985_CR39","unstructured":"Lewis B, Berg D (1996) PThreads primer: a guide to multithreaded programming. Sunsoft Press"},{"key":"3985_CR40","doi-asserted-by":"crossref","unstructured":"Chandra R, Menon R, Dagum L, Kohr D, Maydan D, McDonald J (2001) Parallel programming in OpenMP. 1st edn. Morgan Kaufmann Publishers","DOI":"10.1016\/B978-155860671-5\/50002-5"},{"key":"3985_CR41","unstructured":"Carlson W, Draper J, Culler D, Yelick K, Brooks E, Warren K, Livermore L (1999) Introduction to UPC and language specification. In: CCS-TR-99-157, IDA Center for Computing Sciences"},{"key":"3985_CR42","doi-asserted-by":"crossref","unstructured":"Forsell M, Roivainen J, Lepp\u00e4nen V (2016) The REPLICA on-chip network. In: NORCAS 2016, November 1\u20132, 2016, Copenhagen, Denmark","DOI":"10.1109\/NORCHIP.2016.7792877"},{"key":"3985_CR43","doi-asserted-by":"crossref","unstructured":"Forsell M, Lepp\u00e4nen V, Penttonen M (2015) Cost of bandwidth-optimized sparse mesh layouts. In: Proceedings of 13th International Conference on Parallel Computing Technologies (PaCT\u201915), Lecture Notes in Computer Science (LNCS), vol 9251, August 31\u2013September 4, 2015, Petrozavodsk, Russia, pp 375\u2013389","DOI":"10.1007\/978-3-319-21909-7_37"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-021-03985-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-021-03985-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-021-03985-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T15:52:39Z","timestamp":1725465159000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-021-03985-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,20]]},"references-count":43,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,2]]}},"alternative-id":["3985"],"URL":"https:\/\/doi.org\/10.1007\/s11227-021-03985-0","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,20]]},"assertion":[{"value":"1 July 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 July 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}