{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:38:26Z","timestamp":1740123506522,"version":"3.37.3"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2022,3,2]],"date-time":"2022-03-02T00:00:00Z","timestamp":1646179200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,2]],"date-time":"2022-03-02T00:00:00Z","timestamp":1646179200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002428","name":"Austrian Science Fund","doi-asserted-by":"publisher","award":["P 29783"],"award-info":[{"award-number":["P 29783"]}],"id":[{"id":"10.13039\/501100002428","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003065","name":"University of Vienna","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003065","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Task-based runtime systems are an important branch of parallel programming research, since tasks decouple computation from the compute units, giving the runtime systems greater flexibility than a thread-based solution. This makes it easier to deal with the ever-increasing complexity of parallel architectures by providing a separation of concerns\u2014the specification of parallelism is separated from the implementation of the parallel computations on a specific architecture. The Open Community Runtime is one such system, aimed at large-scale parallel systems. 
Unlike many other task-based runtime systems, the creators not only provided an implementation but there is also a comprehensive specification document. This has allowed us to create an independent implementation, called OCR-Vx. In this article, we present our experience of developing the runtime system, put our work in the context of the specification and the other implementations, and describe key lessons that we have learned during our work. We discuss the design and implementation issues of task-based runtime systems and applications including task synchronization and scheduling, data management, memory consistency, the relation between shared-memory and distributed-memory runtime systems, NUMA architectures, and heterogeneous systems. The article is aimed at audiences not familiar with OCR, since we believe these lessons could be valuable for developers working on other task-based runtime systems or designing new ones.<\/jats:p>","DOI":"10.1007\/s11227-022-04355-0","type":"journal-article","created":{"date-parts":[[2022,3,2]],"date-time":"2022-03-02T12:02:31Z","timestamp":1646222551000},"page":"12344-12379","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["The OCR-Vx experience: lessons learned from designing and implementing a task-based runtime system"],"prefix":"10.1007","volume":"78","author":[{"given":"Jiri","family":"Dokulil","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6520-2047","authenticated-orcid":false,"given":"Siegfried","family":"Benkner","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,2]]},"reference":[{"key":"4355_CR1","unstructured":"ISO\/IEC TS 19571:2016 Programming Languages \u2014 Technical specification for C++ extensions for concurrency Tech. rep. (2013). 
https:\/\/www.iso.org\/standard\/65242.html"},{"key":"4355_CR2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2766064","author":"E Agullo","year":"2017","unstructured":"Agullo E, Aumage O, Faverge M, Furmento N, Pruvost F, Sergent M, Thibault S (2017) Achieving high performance on supercomputers with a sequential task-based programming model. IEEE Trans Parallel Distrib Syst. https:\/\/doi.org\/10.1109\/TPDS.2017.2766064","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"4355_CR3","first-page":"187","volume":"23","author":"C Augonnet","year":"2011","unstructured":"Augonnet C, Thibault S, Namyst R, Wacrenier PA (2011) StarPU: a unified platform for task scheduling on heterogeneous multicore architectures concurrency and computation. Pract Exp 23:187\u2013198","journal-title":"Pract Exp"},{"key":"4355_CR4","unstructured":"Aumage O, Carpenter P, Benkner S (2021) Task-based performance portability in HPC\u2014maximising long-term investments in a fast evolving, complex and heterogeneous HPC landscape. White Paper, European Technology Platform for High Performance Computing (ETP4HPC). https:\/\/doi.org\/10.5281\/zenodo.5549731"},{"key":"4355_CR5","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1007\/978-981-13-5907-1_11","volume-title":"Parallel and distributed computing, applications and technologies","author":"E Bajrovic","year":"2019","unstructured":"Bajrovic E, Benkner S, Dokulil J (2019) Pipeline patterns on top of task-based runtimes. In: Park JH, Shen H, Sung Y, Tian H (eds) Parallel and distributed computing, applications and technologies. Springer, Singapore, pp 100\u2013110. https:\/\/doi.org\/10.1007\/978-981-13-5907-1_11"},{"key":"4355_CR6","doi-asserted-by":"crossref","unstructured":"Bauer M, Treichler S, Slaughter E, Aiken A: Legion: expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC \u201912, pp. 
66:1\u201366:11. IEEE Computer Society Press, Los Alamitos, CA, USA (2012)","DOI":"10.1109\/SC.2012.71"},{"issue":"05","key":"4355_CR7","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1109\/MM.2011.67","volume":"31","author":"S Benkner","year":"2011","unstructured":"Benkner S, Pllana S, Tr\u00e4ff JL, Tsigas P, Dolinsky U, Augonnet C, Bachmayer B, Kessler C, Moloney D, Osipov V (2011) PEPPHER: efficient and productive usage of hybrid computing systems. IEEE Micro 31(05):28\u201341. https:\/\/doi.org\/10.1109\/MM.2011.67","journal-title":"IEEE Micro"},{"issue":"6","key":"4355_CR8","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1145\/1379022.1375591","volume":"43","author":"HJ Boehm","year":"2008","unstructured":"Boehm HJ, Adve SV (2008) Foundations of the C++ concurrency memory model. SIGPLAN Not 43(6):68\u201378. https:\/\/doi.org\/10.1145\/1379022.1375591","journal-title":"SIGPLAN Not"},{"key":"4355_CR9","doi-asserted-by":"publisher","unstructured":"Bradner SO (1996) The internet standards process\u2014Revision 3. RFC 2026. https:\/\/doi.org\/10.17487\/RFC2026. https:\/\/www.rfc-editor.org\/info\/rfc2026","DOI":"10.17487\/RFC2026"},{"key":"4355_CR10","unstructured":"Budimlic Z, Cave V, Chatterjee S, Cledat R, Sarkar V, Seshasayee B, Surendran R, Vrvilo N (2015) Characterizing application execution using the Open Community Runtime. In: International Workshop on Runtime Systems for Extreme Scale Programming Models and Architectures, in Conjunction with SC15. Austin, Texas. https:\/\/www.cs.rice.edu\/~zoran\/Publications_files\/RESPA2015.pdf"},{"key":"4355_CR11","doi-asserted-by":"publisher","unstructured":"Bueno J, Planas J, Duran A, Badia RM, Martorell X, Ayguad\u00e9 E, Labarta J (2012) Productive programming of GPU clusters with OmpSs. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 557\u2013568. 
https:\/\/doi.org\/10.1109\/IPDPS.2012.58","DOI":"10.1109\/IPDPS.2012.58"},{"key":"4355_CR12","doi-asserted-by":"publisher","unstructured":"Cav\u00e9 V, Zhao J, Shirako J, Sarkar V: Habanero-java: the new adventures of old x10. In: Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, PPPJ \u201911, pp 51\u201361. ACM, New York, NY, USA (2011).https:\/\/doi.org\/10.1145\/2093157.2093165","DOI":"10.1145\/2093157.2093165"},{"key":"4355_CR13","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1016\/j.parco.2017.02.003","volume":"64","author":"V Cav\u00e9","year":"2017","unstructured":"Cav\u00e9 V, Cl\u00e9dat R, Griffin P, More A, Seshasayee B, Borkar S, Chatterjee S, Dunning D, Fryman J (2017) Traleika glacier: a hardware-software co-designed approach to exascale computing. Parallel Comput 64:33\u201349","journal-title":"Parallel Comput"},{"issue":"3","key":"4355_CR14","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1177\/1094342007078442","volume":"21","author":"B Chamberlain","year":"2007","unstructured":"Chamberlain B, Callahan D, Zima H (2007) Parallel programmability and the chapel language. Int J High Perform Comput Appl 21(3):291\u2013312","journal-title":"Int J High Perform Comput Appl"},{"issue":"5","key":"4355_CR15","doi-asserted-by":"publisher","first-page":"2725","DOI":"10.1007\/s11227-018-2681-2","volume":"75","author":"J Dokulil","year":"2019","unstructured":"Dokulil J (2019) Consistency model for runtime objects in the Open Community Runtime. J Supercomput 75(5):2725\u20132760","journal-title":"J Supercomput"},{"key":"4355_CR16","doi-asserted-by":"publisher","unstructured":"Dokulil J, Bajrovic E, Benkner S, Sandrieser M, Bachmayer B: HyPHI - task based hybrid execution C++ library for the Intel Xeon Phi coprocessor. In: Parallel Processing (ICPP), 2013 42nd International Conference on, pp 280\u2013289 (2013). 
https:\/\/doi.org\/10.1109\/ICPP.2013.37","DOI":"10.1109\/ICPP.2013.37"},{"key":"4355_CR17","unstructured":"Dokulil J, Benkner S: OCR extensions - local identifiers, labeled GUIDs, file IO, and data block partitioning. CoRR abs\/1509.03161 (2015). http:\/\/arxiv.org\/abs\/1509.03161"},{"key":"4355_CR18","doi-asserted-by":"crossref","unstructured":"Dokulil J, Benkner S: Retargeting of the Open Community Runtime to Intel Xeon Phi. In: International Conference On Computational Science, ICCS 2015, pp 1453\u20131462. Procedia Computer Science (2015)","DOI":"10.1016\/j.procs.2015.05.335"},{"key":"4355_CR19","doi-asserted-by":"crossref","unstructured":"Dokulil J, Benkner S: The Open Community Runtime on the Intel Knights Landing architecture. In: 17th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP-2017) (2017)","DOI":"10.1007\/978-3-319-65482-9_65"},{"key":"4355_CR20","doi-asserted-by":"crossref","unstructured":"Dokulil J, Benkner S: Automatic placement of tasks to NUMA nodes in iterative applications. In: 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020, V\u00e4ster\u00e5s, Sweden, March 11\u201313, 2020, pp 192\u2013195. IEEE (2020).","DOI":"10.1109\/PDP50117.2020.00036"},{"key":"4355_CR21","doi-asserted-by":"crossref","unstructured":"Dokulil J, Benkner S: Automatic placement of tasks to NUMA nodes in iterative applications. In: 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020, pp 192\u2013195. IEEE (2020).","DOI":"10.1109\/PDP50117.2020.00036"},{"key":"4355_CR22","doi-asserted-by":"publisher","unstructured":"Dokulil J, Benkner S: NUMA-aware CPU core allocation in cooperating dynamic applications. In: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020, New Orleans, LA, USA, May 18\u201322, 2020, pp 950\u2013957. IEEE (2020). 
https:\/\/doi.org\/10.1109\/IPDPSW50202.2020.00158","DOI":"10.1109\/IPDPSW50202.2020.00158"},{"key":"4355_CR23","unstructured":"Dokulil J, Sandrieser M, Benkner S: OCR-Vx - an alternative implementation of the Open Community Runtime. In: International Workshop on Runtime Systems for Extreme Scale Programming Models and Architectures, in Conjunction with SC15. Austin, Texas (2015)"},{"key":"4355_CR24","doi-asserted-by":"publisher","unstructured":"Dokulil J, Sandrieser M, Benkner S: Implementing the Open Community Runtime for shared-memory and distributed-memory systems. In: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), pp 364\u2013368 (2016). https:\/\/doi.org\/10.1109\/PDP.2016.81","DOI":"10.1109\/PDP.2016.81"},{"key":"4355_CR25","doi-asserted-by":"publisher","unstructured":"Fryman J (2017) Traleika Glacier X-Stack extension final report. https:\/\/doi.org\/10.2172\/1497409. https:\/\/www.osti.gov\/biblio\/1497409","DOI":"10.2172\/1497409"},{"issue":"11","key":"4355_CR26","doi-asserted-by":"publisher","first-page":"4160","DOI":"10.1007\/s11227-016-1744-5","volume":"72","author":"J Gong","year":"2016","unstructured":"Gong J, Markidis S, Laure E et al (2016) Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations. J Supercomput 72:4160\u20134180. https:\/\/doi.org\/10.1007\/s11227-016-1744-5","journal-title":"J Supercomput"},{"key":"4355_CR27","doi-asserted-by":"crossref","unstructured":"Heroux MA, Dongarra J, Luszczek P (2013) HPCG technical specification. Tech. rep SAND2013-8752, Sandia National Laboratories. https:\/\/www.osti.gov\/servlets\/purl\/1113870","DOI":"10.2172\/1113870"},{"key":"4355_CR28","unstructured":"Kaehler A, Bradski G (2016) Learning OpenCV 3: computer vision in C++ with the OpenCV Library. O\u2019Reilly Media. 
ISBN: 9781491937990"},{"key":"4355_CR29","doi-asserted-by":"crossref","unstructured":"Kaiser H, Heller T, Adelstein-Lelbach B, Serio A, Fey D: HPX - a task based programming model in a global address space. In: The 8th International Conference on Partitioned Global Address Space Programming Models (PGAS) (2014)","DOI":"10.1145\/2676870.2676883"},{"key":"4355_CR30","doi-asserted-by":"publisher","unstructured":"Kale LV, Krishnan S: Charm++: a portable concurrent object oriented system based on C++. In: Proc. of the Eighth Annual Conference on Object-oriented Programming Systems, Languages, and Applications, OOPSLA \u201993, pp 91\u2013108. ACM, New York, USA (1993). https:\/\/doi.org\/10.1145\/165854.165874","DOI":"10.1145\/165854.165874"},{"issue":"04","key":"4355_CR31","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1535\/itj.1104.05","volume":"11","author":"A Kukanov","year":"2007","unstructured":"Kukanov A, Voss MJ (2007) The foundations for scalable multi-core software in intel threading building blocks. Intel Technol J 11(04):309\u2013322","journal-title":"Intel Technol J"},{"key":"4355_CR32","doi-asserted-by":"crossref","unstructured":"Landwehr J, Suetterlein J, M\u00e1rquez A, Manzano J, Gao GR: Application characterization at scale: lessons learned from developing a distributed Open Community Runtime system for high performance computing. In: Proceedings of the ACM International Conference on Computing Frontiers, CF \u201916, pp 164\u2013171. ACM, New York, NY, USA (2016).","DOI":"10.1145\/2903150.2903166"},{"key":"4355_CR33","doi-asserted-by":"crossref","unstructured":"Leiserson C, Plaat A (1997) Programming parallel applications in Cilk. 
Siam News 31(4)","DOI":"10.1007\/3-540-63138-0_6"},{"key":"4355_CR34","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1016\/j.future.2017.04.001","volume":"74","author":"T Li","year":"2017","unstructured":"Li T, Ren Y, Yu D, Jin S (2017) Analysis of NUMA effects in modern multicore systems for the design of high-performance data transfer applications. Futur Gener Comput Syst 74:41\u201350 (2017).\u00a0https:\/\/doi.org\/10.1016\/j.future.2017.04.001","journal-title":"Futur Gener Comput Syst"},{"key":"4355_CR35","doi-asserted-by":"publisher","unstructured":"Majo Z, Gross TR (2011) Memory system performance in a numa multicore multiprocessor. SYSTOR \u201911. Association for computing machinery, New York, NY, USA (2011). https:\/\/doi.org\/10.1145\/1987816.1987832","DOI":"10.1145\/1987816.1987832"},{"issue":"1","key":"4355_CR36","doi-asserted-by":"publisher","first-page":"378","DOI":"10.1145\/1047659.1040336","volume":"40","author":"J Manson","year":"2005","unstructured":"Manson J, Pugh W, Adve SV (2005) The Java memory model. SIGPLAN Not 40(1):378\u2013391. https:\/\/doi.org\/10.1145\/1047659.1040336","journal-title":"SIGPLAN Not"},{"key":"4355_CR37","unstructured":"Mattson T, Cledat R (eds) The Open Community Runtime interface, version 1.2 (2016), OCR working group. https:\/\/www.univie.ac.at\/ocr-vx\/doc\/ocr-v1.2.0.pdf"},{"key":"4355_CR38","doi-asserted-by":"crossref","unstructured":"Mattson TG et\u00a0al.: The Open Community Runtime: a runtime system for extreme scale computing. In: 2016 IEEE High Performance Extreme Computing Conference (HPEC), pp 1\u20137 (2016).","DOI":"10.1109\/HPEC.2016.7761580"},{"key":"4355_CR39","first-page":"45","volume":"19","author":"RC Murphy","year":"2010","unstructured":"Murphy RC, Wheeler KB, Barrett BW, Ang JA (2010) Introducing the graph 500. 
Cray Users Group (CUG) 19:45\u201374","journal-title":"Cray Users Group (CUG)"},{"key":"4355_CR40","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevC.64.024612","author":"M Papa","year":"2001","unstructured":"Papa M, Maruyama T, Bonasera A (2001) Constrained molecular dynamics approach to fermionic systems. Phys Rev C. https:\/\/doi.org\/10.1103\/PhysRevC.64.024612","journal-title":"Phys Rev C"},{"key":"4355_CR41","unstructured":"Paul SR, Sarkar V, Seshasayee B, Cledat R, Cave V (2016) Enabling a data-centric model on the Open Community Runtime. In: Poster at SC16. Salt Lake City, Utah. http:\/\/sc16.supercomputing.org\/sc-archive\/src_poster\/poster_files\/spost143s2-file2.pdf"},{"key":"4355_CR42","doi-asserted-by":"crossref","unstructured":"Rocklin, M.: Dask: Parallel computation with blocked algorithms and task scheduling. In: K.\u00a0Huff, J.\u00a0Bergstra (eds.) Proceedings of the 14th Python in Science Conference, pp 130\u2013136 (2015)","DOI":"10.25080\/Majora-7b98e3ed-013"},{"key":"4355_CR43","doi-asserted-by":"publisher","unstructured":"Tardieu O et\u00a0al.: X10 and APGAS at petascale. In: Proc. of the 19th ACM SIGPLAN symposium on principles and practice of parallel programming, pp 53\u201366. ACM, New York, USA (2014). https:\/\/doi.org\/10.1145\/2555243.2555245","DOI":"10.1145\/2555243.2555245"},{"key":"4355_CR44","doi-asserted-by":"publisher","first-page":"1069","DOI":"10.5194\/gmd-10-1069-2017","volume":"10","author":"P Ullrich","year":"2017","unstructured":"Ullrich P, Zarzycki C (2017) Tempestextremes: a framework for scale-insensitive pointwise feature tracking on unstructured grids. Geosci Model Dev 10:1069\u20131090. https:\/\/doi.org\/10.5194\/gmd-10-1069-2017","journal-title":"Geosci Model Dev"},{"key":"4355_CR45","unstructured":"Vaughan CT, Barrett RF (2014) miniAMR, Sandia National Laboratories (SNL). 
https:\/\/www.osti.gov\/biblio\/1253324"},{"key":"4355_CR46","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1007\/978-3-319-96983-1_5","volume-title":"Euro-Par 2018: Parallel Process","author":"L Yu","year":"2018","unstructured":"Yu L, Sarkar V (2018) GT-Race: graph traversal based data race detection for asynchronous many-task parallelism. In: Aldinucci M, Padovani L, Torquati M (eds) Euro-Par 2018: Parallel Process. Springer International Publishing, Cham, pp 59\u201373"},{"key":"4355_CR47","doi-asserted-by":"publisher","unstructured":"Zheng Y, Kamil A, Driscoll MB, Shan H, Yelick K (2014) UPC++: A PGAS extension for C++. In: 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp 1105\u20131114. https:\/\/doi.org\/10.1109\/IPDPS.2014.115","DOI":"10.1109\/IPDPS.2014.115"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-022-04355-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-022-04355-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-022-04355-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,10]],"date-time":"2022-06-10T00:10:47Z","timestamp":1654819847000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-022-04355-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,2]]},"references-count":47,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["4355"],"URL":"https:\/\/doi.org\/10.1007\/s11227-022-04355-0","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"type":"print","value":"09
20-8542"},{"type":"electronic","value":"1573-0484"}],"subject":[],"published":{"date-parts":[[2022,3,2]]},"assertion":[{"value":"3 February 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 March 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}