{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:22:13Z","timestamp":1750306933535,"version":"3.41.0"},"reference-count":13,"publisher":"Association for Computing Machinery (ACM)","issue":"1s","license":[{"start":{"date-parts":[[2013,3,1]],"date-time":"2013-03-01T00:00:00Z","timestamp":1362096000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2013,3]]},"abstract":"<jats:p>\n            Efficient, scalable and productive parallel programming is a major challenge for exploiting the future multi-processor SoC platforms. This article presents the\n            <jats:italic>MultiFlex<\/jats:italic>\n            programming environment which has been developed to address this challenge. It is targeted for use on\n            <jats:italic>Platform 2012<\/jats:italic>\n            , a scalable multi-processor fabric. The MultiFlex environment supports high-level simulation, iterative platform mapping, and includes tools for programming model aware debug, trace, visualization and analysis.\n          <\/jats:p>\n          <jats:p>This article focuses on the two classes of programming abstractions supported in MultiFlex. The first is a set of Parallel Programming Patterns (PPP) which offer a rich set of programming abstractions for implementing efficient data- and task-level parallel applications. 
The second is a Reactive Task Management (RTM) abstraction, which offers a lightweight C-based API to support dynamic dispatching of small grain tasks on tightly coupled parallel processing resources.<\/jats:p>\n          <jats:p>The use of the MultiFlex native programming model is illustrated through the capture and mapping of two representative video applications. The first is a high-quality rescaling (HQR) application on a multi-processor platform. We present the details of the optimization process which was required for mapping the HQR application, for which the reference code requires 350 GIPS (giga instructions per second), onto a 16 processor cluster. Our results show that the parallel implementation using the PPP model offers almost linear acceleration with respect to the number of processing elements.<\/jats:p>\n          <jats:p>The second application is a high-definition VC-1 decoder. For this application, we illustrate two different parallel programming model variants, one using PPPs, the other based on RTM. 
These two versions are mapped onto two variants of a homogeneous version of the Platform 2012 multi-core fabric.<\/jats:p>","DOI":"10.1145\/2435227.2435243","type":"journal-article","created":{"date-parts":[[2013,3,19]],"date-time":"2013-03-19T13:34:23Z","timestamp":1363700063000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Parallel programming patterns for multi-processor SoC"],"prefix":"10.1145","volume":"12","author":[{"given":"Pierre G.","family":"Paulin","sequence":"first","affiliation":[{"name":"STMicroelectronics Inc., Ottawa, Canada"}]},{"given":"Ali Erdem","family":"\u00d6zcan","sequence":"additional","affiliation":[{"name":"STMicroelectronics Inc., Ottawa, Canada"}]},{"given":"Vincent","family":"Gagn\u00e9","sequence":"additional","affiliation":[{"name":"STMicroelectronics Inc., Ottawa, Canada"}]},{"given":"Bruno","family":"Lavigueur","sequence":"additional","affiliation":[{"name":"STMicroelectronics Inc., Ottawa, Canada"}]},{"given":"Olivier","family":"Benny","sequence":"additional","affiliation":[{"name":"STMicroelectronics Inc., Ottawa, Canada"}]}],"member":"320","published-online":{"date-parts":[[2013,3,21]]},"reference":[{"volume-title":"P2012: Building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator. In Proceedings of the Design, Automation, and Test Conference. 983--987","author":"Benini L.","key":"e_1_2_1_1_1","unstructured":"Benini, L., Flamand, E., Fuin, D., and Melpignano, D. 2012. P2012: Building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator. In Proceedings of the Design, Automation, and Test Conference. 
983--987."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.94"},{"key":"e_1_2_1_3_1","volume-title":"Design Patterns: Elements of Reusable Object-Oriented Software","author":"Gamma E","year":"1995","unstructured":"Gamma, E., Helm, R., Johnson, R., and Vlissides, J. M. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley."},{"key":"e_1_2_1_4_1","unstructured":"Intel. 2011a. CILK plus. http:\/\/software.intel.com\/en-us\/articles\/intel-cilk-plus\/."},{"key":"e_1_2_1_5_1","unstructured":"Intel. 2011b. Array building blocks. http:\/\/software.intel.com\/en-us\/articles\/intel-array-buildingblocks\/."},{"key":"e_1_2_1_6_1","unstructured":"Intel. 2011c. Threading building blocks. http:\/\/threadingbuildingblocks.org\/."},{"key":"e_1_2_1_7_1","unstructured":"Khronos. 2013. Khronos OpenCL. http:\/\/www.khronos.org\/opencl\/."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228568"},{"key":"e_1_2_1_9_1","unstructured":"Microsoft Corporation. 2006. VC-1 technical overview."},{"key":"e_1_2_1_10_1","unstructured":"OW2 Consortium. 2011. The MIND Project. http:\/\/mind.ow2.org."},{"key":"e_1_2_1_11_1","volume-title":"CRC Press","author":"Paulin P. G.","year":"2010","unstructured":"Paulin, P. G., Benny, B., Langevin, M., Bouchebaba, Y., Pilkington, C., Lavigueur, B., Lo, D., 
Gagne, V., and Metzger, M. 2010. MPSoC Platform Mapping Tools for Data-Dominated Applications. In Model-Based Design for Embedded Systems, G. Nicolescu and P. Mosterman, Eds., CRC Press."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2006.878259"},{"key":"e_1_2_1_13_1","volume-title":"CEA 2010","author":"Microelectronics","year":"2012","unstructured":"STMicroelectronics and CEA. 2010. Platform 2012: A many-core programmable accelerator for ultra-efficient embedded computing in nanometer technology. 
http:\/\/www.cmc.ca\/en\/WhatWeOffer\/Prototyping\/~\/media\/WhatWeOffer\/TechPub\/20101105_Whitepaper_Final.pdf"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2435227.2435243","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2435227.2435243","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:35:40Z","timestamp":1750235740000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2435227.2435243"}},"subtitle":["Application to video processing"],"short-title":[],"issued":{"date-parts":[[2013,3]]},"references-count":13,"journal-issue":{"issue":"1s","published-print":{"date-parts":[[2013,3]]}},"alternative-id":["10.1145\/2435227.2435243"],"URL":"https:\/\/doi.org\/10.1145\/2435227.2435243","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2013,3]]},"assertion":[{"value":"2011-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-03-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}