{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,8]],"date-time":"2026-07-08T22:30:10Z","timestamp":1783549810064,"version":"3.55.0"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2020,7,7]],"date-time":"2020-07-07T00:00:00Z","timestamp":1594080000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2020,7,7]],"date-time":"2020-07-07T00:00:00Z","timestamp":1594080000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"funder":[{"DOI":"10.13039\/100000161","name":"National Institute of Standards and Technology","doi-asserted-by":"crossref","award":["70NANB16H247"],"award-info":[{"award-number":["70NANB16H247"]}],"id":[{"id":"10.13039\/100000161","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Real-Time Image Proc"],"published-print":{"date-parts":[[2021,6]]},"DOI":"10.1007\/s11554-020-00994-9","type":"journal-article","created":{"date-parts":[[2020,7,7]],"date-time":"2020-07-07T12:02:57Z","timestamp":1594123377000},"page":"561-583","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["CGMBE: a model-based tool for the design and implementation of real-time image processing applications on CPU\u2013GPU platforms"],"prefix":"10.1007","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4786-2559","authenticated-orcid":false,"given":"Jiahao","family":"Wu","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jing","family":"Xie","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alexandre","family":"Bardakoff","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Timothy","family":"Blattner","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Walid","family":"Keyrouz","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shuvra S.","family":"Bhattacharyya","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2020,7,7]]},"reference":[{"key":"994_CR1","unstructured":"Advanced Micro Devices, Inc.: AMD EPYC 7002 series datasheet. https:\/\/www.amd.com\/system\/files\/documents\/AMD-EPYC-7002-Series-Datasheet.pdf (2019). Last access: 2019-09-06"},{"key":"994_CR2","doi-asserted-by":"publisher","first-page":"012037","DOI":"10.1088\/1742-6596\/180\/1\/012037","volume":"180","author":"E Agullo","year":"2009","unstructured":"Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P., Tomov, S.: Numerical linear algebra on emerging architectures: the PLASMA and MAGMA projects. J. Phys. Conf. Ser. 180, 012037 (2009). https:\/\/doi.org\/10.1088\/1742-6596\/180\/1\/012037","journal-title":"J. Phys. Conf. Ser."},{"key":"994_CR3","doi-asserted-by":"crossref","unstructured":"Anderson, E., Bai, Z., Dongarra, J., Greenbaum, A., McKenney, A., Du Croz, J., Hammarling, S., Demmel, J., Bischof, C., Sorensen, D.: LAPACK: A portable linear algebra library for high-performance computers. In: Proceedings of the 1990 ACM\/IEEE Conference on Supercomputing, Supercomputing \u201990, pp. 2\u201311. IEEE Computer Society Press, Los Alamitos, CA, USA (1990). http:\/\/dl.acm.org\/citation.cfm?id=110382.110385","DOI":"10.1109\/SUPERC.1990.129995"},{"issue":"2","key":"994_CR4","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1002\/cpe.1631","volume":"23","author":"C Augonnet","year":"2011","unstructured":"Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: Starpu: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exp. 23(2), 187\u2013198 (2011). https:\/\/doi.org\/10.1002\/cpe.1631","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"994_CR5","first-page":"56","volume":"8","author":"J Balart","year":"2004","unstructured":"Balart, J., Duran, A., Gonz\u00e0lez, M., Martorell, X., Ayguad\u00e9, E., Labarta, J.: Nanos mercurium: a research compiler for OpenMP. Proc Eur Workshop OpenMP 8, 56 (2004)","journal-title":"Proc Eur Workshop OpenMP"},{"key":"994_CR6","doi-asserted-by":"publisher","unstructured":"Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: Expressing locality and independence with logical regions. In: SC\u201912: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1\u201311 (2012). https:\/\/doi.org\/10.1109\/SC.2012.71","DOI":"10.1109\/SC.2012.71"},{"key":"994_CR7","volume-title":"Handbook of signal processing systems","year":"2019","unstructured":"Bhattacharyya, S.S., Deprettere, E., Leupers, R., Takala, J. (eds.): Handbook of signal processing systems, 3rd edn. Springer, Berlin (2019)","edition":"3"},{"issue":"3","key":"994_CR8","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1007\/s11265-017-1262-6","volume":"89","author":"T Blattner","year":"2017","unstructured":"Blattner, T., Keyrouz, W., Bhattacharyya, S.S., Halem, M., Brady, M.: A hybrid task graph scheduler for high performance image processing workflows. J. Signal Process. Syst. 89(3), 457\u2013467 (2017)","journal-title":"J. Signal Process. Syst."},{"key":"994_CR9","doi-asserted-by":"publisher","unstructured":"Blattner, T., Keyrouz, W., Chalfoun, J., Stivalet, B., Brady, M., Zhou, S.: A hybrid CPU-GPU system for stitching large scale optical microscopy images. In: Proceedings of the International Conference on Parallel Processing, pp. 1\u20139 (2014). https:\/\/doi.org\/10.1109\/ICPP.2014.9","DOI":"10.1109\/ICPP.2014.9"},{"issue":"6","key":"994_CR10","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1109\/MCSE.2013.98","volume":"15","author":"G Bosilca","year":"2013","unstructured":"Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Herault, T., Dongarra, J.J.: PaRSEC: exploiting heterogeneity to enhance scalability. Comput. Sci. Eng. 15(6), 36\u201345 (2013). https:\/\/doi.org\/10.1109\/MCSE.2013.98","journal-title":"Comput. Sci. Eng."},{"key":"994_CR11","doi-asserted-by":"publisher","unstructured":"Buck, J.T., Lee, E.A.: Scheduling dynamic dataflow graphs using the token flow model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol 3, pp. 429\u2013432 (1993). https:\/\/doi.org\/10.1109\/ICASSP.1993.319147","DOI":"10.1109\/ICASSP.1993.319147"},{"key":"994_CR12","first-page":"1","volume-title":"Applied parallel computing. state of the art in scientific computing","author":"A Buttari","year":"2007","unstructured":"Buttari, A., Dongarra, J., Kurzak, J., Langou, J., Luszczek, P., Tomov, S.: The impact of multicore on math software. In: K\u00e5gstr\u00f6m, B., Elmroth, E., Dongarra, J., Wa\u015bniewski, J. (eds.) Applied parallel computing. state of the art in scientific computing, pp. 1\u201310. Springer, Berlin, Heidelberg (2007)"},{"key":"994_CR13","doi-asserted-by":"publisher","unstructured":"Choi, J., Dongarra, J.J., Pozo, R., Walker, D.W.: Scalapack: a scalable linear algebra library for distributed memory concurrent computers. In: [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation, pp. 120\u2013127 (1992). https:\/\/doi.org\/10.1109\/FMPC.1992.234898","DOI":"10.1109\/FMPC.1992.234898"},{"issue":"3","key":"994_CR14","first-page":"219","volume":"13","author":"E Deelman","year":"2005","unstructured":"Deelman, E., Singh, G., Su, M.H., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., Berriman, G.B., Good, J., et al.: Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Sci. Programm 13(3), 219\u2013237 (2005)","journal-title":"Sci. Programm"},{"key":"994_CR15","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/j.future.2014.10.008","volume":"46","author":"E Deelman","year":"2015","unstructured":"Deelman, E., Vahi, K., Juve, G., Rynge, M., Callaghan, S., Maechling, P.J., Mayani, R., Chen, W., da Silva, R.F., Livny, M., Wenger, K.: Pegasus, a workflow management system for science automation. Fut. Gen. Comput. Syst. 46, 17\u201335 (2015). https:\/\/doi.org\/10.1016\/j.future.2014.10.008","journal-title":"Fut. Gen. Comput. Syst."},{"key":"994_CR16","unstructured":"Dias, J.M.B.: VCA algorithm (unmix hyperspectral data) (2019). http:\/\/www.lx.it.pt\/~bioucas\/code.htm. Last Access: 2019-09-06"},{"key":"994_CR17","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1007\/978-3-319-78024-5_18","volume-title":"Parallel processing and applied mathematics","author":"I Duff","year":"2018","unstructured":"Duff, I., Lopez, F.: Experiments with sparse cholesky using a parametrized task graph implementation. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds.) Parallel processing and applied mathematics, pp. 197\u2013206. Springer International Publishing, Cham (2018)"},{"issue":"02","key":"994_CR18","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1142\/S0129626411000151","volume":"21","author":"A Duran","year":"2011","unstructured":"Duran, A., Ayguad\u00e9, E., Badia, R.M., Labarta, J., Martinell, L., Martorell, X., Planas, J.: OmpSs: a proposal for programming heterogeneous multi-core architectures. Para. Process. Lett. 21(02), 173\u2013193 (2011). https:\/\/doi.org\/10.1142\/S0129626411000151","journal-title":"Para. Process. Lett."},{"key":"994_CR19","doi-asserted-by":"publisher","unstructured":"Eker, J., Janneck, J.W.: Dataflow programming in CAL\u2014balancing expressiveness, analyzability, and implementability. In: 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp. 1120\u20131124 (2012). https:\/\/doi.org\/10.1109\/ACSSC.2012.6489194","DOI":"10.1109\/ACSSC.2012.6489194"},{"key":"994_CR20","doi-asserted-by":"publisher","unstructured":"Gao, G.R., Govindarajan, R., Panangaden, P.: Well-behaved programs for DSP computation. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol 5, pp. 561\u2013564 (1992). https:\/\/doi.org\/10.1109\/ICASSP.1992.226558","DOI":"10.1109\/ICASSP.1992.226558"},{"key":"994_CR21","unstructured":"Google, Inc.: Protocol buffers. (2017). https:\/\/developers.google.com\/protocol-buffers. Accessed 8 Sept 2019"},{"key":"994_CR22","unstructured":"Guennebaud, G., Jacob, B., et al.: Eigen v3 (2010). http:\/\/eigen.tuxfamily.org. Accessed 8 Sept 2019"},{"key":"994_CR23","unstructured":"Horizon 2020 FET-HPC project: Parallel numerical linear algebra for extreme scale systems (2019). http:\/\/www.nlafet.eu, visited on July 31, 2019"},{"key":"994_CR24","doi-asserted-by":"publisher","unstructured":"Keinert, J., Haubelt, C., Teich, J.: Modeling and analysis of windowed synchronous algorithms. In: 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings vol 3, pp. III-III (2006). https:\/\/doi.org\/10.1109\/ICASSP.2006.1660798","DOI":"10.1109\/ICASSP.2006.1660798"},{"issue":"4","key":"994_CR25","first-page":"406","volume":"31","author":"Y Kwok","year":"1999","unstructured":"Kwok, Y., Ahmad, I.: Static scheduling algorithms for allocating directed task graphs to multiprocessors. J. Assoc. Comput. Mach. 31(4), 406\u2013471 (1999)","journal-title":"J. Assoc. Comput. Mach."},{"key":"994_CR26","volume-title":"Concurrent programming in Java: design principles and patterns","author":"D Lea","year":"1999","unstructured":"Lea, D.: Concurrent programming in Java: design principles and patterns, 2nd edn. Addison-Wesley, Boston (1999)","edition":"2"},{"issue":"9","key":"994_CR27","doi-asserted-by":"publisher","first-page":"1235","DOI":"10.1109\/PROC.1987.13876","volume":"75","author":"EA Lee","year":"1987","unstructured":"Lee, E.A., Messerschmitt, D.G.: Synchronous dataflow. Proc. IEEE 75(9), 1235\u20131245 (1987)","journal-title":"Proc. IEEE"},{"issue":"5","key":"994_CR28","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1109\/5.381846","volume":"83","author":"EA Lee","year":"1995","unstructured":"Lee, E.A., Parks, T.M.: Dataflow process networks. Proc IEEE 83(5), 773\u2013801 (1995)","journal-title":"Proc IEEE"},{"key":"994_CR29","first-page":"1","volume-title":"Handbook of hardware\/software codesign","author":"S Lin","year":"2017","unstructured":"Lin, S., Liu, Y., Lee, K., Li, L., Plishker, W., Bhattacharyya, S.S.: The DSPCAD framework for modeling and synthesis of signal processing systems. In: Ha, S., Teich, J. (eds.) Handbook of hardware\/software codesign, pp. 1\u201335. Springer, Berlin (2017)"},{"key":"994_CR30","doi-asserted-by":"publisher","unstructured":"Liu, Y., Barford, L., Bhattacharyya, S.S.: Generalized graph connections for dataflow modeling of DSP applications. In: 2018 IEEE International Workshop on Signal Processing Systems (SiPS), pp. 275\u2013280. Cape Town, South Africa (2018). https:\/\/doi.org\/10.1109\/SiPS.2018.8598305","DOI":"10.1109\/SiPS.2018.8598305"},{"issue":"4","key":"994_CR31","doi-asserted-by":"publisher","first-page":"898","DOI":"10.1109\/TGRS.2005.844293","volume":"43","author":"JMP Nascimento","year":"2005","unstructured":"Nascimento, J.M.P., Dias, J.M.B.: Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans. Geosci. Remote Sens. 43(4), 898\u2013910 (2005)","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"994_CR32","unstructured":"OpenBLAS: An optimized BLAS library. https:\/\/www.openblas.net\/. Last Access: 2019-09-09"},{"key":"994_CR33","doi-asserted-by":"publisher","unstructured":"Palumbo, F., Carta, N., Raffo, L.: The multi-dataflow composer tool: A runtime reconfigurable HDL platform composer. In: Proceedings of the 2011 Conference on Design Architectures for Signal Image Processing (DASIP), pp. 1\u20138 (2011). https:\/\/doi.org\/10.1109\/DASIP.2011.6136876","DOI":"10.1109\/DASIP.2011.6136876"},{"key":"994_CR34","doi-asserted-by":"publisher","unstructured":"Pelcat, M., Menuet, P., Aridhi, S., Nezan, J.F.: Scalable compile-time scheduler for multi-core architectures. In: Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 1552\u20131555 (2009). https:\/\/doi.org\/10.1109\/DATE.2009.5090909","DOI":"10.1109\/DATE.2009.5090909"},{"key":"994_CR35","doi-asserted-by":"publisher","unstructured":"Sahoo, D.R., Swaminathan, S., Al-Omari, R., Salapaka, M.V., Manimaran, G., Somani, A.K.: Feedback control for real-time scheduling. In: Proceedings of the 2002 American Control Conference (IEEE Cat. No.CH37301), vol. 2, pp. 1254\u20131259 (2002). https:\/\/doi.org\/10.1109\/ACC.2002.1023192","DOI":"10.1109\/ACC.2002.1023192"},{"key":"994_CR36","unstructured":"Sriram, S., Bhattacharyya, S.S.: Embedded Multiprocessors: Scheduling and Synchronization, 2nd edn. CRC Press (2009). ISBN 1420048015. http:\/\/www.ece.umd.edu\/DSPCAD\/papers\/srir2009x1-flyer.pdf"},{"key":"994_CR37","unstructured":"The HDF Group: High level introduction to HDF5. https:\/\/support.hdfgroup.org\/HDF5\/Tutor\/HDF5Intro.pdf (2016). Last Access: 2019-09-06"},{"issue":"3","key":"994_CR38","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1109\/71.993206","volume":"13","author":"H Topcuoglu","year":"2002","unstructured":"Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Para. Distrib. Syst. 13(3), 260\u2013274 (2002)","journal-title":"IEEE Trans. Para. Distrib. Syst."},{"key":"994_CR39","doi-asserted-by":"publisher","unstructured":"Wang, Q., Zhang, X., Zhang, Y., Yi, Q.: AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC\u201913, pp. 25:1\u201325:12. Association for Computing Machinery, New York, NY, USA (2013). https:\/\/doi.org\/10.1145\/2503210.2503219","DOI":"10.1145\/2503210.2503219"},{"key":"994_CR40","doi-asserted-by":"publisher","unstructured":"Wu, J., Blattner, T., Keyrouz, W., Bhattacharyya, S.S.: Model-based dynamic scheduling for multicore implementation of image processing systems. In: 2017 IEEE International Workshop on Signal Processing Systems (SiPS), pp. 1\u20136. Lorient, France (2017). https:\/\/doi.org\/10.1109\/SiPS.2017.8110003","DOI":"10.1109\/SiPS.2017.8110003"}],"container-title":["Journal of Real-Time Image Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11554-020-00994-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11554-020-00994-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11554-020-00994-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,6]],"date-time":"2021-07-06T23:45:01Z","timestamp":1625615101000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11554-020-00994-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,7]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,6]]}},"alternative-id":["994"],"URL":"https:\/\/doi.org\/10.1007\/s11554-020-00994-9","relation":{},"ISSN":["1861-8200","1861-8219"],"issn-type":[{"value":"1861-8200","type":"print"},{"value":"1861-8219","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,7]]},"assertion":[{"value":"13 September 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 June 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 July 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}