{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:39:01Z","timestamp":1740123541134,"version":"3.37.3"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T00:00:00Z","timestamp":1648166400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T00:00:00Z","timestamp":1648166400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100011033","name":"Agencia Estatal de Investigaci\u00f3n","doi-asserted-by":"publisher","award":["RTI2018-098156-B-C53"],"award-info":[{"award-number":["RTI2018-098156-B-C53"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004687","name":"Universidad de Murcia","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004687","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper presents HDNN, a proof-of-concept MLIR dialect for cross-platform computing specialized in deep neural networks. As target devices, HDNN supports CPUs, GPUs and TPUs. In this paper, we provide a comprehensive description of the HDNN dialect, outlining how this novel approach aims to solve the <jats:inline-formula><jats:alternatives><jats:tex-math>$$P^3$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:msup>\n                    <mml:mi>P<\/mml:mi>\n                    <mml:mn>3<\/mml:mn>\n                  <\/mml:msup>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> problem of parallel programming (portability, productivity, and performance). An HDNN program is device-agnostic, i.e., only the device specifier has to be changed to run a given workload in one device or another. Moreover, HDNN has been designed to be a domain-specific language, which ultimately helps programming productivity. Finally, HDNN relies on optimized libraries for heavy, performance-critical workloads. HDNN has been evaluated against other state-of-the-art machine learning frameworks on all the hardware platforms achieving excellent performance. We conclude that the ideas and concepts used in HDNN can be crucial for designing future generation compilers and programming languages to overcome the challenges of the forthcoming heterogeneous computing era.\n<\/jats:p>","DOI":"10.1007\/s11227-022-04417-3","type":"journal-article","created":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T19:03:33Z","timestamp":1648235013000},"page":"13814-13830","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["HDNN: a cross-platform MLIR dialect for deep neural networks"],"prefix":"10.1007","volume":"78","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4391-2451","authenticated-orcid":false,"given":"Pablo Antonio","family":"Mart\u00ednez","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7265-3508","authenticated-orcid":false,"given":"Gregorio","family":"Bernab\u00e9","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6388-2835","authenticated-orcid":false,"given":"Jos\u00e9 Manuel","family":"Garc\u00eda","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,3,25]]},"reference":[{"doi-asserted-by":"publisher","unstructured":"Ben-Nun T, Hoefler T (2019) Demystifying parallel and distributed deep learning: an in-depth concurrency analysis. ACM Comput Surv. https:\/\/doi.org\/10.1145\/3320060","key":"4417_CR1","DOI":"10.1145\/3320060"},{"unstructured":"Intel Corporation (2022) Intel oneAPI Programming Guide. https:\/\/www.intel.com\/content\/www\/us\/en\/develop\/documentation\/oneapi-programming-guide\/top.html (online). Accessed 03 Feb 2022","key":"4417_CR2"},{"issue":"7","key":"4417_CR3","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1145\/3361682","volume":"63","author":"J Dally William","year":"2020","unstructured":"Dally William J, Yatish T, Song H (2020) Domain-specific hardware accelerators. Commun ACM 63(7):48\u201357. https:\/\/doi.org\/10.1145\/3361682","journal-title":"Commun ACM"},{"doi-asserted-by":"publisher","unstructured":"De Carvalho JP, Kuzma B, Korostelev I, Amaral JN, Barton C, Moreira J, Araujo G (2021) KernelFaRer: replacing native-code idioms with high-performance library calls. ACM Trans Arch Code Optim. https:\/\/doi.org\/10.1145\/3459010","key":"4417_CR4","DOI":"10.1145\/3459010"},{"doi-asserted-by":"crossref","unstructured":"Edwards HC, Trott CR (2013) Kokkos: enabling performance portability across manycore architectures. In: 2013 Extreme Scaling Workshop (xsw 2013), pp 18\u201324","key":"4417_CR5","DOI":"10.1109\/XSW.2013.7"},{"doi-asserted-by":"publisher","unstructured":"Esmaeilzadeh H, Blem E, Amant RS, Sankaralingam K, Burger D (2011) Dark silicon and the end of multicore scaling. In: ISCA \u201911. Association for Computing Machinery, New York, pp 365\u2013376. https:\/\/doi.org\/10.1145\/2000064.2000108","key":"4417_CR6","DOI":"10.1145\/2000064.2000108"},{"issue":"4","key":"4417_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3469030","volume":"18","author":"T Gysi","year":"2020","unstructured":"Gysi T, M\u00fcller C, Zinenko O, Herhut S, Davis E, Wicky T, Fuhrer O, Hoefler T, Grosser T (2020)  Domain-Specific Multi-Level IR Rewriting for GPU:  The Open Earth Compiler for GPU-accelerated Climate Simulation.  ACM Trans Architect Code Optimization 18(4):1-23 arXiv:2005.13014","journal-title":"ACM Trans Architect Code Optimization"},{"issue":"2","key":"4417_CR8","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1145\/3282307","volume":"62","author":"L Hennessy John","year":"2019","unstructured":"Hennessy John L, Patterson David A (2019) A new golden age for computer architecture. Commun ACM 62(2):48\u201360. https:\/\/doi.org\/10.1145\/3282307","journal-title":"Commun ACM"},{"doi-asserted-by":"publisher","unstructured":"Jouppi NP, Hyun Yoon D, Ashcraft M, et al (2021) Ten lessons from three generations shaped Google\u2019s TPUv4i : industrial product. In: 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE Computer Society, Los Alamitos, pp 1\u201314. https:\/\/doi.org\/10.1109\/ISCA52012.2021.00010","key":"4417_CR9","DOI":"10.1109\/ISCA52012.2021.00010"},{"issue":"7","key":"4417_CR10","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1145\/3360307","volume":"63","author":"P Jouppi Norman","year":"2020","unstructured":"Jouppi Norman P, Hyun YD, George K et al (2020) A domain-specific supercomputer for training deep neural networks. Commun ACM 63(7):67\u201378. https:\/\/doi.org\/10.1145\/3360307","journal-title":"Commun ACM"},{"doi-asserted-by":"publisher","unstructured":"Jouppi NP, Young C, Patil N, et al (2017) In-datacenter performance analysis of a tensor processing unit. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA \u201917, New York. Association for Computing Machinery, pp 1-12. https:\/\/doi.org\/10.1145\/3079856.3080246","key":"4417_CR11","DOI":"10.1145\/3079856.3080246"},{"doi-asserted-by":"publisher","unstructured":"Komisarczyk K, Chelini L, Vadivel K, et al (2020) PET-to-MLIR: a polyhedral front-end for MLIR. In: 2020 23rd Euromicro Conference on Digital System Design (DSD), pp 551\u2013556. https:\/\/doi.org\/10.1109\/DSD51259.2020.00091","key":"4417_CR12","DOI":"10.1109\/DSD51259.2020.00091"},{"issue":"1","key":"4417_CR13","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1145\/3200691.3178493","volume":"53","author":"M Kotsifakou","year":"2018","unstructured":"Kotsifakou M, Srivastava P, Sinclair Matthew D et al (2018) HPVM: heterogeneous parallel virtual machine. SIGPLAN Not. 53(1):68\u201380. https:\/\/doi.org\/10.1145\/3200691.3178493","journal-title":"SIGPLAN Not."},{"unstructured":"Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO\u201904), Palo Alto","key":"4417_CR14"},{"unstructured":"Lattner C, Amini M, Bondhugula U, Cohen A, et al (2020) MLIR: a compiler infrastructure for the end of Moore\u2019s law. arXiv:2002.11054","key":"4417_CR15"},{"unstructured":"Leary C, Wang T (2017) XLA: TensorFlow, compiled. https:\/\/developers.googleblog.com\/2017\/03\/xla-tensorflow-compiled.html (online). Accessed 03 Feb 2022","key":"4417_CR16"},{"doi-asserted-by":"publisher","unstructured":"Lie S (2019) Wafer-scale deep learning. In: 2019 IEEE Hot Chips 31 Symposium (HCS), pp 1\u201331. https:\/\/doi.org\/10.1109\/HOTCHIPS.2019.8875628","key":"4417_CR17","DOI":"10.1109\/HOTCHIPS.2019.8875628"},{"doi-asserted-by":"publisher","unstructured":"Mart\u00ednez PA, Peccerillo B, Bartolini S, Garc\u00eda JM, Bernab\u00e9 G (2022) Applying Intel\u2019s oneAPI to a machine learning case study. Concurr Comput Pract Exp.  https:\/\/doi.org\/10.1002\/cpe.6917","key":"4417_CR18","DOI":"10.1002\/cpe.6917"},{"doi-asserted-by":"crossref","unstructured":"McCaskey A, Nguyen T (2021) A MLIR dialect for quantum assembly languages. arXiv:2101.11365","key":"4417_CR19","DOI":"10.1109\/QCE52317.2021.00043"},{"issue":"1","key":"4417_CR20","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1109\/TPDS.2018.2855182","volume":"30","author":"P Biagio","year":"2019","unstructured":"Biagio P, Sandro B (2019) PHAST\u2014a portable high-level modern C++ programming library for GPUs and multi-cores. IEEE Trans Parallel Distrib Syst 30(1):174\u2013189. https:\/\/doi.org\/10.1109\/TPDS.2018.2855182","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"5","key":"4417_CR21","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1109\/MCSE.2021.3097276","volume":"23","author":"Pennycook S John","year":"2021","unstructured":"John Pennycook S, Sewall Jason D, Jacobsen Douglas W (2021) Navigating performance, portability, and productivity. Comput Sci Eng 23(5):28\u201338. https:\/\/doi.org\/10.1109\/MCSE.2021.3097276","journal-title":"Comput Sci Eng"},{"unstructured":"Rotem N, Fix J, Abdulrasool S, et\u00a0al (2019) Glow: graph lowering compiler techniques for neural networks. arXiv:1805.00907","key":"4417_CR22"},{"issue":"3","key":"4417_CR23","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1109\/MCSE.2010.69","volume":"12","author":"E Stone John","year":"2010","unstructured":"Stone John E, David G, Guochun S (2010) OpenCL: a parallel programming standard for heterogeneous computing systems. Comput Sci Eng 12(3):66\u201373. https:\/\/doi.org\/10.1109\/MCSE.2010.69","journal-title":"Comput Sci Eng"},{"issue":"12","key":"4417_CR24","doi-asserted-by":"publisher","first-page":"2295","DOI":"10.1109\/JPROC.2017.2761740","volume":"105","author":"S Vivienne","year":"2017","unstructured":"Vivienne S, Yu-Hsin C, Tien-Ju Y, Emer Joel S (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE 105(12):2295\u20132329. https:\/\/doi.org\/10.1109\/JPROC.2017.2761740","journal-title":"Proc IEEE"},{"issue":"5","key":"4417_CR25","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1109\/MCSE.2021.3097167","volume":"23","author":"W Michael","year":"2021","unstructured":"Michael W (2021) Performant, portable, and productive parallel programming with standard languages. Comput Sci Eng 23(5):39\u201345. https:\/\/doi.org\/10.1109\/MCSE.2021.3097167","journal-title":"Comput Sci Eng"},{"unstructured":"Zhao T, Huang X, Cao Y (2017) DeepDSL: a compilation-based domain-specific language for deep learning. arXiv:1701.02284","key":"4417_CR26"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-022-04417-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-022-04417-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-022-04417-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,4]],"date-time":"2022-07-04T14:14:47Z","timestamp":1656944087000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-022-04417-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,25]]},"references-count":26,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["4417"],"URL":"https:\/\/doi.org\/10.1007\/s11227-022-04417-3","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"type":"print","value":"0920-8542"},{"type":"electronic","value":"1573-0484"}],"subject":[],"published":{"date-parts":[[2022,3,25]]},"assertion":[{"value":"28 February 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 March 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 April 2022","order":3,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":4,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"ORCID added for author Martinez and author Garcia","order":5,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}]}}