{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T18:53:14Z","timestamp":1770835994258,"version":"3.50.1"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2022,3,19]],"date-time":"2022-03-19T00:00:00Z","timestamp":1647648000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,19]],"date-time":"2022-03-19T00:00:00Z","timestamp":1647648000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100007601","name":"Horizon 2020","doi-asserted-by":"publisher","award":["754304"],"award-info":[{"award-number":["754304"]}],"id":[{"id":"10.13039\/501100007601","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008432","name":"Consejer\u00eda de Educaci\u00f3n y Empleo, Junta de Extremadura","doi-asserted-by":"publisher","award":["IB20040"],"award-info":[{"award-number":["IB20040"]}],"id":[{"id":"10.13039\/501100008432","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014440","name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","doi-asserted-by":"publisher","award":["PID2019-110315RB-I00"],"award-info":[{"award-number":["PID2019-110315RB-I00"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100015720","name":"Universidad de Extremadura","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100015720","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Nowadays, data processing applications based on neural networks cope with the growth in the 
amount of data to be processed and with the increase in both the depth and complexity of neural network architectures, and hence in the number of parameters to be learned. High-performance computing platforms provide fast computing resources, including multi-core processors and graphics processing units, to manage the computational burden of deep neural network applications. A common optimization technique is to distribute the workload among the processes deployed on the resources of the platform, an approach known as data parallelism. Each process, known as a replica, trains its own copy of the model on a disjoint data partition. Nevertheless, the heterogeneity of the computational resources composing the platform requires the workload to be distributed unevenly among the replicas, according to their computational capabilities, to optimize the overall execution performance. Since each replica processes a different amount of data, the gradients computed by the replicas should have a correspondingly different influence on the global parameter update. This work proposes a modification of the gradient computation method that takes into account the different speeds of the replicas, and hence the amount of data assigned to each. 
Experiments have been conducted on heterogeneous high-performance computing platforms for a wide range of models and datasets, showing an improvement in final accuracy with respect to current techniques, with comparable performance.<\/jats:p>","DOI":"10.1007\/s11227-022-04399-2","type":"journal-article","created":{"date-parts":[[2022,3,19]],"date-time":"2022-03-19T18:02:45Z","timestamp":1647712965000},"page":"13455-13469","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Heterogeneous gradient computing optimization for scalable deep neural networks"],"prefix":"10.1007","volume":"78","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1858-9920","authenticated-orcid":false,"given":"Sergio","family":"Moreno-\u00c1lvarez","sequence":"first","affiliation":[]},{"given":"Mercedes E.","family":"Paoletti","sequence":"additional","affiliation":[]},{"given":"Juan A.","family":"Rico-Gallego","sequence":"additional","affiliation":[]},{"given":"Juan M.","family":"Haut","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,19]]},"reference":[{"key":"4399_CR1","unstructured":"Alistarh D, Grubic D, Li J, Tomioka R, Vojnovic M (2017) QSGD: communication-efficient SGD via gradient quantization and encoding. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp 1709\u20131720"},{"key":"4399_CR2","unstructured":"Ben-Nun T, Hoefler T (2018) Demystifying parallel and distributed deep learning: an in-depth concurrency analysis. arXiv:1802.09941"},{"key":"4399_CR3","unstructured":"Byrd J, Lipton Z (2019) What is the effect of importance weighting in deep learning? In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th International Conference Machine Learning, P. 
Machine Learning Research, vol.\u00a097. PMLR, pp 872\u2013881"},{"key":"4399_CR4","unstructured":"Chang HS, Learned-Miller EG, McCallum A (2017) Active bias: training more accurate neural networks by emphasizing high variance samples. In: NIPS"},{"key":"4399_CR5","doi-asserted-by":"publisher","unstructured":"Chen C, Weng Q, Wang W, Li B, Li B (2020) Semi-dynamic load balancing. In: Proceedings of the 11th ACM symposium on cloud computing. https:\/\/doi.org\/10.1145\/3419111.3421299","DOI":"10.1145\/3419111.3421299"},{"key":"4399_CR6","doi-asserted-by":"publisher","first-page":"314","DOI":"10.1016\/j.ins.2014.01.015","volume":"275","author":"CLP Chen","year":"2014","unstructured":"Chen CLP, Zhang CY (2014) Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci 275:314\u2013347","journal-title":"Inf Sci"},{"key":"4399_CR7","unstructured":"Chen J, Monga R, Bengio S, Jozefowicz R (2016) Revisiting distributed synchronous sgd. In: ICLR Workshop Track"},{"key":"4399_CR8","doi-asserted-by":"crossref","unstructured":"Clarke D, Zhong Z, Rychkov V, Lastovetsky A (2013) Fupermod: a framework for optimal data partitioning for parallel scientific applications on dedicated heterogeneous hpc platforms. In: Parallel computing technologies. Springer, Berlin, pp 182\u2013196","DOI":"10.1007\/978-3-642-39958-9_16"},{"key":"4399_CR9","doi-asserted-by":"crossref","unstructured":"Gupta S, Zhang W, Wang F (2016) Model accuracy and runtime tradeoff in distributed deep learning: a systematic study. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp 171\u2013180","DOI":"10.1109\/ICDM.2016.0028"},{"key":"4399_CR10","doi-asserted-by":"crossref","unstructured":"Gupta S, Zhang W, Wang F (2017) Model accuracy and runtime tradeoff in distributed deep learning: a systematic study. 
In: IJCAI, pp 4854\u20134858","DOI":"10.24963\/ijcai.2017\/681"},{"key":"4399_CR11","doi-asserted-by":"crossref","unstructured":"Haut JM, Paoletti ME, Moreno-\u00c1lvarez S, Plaza J, Rico-Gallego JA, Plaza A (2021) Distributed deep learning for remote sensing data interpretation. In: Proceedings of the IEEE","DOI":"10.1109\/JPROC.2021.3063258"},{"key":"4399_CR12","unstructured":"Hemanth DJ, Estrela VV (2017) Deep learning for image processing applications, vol\u00a031. IOS Press"},{"issue":"5","key":"4399_CR13","doi-asserted-by":"publisher","first-page":"4340","DOI":"10.1109\/TGRS.2020.3016820","volume":"59","author":"D Hong","year":"2021","unstructured":"Hong D, Gao L, Yokoya N, Yao J, Chanussot J, Du Q, Zhang B (2021) More diverse means better: multimodal deep learning meets remote-sensing imagery classification. IEEE Trans Geosci Remote Sens 59(5):4340\u20134354","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"4399_CR14","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2021.3130716","author":"D Hong","year":"2021","unstructured":"Hong D, Han Z, Yao J, Gao L, Zhang B, Plaza A, Chanussot J (2021) Spectralformer: rethinking hyperspectral image classification with transformers. IEEE Trans Geosci Remote Sens. https:\/\/doi.org\/10.1109\/TGRS.2021.3130716","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"4399_CR15","doi-asserted-by":"crossref","unstructured":"Huang G, Liu Z, Weinberger KQ (2017) Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2261\u20132269","DOI":"10.1109\/CVPR.2017.243"},{"issue":"2","key":"4399_CR16","doi-asserted-by":"publisher","first-page":"179","DOI":"10.32010\/26166127.2018.1.2.179.184","volume":"1","author":"N Ismayilova","year":"2018","unstructured":"Ismayilova N, Ismayilov E (2018) Convergence of hpc and ai: two directions of connection. 
Azerbaijan J High Perform Comput 1(2):179\u2013184","journal-title":"Azerbaijan J High Perform Comput"},{"key":"4399_CR17","doi-asserted-by":"crossref","unstructured":"Jiang J, Cui B, Zhang C, Yu L (2017) Heterogeneity-aware distributed parameter servers. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD \u201917. ACM, New York, pp 463\u2013478","DOI":"10.1145\/3035918.3035933"},{"key":"4399_CR18","unstructured":"Krizhevsky A (2014) One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997"},{"issue":"2","key":"4399_CR19","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1016\/j.zemedi.2018.12.003","volume":"29","author":"A Maier","year":"2019","unstructured":"Maier A, Syben C, Lasser T, Riess C (2019) A gentle introduction to deep learning in medical image processing. Zeitschrift Medizinische Physik 29(2):86\u2013101","journal-title":"Zeitschrift Medizinische Physik"},{"key":"4399_CR20","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/j.neucom.2017.11.044","volume":"281","author":"Y Ming","year":"2018","unstructured":"Ming Y, Zhao Y, Wu C, Li K, Yin J (2018) Distributed and asynchronous stochastic gradient descent with variance reduction. Neurocomputing 281:27\u201336","journal-title":"Neurocomputing"},{"key":"4399_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.neucom.2021.01.125","volume":"441","author":"S Moreno-Alvarez","year":"2021","unstructured":"Moreno-Alvarez S, Haut JM, Paoletti ME, Rico-Gallego JA (2021) Heterogeneous model parallelism for deep neural networks. 
Neurocomputing 441:1\u201312","journal-title":"Neurocomputing"},{"issue":"12","key":"4399_CR22","doi-asserted-by":"publisher","first-page":"9739","DOI":"10.1007\/s11227-020-03200-6","volume":"76","author":"S Moreno-\u00c1lvarez","year":"2020","unstructured":"Moreno-\u00c1lvarez S, Haut JM, Paoletti ME, Rico-Gallego JA, Diaz-Martin JC, Plaza J (2020) Training deep neural networks: a static load balancing approach. J Supercomput 76(12):9739\u20139754","journal-title":"J Supercomput"},{"key":"4399_CR23","doi-asserted-by":"crossref","unstructured":"Nguyen TD, Park JH, Hossain MI, Hossain MD, Lee SJ, Jang JW, Jo SH, Huynh LN, Tran TK, Huh EN (2018) Performance analysis of data parallelism technique in machine learning for human activity recognition using lstm. In: IEEE International Conference on Cloud Computing Technology and Science, pp 387\u2013391 (2019)","DOI":"10.1109\/CloudCom.2019.00066"},{"issue":"2","key":"4399_CR24","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1109\/TNNLS.2020.2979670","volume":"32","author":"DW Otter","year":"2020","unstructured":"Otter DW, Medina JR, Kalita JK (2020) A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Netw Learn Syst 32(2):604\u2013624","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"4399_CR25","unstructured":"Sergeev A, Balso MD (2018) Horovod: fast and easy distributed deep learning in tensorflow. arXiv:1802.05799"},{"key":"4399_CR26","unstructured":"Shallue CJ, Lee J, Antognini J, Sohl-Dickstein J, Frostig R, Dahl GE (2018) Measuring the effects of data parallelism on neural network training. arXiv:1811.03600"},{"key":"4399_CR27","doi-asserted-by":"crossref","unstructured":"Suarez E, Eicker N, Lippert T (2019) Modular supercomputing architecture: from idea to production. 
In: Contemporary high performance computing","DOI":"10.1201\/9781351036863-9"},{"key":"4399_CR28","unstructured":"Suresh AT, Yu F, Kumar S, McMahan HB (2017) Distributed mean estimation with limited communication. arXiv:1611.00429"},{"key":"4399_CR29","doi-asserted-by":"crossref","unstructured":"Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6450\u20136458","DOI":"10.1109\/CVPR.2017.683"},{"key":"4399_CR30","unstructured":"Wen W, Xu C, Yan F, Wu C, Wang Y, Chen Y, Li H (2017) Terngrad: ternary gradients to reduce communication in distributed deep learning. In: 31st International Conference on Neural Information Processing Systems (NIPS 2017)"},{"issue":"9","key":"4399_CR31","doi-asserted-by":"publisher","first-page":"5408","DOI":"10.1109\/TGRS.2018.2815613","volume":"56","author":"X Yang","year":"2018","unstructured":"Yang X, Ye Y, Li X, Lau RY, Zhang X, Huang X (2018) Hyperspectral image classification with deep learning models. IEEE Trans Geosci Remote Sens 56(9):5408\u20135423","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"4399_CR32","doi-asserted-by":"crossref","unstructured":"Yoginath S, Alam M, Ramanathan A, Bhowmik D, Laanait N, Perumalla KS (2019) Towards native execution of deep learning on a leadership-class hpc system. In: 2019 IEEE international parallel and distributed processing symposium workshops (IPDPSW). 
IEEE, pp 941\u2013950 (2019)","DOI":"10.1109\/IPDPSW.2019.00160"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-022-04399-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-022-04399-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-022-04399-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,4]],"date-time":"2022-07-04T14:11:15Z","timestamp":1656943875000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-022-04399-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,19]]},"references-count":32,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["4399"],"URL":"https:\/\/doi.org\/10.1007\/s11227-022-04399-2","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,19]]},"assertion":[{"value":"22 February 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 March 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}