{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:26:10Z","timestamp":1750220770550,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":58,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,6,29]],"date-time":"2020-06-29T00:00:00Z","timestamp":1593388800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100011103","name":"European Commission","doi-asserted-by":"publisher","award":["779877"],"award-info":[{"award-number":["779877"]}],"id":[{"id":"10.13039\/100011103","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,6,29]]},"DOI":"10.1145\/3392717.3392762","type":"proceedings-article","created":{"date-parts":[[2020,6,29]],"date-time":"2020-06-29T18:49:02Z","timestamp":1593456542000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Wavefront parallelization of recurrent neural networks on multi-core architectures"],"prefix":"10.1145","author":[{"given":"Robin Kumar","family":"Sharma","sequence":"first","affiliation":[{"name":"Barcelona Supercomputing Center (BSC), Barcelona, Spain"}]},{"given":"Marc","family":"Casas","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center (BSC), Barcelona, Spain"}]}],"member":"320","published-online":{"date-parts":[[2020,6,29]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2016. Tiramisu Compiler. https:\/\/github.com\/Tiramisu-Compiler\/tiramisu. (2016).  2016. Tiramisu Compiler. https:\/\/github.com\/Tiramisu-Compiler\/tiramisu. (2016)."},{"key":"e_1_3_2_1_2_1","unstructured":"2017. Eigen: a C++ Linear Algebra Library. http:\/\/eigen.tuxfamily.org\/. (2017). 
 2017. Eigen: a C++ Linear Algebra Library. http:\/\/eigen.tuxfamily.org\/. (2017)."},{"key":"e_1_3_2_1_3_1","unstructured":"2017. KANN framework. https:\/\/github.com\/attractivechaos\/kann. (2017).  2017. KANN framework. https:\/\/github.com\/attractivechaos\/kann. (2017)."},{"key":"e_1_3_2_1_4_1","unstructured":"2017. Understanding LSTM Networks. https:\/\/colah.github.io\/posts\/2015-08-Understanding-LSTMs\/. (2017).  2017. Understanding LSTM Networks. https:\/\/colah.github.io\/posts\/2015-08-Understanding-LSTMs\/. (2017)."},{"key":"e_1_3_2_1_5_1","unstructured":"2018. CERN-Districuted Keras. https:\/\/github.com\/cerndb\/dist-keras. (2018).  2018. CERN-Districuted Keras. https:\/\/github.com\/cerndb\/dist-keras. (2018)."},{"key":"e_1_3_2_1_6_1","unstructured":"2018. Intel Keras performance improvement. https:\/\/intel.ly\/2N0xZrE. (2018).  2018. Intel Keras performance improvement. https:\/\/intel.ly\/2N0xZrE. (2018)."},{"key":"e_1_3_2_1_7_1","unstructured":"2018. Uber-horovod. https:\/\/github.com\/horovod\/horovod. (2018).  2018. Uber-horovod. https:\/\/github.com\/horovod\/horovod. (2018)."},{"key":"e_1_3_2_1_8_1","unstructured":"2018. Using the Intel optimized tensroflow. https:\/\/intel.ly\/2RPWklu. (2018).  2018. Using the Intel optimized tensroflow. https:\/\/intel.ly\/2RPWklu. (2018)."},{"key":"e_1_3_2_1_9_1","unstructured":"2018. Why use Keras? https:\/\/keras.io\/why-use-keras\/. (2018).  2018. Why use Keras? https:\/\/keras.io\/why-use-keras\/. (2018)."},{"key":"e_1_3_2_1_10_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat and Ian Goodfellow et. al. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http:\/\/tensorfow.org\/ Software available from tensorlow.org.  Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. 
Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat and Ian Goodfellow et. al. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http:\/\/tensorfow.org\/ Software available from tensorlow.org."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2002.1016487"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2002.1016487"},{"volume-title":"Optimizing Performance of Recurrent Neural Networks on GPUs. CoRR abs\/1604.01946","year":"2016","author":"Appleyard Jeremy","key":"e_1_3_2_1_13_1"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/45.329294"},{"volume-title":"LSTM Benchmarks for Deep Learning Frameworks. CoRR abs\/1806.01818","year":"2018","author":"Braun Stefan","key":"e_1_3_2_1_15_1"},{"volume-title":"MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. (12","year":"2015","author":"Chen Tianqi","key":"e_1_3_2_1_16_1"},{"key":"e_1_3_2_1_17_1","unstructured":"Kyunghyun Cho Bart van Merrienboer Dzmitry Bahdanau and Yoshua Bengio. 2014. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. (2014). arXiv:cs.CL\/1409.1259  Kyunghyun Cho Bart van Merrienboer Dzmitry Bahdanau and Yoshua Bengio. 2014. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. (2014). arXiv:cs.CL\/1409.1259"},{"key":"e_1_3_2_1_18_1","unstructured":"Fran\u00e7ois Chollet. 2015. keras. https:\/\/github.com\/keras-team\/keras. (2015).  Fran\u00e7ois Chollet. 2015. keras. https:\/\/github.com\/keras-team\/keras. (2015)."},{"key":"e_1_3_2_1_19_1","unstructured":"Junyoung Chung Caglar Gulcehre KyungHyun Cho and Y. Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. (12 2014).  Junyoung Chung Caglar Gulcehre KyungHyun Cho and Y. Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. 
(12 2014)."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/99.660313"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1980.1653418"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.3233\/978-1-61499-041-3-65"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626411000151"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.963769"},{"volume-title":"Deep learning","author":"Goodfellow Ian","key":"e_1_3_2_1_25_1"},{"volume-title":"Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks. (12","year":"2013","author":"Goodfellow Ian","key":"e_1_3_2_1_26_1"},{"volume-title":"Generating Sequences With Recurrent Neural Networks. (08","year":"2013","author":"Graves Alex","key":"e_1_3_2_1_27_1"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2017.05.020"},{"key":"e_1_3_2_1_31_1","unstructured":"M. Hermans and B. Schrauwen. 2013. Training and analyzing deep recurrent neural networks. Advances in Neural Information Processing Systems (01 2013).  M. Hermans and B. Schrauwen. 2013. Training and analyzing deep recurrent neural networks. Advances in Neural Information Processing Systems (01 2013)."},{"key":"e_1_3_2_1_32_1","unstructured":"M. Hermans and B. Schrauwen. 2013. Training and analyzing deep recurrent neural networks. Advances in Neural Information Processing Systems (01 2013).  M. Hermans and B. Schrauwen. 2013. Training and analyzing deep recurrent neural networks. Advances in Neural Information Processing Systems (01 2013)."},{"key":"e_1_3_2_1_33_1","unstructured":"Geoffrey Hinton. 2017. RmsProp optimizer. https:\/\/bit.ly\/36udFGJ. (2017).  Geoffrey Hinton. 2017. RmsProp optimizer. 
https:\/\/bit.ly\/36udFGJ. (2017)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_3_2_1_36_1","unstructured":"Lukasz Kaiser and Ilya Sutskever. 2016. Neural GPUs Learn Algorithms.  Lukasz Kaiser and Ilya Sutskever. 2016. Neural GPUs Learn Algorithms."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1137\/0114108"},{"key":"e_1_3_2_1_38_1","first-page":"1","article-title":"Exploring strategies for training deep neural networks","author":"Larochelle Hugo","year":"2009","journal-title":"Journal of machine learning research 10"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553453"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1477"},{"key":"e_1_3_2_1_41_1","unstructured":"R. G. Leonard and G. Doddington. 1993. Tidigits speech corpus. Texas Instruments Inc (1993).  R. G. Leonard and G. Doddington. 1993. Tidigits speech corpus. Texas Instruments Inc (1993)."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45591-4_35"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623612"},{"volume-title":"A Novel Approach to OnLine Handwriting Recognition Based on Bidirectional Long Short-Term Memory Networks. (10","year":"2019","author":"Liwicki Marcus","key":"e_1_3_2_1_44_1"},{"key":"e_1_3_2_1_45_1","unstructured":"Dan Neil Michael Pfeiffer and S-C Liu. 2016. Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences.  Dan Neil Michael Pfeiffer and S-C Liu. 2016. 
Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences."},{"volume-title":"Activation Functions: Comparison of trends in Practice and Research for Deep Learning.","year":"2018","author":"Nwankpa Chigozie","key":"e_1_3_2_1_46_1"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/390010.808263"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/7902.7904"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSoC.2019.00042"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TETCI.2017.2762739"},{"key":"e_1_3_2_1_51_1","unstructured":"Tony Robinson and F. Failside. 1987. Static and Dynamic Error Propagation Networks with Application to Speech Coding. 632--641.  Tony Robinson and F. Failside. 1987. Static and Dynamic Error Propagation Networks with Application to Speech Coding. 632--641."},{"volume-title":"Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (01","year":"2014","author":"Sak H.","key":"e_1_3_2_1_52_1"},{"volume-title":"Recent Advances in Recurrent Neural Networks. (12","year":"2017","author":"Salehinejad Hojjat","key":"e_1_3_2_1_53_1"},{"volume-title":"On Supervised Learning From Sequential Data With Applications For Speech Recognition. (04","year":"1999","author":"Schuster Michael","key":"e_1_3_2_1_54_1"},{"volume-title":"Proceedings of the 28th International Conference on Machine Learning (ICML-11)","year":"2011","author":"Sutskever Ilya","key":"e_1_3_2_1_55_1"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688514"},{"volume-title":"Cao","year":"2016","author":"Wu Yonghui","key":"e_1_3_2_1_57_1"},{"key":"e_1_3_2_1_58_1","unstructured":"Minjia Zhang Samyam Rajbhandari Wenhan Wang and Yuxiong He. 2018. DeepCPU: Serving RNN-based Deep Learning Models 10x Faster.  Minjia Zhang Samyam Rajbhandari Wenhan Wang and Yuxiong He. 2018. 
DeepCPU: Serving RNN-based Deep Learning Models 10x Faster."}],"event":{"name":"ICS '20: 2020 International Conference on Supercomputing","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"],"location":"Barcelona Spain","acronym":"ICS '20"},"container-title":["Proceedings of the 34th ACM International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3392717.3392762","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3392717.3392762","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:15Z","timestamp":1750200075000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3392717.3392762"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,29]]},"references-count":58,"alternative-id":["10.1145\/3392717.3392762","10.1145\/3392717"],"URL":"https:\/\/doi.org\/10.1145\/3392717.3392762","relation":{},"subject":[],"published":{"date-parts":[[2020,6,29]]},"assertion":[{"value":"2020-06-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}