{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T03:47:53Z","timestamp":1772164073572,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2015,1,24]],"date-time":"2015-01-24T00:00:00Z","timestamp":1422057600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2015,1,24]]},"DOI":"10.1145\/2688500.2688521","type":"proceedings-article","created":{"date-parts":[[2015,1,28]],"date-time":"2015-01-28T09:12:26Z","timestamp":1422436346000},"page":"173-182","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["On optimizing machine learning workloads via kernel fusion"],"prefix":"10.1145","author":[{"given":"Arash","family":"Ashari","sequence":"first","affiliation":[{"name":"Ohio State University, USA"}]},{"given":"Shirish","family":"Tatikonda","sequence":"additional","affiliation":[{"name":"IBM, USA"}]},{"given":"Matthias","family":"Boehm","sequence":"additional","affiliation":[{"name":"IBM, USA"}]},{"given":"Berthold","family":"Reinwald","sequence":"additional","affiliation":[{"name":"IBM, USA"}]},{"given":"Keith","family":"Campbell","sequence":"additional","affiliation":[{"name":"IBM, Canada"}]},{"given":"John","family":"Keenleyside","sequence":"additional","affiliation":[{"name":"IBM, Canada"}]},{"given":"P.","family":"Sadayappan","sequence":"additional","affiliation":[{"name":"Ohio State University, USA"}]}],"member":"320","published-online":{"date-parts":[[2015,1,24]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/180\/1\/012037"},{"key":"e_1_3_2_1_2_1","first-page":"5","author":"Baldi P.","year":"2014","unstructured":"P. Baldi , P. Sadowski , and D. Whiteson . Searching for Exotic Particles in High-Energy Physics with Deep Learning. Nature communications , 5 , 2014 . P. Baldi, P. Sadowski, and D. Whiteson. Searching for Exotic Particles in High-Energy Physics with Deep Learning. Nature communications, 5, 2014.","journal-title":"Nature communications"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-92bf1922-003"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732286.2732292"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487677"},{"key":"e_1_3_2_1_7_1","volume-title":"NIPS Workshop","author":"Canny J.","year":"2013","unstructured":"J. Canny and H. Zhao . BIDMach: Large-Scale Learning with Zero Memory Allocation. In BigLearning , NIPS Workshop , 2013 . J. Canny and H. Zhao. BIDMach: Large-Scale Learning with Zero Memory Allocation. In BigLearning, NIPS Workshop, 2013."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390170"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2007.19.5.1155"},{"key":"e_1_3_2_1_10_1","first-page":"1345","volume-title":"Proceedings of the 30th International Conference on Machine Learning","author":"Coates A.","year":"2013","unstructured":"A. Coates , B. Huval , T. Wang , D. Wu , B. Catanzaro , and N. Andrew . Deep Learning with COTS HPC Systems . In Proceedings of the 30th International Conference on Machine Learning , pages 1337\u2013 1345 , 2013 . A. Coates, B. Huval, T. Wang, D. Wu, B. Catanzaro, and N. Andrew. Deep Learning with COTS HPC Systems. In Proceedings of the 30th International Conference on Machine Learning, pages 1337\u20131345, 2013."},{"key":"e_1_3_2_1_11_1","volume-title":"NIPS Workshop","author":"Collobert R.","year":"2011","unstructured":"R. Collobert , K. Kavukcuoglu , and C. Farabet . Torch7: A Matlab-like Environment for Machine Learning. In BigLearning , NIPS Workshop , 2011 . R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like Environment for Machine Learning. In BigLearning, NIPS Workshop, 2011."},{"key":"e_1_3_2_1_12_1","unstructured":"cuBLAS. The NVIDIA CUDA Basic Linear Algebra Subroutines Library. URL https:\/\/developer.nvidia.com\/cublas.  cuBLAS. The NVIDIA CUDA Basic Linear Algebra Subroutines Library. URL https:\/\/developer.nvidia.com\/cublas."},{"key":"e_1_3_2_1_13_1","unstructured":"CUDA. A Parallel Computing Platform and Programming Model Invented by NVIDIA. URL http:\/\/www.nvidia.com\/object\/ cuda_home_new.html.  CUDA. A Parallel Computing Platform and Programming Model Invented by NVIDIA. URL http:\/\/www.nvidia.com\/object\/ cuda_home_new.html."},{"key":"e_1_3_2_1_14_1","unstructured":"cuDNN. The NVIDIA CUDA Library of Primitives for Deep Neural Networks. URL https:\/\/developer.nvidia.com\/cuDNN.  cuDNN. The NVIDIA CUDA Library of Primitives for Deep Neural Networks. URL https:\/\/developer.nvidia.com\/cuDNN."},{"key":"e_1_3_2_1_15_1","unstructured":"cuSPARSE. The NVIDIA CUDA Sparse Matrix Library. URL https:\/\/developer.nvidia.com\/cusparse.  cuSPARSE. The NVIDIA CUDA Sparse Matrix Library. URL https:\/\/developer.nvidia.com\/cusparse."},{"key":"e_1_3_2_1_16_1","first-page":"340","author":"Farivar R.","year":"2008","unstructured":"R. Farivar , D. Rebolledo , E. Chan , and R. H. Campbell . A Parallel Implementation of K-Means Clustering on GPUs. In PDPTA, pages 340 \u2013 345 , 2008 . R. Farivar, D. Rebolledo, E. Chan, and R. H. Campbell. A Parallel Implementation of K-Means Clustering on GPUs. In PDPTA, pages 340\u2013345, 2008.","journal-title":"A Parallel Implementation of K-Means Clustering on GPUs. In PDPTA, pages"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767930"},{"key":"e_1_3_2_1_18_1","unstructured":"HiPLAR. High Performance Linear Algebra in R. URL http: \/\/hiplar.org.  HiPLAR. High Performance Linear Algebra in R. URL http: \/\/hiplar.org."},{"key":"e_1_3_2_1_19_1","volume-title":"Large-Scale Linear Support Vector Regression. The Journal of Machine Learning Research, 13(1):3323\u20133348","author":"Ho C.-H.","year":"2012","unstructured":"C.-H. Ho and C.-J. Lin . Large-Scale Linear Support Vector Regression. The Journal of Machine Learning Research, 13(1):3323\u20133348 , 2012 . C.-H. Ho and C.-J. Lin. Large-Scale Linear Support Vector Regression. The Journal of Machine Learning Research, 13(1):3323\u20133348, 2012."},{"key":"e_1_3_2_1_20_1","unstructured":"Intel. Math Kernel Library. URL https:\/\/software.intel.com\/ en-us\/intel-mkl.  Intel. Math Kernel Library. URL https:\/\/software.intel.com\/ en-us\/intel-mkl."},{"key":"e_1_3_2_1_21_1","volume-title":"December","author":"Khronos OpenCL Working Group","year":"2008","unstructured":"Khronos OpenCL Working Group . The OpenCL Specification, version 1.0.29 , December 2008 . Khronos OpenCL Working Group. The OpenCL Specification, version 1.0.29, December 2008."},{"key":"e_1_3_2_1_22_1","volume-title":"Newnes","author":"Kirk D. B.","year":"2012","unstructured":"D. B. Kirk and W. H. Wen-mei . Programming Massively Parallel Processors: a Hands-on Approach . Newnes , 2012 . D. B. Kirk and W. H. Wen-mei. Programming Massively Parallel Processors: a Hands-on Approach. Newnes, 2012."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/324133.324140"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/1390681.1390703"},{"key":"e_1_3_2_1_25_1","first-page":"355","volume":"3","author":"Lopes N.","year":"2011","unstructured":"N. Lopes and B. Ribeiro . GPUMLib : An Efficient Open-Source GPU Machine Learning Library. International Journal of Computer Information Systems and Industrial Management Applications , 3 : 355 \u2013 362 , 2011 . N. Lopes and B. Ribeiro. GPUMLib: An Efficient Open-Source GPU Machine Learning Library. International Journal of Computer Information Systems and Industrial Management Applications, 3:355\u2013 362, 2011.","journal-title":"An Efficient Open-Source GPU Machine Learning Library. International Journal of Computer Information Systems and Industrial Management Applications"},{"key":"e_1_3_2_1_26_1","first-page":"232","volume-title":"Hybrid Intelligent Systems (HIS), 2010 10th International Conference on","author":"Lopes N.","unstructured":"N. Lopes , B. Ribeiro , and R. Quintas . GPUMLib: a New Library to Combine Machine Learning Algorithms with Graphics Processing Units . In Hybrid Intelligent Systems (HIS), 2010 10th International Conference on , pages 229\u2013 232 . IEEE, 2010. N. Lopes, B. Ribeiro, and R. Quintas. GPUMLib: a New Library to Combine Machine Learning Algorithms with Graphics Processing Units. In Hybrid Intelligent Systems (HIS), 2010 10th International Conference on, pages 229\u2013232. IEEE, 2010."},{"key":"e_1_3_2_1_27_1","unstructured":"MAGMA. Matrix Algebra on GPU and Multicore Architectures. URL http:\/\/icl.cs.utk.edu\/magma.  MAGMA. Matrix Algebra on GPU and Multicore Architectures. URL http:\/\/icl.cs.utk.edu\/magma."},{"key":"e_1_3_2_1_28_1","volume-title":"Generalized Linear Models. European Journal of Operational Research, 16(3):285\u2013292","author":"McCullagh P.","year":"1984","unstructured":"P. McCullagh . Generalized Linear Models. European Journal of Operational Research, 16(3):285\u2013292 , 1984 . P. McCullagh. Generalized Linear Models. European Journal of Operational Research, 16(3):285\u2013292, 1984."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"e_1_3_2_1_30_1","unstructured":"NVIDIA. CUDA GPU Occupancy Calculator. URL http:\/\/developer.download.nvidia.com\/compute\/ cuda\/CUDA_Occupancy_calculator.xls.  NVIDIA. CUDA GPU Occupancy Calculator. URL http:\/\/developer.download.nvidia.com\/compute\/ cuda\/CUDA_Occupancy_calculator.xls."},{"key":"e_1_3_2_1_31_1","unstructured":"NVVP. NVIDIA Visual Profiler. URL https:\/\/developer. nvidia.com\/nvidia-visual-profiler.  NVVP. NVIDIA Visual Profiler. URL https:\/\/developer. nvidia.com\/nvidia-visual-profiler."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553486"},{"key":"e_1_3_2_1_33_1","volume-title":"Springer","author":"Sharp T.","year":"2008","unstructured":"T. Sharp . Implementing Decision Trees and Forests on a GPU. In Computer Vision\u2013ECCV 2008, pages 595\u2013608 . Springer , 2008 . T. Sharp. Implementing Decision Trees and Forests on a GPU. In Computer Vision\u2013ECCV 2008, pages 595\u2013608. Springer, 2008."},{"key":"e_1_3_2_1_34_1","first-page":"2009","author":"Stamper J.","year":"2008","unstructured":"J. Stamper , A. Niculescu-Mizil , S. Ritter , G. Gordon , and K. Koedinger . Algebra I 2008 - 2009 . Challenge Data Set from KDD Cup 2010 Educational Data Mining Challenge, 2013. URL http: \/\/pslcdatashop.web.cmu.edu\/KDDCup\/downloads.jsp. J. Stamper, A. Niculescu-Mizil, S. Ritter, G. Gordon, and K. Koedinger. Algebra I 2008-2009. Challenge Data Set from KDD Cup 2010 Educational Data Mining Challenge, 2013. URL http: \/\/pslcdatashop.web.cmu.edu\/KDDCup\/downloads.jsp.","journal-title":"Algebra"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2593678"}],"event":{"name":"PPoPP '15: 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","location":"San Francisco CA USA","acronym":"PPoPP '15","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2688500.2688521","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2688500.2688521","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:55:45Z","timestamp":1750258545000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2688500.2688521"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,24]]},"references-count":35,"alternative-id":["10.1145\/2688500.2688521","10.1145\/2688500"],"URL":"https:\/\/doi.org\/10.1145\/2688500.2688521","relation":{"is-identical-to":[{"id-type":"doi","id":"10.1145\/2858788.2688521","asserted-by":"object"}]},"subject":[],"published":{"date-parts":[[2015,1,24]]},"assertion":[{"value":"2015-01-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}