{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:26:06Z","timestamp":1750220766797,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,6,16]],"date-time":"2020-06-16T00:00:00Z","timestamp":1592265600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Samsung Research Funding & Incubation Center of Samsung Electronics","award":[", SRFC-IT1901-15"],"award-info":[{"award-number":[", SRFC-IT1901-15"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,6,16]]},"DOI":"10.1145\/3372799.3394366","type":"proceedings-article","created":{"date-parts":[[2020,5,29]],"date-time":"2020-05-29T15:04:12Z","timestamp":1590764652000},"page":"136-140","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Towards Real-time CNN Inference from a Video Stream on a Mobile GPU (WiP Paper)"],"prefix":"10.1145","author":[{"given":"Chanyoung","family":"Oh","sequence":"first","affiliation":[{"name":"University of Seoul, Seoul, Republic of Korea"}]},{"given":"Gunju","family":"Park","sequence":"additional","affiliation":[{"name":"University of Seoul, Seoul, Republic of Korea"}]},{"given":"Sumin","family":"Kim","sequence":"additional","affiliation":[{"name":"University of Seoul, Seoul, Republic of Korea"}]},{"given":"Dohee","family":"Kim","sequence":"additional","affiliation":[{"name":"University of Seoul, Seoul, Republic of Korea"}]},{"given":"Youngmin","family":"Yi","sequence":"additional","affiliation":[{"name":"University of Seoul, Seoul, Republic of 
Korea"}]}],"member":"320","published-online":{"date-parts":[[2020,6,16]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"2016. ARM Compute Library. https:\/\/github.com\/ARM-software\/ComputeLibrary  2016. ARM Compute Library. https:\/\/github.com\/ARM-software\/ComputeLibrary"},{"key":"e_1_3_2_2_2_1","unstructured":"2017. Caffe2. https:\/\/caffe2.ai\/  2017. Caffe2. https:\/\/caffe2.ai\/"},{"key":"e_1_3_2_2_3_1","unstructured":"2017. Key Information About the Huawei Kirin 970. http:\/\/www.hisilicon.com\/en\/Media-Center\/News\/Key-Information-About-the-Huawei-Kirin970  2017. Key Information About the Huawei Kirin 970. http:\/\/www.hisilicon.com\/en\/Media-Center\/News\/Key-Information-About-the-Huawei-Kirin970"},{"key":"e_1_3_2_2_4_1","unstructured":"2017. Snapdragon Neural Processing Engine. https:\/\/developer.qualcomm.com\/docs\/snpe\/overview.html  2017. Snapdragon Neural Processing Engine. https:\/\/developer.qualcomm.com\/docs\/snpe\/overview.html"},{"key":"e_1_3_2_2_5_1","unstructured":"2017. TensorFlow Lite. https:\/\/www.tensorflow.org\/lite  2017. TensorFlow Lite. https:\/\/www.tensorflow.org\/lite"},{"key":"e_1_3_2_2_6_1","unstructured":"2018. ARM Mali-G76. https:\/\/www.arm.com\/products\/silicon-ip-multimedia\/gpu\/mali-g76  2018. ARM Mali-G76. https:\/\/www.arm.com\/products\/silicon-ip-multimedia\/gpu\/mali-g76"},{"key":"e_1_3_2_2_7_1","unstructured":"2019. Exynos 9820. https:\/\/www.samsung.com\/semiconductor\/minisite\/exynos\/products\/mobileprocessor\/exynos-9-series-9820\/  2019. Exynos 9820. https:\/\/www.samsung.com\/semiconductor\/minisite\/exynos\/products\/mobileprocessor\/exynos-9-series-9820\/"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001177"},{"key":"e_1_3_2_2_9_1","volume-title":"cudnn: Efficient primitives for deep learning. 
arXiv preprint arXiv:1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur , Cliff Woolley , Philippe Vandermersch , Jonathan Cohen , John Tran , Bryan Catanzaro , and Evan Shelhamer . 2014. cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 ( 2014 ). Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"crossref","unstructured":"Jiankang Deng Jia Guo Zhou Yuxiang Jinke Yu Irene Kotsia and Stefanos Zafeiriou. 2019. RetinaFace: Single-stage Dense Face Localisation in the Wild. In arxiv.  Jiankang Deng Jia Guo Zhou Yuxiang Jinke Yu Irene Kotsia and Stefanos Zafeiriou. 2019. RetinaFace: Single-stage Dense Face Localisation in the Wild. In arxiv.","DOI":"10.1109\/CVPR42600.2020.00525"},{"volume-title":"A study of persistent threads style GPU programming for GPGPU workloads","author":"Gupta Kshitij","key":"e_1_3_2_2_11_1","unstructured":"Kshitij Gupta , Jeff A Stuart , and John D Owens . 2012. A study of persistent threads style GPU programming for GPGPU workloads . IEEE. Kshitij Gupta, Jeff A Stuart, and John D Owens. 2012. A study of persistent threads style GPU programming for GPGPU workloads. IEEE."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_14_1","volume-title":"Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861","author":"Howard Andrew G","year":"2017","unstructured":"Andrew G Howard , Menglong Zhu , Bo Chen , Dmitry Kalenichenko , Weijun Wang , Tobias Weyand , Marco Andreetto , and Hartwig Adam . 2017 . Mobilenets: Efficient convolutional neural networks for mobile vision applications. 
arXiv preprint arXiv:1704.04861 (2017). Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3081333.3081360"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2977496"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001178"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303950"},{"key":"e_1_3_2_2_19_1","volume-title":"On information and sufficiency. The annals of mathematical statistics","author":"Kullback Solomon","year":"1951","unstructured":"Solomon Kullback and Richard A Leibler . 1951. On information and sufficiency. The annals of mathematical statistics , Vol. 22 , 1 ( 1951 ), 79--86. Solomon Kullback and Richard A Leibler. 1951. On information and sufficiency. The annals of mathematical statistics, Vol. 22, 1 (1951), 79--86."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPSN.2016.7460664"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"key":"e_1_3_2_2_22_1","volume-title":"Thirty-second AAAI conference on artificial intelligence.","author":"Li Dawei","year":"2018","unstructured":"Dawei Li , Xiaolong Wang , and Deguang Kong . 2018 . Deeprebirth: Accelerating deep neural network execution on mobile devices . In Thirty-second AAAI conference on artificial intelligence. Dawei Li, Xiaolong Wang, and Deguang Kong. 2018. Deeprebirth: Accelerating deep neural network execution on mobile devices. 
In Thirty-second AAAI conference on artificial intelligence."},{"key":"e_1_3_2_2_23_1","volume-title":"et al.","author":"Loy Chen Change","year":"2019","unstructured":"Chen Change Loy , Dahua Lin , Wanli Ouyang , Yuanjun Xiong , Shuo Yang , Qingqiu Huang , Dongzhan Zhou , Wei Xia , Quanquan Li , Ping Luo , et al. 2019 . WIDER face and pedestrian challenge 2018: Methods and results. arXiv preprint arXiv:1902.06854 (2019). Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, et al. 2019. WIDER face and pedestrian challenge 2018: Methods and results. arXiv preprint arXiv:1902.06854 (2019)."},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_3_2_2_25_1","volume-title":"Apple's 'neural engine' infuses the iphone with ai smarts. wired.com","author":"Simonite Tom","year":"2017","unstructured":"Tom Simonite . 2017. Apple's 'neural engine' infuses the iphone with ai smarts. wired.com ( 2017 ). Tom Simonite. 2017. Apple's 'neural engine' infuses the iphone with ai smarts. wired.com (2017)."},{"key":"e_1_3_2_2_26_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983990.2984032"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661229.2661250"},{"key":"e_1_3_2_2_29_1","volume-title":"OpenCL: A parallel programming standard for heterogeneous computing systems. 
Computing in science & engineering","author":"Stone John E","year":"2010","unstructured":"John E Stone , David Gohara , and Guochun Shi . 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering , Vol. 12 , 3 ( 2010 ), 66--73. John E Stone, David Gohara, and Guochun Shi. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering, Vol. 12, 3 (2010), 66--73."},{"key":"e_1_3_2_2_30_1","volume-title":"High-throughput cnn inference on embedded arm big. little multi-core processors","author":"Wang Siqi","year":"2019","unstructured":"Siqi Wang , Gayathri Ananthanarayanan , Yifan Zeng , Neeraj Goel , Anuj Pathania , and Tulika Mitra . 2019. High-throughput cnn inference on embedded arm big. little multi-core processors . IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ( 2019 ). Siqi Wang, Gayathri Ananthanarayanan, Yifan Zeng, Neeraj Goel, Anuj Pathania, and Tulika Mitra. 2019. High-throughput cnn inference on embedded arm big. little multi-core processors. 
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2019)."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.521"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123978"}],"event":{"name":"LCTES '20: 21st ACM SIGPLAN\/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages","SIGBED ACM Special Interest Group on Embedded Systems"],"location":"London United Kingdom","acronym":"LCTES '20"},"container-title":["The 21st ACM SIGPLAN\/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372799.3394366","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3372799.3394366","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:09Z","timestamp":1750200069000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372799.3394366"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,16]]},"references-count":32,"alternative-id":["10.1145\/3372799.3394366","10.1145\/3372799"],"URL":"https:\/\/doi.org\/10.1145\/3372799.3394366","relation":{},"subject":[],"published":{"date-parts":[[2020,6,16]]},"assertion":[{"value":"2020-06-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}