{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T01:23:18Z","timestamp":1780536198621,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,8,12]],"date-time":"2024-08-12T00:00:00Z","timestamp":1723420800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union's Horizon 2020 research and innovation program","award":["957197"],"award-info":[{"award-number":["957197"]}]},{"name":"Swedish Foundation for Strategic Research","award":["800928"],"award-info":[{"award-number":["800928"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,8,12]]},"DOI":"10.1145\/3677333.3678153","type":"proceedings-article","created":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T16:21:29Z","timestamp":1723220489000},"page":"58-67","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3955-2836","authenticated-orcid":false,"given":"Fareed","family":"Qararyah","sequence":"first","affiliation":[{"name":"Chalmers University of Technology, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0477-4540","authenticated-orcid":false,"given":"Muhammad Waqar","family":"Azhar","sequence":"additional","affiliation":[{"name":"Chalmers University of Technology, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9019-3605","authenticated-orcid":false,"given":"Mohammad Ali","family":"Maleki","sequence":"additional","affiliation":[{"name":"Chalmers 
University of Technology, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2776-9253","authenticated-orcid":false,"given":"Pedro","family":"Trancoso","sequence":"additional","affiliation":[{"name":"Chalmers University of Technology, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,8,12]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg\u00a0S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783725"},{"key":"e_1_3_2_1_3_1","volume-title":"NIPS 2011, BigLearning Workshop","author":"Bergstra James","year":"2011","unstructured":"James Bergstra, Fr\u00e9d\u00e9ric Bastien, Olivier Breuleux, Pascal Lamblin, Razvan Pascanu, Olivier Delalleau, Guillaume Desjardins, David Warde-Farley, Ian Goodfellow, Arnaud Bergeron, 2011. Theano: Deep learning on gpus with python. In NIPS 2011, BigLearning Workshop, Granada, Spain, Vol.\u00a03. Citeseer."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242897"},{"key":"e_1_3_2_1_5_1","volume-title":"Proxylessnas: Direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332","author":"Cai Han","year":"2018","unstructured":"Han Cai, Ligeng Zhu, and Song Han. 2018. Proxylessnas: Direct neural architecture search on target task and hardware. 
arXiv preprint arXiv:1812.00332 (2018)."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3461648.3463848"},{"key":"e_1_3_2_1_7_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, 2018. TVM: An automated End-to-End optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 578\u2013594."},{"key":"e_1_3_2_1_8_1","volume-title":"cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.195"},{"key":"e_1_3_2_1_10_1","volume-title":"Coatnet: Marrying convolution and attention for all data sizes. Advances in neural information processing systems 34","author":"Dai Zihang","year":"2021","unstructured":"Zihang Dai, Hanxiao Liu, Quoc\u00a0V Le, and Mingxing Tan. 2021. Coatnet: Marrying convolution and attention for all data sizes. Advances in neural information processing systems 34 (2021), 3965\u20133977."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3184407.3184423"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304014"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01186"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_15_1","volume-title":"Mobilenets: Efficient convolutional neural networks for mobile vision applications. 
arXiv preprint arXiv:1704.04861","author":"Howard G","year":"2017","unstructured":"Andrew\u00a0G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3579990.3580017"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2020.2973144"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_3_2_1_19_1","volume-title":"Transformers in vision: A survey. ACM computing surveys (CSUR) 54, 10s","author":"Khan Salman","year":"2022","unstructured":"Salman Khan, Muzammal Naseer, Munawar Hayat, Syed\u00a0Waqas Zamir, Fahad\u00a0Shahbaz Khan, and Mubarak Shah. 2022. Transformers in vision: A survey. ACM computing surveys (CSUR) 54, 10s (2022), 1\u201341."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"key":"e_1_3_2_1_22_1","volume-title":"Deep learning. nature 521, 7553","author":"LeCun Yann","year":"2015","unstructured":"Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. nature 521, 7553 (2015), 436\u2013444."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2010.5537907"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/3014904.3014977"},{"key":"e_1_3_2_1_25_1","volume-title":"2016 26th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 1\u20139.","author":"Li Huimin","year":"2016","unstructured":"Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, and Lingli Wang. 2016. A high performance FPGA-based accelerator for large-scale convolutional neural networks. 
In 2016 26th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 1\u20139."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3084813"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401132.1401152"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447818.3460378"},{"key":"e_1_3_2_1_29_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_2_1_30_1","volume-title":"FiBHA: Fixed Budget Hybrid CNN Accelerator. In 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 180\u2013190","author":"Qararyah Fareed","year":"2022","unstructured":"Fareed Qararyah, Muhammad\u00a0Waqar Azhar, and Pedro Trancoso. 2022. FiBHA: Fixed Budget Hybrid CNN Accelerator. In 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 180\u2013190."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639823"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-93417-4_38"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080221"},{"key":"e_1_3_2_1_35_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_36_1","volume-title":"Primer: Searching for efficient transformers for language modeling. arXiv preprint arXiv:2109.08668","author":"So R","year":"2021","unstructured":"David\u00a0R So, Wojciech Ma\u0144ke, Hanxiao Liu, Zihang Dai, Noam Shazeer, and Quoc\u00a0V Le. 2021. Primer: Searching for efficient transformers for language modeling. arXiv preprint arXiv:2109.08668 (2021)."},{"key":"e_1_3_2_1_37_1","volume-title":"International conference on machine learning. PMLR, 6105\u20136114","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning. PMLR, 6105\u20136114."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_3_2_1_39_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan\u00a0N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_40_1","volume-title":"2019 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 1\u20138.","author":"Venkatesan Rangharajan","year":"2019","unstructured":"Rangharajan Venkatesan, Yakun\u00a0Sophia Shao, Miaorong Wang, Jason Clemons, Steve Dai, Matthew Fojtik, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, 2019. Magnet: A modular accelerator generator for neural networks. In 2019 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). 
IEEE, 1\u20138."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3134930"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240856"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062244"},{"key":"e_1_3_2_1_45_1","volume-title":"Early convolutions help transformers see better. Advances in neural information processing systems 34","author":"Xiao Tete","year":"2021","unstructured":"Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Doll\u00e1r, and Ross Girshick. 2021. Early convolutions help transformers see better. Advances in neural information processing systems 34 (2021), 30392\u201330400."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2019.2930577"},{"key":"e_1_3_2_1_47_1","volume-title":"2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 598\u2013610","author":"Yang Yifan","year":"2023","unstructured":"Yifan Yang, Joel\u00a0S Emer, and Daniel Sanchez. 2023. ISOSceles: Accelerating Sparse CNNs through Inter-Layer Pipelining. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 598\u2013610."},{"key":"e_1_3_2_1_48_1","first-page":"12992","article-title":"Glance-and-gaze vision transformer","volume":"34","author":"Yu Qihang","year":"2021","unstructured":"Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan\u00a0L Yuille, and Wei Shen. 2021. Glance-and-gaze vision transformer. 
Advances in Neural Information Processing Systems 34 (2021), 12992\u201313003.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00062"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507767"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40649-019-0069-y"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00716"},{"key":"e_1_3_2_1_53_1","volume-title":"2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 1113\u20131126","author":"Zheng Size","year":"2023","unstructured":"Size Zheng, Siyuan Chen, Peidi Song, Renze Chen, Xiuhong Li, Shengen Yan, Dahua Lin, Jingwen Leng, and Yun Liang. 2023. Chimera: An analytical optimizing framework for effective compute-intensive operators fusion. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 1113\u20131126."},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2020.3012215"},{"key":"e_1_3_2_1_55_1","article-title":"Convolutional Neural Networks Inference Memory Optimization with Receptive Field-Based Input Tiling","volume":"12","author":"Zhuang Weihao","year":"2021","unstructured":"Weihao Zhuang, Tristan Hascoet, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki, 2021. Convolutional Neural Networks Inference Memory Optimization with Receptive Field-Based Input Tiling. 
APSIPA Transactions on Signal and Information Processing 12, 1 (2021).","journal-title":"APSIPA Transactions on Signal and Information Processing"}],"event":{"name":"ICPP Workshops '24: The 53rd International Conference on Parallel Processing Workshops","location":"Gotland Sweden","acronym":"ICPP Workshops '24"},"container-title":["The 53rd International Conference on Parallel Processing Workshops"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3677333.3678153","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:04:21Z","timestamp":1750291461000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3677333.3678153"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,12]]},"references-count":55,"alternative-id":["10.1145\/3677333.3678153","10.1145\/3677333"],"URL":"https:\/\/doi.org\/10.1145\/3677333.3678153","relation":{},"subject":[],"published":{"date-parts":[[2024,8,12]]},"assertion":[{"value":"2024-08-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}