{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T16:15:04Z","timestamp":1775837704470,"version":"3.50.1"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2022,6,6]],"date-time":"2022-06-06T00:00:00Z","timestamp":1654473600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Samsung Research Funding Incubation Center of Samsung Electronics","award":["SRFC-IT1801-04"],"award-info":[{"award-number":["SRFC-IT1801-04"]}]},{"name":"Samsung Research Funding Incubation Center of Samsung Electronics","award":["SRFC-IT1801-04, SRFC-IT1902-03"],"award-info":[{"award-number":["SRFC-IT1801-04, SRFC-IT1902-03"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>\n            Recent advances in deep learning have made it possible to implement artificial intelligence in mobile devices. Many studies have put a lot of effort into developing lightweight deep learning models optimized for mobile devices. To overcome the performance limitations of manually designed deep learning models, an automated search algorithm, called\n            <jats:bold>neural architecture search<\/jats:bold>\n            (\n            <jats:bold>NAS<\/jats:bold>\n            ), has been proposed. However, studies on the effect of hardware architecture of the mobile device on the performance of NAS have been less explored. In this article, we show the importance of optimizing a hardware architecture, namely, NPU dataflow, when searching for a more accurate yet fast deep learning model. To do so, we first implement an optimization framework, named FlowOptimizer, for generating a best possible NPU dataflow for a given deep learning operator. Then, we utilize this framework during the latency-aware NAS to find the model with the highest accuracy satisfying the latency constraint. As a result, we show that the searched model with FlowOptimizer outperforms the performance by 87.1% and 92.3% on average compared to the searched model with NVDLA and Eyeriss, respectively, with better accuracy on a proxy dataset. We also show that the searched model can be transferred to a larger model to classify a more complex image dataset, i.e., ImageNet, achieving 0.2%\/5.4% higher Top-1\/Top-5 accuracy compared to MobileNetV2-1.0 with 3.6\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\( \\times \\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            lower latency.\n          <\/jats:p>","DOI":"10.1145\/3513085","type":"journal-article","created":{"date-parts":[[2022,2,24]],"date-time":"2022-02-24T17:13:41Z","timestamp":1645722821000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Implication of Optimizing NPU Dataflows on Neural Architecture Search for Mobile Devices"],"prefix":"10.1145","volume":"27","author":[{"given":"Jooyeon","family":"Lee","sequence":"first","affiliation":[{"name":"Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, Korea"}]},{"given":"Junsang","family":"Park","sequence":"additional","affiliation":[{"name":"Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, Korea"}]},{"given":"Seunghyun","family":"Lee","sequence":"additional","affiliation":[{"name":"Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6151-8602","authenticated-orcid":false,"given":"Jaeha","family":"Kung","sequence":"additional","affiliation":[{"name":"Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, Korea"}]}],"member":"320","published-online":{"date-parts":[[2022,6,6]]},"reference":[{"key":"e_1_3_2_2_2","article-title":"Once for all: Train one network and specialize it for efficient deployment","volume":"1908","author":"Cai Han","year":"2019","unstructured":"Han Cai, Chuang Gan, and Song Han. 2019. Once for all: Train one network and specialize it for efficient deployment. CoRR abs\/1908.09791. arxiv:1908.09791. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1908.09791.","journal-title":"CoRR"},{"key":"e_1_3_2_3_2","article-title":"Marvel: A data-centric compiler for DNN operators on spatial accelerators","volume":"2002","author":"Chatarasi Prasanth","year":"2020","unstructured":"Prasanth Chatarasi, Hyoukjun Kwon, Natesh Raina, Saurabh Malik, Vaisakh Haridas, Angshuman Parashar, Michael Pellauer, Tushar Krishna, and Vivek Sarkar. 2020. Marvel: A data-centric compiler for DNN operators on spatial accelerators. CoRR abs\/2002.07752. arxiv:2002.07752. Retrieved on May 25, 2021 from https:\/\/arxiv.org\/abs\/2002.07752.","journal-title":"CoRR"},{"key":"e_1_3_2_4_2"},{"key":"e_1_3_2_5_2"},{"key":"e_1_3_2_6_2","article-title":"Fair DARTS: Eliminating unfair advantages in differentiable architecture search","author":"Chu Xiangxiang","year":"2019","unstructured":"Xiangxiang Chu, Tianbao Zhou, Bo Zhang, and Jixiang Li. 2019. Fair DARTS: Eliminating unfair advantages in differentiable architecture search. arXiv:abs\/1911.12126. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1911.12126.","journal-title":"arXiv:abs\/1911.12126"},{"key":"e_1_3_2_7_2","volume-title":"Advances in Neural Information Processing Systems (NIPS\u201912)","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc\u2019Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. 2012. Large scale distributed deep networks. In Advances in Neural Information Processing Systems (NIPS\u201912)."},{"key":"e_1_3_2_8_2","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume":"1810","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs\/1810.04805. Retrieved from arxiv:1810.04805. http:\/\/arxiv.org\/abs\/1810.04805.","journal-title":"CoRR"},{"key":"e_1_3_2_9_2"},{"key":"e_1_3_2_10_2"},{"key":"e_1_3_2_11_2","article-title":"Single path one-shot neural architecture search with uniform sampling","volume":"1904","author":"Guo Zichao","year":"2019","unstructured":"Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, and Jian Sun. 2019. Single path one-shot neural architecture search with uniform sampling. CoRR abs\/1904.00420. arxiv:1904.00420. Retrieved from http:\/\/arxiv.org\/abs\/1904.00420.","journal-title":"CoRR"},{"key":"e_1_3_2_12_2"},{"key":"e_1_3_2_13_2","article-title":"Deep residual learning for image recognition","volume":"1512","author":"He Kaiming","year":"2015","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning for image recognition. CoRR abs\/1512.03385. arxiv:1512.03385. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1512.03385.","journal-title":"CoRR"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_3_2_15_2","article-title":"MobileNets: Efficient convolutional neural networks for mobile vision applications","volume":"1704","author":"Howard Andrew G.","year":"2017","unstructured":"Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR abs\/1704.04861. arxiv:1704.04861. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1704.04861.","journal-title":"CoRR"},{"key":"e_1_3_2_16_2","article-title":"sharpDARTS: Faster and more accurate differentiable architecture search","author":"Hundt Andrew","year":"2019","unstructured":"Andrew Hundt, Varun Jain, and Gregory D. Hager. 2019. sharpDARTS: Faster and more accurate differentiable architecture search. arXiv:abs\/1903.09900. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1903.09900.","journal-title":"arXiv:abs\/1903.09900"},{"key":"e_1_3_2_17_2"},{"key":"e_1_3_2_18_2"},{"key":"e_1_3_2_19_2"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3400302.3415639"},{"key":"e_1_3_2_21_2","unstructured":"Alex Krizhevsky Vinod Nair and Geoffrey Hinton. 2010. CIFAR-10 (Canadian Institute for Advanced Research). Retrieved March 3 2021 from https:\/\/www.cs.toronto.edu\/kriz\/cifar.html."},{"key":"e_1_3_2_22_2","volume-title":"Advances in Neural Information Processing Systems (NIPS\u201912)","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS\u201912)."},{"key":"e_1_3_2_23_2"},{"key":"e_1_3_2_24_2"},{"key":"e_1_3_2_25_2","article-title":"S3NAS: Fast NPU-aware neural architecture search methodology","volume":"2009","author":"Lee Jaeseong","year":"2020","unstructured":"Jaeseong Lee, Duseok Kang, and Soonhoi Ha. 2020. S3NAS: Fast NPU-aware neural architecture search methodology. CoRR abs\/2009.02009. arxiv:2009.02009. Retrieved on May 25, 2021 from https:\/\/arxiv.org\/abs\/2009.02009.","journal-title":"CoRR"},{"key":"e_1_3_2_26_2"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218749"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_2"},{"key":"e_1_3_2_29_2","article-title":"DARTS: Differentiable architecture search","volume":"1806","author":"Liu Hanxiao","year":"2018","unstructured":"Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2018. DARTS: Differentiable architecture search. CoRR abs\/1806.09055. arxiv:1806.09055. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1806.09055.","journal-title":"CoRR"},{"key":"e_1_3_2_30_2","article-title":"ShuffleNet V2: Practical guidelines for efficient CNN architecture design","volume":"1807","author":"Ma Ningning","year":"2018","unstructured":"Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. 2018. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. CoRR abs\/1807.11164. arxiv:1807.11164. Retrieved on May 25, 2021 from https:\/\/arxiv.org\/abs\/1807.11164.","journal-title":"CoRR"},{"key":"e_1_3_2_31_2","unstructured":"NVIDIA. 2017. Volta GPU Architecture. https:\/\/www.nvidia.com\/en-us\/data-center\/volta-gpu-archi."},{"key":"e_1_3_2_32_2","unstructured":"NVIDIA. 2019. NVDLA Index of Documentation. (2019). Retrieved on May 25 2021 from http:\/\/nvdla.org\/contents.html."},{"key":"e_1_3_2_33_2","unstructured":"NVIDIA. 2020. NVIDIA Ampere Architecture. (2020). Retrieved on May 25 2021 from https:\/\/www.nvidia.com\/en-us\/data-center\/ampere-architecture\/."},{"key":"e_1_3_2_34_2"},{"key":"e_1_3_2_35_2","article-title":"Efficient neural architecture search via parameter sharing","volume":"1802","author":"Pham Hieu","year":"2018","unstructured":"Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. 2018. Efficient neural architecture search via parameter sharing. CoRR abs\/1802.03268. arxiv:1802.03268. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1802.03268.","journal-title":"CoRR"},{"key":"e_1_3_2_36_2","article-title":"Improving language understanding by generative pre-training","author":"Radford Alec","year":"2018","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. OpenAI. https:\/\/openai.com\/blog\/language-unsupervised\/.","journal-title":"OpenAI"},{"key":"e_1_3_2_37_2"},{"key":"e_1_3_2_38_2","article-title":"YOLO9000: Better, faster, stronger","volume":"1612","author":"Redmon Joseph","year":"2016","unstructured":"Joseph Redmon and Ali Farhadi. 2016. YOLO9000: Better, faster, stronger. CoRR abs\/1612.08242. arxiv:1612.08242.http:\/\/arxiv.org\/abs\/1612.08242.","journal-title":"CoRR"},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1007\/978-3-540-48765-4_16","volume-title":"Multiple Approaches to Intelligent Systems","author":"Rocha Miguel","year":"1999","unstructured":"Miguel Rocha and Jos\u00e9 Neves. 1999. Preventing premature convergence to local optima in genetic algorithms via random offspring generation. In Multiple Approaches to Intelligent Systems. Springer, 127\u2013136."},{"key":"e_1_3_2_40_2","first-page":"234","volume-title":"Medical Image Computing and Computer-Assisted Intervention (MICCAI\u201915)","author":"Ronneberger Olaf","year":"2015","unstructured":"Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI\u201915). 234\u2013241."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3316781.3317784"},{"key":"e_1_3_2_42_2","article-title":"Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation","volume":"1801","author":"Sandler Mark","year":"2018","unstructured":"Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. CoRR abs\/1801.04381. arxiv:1801.04381. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1801.04381.","journal-title":"CoRR"},{"key":"e_1_3_2_43_2"},{"key":"e_1_3_2_44_2","article-title":"Very deep convolutional networks for large-scale image recognition","volume":"1409","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. CoRR abs\/1409.1556. arxiv:1409.1556. http:\/\/arxiv.org\/abs\/1409.1556.","journal-title":"CoRR"},{"key":"e_1_3_2_45_2","first-page":"3104","volume-title":"Proceedings of the International Conference on Neural Information Processing Systems (NIPS\u201914)","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS\u201914). 3104\u20133112."},{"key":"e_1_3_2_46_2","article-title":"MnasNet: Platform-aware neural architecture search for mobile","volume":"1807","author":"Tan Mingxing","year":"2018","unstructured":"Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, and Quoc V. Le. 2018. MnasNet: Platform-aware neural architecture search for mobile. CoRR abs\/1807.11626. arxiv:1807.11626. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1807.11626.","journal-title":"CoRR"},{"key":"e_1_3_2_47_2","article-title":"EfficientNet: Rethinking model scaling for convolutional neural networks","volume":"1905","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. CoRR abs\/1905.11946. arxiv:1905.11946. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1905.11946.","journal-title":"CoRR"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01099"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218676"},{"key":"e_1_3_2_50_2"},{"key":"e_1_3_2_51_2"},{"key":"e_1_3_2_52_2","article-title":"Multi-scale context aggregation by dilated convolutions","volume":"1511","author":"Yu Fisher","year":"2016","unstructured":"Fisher Yu and Vladlen Koltun. 2016. Multi-scale context aggregation by dilated convolutions. CoRR abs\/1511.07122. arxiv:1511.07122. Retrieved on May 25, 2021 from https:\/\/arxiv.org\/abs\/1511.07122v3.","journal-title":"CoRR"},{"key":"e_1_3_2_53_2"},{"key":"e_1_3_2_54_2","article-title":"Graph neural networks: A review of methods and applications","volume":"1812","author":"Zhou Jie","year":"2018","unstructured":"Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. 2018. Graph neural networks: A review of methods and applications. CoRR abs\/1812.08434. arxiv:1812.08434. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1812.08434.","journal-title":"CoRR"},{"key":"e_1_3_2_55_2","article-title":"Neural architecture search with reinforcement learning","volume":"1611","author":"Zoph Barret","year":"2016","unstructured":"Barret Zoph and Quoc V. Le. 2016. Neural architecture search with reinforcement learning. CoRR abs\/1611.01578. arxiv:1611.01578. Retrieved on May 25, 2021 from http:\/\/arxiv.org\/abs\/1611.01578.","journal-title":"CoRR"},{"key":"e_1_3_2_56_2"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3513085","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3513085","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:20Z","timestamp":1750188680000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3513085"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,6]]},"references-count":55,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3513085"],"URL":"https:\/\/doi.org\/10.1145\/3513085","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,6]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}