{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T16:29:40Z","timestamp":1775579380289,"version":"3.50.1"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,5,21]],"date-time":"2024-05-21T00:00:00Z","timestamp":1716249600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"crossref","award":["314645\/2020-9, 406377\/2018-9"],"award-info":[{"award-number":["314645\/2020-9, 406377\/2018-9"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004901","name":"FAPEMIG","doi-asserted-by":"crossref","award":["PPM-00333-18"],"award-info":[{"award-number":["PPM-00333-18"]}],"id":[{"id":"10.13039\/501100004901","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001807","name":"FAPESP","doi-asserted-by":"crossref","award":["2013\/08293-7"],"award-info":[{"award-number":["2013\/08293-7"]}],"id":[{"id":"10.13039\/501100001807","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100002322","name":"CAPES","doi-asserted-by":"crossref","award":["88887.668980\/2022-00"],"award-info":[{"award-number":["88887.668980\/2022-00"]}],"id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>Kernel scheduling is the problem of finding the most efficient implementation for a computational kernel. Identifying this implementation involves experimenting with the parameters of compiler optimizations, such as the size of tiling windows and unrolling factors. This article shows that it is possible to organize these parameters as points in a coordinate space. The function that maps these points to the running time of kernels, in general, will not determine a convex surface. However, this article provides empirical evidence that the origin of this surface (an unoptimized kernel) and its global optimum (the fastest kernel) reside on a convex region. We call this hypothesis the \u201cdroplet expectation.\u201d Consequently, a search method based on the Coordinate Descent algorithm tends to find the optimal kernel configuration quickly if the hypothesis holds. This approach\u2014called Droplet Search\u2014has been available in Apache TVM since April of 2023. Experimental results with six large deep learning models on various computing devices (ARM, Intel, AMD, and NVIDIA) indicate that Droplet Search is not only as effective as other AutoTVM search techniques but also 2 to 10 times faster. Moreover, models generated by Droplet Search are competitive with those produced by TVM\u2019s AutoScheduler (Ansor), despite the latter using 4 to 5 times more code transformations than AutoTVM.<\/jats:p>","DOI":"10.1145\/3650109","type":"journal-article","created":{"date-parts":[[2024,2,29]],"date-time":"2024-02-29T12:27:11Z","timestamp":1709209631000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["The Droplet Search Algorithm for Kernel Scheduling"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7882-0787","authenticated-orcid":false,"given":"Michael","family":"Canesche","sequence":"first","affiliation":[{"name":"UFMG, Belo Horizonte, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8737-0252","authenticated-orcid":false,"given":"Vanderson","family":"Ros\u00e1rio","sequence":"additional","affiliation":[{"name":"Cadence Design Systems Inc, San Jose, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1783-4231","authenticated-orcid":false,"given":"Edson","family":"Borin","sequence":"additional","affiliation":[{"name":"Unicamp, Campinas, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0375-1657","authenticated-orcid":false,"given":"Fernando","family":"Quint\u00e3o Pereira","sequence":"additional","affiliation":[{"name":"UFMG, Belo Horizonte, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,5,21]]},"reference":[{"key":"e_1_3_3_2_1","first-page":"265","volume-title":"OSDI","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. In OSDI. USENIX Association, New York, NY, 265\u2013283."},{"key":"e_1_3_3_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.37"},{"key":"e_1_3_3_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cola.2021.101062"},{"key":"e_1_3_3_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197978"},{"key":"e_1_3_3_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2928270"},{"key":"e_1_3_3_7_1","volume-title":"Convex Optimization Theory","author":"Bertsekas D.","year":"2009","unstructured":"D. Bertsekas. 2009. Convex Optimization Theory. Athena Scientific, Nashua, NH. https:\/\/books.google.com.br\/books?id=0H1iQwAACAAJ"},{"key":"e_1_3_3_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582061"},{"key":"e_1_3_3_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377555.3377894"},{"key":"e_1_3_3_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2007.32"},{"key":"e_1_3_3_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/11688839_12"},{"key":"e_1_3_3_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1167473.1167492"},{"key":"e_1_3_3_13_1","unstructured":"Tianqi Chen Mu Li Yutian Li Min Lin Naiyan Wang Minjie Wang Tianjun Xiao Bing Xu Chiyuan Zhang and Zheng Zhang. 2015. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv:1512.01274. Retrieved from http:\/\/arxiv.org\/abs\/1512.01274"},{"key":"e_1_3_3_14_1","first-page":"579","volume-title":"OSDI (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In OSDI (OSDI\u201918). USENIX Association, Berkeley, CA, 579\u2013594."},{"key":"e_1_3_3_15_1","first-page":"2244","volume-title":"ICML","author":"Cummins Chris","year":"2021","unstructured":"Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Michael F. P. O\u2019Boyle, and Hugh Leather. 2021a. ProGraML: A graph-based program representation for data flow analysis and compiler optimizations. In ICML, Vol. 139. PMLR, Baltimore, MD, 2244\u20132253."},{"key":"e_1_3_3_16_1","doi-asserted-by":"crossref","unstructured":"Chris Cummins Bram Wasti Jiadong Guo Brandon Cui Jason Ansel Sahir Gomez Somya Jain Jia Liu Olivier Teytaud Benoit Steiner Yuandong Tian and Hugh Leather. 2021b. CompilerGym: Robust performant compiler optimization environments for AI research. arXiv:2109.08267. Retrieved from https:\/\/arxiv.org\/abs\/2109.08267.","DOI":"10.1109\/CGO53902.2022.9741258"},{"key":"e_1_3_3_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO51591.2021.9370322"},{"key":"e_1_3_3_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478288"},{"key":"e_1_3_3_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/n19-1423"},{"key":"e_1_3_3_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3579990.3580006"},{"key":"e_1_3_3_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01407931"},{"key":"e_1_3_3_22_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2015. Deep residual learning for image recognition. arXiv:1512.03385. Retrieved from http:\/\/arxiv.org\/abs\/1512.03385"},{"key":"e_1_3_3_23_1","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861. Retrieved from http:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_3_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3563329"},{"key":"e_1_3_3_25_1","volume-title":"MLSys","author":"Kaufman Samuel J.","year":"2021","unstructured":"Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, and Mike Burrows. 2021. A learned performance model for tensor processing units. In MLSys, Alex Smola, Alex Dimakis, and Ion Stoica (Eds.). mlsys.org, Indio, CA, 15."},{"key":"e_1_3_3_26_1","first-page":"551","volume-title":"Optimization by Simulated Annealing","author":"Kirkpatrick S.","year":"1988","unstructured":"S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. 1988. Optimization by Simulated Annealing. MIT Press, Cambridge, MA, 551\u2013567."},{"key":"e_1_3_3_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/IVMEM53963.2021.00015"},{"key":"e_1_3_3_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/362835.362841"},{"key":"e_1_3_3_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3146257"},{"key":"e_1_3_3_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3497776.3517777"},{"key":"e_1_3_3_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/3008904.3009034"},{"key":"e_1_3_3_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3453483.3454103"},{"key":"e_1_3_3_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385989"},{"key":"e_1_3_3_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT52795.2021.00008"},{"key":"e_1_3_3_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5213293"},{"key":"e_1_3_3_36_1","unstructured":"Peter Richt\u00e1rik and Martin Tak\u00e1c. 2012. Parallel Coordinate Descent methods for big data optimization. arXiv:1212.0873. Retrieved from http:\/\/arxiv.org\/abs\/1212.0873"},{"key":"e_1_3_3_37_1","doi-asserted-by":"crossref","unstructured":"Abhronil Sengupta Yuting Ye Robert Wang Chiao Liu and Kaushik Roy. 2018. Going deeper in spiking neural networks: VGG and residual architectures. arXiv:1802.02627. Retrieved from http:\/\/arxiv.org\/abs\/1802.02627","DOI":"10.3389\/fnins.2019.00095"},{"key":"e_1_3_3_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3446804.3446849"},{"key":"e_1_3_3_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2013.6495004"},{"key":"e_1_3_3_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2968455.2968521"},{"key":"e_1_3_3_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3624309.3624311"},{"key":"e_1_3_3_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-13374-9_28"},{"key":"e_1_3_3_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3570641"},{"key":"e_1_3_3_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851163"},{"key":"e_1_3_3_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-4380-9_16"},{"key":"e_1_3_3_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/113445.113449"},{"key":"e_1_3_3_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-015-0892-3"},{"key":"e_1_3_3_48_1","volume-title":"Nonlinear Programming, A Unified Approach (1st ed.)","author":"Zangwill W.","year":"1969","unstructured":"W. Zangwill. 1969. Nonlinear Programming, A Unified Approach (1st ed.). Prentice Hall, Hoboken NJ."},{"key":"e_1_3_3_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3437801.3441609"},{"key":"e_1_3_3_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/83.869186"},{"key":"e_1_3_3_51_1","volume-title":"OSDI","author":"Zheng Lianmin","year":"2020","unstructured":"Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, and Ion Stoica. 2020. Ansor: Generating high-performance tensor programs for deep learning. In OSDI. USENIX Association, Berkeley, CA, Article 49, 17 pages."},{"key":"e_1_3_3_52_1","first-page":"233","volume-title":"OSDI","author":"Zhu Hongyu","year":"2022","unstructured":"Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, and Gennady Pekhimenko. 2022. ROLLER: Fast and efficient tensor compilation for deep learning. In OSDI, Marcos K. Aguilera and Hakim Weatherspoon (Eds.). USENIX Association, Berkeley, CA, 233\u2013248."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3650109","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3650109","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:44Z","timestamp":1750291424000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3650109"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,21]]},"references-count":51,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3650109"],"URL":"https:\/\/doi.org\/10.1145\/3650109","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,21]]},"assertion":[{"value":"2023-07-18","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-02-25","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}