{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T11:47:06Z","timestamp":1759837626375,"version":"3.41.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2021,9,24]],"date-time":"2021-09-24T00:00:00Z","timestamp":1632441600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2021,10,31]]},"abstract":"<jats:p>Large-scale optimization problems at the core of many graphics, vision, and imaging applications are often implemented by hand in tedious and error-prone processes in order to achieve high performance (in particular on GPUs), despite recent developments in libraries and DSLs. At the same time, these hand-crafted solver implementations reveal that the key for high performance is a problem-specific schedule that enables efficient usage of the underlying hardware. In this work, we incorporate this insight into Thallo, a domain-specific language for large-scale non-linear least squares optimization problems. We observe various code reorganizations performed by implementers of high-performance solvers in the literature, and then define a set of basic operations that span these scheduling choices, thereby defining a large scheduling space. Users can either specify code transformations in a scheduling language or use an autoscheduler. Thallo takes as input a compact, shader-like representation of an energy function and a (potentially auto-generated) schedule, translating the combination into high-performance GPU solvers. 
Since Thallo can generate solvers from a large scheduling space, it can handle a large set of large-scale non-linear and non-smooth problems with various degrees of non-locality and compute-to-memory ratios, including diverse applications such as bundle adjustment, face blendshape fitting, and spatially-varying Poisson deconvolution, as seen in Figure\u00a01. Abstracting schedules from the optimization, we outperform state-of-the-art GPU-based optimization DSLs by an average of 16\u00d7 across all applications introduced in this work, and even some published hand-written GPU solvers by 30%+.<\/jats:p>","DOI":"10.1145\/3453986","type":"journal-article","created":{"date-parts":[[2021,9,24]],"date-time":"2021-09-24T16:04:21Z","timestamp":1632499461000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Thallo \u2013 Scheduling for High-Performance Large-Scale Non-Linear Least-Squares Solvers"],"prefix":"10.1145","volume":"40","author":[{"given":"Michael","family":"Mara","sequence":"first","affiliation":[{"name":"Stanford University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Felix","family":"Heide","sequence":"additional","affiliation":[{"name":"Princeton University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Zollh\u00f6fer","sequence":"additional","affiliation":[{"name":"Stanford University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthias","family":"Nie\u00dfner","sequence":"additional","affiliation":[{"name":"Technical University of Munich"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pat","family":"Hanrahan","sequence":"additional","affiliation":[{"name":"Stanford University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,9,24]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul 
Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dan Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http:\/\/tensorflow.org\/.Software available from tensorflow.org."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322967"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2001269.2001293"},{"key":"e_1_2_2_4_1","unstructured":"Sameer Agarwal Keir Mierle and Others. 2010a. Ceres Solver. http:\/\/ceres-solver.org. 
(2010)."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1888028.1888032"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2892632"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/993483"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3054739"},{"volume-title":"SubdSH: Subdivision-based spherical harmonics field for real-time shading-based refinement under challenging unknown illumination. In 2018 IEEE Visual Communications and Image Processing (VCIP)","author":"Deng Teng","key":"e_1_2_2_10_1","unstructured":"Teng Deng, Jianmin Zheng, Jianfei Cai, and Tat-Jen Cham. 2018. SubdSH: Subdivision-based spherical harmonics field for real-time shading-based refinement under challenging unknown illumination. In 2018 IEEE Visual Communications and Image Processing (VCIP). IEEE, 1\u20134."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/311535.311576"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132188"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2018.00059"},{"key":"e_1_2_2_14_1","volume-title":"Proceedings of the 18th Central European Seminar on Computer Graphics.","author":"Dvoroznak Marek","year":"2014","unstructured":"Marek Dvoroznak. 2014. Interactive as-rigid-as-possible image deformation and registration. 
In Proceedings of the 18th Central European Seminar on Computer Graphics."},{"volume-title":"Recent Advances in Learning and Control","author":"Grant Michael","key":"e_1_2_2_15_1","unstructured":"Michael Grant and Stephen Boyd. 2008. Graph implementations for nonsmooth convex programs. In Recent Advances in Learning and Control, V. Blondel, S. Boyd, and H. Kimura (Eds.). Springer-Verlag Limited, 95\u2013110. http:\/\/stanford.edu\/ boyd\/graph_dcp.html."},{"key":"e_1_2_2_16_1","volume-title":"CVX: Matlab software for disciplined convex programming, version 2.1","author":"Grant Michael","year":"2014","unstructured":"Michael Grant and Stephen Boyd. 2014. CVX: Matlab software for disciplined convex programming, version 2.1. http:\/\/cvxr.com\/cvx. (March 2014)."},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2601097.2601174"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925892"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925875"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661229.2661260"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555815.1555775"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(81)90024-2"},{"key":"e_1_2_2_24_1","volume-title":"International Conference on Learning Representations.","author":"Hu Yuanming","year":"2020","unstructured":"Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Fr\u00e9do Durand. 2020. 
DiffTaichi: Differentiable programming for physical simulation. In International Conference on Learning Representations."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356506"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_22"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2047196.2047270"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/79.768574"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2866569"},{"key":"e_1_2_2_31_1","volume-title":"2011 International Conference on Robotics and Automation (ICRA). IEEE, 3607\u20133613","author":"K\u00fcmmerle Rainer","year":"2011","unstructured":"Rainer K\u00fcmmerle, Giorgio Grisetti, Hauke Strasdat, Kurt Konolige, and Wolfram Burgard. 2011. G2o: A general framework for graph optimization. In 2011 International Conference on Robotics and Automation (ICRA). 
IEEE, 3607\u20133613."},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793527"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201383"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818013"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925907"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925952"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISMAR.2011.6092378"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508363.2508374"},{"key":"e_1_2_2_39_1","unstructured":"J. Nocedal and S. J. Wright. 2006. Numerical Optimization (2nd ed.). Springer New York."},{"key":"e_1_2_2_40_1","first-page":"123","article-title":"Proximal algorithms","volume":"1","author":"Parikh Neal","year":"2013","unstructured":"Neal Parikh and Stephen Boyd. 2013. Proximal algorithms. Foundations and Trends in Optimization 1, 3 (2013), 123\u2013231.","journal-title":"Foundations and Trends in Optimization"},{"key":"e_1_2_2_41_1","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic differentiation in PyTorch. 
In NIPS-W."},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/882262.882269"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130883"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.445"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/4021.4023"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3310248"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.5555\/1281991.1282006"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276478"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818056"},{"volume-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR). IEEE, 2387\u20132395","author":"Thies J.","key":"e_1_2_2_51_1","unstructured":"J. Thies, M. Zollh\u00f6fer, M. Stamminger, C. Theobalt, and M. Nie\u00dfner. 2016. Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of Computer Vision and Pattern Recognition (CVPR). 
IEEE, 2387\u20132395."},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.5555\/646271.685629"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661229.2661232"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3181973"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISETC.2018.8583952"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766887"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2601097.2601165"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3453986","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3453986","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:51Z","timestamp":1750193271000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3453986"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,24]]},"references-count":56,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,10,31]]}},"alternative-id":["10.1145\/3453986"],"URL":"https:\/\/doi.org\/10.1145\/3453986","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"type":"print","value":"0730-0301"},{"type":"electronic","value":"1557-7368"}],"subject":[],"published":{"date-parts":[[2021,9,24]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-09-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}