{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T09:28:39Z","timestamp":1763458119751,"version":"3.45.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,12,28]],"date-time":"2017-12-28T00:00:00Z","timestamp":1514419200000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"U.S. Government"},{"DOI":"10.13039\/100000185","name":"DARPA","doi-asserted-by":"crossref","award":["HR0011-13-3-0001"],"award-info":[{"award-number":["HR0011-13-3-0001"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2016,12,28]]},"abstract":"<jats:p>This article describes Surge, a nested data-parallel programming system designed to simplify the porting and tuning of parallel applications to multiple target architectures. Surge decouples high-level specification of computations, expressed using a C++ programming interface, from low-level implementation details using two first-class constructs: schedules and policies. Schedules describe the valid ways in which data-parallel operators may be implemented, while policies encapsulate a set of parameters that govern platform-specific code generation. These two mechanisms are used to implement a code generation system that analyzes computations and automatically generates a search space of valid platform-specific implementations. An input and architecture-adaptive autotuning system then explores this search space to find optimized implementations. 
We express five real-world benchmarks in Surge, drawn from domains such as machine learning and sparse linear algebra. From these high-level specifications, Surge automatically generates CPU and GPU implementations that perform on par with or better than manually optimized versions.<\/jats:p>","DOI":"10.1145\/3012011","type":"journal-article","created":{"date-parts":[[2016,12,28]],"date-time":"2016-12-28T08:20:40Z","timestamp":1482913240000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Designing a Tunable Nested Data-Parallel Programming System"],"prefix":"10.1145","volume":"13","author":[{"given":"Saurav","family":"Muralidharan","sequence":"first","affiliation":[{"name":"University of Utah"}]},{"given":"Michael","family":"Garland","sequence":"additional","affiliation":[{"name":"NVIDIA Corporation, Santa Clara, CA"}]},{"given":"Albert","family":"Sidelnik","sequence":"additional","affiliation":[{"name":"NVIDIA Corporation, Santa Clara, CA"}]},{"given":"Mary","family":"Hall","sequence":"additional","affiliation":[{"name":"University of Utah, Salt Lake City, UT"}]}],"member":"320","published-online":{"date-parts":[[2016,12,28]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628092"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228411"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1869459.1869469"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_2_1_5_1","first-page":"359","article-title":"Thrust: A productivity-oriented library for CUDA","volume":"2","author":"Bell Nathan","year":"2011","unstructured":"Nathan Bell and Jared Hoberock. 2011. Thrust: A productivity-oriented library for CUDA. GPU Comput. Gems Jade Ed. 2 (2011), 359--371.","journal-title":"GPU Comput. 
Gems Jade Ed."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442525"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/865063"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2854038.2854042"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2011.15"},{"key":"e_1_2_1_10_1","first-page":"203","article-title":"Concurrent collections","volume":"18","author":"Budimli\u0107 Zoran","year":"2010","unstructured":"Zoran Budimli\u0107, Michael Burke, Vincent Cav\u00e9, Kathleen Knobe, Geoff Lowney, Ryan Newton, Jens Palsberg, David Peixotto, Vivek Sarkar, Frank Schlimbach, and Sa\u011fnak Ta\u015firlar. 2010. Concurrent collections. Sci. Program. 18, 3--4 (Aug. 2010), 203--217.","journal-title":"Sci. Program."},{"key":"e_1_2_1_11_1","volume-title":"Retrieved","author":"Catanzaro Bryan","year":"2014","unstructured":"Bryan Catanzaro. 2014. GPU K-Means Clustering. Retrieved October 28, 2016 from https:\/\/github.com\/bryancatanzaro\/kmeans."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941553.1941562"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1248648.1248652"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783715"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/99.660313"},{"key":"e_1_2_1_16_1","volume-title":"Retrieved","author":"Dalton Steven","year":"2010","unstructured":"Steven Dalton, Nathan Bell, and Michael Garland. 2010. CUSP Library. Retrieved October 28, 2016 from http:\/\/cusplibrary.github.io\/."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_2_1_18_1","volume-title":"Retrieved","author":"Demidov Denis","year":"2016","unstructured":"Denis Demidov, Karsten Ahnert, Karl Rupp, and Peter Gottschling. 2016. VexCL Symbolic Type. 
Retrieved October 28, 2016 from http:\/\/vexcl.readthedocs.io\/en\/latest\/symbolic.html."},{"key":"e_1_2_1_19_1","volume-title":"Retrieved","author":"Edwards H. Carter","year":"2016","unstructured":"H. Carter Edwards, Christian Trott, Juan Alday, Jesse Perla, Mauro Bianco, Robin Maffeo, Ben Sander, and Bryce Lelbach. 2016. Polymorphic multidimensional array reference. ISO\/IEC C++ Standards Committee Paper P0009R2. Retrieved October 28, 2016 from http:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2016\/p0009r2.html."},{"volume-title":"Retrieved","year":"2015","key":"e_1_2_1_20_1","unstructured":"ExMatEx. 2015. DoE Exascale Co-Design Center for Materials in Extreme Environments. Retrieved October 28, 2016 from http:\/\/www.exmatex.org."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161004"},{"key":"e_1_2_1_22_1","volume-title":"Retrieved","author":"Hoberock Jared","year":"2016","unstructured":"Jared Hoberock. 2016. Working draft, technical specification for C++ extensions for parallelism version 2. ISO\/IEC C++ Standards Committee Paper N4578 (2016). Retrieved October 28, 2016 from http:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2016\/n4578.html."},{"key":"e_1_2_1_23_1","volume-title":"Retrieved","author":"Hoberock Jared","year":"2016","unstructured":"Jared Hoberock, Michael Garland, and Olivier Giroux. 2016. An interface for abstracting execution. ISO\/IEC C++ Standards Committee Paper P0058R1 (2016). Retrieved October 28, 2016 from http:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2016\/p0058r1.pdf."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/242224.242477"},{"volume-title":"Retrieved","year":"2016","key":"e_1_2_1_25_1","unstructured":"Intel. 2016. Math Kernel Library. 
Retrieved October 28, 2016 from https:\/\/software.intel.com\/en-us\/intel-mkl."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04652-0_6"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/165854.165874"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2364506.2364512"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1863543.1863582"},{"key":"e_1_2_1_30_1","volume-title":"Retrieved","author":"Kretz Matthias","year":"2016","unstructured":"Matthias Kretz. 2016. Data-parallel vector types and operations. ISO\/IEC C++ Standards Committee Paper P0214R1 (2016). Retrieved October 28, 2016 from http:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2016\/p0214r1.pdf."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.23"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1982.1056489"},{"key":"e_1_2_1_33_1","unstructured":"Duane G. Merrill III. 2011. Allocation-oriented Algorithm Design with Application to GPU Computing. Ph.D. Dissertation. University of Virginia Charlottesville VA. UMI Order Number: AAI 3501820."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.59"},{"key":"e_1_2_1_35_1","volume-title":"Retrieved","author":"Cray NVIDIA","year":"2015","unstructured":"NVIDIA, Cray, CAPS, and PGI. 2015. The OpenACC Specification version 2.0a. 
Retrieved October 28, 2016 from http:\/\/www.openacc.org\/sites\/default\/files\/OpenACC.2.0a_1.pdf."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSoC.2014.43"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993501"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2384616.2384644"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1868294.1868314"},{"key":"e_1_2_1_41_1","volume-title":"Retrieved","author":"Sakharnykh Nikolay","year":"2013","unstructured":"Nikolay Sakharnykh. 2013. CoMD-CUDA. Retrieved October 28, 2016 from https:\/\/github.com\/NVIDIA\/CoMD-CUDA."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2784731.2784754"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161054"},{"key":"e_1_2_1_44_1","unstructured":"Vladimir N. Vapnik. 1998. Statistical Learning Theory."},{"key":"e_1_2_1_45_1","unstructured":"Todd Veldhuizen. 1995. Expression templates. 
C++ Report 7 5 (1995) 26--31."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2012.21"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3012011","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3012011","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3012011","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T09:17:54Z","timestamp":1763457474000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3012011"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,12,28]]},"references-count":46,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,12,28]]}},"alternative-id":["10.1145\/3012011"],"URL":"https:\/\/doi.org\/10.1145\/3012011","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2016,12,28]]},"assertion":[{"value":"2016-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-10-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-12-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}