{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:18:34Z","timestamp":1763468314980,"version":"3.41.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,12,22]],"date-time":"2015-12-22T00:00:00Z","timestamp":1450742400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"IMPECS project of DST"},{"DOI":"10.13039\/100018768","name":"MPI-SWS","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100018768","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2016,1,7]]},"abstract":"<jats:p>Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy\u2014even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.<\/jats:p>","DOI":"10.1145\/2842618","type":"journal-article","created":{"date-parts":[[2015,12,22]],"date-time":"2015-12-22T14:11:19Z","timestamp":1450793479000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Falcon"],"prefix":"10.1145","volume":"12","author":[{"given":"Unnikrishnan","family":"Cheramangalath","sequence":"first","affiliation":[{"name":"Department of CSA, Indian Institute of Science, Bangalore, India"}]},{"given":"Rupesh","family":"Nasre","sequence":"additional","affiliation":[{"name":"Department of CSE, Indian Institute of Technology, Madras, India"}]},{"given":"Y. N.","family":"Srikant","sequence":"additional","affiliation":[{"name":"Department of CSA, Indian Institute of Science, Bangalore, India"}]}],"member":"320","published-online":{"date-parts":[[2015,12,22]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"GTgraph: A Suite of Synthetic Graph Generators. Retrieved","author":"Bader D.","year":"2015","unstructured":"D. Bader and K. Madduri . 2006 . GTgraph: A Suite of Synthetic Graph Generators. Retrieved November 18, 2015 , from http:\/\/www.cse.psu.edu\/&sim;madduri\/software\/GTgraph. D. Bader and K. Madduri. 2006. GTgraph: A Suite of Synthetic Graph Generators. Retrieved November 18, 2015, from http:\/\/www.cse.psu.edu\/&sim;madduri\/software\/GTgraph."},{"volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201908)","author":"Bader D.","key":"e_1_2_2_2_1","unstructured":"D. Bader and K. Madduri . 2008. Snap, small-world network analysis and partitioning: An open-source parallel graph framework for the exploration of large-scale networks . In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201908) . D. Bader and K. Madduri. 2008. Snap, small-world network analysis and partitioning: An open-source parallel graph framework for the exploration of large-scale networks. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201908)."},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/11602569_48"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1002\/rsa.v27:2"},{"key":"e_1_2_2_5_1","doi-asserted-by":"crossref","unstructured":"Martin Burtscher and Keshav Pingali. 2011. CUDA implementation of the tree-based Barnes hut n-body algorithm. In GPU Computing Gems Emerald Edition. Morgan Kaufmann 75--92. http:\/\/iss.ices.utexas.edu\/Publications\/Papers\/burtscher11.pdf.  Martin Burtscher and Keshav Pingali. 2011. CUDA implementation of the tree-based Barnes hut n-body algorithm. In GPU Computing Gems Emerald Edition. Morgan Kaufmann 75--92. http:\/\/iss.ices.utexas.edu\/Publications\/Papers\/burtscher11.pdf.","DOI":"10.1016\/B978-0-12-384988-5.00006-1"},{"key":"e_1_2_2_6_1","volume-title":"Falcon: A Graph Manipulation Language for Heterogeneous Systems. Technical Report. CSA, IISc, Bangalore, India","author":"Nasre Unnikrishnan C, R.","year":"2015","unstructured":"Unnikrishnan C, R. Nasre , and Y. N. Srikant . 2015 . Falcon: A Graph Manipulation Language for Heterogeneous Systems. Technical Report. CSA, IISc, Bangalore, India . http:\/\/www.csa.iisc.ernet.in\/TR\/2015\/5\/. Unnikrishnan C, R. Nasre, and Y. N. Srikant. 2015. Falcon: A Graph Manipulation Language for Heterogeneous Systems. Technical Report. CSA, IISc, Bangalore, India. http:\/\/www.csa.iisc.ernet.in\/TR\/2015\/5\/."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/160985.161150"},{"key":"e_1_2_2_8_1","volume-title":"Parallel Implementation of Boruvka\u2019s Minimum Spanning Tree Algorithm Retrieved","author":"Chung S.","year":"2015","unstructured":"S. Chung and A. Condon . 1996 . Parallel Implementation of Boruvka\u2019s Minimum Spanning Tree Algorithm Retrieved November 18, 2015 , from http:\/\/www.cs.ubc.ca\/&sim;condon\/papers\/chungcondon96.pdf. S. Chung and A. Condon. 1996. Parallel Implementation of Boruvka\u2019s Minimum Spanning Tree Algorithm Retrieved November 18, 2015, from http:\/\/www.cs.ubc.ca\/&sim;condon\/papers\/chungcondon96.pdf."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.45"},{"key":"e_1_2_2_10_1","volume-title":"Retrieved","author":"DIMACS.","year":"2009","unstructured":"DIMACS. 2009 . 9th DIMACS Implementation Challenge . Retrieved November 18, 2015, from http:\/\/www.dis.uniroma1.it\/challenge9\/download.shtml. DIMACS. 2009. 9th DIMACS Implementation Challenge. Retrieved November 18, 2015, from http:\/\/www.dis.uniroma1.it\/challenge9\/download.shtml."},{"key":"e_1_2_2_11_1","volume-title":"Retrieved","author":"Erd\u0151s P.","year":"1960","unstructured":"P. Erd\u0151s and A R\u00e9nyi . 1960 . On the Evolution of Random Graphs . Retrieved November 18, 2015, from http:\/\/www.renyi.hu\/&sim;p_erdos\/1960-10.pdf. P. Erd\u0151s and A R\u00e9nyi. 1960. On the Evolution of Random Graphs. Retrieved November 18, 2015, from http:\/\/www.renyi.hu\/&sim;p_erdos\/1960-10.pdf."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145816.2145860"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/297096.297147"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370866"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2535753.2535755"},{"key":"e_1_2_2_16_1","volume-title":"Proceedings of the Conference on Parallel Object-Oriented Scientific Computing (POOSC\u201905)","author":"Gregor Douglas","year":"2005","unstructured":"Douglas Gregor and Andrew Lumsdaine . 2005 . The parallel BGL: A generic library for distributed graph computations . In Proceedings of the Conference on Parallel Object-Oriented Scientific Computing (POOSC\u201905) . Douglas Gregor and Andrew Lumsdaine. 2005. The parallel BGL: A generic library for distributed graph computations. In Proceedings of the Conference on Parallel Object-Oriented Scientific Computing (POOSC\u201905)."},{"volume-title":"Proceedings of the 14th International Conference on High Performance Computing (HiPC\u201907)","author":"Harish P.","key":"e_1_2_2_17_1","unstructured":"P. Harish and P. J. Narayanan . 2007. Accelerating large graph algorithms on the GPU using CUDA . In Proceedings of the 14th International Conference on High Performance Computing (HiPC\u201907) . 197--208. P. Harish and P. J. Narayanan. 2007. Accelerating large graph algorithms on the GPU using CUDA. In Proceedings of the 14th International Conference on High Performance Computing (HiPC\u201907). 197--208."},{"key":"e_1_2_2_18_1","volume-title":"Technical Report IIIT\/TR\/2009\/74. International Institute of Information Technology, Hyderabad, India.","author":"Harish P.","year":"2009","unstructured":"P. Harish , V. Vineet , and P. J. Narayanan . 2009 . Large Graph Algorithms for Massively Multithreaded Architectures . Technical Report IIIT\/TR\/2009\/74. International Institute of Information Technology, Hyderabad, India. P. Harish, V. Vineet, and P. J. Narayanan. 2009. Large Graph Algorithms for Massively Multithreaded Architectures. Technical Report IIIT\/TR\/2009\/74. International Institute of Information Technology, Hyderabad, India."},{"key":"e_1_2_2_19_1","volume-title":"Retrieved","author":"Harris Mark","year":"2007","unstructured":"Mark Harris . 2007 . Optimizing Parallel Reduction in CUDA . Retrieved November 18, 2015, from http:\/\/docs.nvidia.com\/cuda\/samples\/6_Advanced\/reduction\/doc\/reduction.pdf. Mark Harris. 2007. Optimizing Parallel Reduction in CUDA. Retrieved November 18, 2015, from http:\/\/docs.nvidia.com\/cuda\/samples\/6_Advanced\/reduction\/doc\/reduction.pdf."},{"key":"e_1_2_2_20_1","volume-title":"Thrust: A Productivity-Oriented Library for CUDA. Technical Report","author":"Hoberock Jared","year":"2011","unstructured":"Jared Hoberock and Nathan Bell . 2011 . Thrust: A Productivity-Oriented Library for CUDA. Technical Report . Nvidia Corporation . Jared Hoberock and Nathan Bell. 2011. Thrust: A Productivity-Oriented Library for CUDA. Technical Report. Nvidia Corporation."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2151013"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2038037.1941590"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2581122.2544162"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628088"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600212.2600227"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1594835.1504194"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145816.2145831"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366231.2337168"},{"key":"e_1_2_2_31_1","volume-title":"Proceedings of the European Symposium on Algorithms (ESA\u201998)","author":"Meyer Ulrich","year":"1998","unstructured":"Ulrich Meyer and Peter Sanders . 1998 . Delta-stepping: A parallel single source shortest path algorithm . In Proceedings of the European Symposium on Algorithms (ESA\u201998) . 393--404. Ulrich Meyer and Peter Sanders. 1998. Delta-stepping: A parallel single source shortest path algorithm. In Proceedings of the European Symposium on Algorithms (ESA\u201998). 393--404."},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.28"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442531"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993501"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1925844.1926445"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2398857.2384644"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(95)00079-8"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522740"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2159430.2159438"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2458523.2458531"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517327.2442530"},{"key":"e_1_2_2_44_1","volume-title":"Using the GNU Compiler Collection. Retrieved","author":"Richard","year":"2015","unstructured":"Richard M. Stallman and the GCC Developer Community. 2011 . Using the GNU Compiler Collection. Retrieved November 18, 2015 , from https:\/\/gcc.gnu.org\/onlinedocs\/gcc.pdf. Richard M. Stallman and the GCC Developer Community. 2011. Using the GNU Compiler Collection. Retrieved November 18, 2015, from https:\/\/gcc.gnu.org\/onlinedocs\/gcc.pdf."},{"volume-title":"Technical Report IMM-BSC-2008-12","author":"Stockel Morten","key":"e_1_2_2_45_1","unstructured":"Morten Stockel and Soren Bog . 2008. Concurrent Datastructures . Technical Report IMM-BSC-2008-12 . Technical University of Denmark . Morten Stockel and Soren Bog. 2008. Concurrent Datastructures. Technical Report IMM-BSC-2008-12. Technical University of Denmark."},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2008.4771802"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941553.1941580"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/79173.79181"},{"key":"e_1_2_2_49_1","volume-title":"Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDP\u201910)","author":"Xiao Shucai","year":"2010","unstructured":"Shucai Xiao and Wu Chun Feng . 2010 . Inter-block GPU communication via fast barrier synchronization . In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDP\u201910) . 1--12. Shucai Xiao and Wu Chun Feng. 2010. Inter-block GPU communication via fast barrier synchronization. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDP\u201910). 1--12."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688507"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.111"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2842618","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2842618","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:53:48Z","timestamp":1750222428000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2842618"}},"subtitle":["A Graph Manipulation Language for Heterogeneous Systems"],"short-title":[],"issued":{"date-parts":[[2015,12,22]]},"references-count":51,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,1,7]]}},"alternative-id":["10.1145\/2842618"],"URL":"https:\/\/doi.org\/10.1145\/2842618","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2015,12,22]]},"assertion":[{"value":"2015-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-12-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}