{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:44:37Z","timestamp":1774982677127,"version":"3.50.1"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T00:00:00Z","timestamp":1734480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,12,18]]},"abstract":"<jats:p>Multi-instance graph algorithms interleave the evaluation of multiple instances of the same algorithm with different inputs over the same graph. They have been shown to be significantly faster than traditional serial and batch evaluation, by sharing computation across instances. However, writing correct multi-instance algorithms is challenging; and in this work, we describe AutoMI, a framework for automatically converting vertex-centric graph algorithms into their vectorized multi-instance versions. We also develop an algebraic characterization of algorithms that can benefit best from multi-instance computation with simpler and faster streamlined vectorization. This allows users to decide when to use such optimization and instruct AutoMI to make the best use of SIMD vectorization. Using 6 real-life graphs, we show that AutoMI-converted multi-instance algorithms are 9.6 to 29.5 times faster than serial evaluation, 7.1 to 26.4 times faster than batch evaluation, and are even 2.6 to 4.6 times faster than existing highly optimized handcrafted multi-instance algorithms without vectorization.<\/jats:p>","DOI":"10.1145\/3698833","type":"journal-article","created":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T16:40:35Z","timestamp":1734712835000},"page":"1-27","source":"Crossref","is-referenced-by-count":0,"title":["Automating Vectorized Distributed Graph Computation"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-2941-2658","authenticated-orcid":false,"given":"Wenyue","family":"Zhao","sequence":"first","affiliation":[{"name":"University of Edinburgh, Edinburgh, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7984-3219","authenticated-orcid":false,"given":"Yang","family":"Cao","sequence":"additional","affiliation":[{"name":"University of Edinburgh, Edinburgh, UK"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-9056-8587","authenticated-orcid":false,"given":"Peter","family":"Buneman","sequence":"additional","affiliation":[{"name":"University of Edinburgh, Edinburgh, UK"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6023-9969","authenticated-orcid":false,"given":"Jia","family":"Li","sequence":"additional","affiliation":[{"name":"Edinburgh Research Center, Central Software Institute, Huawei, Edinburgh, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8676-0948","authenticated-orcid":false,"given":"Nikos","family":"Ntarmos","sequence":"additional","affiliation":[{"name":"Edinburgh Research Center, Central Software Institute, Huawei, Edinburgh, UK"}]}],"member":"320","published-online":{"date-parts":[[2024,12,20]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"Access: 2024. Friendster. http:\/\/konect.cc\/networks\/friendster\/."},{"key":"e_1_2_2_2_1","unstructured":"Access: 2024. Giraph. https:\/\/giraph.apache.org\/."},{"key":"e_1_2_2_3_1","unstructured":"Access: 2024. Intel\u00ae Intrinsics Guide. https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/intrinsics-guide\/index.html."},{"key":"e_1_2_2_4_1","unstructured":"Access: 2024. libclang. https:\/\/clang.llvm.org\/."},{"key":"e_1_2_2_5_1","unstructured":"Access: 2024. LiveJournal. http:\/\/snap.stanford.edu\/data\/soc-LiveJournal1.html."},{"key":"e_1_2_2_6_1","unstructured":"Access: 2024. MovieLens. http:\/\/grouplens.org\/datasets\/movielens\/."},{"key":"e_1_2_2_7_1","unstructured":"Access: 2024. Multi-instance BFS implementation. https:\/\/github.com\/mtodat\/ms-bfs."},{"key":"e_1_2_2_8_1","unstructured":"Access: 2024. Netflix. http:\/\/konect.cc\/networks\/netflix\/."},{"key":"e_1_2_2_9_1","unstructured":"Access: 2024. PowerLyra. https:\/\/github.com\/realstolz\/powerlyra\/."},{"key":"e_1_2_2_10_1","unstructured":"Access: 2024. Quegel. http:\/\/www.cse.cuhk.edu.hk\/systems\/quegel\/."},{"key":"e_1_2_2_11_1","unstructured":"Access: 2024. Twitter. http:\/\/konect.cc\/networks\/twitter\/."},{"key":"e_1_2_2_12_1","unstructured":"Access: 2024. UK domain. http:\/\/konect.cc\/networks\/dimacs10-uk-2007-05\/."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3643027"},{"key":"e_1_2_2_14_1","volume-title":"Multiple-Query Optimization of Regular Path Queries. In 33rd IEEE International Conference on Data Engineering, ICDE 2017","author":"Abul-Basher Zahid","year":"2017","unstructured":"Zahid Abul-Basher. 2017. Multiple-Query Optimization of Regular Path Queries. In 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19--22, 2017. IEEE Computer Society, 1426--1430."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3231751.3231764"},{"key":"e_1_2_2_16_1","volume-title":"Processing. In Proceedings of the 19th International Conference on Data Engineering, March 5--8","year":"2003","unstructured":"NicolasBruno,LuisGravano,NickKoudas,andDiveshSrivastava.2003.Navigation-vs.Index-BasedXMLMulti-Query Processing. In Proceedings of the 19th International Conference on Data Engineering, March 5--8, 2003, Bangalore, India, Umeshwar Dayal, Krithi Ramamritham, and T. M. Vijayaraman (Eds.). IEEE Computer Society, 139--150."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-44503-X_20"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972740.43"},{"key":"e_1_2_2_19_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--16","unstructured":"HongzhengChen,MinghuaShen,NongXiao,andYutongLu.2021.Krill:acompilerandruntimesystemforconcurrent graph processing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--16."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3298989"},{"key":"e_1_2_2_21_1","volume-title":"Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing. In ICPP 2021: 50th International Conference on Parallel Processing","author":"Esfahani Mohsen Koohi","year":"2021","unstructured":"Mohsen Koohi Esfahani, Peter Kilpatrick, and Hans Vandierendonck. 2021. Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing. In ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9 - 12, 2021, Xian-He Sun, Sameer Shende, Laxmikant V. Kal\u00e9, and Yong Chen (Eds.). ACM, 42:1--42:10."},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3605573.3605623"},{"key":"e_1_2_2_23_1","doi-asserted-by":"crossref","unstructured":"Floris Geerts Thomas Mu\u00f1oz Cristian Riveros and Domagoj Vrgoc. 2021. Expressive Power of Linear Algebra Query Languages. In PODS.","DOI":"10.1145\/3452021.3458314"},{"key":"e_1_2_2_24_1","volume-title":"PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In 10th USENIX symposium on operating systems design and implemen- tation (OSDI 12)","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In 10th USENIX symposium on operating systems design and implemen- tation (OSDI 12). 17--30."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1265530.1265535"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3034786.3056125"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2732980"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2151013"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/3225632.3225755"},{"key":"e_1_2_2_30_1","volume-title":"Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017","author":"Kaufmann Moritz","year":"2017","unstructured":"Moritz Kaufmann, Manuel Then, Alfons Kemper, and Thomas Neumann. 2017. Parallel Array-Based Single- and Multi-Source Breadth First Searches on Large Dense Graphs. In Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21--24, 2017, Volker Markl, Salvatore Orlando, Bernhard Mitschang, Periklis Andritsos, Kai-Uwe Sattler, and Sebastian Bre\u00df (Eds.). OpenProceedings.org, 1--12."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_2_2_32_1","volume-title":"10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012","author":"Kyrola Aapo","year":"2012","unstructured":"Aapo Kyrola, Guy E. Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-Scale Graph Computation on Just a PC. In 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8--10, 2012, Chandu Thekkath and Amin Vahdat (Eds.). USENIX Association, 31--46."},{"key":"e_1_2_2_33_1","volume-title":"Scalable Multi-query Optimization for SPARQL. In IEEE 28th International Conference on Data Engineering (ICDE 2012","author":"Le Wangchao","year":"2012","unstructured":"Wangchao Le, Anastasios Kementsietsidis, Songyun Duan, and Feifei Li. 2012. Scalable Multi-query Optimization for SPARQL. In IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1--5 April, 2012, Anastasios Kementsietsidis and Marcos Antonio Vaz Salles (Eds.). IEEE Computer Society, 666--677."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3603581.3603594"},{"key":"e_1_2_2_35_1","volume-title":"WISA 2021, Kaifeng, China, September 24--26, 2021, Proceedings (Lecture Notes in Computer Science","volume":"646","author":"Liu Xinyi","year":"2021","unstructured":"Xinyi Liu, Zhigang Wang, Ning Wang, Xiangtan Li, Bo Zhang, Jun Qiao, Zhiqiang Wei, and Jie Nie. 2021. An Adaptive Sharing Framework for Efficient Multi-source Shortest Path Computation. In Web Information Systems and Applications - 18th International Conference, WISA 2021, Kaifeng, China, September 24--26, 2021, Proceedings (Lecture Notes in Computer Science, Vol. 12999), Chunxiao Xing, Xiaoming Fu, Yong Zhang, Guigang Zhang, and Chaolemen Borjigin (Eds.). Springer, 635--646."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"e_1_2_2_37_1","volume-title":"Cache-Efficient Fork-Processing Patterns on Large Graphs. In SIGMOD '21: International Conference on Management of Data","author":"Lu Shengliang","year":"2021","unstructured":"Shengliang Lu, Shixuan Sun, Johns Paul, Yuchen Li, and Bingsheng He. 2021. Cache-Efficient Fork-Processing Patterns on Large Graphs. In SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 1208--1221."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735508.2735517"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData47090.2019.9006359"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2818185"},{"key":"e_1_2_2_42_1","first-page":"321","article-title":"Semiring frameworks and algorithms for shortest-distance problems","volume":"7","author":"Mehryar Mohri","year":"2002","unstructured":"Mehryar Mohri et al. 2002. Semiring frameworks and algorithms for shortest-distance problems. Journal of Automata, Languages and Combinatorics 7, 3 (2002), 321--350.","journal-title":"Journal of Automata, Languages and Combinatorics"},{"key":"e_1_2_2_43_1","volume-title":"Congra: Towards Efficient Processing of Concurrent Graph Queries on Shared-Memory Machines. In 2017 IEEE International Conference on Computer Design, ICCD 2017","author":"Pan Peitian","year":"2017","unstructured":"Peitian Pan and Chao Li. 2017. Congra: Towards Efficient Processing of Concurrent Graph Queries on Shared-Memory Machines. In 2017 IEEE International Conference on Computer Design, ICCD 2017, Boston, MA, USA, November 5--8, 2017. IEEE Computer Society, 217--224."},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2947050"},{"key":"e_1_2_2_45_1","doi-asserted-by":"crossref","unstructured":"Orestis Polychroniou Arun Raghavan and Kenneth A Ross. 2015. Rethinking SIMD vectorization for in-memory databases. In SIGMOD.","DOI":"10.1145\/2723372.2747645"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2384616.2384644"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2737953"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.14778\/3021924.3021929"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522740"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/42201.42203"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.54724"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442530"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2017.27"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735496.2735507"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.14778\/3090163.3090166"},{"key":"e_1_2_2_56_1","volume-title":"Practical SIMD Vectorization Techniques for Intel\u00ae Xeon Phi Coprocessors. In IPDPS Workshops.","author":"Tian Xinmin","year":"2013","unstructured":"Xinmin Tian, Hideki Saito, Serguei Preis, Eric N. Garcia, Sergey Kozhukhov, Matt Masten, Aleksei G. Cherkasov, and Nikolay Panchenko. 2013. Practical SIMD Vectorization Techniques for Intel\u00ae Xeon Phi Coprocessors. In IPDPS Workshops."},{"key":"e_1_2_2_57_1","volume-title":"The 23rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC'14","year":"2014","unstructured":"JilongXue,ZhiYang,ZhiQu,ShianHou,andYafeiDai.2014.Seraph:anefficient,low-costsystemforconcurrentgraph processing. In The 23rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC'14, Vancouver, BC, Canada - June 23 - 27, 2014, Beth Plale, Matei Ripeanu, Franck Cappello, and Dongyan Xu (Eds.). ACM, 227--238."},{"key":"e_1_2_2_58_1","doi-asserted-by":"crossref","unstructured":"Da Yan Yingyi Bu Yuanyuan Tian Amol Deshpande et al. 2017. Big graph analytics platforms. Foundations and Trends\u00ae in Databases 7 1--2 (2017) 1--195.","DOI":"10.1561\/1900000056"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/2904483.2904488"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/2904483.2904488"},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470362"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3567955.3567963"},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3127479.3128606"},{"key":"e_1_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2899398"},{"key":"e_1_2_2_65_1","volume-title":"Approach for Efficient Concurrent Iterative Graph Processing. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018","year":"2018","unstructured":"YuZhang,XiaofeiLiao,HaiJin,LinGu,LigangHe,BingshengHe,andHaikunLiu.2018.CGraph:ACorrelations-aware Approach for Efficient Concurrent Iterative Graph Processing. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11--13, 2018, Haryadi S. Gunawi and Benjamin Reed (Eds.). USENIX Association, 441--452."},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276491"},{"key":"e_1_2_2_67_1","doi-asserted-by":"crossref","unstructured":"Jin Zhao Yu Zhang Ligang He Qikun Li Xiang Zhang Xinyu Jiang Hui Yu Xiaofei Liao Hai Jin Lin Gu et al. 2023. GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing. ACM Transactions on Architecture and Code Optimization (2023).","DOI":"10.1145\/3600091"},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356143"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3698833","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3698833","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T17:44:59Z","timestamp":1774979099000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3698833"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,18]]},"references-count":68,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,18]]}},"alternative-id":["10.1145\/3698833"],"URL":"https:\/\/doi.org\/10.1145\/3698833","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,18]]}}}