{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T09:01:06Z","timestamp":1773910866582,"version":"3.50.1"},"reference-count":86,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T00:00:00Z","timestamp":1715817600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["2106999"],"award-info":[{"award-number":["2106999"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Database Syst."],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p>Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components problem on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is an inherent limitation of this approach and is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM.<\/jats:p>\n          <jats:p\/>\n          <jats:p>\n            We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call\n            <jats:sc>GraphZeppelin<\/jats:sc>\n            , uses new linear sketching data structures (\n            <jats:sc>CubeSketch<\/jats:sc>\n            ) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph.\n            <jats:sc>GraphZeppelin<\/jats:sc>\n            is optimized for massive dense graphs:\n            <jats:sc>GraphZeppelin<\/jats:sc>\n            can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result\n            <jats:sc>GraphZeppelin<\/jats:sc>\n            vastly increases the scale of graphs that can be processed.\n          <\/jats:p>","DOI":"10.1145\/3643846","type":"journal-article","created":{"date-parts":[[2024,2,20]],"date-time":"2024-02-20T12:23:46Z","timestamp":1708431826000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["<scp>GraphZeppelin<\/scp>\n            : How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)"],"prefix":"10.1145","volume":"49","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7542-9800","authenticated-orcid":false,"given":"David","family":"Tench","sequence":"first","affiliation":[{"name":"Rutgers University, New Brunswick, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5974-7745","authenticated-orcid":false,"given":"Evan","family":"West","sequence":"additional","affiliation":[{"name":"Stony Brook University, Stony Brook, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-8077-1099","authenticated-orcid":false,"given":"Victor","family":"Zhang","sequence":"additional","affiliation":[{"name":"Rutgers University, New Brunswick, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7639-530X","authenticated-orcid":false,"given":"Michael A.","family":"Bender","sequence":"additional","affiliation":[{"name":"Dept of Computer Science, Stony Brook University, Stony Brook, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6671-7031","authenticated-orcid":false,"given":"Abiyaz","family":"Chowdhury","sequence":"additional","affiliation":[{"name":"Stony Brook University, Stony Brook, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7636-0107","authenticated-orcid":false,"given":"Daniel","family":"Delayo","sequence":"additional","affiliation":[{"name":"Stony Brook University, Stony Brook, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5476-8746","authenticated-orcid":false,"given":"J. Ahmed","family":"Dellas","sequence":"additional","affiliation":[{"name":"Rutgers University, New Brunswick, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3616-7788","authenticated-orcid":false,"given":"Mart\u00edn","family":"Farach-Colton","sequence":"additional","affiliation":[{"name":"Rutgers University, New Brunswick, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-3651-2485","authenticated-orcid":false,"given":"Tyler","family":"Seip","sequence":"additional","affiliation":[{"name":"MongoDB, New York City, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8452-6573","authenticated-orcid":false,"given":"Kenny","family":"Zhang","sequence":"additional","affiliation":[{"name":"Stony Brook University, Stony Brook, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,5,16]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-001-0088-5"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/2213556.2213560"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611973099.40"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2858250"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1242\/jcs.02714"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPAS.2018.8708900"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.14778\/3184470.3184473"},{"key":"e_1_3_2_9_2","first-page":"45","volume-title":"Proceedings of the Cray User Group (CUG)","author":"Ang J.","year":"2010","unstructured":"J. Ang, Brian W. Barrett, Kyle B. Wheeler, and Richard C. Murphy. 2010. Introducing the graph 500. In Proceedings of the Cray User Group (CUG). 45\u201374."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-60220-8_74"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150462"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1562764.1562787"},{"key":"e_1_3_2_13_2","volume-title":"Proceedings of the Web Science (WebSci\u201909)","author":"Bordino Ilaria","year":"2009","unstructured":"Ilaria Bordino and Debora Donato. 2009. Dynamic characterization of a large Web graph. In Proceedings of the Web Science (WebSci\u201909)."},{"key":"e_1_3_2_14_2","first-page":"23:1\u201323:23","volume-title":"Proceedings of the19th International Symposium on Experimental Algorithms (SEA\u201921)","author":"Brodal Gerth St\u00f8lting","year":"2021","unstructured":"Gerth St\u00f8lting Brodal, Rolf Fagerberg, David Hammer, Ulrich Meyer, Manuel Penschuck, and Hung Tran. 2021. An experimental study of external memory algorithms for connected components. In Proceedings of the19th International Symposium on Experimental Algorithms (SEA\u201921). 23:1\u201323:23."},{"key":"e_1_3_2_15_2","first-page":"280","volume-title":"Proceedings of the European Parallel Virtual Machine\/Message Passing Interface Users\u2019 Group Meeting (EuroMPI\u201901)","author":"Bu\u0161 Libor","year":"2001","unstructured":"Libor Bu\u0161 and Pavel Tvrdik. 2001. A parallel algorithm for connected components on distributed memory machines. In Proceedings of the European Parallel Virtual Machine\/Message Passing Interface Users\u2019 Group Meeting (EuroMPI\u201901). 280\u2013287."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2018.8547541"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113362"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2168836.2168846"},{"key":"e_1_3_2_19_2","first-page":"139","volume-title":"Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201995)","author":"Chiang Yi-Jen","year":"1995","unstructured":"Yi-Jen Chiang, Michael T. Goodrich, Edward F. Grove, Roberto Tamassia, Darren Erik Vengroff, and Jeffrey Scott Vitter. 1995. External-memory graph algorithms. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201995). 139\u2013149."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824077"},{"key":"e_1_3_2_21_2","unstructured":"Yann Collet. 2016. xxHash-Extremely fast non-cryptographic hash algorithm. Retrieved from https:\/\/github.com\/Cyan4973\/xxHash. Accessed March 1 2023."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10619-013-7131-9"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454225"},{"key":"e_1_3_2_24_2","first-page":"918","article-title":"Low-latency graph streaming using compressed purely-functional trees","author":"Dhulipala Laxman","year":"2019","unstructured":"Laxman Dhulipala, Guy Blelloch, and Julian Shun. 2019. Low-latency graph streaming using compressed purely-functional trees. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).918\u2013934. Retrieved from https:\/\/par.nsf.gov\/biblio\/10137103","journal-title":"In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2012.6408680"},{"key":"e_1_3_2_26_2","first-page":"226","volume-title":"Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201996)","author":"Ester Martin","year":"1996","unstructured":"Martin Ester, Hans-Peter Kriegel, J\u00f6rg Sander, and Xiaowei Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201996). 226\u2013231."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994538"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2005.09.013"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00013"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/181014.181021"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2745754.2745763"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487581"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2008.10.013"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2017.04.018"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISDA.2010.5687244"},{"key":"e_1_3_2_36_2","first-page":"309","volume-title":"Proceedings of the12th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201915)","author":"Iyer Anand","year":"2015","unstructured":"Anand Iyer, Li Erran Li, and Ion Stoica. 2015. CellIQ: Real-time cellular network analytics at scale. In Proceedings of the12th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201915). 309\u2013322."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/2960414.2960419"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1090\/conm\/026\/737400"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2901736"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2011.26"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2010.121"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.5220\/0006173704990508"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/2743021"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2011.85"},{"key":"e_1_3_2_45_2","first-page":"1","article-title":"Connected components on distributed memory machines","volume":"30","author":"Krishnamurthy Arvind","year":"1997","unstructured":"Arvind Krishnamurthy, Steven Lumetta, David E. Culler, and Katherine Yelick. 1997. Connected components on distributed memory machines. Third DIMACS Implementation Challenge 30 (1997), 1\u201321.","journal-title":"Third DIMACS Implementation Challenge"},{"key":"e_1_3_2_46_2","first-page":"31","volume-title":"Proceedings of the10th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201912)","author":"Kyrola Aapo","year":"2012","unstructured":"Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the10th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201912). 31\u201346."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-13186-3_53"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/1232722.1232727"},{"key":"e_1_3_2_49_2","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. Retrieved from http:\/\/snap.stanford.edu\/data. Accessed March 1 2023."},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2320716"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/35021BIGCOMP.2015.7072830"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-016-0393-1"},{"key":"e_1_3_2_53_2","first-page":"285","volume-title":"Proceedings of the15th USENIX Conference on File and Storage Technologies (FAST\u201917)","author":"Liu Hang","year":"2017","unstructured":"Hang Liu and H. Howie Huang. 2017. Graphene: Fine-grained I\/O management for graph computing. In Proceedings of the15th USENIX Conference on File and Storage Technologies (FAST\u201917). 285\u2013300."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3064176.3064191"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113298"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2008.11.001"},{"key":"e_1_3_2_57_2","first-page":"548","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201912)","author":"McAuley Julian J.","year":"2012","unstructured":"Julian J. McAuley and Jure Leskovec. 2012. Learning to discover social circles in ego networks.. In Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201912). 548\u201356."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-48054-0_39"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.0020173"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/1353343.1353418"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/2983551"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1561\/0400000002"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611975482.111"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0012-365X(00)00224-7"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1101\/gr.213959.116"},{"key":"e_1_3_2_66_2","volume-title":"OpenMP Application Programming Interface","author":"OpenMP Architecture Review Board","year":"2018","unstructured":"OpenMP Architecture Review Board 2018. OpenMP Application Programming Interface (5th ed.). OpenMP Architecture Review Board."},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457313"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.9"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/98267.98287"},{"key":"e_1_3_2_70_2","doi-asserted-by":"crossref","unstructured":"Matei Ripeanu and Ian Foster. 2002. Mapping the gnutella network: Macroscopic properties of large-scale peer-to-peer systems. Peer-to-Peer Systems Peter Druschel Frans Kaashoek and Antony Rowstron (Eds.). Springer Berlin Heidelberg Berlin Heidelberg 85\u201393.","DOI":"10.1007\/3-540-45748-8_8"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v29i1.9277"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522740"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1145\/3186728.3164139"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-58667-0_6"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-43659-3_24"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526146"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/1687399.1687477"},{"key":"e_1_3_2_78_2","volume-title":"Graph Clustering by Flow Simulation","author":"Dongen Stijn Marinus Van","year":"2000","unstructured":"Stijn Marinus Van Dongen. 2000. Graph Clustering by Flow Simulation. Ph.D. Dissertation. University Utrecht."},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/384192.384193"},{"key":"e_1_3_2_80_2","first-page":"429","volume-title":"Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC\u201919)","author":"Vora Keval","year":"2019","unstructured":"Keval Vora. 2019. LUMOS: Dependency-driven disk-based graph processing. In Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC\u201919). 429\u2013442."},{"key":"e_1_3_2_81_2","first-page":"507","volume-title":"Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC\u201916)","author":"Vora Keval","year":"2016","unstructured":"Keval Vora, Guoqing Xu, and Rajiv Gupta. 2016. Load the edges you need: A generic I\/O optimization for disk-based graph processing. In Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC\u201916). 507\u2013522."},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00541-4"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-10-169"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2518664"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173208"},{"key":"e_1_3_2_86_2","first-page":"45","volume-title":"Proceedings of the13th USENIX Conference on File and Storage Technologies (FAST\u201915)","author":"Zheng Da","year":"2015","unstructured":"Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, and Alexander S. Szalay. 2015. FlashGraph: Processing billion-node graphs on an array of commodity SSDs. In Proceedings of the13th USENIX Conference on File and Storage Technologies (FAST\u201915). 45\u201358."},{"key":"e_1_3_2_87_2","first-page":"375","volume-title":"Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201915)","author":"Zhu Xiaowei","year":"2015","unstructured":"Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201915). 375\u2013386."}],"container-title":["ACM Transactions on Database Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643846","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643846","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:58Z","timestamp":1750291438000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643846"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,16]]},"references-count":86,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1145\/3643846"],"URL":"https:\/\/doi.org\/10.1145\/3643846","relation":{},"ISSN":["0362-5915","1557-4644"],"issn-type":[{"value":"0362-5915","type":"print"},{"value":"1557-4644","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,16]]},"assertion":[{"value":"2023-03-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}