{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T09:18:42Z","timestamp":1773825522045,"version":"3.50.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,3]]},"abstract":"<jats:p>Subgraph counting is a fundamental problem in understanding and analyzing graph structured data, yet computationally challenging. This calls for an accurate and efficient algorithm for Subgraph Cardinality Estimation, which is to estimate the number of all isomorphic embeddings of a query graph in a data graph. We present FaST<jats:sub>est<\/jats:sub>, a novel algorithm that combines (1) a powerful filtering technique to significantly reduce the sample space, (2) an adaptive tree sampling algorithm for accurate and efficient estimation, and (3) a worst-case optimal stratified graph sampling algorithm for hard instances. Extensive experiments on real-world datasets show that FaST<jats:sub>est<\/jats:sub>outperforms state-of-the-art sampling-based methods by up to two orders of magnitude and GNN-based methods by up to three orders of magnitude in terms of accuracy.<\/jats:p>","DOI":"10.14778\/3654621.3654635","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T22:21:08Z","timestamp":1717107668000},"page":"1697-1709","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Cardinality Estimation of Subgraph Matching: A Filtering-Sampling Approach"],"prefix":"10.14778","volume":"17","author":[{"given":"Wonseok","family":"Shin","sequence":"first","affiliation":[{"name":"Seoul National University"}]},{"given":"Siwoo","family":"Song","sequence":"additional","affiliation":[{"name":"Seoul National University"}]},{"given":"Kunsoo","family":"Park","sequence":"additional","affiliation":[{"name":"Seoul National University"}]},{"given":"Wook-Shin","family":"Han","sequence":"additional","affiliation":[{"name":"POSTECH"}]}],"member":"320","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2024. Boost C++ Libraries. Retrieved 2024-03-16 from http:\/\/www.boost.org\/"},{"key":"e_1_2_1_2_1","first-page":"1","article-title":"A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling","volume":"6","author":"Assadi Sepehr","year":"2018","unstructured":"Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. 2018. A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. In Proceedings of ITCS. 6:1--6:20.","journal-title":"Proceedings of ITCS."},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of FOCS. 739--748","author":"Atserias Albert","year":"2008","unstructured":"Albert Atserias, Martin Grohe, and D\u00e1niel Marx. 2008. Size bounds and query plans for relational joins. In Proceedings of FOCS. 739--748."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of ACM SIGKDD. 306--316","author":"Bera Suman K.","unstructured":"Suman K. Bera and C. Seshadhri. 2020. How to Count Triangles, without Seeing the Whole Graph. In Proceedings of ACM SIGKDD. 306--316."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of ACM SIGMOD. 1447--1462","author":"Bhattarai Bibek","unstructured":"Bibek Bhattarai, Hang Liu, and H. Howie Huang. 2019. CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching. In Proceedings of ACM SIGMOD. 1447--1462."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915236"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2016.2515595"},{"key":"e_1_2_1_8_1","volume-title":"A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinformatics","author":"Bonnici Vincenzo","year":"2013","unstructured":"Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, and Alfredo Ferro. 2013. A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinformatics (2013)."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the VLDB Endowment. 1651--1663","author":"Bressan Marco","year":"2019","unstructured":"Marco Bressan, Stefano Leucci, and Alessandro Panconesi. 2019. Motivo: Fast Motif Counting via Succinct Color Coding and Adaptive Sampling. In Proceedings of the VLDB Endowment. 1651--1663."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1214\/ss\/1009213286"},{"key":"e_1_2_1_11_1","unstructured":"George Casella and Roger L Berger. 2021. Statistical inference. Cengage Learning."},{"key":"e_1_2_1_12_1","volume-title":"Lui","author":"Chen Xiaowei","year":"2016","unstructured":"Xiaowei Chen and John C. S. Lui. 2016. Mining Graphlet Counts in Online Social Networks. In Proceedings of IEEE ICDM. 71--80."},{"key":"e_1_2_1_13_1","first-page":"1","article-title":"Random Sampling and Size Estimation Over Cyclic Joins","volume":"7","author":"Chen Yu","year":"2020","unstructured":"Yu Chen and Ke Yi. 2020. Random Sampling and Size Estimation Over Cyclic Joins. In Proceedings of ICDT. 7:1--7:18.","journal-title":"Proceedings of ICDT."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/26.4.404"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2004.75"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2274576.2274578"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.4153\/CJM-1956-045-5"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319880"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465300"},{"key":"e_1_2_1_20_1","volume-title":"Singh","author":"He Huahai","year":"2008","unstructured":"Huahai He and Ambuj K. Singh. 2008. Graphs-at-a-Time: Query Language and Access Methods for Graph Databases. In Proceedings of ACM SIGMOD. 405--418."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of ACM SIGKDD. 158--167","author":"Horv\u00e1th Tam\u00e1s","year":"2004","unstructured":"Tam\u00e1s Horv\u00e1th, Thomas G\u00e4rtner, and Stefan Wrobel. 2004. Cyclic Pattern Kernels for Predictive Graph Mining. In Proceedings of ACM SIGKDD. 158--167."},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","first-page":"1142","DOI":"10.1137\/S0097539793304601","article-title":"The Complexity of Planar Counting Problems","volume":"27","author":"Marathe Harry","year":"1998","unstructured":"Hunt, Harry B. and Marathe, Madhav V. and Radhakrishnan, Venkatesh and Stearns, Richard E. 1998. The Complexity of Planar Counting Problems. Siam Journal on Computing 27, 4 (1998), 1142--1167.","journal-title":"Siam Journal on Computing"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of ACM WWW. 441--449","author":"Jain Shweta","unstructured":"Shweta Jain and C. Seshadhri. 2017. A Fast and Provable Method for Estimating Clique Counts Using Turan's Theorem. In Proceedings of ACM WWW. 441--449."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457265"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the VLDB Endowment. 1238--1249","author":"Kim Jinha","year":"2015","unstructured":"Jinha Kim, Hyungyu Shin, Wook-Shin Han, Sungpack Hong, and Hassan Chafi. 2015. Taming subgraph isomorphism for RDF query processing. In Proceedings of the VLDB Endowment. 1238--1249."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of ACM SIGMOD. 964--976","author":"Kim Kyoungmin","year":"2021","unstructured":"Kyoungmin Kim, Hyeonji Kim, George Fletcher, and Wook Shin Han. 2021. Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation. In Proceedings of ACM SIGMOD. 964--976."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of ACM SIGMOD. 615--629","author":"Li Feifei","year":"2016","unstructured":"Feifei Li, Bin Wu, Ke Yi, and Zhuoyue Zhao. 2016. Wander Join: Online Aggregation via Random Walks. In Proceedings of ACM SIGMOD. 615--629."},{"key":"e_1_2_1_28_1","article-title":"k-Degenerate Graphs","volume":"22","author":"Lick Don R.","year":"1970","unstructured":"Don R. Lick and Arthur T. White. 1970. k-Degenerate Graphs. Canadian Journal of Mathematics 22, 5 (1970).","journal-title":"Canadian Journal of Mathematics"},{"key":"e_1_2_1_29_1","volume-title":"Sampling: design and analysis","author":"Lohr Sharon L","unstructured":"Sharon L Lohr. 2021. Sampling: design and analysis. CRC press."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of STACS. 542--553","author":"Marx D\u00e1niel","year":"2014","unstructured":"D\u00e1niel Marx and Michal Pilipczuk. 2014. Everything you always wanted to know about the parameterized complexity of Subgraph Isomorphism (but were afraid to ask). In Proceedings of STACS. 542--553."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the VLDB Endowment. 1692--1704","author":"Mhedhbi Amine","year":"2018","unstructured":"Amine Mhedhbi and Semih Salihoglu. 2018. Optimizing subgraph queries by combining binary and worstcase optimal joins. In Proceedings of the VLDB Endowment. 1692--1704."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the VLDB Endowment. 982--993","author":"Moerkotte Guido","year":"2009","unstructured":"Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors. In Proceedings of the VLDB Endowment. 982--993."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-11-96"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of IEEE ICDE. 984--994","author":"Neumann Thomas","year":"2011","unstructured":"Thomas Neumann and Guido Moerkotte. 2011. Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins. In Proceedings of IEEE ICDE. 984--994."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of ACM SIGMOD. 1099--1114","author":"Park Yeonsu","year":"2020","unstructured":"Yeonsu Park, Seongyun Ko, Sourav S. Bhowmick, Kyoungmin Kim, Kijae Hong, and Wook Shin Han. 2020. G-CARE: A Framework for Performance Benchmarking of Cardinality Estimation Techniques for Subgraph Matching. In Proceedings of ACM SIGMOD. 1099--1114."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1957.tb01515.x"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl301"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the VLDB Endowment. 420--431","author":"Sahu Siddhartha","year":"2017","unstructured":"Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and M. Tamer \u00d6zsu. 2017. The ubiquity of large graphs and surprising challenges of graph processing. In Proceedings of the VLDB Endowment. 420--431."},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the AISTATS. 488--495","author":"Shervashidze Nino","year":"2009","unstructured":"Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In Proceedings of the AISTATS. 488--495."},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Wonseok Shin Siwoo Song Kunsoo Park and Wook-Shin Han. 2023. Cardinality Estimation of Subgraph Matching: A Filtering-Sampling Approach. arXiv:2309.15433","DOI":"10.14778\/3654621.3654635"},{"key":"e_1_2_1_41_1","volume-title":"Kostylev","author":"Stefanoni Giorgio","year":"2018","unstructured":"Giorgio Stefanoni, Boris Motik, and Egor V. Kostylev. 2018. Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation. Proceedings of ACM WWW, 1043--1052."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380581"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.2980257"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2011.12.071"},{"key":"e_1_2_1_45_1","volume-title":"An Algorithm for Subgraph Isomorphism. J. ACM","author":"Ullmann J. R.","year":"1976","unstructured":"J. R. Ullmann. 1976. An Algorithm for Subgraph Isomorphism. J. ACM (1976), 31--42."},{"key":"e_1_2_1_46_1","unstructured":"Princeton University. 2010. About WordNet. Retrieved 2023-04-07 from https:\/\/wordnet.princeton.edu\/."},{"key":"e_1_2_1_47_1","volume-title":"Mohamed Zait, and Sunil P. Chakkappen.","author":"Vengerov David","year":"2015","unstructured":"David Vengerov, Andre Cavalheiro Menck, Mohamed Zait, and Sunil P. Chakkappen. 2015. Join Size Estimation Subject to Filter Conditions. In Proceedings of the VLDB Endowment. 1530--1541."},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of ACM SIGMOD. 160--175","author":"Wang Hanchen","year":"2022","unstructured":"Hanchen Wang, Rong Hu, Ying Zhang, Lu Qin, Wei Wang, and Wenjie Zhang. 2022. Neural Subgraph Counting with Wasserstein Estimator. In Proceedings of ACM SIGMOD. 160--175."},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of ACM WWW. 1191--1202","author":"Ye Xiaowei","year":"2022","unstructured":"Xiaowei Ye, Rong Hua Li, Qiangqiang Dai, Hongzhi Chen, and Guoren Wang. 2022. Lightning Fast and Space Efficient k-clique Counting. In Proceedings of ACM WWW. 1191--1202."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of ACM SIGMOD. 2142--2155","author":"Zhao Kangfei","year":"2021","unstructured":"Kangfei Zhao, Jeffrey Xu Yu, Hao Zhang, Qiyan Li, and Yu Rong. 2021. A Learned Sketch for Subgraph Counting. In Proceedings of ACM SIGMOD. 2142--2155."},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of ACM SIGMOD. 1525--1539","author":"Zhao Zhuoyue","year":"2018","unstructured":"Zhuoyue Zhao, Robert Christensen, Feifei Li, Xiao Hu, and Ke Yi. 2018. Random Sampling over Joins Revisited. In Proceedings of ACM SIGMOD. 1525--1539."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-6-8"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3654621.3654635","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T19:57:37Z","timestamp":1732132657000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3654621.3654635"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3]]},"references-count":52,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,3]]}},"alternative-id":["10.14778\/3654621.3654635"],"URL":"https:\/\/doi.org\/10.14778\/3654621.3654635","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,3]]},"assertion":[{"value":"2024-05-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}