{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T07:20:42Z","timestamp":1763018442451,"version":"3.41.0"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,1,11]],"date-time":"2022-01-11T00:00:00Z","timestamp":1641859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61972203, 62072166"],"award-info":[{"award-number":["61972203, 62072166"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu Province","doi-asserted-by":"crossref","award":["BK20190442"],"award-info":[{"award-number":["BK20190442"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2022,10,31]]},"abstract":"<jats:p>SimRank is an attractive link-based similarity measure used in fertile fields of Web search and sociometry. However, the existing deterministic method by Kusumoto et\u00a0al.\u00a0[<jats:xref ref-type=\"bibr\">24<\/jats:xref>] for retrieving SimRank does not always produce high-quality similarity results, as it fails to accurately obtain diagonal correction matrix\u00a0<jats:italic>D<\/jats:italic>. Moreover, SimRank has a \u201cconnectivity trait\u201d problem: increasing the number of paths between a pair of nodes would decrease its similarity score. The best-known remedy, SimRank++ [<jats:xref ref-type=\"bibr\">1<\/jats:xref>], cannot completely fix this problem, since its score would still be zero if there are no common in-neighbors between two nodes.<\/jats:p><jats:p>In this article, we study fast high-quality link-based similarity search on billion-scale graphs. (1) We first devise a \u201cvaried-<jats:italic>D<\/jats:italic>\u201d method to accurately compute SimRank in linear memory. We also aggregate duplicate computations, which reduces the time of [<jats:xref ref-type=\"bibr\">24<\/jats:xref>] from quadratic to linear in the number of iterations. (2) We propose a novel \u201ccosine-based\u201d SimRank model to circumvent the \u201cconnectivity trait\u201d problem. (3) To substantially speed up the partial-pairs \u201ccosine-based\u201d SimRank search on large graphs, we devise an efficient dimensionality reduction algorithm,<jats:sans-serif>PSR<jats:sup>#<\/jats:sup><\/jats:sans-serif>, with guaranteed accuracy. (4) We give mathematical insights to the semantic difference between SimRank and its variant, and correct an argument in [<jats:xref ref-type=\"bibr\">24<\/jats:xref>] that \u201cif<jats:italic>D<\/jats:italic>is replaced by a scaled identity matrix (1-\u0194)I, their top-K rankings will not be affected much\u201d. (5) We propose a novel method that can accurately convert from Li et\u00a0al.\u00a0 SimRank ~{S} to Jeh and Widom\u2019s SimRank<jats:italic>S<\/jats:italic>. (6) We propose<jats:sans-serif>GSR<jats:sup>#<\/jats:sup><\/jats:sans-serif>, a generalisation of our \u201ccosine-based\u201d SimRank model, to quantify pairwise similarities across two distinct graphs, unlike SimRank that would assess nodes across two graphs as completely dissimilar. Extensive experiments on various datasets demonstrate the superiority of our proposed approaches in terms of high search quality, computational efficiency, accuracy, and scalability on billion-edge graphs.<\/jats:p>","DOI":"10.1145\/3495209","type":"journal-article","created":{"date-parts":[[2022,1,17]],"date-time":"2022-01-17T06:04:22Z","timestamp":1642399462000},"page":"1-45","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Scaling High-Quality Pairwise Link-Based Similarity Retrieval on Billion-Edge Graphs"],"prefix":"10.1145","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1082-9475","authenticated-orcid":false,"given":"Weiren","family":"Yu","sequence":"first","affiliation":[{"name":"University of Warwick, Coventry, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9786-7257","authenticated-orcid":false,"given":"Julie","family":"McCann","sequence":"additional","affiliation":[{"name":"Imperial College, London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2721-6867","authenticated-orcid":false,"given":"Chengyuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Hunan University, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5181-4712","authenticated-orcid":false,"given":"Hakan","family":"Ferhatosmanoglu","sequence":"additional","affiliation":[{"name":"University of Warwick, Coventry, UK"}]}],"member":"320","published-online":{"date-parts":[[2022,1,11]]},"reference":[{"issue":"1","key":"e_1_3_2_2_2","article-title":"SimRank++: Query rewriting through link analysis of the click graph","volume":"1","author":"Antonellis Ioannis","year":"2008","unstructured":"Ioannis Antonellis, Hector Garcia Molina, and Chichao Chang. 2008. SimRank++: Query rewriting through link analysis of the click graph. PVLDB 1, 1 (2008), 408\u2013421.","journal-title":"PVLDB"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-31164-2","volume-title":"Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection","author":"Christen Peter","year":"2012","unstructured":"Peter Christen. 2012. Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer. DOI:https:\/\/doi.org\/10.1007\/978-3-642-31164-2"},{"doi-asserted-by":"publisher","key":"e_1_3_2_4_2","DOI":"10.1145\/352595.352598"},{"doi-asserted-by":"publisher","key":"e_1_3_2_5_2","DOI":"10.1145\/1277741.1277784"},{"doi-asserted-by":"publisher","key":"e_1_3_2_6_2","DOI":"10.1145\/3366423.3380081"},{"doi-asserted-by":"publisher","key":"e_1_3_2_7_2","DOI":"10.1137\/19M1245505"},{"doi-asserted-by":"publisher","key":"e_1_3_2_8_2","DOI":"10.1007\/978-3-540-30192-9_55"},{"doi-asserted-by":"publisher","key":"e_1_3_2_9_2","DOI":"10.1145\/1060745.1060839"},{"doi-asserted-by":"publisher","key":"e_1_3_2_10_2","DOI":"10.1145\/1059981.1059982"},{"doi-asserted-by":"publisher","key":"e_1_3_2_11_2","DOI":"10.1109\/ICDE.2013.6544858"},{"doi-asserted-by":"publisher","key":"e_1_3_2_12_2","DOI":"10.1145\/635484.635487"},{"doi-asserted-by":"publisher","key":"e_1_3_2_13_2","DOI":"10.1145\/2911451.2914715"},{"doi-asserted-by":"publisher","key":"e_1_3_2_14_2","DOI":"10.1145\/3297280.3297331"},{"doi-asserted-by":"publisher","key":"e_1_3_2_15_2","DOI":"10.1145\/1835804.1835874"},{"doi-asserted-by":"publisher","key":"e_1_3_2_16_2","DOI":"10.1016\/j.is.2013.12.008"},{"issue":"4","key":"e_1_3_2_17_2","first-page":"286","article-title":"Bibliographic coupling","volume":"23","author":"Hill Barbara M.","year":"1972","unstructured":"Barbara M. Hill and Anthony Debons. 1972. Bibliographic coupling. Journal of the Association for Information Science and Technology 23, 4 (1972), 286. DOI:https:\/\/doi.org\/10.1002\/asi.4630230413","journal-title":"Journal of the Association for Information Science and Technology"},{"doi-asserted-by":"publisher","key":"e_1_3_2_18_2","DOI":"10.1145\/2487736"},{"doi-asserted-by":"publisher","key":"e_1_3_2_19_2","DOI":"10.1145\/775047.775126"},{"doi-asserted-by":"publisher","key":"e_1_3_2_20_2","DOI":"10.1145\/775152.775191"},{"doi-asserted-by":"publisher","key":"e_1_3_2_21_2","DOI":"10.14778\/3099622.3099625"},{"doi-asserted-by":"publisher","key":"e_1_3_2_22_2","DOI":"10.1145\/2020408.2020561"},{"doi-asserted-by":"publisher","key":"e_1_3_2_23_2","DOI":"10.1073\/pnas.1611275114"},{"doi-asserted-by":"publisher","key":"e_1_3_2_24_2","DOI":"10.1145\/1852102.1852104"},{"doi-asserted-by":"publisher","key":"e_1_3_2_25_2","DOI":"10.1145\/2588555.2610526"},{"doi-asserted-by":"publisher","key":"e_1_3_2_26_2","DOI":"10.1109\/ICDE.2012.109"},{"doi-asserted-by":"publisher","key":"e_1_3_2_27_2","DOI":"10.1145\/382979.383041"},{"doi-asserted-by":"publisher","key":"e_1_3_2_28_2","DOI":"10.1145\/1150402.1150479"},{"doi-asserted-by":"publisher","key":"e_1_3_2_29_2","DOI":"10.1145\/1739041.1739098"},{"doi-asserted-by":"publisher","key":"e_1_3_2_30_2","DOI":"10.1137\/1.9781611972801.50"},{"doi-asserted-by":"publisher","key":"e_1_3_2_31_2","DOI":"10.1145\/3312528"},{"doi-asserted-by":"publisher","key":"e_1_3_2_32_2","DOI":"10.1145\/1135777.1135994"},{"doi-asserted-by":"publisher","key":"e_1_3_2_33_2","DOI":"10.1007\/s10115-011-0427-z"},{"doi-asserted-by":"publisher","key":"e_1_3_2_34_2","DOI":"10.1007\/s00778-009-0168-8"},{"doi-asserted-by":"publisher","key":"e_1_3_2_35_2","DOI":"10.1016\/j.ins.2020.12.046"},{"doi-asserted-by":"publisher","key":"e_1_3_2_36_2","DOI":"10.1109\/TKDE.2019.2893175"},{"doi-asserted-by":"publisher","key":"e_1_3_2_37_2","DOI":"10.14778\/3384345.3384347"},{"doi-asserted-by":"publisher","key":"e_1_3_2_38_2","DOI":"10.1002\/asi.4630240406"},{"doi-asserted-by":"publisher","key":"e_1_3_2_39_2","DOI":"10.14778\/2735508.2735520"},{"doi-asserted-by":"publisher","key":"e_1_3_2_40_2","DOI":"10.1145\/2882903.2915243"},{"doi-asserted-by":"publisher","key":"e_1_3_2_41_2","DOI":"10.1109\/ICDM.2006.70"},{"doi-asserted-by":"publisher","key":"e_1_3_2_42_2","DOI":"10.1145\/3318464.3389781"},{"doi-asserted-by":"publisher","key":"e_1_3_2_43_2","DOI":"10.1145\/3299869.3319873"},{"doi-asserted-by":"publisher","key":"e_1_3_2_44_2","DOI":"10.1145\/1076034.1076059"},{"doi-asserted-by":"publisher","key":"e_1_3_2_45_2","DOI":"10.1145\/1059981.1059984"},{"key":"e_1_3_2_46_2","first-page":"37","volume-title":"Proceedings of the EDBT","author":"Youngmann Brit","year":"2019","unstructured":"Brit Youngmann, Tova Milo, and Amit Somech. 2019. Boosting SimRank with semantics. In Proceedings of the EDBT. 37\u201348. DOI:https:\/\/doi.org\/10.5441\/002\/edbt.2019.05"},{"key":"e_1_3_2_47_2","first-page":"1","article-title":"RoleSim*: Scaling axiomatic role-based similarity ranking on large graphs","author":"Yu Weiren","year":"2021","unstructured":"Weiren Yu, Sima Iranmanesh, Aparajita Haldar, Maoyin Zhang, and Hakan Ferhatosmanoglu. 2021. RoleSim*: Scaling axiomatic role-based similarity ranking on large graphs. World Wide Web (2021), 1\u201345. https:\/\/www.springerprofessional.de\/en\/rolesim-scaling-axiomatic-role-based-similarity-ranking-on-large\/19563864.","journal-title":"World Wide Web"},{"key":"e_1_3_2_48_2","volume-title":"Proceedings of the 2013 IEEE 29th International Conference on Data Engineering","author":"Yu Weiren","year":"2013","unstructured":"Weiren Yu, Xuemin Lin, and Wenjie Zhang. 2013. Towards efficient SimRank computation on large networks. In Proceedings of the 2013 IEEE 29th International Conference on Data Engineering."},{"doi-asserted-by":"publisher","key":"e_1_3_2_49_2","DOI":"10.1109\/ICDE.2014.6816660"},{"doi-asserted-by":"publisher","key":"e_1_3_2_50_2","DOI":"10.1007\/s00778-018-0536-3"},{"key":"e_1_3_2_51_2","volume-title":"Scaling High-Quality Pairwise Link-Based Similarity Retrieval on Billion-Edge Graphs","author":"Yu Weiren","year":"2021","unstructured":"Weiren Yu, Julie McCann, and Chengyuan Zhang. 2021. Scaling High-Quality Pairwise Link-Based Similarity Retrieval on Billion-Edge Graphs. Technical Report 4. Retrieved from https:\/\/warwick.ac.uk\/fac\/sci\/dcs\/people\/weiren_yu\/tois_2021_tech_rep.pdf."},{"doi-asserted-by":"publisher","key":"e_1_3_2_52_2","DOI":"10.1145\/2600428.2609459"},{"doi-asserted-by":"publisher","key":"e_1_3_2_53_2","DOI":"10.14778\/2735479.2735489"},{"doi-asserted-by":"publisher","key":"e_1_3_2_54_2","DOI":"10.1145\/2766462.2767720"},{"doi-asserted-by":"publisher","key":"e_1_3_2_55_2","DOI":"10.1109\/ICDM.2016.0070"},{"doi-asserted-by":"publisher","key":"e_1_3_2_56_2","DOI":"10.1145\/3368616"},{"doi-asserted-by":"publisher","key":"e_1_3_2_57_2","DOI":"10.1145\/3178876.3186126"},{"doi-asserted-by":"publisher","key":"e_1_3_2_58_2","DOI":"10.1145\/1645953.1646025"},{"doi-asserted-by":"publisher","key":"e_1_3_2_59_2","DOI":"10.1145\/3083899"},{"doi-asserted-by":"publisher","key":"e_1_3_2_60_2","DOI":"10.14778\/2536349.2536350"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3495209","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3495209","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:02Z","timestamp":1750191122000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3495209"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,11]]},"references-count":59,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,10,31]]}},"alternative-id":["10.1145\/3495209"],"URL":"https:\/\/doi.org\/10.1145\/3495209","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2022,1,11]]},"assertion":[{"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}