{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:19:14Z","timestamp":1740122354278,"version":"3.37.3"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T00:00:00Z","timestamp":1657929600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T00:00:00Z","timestamp":1657929600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001821","name":"Vienna Science and Technology Fund","doi-asserted-by":"publisher","award":["VRG19-009"],"award-info":[{"award-number":["VRG19-009"]}],"id":[{"id":"10.13039\/501100001821","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["SFB 876"],"award-info":[{"award-number":["SFB 876"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2022,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The graph edit distance is an intuitive measure to quantify the dissimilarity of graphs, but its computation is <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\mathsf {NP}$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>NP<\/mml:mi>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>-hard and challenging in practice. We introduce methods for answering nearest neighbor and range queries regarding this distance efficiently for large databases with up to millions of graphs. We build on the filter-verification paradigm, where lower and upper bounds are used to reduce the number of exact computations of the graph edit distance. Highly effective bounds for this involve solving a linear assignment problem for each graph in the database, which is prohibitive in massive datasets. Index-based approaches typically provide only weak bounds leading to high computational costs verification. In this work, we derive novel lower bounds for efficient filtering from restricted assignment problems, where the cost function is a tree metric. This special case allows embedding the costs of optimal assignments isometrically into <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\ell _1$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:msub>\n                    <mml:mi>\u2113<\/mml:mi>\n                    <mml:mn>1<\/mml:mn>\n                  <\/mml:msub>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> space, rendering efficient indexing possible. We propose several lower bounds of the graph edit distance obtained from tree metrics reflecting the edit costs, which are combined for effective filtering. Our method termed <jats:italic>EmbAssi<\/jats:italic> can be integrated into existing filter-verification pipelines as a fast and effective pre-filtering step. Empirically we show that for many real-world graphs our lower bounds are already close to the exact graph edit distance, while our index construction and search scales to very large databases.<\/jats:p>","DOI":"10.1007\/s10618-022-00850-3","type":"journal-article","created":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T05:02:48Z","timestamp":1657947768000},"page":"1728-1755","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["EmbAssi: embedding assignment costs for similarity search in large graph databases"],"prefix":"10.1007","volume":"36","author":[{"given":"Franka","family":"Bause","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9143-4880","authenticated-orcid":false,"given":"Erich","family":"Schubert","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2645-947X","authenticated-orcid":false,"given":"Nils M.","family":"Kriege","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,7,16]]},"reference":[{"key":"850_CR1","unstructured":"Backurs A, Dong Y, Indyk P, Razenshteyn I, Wagner T (2020) Scalable nearest neighbor search for optimal transport. In: Int. Conf. Machine Learning, ICML, 119, 497\u2013506"},{"key":"850_CR2","doi-asserted-by":"publisher","unstructured":"Bai Y, Ding H, Bian S, Chen T, Sun Y, Wang W (2019) SimGNN: A neural network approach to fast graph similarity computation. In: ACM International Conference on Web Search and Data Mining, WSDM. https:\/\/doi.org\/10.1145\/3289600.3290967","DOI":"10.1145\/3289600.3290967"},{"key":"850_CR3","doi-asserted-by":"publisher","unstructured":"Bause F, Blumenthal DB, Schubert E, Kriege NM (2021) Metric indexing for graph similarity search. In: SISAP 2021. Lecture Notes in Computer Science, vol. 13058 https:\/\/doi.org\/10.1007\/978-3-030-89657-7_24","DOI":"10.1007\/978-3-030-89657-7_24"},{"key":"850_CR4","doi-asserted-by":"publisher","unstructured":"Beygelzimer A, Kakade SM, Langford J (2006) Cover trees for nearest neighbor. In: Int. Conf. Machine Learning, ICML, vol. 148. https:\/\/doi.org\/10.1145\/1143844.1143857","DOI":"10.1145\/1143844.1143857"},{"issue":"1","key":"850_CR5","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1007\/s00778-019-00544-1","volume":"29","author":"D Blumenthal","year":"2019","unstructured":"Blumenthal D, Boria N, Gamper J, Bougleux S, Brun L (2019) Comparing heuristics for graph edit distance computation. VLDB J 29(1):419\u2013458. https:\/\/doi.org\/10.1007\/s00778-019-00544-1","journal-title":"VLDB J"},{"key":"850_CR6","unstructured":"Bock HH (1974) Automatische Klassifikation. Vandenhoeck & Ruprecht, ???"},{"key":"850_CR7","doi-asserted-by":"publisher","unstructured":"Burkard RE, Dell\u2019Amico M, Martello S (2012) Assignment Problems. SIAM, ???. https:\/\/doi.org\/10.1137\/1.9781611972238","DOI":"10.1137\/1.9781611972238"},{"key":"850_CR8","doi-asserted-by":"publisher","unstructured":"Chang L, Feng X, Lin X, Qin L, Zhang W, Ouyang D (2020) Speeding up GED verification for graph similarity search. In: Int. Conf. Data Engineering, ICDE, pp. 793\u2013804. https:\/\/doi.org\/10.1109\/ICDE48307.2020.00074","DOI":"10.1109\/ICDE48307.2020.00074"},{"key":"850_CR9","doi-asserted-by":"publisher","first-page":"762","DOI":"10.1016\/j.knosys.2018.10.002","volume":"163","author":"X Chen","year":"2019","unstructured":"Chen X, Huo H, Huan J, Vitter JS (2019) An efficient algorithm for graph edit distance computation. Knowl-Based Syst 163:762\u2013775. https:\/\/doi.org\/10.1016\/j.knosys.2018.10.002","journal-title":"Knowl-Based Syst"},{"key":"850_CR10","doi-asserted-by":"publisher","unstructured":"Duan R, Su H-H (2012) A scaling algorithm for maximum weight matching in bipartite graphs. In: Symposium on Discrete Algorithms, SODA https:\/\/doi.org\/10.1137\/1.9781611973099.111","DOI":"10.1137\/1.9781611973099.111"},{"key":"850_CR11","unstructured":"Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Int. Conf. Knowledge Discovery and Data Mining (KDD), pp. 226\u2013231"},{"issue":"4","key":"850_CR12","doi-asserted-by":"publisher","first-page":"1410","DOI":"10.1021\/acs.jcim.8b00820","volume":"59","author":"C Garcia-Hernandez","year":"2019","unstructured":"Garcia-Hernandez C, Fern\u00e1ndez A, Serratosa F (2019) Ligand-based virtual screening using graph edit distance as molecular similarity measure. J Chem Inf Model 59(4):1410\u20131421. https:\/\/doi.org\/10.1021\/acs.jcim.8b00820","journal-title":"J Chem Inf Model"},{"issue":"D1","key":"850_CR13","doi-asserted-by":"publisher","first-page":"945","DOI":"10.1093\/nar\/gkw1074","volume":"45","author":"A Gaulton","year":"2016","unstructured":"Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibri\u00e1n-Uhalte E, Davies M, Dedman N, Karlsson A, Magari\u00f1os MP, Overington JP, Papadatos G, Smit I, Leach AR (2016) The ChEMBL database in 2017. Nucleic Acids Res 45(D1):945\u2013954. https:\/\/doi.org\/10.1093\/nar\/gkw1074","journal-title":"Nucleic Acids Res"},{"key":"850_CR14","doi-asserted-by":"publisher","unstructured":"Gouda K, Hassaan M (2016) CSI_GED: An efficient approach for graph edit similarity computation. In: Int. Conf. Data Engineering, ICDE. https:\/\/doi.org\/10.1109\/ICDE.2016.7498246","DOI":"10.1109\/ICDE.2016.7498246"},{"key":"850_CR15","doi-asserted-by":"publisher","unstructured":"Kim J, Choi D, Li C: Inves: Incremental partitioning-based verification for graph similarity search. In: EDBT, pp. 229\u2013240 (2019). https:\/\/doi.org\/10.5441\/002\/edbt.2019.21","DOI":"10.5441\/002\/edbt.2019.21"},{"key":"850_CR16","unstructured":"Kriege NM, Fey M, Fisseler D, Mutzel P, Weichert F (2018) Recognizing cuneiform signs using graph based methods. In: Int. Workshop on Cost-Sensitive Learning, COST@SDM. PMLR, 88"},{"key":"850_CR17","doi-asserted-by":"publisher","unstructured":"Kriege NM, Giscard P, Bause F, Wilson RC: Computing optimal assignments in linear time for approximate graph matching. In: ICDM, pp. 349\u2013358 (2019). https:\/\/doi.org\/10.1109\/ICDM.2019.00045","DOI":"10.1109\/ICDM.2019.00045"},{"key":"850_CR18","unstructured":"Kriege NM, Giscard P, Wilson RC. (2016) On valid optimal assignment kernels and applications to graph classification. In: Advances in Neural Information Processing Systems, pp. 1615\u20131623"},{"issue":"1","key":"850_CR19","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1007\/s41109-019-0195-3","volume":"5","author":"NM Kriege","year":"2020","unstructured":"Kriege NM, Johansson FD, Morris C (2020) A survey on graph kernels. Appl. Netw. Sci. 5(1):6. https:\/\/doi.org\/10.1007\/s41109-019-0195-3","journal-title":"Appl. Netw. Sci."},{"key":"850_CR20","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1016\/j.patcog.2017.07.029","volume":"72","author":"J Lerouge","year":"2017","unstructured":"Lerouge J, Abu-Aisheh Z, Raveaux R, H\u00e9roux P, Adam S (2017) New binary linear programming formulation to compute the graph edit distance. Pattern Recognit 72:254\u2013265. https:\/\/doi.org\/10.1016\/j.patcog.2017.07.029","journal-title":"Pattern Recognit"},{"key":"850_CR21","unstructured":"Le T, Yamada M, Fukumizu K, Cuturi M (2019) Tree-sliced variants of Wasserstein distances. In: Neural Information Processing Systems"},{"key":"850_CR22","doi-asserted-by":"publisher","unstructured":"Liang Y, Zhao P (2017) Similarity search in graph databases: A multi-layered indexing approach. In: Int. Conf. Data Engineering, ICDE. https:\/\/doi.org\/10.1109\/ICDE.2017.129","DOI":"10.1109\/ICDE.2017.129"},{"key":"850_CR23","unstructured":"Li Y, Gu C, Dullien T, Vinyals O, Kohli P (2019) Graph matching networks for learning the similarity of graph structured objects. In: ICML"},{"key":"850_CR24","unstructured":"Morris C, Kriege NM, Bause F, Kersting K, Mutzel P, Neumann, M (2020) TUDataset: A collection of benchmark datasets for learning with graphs. In: ICML Workshop on Graph Representation Learning and Beyond, GRL+"},{"issue":"1","key":"850_CR25","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1137\/0105003","volume":"5","author":"JR Munkres","year":"1957","unstructured":"Munkres JR (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32\u201338","journal-title":"J Soc Ind Appl Math"},{"issue":"8","key":"850_CR26","doi-asserted-by":"publisher","first-page":"1358","DOI":"10.1021\/ci100132g","volume":"50","author":"R Nasr","year":"2010","unstructured":"Nasr R, Hirschberg DS, Baldi P (2010) Hashing algorithms and data structures for rapid searches of fingerprint vectors. J Chem Inf Model 50(8):1358\u20131368. https:\/\/doi.org\/10.1021\/ci100132g","journal-title":"J Chem Inf Model"},{"key":"850_CR27","doi-asserted-by":"publisher","unstructured":"Neuhaus M, Riesen K, Bunke H (2006) Fast suboptimal algorithms for the computation of graph edit distance. In: Structural, Syntactic, and Statistical Pattern Recognition, pp. 163\u2013172. https:\/\/doi.org\/10.1007\/11815921_17","DOI":"10.1007\/11815921_17"},{"key":"850_CR28","doi-asserted-by":"crossref","unstructured":"Qin Z, Bai Y, Sun Y (2020) GHashing: Semantic graph hashing for approximate similarity search in graph databases. In: ACM SIGKDD, pp. 2062\u20132072","DOI":"10.1145\/3394486.3403257"},{"issue":"7","key":"850_CR29","doi-asserted-by":"publisher","first-page":"950","DOI":"10.1016\/j.imavis.2008.04.004","volume":"27","author":"K Riesen","year":"2009","unstructured":"Riesen K, Bunke H (2009) Approximate graph edit distance computation by means of bipartite graph matching. Image Vision Comput 27(7):950\u2013959. https:\/\/doi.org\/10.1016\/j.imavis.2008.04.004","journal-title":"Image Vision Comput"},{"key":"850_CR30","doi-asserted-by":"crossref","unstructured":"Riesen K, Ferrer M, Fischer A, Bunke H: Approximation of graph edit distance in quadratic time. In: Graph-Based Representations in Pattern Recognition, pp. 3\u201312 (2015)","DOI":"10.1007\/978-3-319-18224-7_1"},{"issue":"1","key":"850_CR31","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1007\/s10618-012-0300-z","volume":"28","author":"E Schubert","year":"2014","unstructured":"Schubert E, Zimek A, Kriegel H (2014) Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Discov 28(1):190\u2013237. https:\/\/doi.org\/10.1007\/s10618-012-0300-z","journal-title":"Data Min Knowl Discov"},{"key":"850_CR32","unstructured":"Schubert E, Zimek A (2019) ELKI: A large open-source library for data analysis - ELKI release 0.7.5 Heidelberg. CoRR arXiv: abs\/1902.03616"},{"key":"850_CR33","doi-asserted-by":"publisher","unstructured":"Seidl T, Kriegel H (1998) Optimal multi-step k-nearest neighbor search. In: SIGMOD Int. Conf. Management of Data, pp. 154\u2013165. https:\/\/doi.org\/10.1145\/276304.276319","DOI":"10.1145\/276304.276319"},{"key":"850_CR34","doi-asserted-by":"crossref","unstructured":"Seidl M, Wieser E, Zeppelzauer M, Pinz A, Breiteneder C (2015) Graph-based shape similarity of petroglyphs. In: ECCV Workshops Computer Vision, pp. 133\u2013148","DOI":"10.1007\/978-3-319-16178-5_9"},{"key":"850_CR35","unstructured":"Semple C, Steel M (2003) Phylogenetics. Oxford lecture series in mathematics and its applications. Oxford University Press, ???"},{"issue":"1","key":"850_CR36","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1093\/comjnl\/16.1.30","volume":"16","author":"R Sibson","year":"1973","unstructured":"Sibson R (1973) SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal 16(1):30\u201334. https:\/\/doi.org\/10.1093\/comjnl\/16.1.30","journal-title":"The Computer Journal"},{"key":"850_CR37","doi-asserted-by":"publisher","unstructured":"St\u00f6cker BK, Sch\u00e4fer T, Mutzel P, K\u00f6ster J, Kriege, NM, Rahmann S (2019) Protein complex similarity based on Weisfeiler-Lehman labeling. In: 12th Int. Conf. Similarity Search and Applications, SISAP, 11807, 308\u2013322. https:\/\/doi.org\/10.1007\/978-3-030-32047-8_27","DOI":"10.1007\/978-3-030-32047-8_27"},{"issue":"3","key":"850_CR38","doi-asserted-by":"publisher","first-page":"440","DOI":"10.1109\/TKDE.2010.28","volume":"24","author":"G Wang","year":"2012","unstructured":"Wang G, Wang B, Yang X, Yu G (2012) Efficiently indexing large sparse graphs for similarity search. IEEE Trans Knowl Data Eng 24(3):440\u2013451. https:\/\/doi.org\/10.1109\/TKDE.2010.28","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"850_CR39","doi-asserted-by":"publisher","unstructured":"Wang X, Ding X, Tung A, Ying S, Jin H (2012) An efficient graph indexing method. In: Int. Conf. Data Engineering, ICDE https:\/\/doi.org\/10.1109\/ICDE.2012.28","DOI":"10.1109\/ICDE.2012.28"},{"issue":"1","key":"850_CR40","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/TNNLS.2020.2978386","volume":"32","author":"Z Wu","year":"2021","unstructured":"Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS (2021) A comprehensive survey on graph neural networks. IEEE Trans Neural Networks Learn Syst 32(1):4\u201324. https:\/\/doi.org\/10.1109\/TNNLS.2020.2978386","journal-title":"IEEE Trans Neural Networks Learn Syst"},{"key":"850_CR41","unstructured":"Xiao B, Cheng J, Hancock ER (2013) Graph-based Methods in Computer Vision: Developments and Applications. Premier reference source. Information Science Reference, ???"},{"key":"850_CR42","doi-asserted-by":"publisher","unstructured":"Yang L, Zou L (2021) Noah: Neural-optimized A* search algorithm for graph edit distance computation. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 576\u2013587. https:\/\/doi.org\/10.1109\/ICDE51399.2021.00056","DOI":"10.1109\/ICDE51399.2021.00056"},{"issue":"1","key":"850_CR43","doi-asserted-by":"publisher","first-page":"25","DOI":"10.14778\/1687627.1687631","volume":"2","author":"Z Zeng","year":"2009","unstructured":"Zeng Z, Tung AKH, Wang J, Feng J, Zhou L (2009) Comparing stars: On approximating graph edit distance. Proc. VLDB Endow. 2(1):25\u201336. https:\/\/doi.org\/10.14778\/1687627.1687631","journal-title":"Proc. VLDB Endow."},{"issue":"3","key":"850_CR44","doi-asserted-by":"publisher","first-page":"169","DOI":"10.14778\/2732232.2732236","volume":"7","author":"X Zhao","year":"2013","unstructured":"Zhao X, Xiao C, Lin X, Liu Q, Zhang W (2013) A partition-based approach to structure similarity search. Proc VLDB Endow 7(3):169\u2013180. https:\/\/doi.org\/10.14778\/2732232.2732236","journal-title":"Proc VLDB Endow"},{"key":"850_CR45","doi-asserted-by":"publisher","unstructured":"Zhao X, Xiao C, Lin X, Wang W (2012) Efficient graph similarity joins with edit distance constraints. In: Int. Conf. Data Engineering, ICDE https:\/\/doi.org\/10.1109\/ICDE.2012.91","DOI":"10.1109\/ICDE.2012.91"},{"issue":"4","key":"850_CR46","doi-asserted-by":"publisher","first-page":"964","DOI":"10.1109\/TKDE.2014.2349924","volume":"27","author":"W Zheng","year":"2015","unstructured":"Zheng W, Zou L, Lian X, Wang D, Zhao D (2015) Efficient graph similarity search over large graph databases. IEEE Trans Knowl Data Eng 27(4):964\u2013978. https:\/\/doi.org\/10.1109\/TKDE.2014.2349924","journal-title":"IEEE Trans Knowl Data Eng"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00850-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-022-00850-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00850-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,6]],"date-time":"2022-10-06T10:10:40Z","timestamp":1665051040000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-022-00850-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,16]]},"references-count":46,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["850"],"URL":"https:\/\/doi.org\/10.1007\/s10618-022-00850-3","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"type":"print","value":"1384-5810"},{"type":"electronic","value":"1573-756X"}],"subject":[],"published":{"date-parts":[[2022,7,16]]},"assertion":[{"value":"8 October 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 June 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 July 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}