{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T14:53:15Z","timestamp":1777733595968,"version":"3.51.4"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,10,23]],"date-time":"2015-10-23T00:00:00Z","timestamp":1445558400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Linkage Project","award":["LP100200079"],"award-info":[{"award-number":["LP100200079"]}]},{"name":"Funnelback Pty. Ltd."},{"name":"Veda"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2015,10,26]]},"abstract":"<jats:p>Real-time Entity Resolution (ER) is the process of matching query records in subsecond time with records in a database that represent the same real-world entity. Indexing techniques are generally used to efficiently extract a set of candidate records from the database that are similar to a query record, and that are to be compared with the query record in more detail. The sorted neighborhood indexing method, which sorts a database and compares records within a sliding window, has been successfully used for ER of large static databases. However, because it is based on static sorted arrays and is designed for batch ER that resolves all records in a database rather than resolving those relating to a single query record, this technique is not suitable for real-time ER on dynamic databases that are constantly updated. We propose a tree-based technique that facilitates dynamic indexing based on the sorted neighborhood method, which can be used for real-time ER, and investigate both static and adaptive window approaches. We propose an approach to reduce query matching times by precalculating the similarities between attribute values stored in neighboring tree nodes. We also propose a multitree solution where different sorting keys are used to reduce the effects of errors and variations in attribute values on matching quality by building several distinct index trees. We experimentally evaluate our proposed techniques on large real datasets, as well as on synthetic data with different data quality characteristics. Our results show that as the index grows, no appreciable increase occurs in both record insertion and query times, and that using multiple trees gives noticeable improvements on matching quality with only a small increase in query time. Compared to earlier indexing techniques for real-time ER, our approach achieves significantly reduced indexing and query matching times while maintaining high matching accuracy.<\/jats:p>","DOI":"10.1145\/2816821","type":"journal-article","created":{"date-parts":[[2015,10,24]],"date-time":"2015-10-24T18:27:12Z","timestamp":1445711232000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution"],"prefix":"10.1145","volume":"6","author":[{"given":"Banda","family":"Ramadan","sequence":"first","affiliation":[{"name":"Australian National University, Canberra, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Christen","sequence":"additional","affiliation":[{"name":"Australian National University, Canberra, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huizhi","family":"Liang","sequence":"additional","affiliation":[{"name":"Australian National University, Canberra, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ross W.","family":"Gayler","sequence":"additional","affiliation":[{"name":"Veda, Melbourne, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,10,23]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"263","article-title":"An information organization algorithm","volume":"146","author":"Adelson-Velskii Georgy Maximovic","year":"1962","unstructured":"Georgy Maximovic Adelson-Velskii and Evgenii Mikhailovich Landis . 1962 . An information organization algorithm . In Doklady Akademia Nauk SSSR , Vol. 146. 263 -- 266 . Georgy Maximovic Adelson-Velskii and Evgenii Mikhailovich Landis. 1962. An information organization algorithm. In Doklady Akademia Nauk SSSR, Vol. 146. 263--266.","journal-title":"Doklady Akademia Nauk SSSR"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1105926.1106227"},{"key":"e_1_2_1_3_1","volume-title":"SIGKDD Workshops. ACM, 25--27","author":"Baxter Rohan","year":"2003","unstructured":"Rohan Baxter , Peter Christen , and Tim Churches . 2003 . A comparison of fast blocking methods for record linkage . In SIGKDD Workshops. ACM, 25--27 . Rohan Baxter, Peter Christen, and Tim Churches. 2003. A comparison of fast blocking methods for record linkage. In SIGKDD Workshops. ACM, 25--27."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622637.1622653"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31164-2"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.127"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646173"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-01307-2_47"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/356770.356776"},{"key":"e_1_2_1_10_1","volume-title":"Leiserson","author":"Cormen Thomas H.","year":"2009","unstructured":"Thomas H. Cormen , Clifford Stein , Ronald L. Rivest , and Charles E . Leiserson . 2009 . Introduction to Algorithms (3rd ed.). McGraw-Hill Higher Education . Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. 2009. Introduction to Algorithms (3rd ed.). McGraw-Hill Higher Education."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.134"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544914"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.20"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.9"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/320083.320092"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/645925.671516"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732939.2732943"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687771"},{"key":"e_1_2_1_19_1","volume-title":"Ullman","author":"Hector Garcia-Molina Jennifer Widom","year":"2009","unstructured":"Jennifer Widom Hector Garcia-Molina and Jeffrey D . Ullman . 2009 . Database Systems : The Complete Book (2nd ed.). Pearson Prentice Hall . Jennifer Widom Hector Garcia-Molina and Jeffrey D. Ullman. 2009. Database Systems: The Complete Book (2nd ed.). Pearson Prentice Hall."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/223784.223807"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009761603038"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920898"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/789081.789250"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1739041.1739104"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 12th International Conference on Very Large Data Bases (VLDB\u201986)","author":"Tobin","unstructured":"Tobin J. Lehman and Michael J. Carey. 1986. A study of index structures for main memory database management systems . In Proceedings of the 12th International Conference on Very Large Data Bases (VLDB\u201986) . Kyoto. Tobin J. Lehman and Michael J. Carey. 1986. A study of index structures for main memory database management systems. In Proceedings of the 12th International Conference on Very Large Data Bases (VLDB\u201986). Kyoto."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 6th International Conference on Very Large Data Bases (VLDB\u201980)","volume":"6","author":"Litwin Witold","year":"1980","unstructured":"Witold Litwin . 1980 . Linear hashing: A new tool for file and table addressing . In Proceedings of the 6th International Conference on Very Large Data Bases (VLDB\u201980) , Vol. 6 . 1--3. Witold Litwin. 1980. Linear hashing: A new tool for file and table addressing. In Proceedings of the 6th International Conference on Very Large Data Bases (VLDB\u201980), Vol. 6. 1--3."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/347090.347123"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/1841211"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.262"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661869"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-18032-8_45"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40319-4_5"},{"key":"e_1_2_1_33_1","volume-title":"Western Multiconference on Computer Simulation","author":"Rice Stephen V.","unstructured":"Stephen V. Rice . 2007. Braided AVL trees for efficient event sets and ranked sets in the SIMSCRIPT III simulation programming language . In Western Multiconference on Computer Simulation . Woodhead Publishing , San Diego , 150--155. Stephen V. Rice. 2007. Braided AVL trees for efficient event sets and ranked sets in the SIMSCRIPT III simulation programming language. In Western Multiconference on Computer Simulation. Woodhead Publishing, San Diego, 150--155."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1137\/0205003"},{"key":"e_1_2_1_35_1","volume-title":"An Introduction to the Analysis of Algorithms","author":"Sedgewick Robert","unstructured":"Robert Sedgewick and Philippe Flajolet . 2013. An Introduction to the Analysis of Algorithms ( 2 nd ed.). Addison-Wesley . Robert Sedgewick and Philippe Flajolet. 2013. An Introduction to the Analysis of Algorithms (2nd ed.). Addison-Wesley.","edition":"2"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2508207"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-013-0315-0"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1255175.1255213"},{"key":"e_1_2_1_40_1","volume-title":"Human behavior and the principle of least effort","author":"Zipf George Kingsley","unstructured":"George Kingsley Zipf . 1949. Human behavior and the principle of least effort . Addison-Wesley Press . George Kingsley Zipf. 1949. Human behavior and the principle of least effort. Addison-Wesley Press."}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2816821","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2816821","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:48:19Z","timestamp":1750225699000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2816821"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,10,23]]},"references-count":39,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,10,26]]}},"alternative-id":["10.1145\/2816821"],"URL":"https:\/\/doi.org\/10.1145\/2816821","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"value":"1936-1955","type":"print"},{"value":"1936-1963","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,10,23]]},"assertion":[{"value":"2014-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-10-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}