{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T05:09:16Z","timestamp":1755925756349,"version":"3.41.0"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2013,5,1]],"date-time":"2013-05-01T00:00:00Z","timestamp":1367366400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2013,5]]},"abstract":"<jats:p>We consider the problem of fuzzy full-text search in large text collections, that is, full-text search which is robust against errors both on the side of the query as well as on the side of the documents. Standard inverted-index techniques work extremely well for ordinary full-text search but fail to achieve interactive query times (below 100 milliseconds) for fuzzy full-text search even on moderately-sized text collections (above 10 GBs of text). We present new preprocessing techniques that achieve interactive query times on large text collections (100 GB of text, served by a single machine). We consider two similarity measures, one where the query terms match similar terms in the collection (e.g., algorithm matches algoritm or vice versa) and one where the query terms match terms with a similar prefix in the collection (e.g., alori matches algorithm). The latter is important when we want to display results instantly after each keystroke (search as you type). 
All algorithms have been fully integrated into the CompleteSearch engine.<\/jats:p>","DOI":"10.1145\/2457465.2457470","type":"journal-article","created":{"date-parts":[[2013,5,21]],"date-time":"2013-05-21T12:33:56Z","timestamp":1369139636000},"page":"1-59","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Efficient fuzzy search in large text collections"],"prefix":"10.1145","volume":"31","author":[{"given":"Hannah","family":"Bast","sequence":"first","affiliation":[{"name":"Albert Ludwigs University, Freiburg, Germany"}]},{"given":"Marjan","family":"Celikik","sequence":"additional","affiliation":[{"name":"Albert Ludwigs University, Freiburg, Germany"}]}],"member":"320","published-online":{"date-parts":[[2013,5,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-27801-6_30"},{"volume-title":"Proceedings of the International Symposium on String Processing and Information Retrieval (SPIRE'98)","author":"Baeza-Yates R.","key":"e_1_2_1_2_1"},{"volume-title":"Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware (SPIRE'99)","author":"Baeza-Yates R. 
A.","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30192-9_58"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646272"},{"volume-title":"Proceedings of the 3rd Conference on Innovative Data Systems Research (CIDR'07)","author":"Bast H.","key":"e_1_2_1_6_1"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242591"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/3127091.3127105"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010023"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1507509.1507518"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963190.1963191"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(82)90004-8"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075218.1075255"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401995"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1529282.1529669"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.9"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559919"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/502807.502808"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007352.1007374"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277953"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/253495.253521"},{"volume-title":"Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'00)","author":"Demaine E. 
D.","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.298177"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277833"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/11764298_26"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277821"},{"volume-title":"Proceedings of the 27th International Conference on Very Large Data Bases (VLDB'01)","author":"Gravano L.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10268"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/JRPROC.1952.273898"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/361932.361940"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526760"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Jokinen P. and Ukkonen E. 1991. Two algorithms for approximate string matching in static texts. In Proceedings of the 2nd Annual Symposium on Mathematical Foundations of Computer Science. P. Jokinen and E. Ukkonen Eds. Lecture Notes in Computer Science vol. 520 240--248.","DOI":"10.1007\/3-540-54345-7_67"},{"volume-title":"Proceedings of the 27th International Conference on Very Large Data Bases (VLDB'01)","author":"Kahveci T.","key":"e_1_2_1_33_1"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010026"},{"key":"e_1_2_1_35_1","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein V. I.","year":"1966","journal-title":"Sov. 
Phys. Dokl."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497434"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348333"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220304"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.165464"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/167088.167172"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458145"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1162\/0891201042544938"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/358728.358752"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Muth R. and Manber U. 1996. Approximate multiple strings search. In Proceedings of the Conference on Combinatorial Pattern Matching (CPM'96). D. S. Hirschberg and E. W. Myers Eds. Lecture Notes in Computer Science Series vol. 1075 Springer 75--86.","DOI":"10.1007\/3-540-61258-0_7"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Myers E. W. 1994. A sublinear algorithm for approximate keyword searching. 
Algorithmica V12 4 345--374.","DOI":"10.1007\/BF01185432"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/316542.316550"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/375360.375365"},{"key":"e_1_2_1_48_1","first-page":"2001","article-title":"Indexing methods for approximate string matching","volume":"24","author":"Navarro G.","year":"2000","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03784-9_21"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(70)90057-4"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.3390\/a2031105"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.69.1.4"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564416"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-002-0082-8"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1137\/0126070"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIT.2005.23"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772782"},{"volume-title":"Proceedings of the 3rd Annual European Symposium on Algorithms (ESA'95)","author":"Sutinen E.","key":"e_1_2_1_58_1"},{"volume-title":"Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching (CPM'96)","author":"Sutinen 
E.","key":"e_1_2_1_59_1"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0019-9958(85)80046-2"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.5555\/647813.738278"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01074755"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(83)90022-5"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/135239.135244"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453957"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000824.2000825"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380250307"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2457465.2457470","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2457465.2457470","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:18:36Z","timestamp":1750234716000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2457465.2457470"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,5]]},"references-count":67,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2013,5]]}},"alternative-id":["10.1145\/2457465.2457470"],"URL":"https:\/\/doi.org\/10.1145\/2457465.2457470","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2013,5]]},"assertion":[{"value":"2011-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2013-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-05-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}