{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T18:41:12Z","timestamp":1675190472073},"reference-count":4,"publisher":"MIT Press - Journals","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["TACL"],"published-print":{"date-parts":[[2016,12]]},"abstract":"<jats:p>Efficient methods for storing and querying are critical for scaling high-order m-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabilities on-the-fly. We present several optimisations which improve query runtimes up to 2500\u00d7, despite only incurring a modest increase in construction time and memory usage. For large corpora and high Markov orders, our method is highly competitive with the state-of-the-art KenLM package. It imposes much lower memory requirements, often by orders of magnitude, and has runtimes that are either similar (for training) or comparable (for querying).<\/jats:p>","DOI":"10.1162\/tacl_a_00112","type":"journal-article","created":{"date-parts":[[2018,12,28]],"date-time":"2018-12-28T15:44:07Z","timestamp":1546011847000},"page":"477-490","source":"Crossref","is-referenced-by-count":4,"title":["Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees"],"prefix":"10.1162","volume":"4","author":[{"given":"Ehsan","family":"Shareghi","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, Monash University,"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthias","family":"Petri","sequence":"additional","affiliation":[{"name":"Computing and Information Systems, The University of Melbourne,"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gholamreza","family":"Haffari","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Monash University,"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Trevor","family":"Cohn","sequence":"additional","affiliation":[{"name":"Computing and Information Systems, The University of Melbourne,"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","reference":[{"issue":"4","key":"p_6","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1006\/csla.1999.0128","volume":"13","author":"Chen Stanley F","year":"1999","journal-title":"Computer Speech & Language"},{"issue":"5","key":"p_20","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1137\/0222058","volume":"22","author":"Manber Udi","year":"1993","journal-title":"SIAM Journal on Computing"},{"key":"p_21","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1016\/j.jda.2013.07.004","volume":"25","author":"Navarro Gonzalo","year":"2014","journal-title":"Journal of Discrete Algorithms"},{"issue":"2","key":"p_34","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1145\/1897816.1897842","volume":"54","author":"Wood Frank","year":"2011","journal-title":"Communications of the ACM"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/tacl_a_00112","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:38:30Z","timestamp":1615585110000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/43376"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,12]]},"references-count":4,"alternative-id":["10.1162\/tacl_a_00112"],"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00112","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,12]]}}}