{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,18]],"date-time":"2026-02-18T23:39:46Z","timestamp":1771457986060,"version":"3.50.1"},"reference-count":23,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2011,2,1]],"date-time":"2011-02-01T00:00:00Z","timestamp":1296518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2011,2]]},"abstract":"<jats:p>Probabilistic models of sequences play a central role in most machine translation, automated speech recognition, lossless compression, spell-checking, and gene identification applications to name but a few. Unfortunately, real-world sequence data often exhibit long range dependencies which can only be captured by computationally challenging, complex models. Sequence data arising from natural processes also often exhibits power-law properties, yet common sequence models do not capture such properties. The sequence memoizer is a new hierarchical Bayesian model for discrete sequence data that captures long range dependencies and power-law characteristics, while remaining computationally attractive. Its utility as a language model and general purpose lossless compressor is demonstrated.<\/jats:p>","DOI":"10.1145\/1897816.1897842","type":"journal-article","created":{"date-parts":[[2011,2,1]],"date-time":"2011-02-01T15:50:21Z","timestamp":1296575421000},"page":"91-98","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["The sequence memoizer"],"prefix":"10.1145","volume":"54","author":[{"given":"Frank","family":"Wood","sequence":"first","affiliation":[{"name":"Columbia University, New York"}]},{"given":"Jan","family":"Gasthaus","sequence":"additional","affiliation":[{"name":"University College London, England"}]},{"given":"C\u00e9dric","family":"Archambeau","sequence":"additional","affiliation":[{"name":"Xerox Research Centre Europe, Grenoble, France"}]},{"given":"Lancelot","family":"James","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, Kowloon, Hong Kong"}]},{"given":"Yee Whye","family":"Teh","sequence":"additional","affiliation":[{"name":"University College London, England"}]}],"member":"320","published-online":{"date-parts":[[2011,2]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"27th International Conference on Machine Learning","author":"Bartlett N.","unstructured":"Bartlett , N. , Pfau , D. , Wood , F. Forgetting counts : Constant memory inference for a dependent hierarchical Pitman--Yor process . In 27th International Conference on Machine Learning , to appear (2010). Bartlett, N., Pfau, D., Wood, F. Forgetting counts: Constant memory inference for a dependent hierarchical Pitman--Yor process. In 27th International Conference on Machine Learning, to appear (2010)."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944966"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1006\/csla.1999.0128"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/40.2_and_3.67"},{"key":"e_1_2_1_5_1","volume-title":"Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science","author":"Doucet A.","year":"2001","unstructured":"Doucet , A. , de Freitas , N. , Gordon , N.J. Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science . Springer-Verlag , New York , May 2001 . Doucet, A., de Freitas, N., Gordon, N.J. Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer-Verlag, New York, May 2001."},{"key":"e_1_2_1_6_1","volume-title":"Advances in Neural Information Processing Systems 23, to appear","author":"Gasthaus J.","year":"2010","unstructured":"Gasthaus , J. , Teh , Y.W. Improvements to the sequence memoizer . In Advances in Neural Information Processing Systems 23, to appear ( 2010 ). Gasthaus, J., Teh, Y.W. Improvements to the sequence memoizer. In Advances in Neural Information Processing Systems 23, to appear (2010)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/DCC.2010.36"},{"key":"e_1_2_1_8_1","volume-title":"Bayesian data analysis","author":"Gelman A.","year":"2004","unstructured":"Gelman , A. , Carlin , J.B. , Stern , H.S. , Rubin , D.B. Bayesian data analysis . Chapman & amp; Hall, CRC, 2 nd edn, 2004 . Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B. Bayesian data analysis. Chapman &amp; Hall, CRC, 2nd edn, 2004.","edition":"2"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/PL00009177"},{"key":"e_1_2_1_10_1","first-page":"459","article-title":"Interpolating between types and tokens by estimating power law generators","volume":"18","author":"Goldwater S.","year":"2006","unstructured":"Goldwater , S. , Griffiths , T.L. , Johnson , M . Interpolating between types and tokens by estimating power law generators . In Advances in Neural Information Processing Systems 18 ( 2006 ), MIT Press, 459 -- 466 . Goldwater, S., Griffiths, T.L., Johnson, M. Interpolating between types and tokens by estimating power law generators. In Advances in Neural Information Processing Systems 18 (2006), MIT Press, 459--466.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324900000218"},{"key":"e_1_2_1_12_1","volume-title":"Large text compression benchmark. URL: http:\/\/www.mattmahoney.net\/text\/text.html","author":"Mahoney M.","year":"2009","unstructured":"Mahoney , M. Large text compression benchmark. URL: http:\/\/www.mattmahoney.net\/text\/text.html ( 2009 ). Mahoney, M. Large text compression benchmark. URL: http:\/\/www.mattmahoney.net\/text\/text.html (2009)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2008.12.025"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1214\/aop\/1022874819"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1214\/aop\/1024404422"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-4145-2"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/584091.584093"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220299"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.661523"},{"key":"e_1_2_1_20_1","volume-title":"CTW website. URL: http:\/\/www.ele.tue.nl\/ctw\/","author":"Willems F.M.J.","year":"2009","unstructured":"Willems , F.M.J. CTW website. URL: http:\/\/www.ele.tue.nl\/ctw\/ ( 2009 ). Willems, F.M.J. CTW website. URL: http:\/\/www.ele.tue.nl\/ctw\/ (2009)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553518"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324907004597"},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","DOI":"10.4159\/harvard.9780674434929","volume-title":"Selective Studies and the Principle of Relative Frequency in Language","author":"Zipf G.","year":"1932","unstructured":"Zipf , G. Selective Studies and the Principle of Relative Frequency in Language . Harvard University Press , Cambridge, MA , 1932 . Zipf, G. Selective Studies and the Principle of Relative Frequency in Language. Harvard University Press, Cambridge, MA, 1932."}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1897816.1897842","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1897816.1897842","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:52:36Z","timestamp":1750243956000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1897816.1897842"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,2]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2011,2]]}},"alternative-id":["10.1145\/1897816.1897842"],"URL":"https:\/\/doi.org\/10.1145\/1897816.1897842","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"value":"0001-0782","type":"print"},{"value":"1557-7317","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,2]]},"assertion":[{"value":"2011-02-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}