{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T09:35:05Z","timestamp":1774949705227,"version":"3.50.1"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2019,8]]},"abstract":"<jats:p>n-gram language models are widely used in language processing applications, e.g., automatic speech recognition, for ranking the candidate word sequences generated from the generator model, e.g., the acoustic model. Large n-gram models typically give good ranking results; however, they require a huge amount of memory storage. While distributing the model across multiple nodes resolves the memory issue, it nonetheless incurs a great network communication overhead and introduces a different bottleneck. In this paper, we present our distributed system developed at Tencent with novel optimization techniques for reducing the network overhead, including distributed indexing, batching and caching. They reduce the network requests and accelerate the operation on each single node. We also propose a cascade fault-tolerance mechanism which adaptively switches to small n-gram models depending on the severity of the failure. Experimental study on 9 automatic speech recognition (ASR) datasets confirms that our distributed system scales to large models efficiently, effectively and robustly. We have successfully deployed it for Tencent's WeChat ASR with the peak network traffic at the scale of 100 millions of messages per minute.<\/jats:p>","DOI":"10.14778\/3352063.3352136","type":"journal-article","created":{"date-parts":[[2019,9,18]],"date-time":"2019-09-18T18:36:11Z","timestamp":1568831771000},"page":"2206-2217","source":"Crossref","is-referenced-by-count":5,"title":["A distributed system for large-scale n-gram language models at Tencent"],"prefix":"10.14778","volume":"12","author":[{"given":"Qiang","family":"Long","sequence":"first","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei","family":"Wang","sequence":"additional","affiliation":[{"name":"National University of Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinfu","family":"Deng","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Song","family":"Liu","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenhao","family":"Huang","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fangying","family":"Chen","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sifan","family":"Liu","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2404"},{"key":"e_1_2_1_2_1","volume-title":"Deep speech 2: End-to-end speech recognition in english and mandarin. CoRR, abs\/1512.02595","author":"Amodei D.","year":"2015","unstructured":"D. Amodei , R. Anubhai , E. Battenberg , C. Case , J. Casper , B. Catanzaro , J. Chen , M. Chrzanowski , A. Coates , G. Diamos , E. Elsen , J. Engel , L. Fan , C. Fougner , T. Han , A. Y. Hannun , B. Jun , P. LeGresley , L. Lin , S. Narang , A. Y. Ng , S. Ozair , R. Prenger , J. Raiman , S. Satheesh , D. Seetapun , S. Sengupta , Y. Wang , Z. Wang , C. Wang , B. Xiao , D. Yogatama , J. Zhan , and Z. Zhu . Deep speech 2: End-to-end speech recognition in english and mandarin. CoRR, abs\/1512.02595 , 2015 . D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. Fan, C. Fougner, T. Han, A. Y. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Y. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Wang, C. Wang, B. Xiao, D. Yogatama, J. Zhan, and Z. Zhu. Deep speech 2: End-to-end speech recognition in english and mandarin. CoRR, abs\/1512.02595, 2015."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035928"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944966"},{"key":"e_1_2_1_5_1","first-page":"858","volume-title":"EMNLP-CoNLL","author":"Brants T.","year":"2007","unstructured":"T. Brants , A. C. Popat , P. Xu , F. J. Och , and J. Dean . Large language models in machine translation . In EMNLP-CoNLL , pages 858 -- 867 , Prague, Czech Republic , June 2007 . T. Brants, A. C. Popat, P. Xu, F. J. Och, and J. Dean. Large language models in machine translation. In EMNLP-CoNLL, pages 858--867, Prague, Czech Republic, June 2007."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1006\/csla.1999.0128"},{"key":"e_1_2_1_7_1","first-page":"613","volume-title":"NSDI","author":"Crankshaw D.","year":"2017","unstructured":"D. Crankshaw , X. Wang , G. Zhou , M. J. Franklin , J. E. Gonzalez , and I. Stoica . Clipper: A low-latency online prediction serving system . In NSDI , pages 613 -- 627 , Boston, MA , 2017 . USENIX Association. D. Crankshaw, X. Wang, G. Zhou, M. J. Franklin, J. E. Gonzalez, and I. Stoica. Clipper: A low-latency online prediction serving system. In NSDI, pages 613--627, Boston, MA, 2017. USENIX Association."},{"key":"e_1_2_1_8_1","first-page":"37","volume-title":"ICASSP","volume":"4","author":"Emami A.","unstructured":"A. Emami , K. Papineni , and J. Sorensen . Large-scale distributed language modeling . In ICASSP , volume 4 , pages IV- 37 --IV-40, April 2007. A. Emami, K. Papineni, and J. Sorensen. Large-scale distributed language modeling. In ICASSP, volume 4, pages IV-37--IV-40, April 2007."},{"key":"e_1_2_1_9_1","first-page":"II-1764","volume-title":"ICML'14","author":"Graves A.","year":"2014","unstructured":"A. Graves and N. Jaitly . Towards end-to-end speech recognition with recurrent neural networks. In ICML - Volume 32 , ICML'14 , pages II-1764 -- II-1772 . JMLR.org, 2014 . A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In ICML - Volume 32, ICML'14, pages II-1764--II-1772. JMLR.org, 2014."},{"key":"e_1_2_1_10_1","volume-title":"Speech recognition with deep recurrent neural networks. CoRR, abs\/1303.5778","author":"Graves A.","year":"2013","unstructured":"A. Graves , A. Mohamed , and G. E. Hinton . Speech recognition with deep recurrent neural networks. CoRR, abs\/1303.5778 , 2013 . A. Graves, A. Mohamed, and G. E. Hinton. Speech recognition with deep recurrent neural networks. CoRR, abs\/1303.5778, 2013."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2012.2205597"},{"key":"e_1_2_1_12_1","volume-title":"Blackout: Speeding up recurrent neural network language models with very large vocabularies. CoRR, abs\/1511.06909","author":"Ji S.","year":"2015","unstructured":"S. Ji , S. V. N. Vishwanathan , N. Satish , M. J. Anderson , and P. Dubey . Blackout: Speeding up recurrent neural network language models with very large vocabularies. CoRR, abs\/1511.06909 , 2015 . S. Ji, S. V. N. Vishwanathan, N. Satish, M. J. Anderson, and P. Dubey. Blackout: Speeding up recurrent neural network language models with very large vocabularies. CoRR, abs\/1511.06909, 2015."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196894"},{"key":"e_1_2_1_14_1","volume-title":"Exploring the limits of language modeling. CoRR, abs\/1602.02410","author":"J\u00f3zefowicz R.","year":"2016","unstructured":"R. J\u00f3zefowicz , O. Vinyals , M. Schuster , N. Shazeer , and Y. Wu . Exploring the limits of language modeling. CoRR, abs\/1602.02410 , 2016 . R. J\u00f3zefowicz, O. Vinyals, M. Schuster, N. Shazeer, and Y. Wu. Exploring the limits of language modeling. CoRR, abs\/1602.02410, 2016."},{"key":"e_1_2_1_15_1","volume-title":"Speech and Language Processing","author":"Jurafsky D.","year":"2009","unstructured":"D. Jurafsky and J. H. Martin . Speech and Language Processing ( 2 nd Edition). Prentice-Hall, Inc. , Upper Saddle River, NJ, USA, 2009 . D. Jurafsky and J. H. Martin. Speech and Language Processing (2nd Edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2009.","edition":"2"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137664"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1995.479394"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183751"},{"key":"e_1_2_1_19_1","volume-title":"First-pass large vocabulary continuous speech recognition using bi-directional recurrent dnns. CoRR, abs\/1408.2873","author":"Maas A. L.","year":"2014","unstructured":"A. L. Maas , A. Y. Hannun , D. Jurafsky , and A. Y. Ng . First-pass large vocabulary continuous speech recognition using bi-directional recurrent dnns. CoRR, abs\/1408.2873 , 2014 . A. L. Maas, A. Y. Hannun, D. Jurafsky, and A. Y. Ng. First-pass large vocabulary continuous speech recognition using bi-directional recurrent dnns. CoRR, abs\/1408.2873, 2014."},{"key":"e_1_2_1_20_1","volume-title":"Application of large models to automatic speech recognition","author":"Mandery C.","year":"2011","unstructured":"C. Mandery . Distributed n-gram language models : Application of large models to automatic speech recognition . 2011 . C. Mandery. Distributed n-gram language models : Application of large models to automatic speech recognition. 2011."},{"key":"e_1_2_1_21_1","first-page":"5528","volume-title":"Extensions of recurrent neural network language model","author":"Mikolov T.","year":"2011","unstructured":"T. Mikolov , S. Kombrink , L. Burget , J. Cernocky , and S. Khudanpur . Extensions of recurrent neural network language model . pages 5528 -- 5531 , 06 2011 . T. Mikolov, S. Kombrink, L. Burget, J. Cernocky, and S. Khudanpur. Extensions of recurrent neural network language model. pages 5528 -- 5531, 06 2011."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1006\/csla.2001.0184"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735496.2735506"},{"key":"e_1_2_1_24_1","volume-title":"abs\/1708.07252","author":"Shi D.","year":"2017","unstructured":"D. Shi . A study on neural network language modeling. CoRR , abs\/1708.07252 , 2017 . D. Shi. A study on neural network language modeling. CoRR, abs\/1708.07252, 2017."},{"key":"e_1_2_1_25_1","volume-title":"Entropy-based pruning of backoff language models. CoRR, cs.CL\/0006025","author":"Stolcke A.","year":"2000","unstructured":"A. Stolcke . Entropy-based pruning of backoff language models. CoRR, cs.CL\/0006025 , 2000 . A. Stolcke. Entropy-based pruning of backoff language models. CoRR, cs.CL\/0006025, 2000."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7179001"},{"key":"e_1_2_1_27_1","volume-title":"The HTK book. 01","author":"Young S.","year":"2002","unstructured":"S. Young , G. Evermann , M. Gales , T. Hain , D. Kershaw , X. Liu , G. Moore , J. Odell , D. Ollason , D. Povey , V. Valtchev , and P. Woodland . The HTK book. 01 2002 . S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland. The HTK book. 01 2002."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/1610075.1610108"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3352063.3352136","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:34:27Z","timestamp":1672223667000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3352063.3352136"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8]]},"references-count":28,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2019,8]]}},"alternative-id":["10.14778\/3352063.3352136"],"URL":"https:\/\/doi.org\/10.14778\/3352063.3352136","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2019,8]]}}}