{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:19:29Z","timestamp":1750220369273,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":60,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Science, Technology & Information Technology Bureau of Guangzhou Development Zone","award":["GZSTI16EG24"],"award-info":[{"award-number":["GZSTI16EG24"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,17]]},"DOI":"10.1145\/3474085.3481542","type":"proceedings-article","created":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T06:57:34Z","timestamp":1634540254000},"page":"1157-1166","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["L2RS"],"prefix":"10.1145","author":[{"given":"Yuanfeng","family":"Song","sequence":"first","affiliation":[{"name":"The Hong Kong University of Science and Technology & WeBank Co., Ltd, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Di","family":"Jiang","sequence":"additional","affiliation":[{"name":"WeBank Co., Ltd, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuefang","family":"Zhao","sequence":"additional","affiliation":[{"name":"WeBank Co., Ltd, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qian","family":"Xu","sequence":"additional","affiliation":[{"name":"WeBank Co., Ltd, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raymond Chi-Wing","family":"Wong","sequence":"additional","affiliation":[{"name":"Hong 
Kong University of Science and Technology, Hong Kong, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lixin","family":"Fan","sequence":"additional","affiliation":[{"name":"WeBank Co., Ltd, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qiang","family":"Yang","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology & WeBank Co., Ltd, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045390.3045410"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2914714"},{"key":"e_1_3_2_1_3_1","volume-title":"Statistical language model adaptation: review and perspectives. Speech communication","author":"Bellegarda Jerome R","year":"2004","unstructured":"Jerome R Bellegarda . 2004. Statistical language model adaptation: review and perspectives. Speech communication , Vol. 42 , 1 ( 2004 ), 93--108. Jerome R Bellegarda. 2004. Statistical language model adaptation: review and perspectives. Speech communication, Vol. 
42, 1 (2004), 93--108."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/636713.636724"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102363"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148205"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273513"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273513"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472621"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2010.5494942"},{"key":"e_1_3_2_1_12_1","volume-title":"Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech. arXiv preprint arXiv:1803.08976","author":"Chung Yu-An","year":"2018","unstructured":"Yu-An Chung and James Glass . 2018. Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech. arXiv preprint arXiv:1803.08976 ( 2018 ). Yu-An Chung and James Glass. 2018. Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech. arXiv preprint arXiv:1803.08976 (2018)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/11776420_44"},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , Volume 1 (Long and Short Papers). 4171--4186. 
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171--4186."},{"key":"e_1_3_2_1_15_1","volume-title":"Zhong Meng, and Shinji Watanabe.","author":"Erdogan Hakan","year":"2016","unstructured":"Hakan Erdogan , Tomoki Hayashi , John R Hershey , Takaaki Hori , Chiori Hori , Wei-Ning Hsu , Suyoun Kim , Jonathan Le Roux , Zhong Meng, and Shinji Watanabe. 2016 . Multi-channel speech recognition: LSTMs all the way through. In CHiME- 4 workshop. 1--4. Hakan Erdogan, Tomoki Hayashi, John R Hershey, Takaaki Hori, Chiori Hori, Wei-Ning Hsu, Suyoun Kim, Jonathan Le Roux, Zhong Meng, and Shinji Watanabe. 2016. Multi-channel speech recognition: LSTMs all the way through. In CHiME-4 workshop. 1--4."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1117\/12.290336"},{"key":"e_1_3_2_1_17_1","volume-title":"Greedy function approximation: a gradient boosting machine. Annals of statistics","author":"Friedman Jerome H","year":"2001","unstructured":"Jerome H Friedman . 2001. Greedy function approximation: a gradient boosting machine. Annals of statistics ( 2001 ), 1189--1232. Jerome H Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of statistics (2001), 1189--1232."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/35.3.243"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277811"},{"key":"e_1_3_2_1_20_1","volume-title":"Large margin rank boundaries for ordinal regression. Advances in large margin classifiers","author":"Herbrich Ralf","year":"2000","unstructured":"Ralf Herbrich . 2000. Large margin rank boundaries for ordinal regression. Advances in large margin classifiers ( 2000 ), 115--132. 
Ralf Herbrich. 2000. Large margin rank boundaries for ordinal regression. Advances in large margin classifiers (2000), 115--132."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2073796.2073829"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505665"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Di Jiang Yuanfeng Song Rongzhong Lian Siqi Bao Jinhua Peng Huang He Hua Wu Chen Zhang and Lei Chen. 2021. Familia: A Configurable Topic Modeling Framework for Industrial Text Engineering. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2021) 516.  Di Jiang Yuanfeng Song Rongzhong Lian Siqi Bao Jinhua Peng Huang He Hua Wu Chen Zhang and Lei Chen. 2021. Familia: A Configurable Topic Modeling Framework for Industrial Text Engineering. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2021) 516.","DOI":"10.1007\/978-3-030-73200-4_36"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963460"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(01)00041-3"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.56193"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/3020751.3020798"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIME.2009.101"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU46091.2019.9003972"},{"key":"e_1_3_2_1_30_1","volume-title":"Amit Das, Zhong Meng, and Yifan Gong.","author":"Li Jinyu","year":"2020","unstructured":"Jinyu Li , Rui Zhao , Eric Sun , Jeremy HM Wong , Amit Das, Zhong Meng, and Yifan Gong. 2020 . High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model. 
In ICASSP 2020--2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE , 7699--7703. Jinyu Li, Rui Zhao, Eric Sun, Jeremy HM Wong, Amit Das, Zhong Meng, and Yifan Gong. 2020. High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model. In ICASSP 2020--2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 7699--7703."},{"key":"e_1_3_2_1_31_1","volume-title":"Hyderabad","author":"Li Ke","year":"2018","unstructured":"Ke Li , Hainan Xu , Yiming Wang , Daniel Povey , and Sanjeev Khudanpur . 2018. Recurrent neural network language model adaptation for conversational speech recognition. INTERSPEECH , Hyderabad ( 2018 ), 1--5. Ke Li, Hainan Xu, Yiming Wang, Daniel Povey, and Sanjeev Khudanpur. 2018. Recurrent neural network language model adaptation for conversational speech recognition. INTERSPEECH, Hyderabad (2018), 1--5."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/2981562.2981675"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btv413"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000016"},{"key":"e_1_3_2_1_35_1","volume-title":"Efficient Lattice Rescoring Using Recurrent Neural Network Language Models. In IEEE International Conference on Acoustics.","author":"Liu Xunying","year":"2014","unstructured":"Xunying Liu , Yongqiang Wang , Xie Chen , Mark J. F. Gales , and Phil Woodland . 2014 . Efficient Lattice Rescoring Using Recurrent Neural Network Language Models. In IEEE International Conference on Acoustics. Xunying Liu, Yongqiang Wang, Xie Chen, Mark J. F. Gales, and Phil Woodland. 2014. Efficient Lattice Rescoring Using Recurrent Neural Network Language Models. 
In IEEE International Conference on Acoustics."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1006\/csla.2000.0152"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Tom\u00e1\u0161 Mikolov Martin Karafi\u00e1t Luk\u00e1\u0161 Burget Jan \u010cernock\u00fd and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh annual conference of the international speech communication association.  Tom\u00e1\u0161 Mikolov Martin Karafi\u00e1t Luk\u00e1\u0161 Burget Jan \u010cernock\u00fd and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh annual conference of the international speech communication association.","DOI":"10.21437\/Interspeech.2010-343"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2011.5947611"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999959"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2011.2174225"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Atsunori Ogawa Marc Delcroix Shigeki Karita and Tomohiro Nakatani. 2018. Rescoring N-Best Speech Recognition List Based on One-on-One Hypothesis Comparison Using Encoder-Classifier Model. In 2018 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP). IEEE 6099--6103.  Atsunori Ogawa Marc Delcroix Shigeki Karita and Tomohiro Nakatani. 2018. Rescoring N-Best Speech Recognition List Based on One-on-One Hypothesis Comparison Using Encoder-Classifier Model. In 2018 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP). IEEE 6099--6103.","DOI":"10.1109\/ICASSP.2018.8461405"},{"key":"e_1_3_2_1_42_1","volume-title":"IEEE 2011 workshop on automatic speech recognition and understanding. 
IEEE Signal Processing Society.","author":"Povey Daniel","year":"2011","unstructured":"Daniel Povey , Arnab Ghoshal , Gilles Boulianne , Lukas Burget , Ondrej Glembek , Nagendra Goel , Mirko Hannemann , Petr Motlicek , Yanmin Qian , Petr Schwarz , 2011 . The Kaldi speech recognition toolkit . In IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society. Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, et al. 2011. The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2006.06.006"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1218962"},{"key":"e_1_3_2_1_45_1","unstructured":"Anthony Rousseau Paul Del\u00e9glise and Yannick Esteve. 2012. TED-LIUM: an Automatic Speech Recognition dedicated corpus. In LREC. 125--129.  Anthony Rousseau Paul Del\u00e9glise and Yannick Esteve. 2012. TED-LIUM: an Automatic Speech Recognition dedicated corpus. In LREC. 125--129."},{"key":"e_1_3_2_1_46_1","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV). 0--0.","author":"S\u00e1nchez Jorge","year":"2018","unstructured":"Jorge S\u00e1nchez , Franco Luque , and Leandro Lichtensztein . 2018 . A Structured Listwise Approach to Learning to Rank for Image Tagging . In Proceedings of the European Conference on Computer Vision (ECCV). 0--0. Jorge S\u00e1nchez, Franco Luque, and Leandro Lichtensztein. 2018. A Structured Listwise Approach to Learning to Rank for Image Tagging. In Proceedings of the European Conference on Computer Vision (ECCV). 
0--0."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/2968618.2968738"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2017.10.048"},{"key":"e_1_3_2_1_49_1","volume-title":"Lixin Fan, and Qiang Yang.","author":"Song Yuanfeng","year":"2019","unstructured":"Yuanfeng Song , Di Jiang , Xuefang Zhao , Qian Xu , Raymond Chi-Wing Wong , Lixin Fan, and Qiang Yang. 2019 . L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition . arxiv: cs.CL\/1910.11496 Yuanfeng Song, Di Jiang, Xuefang Zhao, Qian Xu, Raymond Chi-Wing Wong, Lixin Fan, and Qiang Yang. 2019. L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition. arxiv: cs.CL\/1910.11496"},{"key":"e_1_3_2_1_50_1","first-page":"2011","article-title":"Large-scale learning to rank using boosted decision trees","volume":"2","author":"Svore Krysta M","year":"2011","unstructured":"Krysta M Svore and CJ Burges . 2011 . Large-scale learning to rank using boosted decision trees . Scaling Up Machine Learning: Parallel and Distributed Approaches , Vol. 2 (2011), 2011 . Krysta M Svore and CJ Burges. 2011. Large-scale learning to rank using boosted decision trees. Scaling Up Machine Learning: Parallel and Distributed Approaches, Vol. 2 (2011), 2011.","journal-title":"Scaling Up Machine Learning: Parallel and Distributed Approaches"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Tomohiro Tanaka Ryo Masumura Takafumi Moriya and Yushi Aono. 2018. Neural Speech-to-Text Language Models for Rescoring Hypotheses of DNN-HMM Hybrid Automatic Speech Recognition Systems. In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE 196--200.  Tomohiro Tanaka Ryo Masumura Takafumi Moriya and Yushi Aono. 2018. Neural Speech-to-Text Language Models for Rescoring Hypotheses of DNN-HMM Hybrid Automatic Speech Recognition Systems. 
In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE 196--200.","DOI":"10.23919\/APSIPA.2018.8659622"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Tomohiro Tanaka Ryo Masumura Takafumi Moriya Takanobu Oba and Yushi Aono. 2019. A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge.. In INTERSPEECH. 2210--2214.  Tomohiro Tanaka Ryo Masumura Takafumi Moriya Takanobu Oba and Yushi Aono. 2019. A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge.. In INTERSPEECH. 2210--2214.","DOI":"10.21437\/Interspeech.2019-2263"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1341531.1341544"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999907"},{"key":"e_1_3_2_1_55_1","volume-title":"BERT has a mouth, and it must speak: BERT as a markov random field language model. arXiv preprint arXiv:1902.04094","author":"Wang Alex","year":"2019","unstructured":"Alex Wang and Kyunghyun Cho . 2019. BERT has a mouth, and it must speak: BERT as a markov random field language model. arXiv preprint arXiv:1902.04094 ( 2019 ). Alex Wang and Kyunghyun Cho. 2019. BERT has a mouth, and it must speak: BERT as a markov random field language model. arXiv preprint arXiv:1902.04094 (2019)."},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054345"},{"key":"e_1_3_2_1_57_1","unstructured":"Han Xiao. 2018. bert-as-service. https:\/\/github.com\/hanxiao\/bert-as-service.  Han Xiao. 2018. bert-as-service. 
https:\/\/github.com\/hanxiao\/bert-as-service."},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461974"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277809"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2736277.2741115"}],"event":{"name":"MM '21: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Virtual Event China","acronym":"MM '21"},"container-title":["Proceedings of the 29th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3481542","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474085.3481542","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:35Z","timestamp":1750191455000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3481542"}},"subtitle":["A Learning-to-Rescore Mechanism for Hybrid Speech Recognition"],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":60,"alternative-id":["10.1145\/3474085.3481542","10.1145\/3474085"],"URL":"https:\/\/doi.org\/10.1145\/3474085.3481542","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}