{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:33:47Z","timestamp":1750221227978,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":21,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,10,2]],"date-time":"2018-10-02T00:00:00Z","timestamp":1538438400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"ADAPT Centre for Digital Content Technology","award":["Grant 13\/RC\/2106"],"award-info":[{"award-number":["Grant 13\/RC\/2106"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,10,2]]},"DOI":"10.1145\/3242969.3264976","type":"proceedings-article","created":{"date-parts":[[2018,10,2]],"date-time":"2018-10-02T12:09:29Z","timestamp":1538482169000},"page":"538-541","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Large Vocabulary Continuous Audio-Visual Speech Recognition"],"prefix":"10.1145","author":[{"given":"George","family":"Sterpu","sequence":"first","affiliation":[{"name":"Trinity College Dublin, Dublin, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,10,2]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Yannis M. Assael Brendan Shillingford Shimon Whiteson and Nando de Freitas. 2016. LipNet: Sentence-level Lipreading. Vol. abs\/1611.01599 (2016). http:\/\/arxiv.org\/abs\/1611.01599"},{"key":"e_1_3_2_1_2_1","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2018. 
Neural Machine Translation by Jointly Learning to Align and Translate International Conference on Learning Representations. http:\/\/arxiv.org\/abs\/1409.0473"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"T. Baltrušaitis P. Robinson and L. P. Morency. 2016. OpenFace: An open source facial behavior analysis toolkit 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). 1--10.","DOI":"10.1109\/WACV.2016.7477553"},{"key":"e_1_3_2_1_4_1","unstructured":"BBC and Oxford University. 2017. The BBC-Oxford Multi-View Lip Reading Sentences 2 (LRS2) Dataset. http:\/\/www.robots.ox.ac.uk\/~vgg\/data\/lip_reading_sentences\/. (2017). Online Accessed: 11 August 2018."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Chung-Cheng Chiu Tara Sainath Yonghui Wu Rohit Prabhavalkar Patrick Nguyen Zhifeng Chen Anjuli Kannan Ron J. Weiss Kanishka Rao Katya Gonina Navdeep Jaitly Bo Li Jan Chorowski and Michiel Bacchiani. 2018. State-of-the-art Speech Recognition With Sequence-to-Sequence Models ICASSP. https:\/\/arxiv.org\/pdf\/1712.01769.pdf","DOI":"10.1109\/ICASSP.2018.8462105"},
{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"J. S. Garofolo L. F. Lamel W. M. Fisher J. G. Fiscus D. S. Pallett and N. L. Dahlgren. 1993. DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus CDROM. (1993).","DOI":"10.6028\/NIST.IR.4930"},{"volume-title":"IET Conference Proceedings (January. 1999","year":"1999","author":"Gers F. A.","key":"e_1_3_2_1_7_1"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2407694"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"S. Kim T. Hori and S. Watanabe. 2017. Joint CTC-attention based end-to-end speech recognition using multi-task learning 2017 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP). 4835--4839.","DOI":"10.1109\/ICASSP.2017.7953075"},{"key":"e_1_3_2_1_11_1","unstructured":"Edward Nitchie. 1919. Lip-reading Principles and Practice. Frederick A. Stokes Company."},{"volume-title":"ICASSP","author":"Petridis Stavros","key":"e_1_3_2_1_12_1"},{"key":"e_1_3_2_1_13_1","first-page":"9","article-title":"Recent advances in the automatic recognition of audiovisual speech","volume":"91","author":"Potamianos G.","year":"2003","journal-title":"Proc. IEEE"},{"volume-title":"Lip Reading Sentences in the Wild. 
In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","year":"2017","author":"Chung Joon Son","key":"e_1_3_2_1_14_1"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"George Sterpu and Naomi Harte. 2017. Towards lipreading sentences using Active Appearance Models AVSP. Stockholm Sweden.","DOI":"10.21437\/AVSP.2017-14"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242969.3243014"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"George Sterpu Christian Saam and Naomi Harte. 2018 b. Can DNNs Learn to Lipread Full Sentences? ArXiv e-prints (May. 2018). {arxiv}1805.11685","DOI":"10.1109\/ICIP.2018.8451388"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2815268"},{"key":"e_1_3_2_1_19_1","unstructured":"Kwanchiva Thangthai Helen L. Bear and Richard Harvey. 2017. Comparing phonemes and visemes with DNN-based lipreading Workshop on Lip-Reading using deep learning methods (BMVC 2017)."},{"volume-title":"Multi-attention Recurrent Network for Human Communication Comprehension AAAI Conference on Artificial Intelligence. 
https:\/\/aaai.org\/ocs\/index.php\/AAAI\/AAAI18\/paper\/view\/17390","year":"2018","author":"Zadeh Amir","key":"e_1_3_2_1_20_1"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2014.06.004"}],"event":{"name":"ICMI '18: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","sponsor":["SIGCHI Specialist Interest Group in Computer-Human Interaction of the ACM"],"location":"Boulder CO USA","acronym":"ICMI '18"},"container-title":["Proceedings of the 20th ACM International Conference on Multimodal Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3242969.3264976","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3242969.3264976","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:06:58Z","timestamp":1750212418000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3242969.3264976"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,2]]},"references-count":21,"alternative-id":["10.1145\/3242969.3264976","10.1145\/3242969"],"URL":"https:\/\/doi.org\/10.1145\/3242969.3264976","relation":{},"subject":[],"published":{"date-parts":[[2018,10,2]]},"assertion":[{"value":"2018-10-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}