{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,9]],"date-time":"2026-07-09T14:42:20Z","timestamp":1783608140344,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":20,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2004,10,13]]},"DOI":"10.1145\/1027933.1027972","type":"proceedings-article","created":{"date-parts":[[2005,1,30]],"date-time":"2005-01-30T12:58:48Z","timestamp":1107089928000},"page":"235-242","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":58,"title":["A segment-based audio-visual speech recognizer"],"prefix":"10.1145","author":[{"given":"Timothy J.","family":"Hazen","sequence":"first","affiliation":[{"name":"MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kate","family":"Saenko","sequence":"additional","affiliation":[{"name":"MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chia-Hao","family":"La","sequence":"additional","affiliation":[{"name":"MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"James R.","family":"Glass","sequence":"additional","affiliation":[{"name":"MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2004,10,13]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"32","volume-title":"Journal on Communications of the Scientific Society for Telecommunications, Hungary, number 43","author":"Benoit C.","year":"1992","unstructured":"C. Benoit . 
The intrinsic bimodality of speech communication and the synthesis of talking faces. In Journal on Communications of the Scientific Society for Telecommunications, Hungary, number 43, pages 32--40, September 1992."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMSP.1998.738914"},{"key":"e_1_3_2_1_3_1","volume-title":"Proc. of the International Conference on Spoken Language Processing","author":"Chu S.","year":"2000","unstructured":"S. Chu and T. Huang. Bimodal speech recognition using coupled hidden Markov models. In Proc. of the International Conference on Spoken Language Processing, vol. II, Beijing, October 2000."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/6046.865479"},{"key":"e_1_3_2_1_5_1","volume-title":"Computer Speech and Language","author":"Glass J.","year":"2003","unstructured":"J. Glass. A probabilistic framework for segment-based speech recognition. To appear in Computer Speech and Language, 2003."},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of ICSLP 98","author":"Halberstadt A.","year":"1998","unstructured":"A. Halberstadt and J. Glass. 
Heterogeneous measurements and multiple classifiers for speech recognition. In Proceedings of ICSLP 98, Sydney, Australia, November 1998."},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing","author":"Hazen T. J.","year":"1998","unstructured":"T. J. Hazen and A. Halberstadt, \"Using aggregation to improve the performance of mixture Gaussian acoustic models,\" In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Seattle, May, 1998."},{"key":"e_1_3_2_1_8_1","volume-title":"May","author":"Audio Visual Speech Technologies IBM","year":"2003","unstructured":"IBM Research - Audio Visual Speech Technologies: Data Collection. Accessed online at http:\/\/www.research.ibm.com\/AVSTG\/data.html, May 2003."},{"key":"e_1_3_2_1_9_1","unstructured":"Intel's AVCSR Toolkit source code can be downloaded from http:\/\/sourceforge.net\/projects\/opencvlibrary\/."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/29.46546"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2002.1035365"},{"key":"e_1_3_2_1_12_1","first-page":"38","volume-title":"Proc. of the International Conference on Spoken Language Processing","author":"Matthews I.","year":"1996","unstructured":"I. Matthews, J. A. 
Bangham, and S. Cox. Audio-visual speech recognition using multiscale nonlinear image decomposition. In Proc. of the International Conference on Spoken Language Processing, pp. 38--41, Philadelphia, PA, 1996."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001400000374"},{"key":"e_1_3_2_1_14_1","first-page":"72","volume-title":"AVBPA'99","author":"Messer K.","year":"1999","unstructured":"K. Messer, J. Matas, J. Kittler, and K. Jonsson. XM2VTSDB: The extended M2VTS database. In Audio- and Video-based Biometric Person Authentication, AVBPA'99, pages 72--77, Washington, D.C., March 1999. 16 IDIAP--RR 99-02."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/646072.677299"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","first-page":"1293","DOI":"10.21437\/Eurospeech.2003-410","volume-title":"Proc. of EUROSPEECH","author":"Potamianos G.","year":"2003","unstructured":"G. Potamianos and C. Neti. Audio-visual speech recognition in challenging environments. In Proc. of EUROSPEECH, pp. 1293--1296, Geneva, Switzerland, September 2003."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1027933.1027960"},{"key":"e_1_3_2_1_19_1","volume-title":"IDIAP Communication 02-06","author":"Sanderson C.","year":"2002","unstructured":"C. Sanderson. The VidTIMIT Database. IDIAP Communication 02-06, Martigny, Switzerland, 2002."},{"key":"e_1_3_2_1_21_1","volume-title":"Proc. 
1999 IEEE ASRU Workshop","author":"Strom N.","year":"1999","unstructured":"N. Strom, L. Hetherington, T.J. Hazen, E. Sandness, and J. Glass. Acoustic modeling improvements in a segment-based speech recognizer. In Proc. 1999 IEEE ASRU Workshop, Keystone, CO, December 1999."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-6393(90)90010-7"}],"event":{"name":"ICMI04: Sixth International Conference on Multimodal Interfaces 2004","location":"State College PA USA","acronym":"ICMI04","sponsor":["ACM Association for Computing Machinery","SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Proceedings of the 6th international conference on Multimodal interfaces"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1027933.1027972","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T12:51:42Z","timestamp":1673441502000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1027933.1027972"}},"subtitle":["data collection, development, and initial experiments"],"short-title":[],"issued":{"date-parts":[[2004,10,13]]},"references-count":20,"alternative-id":["10.1145\/1027933.1027972","10.1145\/1027933"],"URL":"https:\/\/doi.org\/10.1145\/1027933.1027972","relation":{},"subject":[],"published":{"date-parts":[[2004,10,13]]},"assertion":[{"value":"2004-10-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}