{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T10:11:32Z","timestamp":1742379092480},"publisher-location":"New York, NY, USA","reference-count":33,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2004,10,13]]},"DOI":"10.1145\/1027933.1027960","type":"proceedings-article","created":{"date-parts":[[2005,1,30]],"date-time":"2005-01-30T17:58:48Z","timestamp":1107107928000},"page":"152-158","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Articulatory features for robust visual speech recognition"],"prefix":"10.1145","author":[{"given":"Kate","family":"Saenko","sequence":"first","affiliation":[{"name":"MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA"}]},{"given":"Trevor","family":"Darrell","sequence":"additional","affiliation":[{"name":"MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA"}]},{"given":"James R.","family":"Glass","sequence":"additional","affiliation":[{"name":"MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA"}]}],"member":"320","published-online":{"date-parts":[[2004,10,13]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"461","volume-title":"On the integration of auditory and visual parameters in HMM-based ASR,\" in Speechreading by Humans and Machines","author":"Adjoudani A.","year":"1996","unstructured":"A. Adjoudani and C. Benoit , \" On the integration of auditory and visual parameters in HMM-based ASR,\" in Speechreading by Humans and Machines , D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany : Springer , pp. 461 -- 471 , 1996 . A. Adjoudani and C. Benoit, \"On the integration of auditory and visual parameters in HMM-based ASR,\" in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer, pp. 461--471, 1996."},{"key":"e_1_3_2_1_2_1","first-page":"309","volume-title":"noise suppression with pattern matching,\" In Advances in Speech Signal Processing","author":"Boll S.","year":"1992","unstructured":"S. Boll , \"Speech enhancement in the 1980s : noise suppression with pattern matching,\" In Advances in Speech Signal Processing , pp. 309 -- 325 , Dekker , 1992 . S. Boll, \"Speech enhancement in the 1980s: noise suppression with pattern matching,\" In Advances in Speech Signal Processing, pp. 309--325, Dekker, 1992."},{"key":"e_1_3_2_1_3_1","volume-title":"ICASSP","author":"Bregler C.","year":"1994","unstructured":"C. Bregler and Y. Konig , \" Eigenlips for Robust Speech Recognition,\" In Proc . ICASSP , 1994 . C. Bregler and Y. Konig, \"Eigenlips for Robust Speech Recognition,\" In Proc. ICASSP, 1994."},{"key":"e_1_3_2_1_4_1","first-page":"65","volume-title":"Real-time lip tracking and bimodal continuous speech recognition,\" in Proc. Works. Multimedia Signal Processing","author":"Chan M.","year":"1998","unstructured":"M. Chan , Y. Zhang , and T. Huang , \" Real-time lip tracking and bimodal continuous speech recognition,\" in Proc. Works. Multimedia Signal Processing , pp. 65 -- 70 , Redondo Beach , CA , 1998 . M. Chan, Y. Zhang, and T. Huang, \"Real-time lip tracking and bimodal continuous speech recognition,\" in Proc. Works. Multimedia Signal Processing, pp. 65--70, Redondo Beach, CA, 1998."},{"key":"e_1_3_2_1_5_1","volume-title":"LIBSVM: A Library For Support Vector Machines","author":"Chang C.","year":"2001","unstructured":"C. Chang and C. Lin , LIBSVM: A Library For Support Vector Machines , 2001 . Software available at http:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvm. C. Chang and C. Lin, LIBSVM: A Library For Support Vector Machines, 2001. Software available at http:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvm."},{"key":"e_1_3_2_1_6_1","volume-title":"Harper and Row","author":"Chomsky N.","year":"1968","unstructured":"N. Chomsky and M. Halle , The Sound Pattern of English , Harper and Row , New York , 1968 . N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, New York, 1968."},{"key":"e_1_3_2_1_7_1","first-page":"747","volume-title":"Int. Conf. Spoken Lang. Processing","author":"Chu S.","year":"2000","unstructured":"S. Chu and T. Huang , \" Bimodal speech recognition using coupled hidden Markov models,\" In Proc . Int. Conf. Spoken Lang. Processing , vol. II , Beijing, China , pp. 747 -- 750 , 2000 . S. Chu and T. Huang, \"Bimodal speech recognition using coupled hidden Markov models,\" In Proc. Int. Conf. Spoken Lang. Processing, vol. II, Beijing, China, pp. 747--750, 2000."},{"key":"e_1_3_2_1_8_1","first-page":"484","volume-title":"Europ. Conf","author":"Cootes T.","year":"1998","unstructured":"T. Cootes , G. Edwards , and C. Taylor , \" Active appearance models,\" In Proc . Europ. Conf . Computer Vision, Germany , pp. 484 -- 498 , 1998 . T. Cootes, G. Edwards, and C. Taylor, \"Active appearance models,\" In Proc. Europ. Conf. Computer Vision, Germany, pp. 484--498, 1998."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.1995.1004"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/6046.865479"},{"key":"e_1_3_2_1_11_1","volume-title":"Acoustic Theory of Speech Production","author":"Fant G.","year":"1960","unstructured":"G. Fant , Acoustic Theory of Speech Production , Netherlands : Mouton and Co. , 1960 . G. Fant, Acoustic Theory of Speech Production, Netherlands: Mouton and Co., 1960."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1155\/S1110865702207039"},{"key":"e_1_3_2_1_13_1","first-page":"177","volume-title":"Int. Conf. Acoust., Speech, Signal Processing","author":"Gurbuz S.","year":"2001","unstructured":"S. Gurbuz , Z. Tufekci , E. Patterson , and J. Gowdy , \" Application of affine-invariant fourier descriptors to lipreading for audio-visual speech recognition,\" in Proc . Int. Conf. Acoust., Speech, Signal Processing , pp. 177 -- 180 , Salt Lake City, UT , 2001 . S. Gurbuz, Z. Tufekci, E. Patterson, and J. Gowdy, \"Application of affine-invariant fourier descriptors to lipreading for audio-visual speech recognition,\" in Proc. Int. Conf. Acoust., Speech, Signal Processing, pp. 177--180, Salt Lake City, UT, 2001."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00133570"},{"key":"e_1_3_2_1_15_1","volume-title":"ICSLP","author":"King S.","year":"1998","unstructured":"S. King , T. Stephenson , S. Isard , P. Taylor and A. Strachan , \" Speech recognition via phonetically featured syllables,\" In Proc . ICSLP , Sydney , 1998 . S. King, T. Stephenson, S. Isard, P. Taylor and A. Strachan, \"Speech recognition via phonetically featured syllables,\" In Proc. ICSLP, Sydney, 1998."},{"key":"e_1_3_2_1_16_1","first-page":"891","volume-title":"ICSLP","author":"Kirchhoff K.","year":"1998","unstructured":"K. Kirchhoff , G. Fink and G. Sagerer , \" Combining Acoustic and Articulatory-feature Information for Robust Speech Recognition,\" In Proc . ICSLP , pp. 891 -- 894 , Sydney , 1998 . K. Kirchhoff, G. Fink and G. Sagerer, \"Combining Acoustic and Articulatory-feature Information for Robust Speech Recognition,\" In Proc. ICSLP, pp. 891--894, Sydney, 1998."},{"key":"e_1_3_2_1_17_1","first-page":"57","volume-title":"Europ. Tut. Works. Audio-Visual Speech Processing","author":"Krone G.","year":"1997","unstructured":"G. Krone , B. Talle , A. Wichert , and G. Palm , \" Neural architectures for sensor fusion in speech recognition,\" In Proc . Europ. Tut. Works. Audio-Visual Speech Processing , pp. 57 -- 60 , Greece , 1997 . G. Krone, B. Talle, A. Wichert, and G. Palm, \"Neural architectures for sensor fusion in speech recognition,\" In Proc. Europ. Tut. Works. Audio-Visual Speech Processing, pp. 57--60, Greece, 1997."},{"key":"e_1_3_2_1_18_1","volume-title":"HLT\/NAACL","author":"Livescu K.","year":"2004","unstructured":"K. Livescu and J. Glass , \" Feature-based Pronunciation Modeling for Speech Recognition,\" In Proc . HLT\/NAACL , Boston , May , 2004 . K. Livescu and J. Glass, \"Feature-based Pronunciation Modeling for Speech Recognition,\" In Proc. HLT\/NAACL, Boston, May, 2004."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1002\/scj.4690220607"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.982900"},{"key":"e_1_3_2_1_21_1","volume-title":"ICSLP","author":"Metze F.","year":"2002","unstructured":"F. Metze , and A. Waibel , \" A Flexible Stream Architecture for ASR Using Articulatory Features,\" In Proc . ICSLP , Denver , 2002 . F. Metze, and A. Waibel, \"A Flexible Stream Architecture for ASR Using Articulatory Features,\" In Proc. ICSLP, Denver, 2002."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1907526"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMSP.2001.962801"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2000.861925"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the Audio Visual Speech Conference","author":"Niyogi P.","year":"1999","unstructured":"P. Niyogi , E. Petajan , and J. Zhong , \" Feature Based Representation for Audio-Visual Speech Recognition \", Proceedings of the Audio Visual Speech Conference , Santa Cruz, CA , 1999 . P. Niyogi, E. Petajan, and J. Zhong, \"Feature Based Representation for Audio-Visual Speech Recognition\", Proceedings of the Audio Visual Speech Conference, Santa Cruz, CA, 1999."},{"key":"e_1_3_2_1_26_1","first-page":"265","volume-title":"Global Telecomm. Conf.","author":"Petajan E.","year":"1984","unstructured":"E. Petajan , \"Automatic lipreading to enhance speech recognition,\" In Proc . Global Telecomm. Conf. , pp. 265 -- 272 , Atlanta, GA , 1984 . E. Petajan, \"Automatic lipreading to enhance speech recognition,\" In Proc. Global Telecomm. Conf., pp. 265--272, Atlanta, GA, 1984."},{"key":"e_1_3_2_1_27_1","first-page":"1293","volume-title":"Eur. Conf. Speech Comm. Tech.","author":"Potamianos G.","year":"2003","unstructured":"G. Potamianos and C. Neti , \" Audio-visual speech recognition in challenging environments,\" In Proc . Eur. Conf. Speech Comm. Tech. , pp. 1293 -- 1296 , Geneva , September , 2003 . G. Potamianos and C. Neti, \"Audio-visual speech recognition in challenging environments,\" In Proc. Eur. Conf. Speech Comm. Tech., pp. 1293--1296, Geneva, September, 2003."},{"key":"e_1_3_2_1_28_1","volume-title":"Proc","author":"Potamianos G.","year":"2003","unstructured":"G. Potamianos , C. Neti , G. Gravier , A. Garg , and A. Senior , \" Recent Advances in the Automatic Recognition of Audio-Visual Speech \", In Proc . IEEE , 2003 . G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. Senior, \"Recent Advances in the Automatic Recognition of Audio-Visual Speech\", In Proc. IEEE, 2003."},{"key":"e_1_3_2_1_29_1","first-page":"1097","volume-title":"ICME","author":"Potamianos G.","year":"2000","unstructured":"G. Potamianos , A. Verma , C. Neti , G. Iyengar , and S. Basu , \" A Cascade Image Transform for Speaker-Independent Automatic Speechreading,\" In Proc . ICME , volume II , pp. 1097 -- 1100 , New York , 2000 . G. Potamianos, A. Verma, C. Neti, G. Iyengar, and S. Basu, \"A Cascade Image Transform for Speaker-Independent Automatic Speechreading,\" In Proc. ICME, volume II, pp. 1097--1100, New York, 2000."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1907309"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1420380"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.799688"},{"key":"e_1_3_2_1_33_1","volume-title":"Statistical Learning Theory","author":"Vapnik V.","year":"1998","unstructured":"V. Vapnik , Statistical Learning Theory , J. Wiley , New York , 1998 . V. Vapnik, Statistical Learning Theory, J. Wiley, New York, 1998."}],"event":{"name":"ICMI04: Sixth International Conference on Multimodal Interfaces 2004","sponsor":["ACM Association for Computing Machinery","SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"State College PA USA","acronym":"ICMI04"},"container-title":["Proceedings of the 6th international conference on Multimodal interfaces"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1027933.1027960","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T17:51:13Z","timestamp":1673459473000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1027933.1027960"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,10,13]]},"references-count":33,"alternative-id":["10.1145\/1027933.1027960","10.1145\/1027933"],"URL":"https:\/\/doi.org\/10.1145\/1027933.1027960","relation":{},"subject":[],"published":{"date-parts":[[2004,10,13]]},"assertion":[{"value":"2004-10-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}