{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,17]],"date-time":"2025-04-17T14:30:04Z","timestamp":1744900204853,"version":"3.37.3"},"reference-count":27,"publisher":"World Scientific Pub Co Pte Ltd","issue":"01","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Patt. Recogn. Artif. Intell."],"published-print":{"date-parts":[[2020,1]]},"abstract":"<jats:p> Traditional automatic lip-reading systems generally consist of two stages: feature extraction and recognition, while the handcrafted features are empirical and cannot learn the relevance of lip movement sequence sufficiently. Recently, deep learning approaches have attracted increasing attention, especially the significant improvements of convolution neural network (CNN) applied to image classification and long short-term memory (LSTM) used in speech recognition, video processing and text analysis. In this paper, we propose a hybrid neural network architecture, which integrates CNN and bidirectional LSTM (BiLSTM) for lip reading. First, we extract key frames from each isolated video clip and use five key points to locate mouth region. Then, features are extracted from raw mouth images using an eight-layer CNN. The extracted features have the characteristics of stronger robustness and fault-tolerant\u00a0capability. Finally, we use BiLSTM to capture the correlation of sequential information among frame features in two directions and the softmax function to predict final recognition result. The proposed method is capable of extracting local features through convolution operations and finding hidden correlation in temporal information from lip image sequences. The evaluation results of lip-reading recognition experiments demonstrate that our proposed method outperforms conventional approaches such as active contour model (ACM) and hidden Markov model (HMM). <\/jats:p>","DOI":"10.1142\/s0218001420540038","type":"journal-article","created":{"date-parts":[[2019,3,20]],"date-time":"2019-03-20T04:25:55Z","timestamp":1553055955000},"page":"2054003","source":"Crossref","is-referenced-by-count":10,"title":["Automatic Lip Reading Using Convolution Neural Network and Bidirectional Long Short-term Memory"],"prefix":"10.1142","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7985-269X","authenticated-orcid":false,"given":"Yuanyao","family":"Lu","sequence":"first","affiliation":[{"name":"School of Electronic and Information Engineering,    North China University of Technology, Beijing, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jie","family":"Yan","sequence":"additional","affiliation":[{"name":"School of Electronic and Information Engineering,    North China University of Technology, Beijing, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2019,5,24]]},"reference":[{"doi-asserted-by":"publisher","key":"S0218001420540038BIB001","DOI":"10.1109\/ICASSP.2004.1327261"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB002","DOI":"10.1007\/s10772-016-9332-x"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB003","DOI":"10.1007\/978-3-642-15760-8_33"},{"key":"S0218001420540038BIB004","first-page":"648","volume-title":"Chinese Control and Decision Conference","volume":"23","author":"Fan X.","year":"2012"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB005","DOI":"10.1007\/978-3-642-24797-2_4"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB006","DOI":"10.1016\/j.neunet.2005.06.042"},{"issue":"1","key":"S0218001420540038BIB007","first-page":"218","volume":"13","author":"Hassanat A. B.","year":"2014","journal-title":"Int. J. Sci. Basic Appl. Res."},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB008","DOI":"10.1162\/neco.1997.9.8.1735"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB009","DOI":"10.1016\/j.imavis.2016.03.003"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB010","DOI":"10.1109\/CVPR.2017.604"},{"issue":"3","key":"S0218001420540038BIB011","first-page":"1755","volume":"10","author":"King D. E.","year":"2009","journal-title":"J. Mach. Learning Res."},{"key":"S0218001420540038BIB012","first-page":"1097","volume-title":"Int. Conf. Neural Information Processing Systems","volume":"60","author":"Krizhevsky A.","year":"2012"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB013","DOI":"10.1007\/978-3-319-48881-3_6"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB014","DOI":"10.1109\/ICIS.2016.7550888"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB015","DOI":"10.1109\/ICASSP.1996.543246"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB016","DOI":"10.1109\/ICME.2001.1237849"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB017","DOI":"10.1007\/s10489-014-0629-7"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB018","DOI":"10.1109\/ICASSP.2017.7952625"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB019","DOI":"10.1016\/j.eswa.2010.09.119"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB020","DOI":"10.1007\/s11263-015-0816-y"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB021","DOI":"10.1016\/j.neunet.2014.08.005"},{"key":"S0218001420540038BIB022","first-page":"625","volume-title":"IEEE Fourth Workshop on Multimedia Signal Processing","author":"Scanlon P.","year":"2012"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB023","DOI":"10.1109\/78.650093"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB024","DOI":"10.1109\/CISP.2010.5646264"},{"key":"S0218001420540038BIB025","first-page":"98","volume-title":"4th IAPR TC 9 Workshop on Pattern Recognition of Social Signals in Human-Computer-Interaction","author":"Thanda A.","year":"2016"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB026","DOI":"10.1109\/ICASSP.2016.7472852"},{"doi-asserted-by":"publisher","key":"S0218001420540038BIB027","DOI":"10.1145\/2907069"}],"container-title":["International Journal of Pattern Recognition and Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218001420540038","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,2,21]],"date-time":"2020-02-21T04:41:15Z","timestamp":1582260075000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0218001420540038"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,24]]},"references-count":27,"journal-issue":{"issue":"01","published-print":{"date-parts":[[2020,1]]}},"alternative-id":["10.1142\/S0218001420540038"],"URL":"https:\/\/doi.org\/10.1142\/s0218001420540038","relation":{},"ISSN":["0218-0014","1793-6381"],"issn-type":[{"type":"print","value":"0218-0014"},{"type":"electronic","value":"1793-6381"}],"subject":[],"published":{"date-parts":[[2019,5,24]]}}}