{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T17:25:13Z","timestamp":1760549113006},"reference-count":20,"publisher":"World Scientific Pub Co Pte Lt","issue":"02","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Comp. Intel. Appl."],"published-print":{"date-parts":[[2011,6]]},"abstract":"<jats:p> A lip-reading technique that identifies visemes from visual data alone, without evaluating the corresponding acoustic signals, is presented. The technique is based on the vertical components of the optical flow (OF), which are classified using support vector machines (SVM). The OF is decomposed into multiple non-overlapping fixed-scale blocks, and statistical features of each block are computed for successive video frames of an utterance. The technique performs automatic temporal segmentation of the utterances (i.e., determining the start and the end of an utterance) using a pair-wise pixel comparison method, which evaluates the differences in intensity of corresponding pixels in two successive frames. The experiments were conducted on a database of 14 visemes taken from seven subjects, and the accuracy was tested using five- and ten-fold cross-validation for binary and multiclass SVM, respectively, to determine the impact of subject variations. The results indicate that, unlike other systems in the literature, the proposed method is more robust to inter-subject variations, with high sensitivity and specificity for 12 out of 14 visemes. Potential applications of such a system include human-computer interfaces (HCI) for mobility-impaired users, lip-reading mobile phones, in-vehicle systems, and improvement of speech-based computer control in noisy environments. 
<\/jats:p>","DOI":"10.1142\/s1469026811003045","type":"journal-article","created":{"date-parts":[[2011,6,29]],"date-time":"2011-06-29T09:57:22Z","timestamp":1309341442000},"page":"167-187","source":"Crossref","is-referenced-by-count":10,"title":["VISUAL SPEECH RECOGNITION USING OPTICAL FLOW AND SUPPORT VECTOR MACHINES"],"prefix":"10.1142","volume":"10","author":[{"given":"AYAZ A.","family":"SHAIKH","sequence":"first","affiliation":[{"name":"School of Electrical and Computer Engineering and Health Innovations Research Institute RMIT University, Vic 3001, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"DINESH K.","family":"KUMAR","sequence":"additional","affiliation":[{"name":"School of Electrical and Computer Engineering and Health Innovations Research Institute RMIT University, Vic 3001, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"JAYAVARDHANA","family":"GUBBI","sequence":"additional","affiliation":[{"name":"ISSNIP, Department of Electrical and Electronic Engineering, The University of Melbourne, Vic 3010, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2012,4,30]]},"reference":[{"key":"rf1","doi-asserted-by":"publisher","DOI":"10.1142\/S0219467808003167"},{"key":"rf3","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2003.817150"},{"key":"rf4","volume-title":"Issues in Visual and Audio-Visual Speech Processing","author":"Potamianos G.","year":"2004"},{"key":"rf5","doi-asserted-by":"publisher","DOI":"10.1002\/scj.4690220607"},{"key":"rf7","doi-asserted-by":"publisher","DOI":"10.1109\/34.982900"},{"key":"rf8","volume":"1","author":"Iwano K.","journal-title":"EURASIP Journal on Audio, Speech and Music Processing"},{"key":"rf10","doi-asserted-by":"publisher","DOI":"10.1109\/49.363147"},{"key":"rf11","doi-asserted-by":"publisher","DOI":"10.1109\/79.911195"},{"key":"rf12","first-page":"1228","author":"Zhang X.","journal-title":"EURASIP J. Appl. 
Signal Process"},{"key":"rf16","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.1996.0006"},{"key":"rf17","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(81)90024-2"},{"key":"rf19","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/7132.001.0001","volume-title":"Visual reconstruction","author":"Blake A.","year":"1987"},{"key":"rf20","first-page":"211","volume":"61","author":"Bruhun A.","journal-title":"IJCV"},{"key":"rf22","first-page":"775","author":"Wan V.","journal-title":"Proc. of Neural Networks for Signal Processing"},{"key":"rf24","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.857572"},{"key":"rf29","doi-asserted-by":"publisher","DOI":"10.1016\/S0923-5965(00)00011-4"},{"key":"rf30","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-006-4329-6"},{"key":"rf32","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2008.2011515"},{"key":"rf33","doi-asserted-by":"publisher","DOI":"10.1142\/S1469026806001800"},{"key":"rf35","doi-asserted-by":"publisher","DOI":"10.1007\/11539117_25"}],"container-title":["International Journal of Computational Intelligence and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S1469026811003045","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T00:27:46Z","timestamp":1565137666000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S1469026811003045"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,6]]},"references-count":20,"journal-issue":{"issue":"02","published-online":{"date-parts":[[2012,4,30]]},"published-print":{"date-parts":[[2011,6]]}},"alternative-id":["10.1142\/S1469026811003045"],"URL":"https:\/\/doi.org\/10.1142\/s1469026811003045","relation":{},"ISSN":["1469-0268","1757-5885"],"issn-type":[{"value":"1469-0268","type":"print"},{"value":"1757-5885","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,6]]}}}