{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,24]],"date-time":"2025-03-24T06:39:16Z","timestamp":1742798356726},"reference-count":7,"publisher":"World Scientific Pub Co Pte Lt","issue":"01","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Artif. Intell. Tools"],"published-print":{"date-parts":[[1999,3]]},"abstract":"<jats:p> In recent years a number of techniques have been proposed to improve the accuracy and the robustness of automatic speech recognition in noisy environments. Among these, supplementing the acoustic information with visual data, mostly extracted from the speaker's lip shapes, has proved successful. We have already demonstrated the effectiveness of integrating visual data at two different levels during speech decoding according to both direct and separate identification strategies (DI+SI). This paper outlines methods for reinforcing visible speech recognition in the framework of separate identification. First, we define visual-specific units using a self-organizing mapping technique. Second, we complete a stochastic learning of these units with a discriminative neural-network-based technique for speech recognition purposes. Finally, we show on a connected-letter speech recognition task that using these methods improves the performance of the DI+SI-based system under varying noise-level conditions. 
<\/jats:p>","DOI":"10.1142\/s021821309900004x","type":"journal-article","created":{"date-parts":[[2003,4,22]],"date-time":"2003-04-22T07:42:22Z","timestamp":1050997342000},"page":"43-52","source":"Crossref","is-referenced-by-count":8,"title":["DISCRIMINATIVE LEARNING OF VISUAL DATA FOR AUDIOVISUAL SPEECH RECOGNITION"],"prefix":"10.1142","volume":"08","author":[{"given":"ALEXANDRINA","family":"ROGOZAN","sequence":"first","affiliation":[{"name":"Laboratoire d'Informatique de l'Universit\u00e9 du Maine, 72085 Le Mans Cedex 9, France"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"p_5","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(98)00056-9"},{"key":"p_8","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1007\/978-3-662-13015-5_4","volume":"150","author":"Kricos P.","year":"1996","journal-title":"NATO ASI Series. Series F: Computer and Systems Science"},{"key":"p_9","first-page":"336","author":"Rogozan A.","year":"1998","journal-title":"Washington D. C."},{"key":"p_10","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1967.1054010"},{"key":"p_12","doi-asserted-by":"publisher","DOI":"10.1109\/72.554199"},{"key":"p_13","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1007\/978-3-662-13015-5_39","volume":"150","author":"Goldschen A.","year":"1996","journal-title":"NATO ASI Series. 
Series F: Computer and Systems Science"},{"key":"p_14","author":"Waibel A.","year":"1989","journal-title":"IEEE Transactions on Acoustics, Speech and Signal Processing"}],"container-title":["International Journal on Artificial Intelligence Tools"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S021821309900004X","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T13:46:01Z","timestamp":1565185561000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S021821309900004X"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1999,3]]},"references-count":7,"journal-issue":{"issue":"01","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[1999,3]]}},"alternative-id":["10.1142\/S021821309900004X"],"URL":"https:\/\/doi.org\/10.1142\/s021821309900004x","relation":{},"ISSN":["0218-2130","1793-6349"],"issn-type":[{"value":"0218-2130","type":"print"},{"value":"1793-6349","type":"electronic"}],"subject":[],"published":{"date-parts":[[1999,3]]}}}