{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T17:34:57Z","timestamp":1772645697438,"version":"3.50.1"},"reference-count":17,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. & Syst."],"published-print":{"date-parts":[[2020,2,1]]},"DOI":"10.1587\/transinf.2019edp7234","type":"journal-article","created":{"date-parts":[[2020,1,31]],"date-time":"2020-01-31T22:09:54Z","timestamp":1580508594000},"page":"406-415","source":"Crossref","is-referenced-by-count":13,"title":["Automatic Construction of a Large-Scale Speech Recognition Database Using Multi-Genre Broadcast Data with Inaccurate Subtitle Timestamps"],"prefix":"10.1587","volume":"E103.D","author":[{"given":"Jeong-Uk","family":"BANG","sequence":"first","affiliation":[{"name":"School of Electronics Engineering, Chungbuk National University"}]},{"given":"Mu-Yeol","family":"CHOI","sequence":"additional","affiliation":[{"name":"Electronics and Telecommunications Research Institute"}]},{"given":"Sang-Hun","family":"KIM","sequence":"additional","affiliation":[{"name":"Electronics and Telecommunications Research Institute"}]},{"given":"Oh-Wook","family":"KWON","sequence":"additional","affiliation":[{"name":"School of Electronics Engineering, Chungbuk National University"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"crossref","unstructured":"[1] H. Liao, E. McDermott, and A. Senior, \u201cLarge scale deep neural network acoustic modeling with semi-supervised training data for YouTube transcription,\u201d 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp.368-373, Olomouc, Czech Republic, Dec. 2013. 10.1109\/asru.2013.6707758","DOI":"10.1109\/ASRU.2013.6707758"},{"key":"2","doi-asserted-by":"crossref","unstructured":"[2] A. Mansikkaniemi, P. 
Smit, and M. Kurimo, \u201cAutomatic construction of the Finnish Parliament Speech Corpus,\u201d Interspeech 2017, pp.3762-3766, Stockholm, Sweden, Aug. 2017. 10.21437\/interspeech.2017-1115","DOI":"10.21437\/Interspeech.2017-1115"},{"key":"3","doi-asserted-by":"crossref","unstructured":"[3] P. Lanchantin, M.J.F. Gales, P. Karanasou, X. Liu, Y. Qian, L. Wang, P.C. Woodland, and C. Zhang, \u201cThe development of the Cambridge university alignment systems for the multi-genre broadcast challenge,\u201d 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.647-653, Scottsdale, Arizona, USA, Dec. 2015. 10.1109\/asru.2015.7404857","DOI":"10.1109\/ASRU.2015.7404857"},{"key":"4","doi-asserted-by":"crossref","unstructured":"[4] P. Lanchantin, M.J.F. Gales, P. Karanasou, X. Liu, Y. Qian, L. Wang, P.C. Woodland, and C. Zhang, \u201cSelection of multi-genre broadcast data for the training of automatic speech recognition systems,\u201d Interspeech 2016, pp.3057-3061, San Francisco, USA, Sept. 2016. 10.21437\/interspeech.2016-462","DOI":"10.21437\/Interspeech.2016-462"},{"key":"5","doi-asserted-by":"crossref","unstructured":"[5] P. Bell, M.J.F. Gales, T. Hain, J. Kilgour, P. Lanchantin, X. Liu, A. McParland, S. Renals, O. Saz, M. Wester, and P.C. Woodland, \u201cThe MGB Challenge: Evaluating multi-genre broadcast media transcription,\u201d 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.687-693, Scottsdale, Arizona, USA, Dec. 2015. 10.1109\/asru.2015.7404863","DOI":"10.1109\/ASRU.2015.7404863"},{"key":"6","doi-asserted-by":"publisher","unstructured":"[6] L. Lamel, J.-L. Gauvain, and G. Adda, \u201cLightly supervised and unsupervised acoustic model training,\u201d Computer Speech & Language, vol.16, no.1, pp.115-129, Jan. 2002. 10.1006\/csla.2001.0186","DOI":"10.1006\/csla.2001.0186"},{"key":"7","doi-asserted-by":"crossref","unstructured":"[7] J.-U. Bang, M.-Y. Choi, S.-H. Kim, and O.-W. 
Kwon, \u201cImproving speech recognizers by refining broadcast data with inaccurate subtitle timestamps,\u201d Interspeech 2017, pp.2929-2933, Stockholm, Sweden, Aug. 2017. 10.21437\/interspeech.2017-650","DOI":"10.21437\/Interspeech.2017-650"},{"key":"8","doi-asserted-by":"crossref","unstructured":"[8] P.J. Moreno, C. Joerg, J.M.V. Thong, and O. Glickman, \u201cA recursive algorithm for the forced alignment of very long audio segments,\u201d Proc. ICSLP, Sydney, Australia, Dec. 1998.","DOI":"10.21437\/ICSLP.1998-603"},{"key":"9","unstructured":"[9] A. Katsamanis, M. Black, P.G. Georgiou, L. Goldstein, and S. Narayanan, \u201cSailAlign: Robust long speech-text alignment,\u201d Proc. New tools and Methods for VLSPR, Philadelphia, pp.28-31, PA, USA, Jan. 2011."},{"key":"10","unstructured":"[10] N. Braunschweiler, M.J.F. Gales, and S. Buchholz, \u201cLightly supervised recognition for automatic alignment of large coherent speech recordings,\u201d Interspeech 2010, pp.2222-2225, Makuhari, Japan, Sept. 2010."},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, \u201cLibrispeech: An ASR corpus based on public domain audio books,\u201d 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5206-5210, Brisbane, QLD, Australia, Aug. 2015. 10.1109\/icassp.2015.7178964","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"12","doi-asserted-by":"publisher","unstructured":"[12] M.J.F. Gales, D.Y. Kim, P.C. Woodland, H.Y. Chan, D. Mrva, R. Sinha, and S.E. Tranter, \u201cProgress in the CU-HTK broadcast news transcription system,\u201d IEEE Trans. Audio Speech Lang. Process., vol.14, no.5, pp.1513-1525, Sept. 2006. 10.1109\/tasl.2006.878264","DOI":"10.1109\/TASL.2006.878264"},{"key":"13","unstructured":"[13] V.I. Levenshtein, \u201cBinary codes capable of correcting deletions, insertions and reversals,\u201d Soviet Physics Doklady, vol.10, no.8, pp.707-710, Feb. 
1966."},{"key":"14","unstructured":"[14] D. Povey, \u201cText alignment script of the Kaldi tools,\u201d https:\/\/github.com\/kaldi-asr\/kaldi\/blob\/master\/src\/bin\/compute-wer.cc."},{"key":"15","unstructured":"[15] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, \u201cThe Kaldi speech recognition toolkit,\u201d Proc. Automatic Speech Recognition and Understanding (ASRU), Big Island, Hawaii, USA, Dec. 2011."},{"key":"16","unstructured":"[16] D. Povey, \u201cNeural-network training script of the Kaldi nnet2 version,\u201d https:\/\/github.com\/kaldi-asr\/kaldi\/blob\/master\/egs\/wsj\/s5\/steps\/nnet-2\/train_block.sh."},{"key":"17","doi-asserted-by":"crossref","unstructured":"[17] A. Stolcke, \u201cSRILM-an extensible language modeling toolkit,\u201d Interspeech 2002, pp.901-904, Denver, Colorado, USA, Sept. 2002.","DOI":"10.21437\/ICSLP.2002-303"}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E103.D\/2\/E103.D_2019EDP7234\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,13]],"date-time":"2022-10-13T20:41:26Z","timestamp":1665693686000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E103.D\/2\/E103.D_2019EDP7234\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,1]]},"references-count":17,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2019edp7234","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"value":"0916-8532","type":"print"},{"value":"1745-1361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,1]]}}}