{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:30:10Z","timestamp":1750221010492,"version":"3.41.0"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2019,8,10]],"date-time":"2019-08-10T00:00:00Z","timestamp":1565395200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,1,31]]},"abstract":"<jats:p>Accurate phonetic transcriptions are crucial for building robust acoustic models for speech recognition as well as speech synthesis applications. Phonetic transcriptions are not usually provided with speech corpora. A lexicon is used to generate phone-level transcriptions of speech corpora with sentence-level transcriptions. When lexical entries are not available, letter-to-sound (LTS) rules are used. Whether it is a lexicon or LTS, the rules for pronunciation are generic and may not match the spoken utterance. This can lead to transcription errors. The objective of this study is to address the issue of mismatch between the transcription and its acoustic realisation. In particular, the issue of vowel deletions is studied. Group-delay-based segmentation is used to determine insertion\/deletion of vowels in the speech utterance. The transcriptions are corrected in the training data based on this. The corrected data are used in automatic speech recognition (ASR) and text to speech synthesis (TTS) systems. ASR and TTS systems built with the corrected transcriptions show improvements in the performance.<\/jats:p>","DOI":"10.1145\/3342352","type":"journal-article","created":{"date-parts":[[2019,8,12]],"date-time":"2019-08-12T12:16:36Z","timestamp":1565612196000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Importance of Signal Processing Cues in Transcription Correction for Low-Resource Indian Languages"],"prefix":"10.1145","volume":"19","author":[{"given":"Jeena J.","family":"Prakash","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Madras, Chennai, Tamil Nadu, India"}]},{"given":"Golda Brunet","family":"Rajan","sequence":"additional","affiliation":[{"name":"Government College of Engineering Salem, Salem, Tamil Nadu"}]},{"given":"Hema A.","family":"Murthy","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Madras, Chennai, Tamil Nadu, India"}]}],"member":"320","published-online":{"date-parts":[[2019,8,10]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"{n.d.}. Indic TTS. Retrieved from https:\/\/www.iitm.ac.in\/donlab\/tts\/.  {n.d.}. Indic TTS. Retrieved from https:\/\/www.iitm.ac.in\/donlab\/tts\/."},{"volume-title":"Proceedings of the Spoken Language Technology Workshop (SLT\u201914)","author":"Abraham Basil","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2006.1660164"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IALP.2011.65"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2007.367209"},{"volume-title":"Proceedings of the International Conference on Text, Speech and Dialogue.","author":"Baby Arun","key":"e_1_2_1_6_1"},{"volume-title":"Murthy","year":"2017","author":"Baby Arun","key":"e_1_2_1_7_1"},{"volume-title":"Proceedings of the Community-Based Building of Language Resources (CBBLR\u201916)","year":"2016","author":"Baby Arun","key":"e_1_2_1_8_1"},{"volume-title":"Proceedings of the 3rd ESCA Workshop in Speech Synthesis. 77--80","year":"1998","author":"Black Alan W.","key":"e_1_2_1_9_1"},{"volume-title":"Proceedings of the National Conference on Communication.","author":"Deivapalan P.","key":"e_1_2_1_10_1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1006\/csla.1999.0123"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2018.03.005"},{"volume":"27","volume-title":"Proceedings of the Speech Transcription Workshop","author":"Evermann Gunnar","key":"e_1_2_1_13_1"},{"key":"e_1_2_1_14_1","first-page":"38","article-title":"A tutorial on pronunciation modeling for large vocabulary speech recognition","volume":"2705","author":"Fosler-Lussier Eric","year":"2000","journal-title":"Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.917681"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00034-017-0598-2"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.889790"},{"volume-title":"Spoken Language Processing: A Guide to Theory, Algorithm, and System Development","author":"Huang Xuedong","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1996.541110"},{"volume-title":"Proceedings of the 5th ISCA Workshop on Speech Synthesis. 223--224","author":"John Kominek","key":"e_1_2_1_20_1"},{"volume-title":"Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH'18)","year":"2018","author":"Prakash Jeena J.","key":"e_1_2_1_21_1"},{"volume-title":"Proceedings of the 5th ISCA Workshop on Speech Synthesis. 127--132","year":"2004","author":"Kim Yeon-Jun","key":"e_1_2_1_22_1"},{"volume-title":"Proceedings of the 5th ISCA Workshop on Speech Synthesis. 155--160","author":"Kominek John","key":"e_1_2_1_23_1"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12046-009-0006-0"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.2307\/747528"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1979.1170743"},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201914)","author":"Liu Xunying","key":"e_1_2_1_27_1"},{"volume-title":"Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH\u201918)","year":"2018","author":"Mahesh M.","key":"e_1_2_1_28_1"},{"volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU\u201903)","author":"Maison Beno\u0131t","key":"e_1_2_1_29_1"},{"volume-title":"Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH\u201918)","year":"2018","key":"e_1_2_1_30_1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/79.382443"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-6393(91)90011-H"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12046-011-0045-1"},{"volume-title":"Proceedings of the ISCA and IEEE Workshop on Spontaneous Speech Processing and Recognition. 115--118","author":"Nagarajan T.","key":"e_1_2_1_34_1"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1155\/S1110865704406210"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1049\/el:20030616"},{"volume-title":"Proceedings of the IEEE 2011 Workshop on Automatic Speech Recognition and Understanding.","year":"2011","author":"Povey Daniel","key":"e_1_2_1_37_1"},{"volume-title":"Proceedings of INTERSPEECH. 327--331","author":"Prakash Anusha","key":"e_1_2_1_38_1"},{"volume-title":"Murthy","year":"2016","author":"Prakash Jeena J.","key":"e_1_2_1_39_1"},{"key":"e_1_2_1_40_1","first-page":"3","article-title":"Automatic segmentation of continuous speech using minimum phase group delay functions","volume":"42","author":"Prasad V. Kamakshi","year":"2004","journal-title":"Speech Commun."},{"volume-title":"Proceedings of the 8th ISCA Workshop on Speech Synthesis. 311--316","year":"2013","author":"Ramani B.","key":"e_1_2_1_41_1"},{"key":"e_1_2_1_42_1","unstructured":"A. Rudnicky. {n.d.}. Cmu lexicon. Retrieved from www.speech.cs.cmu.edu\/cgi-bin\/cmudict.  A. Rudnicky. {n.d.}. Cmu lexicon. Retrieved from www.speech.cs.cmu.edu\/cgi-bin\/cmudict."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2009.5373263"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2015.12.008"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2019.2908913"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952315"},{"volume-title":"Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH\u201914)","year":"2014","author":"Aswin Shanmugam S.","key":"e_1_2_1_47_1"},{"volume-title":"Proceedings of the 7th International Conference on Spoken Language Processing. 901--904","year":"2002","author":"Stolcke Andreas","key":"e_1_2_1_48_1"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178976"},{"volume-title":"Proceedings of the 8th Annual Conference of the International Speech Communication Association (INTERSPEECH\u201907)","year":"2007","author":"Tachibana Ryuki","key":"e_1_2_1_50_1"},{"volume-title":"Proceedings of the 23rd National Conference on Communications (NCC\u201917)","author":"Tanamala Swetha","key":"e_1_2_1_51_1"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2018-1178"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2009.03.004"},{"volume-title":"Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP\u201998)","year":"1998","author":"Weng Fuliang","key":"e_1_2_1_54_1"},{"volume-title":"Retrieved","year":"2018","key":"e_1_2_1_55_1"},{"key":"e_1_2_1_56_1","unstructured":"Steve Young Gunnar Evermann Mark Gales Thomas Hain Dan Kershaw Xunying Liu Gareth Moore Julian Odell Dave Ollason Dan Povey etal 2006. The HTK Book. Cambridge University Engineering Department. Retrieved from http:\/\/www.dsic.upv.es\/docs\/posgrado\/20\/RES\/materialesDocentes\/alejandroViewgraphs\/htkbook.pdf.  Steve Young Gunnar Evermann Mark Gales Thomas Hain Dan Kershaw Xunying Liu Gareth Moore Julian Odell Dave Ollason Dan Povey et al. 2006. The HTK Book. Cambridge University Engineering Department. Retrieved from http:\/\/www.dsic.upv.es\/docs\/posgrado\/20\/RES\/materialesDocentes\/alejandroViewgraphs\/htkbook.pdf."},{"volume-title":"Proceedings of the 6th ISCA Workshop on Speech Synthesis.","year":"2007","author":"Zen Heiga","key":"e_1_2_1_57_1"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCSLP.2016.7918446"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3342352","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3342352","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:26:02Z","timestamp":1750206362000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3342352"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,10]]},"references-count":58,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1,31]]}},"alternative-id":["10.1145\/3342352"],"URL":"https:\/\/doi.org\/10.1145\/3342352","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2019,8,10]]},"assertion":[{"value":"2019-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}