{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:37:02Z","timestamp":1750307822424,"version":"3.41.0"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2008,6,1]],"date-time":"2008-06-01T00:00:00Z","timestamp":1212278400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2008,6]]},"abstract":"<jats:p>Tone recognition has been a basic but important task for speech recognition and assessment of tonal languages, such as Mandarin Chinese. Most previously proposed approaches adopt a two-step approach where syllables within an utterance are identified via forced alignment first, and tone recognition using a variety of classifiers---such as neural networks, Gaussian mixture models (GMM), hidden Markov models (HMM), support vector machines (SVM)---is then performed on each segmented syllable to predict its tone. However, forced alignment does not always generate accurate syllable boundaries, leading to unstable voiced-unvoiced detection and deteriorating performance in tone recognition. Aiming to alleviate this problem, we propose a robust approach called Tone Recognition Using Extended Segments (TRUES) for HMM-based continuous tone recognition. The proposed approach extracts an unbroken pitch contour from a given utterance based on dynamic programming over time-domain acoustic features of average magnitude difference function (AMDF). The pitch contour of each syllable is then extended for tri-tone HMM modeling, such that the influence from inaccurate syllable boundaries is lessened. 
Our experimental results demonstrate that the proposed TRUES achieves a 49.13% relative error rate reduction over that of the recently proposed supratone modeling, which is deemed the state of the art in tone recognition and outperforms several previously proposed approaches. The encouraging improvement demonstrates the effectiveness and robustness of the proposed TRUES, as well as the corresponding pitch determination algorithm, which produces unbroken pitch contours.<\/jats:p>","DOI":"10.1145\/1386869.1386872","type":"journal-article","created":{"date-parts":[[2008,8,27]],"date-time":"2008-08-27T11:56:36Z","timestamp":1219838196000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["TRUES"],"prefix":"10.1145","volume":"7","author":[{"given":"Jiang-Chun","family":"Chen","sequence":"first","affiliation":[{"name":"National Tsing Hua University, Taiwan"}]},{"given":"Jyh-Shing Roger","family":"Jang","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Taiwan"}]}],"member":"320","published-online":{"date-parts":[[2008,6]]},"reference":[{"volume-title":"Praat: Doing phonetics by computer Version 4.6.34","year":"2007","author":"Boersma P.","key":"e_1_2_1_1_1"},{"volume-title":"Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP\u201900)","author":"Chang E.","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","unstructured":"Chao Y. R. 1968. A Grammar of Spoken Chinese. University of California Press Berkeley CA."},{"volume-title":"Proceedings of the 5th European Conference on Speech Communication and Technology (EUROSPEECH\u201997)","author":"Chen C. J.","key":"e_1_2_1_4_1"},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP\u201901)","author":"Chen C. 
J.","key":"e_1_2_1_5_1"},{"volume-title":"Proceedings of the 4th International Symposium on Chinese Spoken Language Processing (ISCSLP\u201904)","author":"Chen J. C.","key":"e_1_2_1_6_1"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.366544"},{"volume-title":"Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP\u201902)","year":"2002","author":"Hosom J. P.","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","unstructured":"Huang X. Acero A. and Hon H. W. 2001. Spoken Language Processing. Prentice Hall PTR Upper Saddle River NJ Chap. 12."},{"volume-title":"Proceedings of the 3rd International Symposium on Chinese Spoken Language Processing (ISCSLP\u201902)","author":"Jang J. S. R.","key":"e_1_2_1_10_1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/595576.595581"},{"volume-title":"Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH\u201905)","author":"Lin C. Y.","key":"e_1_2_1_12_1"},{"volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU\u201903)","author":"Lin W. Y.","key":"e_1_2_1_13_1"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2004.09.004"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.2717413"},{"volume-title":"Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP\u201900)","author":"Seide F.","key":"e_1_2_1_16_1"},{"key":"e_1_2_1_17_1","unstructured":"SFS. Speech Filing System. http:\/\/www.phon.ucl.ac.uk\/resource\/sfs.html"},{"volume-title":"Speech Coding and Synthesis","author":"Talkin D.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","unstructured":"Tang Poetry Corpus. 2002--2006. 
Available at http:\/\/mir.cs.nthu.edu.tw\/research\/corpus\/tangpoetry."},{"key":"e_1_2_1_20_1","first-page":"455","article-title":"Multi-space probability distribution HMM","volume":"3","author":"Tokuda K.","year":"2002","journal-title":"IEICE Trans. Inform. Syst. E85-D"},{"volume-title":"Proceedings of the 3rd ESCA\/COCOSDA International Workshop on Speech Synthesis. 207--212","author":"Toledano D. T.","key":"e_1_2_1_21_1"},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. 1343--1346","author":"Wang C.","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/11939993_47"},{"volume-title":"Proceedings of the 9th International Conference on Spoken Language Processing (INTERSPEECH\u201906)","author":"Wang H. L.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001494000115"},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP\u201994)","author":"Woodland P. C.","key":"e_1_2_1_26_1"},{"key":"e_1_2_1_27_1","unstructured":"Young S. Evermann G. Kershaw D. Moore G. Odell J. Ollason D. Valtchev V. and Woodland P. 2002. The HTK Book (HTK Version 3.2). 
Cambridge University Cambridge UK."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/11939993_61"}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1386869.1386872","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1386869.1386872","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:57:48Z","timestamp":1750255068000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1386869.1386872"}},"subtitle":["Tone Recognition Using Extended Segments"],"short-title":[],"issued":{"date-parts":[[2008,6]]},"references-count":28,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2008,6]]}},"alternative-id":["10.1145\/1386869.1386872"],"URL":"https:\/\/doi.org\/10.1145\/1386869.1386872","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2008,6]]},"assertion":[{"value":"2007-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}