{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:53:36Z","timestamp":1750308816481,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2011,3,1]],"date-time":"2011-03-01T00:00:00Z","timestamp":1298937600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001868","name":"National Science Council Taiwan","doi-asserted-by":"publisher","award":["NSC 98-2221-E-006-139-MY3"],"award-info":[{"award-number":["NSC 98-2221-E-006-139-MY3"]}],"id":[{"id":"10.13039\/501100001868","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2011,3]]},"abstract":"<jats:p>This article presents a probabilistic scheme for detecting the interruption point (IP) in spontaneous speech based on inter-syllable boundary-based prosodic features. Because of the high error rate in spontaneous speech recognition, a combined acoustic model considering both syllable and subsyllable recognition units is first used to determine the inter-syllable boundaries and output the recognition confidence of the input speech. Based on the finding that IPs always occur at inter-syllable boundaries, a probability distribution of the prosodic features at the current potential IP is estimated. The Conditional Random Field (CRF) model, which employs the clustered prosodic features of the current potential IP and its preceding and succeeding inter-syllable boundaries, is employed to output the IP likelihood measure. 
Finally, the confidence of the recognized speech, the probability distribution of the prosodic features, and the CRF-based IP likelihood measure are integrated to determine the optimal IP sequence of the input spontaneous speech. In addition, pitch reset and lengthening are also applied to improve the IP detection performance. The Mandarin Conversational Dialogue Corpus is adopted for evaluation. Experimental results show that the proposed IP detection approach obtains 10.56% and 6.5% more effective results than the hidden Markov model and the Maximum Entropy model, respectively, under the same experimental conditions. Moreover, the IP detection error rate can be further reduced by 9.15% using pitch reset and lengthening information. The experimental results confirm that the proposed model based on inter-syllable boundary-based prosodic features can effectively detect the interruption point in spontaneous Mandarin speech.<\/jats:p>","DOI":"10.1145\/1929908.1929914","type":"journal-article","created":{"date-parts":[[2011,3,17]],"date-time":"2011-03-17T12:40:16Z","timestamp":1300365616000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Interruption Point Detection of Spontaneous Speech Using Inter-Syllable Boundary-Based Prosodic Features"],"prefix":"10.1145","volume":"10","author":[{"given":"Chung-Hsien","family":"Wu","sequence":"first","affiliation":[{"name":"National Cheng Kung University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei-Bin","family":"Liang","sequence":"additional","affiliation":[{"name":"National Cheng Kung University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jui-Feng","family":"Yeh","sequence":"additional","affiliation":[{"name":"National Chiayi 
University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2011,3]]},"reference":[{"key":"e_1_2_1_1_1","article-title":"NIST conducts rich transcription evaluation","author":"Banerjee S.","year":"2009","journal-title":"IEEE Speech Lang. Process. Tech. Comm. Newsl."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.3115\/981967.981975"},{"volume-title":"Praat: Doing phonetics by computer","year":"2009","author":"Boersma P.","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1218960"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Chen S. F. and Rosenfeld R. 1999. A Gaussian prior for smoothing maximum entropy models. Tech. rep. CMU-CS-99-108. Carnegie Mellon University.","DOI":"10.21236\/ADA360974"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/1034678.1034742"},{"key":"e_1_2_1_7_1","unstructured":"Duda R. O. Hart P. E. and Stork D. G. 2001. Pattern Recognition 2nd Ed. Wiley Interscience Publication."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/973226.973229"},{"volume-title":"Proceedings of the Conference on Language Resources and Evaluation (LREC\u201906)","author":"Huang Z.","key":"e_1_2_1_9_1"},{"volume-title":"Proceedings of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT\/NAACL\u201904)","author":"Kim J.","key":"e_1_2_1_10_1"},{"key":"e_1_2_1_11_1","unstructured":"Kudo T. 2009. CRF++: Yet another CRF toolkit. 
http:\/\/crfpp.sourceforge.net\/."},{"volume-title":"Proceedings of International Conference on Machine Learning (ICML\u201901)","author":"Lafferty J.","key":"e_1_2_1_12_1"},{"volume-title":"Proceedings of the International Conference on Spoken Language Processing (ICSLP\u201904)","year":"2004","author":"Lee C.-H.","key":"e_1_2_1_13_1"},{"volume-title":"Proceedings of IEEE Conference on Multimedia and Expo (ICME\u201908)","author":"Liang W.-B.","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2009.2014792"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01589116"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.878255"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.408547"},{"key":"e_1_2_1_19_1","unstructured":"NIST. 2004. Rich transcription (RT-04F) evaluation plan. www.nist.gov\/speech\/tests\/rt\/2004-fall\/docs\/rt04f-eval-plan-v14.pdf."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073445.1073473"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(00)00028-5"},{"volume-title":"Proceedings of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT\/NAACL\u201904)","author":"Snover M.","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","unstructured":"Strassel S. 2004. Simple metadata annotation specification version 6.2. Linguistic Data Consortium. http:\/\/www.ldc.upenn.edu\/Projects\/MDE."},{"volume-title":"Proceedings of the 3rd ESCA\/COCOSDA Workshop on Speech Synthesis (ESCA\u201998)","author":"Toledano D. 
T.","key":"e_1_2_1_25_1"},{"volume-title":"Proceedings of the International Conference on Speech Prosody (SP\u201904)","author":"Tseng C.-Y.","key":"e_1_2_1_26_1"},{"volume-title":"-F","year":"2002","author":"Tseng S.-C.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","unstructured":"The Association for Computational Linguistics and Chinese Language Processing (ACLCLP). Brief introduction to TCC-300 corpus. http:\/\/www.aclclp.org.tw\/doc\/tcc_doc.PDF."},{"edition":"2","volume-title":"Information Retrieval","author":"Van Rijsbergen C. J.","key":"e_1_2_1_29_1"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.878267"},{"volume-title":"Proceedings of the European Conference on Speech Communication and Technology (INTERSPEECH\u201907)","author":"Yeh J.-F.","key":"e_1_2_1_31_1"}],"container-title":["ACM Transactions on Asian Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1929908.1929914","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1929908.1929914","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:26:32Z","timestamp":1750278392000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1929908.1929914"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,3]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,3]]}},"alternative-id":["10.1145\/1929908.1929914"],"URL":"https:\/\/doi.org\/10.1145\/1929908.1929914","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2011,3]]},"assertion":[{"value":"2010-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-03-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}