{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T13:33:04Z","timestamp":1753882384410,"version":"3.41.2"},"reference-count":33,"publisher":"World Scientific Pub Co Pte Ltd","issue":"01","funder":[{"name":"Open Research Fund Program of Beijing National Research Center for Information Science and Technology","award":["BNR2021KF02005"],"award-info":[{"award-number":["BNR2021KF02005"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. As. Lang. Proc."],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:p> The Putonghua Proficiency Test (Putonghua Shuiping Ceshi, PSC) is a speaking test in China. The propositional speaking section in PSC focuses on the ability of speakers to express ideas fluently and accurately without textual reference. However, unlike other sections of the PSC, propositional speaking is still scored manually, which can result in inefficiency, high costs, and subjectivity. To address these issues, an automatic speech fluency evaluation method based on multimodality is proposed. First, different neural networks are used to extract unimodal features. Then, cross-modal attention is applied to achieve multimodal fusion. Finally, fluency evaluation results are obtained using self-attention to reinforce high-contributing information. The proposed method achieves 81.67% accuracy on a self-built dataset, demonstrating that combining textual and acoustic features provides complementary information to improve automatic speech fluency evaluation accuracy. 
<\/jats:p>","DOI":"10.1142\/s2717554523500017","type":"journal-article","created":{"date-parts":[[2023,4,30]],"date-time":"2023-04-30T08:04:33Z","timestamp":1682841873000},"source":"Crossref","is-referenced-by-count":0,"title":["Automatic Speech Fluency Evaluation Method Based on Multimodality for Putonghua Proficiency Test Propositional Speaking"],"prefix":"10.1142","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3840-4871","authenticated-orcid":false,"given":"Jiajun","family":"Liu","sequence":"first","affiliation":[{"name":"School of Software, Xinjiang University, No. 666 Shengli Avenue, Urumqi, Xinjiang, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aishan","family":"Wumaier","sequence":"additional","affiliation":[{"name":"School of Science and Engineering, Xinjiang University, No. 666 Shengli Avenue, Urumqi, Xinjiang, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Linna","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Science and Engineering, Xinjiang University, No. 666 Shengli Avenue, Urumqi, Xinjiang, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huazhen","family":"Meng","sequence":"additional","affiliation":[{"name":"School of Science and Engineering, Xinjiang University, No. 666 Shengli Avenue, Urumqi, Xinjiang, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2023,5,31]]},"reference":[{"issue":"4","key":"S2717554523500017BIB001","first-page":"91","volume":"29","author":"Guo X.","year":"2007","journal-title":"J. 
Xiangtan Normal Univ."},{"key":"S2717554523500017BIB002","doi-asserted-by":"publisher","DOI":"10.1142\/9789812772961_0018"},{"key":"S2717554523500017BIB003","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2009.04.009"},{"key":"S2717554523500017BIB004","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2022-896"},{"key":"S2717554523500017BIB005","first-page":"1687","volume-title":"SMC\u201903 Conf. Proc. 2003 IEEE Int. Conf. Systems, Man and Cybernetics. Conf. Theme-System Security and Assurance (Cat. No. 03CH37483)","volume":"2","author":"Liu D.","year":"2003"},{"volume-title":"Second Language Studies: Acquisition, Learning, Education and Technology","year":"2010","author":"Bhat S.","key":"S2717554523500017BIB006"},{"key":"S2717554523500017BIB007","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2014-358"},{"key":"S2717554523500017BIB008","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2015.07.006"},{"key":"S2717554523500017BIB009","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-0501"},{"key":"S2717554523500017BIB010","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2018-1336"},{"key":"S2717554523500017BIB011","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8682187"},{"key":"S2717554523500017BIB012","doi-asserted-by":"publisher","DOI":"10.1109\/BMEI.2011.6098713"},{"key":"S2717554523500017BIB013","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053452"},{"key":"S2717554523500017BIB014","doi-asserted-by":"publisher","DOI":"10.1109\/O-COCOSDA202152914.2021.9660601"},{"key":"S2717554523500017BIB015","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2021-1258"},{"key":"S2717554523500017BIB016","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2009.4960712"},{"key":"S2717554523500017BIB017","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9747391"},{"key":"S2717554523500017BIB018","first-page":"6438843","volume":"2022","author":"Yang 
H.","year":"2022","journal-title":"Wirel. Commun. Mob. Comput."},{"key":"S2717554523500017BIB019","doi-asserted-by":"publisher","DOI":"10.1109\/SLT.2018.8639583"},{"issue":"9","key":"S2717554523500017BIB020","first-page":"341","volume":"5","author":"Boersma P.","year":"2001","journal-title":"Glot. Int."},{"key":"S2717554523500017BIB021","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502224"},{"key":"S2717554523500017BIB022","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"S2717554523500017BIB023","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2011.11.004"},{"key":"S2717554523500017BIB024","unstructured":"J. Devlin,  M. W. Chang,  K. Lee and  K. Toutanova ,  Proc. 2019 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol.  1  (Minneapolis, MN,  2019),  pp. 4171\u20134186."},{"key":"S2717554523500017BIB025","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani A.","year":"2017"},{"key":"S2717554523500017BIB026","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1656"},{"key":"S2717554523500017BIB027","doi-asserted-by":"publisher","DOI":"10.3390\/s21144913"},{"key":"S2717554523500017BIB028","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9746924"},{"key":"S2717554523500017BIB029","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-020-09345-z"},{"key":"S2717554523500017BIB030","first-page":"8026","volume":"32","author":"Paszke A.","year":"2019","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"S2717554523500017BIB031","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"S2717554523500017BIB032","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1115"},{"key":"S2717554523500017BIB033","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1209"}],"container-title":["International Journal of Asian Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S2717554523500017","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,5]],"date-time":"2023-06-05T06:01:33Z","timestamp":1685944893000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S2717554523500017"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3]]},"references-count":33,"journal-issue":{"issue":"01","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["10.1142\/S2717554523500017"],"URL":"https:\/\/doi.org\/10.1142\/s2717554523500017","relation":{},"ISSN":["2717-5545","2424-791X"],"issn-type":[{"type":"print","value":"2717-5545"},{"type":"electronic","value":"2424-791X"}],"subject":[],"published":{"date-parts":[[2022,3]]},"article-number":"2350001"}}