{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T14:27:47Z","timestamp":1753885667332,"version":"3.41.2"},"reference-count":18,"publisher":"World Scientific Pub Co Pte Ltd","issue":"03n04","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U1836219"],"award-info":[{"award-number":["U1836219"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Institute for Guo Qiang of Tsinghua University","award":["2019GQG0001"],"award-info":[{"award-number":["2019GQG0001"]}]},{"DOI":"10.13039\/501100017582","name":"Beijing National Research Center for Information Science and Technology","doi-asserted-by":"crossref","award":["BNR2019TD01022"],"award-info":[{"award-number":["BNR2019TD01022"]}],"id":[{"id":"10.13039\/501100017582","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. As. Lang. Proc."],"published-print":{"date-parts":[[2021,9]]},"abstract":"<jats:p> Speech keyword search (KWS) is the task of automatically detecting required keywords in continuous speech; single-keyword detection can be regarded as keyword wake-up. For many practical applications of these small-vocabulary speech recognition tasks, building a full large-vocabulary speech recognition system is costly and unnecessary. Insufficient data resources remain the main challenge for speech keyword search. Speech pre-training has become an effective technique, showing its superiority in a variety of tasks: the key idea is to learn effective representations from large amounts of unlabeled data to improve performance when labeled data for downstream tasks are limited. This research combines unsupervised pre-training with keyword search based on the Keyword-Filler model, introducing unsupervised pre-training into speech keyword search. The research adopts the pre-trained model architecture Wav2vec2.0, including XLSR. The results show that training with features extracted by the pre-trained model outperforms the baseline. Under low-resource conditions, the baseline performance drops significantly, while the performance of the fine-tuned pre-trained model does not decrease and even increases slightly in some intervals, showing that the pre-trained model can be tuned to achieve better performance on very little data. This demonstrates the advantage and application value of keyword search based on unsupervised pre-training. <\/jats:p>","DOI":"10.1142\/s2717554522500059","type":"journal-article","created":{"date-parts":[[2022,3,22]],"date-time":"2022-03-22T08:15:29Z","timestamp":1647936929000},"source":"Crossref","is-referenced-by-count":1,"title":["Keyword Search Based on Unsupervised Pre-Trained Acoustic Models"],"prefix":"10.1142","volume":"31","author":[{"given":"Xiner","family":"Li","sequence":"first","affiliation":[{"name":"Beijing National Research Center for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, P. R. China"}]},{"given":"Jing","family":"Zhao","sequence":"additional","affiliation":[{"name":"Beijing National Research Center for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, P. R. China"}]},{"given":"Wei-Qiang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Beijing National Research Center for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, P. R. China"}]},{"given":"Zhiqiang","family":"Lv","sequence":"additional","affiliation":[{"name":"TEG AI, Tencent Inc, Beijing 100193, P. R. China"}]},{"given":"Shen","family":"Huang","sequence":"additional","affiliation":[{"name":"TEG AI, Tencent Inc, Beijing 100193, P. R. China"}]}],"member":"219","published-online":{"date-parts":[[2022,3,21]]},"reference":[{"key":"S2717554522500059BIB001","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2021.04.002"},{"key":"S2717554522500059BIB004","doi-asserted-by":"publisher","DOI":"10.1109\/29.103088"},{"key":"S2717554522500059BIB005","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-46016-0_17"},{"key":"S2717554522500059BIB006","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854370"},{"key":"S2717554522500059BIB008","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"S2717554522500059BIB009","first-page":"5999","volume":"30","author":"Vaswani A.","year":"2017","journal-title":"Adv. Neural Inform. Process. Syst."},{"key":"S2717554522500059BIB013","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461802"},{"key":"S2717554522500059BIB015","first-page":"3765","volume":"29","author":"Hyvarinen A.","year":"2016","journal-title":"Adv. Neural Inform. Process. Syst."},{"key":"S2717554522500059BIB018","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58621-8_45"},{"key":"S2717554522500059BIB019","first-page":"1597","volume-title":"Proc. Int. Conf. Machine Learning","author":"Chen T.","year":"2020"},{"key":"S2717554522500059BIB020","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1992.226116"},{"key":"S2717554522500059BIB021","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1985.1168253"},{"key":"S2717554522500059BIB022","first-page":"107","volume-title":"Proc. RIAO 2000 on Content-Based Multimedia Information Access","author":"Garofolo J. S.","year":"2000"},{"key":"S2717554522500059BIB023","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054423"},{"key":"S2717554522500059BIB026","first-page":"16","volume-title":"Proc. Fourth Int. Workshop on Spoken Language Technologies for Under-Resourced Languages","author":"Gales M. J.","year":"2014"},{"key":"S2717554522500059BIB028","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"S2717554522500059BIB029","doi-asserted-by":"publisher","DOI":"10.1109\/ICSDA.2017.8384449"},{"key":"S2717554522500059BIB030","first-page":"1","volume-title":"Proc. IEEE Workshop on Automatic Speech Recognition and Understanding","author":"Povey D.","year":"2011"}],"container-title":["International Journal of Asian Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S2717554522500059","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T02:18:47Z","timestamp":1660789127000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S2717554522500059"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9]]},"references-count":18,"journal-issue":{"issue":"03n04","published-print":{"date-parts":[[2021,9]]}},"alternative-id":["10.1142\/S2717554522500059"],"URL":"https:\/\/doi.org\/10.1142\/s2717554522500059","relation":{},"ISSN":["2717-5545","2424-791X"],"issn-type":[{"type":"print","value":"2717-5545"},{"type":"electronic","value":"2424-791X"}],"subject":[],"published":{"date-parts":[[2021,9]]},"article-number":"2250005"}}