{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,30]],"date-time":"2026-06-30T07:16:51Z","timestamp":1782803811027,"version":"3.54.5"},"publisher-location":"Cham","reference-count":24,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031441943","type":"print"},{"value":"9783031441950","type":"electronic"}],"license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T00:00:00Z","timestamp":1695340800000},"content-version":"vor","delay-in-days":264,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still only limited to a subsection of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models to not only low-resource languages but also to more specific speaker groups. However, fine-tuning on data from new domains is usually accompanied by a decrease in performance on the original domain. Therefore, in our experiments, we examine how well the performance of large-scale ASR models can be approximated for smaller domains, with our own dataset of German Senior Voice Commands (SVC-de), and how much of the general speech recognition performance can be preserved by selectively freezing parts of the model during training. 
To further increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain, we apply Experience Replay\u00a0[20] for continual learning. By adding only a fraction of data from the original domain, we are able to reach Word-Error-Rates (WERs) below 5% on the new domain, while stabilizing performance for general speech recognition at acceptable WERs.<\/jats:p>","DOI":"10.1007\/978-3-031-44195-0_40","type":"book-chapter","created":{"date-parts":[[2023,9,21]],"date-time":"2023-09-21T12:04:08Z","timestamp":1695297848000},"page":"489-500","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Replay to\u00a0Remember: Continual Layer-Specific Fine-Tuning for\u00a0German Speech Recognition"],"prefix":"10.1007","author":[{"given":"Theresa","family":"Pekarek Rosin","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stefan","family":"Wermter","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2023,9,22]]},"reference":[{"key":"40_CR1","unstructured":"Ardila, R., et al.: Common voice: a massively-multilingual speech corpus. In: Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France (2020)"},{"key":"40_CR2","doi-asserted-by":"crossref","unstructured":"Babu, A., et al.: XLS-R: self-supervised Cross-lingual Speech Representation Learning at Scale. In: Proceedings of INTERSPEECH 2022, pp. 2278\u20132282. ISCA, Incheon, Korea (2022)","DOI":"10.21437\/Interspeech.2022-143"},{"key":"40_CR3","unstructured":"Baevski, A., Zhou, H., Mohamed, A., Auli, M.: Wav2vec 2.0: a framework for self-supervised learning of speech representations. In: Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS). 
Curran Associates Inc., Vancouver, BC, Canada (2020)"},{"key":"40_CR4","doi-asserted-by":"crossref","unstructured":"Chan, W., Jaitly, N., Le, Q., Vinyals, O.: Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. In: Proceedings of 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960\u20134964. IEEE Press, Shanghai, China (2016)","DOI":"10.1109\/ICASSP.2016.7472621"},{"key":"40_CR5","doi-asserted-by":"crossref","unstructured":"Conneau, A., Baevski, A., Collobert, R., Mohamed, A., Auli, M.: Unsupervised cross-lingual representation learning for speech recognition. In: Proceedings of INTERSPEECH 2021, pp. 2426\u20132430. ISCA, Brno, Czechia (2021)","DOI":"10.21437\/Interspeech.2021-329"},{"key":"40_CR6","doi-asserted-by":"crossref","unstructured":"Graves, A.: Sequence transduction with recurrent neural networks. In: ICML 2012 Workshop on Representation Learning (2012)","DOI":"10.1007\/978-3-642-24797-2_3"},{"key":"40_CR7","unstructured":"Grosman, J.: Fine-tuned XLSR-53 Large model for speech recognition in German. https:\/\/huggingface.co\/jonatasgrosman\/wav2vec2-large-xlsr-53-german (2021)"},{"key":"40_CR8","unstructured":"Grosman, J.: Fine-tuned XLS-R 1B model for speech recognition in German. https:\/\/huggingface.co\/jonatasgrosman\/wav2vec2-xls-r-1b-german (2022)"},{"key":"40_CR9","doi-asserted-by":"crossref","unstructured":"Gulati, A., et al.: Conformer: convolution-augmented transformer for speech recognition. In: Proceedings of INTERSPEECH 2020, pp. 5036\u20135040. ISCA, Shanghai, China (2020)","DOI":"10.21437\/Interspeech.2020-3015"},{"key":"40_CR10","unstructured":"Huang, B.: Fine-tuned whisper model for speech recognition in German. 
https:\/\/huggingface.co\/bofenghuang\/whisper-small-cv11-german (2022)"},{"key":"40_CR11","doi-asserted-by":"crossref","unstructured":"Huang, W., Hu, W., Yeung, Y.T., Chen, X.: Conv-transformer transducer: low latency, low frame rate, streamable end-to-end speech recognition. In: Proceedings of INTERSPEECH 2020, pp. 5001\u20135005. ISCA, Shanghai, China (2020)","DOI":"10.21437\/Interspeech.2020-2361"},{"key":"40_CR12","doi-asserted-by":"crossref","unstructured":"Huang, Y., Ye, G., Li, J., Gong, Y.: Rapid speaker adaptation for conformer transducer: attention and Bias are all you need. In: Proceedings of INTERSPEECH 2021, pp. 1309\u20131313. ISCA, Brno, Czechia (2021)","DOI":"10.21437\/Interspeech.2021-1884"},{"key":"40_CR13","unstructured":"Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: Proceedings of 7th International Conference on Learning Representations (ICLR). New Orleans, LA, USA (2019)"},{"key":"40_CR14","doi-asserted-by":"crossref","unstructured":"MacDonald, R.L., et al.: Disordered speech data collection: lessons learned at 1 million utterances from project Euphonia. In: Proceedings of INTERSPEECH 2021, pp. 3066\u20133070. ISCA, Brno, Czech Republic (2021)","DOI":"10.21437\/Interspeech.2021-697"},{"key":"40_CR15","unstructured":"McDowell, A.: Fine-tuned XLS-R 300M model for speech recognition in German. https:\/\/huggingface.co\/AndrewMcDowell\/wav2vec2-xls-r-300m-german-de (2022)"},{"key":"40_CR16","doi-asserted-by":"crossref","unstructured":"Moro-Velazquez, L., et al.: Study of the performance of automatic speech recognition systems in speakers with Parkinson\u2019s Disease. In: Proceedings of INTERSPEECH 2019, pp. 3875\u20133879. ISCA, Graz, Austria (2019)","DOI":"10.21437\/Interspeech.2019-2993"},{"key":"40_CR17","doi-asserted-by":"publisher","unstructured":"Ngueajio, M.K., Washington, G.: Hey ASR system! Why aren\u2019t you more inclusive? In: Chen, J.Y.C., Fragomeni, G., Degen, H., Ntoa, S. (eds.) 
HCI International 2022 \u2013 Late Breaking Papers: Interacting with eXtended Reality and Artificial Intelligence. HCII 2022. LNCS, vol. 13518. Springer, Cham (2022). https:\/\/doi.org\/10.1007\/978-3-031-21707-4_30","DOI":"10.1007\/978-3-031-21707-4_30"},{"key":"40_CR18","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1016\/j.neunet.2019.01.012","volume":"113","author":"GI Parisi","year":"2019","unstructured":"Parisi, G.I., Kemker, R., Part, J.L., Kanan, C., Wermter, S.: Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54\u201371 (2019)","journal-title":"Neural Netw."},{"key":"40_CR19","unstructured":"Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I.: Robust speech recognition via large-scale weak supervision. arXiv:2212.04356 (2022)"},{"key":"40_CR20","unstructured":"Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T., Wayne, G.: Experience replay for continual learning. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS), pp. 348\u2013358. Curran Associates Inc, Vancouver, BC, Canada (2019)"},{"key":"40_CR21","doi-asserted-by":"crossref","unstructured":"Shor, J., et al.: Personalizing ASR for Dysarthric and accented speech with limited data. In: Proceedings of INTERSPEECH 2019, pp. 784\u2013788. ISCA, Graz, Austria (2019)","DOI":"10.21437\/Interspeech.2019-1427"},{"key":"40_CR22","doi-asserted-by":"crossref","unstructured":"Shrivastava, H., Garg, A., Cao, Y., Zhang, Y., Sainath, T.N.: Echo state speech recognition. In: Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5669\u20135673. IEEE Press, Toronto, ON, Canada (2021)","DOI":"10.1109\/ICASSP39728.2021.9414495"},{"key":"40_CR23","doi-asserted-by":"crossref","unstructured":"Vander Eeckt, S., Van Hamme, H.: Continual learning for monolingual end-to-end automatic speech recognition. 
In: Proceedings of 30th European Signal Processing Conference (EUSIPCO), pp. 459\u2013463. IEEE Press, Belgrade, Serbia (2022)","DOI":"10.23919\/EUSIPCO55093.2022.9909589"},{"key":"40_CR24","unstructured":"Vaswani, A., et al.: attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), pp. 5998\u20136008. Curran Associates Inc, Long Beach, CA, USA (2017)"}],"container-title":["Lecture Notes in Computer Science","Artificial Neural Networks and Machine Learning \u2013 ICANN 2023"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-44195-0_40","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,21]],"date-time":"2023-09-21T12:10:13Z","timestamp":1695298213000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-44195-0_40"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"ISBN":["9783031441943","9783031441950"],"references-count":24,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-44195-0_40","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023]]},"assertion":[{"value":"22 September 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ICANN","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Artificial Neural Networks","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Heraklion","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference 
Information"}},{"value":"Greece","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2023","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 September 2023","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"29 September 2023","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"32","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"icann2023","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/e-nns.org\/icann2023\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"easyacademia.org","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"947","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"426","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the 
conference organizers)"}},{"value":"22","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"45% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"2.4","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"4","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"type of other papers accepted  : 9 Abstract","order":10,"name":"additional_info_on_review_process","label":"Additional Info on Review Process","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}