{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:29:03Z","timestamp":1777656543275,"version":"3.51.4"},"reference-count":31,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2023,9,30]],"date-time":"2023-09-30T00:00:00Z","timestamp":1696032000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Improvement of the Quality of Life and Activity for the Elderly","award":["MIS 5047294"],"award-info":[{"award-number":["MIS 5047294"]}]},{"name":"Improvement of the Quality of Life and Activity for the Elderly","award":["NSRF 2014-2020"],"award-info":[{"award-number":["NSRF 2014-2020"]}]},{"name":"Competitiveness, Entrepreneurship and Innovation","award":["MIS 5047294"],"award-info":[{"award-number":["MIS 5047294"]}]},{"name":"Competitiveness, Entrepreneurship and Innovation","award":["NSRF 2014-2020"],"award-info":[{"award-number":["NSRF 2014-2020"]}]},{"name":"European Regional Development Fund","award":["MIS 5047294"],"award-info":[{"award-number":["MIS 5047294"]}]},{"name":"European Regional Development Fund","award":["NSRF 2014-2020"],"award-info":[{"award-number":["NSRF 2014-2020"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>This work describes a methodology for sound event detection in domestic environments. Efficient solutions in this task can support the autonomous living of the elderly. The methodology deals with the \u201cChallenge on Detection and Classification of Acoustic Scenes and Events (DCASE)\u201d 2023, and more specifically with Task 4a \u201cSound event detection of domestic activities\u201d. This task involves the detection of 10 common events in domestic environments in 10 s sound clips. The events may have arbitrary duration in the 10 s clip. 
The main components of the methodology are data augmentation on mel-spectrograms that represent the sound clips, feature extraction by passing spectrograms through a frequency-dynamic convolution network with an extra attention module in sequence with each convolution, concatenation of these features with BEATs embeddings, and use of BiGRU for sequence modeling. Also, a mean teacher model is employed for leveraging unlabeled data. This research focuses on the effect of data augmentation techniques, of the feature extraction models, and on self-supervised learning. The main contribution is the proposed feature extraction model, which uses weighted attention on frequency in each convolution, combined in sequence with a local attention module adopted by computer vision. The proposed system features promising and robust performance.<\/jats:p>","DOI":"10.3390\/info14100534","type":"journal-article","created":{"date-parts":[[2023,10,2]],"date-time":"2023-10-02T04:28:08Z","timestamp":1696220888000},"page":"534","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Sound Event Detection in Domestic Environment Using Frequency-Dynamic Convolution and Local Attention"],"prefix":"10.3390","volume":"14","author":[{"given":"Grigorios-Aris","family":"Cheimariotis","sequence":"first","affiliation":[{"name":"Electrical and Computer Engineering Department, Democritus University of Thrace, 67100 Xanthi, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0898-6102","authenticated-orcid":false,"given":"Nikolaos","family":"Mitianoudis","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, Democritus University of Thrace, 67100 Xanthi, 
Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1109\/MSP.2015.2503881","article-title":"Monitoring activities of daily living in smart homes: Understanding human behavior","volume":"33","author":"Debes","year":"2016","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1109\/TITB.2009.2037317","article-title":"SVM-based multimodal classification of activities of daily living in health smart homes: Sensors, algorithms, and first experimental results","volume":"14","author":"Fleury","year":"2010","journal-title":"IEEE Trans. Inform. Technol. Biomed."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Popescu, M., Li, Y., Skubic, M., and Rantz, M. (2008, January 20\u201325). An acoustic fall detector system that uses sound height information to reduce the false alarm rate. Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada.","DOI":"10.1109\/IEMBS.2008.4650244"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Gemmeke, J.F., Ellis, D.P.W., Freedman, D., Jansen, A., Lawrence, W., Moore, R.C., Plakal, M., and Ritter, M. (2017, January 5\u20139). Audio Set: An ontology and human-labeled dataset for audio events. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Serizel, R., Turpault, N., Shah, A., and Salamon, J. (2020, January 4\u20138). Sound event detection in synthetic domestic environments. 
Proceedings of the ICASSP 2020\u20132020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9054478"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Nam, H., Kim, S.H., and Park, Y.H. (2022, January 22\u201327). Filteraugment: An acoustic environmental data augmentation method. Proceedings of the ICASSP 2022\u20132022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9747680"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Nam, H., Kim, S.H., Ko, B.Y., and Park, Y.H. (2022, January 18\u201322). Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection. Proceedings of the Interspeech 2022, Incheon, Republic of Korea.","DOI":"10.21437\/Interspeech.2022-10127"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Xiao, S., Zhang, X., and Zhang, P. (2023, January 4\u201310). Multi-Dimensional Frequency Dynamic Convolution with Confident Mean Teacher for Sound Event Detection. Proceedings of the ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece.","DOI":"10.1109\/ICASSP49357.2023.10096306"},{"key":"ref_9","unstructured":"Kim, J.W., Son, S.W., Song, Y., Kook, H., Song, I.H., and Lim, J.E. (2023). Semi-supervised learning-based sound event detection using frequency dynamic convolution with large kernel attention for DCASE challenge 2023 task 4. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Shao, N., Loweimi, E., and Li, X. (2021). RCT: Random Consistency Training for Semi-supervised Sound Event Detection. arXiv.","DOI":"10.21437\/Interspeech.2022-10037"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Koh, C.-Y., Chen, Y.-S., Liu, Y.-W., and Bai, M.R. (2021, January 6\u201311). 
Sound Event Detection by Consistency Training and Pseudo-Labeling with Feature-Pyramid Convolutional Recurrent Neural Networks. Proceedings of the ICASSP 2021\u20142021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9414350"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kim, S.J., and Chung, Y.J. (2022). Multi-Scale Features for Transformer Model to Improve the Performance of Sound Event Detection. Appl. Sci., 12.","DOI":"10.3390\/app12052626"},{"key":"ref_13","unstructured":"Miyazaki, K., Komatsu, T., Hayashi, T., Watanabe, S., Toda, T., and Takeda, K. (2020, January 2\u20133). Conformer-based Sound event detection with semi-supervised learning and data augmentation. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020, Tokyo, Japan."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Du, Q., and Luo, Y. (2022, January 10\u201313). You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding. Proceedings of the 2022 IEEE 42nd International Conference on Distributed Computing Systems Workshops (ICDCSW), Bologna, Italy.","DOI":"10.1109\/ICDCSW56584.2022.00035"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Niizumi, D., Takeuchi, D., Ohishi, Y., Harada, N., and Kashino, K. (2021, January 18\u201322). Byol for audio: Self-supervised learning for general-purpose audio representation. Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China.","DOI":"10.1109\/IJCNN52387.2021.9534474"},{"key":"ref_16","unstructured":"Chen, S., Wu, Y., Wang, C., Liu, S., Tompkins, D., Chen, Z., and Wei, F. (2022). BEATs: Audio Pre-Training with Acoustic Tokenizers. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Cho, K., Merri\u00ebnboer, B.V., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). 
Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_18","unstructured":"Ashish, V., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is All you Need. Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA."},{"key":"ref_19","first-page":"1196","article-title":"Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results","volume":"2017","author":"Tarvainen","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_20","unstructured":"Duo, X., Fang, W., and Li, J. (2023, September 28). Semi-Supervised Sound Event Detection System for DCASE 2023 Task4a. DCASE2023 Challenge. Technical Report June 2023. Available online: https:\/\/dcase.community\/documents\/challenge2023\/technical_reports\/DCASE2023_Wenxin_97_t4a.pdf."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chen, K., Du, X., Zhu, B., Ma, Z., Berg-Kirkpatrick, T., and Dubnov, S. (2022, January 23\u201327). HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection. Proceedings of the ICASSP 2022\u20142022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746312"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., and Wu, Y. (2020). Conformer: Convolution-augmented transformer for speech recognition. arXiv.","DOI":"10.21437\/Interspeech.2020-3015"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1007\/s41095-023-0364-2","article-title":"Visual attention network","volume":"9","author":"Guo","year":"2023","journal-title":"Comp. 
Visual Media"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lucic, M., and Schmid, C. (2021, January 11\u201317). Vivit: A Video Vision Transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Online.","DOI":"10.1109\/ICCV48922.2021.00676"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3292","DOI":"10.1109\/TASLP.2021.3120633","article-title":"Psla: Improving audio tagging with pretraining, sampling, labeling, and aggregation","volume":"29","author":"Gong","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_26","unstructured":"Xiao, S., Shen, J., Hu, A., Zhang, X., Zhang, P., and Yan, P.Y. (2023, September 28). Sound Event Detection with Weak Prediction for DCASE 2023 Challenge Task4A, DCASE2023 Challenge. Technical Report, June 2023. Available online: https:\/\/dcase.community\/documents\/challenge2023\/technical_reports\/DCASE2023_Zhang_63_t4a.pdf."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ebbers, J., Haeb-Umbach, R., and Serizel, R. (2023). Post-Processing Independent Evaluation of Sound Event Detection Systems. arXiv.","DOI":"10.1109\/ICASSP43922.2022.9747556"},{"key":"ref_28","unstructured":"Loshchilov, I., and Hutter, F. (2017, January 24\u201326). Decoupled Weight Decay Regularization. Proceedings of the International Conference on Learning Representations, Toulon, France."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Bilen, C., Ferroni, G., Tuveri, F., Azcarreta, J., and Krstulovi\u0107, S. (2020, January 4\u20138). A Framework for the Robust Evaluation of Sound Event Detection. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9052995"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ebbers, J., Serizel, R., and Haeb-Umbach, R. 
(2022, January 23\u201327). Threshold independent evaluation of sound event detection scores. Proceedings of the ICASSP 2022\u20142022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9747556"},{"key":"ref_31","first-page":"387","article-title":"MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection","volume":"37","author":"Chen","year":"2023","journal-title":"Proc. AAAI Conf. Artif. Intell."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/10\/534\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:02:41Z","timestamp":1760130161000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/10\/534"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,30]]},"references-count":31,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["info14100534"],"URL":"https:\/\/doi.org\/10.3390\/info14100534","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,30]]}}}