{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T12:14:15Z","timestamp":1771330455496,"version":"3.50.1"},"reference-count":27,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2023,3,23]],"date-time":"2023-03-23T00:00:00Z","timestamp":1679529600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"research fund from Chosun University, 2021"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This study proposes a sound event localization and detection (SELD) method using imbalanced real and synthetic data via a multi-generator. The proposed method is based on a residual convolutional neural network (RCNN) and a transformer encoder for real spatial sound scenes. SELD aims to classify the sound event, detect the onset and offset of the classified event, and estimate the direction of the sound event. In Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Task 3, SELD is performed with a small amount of real spatial sound scene data and a relatively large amount of synthetic data. When a model is trained on such imbalanced data, training can focus only on the more numerous data. To prevent this problem, we propose a multi-generator that samples real and synthetic data at a specific ratio within each batch. We applied the SpecAugment data augmentation technique and time-frequency masking to the dataset. Furthermore, we propose a neural network architecture that combines the RCNN and the transformer encoder. Several models were trained with various structures and hyperparameters, and several ensemble models were obtained by \u201ccherry-picking\u201d specific models. 
Based on the experiment, the single model of the proposed method and the model applied with the ensemble exhibited improved performance compared with the baseline model.<\/jats:p>","DOI":"10.3390\/s23073398","type":"journal-article","created":{"date-parts":[[2023,3,24]],"date-time":"2023-03-24T03:16:46Z","timestamp":1679627806000},"page":"3398","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Sound Event Localization and Detection Using Imbalanced Real and Synthetic Data via Multi-Generator"],"prefix":"10.3390","volume":"23","author":[{"given":"Yeongseo","family":"Shin","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Chosun University, Gwangju 61452, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3361-8360","authenticated-orcid":false,"given":"Chanjun","family":"Chun","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Chosun University, Gwangju 61452, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1109\/TASLP.2015.2389618","article-title":"Robust sound event classification using deep neural networks","volume":"23","author":"McLoughlin","year":"2015","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Benetos, E., and Dixon, S. (2011, January 22\u201327). Polyphonic music transcription using note onset and offset detection. 
Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic.","DOI":"10.1109\/ICASSP.2011.5946322"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/MSP.2021.3090678","article-title":"Sound event detection: A tutorial","volume":"38","author":"Mesaros","year":"2021","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1109\/TITS.2015.2470216","article-title":"Audio surveillance of roads: A system for detecting anomalous sounds","volume":"17","author":"Foggia","year":"2016","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Drossos, K., Adavanne, S., and Virtanen, T. (2017, January 15\u201318). Automated audio captioning with recurrent neural networks. Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA.","DOI":"10.1109\/WASPAA.2017.8170058"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chazan, S.E., Hammer, H., Hazan, G., Goldberger, J., and Gannot, S. (2019, January 2\u20136). Multi-microphone speaker separation based on deep DOA estimation. Proceedings of the European Signal Processing Conference (EUSIPCO), A Coruna, Spain.","DOI":"10.23919\/EUSIPCO.2019.8903121"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1109\/LSP.2016.2583658","article-title":"DNN-based feature enhancement using DOA-constrained ICA for robust speech recognition","volume":"23","author":"Lee","year":"2016","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"3912","DOI":"10.1121\/1.5042222","article-title":"Sound source localization and speech enhancement with sparse Bayesian learning beamforming","volume":"143","author":"Xenaki","year":"2018","journal-title":"J. Acoust. Soc. 
Am."},{"key":"ref_9","unstructured":"Politis, A., Adavanne, S., and Virtanen, T. (2020, January 2\u20134). A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop (DCASE), Virtual."},{"key":"ref_10","unstructured":"Politis, A., Adavanne, S., Krause, D., Deleforge, A., Srivastava, P., and Virtanen, T. (2021, January 15\u201319). A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection. Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop (DCASE), Virtual."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1749","DOI":"10.1109\/TASLP.2022.3173054","article-title":"Salsa: Spatial cue-augmented log-spectrogram features for polyphonic sound event localization and detection","volume":"30","author":"Nguyen","year":"2022","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_12","unstructured":"Nguyen, T.N.T., Jones, D.L., Watcharasupat, K.N., Phan, H., and Gan, W.S. (2022, January 22\u201327). SALSA-Lite: A fast and effective feature for polyphonic sound event localization and detection with microphone arrays. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Shimada, K., Koyama, Y., Takahashi, N., Takahashi, S., and Mitsufuji, Y. (2021, January 6\u201311). ACCDOA: Activity-coupled cartesian direction of arrival representation for sound event localization and detection. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9413609"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Shimada, K., Koyama, Y., Takahashi, S., Takahashi, N., Tsunoo, E., and Mitsufuji, Y. 
(2022, January 22\u201327). Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.31219\/osf.io\/f4kax"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1109\/JSTSP.2018.2885636","article-title":"Sound event localization and detection of overlapping sources using convolutional recurrent neural networks","volume":"13","author":"Adavanne","year":"2018","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Cao, Y., Kong, Q., Iqbal, T., An, F., Wang, W., and Plumbley, M.D. (2019, January 25\u201326). Polyphonic Sound Event Detection and Localization Using a Two-Stage strategy. Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop (DCASE), New York, NY, USA.","DOI":"10.33682\/4jhy-bj81"},{"key":"ref_17","unstructured":"Politis, A., Shimada, K., Sudarsanam, P., Adavanne, S., Krause, D., Koyama, Y., Takahashi, N., Takahashi, S., Mitsufuji, Y., and Virtanen, T. (2022). STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events. arXiv."},{"key":"ref_18","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"684","DOI":"10.1109\/TASLP.2020.3047233","article-title":"Overview and evaluation of sound event localization and detection in DCASE 2019","volume":"29","author":"Politis","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. 
Process."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_21","unstructured":"Ahonen, J., Pulkki, V., and Lokki, T. (2007, January 15\u201317). Teleconference application and B-format microphone array for directional audio coding. Proceedings of the Audio Engineering Society Conference, Saariselka, Finland."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yasuda, M., Koizumi, Y., Saito, S., Uematsu, H., and Imoto, K. (2020, January 4\u20138). Sound event localization based on sound intensity vector refined by DNN-based denoising and source separation. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9054462"},{"key":"ref_23","unstructured":"Dozat, T. (2016, January 2\u20134). Incorporating Nesterov Momentum into Adam. Proceedings of the International Conference on Learning Representations (ICLR), San Juan, PR, USA."},{"key":"ref_24","unstructured":"Loshchilov, I., and Hutter, F. (2017, January 24\u201326). SGDR: Stochastic gradient descent with warm restarts. Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Park, D.S., Chan, W., Zhang, Y., Chiu, C.C., Zoph, B., Cubuk, E.D., and Le, Q.V. (2019, January 15\u201319). Specaugment: A simple data augmentation method for automatic speech recognition. Proceedings of the Interspeech, Graz, Austria.","DOI":"10.21437\/Interspeech.2019-2680"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Dietterich, T.G. (2000, January 21\u201323). Ensemble methods in machine learning. 
Proceedings of the International Workshop on Multiple Classifier Systems, Cagliari, Italy.","DOI":"10.1007\/3-540-45014-9_1"},{"key":"ref_27","unstructured":"(2022, August 31). Dcase.Community. Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes\u2014DCASE. Available online: https:\/\/dcase.community\/challenge2022\/task-sound-event-localization-and-detection-evaluated-in-real-spatial-sound-scenes-results."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/7\/3398\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:01:39Z","timestamp":1760122899000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/7\/3398"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,23]]},"references-count":27,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["s23073398"],"URL":"https:\/\/doi.org\/10.3390\/s23073398","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,23]]}}}