{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T14:44:34Z","timestamp":1774968274593,"version":"3.50.1"},"reference-count":75,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2022,11,8]],"date-time":"2022-11-08T00:00:00Z","timestamp":1667865600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Regional Development Fund (ERDF)","award":["POCI-01-0247-FEDER-041435"],"award-info":[{"award-number":["POCI-01-0247-FEDER-041435"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Audio recognition can be used in smart cities for security, surveillance, manufacturing, autonomous vehicles, and noise mitigation, just to name a few. However, urban sounds are everyday audio events that occur daily, presenting unstructured characteristics containing different genres of noise and sounds unrelated to the sound event under study, making it a challenging problem. Therefore, the main objective of this literature review is to summarize the most recent works on this subject to understand the current approaches and identify their limitations. Based on the reviewed articles, it can be realized that Deep Learning (DL) architectures, attention mechanisms, data augmentation techniques, and pretraining are the most crucial factors to consider while creating an efficient sound classification model. The best-found results were obtained by Mushtaq and Su, in 2020, using a DenseNet-161 with pretrained weights from ImageNet, and NA-1 and NA-2 as augmentation techniques, which were of 97.98%, 98.52%, and 99.22% for UrbanSound8K, ESC-50, and ESC-10 datasets, respectively. Nonetheless, the use of these models in real-world scenarios has not been properly addressed, so their effectiveness is still questionable in such situations.<\/jats:p>","DOI":"10.3390\/s22228608","type":"journal-article","created":{"date-parts":[[2022,11,8]],"date-time":"2022-11-08T08:52:56Z","timestamp":1667897576000},"page":"8608","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":36,"title":["Sound Classification and Processing of Urban Environments: A Systematic Literature Review"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9413-3300","authenticated-orcid":false,"given":"Ana Filipa Rodrigues","family":"Nogueira","sequence":"first","affiliation":[{"name":"Faculdade de Ci\u00eancias, Universidade do Porto, Rua do Campo Alegre 1021 1055, 4169-007 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4948-550X","authenticated-orcid":false,"given":"Hugo S.","family":"Oliveira","sequence":"additional","affiliation":[{"name":"Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s\/n, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1094-0114","authenticated-orcid":false,"given":"Jos\u00e9 J. M.","family":"Machado","sequence":"additional","affiliation":[{"name":"Departamento de Engenharia Mec\u00e2nica, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s\/n, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7603-6526","authenticated-orcid":false,"given":"Jo\u00e3o Manuel R. S.","family":"Tavares","sequence":"additional","affiliation":[{"name":"Departamento de Engenharia Mec\u00e2nica, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s\/n, 4200-465 Porto, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2022,11,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"429","DOI":"10.3390\/smartcities4020024","article-title":"IoT in Smart Cities: A Survey of Technologies, Practices and Challenges","volume":"4","author":"Syed","year":"2021","journal-title":"Smart Cities"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Virtanen, T., Plumbley, M.D., and Ellis, D. (2018). Sound Analysis in Smart Cities. Computational Analysis of Sound Scenes and Events, Springer International Publishing.","DOI":"10.1007\/978-3-319-63450-0"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Mushtaq, Z., and Su, S.F. (2020). Efficient Classification of Environmental Sounds through Multiple Features Aggregation and Data Enhancement Techniques for Spectrogram Images. Symmetry, 12.","DOI":"10.3390\/sym12111822"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Das, J.K., Chakrabarty, A., and Piran, M.J. (2021). Environmental sound classification using convolution neural networks with different integrated loss functions. Expert Syst., 39.","DOI":"10.1111\/exsy.12804"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Das, J.K., Ghosh, A., Pal, A.K., Dutta, S., and Chakrabarty, A. (2020, January 27\u201329). Urban Sound Classification Using Convolutional Neural Network and Long Short Term Memory Based on Multiple Features. Proceedings of the 2020 Fourth International Conference On Intelligent Computing in Data Sciences (ICDS), Hong Kong, China.","DOI":"10.1109\/ICDS50568.2020.9268723"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"21552","DOI":"10.1038\/s41598-021-01045-4","article-title":"Environmental sound classification using temporal-frequency attention based convolutional neural network","volume":"11","author":"Mu","year":"2021","journal-title":"Sci. Rep."},{"key":"ref_7","unstructured":"MacIntyre, J., Maglogiannis, I., Iliadis, L., and Pimenidis, E. Recognition of Urban Sound Events Using Deep Context-Aware Feature Extractors and Handcrafted Features. Proceedings of the Artificial Intelligence Applications and Innovations."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"107819","DOI":"10.1016\/j.apacoust.2020.107819","article-title":"Ensemble of handcrafted and deep features for urban sound classification","volume":"175","author":"Luz","year":"2021","journal-title":"Appl. Acoust."},{"key":"ref_9","unstructured":"Gong, Y., Chung, Y., and Glass, J.R. (2022, October 01). AST: Audio Spectrogram Transformer. CoRR, Available online: http:\/\/xxx.lanl.gov\/abs\/2104.01778."},{"key":"ref_10","unstructured":"Akbari, H., Yuan, L., Qian, R., Chuang, W., Chang, S., Cui, Y., and Gong, B. (2022, October 01). VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. CoRR, Available online: http:\/\/xxx.lanl.gov\/abs\/2104.11178."},{"key":"ref_11","unstructured":"Elliott, D., Otero, C.E., Wyatt, S., and Martino, E. (2022, October 01). Tiny Transformers for Environmental Sound Classification at the Edge. CoRR, Available online: http:\/\/xxx.lanl.gov\/abs\/2103.12157."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wyatt, S., Elliott, D., Aravamudan, A., Otero, C.E., Otero, L.D., Anagnostopoulos, G.C., Smith, A.O., Peter, A.M., Jones, W., and Leung, S. (2021, January 14\u201331). Environmental Sound Classification with Tiny Transformers in Noisy Edge Environments. Proceedings of the 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA.","DOI":"10.1109\/WF-IoT51360.2021.9596007"},{"key":"ref_13","unstructured":"Park, S., Jeong, Y., and Lee, T. (2021, January 15\u201319). Many-to-Many Audio Spectrogram Tansformer: Transformer for Sound Event Localization and Detection. Proceedings of the DCASE, Barcelona, Spain."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Koutini, K., Schl\u00fcter, J., Eghbal-zadeh, H., and Widmer, G. (2022, October 01). Efficient Training of Audio Transformers with Patchout. CoRR, Available online: http:\/\/xxx.lanl.gov\/abs\/2110.05069.","DOI":"10.21437\/Interspeech.2022-227"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"108660","DOI":"10.1016\/j.apacoust.2022.108660","article-title":"Connectogram\u2014A graph-based time dependent representation for sounds","volume":"191","author":"Aksu","year":"2022","journal-title":"Appl. Acoust."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2450","DOI":"10.1109\/TASLP.2020.3014737","article-title":"Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization","volume":"28","author":"Kong","year":"2020","journal-title":"IEEE\/Acm Trans. Audio Speech Lang. Process."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1109\/LSP.2017.2657381","article-title":"Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification","volume":"24","author":"Salamon","year":"2017","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_18","first-page":"1","article-title":"Multiclass Audio Segmentation Based on Recurrent Neural Networks for Broadcast Domain Data","volume":"5","author":"Gimeno","year":"2020","journal-title":"J Audio Speech Music Proc."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"130327","DOI":"10.1109\/ACCESS.2019.2939495","article-title":"Learning Attentive Representations for Environmental Sound Classification","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1016\/j.neucom.2020.08.069","article-title":"Attention based convolutional recurrent neural network for environmental sound classification","volume":"453","author":"Zhang","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Qiao, T., Zhang, S., Cao, S., and Xu, S. (2021). High Accurate Environmental Sound Classification: Sub-Spectrogram Segmentation versus Temporal-Frequency Attention Mechanism. Sensors, 21.","DOI":"10.3390\/s21165500"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1016\/j.neucom.2021.06.031","article-title":"Environment sound classification using an attention-based residual neural network","volume":"460","author":"Tripathi","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ristea, N.C., Ionescu, R.T., and Khan, F.S. (2022). SepTr: Separable Transformer for Audio Spectrogram Processing. arXiv.","DOI":"10.21437\/Interspeech.2022-249"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1016\/j.jclinepi.2021.02.003","article-title":"Updating guidance for reporting systematic reviews: development of the PRISMA 2020 statement","volume":"134","author":"Page","year":"2021","journal-title":"J. Clin. Epidemiol."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zinemanas, P., Rocamora, M., Miron, M., Font, F., and Serra, X. (2021). An Interpretable Deep Learning Model for Automatic Sound Classification. Electronics, 10.","DOI":"10.3390\/electronics10070850"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"McFee, B., Raffel, C., Liang, D., Ellis, D.P., McVicar, M., Battenberg, E., and Nieto, O. (2015, January 6\u201312). librosa: Audio and music signal analysis in python. Proceedings of the 14th Python in Science Conference, Austin, TX, USA.","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"ref_27","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). Attention is All you Need. Proceedings of the Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_28","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1002\/wics.199","article-title":"The Bayesian information criterion: background, derivation, and applications","volume":"4","author":"Neath","year":"2012","journal-title":"Wiley Interdiscip. Rev. Comput. Stat."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Joyce, J.M. (2011). Kullback-leibler divergence. International Encyclopedia of Statistical Science, Springer.","DOI":"10.1007\/978-3-642-04898-2_327"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1514","DOI":"10.1002\/aic.690330911","article-title":"Generalized likelihood ratio method for gross error identification","volume":"33","author":"Narasimhan","year":"1987","journal-title":"AIChE J."},{"key":"ref_32","first-page":"124","article-title":"The robustness of hotelling\u2019s T 2","volume":"62","author":"Holloway","year":"1967","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_33","first-page":"1","article-title":"An Overview of Automatic Audio Segmentation","volume":"6","author":"Theodorou","year":"2014","journal-title":"Int. J. Inf. Technol. Comput. Sci."},{"key":"ref_34","unstructured":"Tax, T.M.S., Antich, J.L.D., Purwins, H., and Maal\u00f8e, L. (2017, January 4\u20139). Utilizing Domain Knowledge in End-to-End Audio Processing. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA."},{"key":"ref_35","first-page":"1925","article-title":"Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification","volume":"28","author":"Cobos","year":"2020","journal-title":"IEEE\/Acm Trans. Audio Speech Lang. Process."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"8245","DOI":"10.1007\/s10489-021-02314-5","article-title":"Multichannel environmental sound segmentation","volume":"51","author":"Sudo","year":"2021","journal-title":"Appl. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Venkatesh, S., Moffat, D., and Miranda, E.R. (2022, October 01). You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection, Available online: http:\/\/xxx.lanl.gov\/abs\/2109.00962.","DOI":"10.3390\/app12073293"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"4759","DOI":"10.1007\/s12652-021-03184-y","article-title":"Recognition of pulmonary diseases from lung sounds using convolutional neural networks and long short-term memory","volume":"13","author":"Fraiwan","year":"2022","journal-title":"J. Ambient. Intell. Humaniz. Comput."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.ins.2021.01.088","article-title":"Application of Petersen graph pattern technique for automated detection of heart valve diseases with PCG signals","volume":"565","author":"Tuncer","year":"2021","journal-title":"Inf. Sci."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"108152","DOI":"10.1016\/j.apacoust.2021.108152","article-title":"Heart sounds classification using convolutional neural network with 1D-local binary pattern and 1D-local ternary pattern features","volume":"180","author":"Er","year":"2021","journal-title":"Appl. Acoust."},{"key":"ref_41","first-page":"100206","article-title":"Heart sound classification using signal processing and machine learning algorithms","volume":"7","author":"Zeinali","year":"2022","journal-title":"Mach. Learn. Appl."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"10934","DOI":"10.1109\/ACCESS.2022.3144355","article-title":"Real-Time Multi-Level Neonatal Heart and Lung Sound Quality Assessment for Telehealth Applications","volume":"10","author":"Grooby","year":"2022","journal-title":"IEEE Access"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"117104","DOI":"10.1016\/j.eswa.2022.117104","article-title":"MFCC-based descriptor for bee queen presence detection","volume":"201","author":"Soares","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"106994","DOI":"10.1016\/j.compag.2022.106994","article-title":"Fusion of acoustic and deep features for pig cough sound recognition","volume":"197","author":"Shen","year":"2022","journal-title":"Comput. Electron. Agric."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/j.biosystemseng.2022.05.010","article-title":"Investigation of acoustic and visual features for pig cough classification","volume":"219","author":"Shen","year":"2022","journal-title":"Biosyst. Eng."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"107866","DOI":"10.1016\/j.apacoust.2020.107866","article-title":"Multileveled ternary pattern and iterative ReliefF based bird sound classification","volume":"176","author":"Tuncer","year":"2021","journal-title":"Appl. Acoust."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.neucom.2014.12.042","article-title":"Adaptive energy detection for bird sound detection in complex environments","volume":"155","author":"Zhang","year":"2015","journal-title":"Neurocomputing"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"3187","DOI":"10.1109\/TMM.2018.2834866","article-title":"Local Wavelet Acoustic Pattern: A Novel Time\u2013Frequency Descriptor for Birdsong Recognition","volume":"20","author":"Hsu","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.apacoust.2017.10.024","article-title":"Acoustic classification of frog within-species and species-specific calls","volume":"131","author":"Xie","year":"2018","journal-title":"Appl. Acoust."},{"key":"ref_50","first-page":"100202","article-title":"Frog calling activity detection using lightweight CNN with multi-view spectrogram: A case study on Kroombit tinker frog","volume":"7","author":"Xie","year":"2022","journal-title":"Mach. Learn. Appl."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"106852","DOI":"10.1016\/j.ecolind.2020.106852","article-title":"Automated species identification of frog choruses in environmental recordings using acoustic indices","volume":"119","author":"Brodie","year":"2020","journal-title":"Ecol. Indic."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"107375","DOI":"10.1016\/j.apacoust.2020.107375","article-title":"Multispecies bioacoustic classification using transfer learning of deep convolutional neural networks with pseudo-labeling","volume":"166","author":"Zhong","year":"2020","journal-title":"Appl. Acoust."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"101113","DOI":"10.1016\/j.ecoinf.2020.101113","article-title":"A pipeline for identification of bird and frog species in tropical soundscape recordings using a convolutional neural network","volume":"59","author":"LeBien","year":"2020","journal-title":"Ecol. Inform."},{"key":"ref_54","first-page":"3384","article-title":"Animal sounds classification scheme based on multi-feature network with mixed datasets","volume":"14","author":"Kim","year":"2020","journal-title":"Ksii Trans. Internet Inf. Syst."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1016\/j.sigpro.2011.10.001","article-title":"Audio based solutions for detecting intruders in wild areas","volume":"92","author":"Ghiurcau","year":"2012","journal-title":"Signal Process."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/j.ecolind.2016.12.018","article-title":"Automatic identification of rainfall in acoustic recordings","volume":"75","author":"Bedoya","year":"2017","journal-title":"Ecol. Indic."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"108478","DOI":"10.1016\/j.apacoust.2021.108478","article-title":"Rainfall observation using surveillance audio","volume":"186","author":"Wang","year":"2022","journal-title":"Appl. Acoust."},{"key":"ref_58","unstructured":"Peter, D., Alavi, A.H., Javadi, B., and Fernandes, S.L. (2020). Chapter 7\u2014Trends of Sound Event Recognition in Audio Surveillance: A Recent Review and Study. The Cognitive Approach in Cloud Computing and Internet of Things Technologies for Surveillance Tracking Systems, Intelligent Data-Centric Systems, Academic Press."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1016\/j.eswa.2018.08.052","article-title":"Assessing the performances of different neural network architectures for the detection of screams and shouts in public transportation","volume":"117","author":"Laffitte","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_60","unstructured":"Arnault, A., Hanssens, B., and Riche, N. (2020). Urban Sound Classification: Striving towards a fair comparison. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1145\/3224204","article-title":"SONYC: A System for Monitoring, Analyzing, and Mitigating Urban Noise Pollution","volume":"62","author":"Bello","year":"2019","journal-title":"Commun. ACM"},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"114839","DOI":"10.1016\/j.eswa.2021.114839","article-title":"Deep Belief Network based audio classification for construction sites monitoring","volume":"177","author":"Scarpiniti","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Aziz, S., Awais, M., Akram, T., Khan, U., Alhussein, M., and Aurangzeb, K. (2019). Automatic Scene Recognition through Acoustic Classification for Behavioral Robotics. Electronics, 8.","DOI":"10.3390\/electronics8050483"},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1016\/j.future.2022.03.041","article-title":"Noise2Weight: On detecting payload weight from drones acoustic emissions","volume":"134","author":"Ibrahim","year":"2022","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Pramanick, D., Ansar, H., Kumar, H., Pranav, S., Tengshe, R., and Fatimah, B. (2021, January 6\u20138). Deep learning based urban sound classification and ambulance siren detector using spectrogram. Proceedings of the 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India.","DOI":"10.1109\/ICCCNT51525.2021.9579778"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Fatimah, B., Preethi, A., Hrushikesh, V., Singh B., A., and Kotion, H.R. (2020, January 1\u20133). An automatic siren detection algorithm using Fourier Decomposition Method and MFCC. Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India.","DOI":"10.1109\/ICCCNT49239.2020.9225414"},{"key":"ref_67","unstructured":"Heittola, T., Mesaros, A., and Virtanen, T. (2020). Acoustic scene classification in dcase 2020 challenge: Generalization across devices and low complexity solutions. arXiv."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Salamon, J., Jacoby, C., and Bello, J.P. (2014, January 3\u20137). A dataset and taxonomy for urban sound research. Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2655045"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Piczak, K.J. (2015, January 26\u201330). ESC: Dataset for environmental sound classification. Proceedings of the 23rd ACM international conference on Multimedia, Brisbane, Australia.","DOI":"10.1145\/2733373.2806390"},{"key":"ref_70","unstructured":"Koizumi, Y., Kawaguchi, Y., Imoto, K., Nakamura, T., Nikaido, Y., Tanabe, R., Purohit, H., Suefusa, K., Endo, T., and Yasuda, M. (2020). Description and discussion on DCASE2020 challenge task2: Unsupervised anomalous sound detection for machine condition monitoring. arXiv."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1109\/TAFFC.2014.2336244","article-title":"Crema-d: Crowd-sourced emotional multimodal actors dataset","volume":"5","author":"Cao","year":"2014","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Gemmeke, J.F., Ellis, D.P., Freedman, D., Jansen, A., Lawrence, W., Moore, R.C., Plakal, M., and Ritter, M. (2017, January 5\u20139). Audio set: An ontology and human-labeled dataset for audio events. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Mesaros, A., Heittola, T., and Virtanen, T. (September, January 29). TUT Database for Acoustic Scene Classification and Sound Event Detection. Proceedings of the 24th European Signal Processing Conference 2016 (EUSIPCO 2016), Budapest, Hungary.","DOI":"10.1109\/EUSIPCO.2016.7760424"},{"key":"ref_74","first-page":"1720","article-title":"Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion","volume":"8","author":"Rachman","year":"2018","journal-title":"Int. J. Electr. Comput. Eng."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1109\/TASLP.2021.3133208","article-title":"FSD50K: An open dataset of human-labeled sound events","volume":"30","author":"Fonseca","year":"2022","journal-title":"IEEE\/ACM Trans. Audio, Speech, Lang. Process."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/22\/8608\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:12:38Z","timestamp":1760145158000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/22\/8608"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,8]]},"references-count":75,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["s22228608"],"URL":"https:\/\/doi.org\/10.3390\/s22228608","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,8]]}}}