{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T16:10:22Z","timestamp":1781194222761,"version":"3.54.1"},"reference-count":47,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2020,7,3]],"date-time":"2020-07-03T00:00:00Z","timestamp":1593734400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010661","name":"Horizon 2020","doi-asserted-by":"publisher","award":["779158."],"award-info":[{"award-number":["779158."]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Spanish Ministry of Science, Innovation and Universities","award":["DIN2018-009982"],"award-info":[{"award-number":["DIN2018-009982"]}]},{"name":"Spanish Ministry of Science, Innovation and Universities","award":["PTQ-17-09106"],"award-info":[{"award-number":["PTQ-17-09106"]}]},{"name":"Spanish Ministry of Science, Innovation and Universities","award":["RTI2018-097045-B-C21"],"award-info":[{"award-number":["RTI2018-097045-B-C21"]}]},{"DOI":"10.13039\/501100002924","name":"FEDER","doi-asserted-by":"publisher","award":["RTI2018-097045-B-C21"],"award-info":[{"award-number":["RTI2018-097045-B-C21"]}],"id":[{"id":"10.13039\/501100002924","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Open-set recognition (OSR) is a challenging machine learning problem that appears when classifiers are faced with test instances from classes not seen during training. It can be summarized as the problem of correctly identifying instances from a known class (seen during training) while rejecting any unknown or unwanted samples (those belonging to unseen classes). 
Another problem arising in practical scenarios is few-shot learning (FSL), which appears when there is no availability of a large number of positive samples for training a recognition system. Taking these two limitations into account, a new dataset for OSR and FSL for audio data was recently released to promote research on solutions aimed at addressing both limitations. This paper proposes an audio OSR\/FSL system divided into three steps: a high-level audio representation, feature embedding using two different autoencoder architectures and a multi-layer perceptron (MLP) trained on latent space representations to detect known classes and reject unwanted ones. An extensive set of experiments is carried out considering multiple combinations of openness factors (OSR condition) and number of shots (FSL condition), showing the validity of the proposed approach and confirming superior performance with respect to a baseline system based on transfer learning.<\/jats:p>","DOI":"10.3390\/s20133741","type":"journal-article","created":{"date-parts":[[2020,7,6]],"date-time":"2020-07-06T09:49:11Z","timestamp":1594028951000},"page":"3741","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Open Set Audio Classification Using Autoencoders Trained on Few Data"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7503-1272","authenticated-orcid":false,"given":"Javier","family":"Naranjo-Alcazar","sequence":"first","affiliation":[{"name":"Visualfy, 46181 Benisan\u00f3, Spain"},{"name":"Computer Science Department, Universitat de Val\u00e8ncia, 46100 Burjassot, Spain"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sergi","family":"Perez-Castanos","sequence":"additional","affiliation":[{"name":"Visualfy, 46181 Benisan\u00f3, 
Spain"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3494-9954","authenticated-orcid":false,"given":"Pedro","family":"Zuccarello","sequence":"additional","affiliation":[{"name":"Visualfy, 46181 Benisan\u00f3, Spain"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4545-0315","authenticated-orcid":false,"given":"Fabio","family":"Antonacci","sequence":"additional","affiliation":[{"name":"Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, 20133 Milan, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7318-3192","authenticated-orcid":false,"given":"Maximo","family":"Cobos","sequence":"additional","affiliation":[{"name":"Computer Science Department, Universitat de Val\u00e8ncia, 46100 Burjassot, Spain"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2020,7,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Piczak, K.J. (2020, January 17\u201320). Environmental sound classification with convolutional neural networks. Proceedings of the 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), Boston, MA, USA.","DOI":"10.1109\/MLSP.2015.7324337"},{"key":"ref_2","unstructured":"Cak\u0131r, E., Heittola, T., and Virtanen, T. (2016, January 3). Domestic audio tagging with convolutional neural networks. Proceedings of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2016), Budapest, Hungary."},{"key":"ref_3","unstructured":"Valenti, M., Diment, A., Parascandolo, G., Squartini, S., and Virtanen, T. (2016, January 3). DCASE 2016 acoustic scene classification using convolutional neural networks. 
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, Budapest, Hungary."},{"key":"ref_4","unstructured":"Bae, S.H., Choi, I., and Kim, N.S. (2016, January 3). Acoustic scene classification using parallel combination of LSTM and CNN. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), Budapest, Hungary."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1733","DOI":"10.1109\/TMM.2015.2428998","article-title":"Detection and classification of acoustic scenes and events","volume":"17","author":"Stowell","year":"2015","journal-title":"IEEE Trans. Multimed."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2317","DOI":"10.1109\/TPAMI.2014.2321392","article-title":"Probability models for open set recognition","volume":"36","author":"Scheirer","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Battaglino, D., Lepauloux, L., and Evans, N. (2016, January 14\u201316). The open-set problem in acoustic scene classification. Proceedings of the 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), Xi\u2019an, China.","DOI":"10.1109\/IWAENC.2016.7602939"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1007\/s10994-016-5610-8","article-title":"Nearest neighbors distance ratio open-set classifier","volume":"106","author":"Werneck","year":"2017","journal-title":"Mach. Learn."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Bendale, A., and Boult, T.E. (2016, January 27\u201330). Towards open set deep networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.173"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Rakowski, A., and Kosmider, M. (2019, January 25\u201326). Frequency-Aware CNN for Open Set Acoustic Scene Classification. 
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York, NY, USA.","DOI":"10.33682\/en2t-9m14"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Lu, R., Wu, K., Duan, Z., and Zhang, C. (2017, January 5\u20139). Deep ranking: Triplet MatchNet for music metric learning. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952130"},{"key":"ref_12","unstructured":"Chen, K., and Salman, A. (2011, January 12\u201317). Extracting speaker-specific information with a regularized siamese deep network. Proceedings of the Advances in Neural Information Processing Systems (NIPS 2011), Granada, Spain."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Bredin, H. (2017, January 5\u20139). Tristounet: Triplet loss for speaker turn embedding. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7953194"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2009","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Cramer, J., Wu, H.H., Salamon, J., and Bello, J.P. (2019, January 27\u201330). Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Minneapolis, MN, USA.","DOI":"10.1109\/ICASSP.2019.8682475"},{"key":"ref_16","unstructured":"Bromley, J., Guyon, I., LeCun, Y., S\u00e4ckinger, E., and Shah, R. (December, January 28). Signature verification using a \u201csiamese\u201d time delay neural network. 
Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Melekhov, I., Kannala, J., and Rahtu, E. (2016, January 4\u20138). Siamese network features for image matching. Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico.","DOI":"10.1109\/ICPR.2016.7899663"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Schroff, F., Kalenichenko, D., and Philbin, J. (2015, January 7\u201312). Facenet: A unified embedding for face recognition and clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Hoffer, E., and Ailon, N. (2015). Deep metric learning using triplet network. International Workshop on Similarity-Based Pattern Recognition, Springer.","DOI":"10.1007\/978-3-319-24261-3_7"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zheng, Y., Pal, D.K., and Savvides, M. (2018, January 18\u201323). Ring loss: Convex feature normalization for face recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00534"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wen, Y., Zhang, K., Li, Z., and Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. European Conference On Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46478-7_31"},{"key":"ref_22","unstructured":"Naranjo-Alcazar, J., Perez-Castanos, S., Zuccarello, P., and Cobos, M. (2020). An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, Y., Yao, Q., Kwok, J., and Ni, L.M. (2019). Generalizing from a few examples: A survey on few-shot learning. 
arXiv.","DOI":"10.1145\/3386252"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Masi, I., Wu, Y., Hassner, T., and Natarajan, P. (November, January 29). Deep face recognition: A survey. Proceedings of the 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Paran\u00e1, Brazil.","DOI":"10.1109\/SIBGRAPI.2018.00067"},{"key":"ref_25","unstructured":"Snell, J., Swersky, K., and Zemel, R. (2017, January 4\u20139). Prototypical networks for few-shot learning. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_26","unstructured":"Huang, G.B., Mattar, M., Berg, T., and Learned-Miller, E. (2008, January 12\u201318). Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Proceedings of the Workshop on Faces in \u2018Real-Life\u2019 Images: Detection, Alignment, and Recognition, Erik Learned-Miller and Andras Ferencz and Fr\u00e9d\u00e9ric Jurie, Marseille, France."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Geng, C., Huang, S.j., and Chen, S. (2020). Recent advances in open set recognition: A survey. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2020.2981604"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Kotz, S., and Nadarajah, S. (2000). Extreme Value Distributions: Theory and Applications, World Scientific.","DOI":"10.1142\/9781860944024"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Jain, L.P., Scheirer, W.J., and Boult, T.E. (2014). Multi-class open set recognition using probability of inclusion. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10578-9_26"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1690","DOI":"10.1109\/TPAMI.2016.2613924","article-title":"Sparse representation-based open set recognition","volume":"39","author":"Zhang","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Bendale, A., and Boult, T. (2015, January 7\u201312). Towards open world recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298799"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2624","DOI":"10.1109\/TPAMI.2013.83","article-title":"Distance-based image classification: Generalizing to new classes at near-zero cost","volume":"35","author":"Mensink","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Shu, L., Xu, H., and Liu, B. (2017). Doc: Deep open classification of text documents. arXiv.","DOI":"10.18653\/v1\/D17-1314"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kardan, N., and Stanley, K.O. (2017, January 14\u201319). Mitigating fooling with competitive overcomplete output layer neural networks. Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA.","DOI":"10.1109\/IJCNN.2017.7965897"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chung, Y.A., Wu, C.C., Shen, C.H., Lee, H.Y., and Lee, L.S. (2016). Audio word2vec: Unsupervised learning of audio segment representations using sequence-to-sequence autoencoder. arXiv.","DOI":"10.21437\/Interspeech.2016-82"},{"key":"ref_36","unstructured":"Tagliasacchi, M., Gfeller, B., Quitry, F.d.C., and Roblek, D. (2019). Self-supervised audio representation learning for mobile devices. arXiv."},{"key":"ref_37","unstructured":"Quitry, F.d.C., Tagliasacchi, M., and Roblek, D. (2019). Learning audio representations via phase prediction. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gemmeke, J.F., Ellis, D.P.W., Freedman, D., Jansen, A., Lawrence, W., Moore, R.C., Plakal, M., and Ritter, M. (2017, January 5\u20139). Audio Set: An ontology and human-labeled dataset for audio events. 
Proceedings of the IEEE ICASSP 2017, New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Koizumi, Y., Saito, S., Uematsu, H., Harada, N., and Imoto, K. (2019, January 20\u201323). ToyADMOS: A Dataset of Miniature-machine Operating Sounds for Anomalous Sound Detection. Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA.","DOI":"10.1109\/WASPAA.2019.8937164"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Purohit, H., Tanabe, R., Ichige, T., Endo, T., Nikaido, Y., Suefusa, K., and Kawaguchi, Y. (2019, January 25\u201326). MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York, NY, USA.","DOI":"10.33682\/m76f-d618"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wilkinghoff, K., and Kurth, F. (2019, January 25\u201326). Open-Set Acoustic Scene Classification with Deep Convolutional Autoencoders. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019, New York, NY, USA.","DOI":"10.33682\/340j-wd27"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Cakir, E., Heittola, T., Huttunen, H., and Virtanen, T. (2015, January 12\u201317). Polyphonic sound event detection using multi label deep neural networks. Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland.","DOI":"10.1109\/IJCNN.2015.7280624"},{"key":"ref_43","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. 
arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1230","DOI":"10.1109\/TASLP.2017.2690563","article-title":"Unsupervised feature learning based on deep models for environmental audio tagging","volume":"25","author":"Xu","year":"2017","journal-title":"IEEE\/ACM Trans. Audio, Speech Lang. Process."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.A. (2008, January 5\u20139). Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.","DOI":"10.1145\/1390156.1390294"},{"key":"ref_46","unstructured":"Baldi, P. (2011, January 2). Autoencoders, unsupervised learning, and deep architectures. Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, Bellevue, Washington, DC, USA."},{"key":"ref_47","unstructured":"Mesaros, A., Heittola, T., and Virtanen, T. (2018, January 19\u201320). A multi-device dataset for urban acoustic scene classification. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), Surrey, UK."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/13\/3741\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:47:11Z","timestamp":1760176031000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/13\/3741"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,3]]},"references-count":47,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2020,7]]}},"alternative-id":["s20133741"],"URL":"https:\/\/doi.org\/10.3390\/s20133741","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,3]]}}}