{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T02:12:17Z","timestamp":1774404737138,"version":"3.50.1"},"reference-count":89,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T00:00:00Z","timestamp":1739318400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T00:00:00Z","timestamp":1739318400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["517437545"],"award-info":[{"award-number":["517437545"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004115","name":"Gottfried Wilhelm Leibniz Universit\u00e4t Hannover","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004115","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In this paper, a detailed investigation of deep learning-based speaker detection and localization (SDL) with higher-order Ambisonics signals is conducted. Different spherical harmonic (SH) input features such as the higher-order pseudointensity vector (HO-PIV), relative harmonic coefficients (RHCs), and the spatially-localized pseudointensity vector (SL-PIV), a feature proposed for the first time as an input feature for deep learning-based SDL, are examined using first- to fourth-order SH signals. The trained neural networks, optimized with a single loss function for the combined tasks of detection and localization, are then evaluated in detail for overall SDL performance as well as their performance in the sub-tasks of detection and, particularly, localization. The results are further analyzed in dependence on room reverberation, signal-to-interference ratio (SIR), as well as the number and distances between multiple simultaneously active speakers, utilizing both simulated and measured data. The findings indicate an overall improvement in SDL performance up to third-order Ambisonics for all investigated features, while using fourth-order signals does not yield any further improvement or sometimes even delivers worse results. Notably, the HO-PIV and the SL-PIV, both extensions of the first-order pseudointensity vector (FO-PIV), have proven to be suitable input features. In particular the newly proposed SL-PIV has been found to be the best of the investigated features on third- and fourth-order Ambisonics signals, especially in the most demanding scenarios on measured data, with multiple, closely located speakers and poor SIR.<\/jats:p>","DOI":"10.1186\/s13636-025-00393-7","type":"journal-article","created":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T03:21:11Z","timestamp":1739330471000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Investigations on higher-order spherical harmonic input features for deep learning-based multiple speaker detection and localization"],"prefix":"10.1186","volume":"2025","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4302-5569","authenticated-orcid":false,"given":"Nils","family":"Poschadel","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephan","family":"Preihs","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J\u00fcrgen","family":"Peissig","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,2,12]]},"reference":[{"issue":"5","key":"393_CR1","doi-asserted-by":"publisher","first-page":"1086","DOI":"10.1109\/TPAMI.2017.2648793","volume":"40","author":"ID Gebru","year":"2018","unstructured":"I.D. Gebru, S. Ba, X. Li, R. Horaud, Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion. IEEE Trans. Pattern Anal. Mach. Intell. 40(5), 1086\u20131099 (2018). https:\/\/doi.org\/10.1109\/TPAMI.2017.2648793","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"393_CR2","doi-asserted-by":"publisher","unstructured":"L. Perotin, R. Serizel, E. Vincent, A. Gu\u00e9rin, in Proceedings of the 16th International Workshop on Acoustic Signal Enhancement (IWAENC). Crnn-based joint azimuth and elevation localization with the ambisonics intensity vector (Tokyo, 2018). https:\/\/doi.org\/10.1109\/IWAENC.2018.8521403","DOI":"10.1109\/IWAENC.2018.8521403"},{"key":"393_CR3","doi-asserted-by":"publisher","unstructured":"T. Lotter, H.W. L\u00f6llmann, P. Vary, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Robust direction of arrival estimation for speech enhancement in noisy reverberant rooms (Orlando, 2002). https:\/\/doi.org\/10.1109\/ICASSP.2002.5745659","DOI":"10.1109\/ICASSP.2002.5745659"},{"key":"393_CR4","doi-asserted-by":"publisher","first-page":"984","DOI":"10.1109\/TASLP.2023.3346643","volume":"32","author":"D Berghi","year":"2024","unstructured":"D. Berghi, P.J. Jackson, Leveraging Visual Supervision for Array-Based Active Speaker Detection and Localization. IEEE\/ACM Trans. Audio Speech Lang. Process. 32, 984\u2013995 (2024). https:\/\/doi.org\/10.1109\/TASLP.2023.3346643","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"393_CR5","doi-asserted-by":"publisher","unstructured":"K. Shimada, Y. Koyama, N. Takahashi, S. Takahashi, Y. Mitsufuji, in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Accdoa: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization And Detection (IEEE, 2021), pp. 915\u2013919. https:\/\/doi.org\/10.1109\/ICASSP39728.2021.9413609","DOI":"10.1109\/ICASSP39728.2021.9413609"},{"issue":"1","key":"393_CR6","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1109\/JSTSP.2018.2885636","volume":"13","author":"S Adavanne","year":"2019","unstructured":"S. Adavanne, A. Politis, J. Nikunen, T. Virtanen, Sound event localization and detection of overlapping sources using convolutional recurrent neural networks. IEEE J. Sel. Top. Signal Process. 13(1), 34\u201348 (2019). https:\/\/doi.org\/10.1109\/JSTSP.2018.2885636","journal-title":"IEEE J. Sel. Top. Signal Process."},{"issue":"2","key":"393_CR7","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1109\/TASLP.2018.2877892","volume":"27","author":"FR St\u00f6ter","year":"2019","unstructured":"F.R. St\u00f6ter, S. Chakrabarty, B. Edler, E.A.P. Habets, CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning. IEEE\/ACM Trans. Audio Speech Lang. Process. 27(2), 268\u2013282 (2019). https:\/\/doi.org\/10.1109\/TASLP.2018.2877892","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"393_CR8","doi-asserted-by":"publisher","unstructured":"P.A. Grumiaux, S. Kiti\u0107, L. Girin, A. Gu\u00e9rin, in 2020 28th European Signal Processing Conference (EUSIPCO). High-Resolution Speaker Counting in Reverberant Rooms Using CRNN with Ambisonics Features (IEEE, 2020), pp. 71\u201375. https:\/\/doi.org\/10.23919\/Eusipco47968.2020.9287637","DOI":"10.23919\/Eusipco47968.2020.9287637"},{"key":"393_CR9","doi-asserted-by":"publisher","unstructured":"P.A. Grumiaux, S. Kiti\u0107, L. Girin, A. Gu\u00e9rin, in Forum Acusticum 2020. Multichannel source counting with CRNN: analysis of the performance (Lyon, 2020), pp. 829\u2013835. https:\/\/doi.org\/10.48465\/fa.2020.0766","DOI":"10.48465\/fa.2020.0766"},{"issue":"4","key":"393_CR10","doi-asserted-by":"publisher","first-page":"320","DOI":"10.1109\/TASSP.1976.1162830","volume":"24","author":"CH Knapp","year":"1976","unstructured":"C.H. Knapp, G.C. Carter, The generalized correlation method for estimation of time delay. IEEE Trans. Acoust. Speech Signal Process. 24(4), 320\u2013327 (1976). https:\/\/doi.org\/10.1109\/TASSP.1976.1162830","journal-title":"IEEE Trans. Acoust. Speech Signal Process."},{"issue":"3","key":"393_CR11","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1109\/TAP.1986.1143830","volume":"34","author":"RO Schmidt","year":"1986","unstructured":"R.O. Schmidt, Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 34(3), 276\u2013280 (1986). https:\/\/doi.org\/10.1109\/TAP.1986.1143830","journal-title":"IEEE Trans. Antennas Propag."},{"issue":"7","key":"393_CR12","doi-asserted-by":"publisher","first-page":"984","DOI":"10.1109\/29.32276","volume":"37","author":"R Roy","year":"1989","unstructured":"R. Roy, T. Kailath, ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Trans. Acoust. Speech Signal Process. 37(7), 984\u2013995 (1989). https:\/\/doi.org\/10.1109\/29.32276","journal-title":"IEEE Trans. Acoust. Speech Signal Process."},{"key":"393_CR13","doi-asserted-by":"publisher","unstructured":"J.H. DiBiase, H.F. Silverman, M.S. Brandstein, in Microphone Arrays. ed. by A. Lacroix, A. Venetsanopoulos, M. Brandstein, D. Ward. Robust Localization in Reverberant Rooms (Springer Berlin Heidelberg, Berlin, 2001), pp. 157\u2013180. https:\/\/doi.org\/10.1007\/978-3-662-04619-7_8","DOI":"10.1007\/978-3-662-04619-7_8"},{"key":"393_CR14","doi-asserted-by":"publisher","unstructured":"S. Chakrabarty, E.A.P. Habets, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Broadband doa estimation using convolutional neural networks trained with noise signals (New Paltz, 2017), pp. 136\u2013140. https:\/\/doi.org\/10.1109\/WASPAA.2017.8170010","DOI":"10.1109\/WASPAA.2017.8170010"},{"issue":"1","key":"393_CR15","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1109\/JSTSP.2019.2900164","volume":"13","author":"L Perotin","year":"2019","unstructured":"L. Perotin, R. Serizel, E. Vincent, A. Gu\u00e9rin, Crnn-based multiple doa estimation using acoustic intensity features for ambisonics recordings. IEEE J. Sel. Top. Signal Process. 13(1), 22\u201333 (2019). https:\/\/doi.org\/10.1109\/JSTSP.2019.2900164","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"393_CR16","doi-asserted-by":"publisher","unstructured":"N. Poschadel, S. Preihs, J. Peissig, in Proceedings of the 29th European Signal Processing Conference (EUSIPCO). Multi-source direction of arrival estimation of noisy speech using convolutional recurrent neural networks with higher-order ambisonics signals (Dublin, 2021), pp. 1015\u20131019. https:\/\/doi.org\/10.23919\/EUSIPCO54536.2021.9616002","DOI":"10.23919\/EUSIPCO54536.2021.9616002"},{"key":"393_CR17","doi-asserted-by":"publisher","unstructured":"Z. Tang, J.D. Kanu, K. Hogan, D. Manocha, in Proceedings of the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019). Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks (ISCA, Graz, 2019), pp. 654\u2013658. https:\/\/doi.org\/10.21437\/Interspeech.2019-1111","DOI":"10.21437\/Interspeech.2019-1111"},{"key":"393_CR18","doi-asserted-by":"publisher","unstructured":"L. Perotin, A. D\u00e9fossez, E. Vincent, R. Serizel, A. Gu\u00e9rin, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Regression versus classification for neural network based audio source localization (New Paltz, 2019). https:\/\/doi.org\/10.1109\/WASPAA.2019.8937277","DOI":"10.1109\/WASPAA.2019.8937277"},{"key":"393_CR19","unstructured":"N. Poschadel, S. Preihs, J. Peissig, in Fortschritte der Akustik - DAGA 2023. Comparison of Regression and Classification Models for Multi-Source Direction of Arrival Estimation with Convolutional Recurrent Neural Networks (Deutsche Gesellschaft, Hamburg, 2023)"},{"key":"393_CR20","unstructured":"A. Politis, S. Adavanne, D. Krause, A. Deleforge, P. Srivastava, T. Virtanen, A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection. (2021). arXiv\u00a0preprint\u00a0arXiv:2106.06999v2"},{"key":"393_CR21","doi-asserted-by":"crossref","unstructured":"Y. Cao, T. Iqbal, Q. Kong, Y. Zhong, W. Wang, M.D. Plumbley. Event-Independent Network for Polyphonic Sound Event Localization and Detection (2020).\u00a0arXiv preprint arXiv:010.00140","DOI":"10.1109\/ICASSP39728.2021.9413473"},{"key":"393_CR22","doi-asserted-by":"publisher","unstructured":"Y. Cao, T. Iqbal, Q. Kong, F. An, W. Wang, M.D. Plumbley, in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection (IEEE, 2021), pp. 885\u2013889. https:\/\/doi.org\/10.1109\/ICASSP39728.2021.9413473","DOI":"10.1109\/ICASSP39728.2021.9413473"},{"issue":"10","key":"393_CR23","doi-asserted-by":"publisher","first-page":"1901","DOI":"10.1109\/TASLP.2017.2726762","volume":"25","author":"M Kolbaek","year":"2017","unstructured":"M. Kolbaek, D. Yu, Z.H. Tan, J. Jensen, Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks. IEEE\/ACM Trans. Audio Speech Lang. Process. 25(10), 1901\u20131913 (2017). https:\/\/doi.org\/10.1109\/TASLP.2017.2726762","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"393_CR24","doi-asserted-by":"publisher","unstructured":"K. Shimada, Y. Koyama, S. Takahashi, N. Takahashi, E. Tsunoo, Y. Mitsufuji, in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Multi-ACCDOA: Localizing And Detecting Overlapping Sounds From The Same Class With Auxiliary Duplicating Permutation Invariant Training (IEEE, 2022), pp. 316\u2013320. https:\/\/doi.org\/10.1109\/ICASSP43922.2022.9746384","DOI":"10.1109\/ICASSP43922.2022.9746384"},{"key":"393_CR25","unstructured":"R. Roden, N. Moritz, S. Gerlach, S. Weinzierl, S. Goetze, in Fortschritte der Akustik - DAGA 2015. On sound source localization of speech signals using deep neural networks (Deutsche Gesellschaft, N\u00fcrnberg, 2015)"},{"key":"393_CR26","unstructured":"T. Hirvonen, in 138th Convention of the Audio Engineering Society. Classification of Spatial Audio Location and Content Using Convolutional Neural Networks (Audio Engineering Society, Warsaw, 2015)"},{"key":"393_CR27","doi-asserted-by":"publisher","unstructured":"D. Krause, A. Politis, K. Kowalczyk, in Proceedings of the 28th European Signal Processing Conference (EUSIPCO). Comparison of Convolution Types in CNN-based Feature Extraction for Sound Source Localization (2021), pp. 820\u2013824. https:\/\/doi.org\/10.23919\/Eusipco47968.2020.9287344","DOI":"10.23919\/Eusipco47968.2020.9287344"},{"key":"393_CR28","doi-asserted-by":"publisher","unstructured":"P.A. Grumiaux, S. Kiti\u0107, L. Girin, A. Gu\u00e9rin, in 2021 29th European Signal Processing Conference (EUSIPCO). Improved feature extraction for CRNN-based multiple sound source localization (IEEE, 2021), pp. 231\u2013235. https:\/\/doi.org\/10.23919\/EUSIPCO54536.2021.9616124","DOI":"10.23919\/EUSIPCO54536.2021.9616124"},{"key":"393_CR29","doi-asserted-by":"publisher","first-page":"1749","DOI":"10.1109\/TASLP.2022.3173054","volume":"30","author":"TN Tho Nguyen","year":"2022","unstructured":"T.N. Tho Nguyen, K.N. Watcharasupat, N.K. Nguyen, D.L. Jones, W.S. Gan, SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection. IEEE\/ACM Trans. Audio Speech Lang. Process. 30, 1749\u20131762 (2022). https:\/\/doi.org\/10.1109\/TASLP.2022.3173054","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"393_CR30","doi-asserted-by":"publisher","unstructured":"T.N. Tho Nguyen, D.L. Jones, K.N. Watcharasupat, H. Phan, W.S. Gan, in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays (IEEE, 2022), pp. 716\u2013720. https:\/\/doi.org\/10.1109\/ICASSP43922.2022.9746132","DOI":"10.1109\/ICASSP43922.2022.9746132"},{"key":"393_CR31","doi-asserted-by":"publisher","first-page":"996","DOI":"10.1109\/TASLP.2023.3346297","volume":"32","author":"DA Krause","year":"2024","unstructured":"D.A. Krause, G. Garc\u00eda-Barrios, A. Politis, A. Mesaros, Binaural Sound Source Distance Estimation and Localization for a Moving Listener. IEEE\/ACM Trans. Audio Speech Lang. Process. 32, 996\u20131011 (2024). https:\/\/doi.org\/10.1109\/TASLP.2023.3346297","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"393_CR32","doi-asserted-by":"publisher","unstructured":"P.A. Grumiaux, S. Kiti\u0107, P. Srivastava, L. Girin, A. Gu\u00e9rin, in 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Saladnet: Self-Attentive Multisource Localization in the Ambisonics Domain (IEEE, 2021), pp. 336\u2013340. https:\/\/doi.org\/10.1109\/WASPAA52581.2021.9632737","DOI":"10.1109\/WASPAA52581.2021.9632737"},{"issue":"1","key":"393_CR33","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1121\/10.0011809","volume":"152","author":"PA Grumiaux","year":"2022","unstructured":"P.A. Grumiaux, S. Kiti\u0107, L. Girin, A. Gu\u00e9rin, A survey of sound source localization with deep learning methods. J. Acoust. Soc. Am. 152(1), 107 (2022). https:\/\/doi.org\/10.1121\/10.0011809","journal-title":"J. Acoust. Soc. Am."},{"issue":"12","key":"393_CR34","doi-asserted-by":"publisher","first-page":"3446","DOI":"10.3390\/en14123446","volume":"14","author":"MU Liaquat","year":"2021","unstructured":"M.U. Liaquat, H.S. Munawar, A. Rahman, Z. Qadir, A.Z. Kouzani, M.A.P. Mahmud, Sound Localization for Ad-Hoc Microphone Arrays. Energies 14(12), 3446 (2021). https:\/\/doi.org\/10.3390\/en14123446","journal-title":"Energies"},{"issue":"7","key":"393_CR35","doi-asserted-by":"publisher","first-page":"074801","DOI":"10.1121\/10.0011811","volume":"2","author":"M Hahmann","year":"2022","unstructured":"M. Hahmann, E. Fernandez-Grande, H. Gunawan, P. Gerstoft, Sound source localization using multiple ad hoc distributed microphone arrays. JASA Express Lett. 2(7), 074801 (2022). https:\/\/doi.org\/10.1121\/10.0011811","journal-title":"JASA Express Lett."},{"key":"393_CR36","doi-asserted-by":"publisher","unstructured":"J. Wang, J. Wang, K. Qian, X. Xie, J. Kuang, Binaural sound localization based on deep neural network and affinity propagation clustering in mismatched HRTF condition. EURASIP J. Audio Speech Music Process. 2020(1) (2020). https:\/\/doi.org\/10.1186\/s13636-020-0171-y","DOI":"10.1186\/s13636-020-0171-y"},{"key":"393_CR37","doi-asserted-by":"publisher","unstructured":"K. Youssef, S. Argentieri, J.L. Zarader, in Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems. A learning-based approach to robust binaural sound localization (2013), pp. 2927\u20132932. https:\/\/doi.org\/10.1109\/IROS.2013.6696771","DOI":"10.1109\/IROS.2013.6696771"},{"key":"393_CR38","doi-asserted-by":"publisher","unstructured":"D. Berghi, P.J.B. Jackson, in 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Audio Inputs for Active Speaker Detection and Localization Via Microphone Array (IEEE, 2023), pp. 1\u20135. https:\/\/doi.org\/10.1109\/WASPAA58266.2023.10248185","DOI":"10.1109\/WASPAA58266.2023.10248185"},{"key":"393_CR39","doi-asserted-by":"publisher","first-page":"684","DOI":"10.1109\/TASLP.2020.3047233","volume":"29","author":"A Politis","year":"2021","unstructured":"A. Politis, A. Mesaros, S. Adavanne, T. Heittola, T. Virtanen, Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019. IEEE\/ACM Trans. Audio Speech Lang. Process. 29, 684\u2013698 (2021). https:\/\/doi.org\/10.1109\/TASLP.2020.3047233","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"393_CR40","doi-asserted-by":"publisher","first-page":"605","DOI":"10.1109\/TASLP.2019.2960734","volume":"28","author":"A Fahim","year":"2020","unstructured":"A. Fahim, P.N. Samarasinghe, T.D. Abhayapala, Multi-Source DOA Estimation Through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield. IEEE\/ACM Trans. Audio Speech Lang. Process. 28, 605\u2013618 (2020). https:\/\/doi.org\/10.1109\/TASLP.2019.2960734","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"393_CR41","doi-asserted-by":"publisher","unstructured":"Y. Hu, P.N. Samarasinghe, T.D. Abhayapala, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Sound source localization using relative harmonic coefficients in modal domain (New Paltz, 2019). https:\/\/doi.org\/10.1109\/WASPAA.2019.8937221","DOI":"10.1109\/WASPAA.2019.8937221"},{"key":"393_CR42","doi-asserted-by":"publisher","unstructured":"Y. Hu, S. Gannot, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Closed-form single source direction-of-arrival estimator using first-order relative harmonic coefficients (Singapore, 2022), pp. 726\u2013730. https:\/\/doi.org\/10.1109\/ICASSP43922.2022.9746143","DOI":"10.1109\/ICASSP43922.2022.9746143"},{"key":"393_CR43","doi-asserted-by":"publisher","first-page":"3108","DOI":"10.1109\/TASLP.2020.3037521","volume":"28","author":"Y Hu","year":"2020","unstructured":"Y. Hu, P.N. Samarasinghe, S. Gannot, T.D. Abhayapala, Semi-supervised multiple source localization using relative harmonic coefficients under noisy and reverberant environments. IEEE Trans. Acoust. Speech Signal Process. 28, 3108\u20133123 (2020). https:\/\/doi.org\/10.1109\/TASLP.2020.3037521","journal-title":"IEEE Trans. Acoust. Speech Signal Process."},{"key":"393_CR44","doi-asserted-by":"publisher","unstructured":"Y. Hu, P.N. Samarasinghe, T.D. Abhayapala, S. Gannot, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Unsupervised multiple source localization using relative harmonic coefficients (Barcelona, 2020), pp. 571\u2013575. https:\/\/doi.org\/10.1109\/ICASSP40776.2020.9053656","DOI":"10.1109\/ICASSP40776.2020.9053656"},{"key":"393_CR45","doi-asserted-by":"publisher","unstructured":"D.P. Jarrett, E.A.P. Habets, P.A. Naylor, Theory and Applications of Spherical Microphone Array Processing, vol. 9 (Springer, Cham, 2017). https:\/\/doi.org\/10.1007\/978-3-319-42211-4","DOI":"10.1007\/978-3-319-42211-4"},{"key":"393_CR46","doi-asserted-by":"publisher","unstructured":"J. Daniel, S. Kiti\u0107, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Time Domain Velocity Vector for Retracing the Multipath Propagation (Barcelona, 2020), pp. 421\u2013425. https:\/\/doi.org\/10.1109\/ICASSP40776.2020.9054561","DOI":"10.1109\/ICASSP40776.2020.9054561"},{"key":"393_CR47","doi-asserted-by":"publisher","unstructured":"S. Kiti\u0107, J. Daniel, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Generalized Time Domain Velocity Vector (Singapore, 2022), pp. 936\u2013940. https:\/\/doi.org\/10.1109\/ICASSP43922.2022.9747173","DOI":"10.1109\/ICASSP43922.2022.9747173"},{"key":"393_CR48","unstructured":"P.A. Grumiaux, Deep learning for speaker counting and localization with ambisonics signals. Ph.D. thesis, Grenoble Alpes University, Grenoble, France (2021)"},{"key":"393_CR49","doi-asserted-by":"publisher","unstructured":"Y. Hu, S. Gannot, in Proceedings of the 30th European Signal Processing Conference (EUSIPCO). Comparison of learning-based doa estimation between sh domain features (Belgrade, 2022), pp. 329\u2013333. https:\/\/doi.org\/10.23919\/EUSIPCO55093.2022.9909795","DOI":"10.23919\/EUSIPCO55093.2022.9909795"},{"key":"393_CR50","doi-asserted-by":"publisher","unstructured":"N. Poschadel, R. Hupke, S. Preihs, J. Peissig, in Proceedings of the 29th European Signal Processing Conference (EUSIPCO). Direction of arrival estimation of noisy speech using convolutional recurrent neural networks with higher-order ambisonics signals (Dublin, 2021), pp. 211\u2013215. https:\/\/doi.org\/10.23919\/EUSIPCO54536.2021.9616204","DOI":"10.23919\/EUSIPCO54536.2021.9616204"},{"issue":"6","key":"393_CR51","first-page":"503","volume":"55","author":"V Pulkki","year":"2007","unstructured":"V. Pulkki, Spatial Sound Reproduction with Directional Audio Coding. J. Audio Eng. Soc. 55(6), 503\u2013516 (2007)","journal-title":"J. Audio Eng. Soc."},{"issue":"5","key":"393_CR52","doi-asserted-by":"publisher","first-page":"852","DOI":"10.1109\/JSTSP.2015.2415762","volume":"9","author":"A Politis","year":"2015","unstructured":"A. Politis, J. Vilkamo, V. Pulkki, Sector-based parametric sound field reproduction in the spherical harmonic domain. IEEE J. Sel. Top. Signal Process. 9(5), 852\u2013866 (2015). https:\/\/doi.org\/10.1109\/JSTSP.2015.2415762","journal-title":"IEEE J. Sel. Top. Signal Process."},{"issue":"11","key":"393_CR53","doi-asserted-by":"publisher","first-page":"840","DOI":"10.17743\/jaes.2019.0041","volume":"67","author":"L McCormack","year":"2019","unstructured":"L. McCormack, S. Delikaris-Manias, A. Politis et al., Applications of spatially localized active-intensity vectors for sound-field visualization. J. Audio Eng. Soc. 67(11), 840\u2013854 (2019). https:\/\/doi.org\/10.17743\/jaes.2019.0041","journal-title":"J. Audio Eng. Soc."},{"key":"393_CR54","unstructured":"S. Delikaris-Manias, L. McCormack, D. Pavlidi, A. Mouchtaris, in Proceedings of the 11th European Congress and Exposition on Noise Control Engineering (Euronoise). Spatially localized direction of arrival estimation (European Acoustics Association, Crete, 2018), pp. 2549\u20132554"},{"key":"393_CR55","unstructured":"M. McCrea, L. McCormack, V. Pulkki, in Proceedings of the 2nd Nordic Sound and Music Computing (NordicSMC) Conference, ed. by P.R. Kantan, R. Paisa, S. Willemsen. Sound Source Localization Using Sector-Based Analysis with Multiple Receivers (Zenodo,\u00a0Aalborg, 2021)"},{"key":"393_CR56","doi-asserted-by":"publisher","unstructured":"B. Rafaely, Fundamentals of Spherical Array Processing, vol. 8 (Springer Berlin, Heidelberg, 2015). https:\/\/doi.org\/10.1007\/978-3-662-45664-4","DOI":"10.1007\/978-3-662-45664-4"},{"key":"393_CR57","doi-asserted-by":"publisher","unstructured":"F. Zotter, M. Frank, Ambisonics, vol. 19 (Springer, Cham, 2019). https:\/\/doi.org\/10.1007\/978-3-030-17207-7","DOI":"10.1007\/978-3-030-17207-7"},{"key":"393_CR58","unstructured":"J. Daniel, in AES 23rd International Conference. Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format (Copenhagen, 2003)"},{"key":"393_CR59","doi-asserted-by":"publisher","unstructured":"A. Politis, V. Pulkki, in Parametric Time-Frequency Domain Spatial Audio, ed. by V. Pulkki, S. Delikaris-Manias, A. Politis. Higher-order directional audio coding (Wiley, Chichester, 2017), pp. 141\u2013159. https:\/\/doi.org\/10.1002\/9781119252634.ch6","DOI":"10.1002\/9781119252634.ch6"},{"issue":"5","key":"393_CR60","doi-asserted-by":"publisher","first-page":"3094","DOI":"10.1121\/1.2063108","volume":"118","author":"M Park","year":"2005","unstructured":"M. Park, B. Rafaely, Sound-field analysis by plane-wave decomposition using spherical microphone array. J. Acoust. Soc. Am. 118(5), 3094\u20133103 (2005). https:\/\/doi.org\/10.1121\/1.2063108","journal-title":"J. Acoust. Soc. Am."},{"key":"393_CR61","doi-asserted-by":"publisher","unstructured":"M.J. Crocker, F. Jacobsen, in Encyclopedia of acoustics, ed. by M.J. Crocker. Sound intensity (Wiley-Interscience, New York, 1997), pp. 1855\u20131868. https:\/\/doi.org\/10.1002\/9780470172544.ch156","DOI":"10.1002\/9780470172544.ch156"},{"key":"393_CR62","doi-asserted-by":"publisher","unstructured":"F. Jacobsen, in Springer Handbook of Acoustics, ed. by T.D. Rossing. Sound Intensity (Springer, Berlin, 2014), pp. 1093\u20131114. https:\/\/doi.org\/10.1007\/978-1-4939-0755-7_25","DOI":"10.1007\/978-1-4939-0755-7_25"},{"key":"393_CR63","unstructured":"D.P. Jarrett, E.A.P. Habets, P.A. Naylor, in Proceedings of the 18th European Signal Processing Conference (EUSIPCO). 3d source localization in the spherical harmonic domain using a pseudointensity vector (Aalborg, 2010), pp. 442\u2013446"},{"key":"393_CR64","unstructured":"J. Merimaa, Analysis, synthesis, and perception of spatial sound: Binaural localization modeling and multichannel loudspeaker reproduction. Ph.D. thesis, Helsinki University of Technology, Espoo, Finland (2006)"},{"key":"393_CR65","unstructured":"A. Politis, V. Pulkki, Acoustic intensity, energy-density and diffuseness estimation in a directionally-constrained region. (2016).\u00a0arXiv\u00a0preprint\u00a0arXiv:1609.03409v2"},{"issue":"10","key":"393_CR66","first-page":"807","volume":"60","author":"F Zotter","year":"2012","unstructured":"F. Zotter, M. Frank, All-round ambisonic panning and decoding. J. Audio Eng. Soc. 60(10), 807\u2013820 (2012)","journal-title":"J. Audio Eng. Soc."},{"key":"393_CR67","unstructured":"D.A. Clevert, T. Unterthiner, S. Hochreiter, in 4th International Conference on Learning Representations, ICLR 2016, ed. by Y. Bengio, Y. LeCun. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) (ICLR,\u00a0San Juan, 2016). https:\/\/arxiv.org\/abs\/1511.07289v5"},{"key":"393_CR68","unstructured":"M. Abadi, A. Agarwal, P. Barham, et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (2015).\u00a0https:\/\/download.tensorflow.org\/paper\/whitepaper2015.pdf. Accessed 1 Dec 2024,"},{"key":"393_CR69","doi-asserted-by":"publisher","unstructured":"D. Diaz-Guerra, A. Politis, T. Virtanen, in 31st European Signal Processing Conference (EUSIPCO). Position Tracking of a Varying Number of Sound Sources with Sliding Permutation Invariant Training (IEEE, 2023), pp. 251\u2013255. https:\/\/doi.org\/10.23919\/EUSIPCO58844.2023.10289897","DOI":"10.23919\/EUSIPCO58844.2023.10289897"},{"key":"393_CR70","doi-asserted-by":"crossref","unstructured":"J. Meyer, G. Elko, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield, vol. 2 (IEEE, Orlando, 2002), pp. 1781\u20131784","DOI":"10.1109\/ICASSP.2002.5744968"},{"key":"393_CR71","unstructured":"A. Politis. Spherical-Array-Processing: A collection of MATLAB routines for acoustical array processing on spherical harmonic signals, commonly captured with a spherical microphone array (2024). https:\/\/github.com\/polarch\/Spherical-Array-Processing. Accessed 01 Dec 2024"},{"key":"393_CR72","unstructured":"A. Wabnitz, N. Epain, C. Jin, A. van Schaik, in Proceedings of the International Symposium on Room Acoustics (ISRA). Room acoustics simulation for multichannel microphone arrays (Australian Acoustical Society, Melbourne, 2010)"},{"issue":"4","key":"393_CR73","doi-asserted-by":"publisher","first-page":"943","DOI":"10.1121\/1.382599","volume":"65","author":"JB Allen","year":"1979","unstructured":"J.B. Allen, D.A. Berkley, Image Method for Efficiently Simulating Small-Room Acoustics. J. Acoust. Soc. Am. 65(4), 943\u2013950 (1979). https:\/\/doi.org\/10.1121\/1.382599","journal-title":"J. Acoust. Soc. Am."},{"key":"393_CR74","unstructured":"D. Schr\u00f6der, P. Dross, M. Vorl\u00e4nder, in Audio Engineering Society Conference: 30th International Conference: Intelligent Audio Environments, A Fast Reverberation Estimator for Virtual Environments (Audio Engineering Society,\u00a0Saariselk\u00e4, 2007). http:\/\/www.aes.org\/e-lib\/browse.cfm?elib=13911"},{"key":"393_CR75","unstructured":"mh acoustics. EigenUnits. (2024) https:\/\/mhacoustics.com\/eigenunits. Accessed 01 Dec 2024"},{"key":"393_CR76","doi-asserted-by":"publisher","unstructured":"D. Ackermann, M. Ilse, D. Grigoriev et al., A ground truth on room acoustical analysis and perception (grap). (2018). https:\/\/doi.org\/10.14279\/DEPOSITONCE-7003","DOI":"10.14279\/DEPOSITONCE-7003"},{"key":"393_CR77","doi-asserted-by":"publisher","unstructured":"P. Srivastava, A. Deleforge, A. Politis, E. Vincent, in INTERSPEECH 2023. How to (Virtually) Train Your Speaker Localizer (ISCA, 2023), pp. 1204\u20131208. https:\/\/doi.org\/10.21437\/Interspeech.2023-1065","DOI":"10.21437\/Interspeech.2023-1065"},{"key":"393_CR78","unstructured":"M. Berzborn, R. Bomhardt, J. Klein, J.G. Richter, M. Vorl\u00e4nder, in Fortschritte der Akustik - DAGA 2017. The ITA-Toolbox: An Open Source MATLAB Toolbox for Acoustic Measurements and Signal Processing (Deutsche Gesellschaft, Kiel, 2017)"},{"key":"393_CR79","doi-asserted-by":"publisher","unstructured":"DIN EN ISO 3382-2:2008-09, Akustik - Messung von Parametern der Raumakustik - Teil 2: Nachhallzeit in gew\u00f6hnlichen R\u00e4umen (ISO 3382-2:2008); Deutsche Fassung EN ISO 3382-2:2008. https:\/\/doi.org\/10.31030\/1411187","DOI":"10.31030\/1411187"},{"key":"393_CR80","doi-asserted-by":"publisher","unstructured":"J.S. Garofolo. Timit: Acoustic-phonetic continuous speech corpus. Linguist. Data Consortium (Linguistic Data Consortium, Philadelphia, 1993).\u00a0https:\/\/doi.org\/10.35111\/17gk-bn40","DOI":"10.35111\/17gk-bn40"},{"key":"393_CR81","unstructured":"R. Hupke, M. Nophut, S. Li, R. Schlieper, S. Preihs, J. Peissig, in 144th Convention of the Audio Engineering Society. The immersive media laboratory: Installation of a novel multichannel audio laboratory for immersive media applications (Audio Engineering Society, Milan, 2018)"},{"key":"393_CR82","unstructured":"International Telecommunications Union. ITU-R BS.1116-3: Methods for the subjective assessment of small impairments in audio systems (2015)"},{"key":"393_CR83","unstructured":"R. Kiyan, S. Preihs, J. Peissig, in Fortschritte der Akustik - DAGA 2024. Robokopp: Robotic Setup for Automated Sweet Spot Measurements with Head Simulators and Microphone Arrays (Deutsche Gesellschaft, Hannover, 2024)"},{"key":"393_CR84","unstructured":"R. Kiyan, S. Preihs, J. Peissig, in 156th Convention of the Audio Engineering Society. Determining the immersion sweet area in multichannel loudspeaker reproduction using spatial sound field features (Audio Engineering Society, Madrid, 2024). Accepted for publication"},{"key":"393_CR85","unstructured":"N. Poschadel, R. Kiyan, S. Preihs, J. Peissig, in Proceedings of the 24th International Congress on Acoustics (ICA 2022). On the Impact of Input Scaling Strategies for Deep Learning based DOA Estimation from Ambisonics Signals (International Commission for Acoustics, Gyeongju, 2022)"},{"issue":"1\u20132","key":"393_CR86","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1002\/nav.3800020109","volume":"2","author":"HW Kuhn","year":"1955","unstructured":"H.W. Kuhn, The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2(1\u20132), 83\u201397 (1955). https:\/\/doi.org\/10.1002\/nav.3800020109","journal-title":"Nav. Res. Logist. Q."},{"issue":"6","key":"393_CR87","doi-asserted-by":"publisher","first-page":"162","DOI":"10.3390\/app6060162","volume":"6","author":"A Mesaros","year":"2016","unstructured":"A. Mesaros, T. Heittola, T. Virtanen, Metrics for Polyphonic Sound Event Detection. Appl. Sci. 6(6), 162 (2016). https:\/\/doi.org\/10.3390\/app6060162","journal-title":"Appl. Sci."},{"key":"393_CR88","doi-asserted-by":"publisher","unstructured":"A. Mesaros, S. Adavanne, A. Politis, T. Heittola, T. Virtanen, in 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Joint Measurement of Localization and Detection of Sound Events (IEEE, Piscataway, 2019), pp. 333\u2013337. https:\/\/doi.org\/10.1109\/WASPAA.2019.8937220","DOI":"10.1109\/WASPAA.2019.8937220"},{"key":"393_CR89","doi-asserted-by":"publisher","unstructured":"B. Efron, R. Tibshirani, An introduction to the bootstrap. Monographs on statistics and applied probability An introduction to the bootstrap (Chapman & Hall, New York, 1994). https:\/\/doi.org\/10.1201\/9780429246593","DOI":"10.1201\/9780429246593"}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-025-00393-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13636-025-00393-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-025-00393-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T03:21:26Z","timestamp":1739330486000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-025-00393-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,12]]},"references-count":89,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["393"],"URL":"https:\/\/doi.org\/10.1186\/s13636-025-00393-7","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,12]]},"assertion":[{"value":"5 June 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 January 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 February 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"7"}}