{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T14:03:55Z","timestamp":1772114635513,"version":"3.50.1"},"reference-count":93,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,4,19]],"date-time":"2023-04-19T00:00:00Z","timestamp":1681862400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"European Union\u2019s Horizon 2020 research and innovation programme under the Marie Sk\u0142odowska-Curie grant","award":["956962"],"award-info":[{"award-number":["956962"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,5,31]]},"abstract":"<jats:p>Robust sound source localization for environments with noise and reverberation is increasingly exploiting deep neural networks fed with various acoustic features. Yet, state-of-the-art research mainly focuses on optimizing algorithmic accuracy, resulting in huge models preventing edge-device deployment. The edge, however, calls for real-time low-footprint acoustic reasoning for applications such as hearing aids and robot interactions. Hence, we set off from a robust CNN-based model using SRP-PHAT features, Cross3D\u00a0[<jats:xref ref-type=\"bibr\">16<\/jats:xref>], to pursue an efficient yet compact model architecture for the extreme edge. For the SRP feature representation and the neural network, we propose our scalable LC-SRP-Edge and Cross3D-Edge algorithms, respectively, which are optimized for lower hardware overhead. LC-SRP-Edge halves the complexity and on-chip memory overhead for the sinc interpolation compared to the original LC-SRP\u00a0[<jats:xref ref-type=\"bibr\">19<\/jats:xref>]. 
Over multiple SRP resolution cases, Cross3D-Edge saves 10.32%~73.71% computational complexity and 59.77%~94.66% neural network weights against the Cross3D baseline. In terms of the accuracy-efficiency tradeoff, the most balanced version (<jats:bold>EM<\/jats:bold>) requires only 127.1 MFLOPS computation, 3.71 MByte\/s bandwidth, and 0.821 MByte on-chip memory in total, while still retaining competitiveness in state-of-the-art accuracy comparisons. It achieves 8.59\u00a0ms\/frame end-to-end latency on a Raspberry Pi 4B, which is 7.26\u00d7 faster than the corresponding baseline.<\/jats:p>","DOI":"10.1145\/3586996","type":"journal-article","created":{"date-parts":[[2023,3,7]],"date-time":"2023-03-07T04:58:01Z","timestamp":1678165081000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2159-0658","authenticated-orcid":false,"given":"Jun","family":"Yin","sequence":"first","affiliation":[{"name":"ESAT-MICAS KU Leuven, Leuven, Belgium"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3495-9263","authenticated-orcid":false,"given":"Marian","family":"Verhelst","sequence":"additional","affiliation":[{"name":"ESAT-MICAS KU Leuven, Leuven, Belgium"}]}],"member":"320","published-online":{"date-parts":[[2023,4,19]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2018.2885636"},{"key":"e_1_3_1_3_2","first-page":"1462","volume-title":"Proceedings of the 2018 26th European Signal Processing Conference.","author":"Adavanne Sharath","year":"2018","unstructured":"Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. 2018. Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network. In Proceedings of the 2018 26th European Signal Processing Conference. 
IEEE, 1462\u20131466."},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","unstructured":"Sharath Adavanne Archontis Politis and Tuomas Virtanen. 2019. Localization detection and tracking of multiple moving sound sources with a convolutional recurrent neural network. In Workshop on Detection and Classification of Acoustic Scenes and Events .","DOI":"10.33682\/xb0q-a335"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Sharath Adavanne Archontis Politis and Tuomas Virtanen. 2019. A multi-room reverberant dataset for sound event localization and detection. In Workshop on Detection and Classification of Acoustic Scenes and Events .","DOI":"10.33682\/1xwd-5v76"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1121\/1.382599"},{"key":"e_1_3_1_7_2","first-page":"885","volume-title":"Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Cao Yin","year":"2021","unstructured":"Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, and Mark D. Plumbley. 2021. An improved event-independent network for polyphonic sound event localization and detection. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 885\u2013889."},{"key":"e_1_3_1_8_2","first-page":"136","volume-title":"Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.","author":"Chakrabarty Soumitro","year":"2017","unstructured":"Soumitro Chakrabarty and Emanu\u00ebl A. P. Habets. 2017. Broadband DOA estimation using convolutional neural networks trained with noise signals. In Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, 136\u2013140."},{"key":"e_1_3_1_9_2","doi-asserted-by":"crossref","unstructured":"Soumitro Chakrabarty and Emanu\u00ebl A. P. Habets. 2017. Multi-speaker localization using convolutional neural network trained with noise. 
arXiv:1712.04276 [cs.SD].","DOI":"10.1109\/WASPAA.2017.8170010"},{"key":"e_1_3_1_10_2","unstructured":"Tianqi Chen Thierry Moreau Ziheng Jiang Lianmin Zheng Eddie Q. Yan Haichen Shen Meghan Cowan Leyuan Wang Yuwei Hu Luis Ceze Carlos Guestrin and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In USENIX Symposium on Operating Systems Design and Implementation ."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ymssp.2018.09.019"},{"key":"e_1_3_1_12_2","unstructured":"Kyunghyun Cho Bart van Merrienboer \u00c7aglar G\u00fcl\u00e7ehre Dzmitry Bahdanau Fethi Bougares Holger Schwenk and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder\u2013decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing ."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.195"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2020.3011256"},{"issue":"1","key":"e_1_3_1_15_2","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1109\/TNNLS.2018.2830119","article-title":"Enhanced robot speech recognition using biomimetic binaural sound source localization","volume":"30","author":"D\u00e1vila-Chac\u00f3n Jorge","year":"2018","unstructured":"Jorge D\u00e1vila-Chac\u00f3n, Jindong Liu, and Stefan Wermter. 2018. Enhanced robot speech recognition using biomimetic binaural sound source localization. IEEE Transactions on Neural Networks and Learning Systems 30, 1 (2018), 138\u2013150.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_1_16_2","unstructured":"David Diaz-Guerra. 2020. Cross3D Codebase. 
Retrieved from https:\/\/github.com\/DavidDiazGuerra\/Cross3D."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2020.3040031"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-020-09905-3"},{"key":"e_1_3_1_19_2","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1007\/978-3-662-04619-7_8","volume-title":"Proceedings of the Microphone Arrays","author":"DiBiase Joseph H.","year":"2001","unstructured":"Joseph H. DiBiase, Harvey F. Silverman, and Michael S. Brandstein. 2001. Robust localization in reverberant rooms. In Proceedings of the Microphone Arrays. Springer, 157\u2013180."},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Thomas Dietzen Enzo De Sena and Toon van Waterschoot. 2020. Low-complexity steered response power mapping based on Nyquist-Shannon sampling. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA\u201921) 206\u2013210.","DOI":"10.1109\/WASPAA52581.2021.9632774"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2007.906694"},{"key":"e_1_3_1_22_2","first-page":"I\u2013121","volume-title":"Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP\u201907","volume":"1","author":"Do Hoang","year":"2007","unstructured":"Hoang Do, Harvey F. Silverman, and Ying Yu. 2007. A real-time SRP-PHAT source location implementation using stochastic region contraction (SRC) on a large-aperture microphone array. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP\u201907. Vol. 1. IEEE, I\u2013121."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2020.2990485"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Pierre-Amaury Grumiaux Srdan Kitic Laurent Girin and Alexandre Gu\u00e9rin. 2021. Improved feature extraction for CRNN-based multiple sound source localization. 
In 29th European Signal Processing Conference (EUSIPCO\u201921) . 231\u2013235.","DOI":"10.23919\/EUSIPCO54536.2021.9616124"},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Pierre-Amaury Grumiaux Sr\u0111an Kiti\u0107 Laurent Girin and Alexandre Gu\u00e9rin. 2021. A survey of sound source localization with deep learning methods. The Journal of the Acoustical Society of America 152 1 (2021) 107.","DOI":"10.1121\/10.0011809"},{"key":"e_1_3_1_26_2","first-page":"16","volume-title":"Proceedings of the 2020 28th European Signal Processing Conference.","author":"Guirguis Karim","year":"2021","unstructured":"Karim Guirguis, Christoph Schorn, Andre Guntoro, Sherif Abdulatif, and Bin Yang. 2021. SELD-TCN: Sound event localization & detection via temporal convolutional networks. In Proceedings of the 2020 28th European Signal Processing Conference. IEEE, 16\u201320."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"e_1_3_1_28_2","volume-title":"Proceedings of the Audio Engineering Society Convention 138","author":"Hirvonen Toni","year":"2015","unstructured":"Toni Hirvonen. 2015. Classification of spatial audio location and content using convolutional neural networks. In Proceedings of the Audio Engineering Society Convention 138. Audio Engineering Society."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"issue":"11","key":"e_1_3_1_30_2","doi-asserted-by":"crossref","first-page":"2535","DOI":"10.3390\/s17112535","article-title":"Design of UAV-embedded microphone array system for sound source localization in outdoor environments","volume":"17","author":"Hoshiba Kotaro","year":"2017","unstructured":"Kotaro Hoshiba, Kai Washizaki, Mizuho Wakabayashi, Takahiro Ishiki, Makoto Kumon, Yoshiaki Bando, Daniel Gabriel, Kazuhiro Nakadai, and Hiroshi G. Okuno. 2017. Design of UAV-embedded microphone array system for sound source localization in outdoor environments. 
Sensors 17, 11 (2017), 2535.","journal-title":"Sensors"},{"key":"e_1_3_1_31_2","first-page":"26","volume-title":"Proceedings of the 2020 IEEE 3rd International Conference on Information Communication and Signal Processing.","author":"Huang Yankun","year":"2020","unstructured":"Yankun Huang, Xihong Wu, and Tianshu Qu. 2020. A time-domain unsupervised learning based sound source localization method. In Proceedings of the 2020 IEEE 3rd International Conference on Information Communication and Signal Processing. IEEE, 26\u201332."},{"key":"e_1_3_1_32_2","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-42211-4","volume-title":"Theory and Applications of Spherical Microphone Array Processing","author":"Jarrett Daniel P.","year":"2017","unstructured":"Daniel P. Jarrett, Emanu\u00ebl A. P. Habets, and Patrick A. Naylor. 2017. Theory and Applications of Spherical Microphone Array Processing. Vol. 9. Springer."},{"key":"e_1_3_1_33_2","volume-title":"Sound Event Localization and Detection using Convolutional Recurrent Neural Network","author":"Jee Wen Jie","year":"2019","unstructured":"Wen Jie Jee, R. Mars, P. Pratik, S. Nagisetty, and C. S. Lim. 2019. Sound Event Localization and Detection using Convolutional Recurrent Neural Network. Technical Report. DCASE2019 Challenge, Tech. Rep."},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"S\u0142awomir Kapka and Mateusz Lewandowski. 2019. Sound source detection localization and classification using consecutive ensemble of CRNN models. ArXiv abs\/1908.00766 (2019).","DOI":"10.33682\/9f2t-ab23"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.2528\/PIERB10100510"},{"key":"e_1_3_1_36_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. 
CoRR abs\/1412.6980 (2014)."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1976.1162830"},{"key":"e_1_3_1_38_2","unstructured":"Qiuqiang Kong Yin Cao Turab Iqbal Yong Xu Wenwu Wang and Mark D. Plumbley. 2019. Cross-task learning for audio tagging sound event detection and spatial localization: DCASE 2019 baseline systems. arXiv:1904.03476 [cs.SD]."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ultras.2013.06.009"},{"issue":"6","key":"e_1_3_1_40_2","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1016\/j.ultras.2012.01.017","article-title":"Acoustic source localization in anisotropic plates","volume":"52","author":"Kundu Tribikram","year":"2012","unstructured":"Tribikram Kundu, Hayato Nakatani, and Nobuo Takeda. 2012. Acoustic source localization in anisotropic plates. Ultrasonics 52, 6 (2012), 740\u2013746.","journal-title":"Ultrasonics"},{"key":"e_1_3_1_41_2","first-page":"1","volume-title":"Proceedings of the 2019 IEEE 21st International Workshop on Multimedia Signal Processing.","author":"Moing Guillaume Le","year":"2019","unstructured":"Guillaume Le Moing, Phongtharin Vinayavekhin, Tadanobu Inoue, Jayakorn Vongkulbhisal, Asim Munawar, Ryuki Tachibana, and Don Joven Agravante. 2019. Learning multiple sound source 2d localization. In Proceedings of the 2019 IEEE 21st International Workshop on Multimedia Signal Processing. IEEE, 1\u20136."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_3_1_43_2","first-page":"2616","volume-title":"Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Li Qinglong","year":"2018","unstructured":"Qinglong Li, Xueliang Zhang, and Hao Li. 2018. Online direction of arrival estimation based on deep learning. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. 
IEEE, 2616\u20132620."},{"key":"e_1_3_1_44_2","doi-asserted-by":"crossref","unstructured":"Markus V. S. Lima Wallace A. Martins Leonardo O. Nunes Luiz W. P. Biscainho Tadeu N. Ferreira Mauricio V. M. Costa and Bowon Lee. 2015. A volumetric SRP with refinement step for sound source localization. IEEE Signal Processing Letters 22 8 (2015) 1098\u20131102.","DOI":"10.1109\/LSP.2014.2385864"},{"key":"e_1_3_1_45_2","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. arXiv:1711.05101 [cs.LG]."},{"key":"e_1_3_1_46_2","volume-title":"Introduction to Shannon Sampling and Interpolation Theory","author":"Marks Robert J II","year":"2012","unstructured":"Robert J II Marks. 2012. Introduction to Shannon Sampling and Interpolation Theory. Springer Science & Business Media."},{"key":"e_1_3_1_47_2","first-page":"18","volume-title":"Proceedings of the 14th Python in Science Conference","volume":"8","author":"McFee Brian","year":"2015","unstructured":"Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. 2015. librosa: Audio and music signal analysis in python. In Proceedings of the 14th Python in Science Conference. Vol. 8. Citeseer, 18\u201325."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094342012452166"},{"key":"e_1_3_1_49_2","unstructured":"Javier Naranjo-Alcazar Sergi Perez-Castanos Jose Ferrandis Pedro Zuccarello and Maximo Cobos. 2021. Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs . arXiv:2006.14436 [cs.SD]."},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1121\/1.5000165"},{"key":"e_1_3_1_51_2","article-title":"Three-stage approach for sound event localization and detection","author":"Noh Kyoungjin","year":"2019","unstructured":"Kyoungjin Noh, C. Jeong-Hwan, J. Dongyeop, and C. Joon-Hyuk. 2019. Three-stage approach for sound event localization and detection. Tech. 
Report of Detection and Classification of Acoustic Scenes and Events 2019 (DCASE) Challenge (2019). https:\/\/www.semanticscholar.org\/paper\/THREE-STAGE-APPROACH-FOR-SOUND-EVENT-LOCALIZATION-Noh-Choi\/2e0962d0fc80a5b069a09716b35e4fa1ecdb97b1.","journal-title":"Tech. Report of Detection and Classification of Acoustic Scenes and Events 2019 (DCASE) Challenge"},{"key":"e_1_3_1_52_2","first-page":"5206","volume-title":"Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Panayotov Vassil","year":"2015","unstructured":"Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. 2015. Librispeech: An ASR corpus based on public domain audio books. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 5206\u20135210."},{"key":"e_1_3_1_53_2","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1109\/IWAENC.2018.8521403","volume-title":"Proceedings of the 2018 16th International Workshop on Acoustic Signal Enhancement.","author":"Perotin Laur\u00e9line","year":"2018","unstructured":"Laur\u00e9line Perotin, Romain Serizel, Emmanuel Vincent, and Alexandre Gu\u00e9rin. 2018. CRNN-based joint azimuth and elevation localization with the Ambisonics intensity vector. In Proceedings of the 2018 16th International Workshop on Acoustic Signal Enhancement. IEEE, 241\u2013245."},{"key":"e_1_3_1_54_2","first-page":"6125","volume-title":"Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Pertil\u00e4 Pasi","year":"2017","unstructured":"Pasi Pertil\u00e4 and Emre Cakir. 2017. Robust direction estimation with convolutional neural networks based steered response power. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 6125\u20136129."},{"key":"e_1_3_1_55_2","unstructured":"Archontis Politis Sharath Adavanne and Tuomas Virtanen. 2020. 
A dataset of reverberant spatial sound scenes with moving sources for sound event localization and detection. arXiv:2006.01919 [eess.AS]."},{"key":"e_1_3_1_56_2","doi-asserted-by":"crossref","unstructured":"Nils Poschadel Robert Hupke Stephan Preihs and J\u00fcrgen Peissig. 2021. Direction of arrival estimation of noisy speech using convolutional recurrent neural networks with higher-order ambisonics signals. In 29th European Signal Processing Conference (EUSIPCO\u201921) 211\u2013215.","DOI":"10.23919\/EUSIPCO54536.2021.9616204"},{"key":"e_1_3_1_57_2","volume-title":"Proceedings of the 23rd International Congress on Acoustics.","author":"Pujol Hadrien","year":"2019","unstructured":"Hadrien Pujol, Eric Bavu, and Alexandre Garcia. 2019. Source localization in reverberant rooms using Deep Learning and microphone arrays. In Proceedings of the 23rd International Congress on Acoustics."},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1121\/10.0005046"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2017.07.011"},{"key":"e_1_3_1_60_2","first-page":"I\u2013529","volume-title":"Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"1","author":"Rickard Scott","year":"2002","unstructured":"Scott Rickard and \u00d6zg\u00fcr Yilmaz. 2002. On the approximate W-disjoint orthogonality of speech. In Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 1. IEEE, I\u2013529."},{"key":"e_1_3_1_61_2","volume-title":"On Sound Source Localization of Speech Signals using Deep Neural Networks","author":"Roden Reinhild","year":"2015","unstructured":"Reinhild Roden, Niko Moritz, Stephan Gerlach, Stefan Weinzierl, and Stefan Goetze. 2015. On Sound Source Localization of Speech Signals using Deep Neural Networks. 
https:\/\/www.semanticscholar.org\/paper\/On-sound-source-localization-of-speech-signals-deep-Roden-Moritz\/cbbcd9214f1d25aaf4cae3cddbf0d9712056e837."},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/29.32276"},{"issue":"2","key":"e_1_3_1_63_2","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/TETCI.2017.2775237","article-title":"Exploiting CNNs for improving acoustic source localization in noisy and reverberant conditions","volume":"2","author":"Salvati Daniele","year":"2018","unstructured":"Daniele Salvati, Carlo Drioli, and Gian Luca Foresti. 2018. Exploiting CNNs for improving acoustic source localization in noisy and reverberant conditions. IEEE Transactions on Emerging Topics in Computational Intelligence 2, 2 (2018), 103\u2013116.","journal-title":"IEEE Transactions on Emerging Topics in Computational Intelligence"},{"key":"e_1_3_1_64_2","first-page":"411","volume-title":"Proceedings of the 7th International Symposium on Signal Processing and Its Applications.","volume":"2","author":"Sawada Hiroshi","year":"2003","unstructured":"Hiroshi Sawada, Ryo Mukai, and Shoji Makino. 2003. Direction of arrival estimation for multiple source signals using independent component analysis. In Proceedings of the 7th International Symposium on Signal Processing and Its Applications. Vol. 2. IEEE, 411\u2013414."},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAP.1986.1143830"},{"key":"e_1_3_1_66_2","doi-asserted-by":"crossref","unstructured":"Christopher Schymura Benedikt T. B\u00f6nninghoff Tsubasa Ochiai Marc Delcroix Keisuke Kinoshita Tomohiro Nakatani Shoko Araki and Dorothea Kolossa. 2021. PILOT: Introducing transformers for probabilistic sound event localization. 
In Interspeech .","DOI":"10.21437\/Interspeech.2021-124"},{"key":"e_1_3_1_67_2","first-page":"915","volume-title":"Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Shimada Kazuki","year":"2021","unstructured":"Kazuki Shimada, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, and Yuki Mitsufuji. 2021. Accdoa: Activity-coupled cartesian direction of arrival representation for sound event localization and detection. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 915\u2013919."},{"key":"e_1_3_1_68_2","unstructured":"Kazuki Shimada Naoya Takahashi Shusuke Takahashi and Yuki Mitsufuji. 2020. Sound event localization and detection using activity-coupled Cartesian DOA vector and RD3Net. arXiv:2006.12014 [eess.AS]."},{"key":"e_1_3_1_69_2","volume-title":"Proceedings of the Interspeech 2018-19th Annual Conference of the International Speech Communication Association","author":"Sivasankaran Sunit","year":"2018","unstructured":"Sunit Sivasankaran, Emmanuel Vincent, and Dominique Fohr. 2018. Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment. In Proceedings of the Interspeech 2018-19th Annual Conference of the International Speech Communication Association."},{"key":"e_1_3_1_70_2","doi-asserted-by":"crossref","unstructured":"Aswin Shanmugam Subramanian Chao Weng Shinji Watanabe Meng Yu and Dong Yu. 2021. Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition. Comput. Speech Lang. 75 (2021) 101360.","DOI":"10.1016\/j.csl.2022.101360"},{"key":"e_1_3_1_71_2","unstructured":"Dmitry Suvorov Ge Dong and Roman Zhukov. 2018. Deep residual network for sound source localization in the time domain. 
arXiv:1808.06429 [cs.SD]."},{"key":"e_1_3_1_72_2","first-page":"14","volume-title":"Proceedings of the 11th International Workshop on Acoustic Echo and Noise Control.","author":"Tervo Sakari","year":"2008","unstructured":"Sakari Tervo and Tapio Lokki. 2008. Interpolation methods for the SRP-PHAT algorithm. In Proceedings of the 11th International Workshop on Acoustic Echo and Noise Control.14\u201317."},{"key":"e_1_3_1_73_2","first-page":"6797","volume-title":"Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Thuillier Etienne","year":"2018","unstructured":"Etienne Thuillier, Hannes Gamper, and Ivan J. Tashev. 2018. Spatial audio feature discovery with convolutional neural networks. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 6797\u20136801."},{"key":"e_1_3_1_74_2","first-page":"393","volume-title":"Proceedings of the RO-MAN 2007-The 16th IEEE International Symposium on Robot and Human Interactive Communication","author":"Trifa Vlad M.","year":"2007","unstructured":"Vlad M. Trifa, Ansgar Koene, Jan Mor\u00e9n, and Gordon Cheng. 2007. Real-time acoustic source localization in noisy environments for human-robot multimodal interaction. In Proceedings of the RO-MAN 2007-The 16th IEEE International Symposium on Robot and Human Interactive Communication. IEEE, 393\u2013398."},{"issue":"10","key":"e_1_3_1_75_2","doi-asserted-by":"crossref","first-page":"2257","DOI":"10.1587\/transinf.E96.D.2257","article-title":"An approach for sound source localization by complex-valued neural network","volume":"96","author":"Tsuzuki Hirofumi","year":"2013","unstructured":"Hirofumi Tsuzuki, Mauricio Kugler, Susumu Kuroyanagi, and Akira Iwata. 2013. An approach for sound source localization by complex-valued neural network. 
IEICE Transactions on Information and Systems 96, 10 (2013), 2257\u20132265.","journal-title":"IEICE Transactions on Information and Systems"},{"issue":"3","key":"e_1_3_1_76_2","doi-asserted-by":"crossref","first-page":"164","DOI":"10.3109\/14992027.2010.537376","article-title":"Sound source localization using hearing aids with microphones placed behind-the-ear, in-the-canal, and in-the-pinna","volume":"50","author":"Bogaert Tim Van den","year":"2011","unstructured":"Tim Van den Bogaert, Evelyne Carette, and Jan Wouters. 2011. Sound source localization using hearing aids with microphones placed behind-the-ear, in-the-canal, and in-the-pinna. International Journal of Audiology 50, 3 (2011), 164\u2013176.","journal-title":"International Journal of Audiology"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2020.2984852"},{"key":"e_1_3_1_78_2","first-page":"566","volume-title":"Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Varzandeh Reza","year":"2020","unstructured":"Reza Varzandeh, Kamil Adilo\u011flu, Simon Doclo, and Volker Hohmann. 2020. Exploiting periodicity features for joint detection and DOA estimation of speech sources using convolutional neural networks. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 566\u2013570."},{"key":"e_1_3_1_79_2","first-page":"5998","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems. 
5998\u20136008."},{"key":"e_1_3_1_80_2","first-page":"1567","volume-title":"Proceedings of the 2018 26th European Signal Processing Conference.","author":"Vecchiotti Paolo","year":"2018","unstructured":"Paolo Vecchiotti, Emanuele Principi, Stefano Squartini, and Francesco Piazza. 2018. Deep neural networks for joint voice activity detection and speaker localization. In Proceedings of the 2018 26th European Signal Processing Conference. IEEE, 1567\u20131571."},{"issue":"10","key":"e_1_3_1_81_2","doi-asserted-by":"crossref","first-page":"3418","DOI":"10.3390\/s18103418","article-title":"Towards end-to-end acoustic localization using deep learning: From audio signals to source position coordinates","volume":"18","author":"Vera-Diaz Juan Manuel","year":"2018","unstructured":"Juan Manuel Vera-Diaz, Daniel Pizarro, and Javier Macias-Guarasa. 2018. Towards end-to-end acoustic localization using deep learning: From audio signals to source position coordinates. Sensors 18, 10 (2018), 3418.","journal-title":"Sensors"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.5555\/3307050"},{"key":"e_1_3_1_83_2","doi-asserted-by":"crossref","unstructured":"Qing Wang Jun Du Hua-Xin Wu Jia Pan Feng Ma and Chin-Hui Lee. 2023. A four-stage data augmentation approach to ResNet-Conformer based acoustic modeling for sound event localization and detection. arXiv:2101.02919 [cs.SD].","DOI":"10.1109\/TASLP.2023.3256088"},{"key":"e_1_3_1_84_2","article-title":"The USTC-IFLYTEK system for sound event localization and detection of DCASE2020 challenge","author":"Wang Qing","year":"2020","unstructured":"Qing Wang, Huaxin Wu, Zijun Jing, Feng Ma, Yi Fang, Yuxuan Wang, Tairan Chen, Jia Pan, Jun Du, and Chin-Hui Lee. 2020. The USTC-IFLYTEK system for sound event localization and detection of DCASE2020 challenge. Tech. Rep., DCASE2020 Challenge (2020). 
https:\/\/www.semanticscholar.org\/paper\/THE-USTC-IFLYTEK-SYSTEM-FOR-SOUND-EVENT-AND-OF-Wang-Wu\/735990cac7c3791725ac4c846ac61a603409d66b.","journal-title":"Tech. Rep., DCASE2020 Challenge"},{"issue":"1","key":"e_1_3_1_85_2","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1109\/TASLP.2018.2876169","article-title":"Robust speaker localization guided by deep learning-based time-frequency masking","volume":"27","author":"Wang Zhong-Qiu","year":"2018","unstructured":"Zhong-Qiu Wang, Xueliang Zhang, and DeLiang Wang. 2018. Robust speaker localization guided by deep learning-based time-frequency masking. IEEE\/ACM Transactions on Audio, Speech, and Language Processing 27, 1 (2018), 178\u2013188.","journal-title":"IEEE\/ACM Transactions on Audio, Speech, and Language Processing"},{"key":"e_1_3_1_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_3_1_87_2","first-page":"4680","volume-title":"Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Wu Yifan","year":"2021","unstructured":"Yifan Wu, Roshan Ayyalasomayajula, Michael J. Bianco, Dinesh Bharadia, and Peter Gerstoft. 2021. SSLIDE: Sound source localization for indoors based on deep learning. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 4680\u20134684."},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1121\/1.5042222"},{"key":"e_1_3_1_89_2","first-page":"2814","volume-title":"Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Xiao Xiong","year":"2015","unstructured":"Xiong Xiao, Shengkui Zhao, Xionghu Zhong, Douglas L. Jones, Eng Siong Chng, and Haizhou Li. 2015. A learning-based approach to direction of arrival estimation in noisy and reverberant environments. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. 
IEEE, 2814\u20132818."},{"issue":"8","key":"e_1_3_1_90_2","first-page":"1567","article-title":"High-accuracy TDOA-based localization without time synchronization","volume":"24","author":"Xu Bin","year":"2012","unstructured":"Bin Xu, Guodong Sun, Ran Yu, and Zheng Yang. 2012. High-accuracy TDOA-based localization without time synchronization. IEEE Transactions on Parallel and Distributed Systems 24, 8 (2012), 1567\u20131576.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"issue":"1","key":"e_1_3_1_91_2","doi-asserted-by":"crossref","first-page":"37","DOI":"10.20965\/jrm.2017.p0037","article-title":"Sound source localization using deep learning models","volume":"29","author":"Yalta Nelson","year":"2017","unstructured":"Nelson Yalta, Kazuhiro Nakadai, and Tetsuya Ogata. 2017. Sound source localization using deep learning models. Journal of Robotics and Mechatronics 29, 1 (2017), 37\u201348.","journal-title":"Journal of Robotics and Mechatronics"},{"key":"e_1_3_1_92_2","first-page":"651","volume-title":"Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Yasuda Masahiro","year":"2020","unstructured":"Masahiro Yasuda, Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, and Keisuke Imoto. 2020. Sound event localization based on sound intensity vector refined by DNN-based denoising and source separation. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 651\u2013655."},{"key":"e_1_3_1_93_2","first-page":"2927","volume-title":"Proceedings of the 2013 IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Youssef Karim","year":"2013","unstructured":"Karim Youssef, Sylvain Argentieri, and Jean-Luc Zarader. 2013. A learning-based approach to robust binaural sound localization. In Proceedings of the 2013 IEEE\/RSJ International Conference on Intelligent Robots and Systems. 
IEEE, 2927\u20132932."},{"key":"e_1_3_1_94_2","first-page":"2703","volume-title":"Proceedings of the INTERSPEECH","author":"Zhang Wangyou","year":"2019","unstructured":"Wangyou Zhang, Ying Zhou, and Yanmin Qian. 2019. Robust DOA estimation based on convolutional neural network and time-frequency masking. In Proceedings of the INTERSPEECH. 2703\u20132707."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3586996","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3586996","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:33Z","timestamp":1750178253000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3586996"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,19]]},"references-count":93,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,5,31]]}},"alternative-id":["10.1145\/3586996"],"URL":"https:\/\/doi.org\/10.1145\/3586996","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,19]]},"assertion":[{"value":"2022-02-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-12","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-04-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}