{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T19:47:37Z","timestamp":1776887257891,"version":"3.51.2"},"reference-count":42,"publisher":"IOP Publishing","issue":"4","license":[{"start":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T00:00:00Z","timestamp":1734652800000},"content-version":"vor","delay-in-days":19,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T00:00:00Z","timestamp":1734652800000},"content-version":"tdm","delay-in-days":19,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/501100000780","name":"European Union","doi-asserted-by":"crossref","award":["7202070 \u201cHBP\u201d"],"award-info":[{"award-number":["7202070 \u201cHBP\u201d"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003246","name":"Nederlandse Organisatie voor Wetenschappelijk Onderzoek","doi-asserted-by":"crossref","award":["NWA.1292.19.298"],"award-info":[{"award-number":["NWA.1292.19.298"]}],"id":[{"id":"10.13039\/501100003246","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Neuromorph. Comput. Eng."],"published-print":{"date-parts":[[2024,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Speech enhancement improves communication in noisy environments, affecting areas such as automatic speech recognition (ASR), hearing aids, and telecommunications. With these domains typically being power-constrained and event-based, and often requiring low latency, neuromorphic algorithms\u2013particularly spiking neural networks (SNNs)\u2013hold significant potential. 
However, current effective SNN solutions require a long temporal window to calculate Short Time Fourier Transforms (STFTs) and thus impose substantial latency, typically around 32\u2009ms, which is too long for applications such as hearing aids. Inspired by the Dual-Path Recurrent Neural Network (DPRNN) in deep neural networks (DNNs), we develop a two-phase time-domain streaming SNN framework for speech enhancement, named <jats:italic>Dual-Path Spiking Neural Network (DPSNN)<\/jats:italic>. DPSNNs achieve low latency by replacing the STFT and inverse STFT (iSTFT) in traditional frequency-domain models with a learned convolutional encoder and decoder. In the DPSNN, the first phase uses Spiking Convolutional Neural Networks (SCNNs) to capture temporal contextual information, while the second phase uses Spiking Recurrent Neural Networks (SRNNs) to focus on frequency-related features. In addition, threshold-based activation suppression, along with <jats:italic>L<\/jats:italic>\n                  <jats:sub>1<\/jats:sub> regularization loss, is applied to specific non-spiking layers in DPSNNs to further improve their energy efficiency. 
Evaluating on the Voice Cloning Toolkit (VCTK) Corpus and Intel N-DNS Challenge dataset, our approach demonstrates excellent performance in speech objective metrics, along with the very low latency (approximately 5\u2009ms) required for applications like hearing aids.<\/jats:p>","DOI":"10.1088\/2634-4386\/ad93f9","type":"journal-article","created":{"date-parts":[[2024,11,18]],"date-time":"2024-11-18T22:52:15Z","timestamp":1731970335000},"page":"044008","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["DPSNN: spiking neural network for low-latency streaming speech enhancement"],"prefix":"10.1088","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8967-8760","authenticated-orcid":true,"given":"Tao","family":"Sun","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7866-278X","authenticated-orcid":true,"given":"Sander","family":"Boht\u00e9","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"266","published-online":{"date-parts":[[2024,12,20]]},"reference":[{"key":"ncead93f9bib1","doi-asserted-by":"publisher","first-page":"1702","DOI":"10.1109\/TASLP.2018.2842159","article-title":"Supervised speech separation based on deep learning: an overview","volume":"26","author":"Wang","year":"2018","journal-title":"IEEE\/ACM Trans. Audio, Speech Lang. Process."},{"key":"ncead93f9bib2","doi-asserted-by":"publisher","first-page":"1381","DOI":"10.1109\/TASL.2013.2250961","article-title":"Towards scaling up classification-based speech separation","volume":"21","author":"Wang","year":"2013","journal-title":"IEEE Trans. Audio, Speech Lang. 
Process."},{"key":"ncead93f9bib3","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1109\/TASLP.2014.2364452","article-title":"A regression approach to speech enhancement based on deep neural networks","volume":"23","author":"Xu","year":"2014","journal-title":"IEEE\/ACM Trans. Audio, Speech Lang. Process."},{"key":"ncead93f9bib4","first-page":"pp 1136","article-title":"A new framework for supervised speech enhancement in the time domain","author":"Pandey","year":"2018"},{"key":"ncead93f9bib5","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1109\/MSP.2014.2369251","article-title":"Phase processing for single-channel speech enhancement: History and recent advances","volume":"32","author":"Gerkmann","year":"2015","journal-title":"IEEE Signal Process. Mag."},{"key":"ncead93f9bib6","first-page":"pp 1","article-title":"Complex spectrogram enhancement by convolutional neural network with multi-metrics learning","author":"Fu","year":"2017"},{"key":"ncead93f9bib7","first-page":"pp 696","article-title":"TasNet: time-domain audio separation network for real-time, single-channel speech separation","author":"Luo","year":"2018"},{"key":"ncead93f9bib8","doi-asserted-by":"publisher","first-page":"1256","DOI":"10.1109\/TASLP.2019.2915167","article-title":"Conv-TasNet: surpassing ideal time-frequency magnitude masking for speech separation","volume":"27","author":"Luo","year":"2019","journal-title":"IEEE\/ACM Trans. Audio, Speech Lang. 
Process."},{"key":"ncead93f9bib9","first-page":"pp 46","article-title":"Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation","author":"Luo","year":"2020"},{"key":"ncead93f9bib10","doi-asserted-by":"publisher","first-page":"1380","DOI":"10.3390\/s23031380","article-title":"A survey on low-latency DNN-based speech enhancement","volume":"23","author":"Drgas","year":"2023","journal-title":"Sensors"},{"key":"ncead93f9bib11","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1109\/TASLP.2022.3224285","article-title":"STFT-domain neural speech enhancement with very low algorithmic latency","volume":"31","author":"Wang","year":"2022","journal-title":"IEEE\/ACM Trans. Audio, Speech Lang. Process."},{"key":"ncead93f9bib12","doi-asserted-by":"publisher","DOI":"10.1088\/2634-4386\/ace737","article-title":"The Intel neuromorphic DNS challenge","volume":"3","author":"Timcheck","year":"2023","journal-title":"Neuromorph. Comput. Eng."},{"key":"ncead93f9bib13","first-page":"pp 301","article-title":"Deep neural network based low-latency speech separation with asymmetric analysis-synthesis window pair","author":"Wang","year":"2021"},{"key":"ncead93f9bib14","first-page":"pp 616","article-title":"Trainable adaptive window switching for speech enhancement","author":"Koizumi","year":"2019"},{"key":"ncead93f9bib15","article-title":"Recommendation g. 
114, one-way transmission time","author":"ITU T","year":"2000"},{"key":"ncead93f9bib16","article-title":"Clarity challenge: speech enhancement for hearing aids","author":"Team Clarity","year":"2024"},{"key":"ncead93f9bib17","first-page":"pp 76","article-title":"Low-latency deep clustering for speech separation","author":"Wang","year":"2019"},{"key":"ncead93f9bib18","article-title":"The Intel neuromorphic DNS challenge","year":"2024"},{"key":"ncead93f9bib19","article-title":"Spiking structured state space model for monaural speech enhancement","author":"Du","year":"2023"},{"key":"ncead93f9bib20","first-page":"pp 111","article-title":"Single channel speech enhancement using U-Net spiking neural networks","author":"Riahi","year":"2023"},{"key":"ncead93f9bib21","doi-asserted-by":"publisher","first-page":"905","DOI":"10.1038\/s42256-021-00397-w","article-title":"Accurate and efficient time-domain classification with adaptive spiking recurrent neural networks","volume":"3","author":"Yin","year":"2021","journal-title":"Nat. Mach. 
Intell."},{"key":"ncead93f9bib22","first-page":"pp 4554","article-title":"STAR: sparse thresholded activation under partial-regularization for activation sparsity exploration","author":"Zhu","year":"2023"},{"key":"ncead93f9bib23","article-title":"Noisy speech database for training speech enhancement algorithms and tts models","author":"Valentini-Botinhao","year":"2017"},{"key":"ncead93f9bib24","author":"Sun","year":"2022"},{"key":"ncead93f9bib25","first-page":"pp 6875","article-title":"TCNN: temporal convolutional neural network for real-time speech enhancement in the time domain","author":"Pandey","year":"2019"},{"key":"ncead93f9bib26","first-page":"pp 006","article-title":"Raw waveform-based speech enhancement by fully convolutional networks","author":"Fu","year":"2017"},{"key":"ncead93f9bib27","first-page":"pp 254","article-title":"Dilated FCN: listening longer to hear better","author":"Gong","year":"2019"},{"key":"ncead93f9bib28","article-title":"Improved speech enhancement with the Wave-U-Net","author":"Macartney","year":"2018"},{"key":"ncead93f9bib29","article-title":"Wave-U-Net: a multi-scale neural network for end-to-end audio source separation","author":"Stoller","year":"2018"},{"key":"ncead93f9bib30","first-page":"pp 1524","article-title":"When audio denoising meets spiking neural network","author":"Hao","year":"2024"},{"key":"ncead93f9bib31","article-title":"Efficiently modeling long sequences with structured state spaces","author":"Gu","year":"2022"},{"key":"ncead93f9bib32","first-page":"pp 234","article-title":"U-Net: convolutional networks for biomedical image segmentation","author":"Ronneberger","year":"2015"},{"key":"ncead93f9bib33","first-page":"pp 992","article-title":"Boosting the intelligibility of waveform speech enhancement networks through self-supervised 
representations","author":"Sun","year":"2021"},{"key":"ncead93f9bib34","author":"Gerstner","year":"2002"},{"key":"ncead93f9bib35","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1109\/MSP.2019.2931595","article-title":"Surrogate gradient learning in spiking neural networks: bringing the power of gradient-based optimization to spiking neural networks","volume":"36","author":"Neftci","year":"2019","journal-title":"IEEE Signal Process. Mag."},{"key":"ncead93f9bib36","first-page":"pp 2661","article-title":"Incorporating learnable membrane time constant to enhance learning of spiking neural networks","author":"Fang","year":"2021"},{"key":"ncead93f9bib37","first-page":"pp 1","article-title":"Effective and efficient computation with multiple-timescale spiking recurrent neural networks","author":"Yin","year":"2020"},{"key":"ncead93f9bib38","article-title":"The diverse environments multi-channel acoustic noise database (DEMAND): a database of multichannel environmental noise recordings","volume":"vol 19","author":"Thiemann","year":"2013"},{"key":"ncead93f9bib39","first-page":"pp 626","article-title":"SDR - half-baked or well done?","author":"Le Roux","year":"2019"},{"key":"ncead93f9bib40","first-page":"pp 749","article-title":"Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs","volume":"vol 2","author":"Rix","year":"2001"},{"key":"ncead93f9bib41","first-page":"pp 886","article-title":"DNSMOS P. 835: a non-intrusive perceptual objective speech quality metric to evaluate noise suppressors","author":"Reddy","year":"2022"},{"key":"ncead93f9bib42","doi-asserted-by":"publisher","first-page":"2125","DOI":"10.1109\/TASL.2011.2114881","article-title":"An algorithm for intelligibility prediction of time\u2013frequency weighted noisy speech","volume":"19","author":"Taal","year":"2011","journal-title":"IEEE Trans. 
Audio, Speech Lang Process."}],"container-title":["Neuromorphic Computing and Engineering"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/ad93f9","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/ad93f9\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/ad93f9","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/ad93f9\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/ad93f9\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/ad93f9\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/ad93f9\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/ad93f9\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T11:59:43Z","timestamp":1734695983000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/ad93f9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,1]]},"references-count":42,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2024,12,20]]},"published-print":{"date-parts":[[2024,12,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2634-4386\/ad93f9
","relation":{},"ISSN":["2634-4386"],"issn-type":[{"value":"2634-4386","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,1]]},"assertion":[{"value":"DPSNN: spiking neural network for low-latency streaming speech enhancement","name":"article_title","label":"Article Title"},{"value":"Neuromorphic Computing and Engineering","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2024 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-08-14","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2024-11-18","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2024-12-20","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}