{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T19:30:22Z","timestamp":1762543822781,"version":"3.27.0"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T00:00:00Z","timestamp":1681948800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T00:00:00Z","timestamp":1681948800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Voice activity detection remains a significant challenge in the presence of transients since transients are more dominant than speech, though it has achieved satisfactory performance in quasi-stationary noisy environments. This paper studies the differences between speech and transients in nonlinear dynamic characteristics and proposes a new method for accurately detecting speech and transients. Limited by algorithm complexity, previous research has proposed few detectors to model speech and transients based on contextual information and thus failing to detect transient frames accurately. To address this challenge, our study proposes to map features of audio signals to a time series complex network, a kind of graph data, analyzed by the Laplacian and adjacency matrix of graphs, then classified by the support vector machine (SVM) classifier. The proposed algorithm can analyze a more extended speech period, allowing the full utilization of contextual information of preceding and following frames. 
The experimental results show that the performance of this method has obvious superiority over other existing algorithms.<\/jats:p>","DOI":"10.1186\/s13636-023-00282-x","type":"journal-article","created":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T15:58:23Z","timestamp":1682006303000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Voice activity detection in the presence of transient based on graph"],"prefix":"10.1186","volume":"2023","author":[{"given":"Xiao-Yuan","family":"Guo","sequence":"first","affiliation":[]},{"given":"Chun-Xian","family":"Gao","sequence":"additional","affiliation":[]},{"given":"Hui","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,4,20]]},"reference":[{"key":"282_CR1","doi-asserted-by":"publisher","first-page":"942617","DOI":"10.1155\/2009\/942617","volume":"2009","author":"B Schuller","year":"2009","unstructured":"B. Schuller, M. W\u00f6llmer, T. Moosmayr, Recognition of Noisy Speech: A Comparative Survey of Robust Model Architecture and Feature Enhancement. J Audio Speech Music Proc. 2009, 942617 (2009)","journal-title":"J Audio Speech Music Proc."},{"key":"282_CR2","doi-asserted-by":"crossref","unstructured":"K. Veena, D. Mathew, in 2015 International Conference on Power, Instrumentation, Control and Computing (PICC). Speaker identification and verification of noisy speech using multitaper MFCC and Gaussian mixture models (IEEE, 2015), pp. 1-4","DOI":"10.1109\/PICC.2015.7455806"},{"issue":"1","key":"282_CR3","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1109\/TCE.2011.5735502","volume":"57","author":"N Cho","year":"2011","unstructured":"N. Cho, E.-K. Kim, Enhanced voice activity detection using acoustic event detection and classification. IEEE Trans. Consum. Electron. 57(1), 196\u2013202 (2011)","journal-title":"IEEE Trans. Consum. 
Electron."},{"issue":"6","key":"282_CR4","doi-asserted-by":"publisher","first-page":"1965","DOI":"10.1109\/TSP.2006.874403","volume":"54","author":"J-H Chang","year":"2006","unstructured":"J.-H. Chang, N.S. Kim, S.K. Mitra, Voice activity detection based on multiple statistical models. IEEE Trans. Sig. Process. 54(6), 1965\u20131976 (2006)","journal-title":"IEEE Trans. Sig. Process."},{"issue":"1","key":"282_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/97.736233","volume":"6","author":"J Sohn","year":"1999","unstructured":"J. Sohn, N.S. Kim, W. Sung, A statistical model-based voice activity detection. IEEE Sig. Process. Lett. 6(1), 1\u20133 (1999)","journal-title":"IEEE Sig. Process. Lett."},{"issue":"3\u20134","key":"282_CR6","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1016\/j.specom.2003.10.002","volume":"42","author":"J Ram\u00edrez","year":"2004","unstructured":"J. Ram\u00edrez, J.C. Segura, C. Ben\u00edtez, A. De La Torre, A. Rubio, Efficient voice activity detection algorithms using long-term speech information. Speech Commun. 42(3\u20134), 271\u2013287 (2004)","journal-title":"Speech Commun."},{"issue":"6","key":"282_CR7","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","volume":"29","author":"G Hinton","year":"2012","unstructured":"G. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82\u201397 (2012)","journal-title":"IEEE Signal Process. Mag."},{"issue":"4","key":"282_CR8","doi-asserted-by":"publisher","first-page":"697","DOI":"10.1109\/TASL.2012.2229986","volume":"21","author":"X-L Zhang","year":"2013","unstructured":"X.-L. Zhang, J. Wu, Deep belief networks based voice activity detection. IEEE Trans. Audio Speech Lang. Process. 21(4), 697\u2013710 (2013)","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"282_CR9","doi-asserted-by":"publisher","unstructured":"S. 
Thomas, S. Ganapathy, G. Saon, H. Soltau, in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions (IEEE, Florence, Italy, 2014), pp. 2519-2523. https:\/\/doi.org\/10.1109\/ICASSP.2014.6854054","DOI":"10.1109\/ICASSP.2014.6854054"},{"key":"282_CR10","doi-asserted-by":"crossref","unstructured":"R. Tahmasbi, S. Rezaei, A soft voice activity detection using GARCH filter and variance gamma distribution. IEEE Trans. Audio Speech Lang. Process. 15(4), 1129-1134 (2007)","DOI":"10.1109\/TASL.2007.894521"},{"key":"282_CR11","doi-asserted-by":"publisher","unstructured":"A. Ivry, B. Berdugo, I. Cohen, Voice activity detection for transient noisy environment based on diffusion nets. IEEE J. Sel. Top. Signal Process. 13(2), 254-264 (2019). https:\/\/doi.org\/10.1109\/JSTSP.2019.2909472","DOI":"10.1109\/JSTSP.2019.2909472"},{"key":"282_CR12","doi-asserted-by":"crossref","unstructured":"H. Kobayashi, T. Shimamura, in 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 00CH37100), vol. 3. A weighted autocorrelation method for pitch extraction of noisy speech (IEEE, 2000), pp. 1307-1310","DOI":"10.1109\/ICASSP.2000.861818"},{"issue":"12","key":"282_CR13","doi-asserted-by":"publisher","first-page":"2238","DOI":"10.1109\/TASLP.2015.2476762","volume":"23","author":"I-C Yoo","year":"2015","unstructured":"I.-C. Yoo, H. Lim, D. Yook, Formant-based robust voice activity detection. IEEE\/ACM Trans. Audio Speech Lang. Process. 23(12), 2238\u20132245 (2015)","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"282_CR14","doi-asserted-by":"crossref","unstructured":"T. Kristjansson, S. Deligne, P. Olsen, in Proc. Interspeech 2005. Voicing features for robust speech detection (ISCA, 2005)","DOI":"10.21437\/Interspeech.2005-186"},
{"issue":"3","key":"282_CR15","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1109\/LSP.2013.2237903","volume":"20","author":"SO Sadjadi","year":"2013","unstructured":"S.O. Sadjadi, J.H. Hansen, Unsupervised speech activity detection using voicing measures and perceptual spectral flux. IEEE Sig. Process. Lett. 20(3), 197\u2013200 (2013)","journal-title":"IEEE Sig. Process. Lett."},{"issue":"1","key":"282_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1687-4722-2013-21","volume":"2013","author":"Y Ma","year":"2013","unstructured":"Y. Ma, A. Nishihara, Efficient voice activity detection algorithm using long-term spectral flatness measure. EURASIP J. Audio Speech Music Process. 2013(1), 1\u201318 (2013)","journal-title":"EURASIP J. Audio Speech Music Process."},{"key":"282_CR17","doi-asserted-by":"crossref","unstructured":"E. Scheirer, M. Slaney, in 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2. Construction and evaluation of a robust multifeature speech\/music discriminator (IEEE, 1997), pp. 1331-1334","DOI":"10.1109\/ICASSP.1997.596192"},{"issue":"6","key":"282_CR18","doi-asserted-by":"publisher","first-page":"1820","DOI":"10.1016\/j.compeleceng.2012.09.003","volume":"38","author":"D Vlaj","year":"2012","unstructured":"D. Vlaj, Z. Ka\u010di\u010d, M. Kos, Voice activity detection algorithm using nonlinear spectral weights, hangover and hangbefore criteria. Comput. Electr. Eng. 38(6), 1820\u20131836 (2012)","journal-title":"Comput. Electr. Eng."},{"issue":"1","key":"282_CR19","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1109\/TASL.2012.2215593","volume":"21","author":"R Talmon","year":"2012","unstructured":"R. Talmon, I. Cohen, S. Gannot, Single-channel transient interference suppression with diffusion maps. IEEE Trans. Audio Speech Lang. Process. 21(1), 132\u2013144 (2012)","journal-title":"IEEE Trans. Audio Speech Lang. 
Process."},{"issue":"9","key":"282_CR20","doi-asserted-by":"publisher","first-page":"2528","DOI":"10.1109\/TASL.2012.2205243","volume":"20","author":"R Talmon","year":"2012","unstructured":"R. Talmon, I. Cohen, S. Gannot, R.R. Coifman, Supervised graph-based processing for sequential transient interference suppression. IEEE Trans. Audio Speech Lang. Process. 20(9), 2528\u20132538 (2012)","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"issue":"12","key":"282_CR21","doi-asserted-by":"publisher","first-page":"2313","DOI":"10.1109\/TASLP.2016.2566919","volume":"24","author":"D Dov","year":"2016","unstructured":"D. Dov, R. Talmon, I. Cohen, Kernel method for voice activity detection in the presence of transients. IEEE\/ACM Trans. Audio Speech Lang. Process. 24(12), 2313\u20132326 (2016)","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"issue":"6","key":"282_CR22","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1109\/MSP.2020.3018087","volume":"37","author":"M Petrovic","year":"2020","unstructured":"M. Petrovic, R. Liegeois, T.A. Bolton, D. Van De Ville, Community-aware graph signal processing: Modularity defines new ways of processing graph signals. IEEE Sig. Process. Mag. 37(6), 150\u2013159 (2020)","journal-title":"IEEE Sig. Process. Mag."},{"key":"282_CR23","doi-asserted-by":"crossref","unstructured":"E. Pavez, A. Ortega, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Generalized Laplacian precision matrix estimation for graph signal processing (IEEE, 2016), pp. 6350-6354","DOI":"10.1109\/ICASSP.2016.7472899"},{"key":"282_CR24","doi-asserted-by":"crossref","unstructured":"A. Hiruma, K. Yatabe, Y. Oikawa, in 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC). Separating stereo audio mixture having no phase difference by convex clustering and disjointness map (IEEE, 2018), pp. 
266-270","DOI":"10.1109\/IWAENC.2018.8521350"},{"key":"282_CR25","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1016\/j.specom.2020.06.005","volume":"123","author":"X Yan","year":"2020","unstructured":"X. Yan, Z. Yang, T. Wang, H. Guo, An iterative graph spectral subtraction method for speech enhancement. Speech Commun. 123, 35\u201342 (2020)","journal-title":"Speech Commun."},{"issue":"4","key":"282_CR26","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1109\/MCAS.2012.2221521","volume":"12","author":"X Li","year":"2012","unstructured":"X. Li, D. Yang, X. Liu, X.M. Wu, Bridging time series dynamics and complex network theory with application to electrocardiogram analysis. IEEE Circ. Syst. Mag. 12(4), 33\u201346 (2012)","journal-title":"IEEE Circ. Syst. Mag."},{"key":"282_CR27","doi-asserted-by":"crossref","unstructured":"H. Trang, T.H. Loc, H.B.H. Nam, in 2014 International Conference on Advanced Technologies for Communications (ATC 2014). Proposed combination of PCA and MFCC feature extraction in speech recognition system (IEEE, 2014), pp. 697-702","DOI":"10.1109\/ATC.2014.7043477"},{"key":"282_CR28","doi-asserted-by":"publisher","unstructured":"D.R. Hardoon, S. Szedmak, J. Shawe-Taylor, Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16(12), 2639-2664 (2004). https:\/\/doi.org\/10.1162\/0899766042321814","DOI":"10.1162\/0899766042321814"},{"key":"282_CR29","doi-asserted-by":"crossref","unstructured":"P. Xia, L. Zhang, F. Li, Learning similarity with cosine similarity ensemble. Inf. Sci. 307, 39-52 (2015)","DOI":"10.1016\/j.ins.2015.02.024"},{"key":"282_CR30","unstructured":"V.M. Panaretos, Y. Zemel, Statistical aspects of Wasserstein distances. arXiv preprint arXiv:1806.05500 (2018)"},{"key":"282_CR31","doi-asserted-by":"crossref","unstructured":"M. Mesbahi, M. Egerstedt, Graph Theoretic Methods in Multiagent Networks (Princeton University Press, 2010)","DOI":"10.1515\/9781400835355"},
{"key":"282_CR32","doi-asserted-by":"crossref","unstructured":"V. Panayotov, G. Chen, D. Povey, S. Khudanpur, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Librispeech: an ASR corpus based on public domain audio books (IEEE, 2015), pp. 5206-5210. https:\/\/ieeexplore.ieee.org\/document\/7178964","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"282_CR33","doi-asserted-by":"crossref","unstructured":"F. Font, G. Roma, X. Serra, in Proceedings of the 21st ACM International Conference on Multimedia. Freesound technical demo (ACM, 2013), pp. 411-412. Transients source: http:\/\/www.freesound.org\/","DOI":"10.1145\/2502081.2502245"},{"issue":"6","key":"282_CR34","doi-asserted-by":"publisher","first-page":"1261","DOI":"10.1109\/TASL.2013.2248717","volume":"21","author":"S Mousazadeh","year":"2013","unstructured":"S. Mousazadeh, I. Cohen, Voice activity detection in presence of transient noise using spectral clustering. IEEE Trans. Audio Speech Lang. Process. 21(6), 1261\u20131271 (2013)","journal-title":"IEEE Trans. Audio Speech Lang. 
Process."}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-023-00282-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13636-023-00282-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-023-00282-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,18]],"date-time":"2024-10-18T22:22:15Z","timestamp":1729290135000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-023-00282-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,20]]},"references-count":34,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["282"],"URL":"https:\/\/doi.org\/10.1186\/s13636-023-00282-x","relation":{},"ISSN":["1687-4722"],"issn-type":[{"type":"electronic","value":"1687-4722"}],"subject":[],"published":{"date-parts":[[2023,4,20]]},"assertion":[{"value":"23 June 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 April 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"16"}}