{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T03:51:22Z","timestamp":1775274682571,"version":"3.50.1"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T00:00:00Z","timestamp":1715817600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T00:00:00Z","timestamp":1715817600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Basic Scientific research project of colleges and universities of Liaoning Province Education Department","award":["LJKZ0338"],"award-info":[{"award-number":["LJKZ0338"]}]},{"name":"Guangdong Province Science and technology innovation strategy Special City and county science and technology innovation support project","award":["STKJ2023071"],"award-info":[{"award-number":["STKJ2023071"]}]},{"name":"Science and Technology Program of Huludao City","award":["2023JH(1)4\/02b"],"award-info":[{"award-number":["2023JH(1)4\/02b"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Sound event detection involves identifying sound categories in audio and determining when they start and end. However, in real-life situations, sound events are usually not isolated. When one sound event occurs, there are often other related sound events that take place as co-occurrences or successive occurrences. The timing relationship of sound events can reflect their characteristics. Therefore, this paper proposes a sound event detection method for traffic scenes based on a graph convolutional network, which considers this timing relationship as a form of multimodal information. The proposed method involves using the acoustic event window method to obtain co-occurrences or successive occurrences of relationship information in the sound signal while filtering out possible noise relationship information. This information is then represented as a graphical structure. Next, the graph convolutional neural network is improved to balance relationship weights between neighbors and itself and to avoid excessive smoothing. It is used to learn the relationship information in the graph structure. Finally, the convolutional recurrent neural network is used to learn the acoustic feature information of sound events, and the relationship information of sound events is obtained by multi-modal fusion to enhance the performance of sound event detection. The experimental results show that using multi-modal information with the proposed method can effectively improve the performance of the model and enhance the perception ability of smart cars in their surrounding environment while driving.<\/jats:p>","DOI":"10.1007\/s40747-024-01463-7","type":"journal-article","created":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T08:02:31Z","timestamp":1715846551000},"page":"5653-5668","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Sound event detection in traffic scenes based on graph convolutional network to obtain multi-modal information"],"prefix":"10.1007","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7843-1335","authenticated-orcid":false,"given":"Yanji","family":"Jiang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0003-4280-6774","authenticated-orcid":false,"given":"Dingxu","family":"Guo","sequence":"additional","affiliation":[]},{"given":"Lan","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Haitao","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Hao","family":"Dong","sequence":"additional","affiliation":[]},{"given":"Youli","family":"Qiu","sequence":"additional","affiliation":[]},{"given":"Huiwen","family":"Zou","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,5,16]]},"reference":[{"key":"1463_CR1","unstructured":"Chen Y, Zhang Y, Duan Z (2017) DCASE2017 sound event detection using convolutional neural network. Detection and classification of acoustic scenes and events"},{"key":"1463_CR2","unstructured":"Zhou J (2017) Sound event detection in multichannel audio LSTM network. Detection and classification of acoustic scenes and events"},{"issue":"6","key":"1463_CR3","doi-asserted-by":"publisher","first-page":"1291","DOI":"10.1109\/TASLP.2017.2690575","volume":"25","author":"E Cak\u0131r","year":"2017","unstructured":"Cak\u0131r E, Parascandolo G, Heittola T, Huttunen H, Virtanen T (2017) Convolutional recurrent neural networks for polyphonic sound event detection. IEEE\/ACM Trans Audio Speech Lang Process 25(6):1291\u20131303. https:\/\/doi.org\/10.1109\/TASLP.2017.2690575","journal-title":"IEEE\/ACM Trans Audio Speech Lang Process"},{"key":"1463_CR4","unstructured":"Lu R, Duan Z (2017) Bidirectional GRU for sound event detection. Detection and classification of acoustic scenes and events, pp 1\u20133"},{"key":"1463_CR5","doi-asserted-by":"publisher","unstructured":"Xia W, Koishida K (2019) Sound event detection in multichannel audio using convolutional time-frequency-channel squeeze and excitation. https:\/\/doi.org\/10.48550\/arXiv.1908.01399","DOI":"10.48550\/arXiv.1908.01399"},{"key":"1463_CR6","doi-asserted-by":"publisher","unstructured":"Watcharasupat KN, Nguyen TNT, Nguyen NK, Lee ZJ, Jones DL, Gan WS (2021) Improving polyphonic sound event detection on multichannel recordings with the S\u00f8rensen-dice coefficient loss and transfer learning. https:\/\/doi.org\/10.48550\/arXiv.2107.10471","DOI":"10.48550\/arXiv.2107.10471"},{"key":"1463_CR7","doi-asserted-by":"publisher","unstructured":"Wang X, Zhang X, Zi Y, Xiong S (2022) A frame loss of multiple instance learning for weakly supervised sound event detection. In: ICASSP 2022\u20132022 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 331\u2013335. https:\/\/doi.org\/10.1109\/ICASSP43922.2022.9746435","DOI":"10.1109\/ICASSP43922.2022.9746435"},{"key":"1463_CR8","doi-asserted-by":"publisher","unstructured":"Feroze K, Maud AR (2018) Sound event detection in real life audio using perceptual linear predictive feature with neural network. In: 2018 15th international Bhurban conference on applied sciences and technology (IBCAST). IEEE, pp 377\u2013382. https:\/\/doi.org\/10.1109\/IBCAST.2018.8312252","DOI":"10.1109\/IBCAST.2018.8312252"},{"key":"1463_CR9","doi-asserted-by":"publisher","unstructured":"Adavanne S, Virtanen T (2017) A report on sound event detection with different binaural features. https:\/\/doi.org\/10.48550\/arXiv.1710.02997","DOI":"10.48550\/arXiv.1710.02997"},{"key":"1463_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2023.119903","volume":"657","author":"Q Ke","year":"2024","unstructured":"Ke Q, Jing X, Wo\u017aniak M, Xu S, Liang Y, Zheng J (2024) APGVAE: adaptive disentangled representation learning with the graph-based structure information. Inf Sci 657:119903. https:\/\/doi.org\/10.1016\/j.ins.2023.119903","journal-title":"Inf Sci"},{"issue":"2","key":"1463_CR11","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1587\/transinf.2020EDP7036","volume":"104","author":"N Tonami","year":"2021","unstructured":"Tonami N, Imoto K, Yamanishi R, Yamashita Y (2021) Joint analysis of sound events and acoustic scenes using multitask learning. IEICE Trans Inf Syst 104(2):294\u2013301. https:\/\/doi.org\/10.1587\/transinf.2020EDP7036","journal-title":"IEICE Trans Inf Syst"},{"key":"1463_CR12","doi-asserted-by":"publisher","unstructured":"Komatsu T, Watanabe S, Miyazaki K, Hayashi T (2022) Acoustic event detection with classifier chains. arXiv:2202.08470. https:\/\/doi.org\/10.21437\/Interspeech.2021-2218","DOI":"10.21437\/Interspeech.2021-2218"},{"key":"1463_CR13","doi-asserted-by":"publisher","first-page":"1560","DOI":"10.1109\/LSP.2020.3019702","volume":"27","author":"H Wang","year":"2020","unstructured":"Wang H, Zou Y, Chong D, Wang W (2020) Modeling label dependencies for audio tagging with graph convolutional network. IEEE Signal Process Lett 27:1560\u20131564. https:\/\/doi.org\/10.1109\/LSP.2020.3019702","journal-title":"IEEE Signal Process Lett"},{"key":"1463_CR14","doi-asserted-by":"publisher","unstructured":"Sun Y, Ghaffarzadegan S (2020) An ontology-aware framework for audio event classification. In: ICASSP 2020\u20132020 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 321\u2013325. https:\/\/doi.org\/10.1109\/ICASSP40776.2020.9053389","DOI":"10.1109\/ICASSP40776.2020.9053389"},{"key":"1463_CR15","doi-asserted-by":"publisher","unstructured":"Imoto K, Kyochi S (2019) Sound event detection using graph Laplacian regularization based on event co-occurrence. In: ICASSP 2019\u20132019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 1\u20135. https:\/\/doi.org\/10.1109\/ICASSP.2019.8683708","DOI":"10.1109\/ICASSP.2019.8683708"},{"key":"1463_CR16","doi-asserted-by":"publisher","unstructured":"Nt H, Maehara T (2019) Revisiting graph neural networks: all we have is low-pass filters. https:\/\/doi.org\/10.48550\/arXiv.1905.09550","DOI":"10.48550\/arXiv.1905.09550"},{"key":"1463_CR17","doi-asserted-by":"publisher","unstructured":"Chen ZM, Wei XS, Wang P, Guo Y (2019) Multi-label image recognition with graph convolutional networks. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5177\u20135186. https:\/\/doi.org\/10.1109\/CVPR.2019.00532","DOI":"10.1109\/CVPR.2019.00532"},{"key":"1463_CR18","doi-asserted-by":"publisher","unstructured":"Luan S, Hua C, Lu Q, Zhu J, Zhao M, Zhang S, Precup D (2022) Revisiting heterophily for graph neural networks. Advances in neural information processing systems, vol 35, pp 1362\u20131375. https:\/\/doi.org\/10.48550\/arXiv.2210.07606","DOI":"10.48550\/arXiv.2210.07606"},{"key":"1463_CR19","doi-asserted-by":"publisher","unstructured":"Luan S, Hua C, Xu M, Lu Q, Zhu J, Chang X W, Precup D (2024) When do graph neural networks help with node classification? Investigating the homophily principle on node distinguishability. Advances in Neural Information Processing Systems, vol 36. https:\/\/doi.org\/10.48550\/arXiv.2304.14274","DOI":"10.48550\/arXiv.2304.14274"},{"key":"1463_CR20","doi-asserted-by":"publisher","unstructured":"Luan S, Zhao M, Hua C, Chang X W, Precup D (2020) Complete the missing half: augmenting aggregation filtering with diversification for graph convolutional networks. https:\/\/doi.org\/10.48550\/arXiv.2008.08844","DOI":"10.48550\/arXiv.2008.08844"},{"key":"1463_CR21","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.109616","volume":"254","author":"W Dong","year":"2022","unstructured":"Dong W, Wu J, Zhang X, Bai Z, Wang P, Wo\u017aniak M (2022) Improving performance and efficiency of graph neural networks by injective aggregation. Knowl Based Syst 254:109616. https:\/\/doi.org\/10.1016\/j.knosys.2022.109616","journal-title":"Knowl Based Syst"},{"key":"1463_CR22","doi-asserted-by":"publisher","unstructured":"Mesaros A, Heittola T, Virtanen T (2016) TUT database for acoustic scene classification and sound event detection. In: 2016 24th European signal processing conference (EUSIPCO). IEEE, pp 1128\u20131132. https:\/\/doi.org\/10.1109\/EUSIPCO.2016.7760424","DOI":"10.1109\/EUSIPCO.2016.7760424"},{"key":"1463_CR23","unstructured":"Mesaros A, Heittola T, Diment A, Elizalde B, Shah A, Vincent, E, Virtanen T (2017) DCASE 2017 challenge setup: tasks, datasets and baseline system. In: DCASE 2017-workshop on detection and classification of acoustic scenes and events. http:\/\/urn.fi\/URN:ISBN:978-952-15-4042-4"},{"issue":"6","key":"1463_CR24","doi-asserted-by":"publisher","first-page":"162","DOI":"10.3390\/app6060162","volume":"6","author":"A Mesaros","year":"2016","unstructured":"Mesaros A, Heittola T, Virtanen T (2016) Metrics for polyphonic sound event detection. Appl Sci 6(6):162. https:\/\/doi.org\/10.3390\/app6060162","journal-title":"Appl Sci"},{"issue":"7","key":"1463_CR25","doi-asserted-by":"publisher","first-page":"3293","DOI":"10.3390\/app12073293","volume":"12","author":"S Venkatesh","year":"2022","unstructured":"Venkatesh S, Moffat D, Miranda ER (2022) You only hear once: a yolo-like algorithm for audio segmentation and sound event detection. Appl Sci 12(7):3293. https:\/\/doi.org\/10.3390\/app12073293","journal-title":"Appl Sci"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01463-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01463-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01463-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T17:26:18Z","timestamp":1721237178000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01463-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,16]]},"references-count":25,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["1463"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01463-7","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,16]]},"assertion":[{"value":"31 January 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors assert that there are no conflicts of interest in relation to the publication of this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}