{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T22:22:14Z","timestamp":1766269334890,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2022,6,12]],"date-time":"2022-06-12T00:00:00Z","timestamp":1654992000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004663","name":"the Ministry of Science and Technology","doi-asserted-by":"publisher","award":["MOST 108-2221-E-027-064"],"award-info":[{"award-number":["MOST 108-2221-E-027-064"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The train horn sound is an active audible warning signal used for warning commuters and railway employees of the oncoming train(s), assuring a smooth operation and traffic safety, especially at barrier-free crossings. This work studies deep learning-based approaches to develop a system providing the early detection of train arrival based on the recognition of train horn sounds from the traffic soundscape. A custom dataset of train horn sounds, car horn sounds, and traffic noises is developed to conduct experiments and analysis. We propose a novel two-stream end-to-end CNN model (i.e., THD-RawNet), which combines two approaches of feature extraction from raw audio waveforms, for audio classification in train horn detection (THD). Besides a stream with a sequential one-dimensional CNN (1D-CNN) as in existing sound classification works, we propose to utilize multiple 1D-CNN branches to process raw waves in different temporal resolutions to extract an image-like representation for the 2D-CNN classification part. 
Our experiment results and comparative analysis have proved the effectiveness of the proposed two-stream network and the method of combining features extracted in multiple temporal resolutions. The THD-RawNet obtained better accuracies and robustness compared to those of baseline models trained on either raw audio or handcrafted features, in which at the input size of one second the network yielded an accuracy of 95.11% for testing data in normal traffic conditions and remained above a 93% accuracy for the considerably noisy condition of -10 dB SNR. The proposed THD system can be integrated into the smart railway crossing systems, private cars, and self-driving cars to improve railway transit safety.<\/jats:p>","DOI":"10.3390\/s22124453","type":"journal-article","created":{"date-parts":[[2022,6,13]],"date-time":"2022-06-13T02:01:44Z","timestamp":1655085704000},"page":"4453","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["End-to-End Train Horn Detection for Railway Transit Safety"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3197-679X","authenticated-orcid":false,"given":"Van-Thuan","family":"Tran","sequence":"first","affiliation":[{"name":"Department of Electronic Engineering, National Taipei University of Technology, Taipei 10608, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei-Ho","family":"Tsai","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, National Taipei University of Technology, Taipei 10608, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yury","family":"Furletov","sequence":"additional","affiliation":[{"name":"Department of Mathematical Cybernetics and Information Technology, Moscow Technical University of Communications and Informatics, 111024 Moscow, Russia"},{"name":"Department of Automotive Engineering, Moscow Automobile and Road Construction State 
Technical University, 125319 Moscow, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1739-9831","authenticated-orcid":false,"given":"Mikhail","family":"Gorodnichev","sequence":"additional","affiliation":[{"name":"Department of Mathematical Cybernetics and Information Technology, Moscow Technical University of Communications and Informatics, 111024 Moscow, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Santos, J., Hempel, M., and Sharif, H. (2013, January 2\u20135). Sensing Techniques and Detection Methods for Train Approach Detection. Proceedings of the IEEE 78th Vehicular Technology Conference (VTC Fall), Las Vegas, NV, USA.","DOI":"10.1109\/VTCFall.2013.6692407"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Chetty, K., Chen, Q., and Woodbridge, K. (2016, January 2\u20136). Train Monitoring Using GSM-R Based Passive Radar. Proceedings of the 2016 IEEE Radar Conference (RadarConf), Philadelphia, PA, USA.","DOI":"10.1109\/RADAR.2016.7485069"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"64","DOI":"10.21014\/acta_imeko.v5i4.419","article-title":"On the Safety Design of Radar Based Railway Level Crossing Surveillance Systems","volume":"5","author":"Addabbo","year":"2016","journal-title":"Acta IMEKO"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Angrisani, L., Grillo, D., Lo Moriello, R.S., and Filo, G. (2010, January 3\u20136). Automatic Detection of Train Arrival through an Accelerometer. Proceedings of the 2010 IEEE Instrumentation Measurement Technology Conference Proceedings, Austin, TX, USA.","DOI":"10.1109\/IMTC.2010.5488089"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Piczak, K.J. (2015, January 17\u201320). Environmental Sound Classification with Convolutional Neural Networks. 
Proceedings of the 2015 IEEE International Workshop on Machine Learning for Signal Processing, Boston, MA, USA.","DOI":"10.1109\/MLSP.2015.7324337"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"21552","DOI":"10.1038\/s41598-021-01045-4","article-title":"Environmental Sound Classification Using Temporal-Frequency Attention Based Convolutional Neural Network","volume":"11","author":"Mu","year":"2021","journal-title":"Sci. Rep."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1016\/j.neucom.2020.08.069","article-title":"Attention Based Convolutional Recurrent Neural Network for Environmental Sound Classification","volume":"453","author":"Zhang","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/978-981-15-6318-8_20","article-title":"Convolutional Neural Network Based Sound Recognition Methods for Detecting Presence of Amateur Drones in Unauthorized Zones","volume":"Volume 1241","author":"Ganapathi","year":"2020","journal-title":"Communications in Computer and Information Science"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"130327","DOI":"10.1109\/ACCESS.2019.2939495","article-title":"Learning Attentive Representations for Environmental Sound Classification","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Su, Y., Zhang, K., Wang, J., and Madani, K. (2019). Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion. Sensors, 19.","DOI":"10.3390\/s19071733"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Qiao, T., Zhang, S., Cao, S., and Xu, S. (2021). High Accurate Environmental Sound Classification: Sub-Spectrogram Segmentation versus Temporal-Frequency Attention Mechanism. 
Sensors, 21.","DOI":"10.3390\/s21165500"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2048","DOI":"10.1016\/j.procs.2017.08.250","article-title":"Classifying Environmental Sounds Using Image Recognition Networks","volume":"112","author":"Boddapati","year":"2017","journal-title":"Procedia Comput. Sci."},{"key":"ref_13","first-page":"1097","article-title":"ImageNet Classification with Deep Convolutional Neural Networks","volume":"Volume 25","author":"Pereira","year":"2012","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going Deeper with Convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Guzhov, A., Raue, F., Hees, J., and Dengel, A.R. (2021, January 10\u201315). ESResNet: Environmental Sound Classification Based on Visual Domain Models. Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy.","DOI":"10.1109\/ICPR48806.2021.9413035"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1016\/j.eswa.2019.06.040","article-title":"End-to-End Environmental Sound Classification Using a 1D Convolutional Neural Network","volume":"136","author":"Abdoli","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_18","unstructured":"Aytar, Y., Vondrick, C., and Torralba, A. (2016, January 5\u201310). 
SoundNet: Learning Sound Representations from Unlabeled Video. Proceedings of the 30th International Conference on Neural Information Processing Systems NIPS\u201916, Barcelona, Spain."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tokozume, Y., and Harada, T. (2017, January 5\u20139). Learning Environmental Sounds with End-to-End Convolutional Neural Network. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952651"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Takahashi, N., Gygli, M., Pfister, B., and Van Gool, L. (2016). Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition. Interspeech 2016, 2982\u20132986.","DOI":"10.21437\/Interspeech.2016-805"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1109\/LSP.2017.2657381","article-title":"Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification","volume":"24","author":"Salamon","year":"2017","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bear, H.L., Morfi, V., and Benetos, E. (2021). An Evaluation of Data Augmentation Methods for Sound Scene Geotagging. Interspeech 2021, 581\u2013585.","DOI":"10.21437\/Interspeech.2021-1837"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Park, D., Chan, W., Zhang, Y., Chiu, C.-C., Zoph, B., Cubuk, E., and Le, Q. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. Interspeech 2019, 2613\u20132617.","DOI":"10.21437\/Interspeech.2019-2680"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Piczak, K.J. (2015). ESC: Dataset for Environmental Sound Classification. 
Proceeding of the 2015 ACM Multimedia Conference, Brisbane, Australia, 26\u201330 October 2015, Association for Computing Machinery, Inc.","DOI":"10.1145\/2733373.2806390"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Mcfee, B., Raffel, C., Liang, D., Ellis, D.P.W., Mcvicar, M., Battenberg, E., and Nieto, O. (2015, January 6\u201312). Librosa: Audio and Music Signal Analysis in Python. Proceedings of the 14th Python in Science Conference, Austin, TX, USA.","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"ref_26","unstructured":"Kingma, D.P., and Ba, J.L. (2015, January 7\u20139). Adam: A Method for Stochastic Optimization. Proceedings of the International Conference for Learning Representations, San Diego, CA, USA."},{"key":"ref_27","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 7\u20139). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the 32nd International Conference on International Conference on Machine Learning, Lille, France."},{"key":"ref_28","first-page":"1929","article-title":"Dropout: A Simple Way to Prevent Neural Networks from Overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lezhenin, I., Bogach, N., and Pyshkin, E. (2019, January 1\u20134). Urban Sound Classification Using Long Short-Term Memory Neural Network. 
Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, FedCSIS 2019, Leipzig, Germany.","DOI":"10.15439\/2019F185"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/12\/4453\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:28:29Z","timestamp":1760138909000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/12\/4453"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,12]]},"references-count":29,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["s22124453"],"URL":"https:\/\/doi.org\/10.3390\/s22124453","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,6,12]]}}}