{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T11:08:36Z","timestamp":1767006516797,"version":"build-2065373602"},"reference-count":36,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,2,7]],"date-time":"2019-02-07T00:00:00Z","timestamp":1549497600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1833115"],"award-info":[{"award-number":["U1833115"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In order to obtain real-time controlling dynamics in air traffic system, a framework is proposed to introduce and process air traffic control (ATC) speech via radiotelephony communication. An automatic speech recognition (ASR) and controlling instruction understanding (CIU)-based pipeline is designed to convert the ATC speech into ATC related elements, i.e., controlling intent and parameters. A correction procedure is also proposed to improve the reliability of the information obtained by the proposed framework. In the ASR model, acoustic model (AM), pronunciation model (PM), and phoneme- and word-based language model (LM) are proposed to unify multilingual ASR into one model. In this work, based on their tasks, the AM and PM are defined as speech recognition and machine translation problems respectively. Two-dimensional convolution and average-pooling layers are designed to solve special challenges of ASR in ATC. An encoder\u2013decoder architecture-based neural network is proposed to translate phoneme labels into word labels, which achieves the purpose of ASR. In the CIU model, a recurrent neural network-based joint model is proposed to detect the controlling intent and label the controlling parameters, in which the two tasks are solved in one network to enhance the performance with each other based on ATC communication rules. The ATC speech is now converted into ATC related elements by the proposed ASR and CIU model. To further improve the accuracy of the sensing framework, a correction procedure is proposed to revise minor mistakes in ASR decoding results based on the flight information, such as flight plan, ADS-B. The proposed models are trained using real operating data and applied to a civil aviation airport in China to evaluate their performance. Experimental results show that the proposed framework can obtain real-time controlling dynamics with high performance, only 4% word-error rate. Meanwhile, the decoding efficiency can also meet the requirement of real-time applications, i.e., an average 0.147 real time factor. With the proposed framework and obtained traffic dynamics, current ATC applications can be accomplished with higher accuracy. In addition, the proposed ASR pipeline has high reusability, which allows us to apply it to other controlling scenes and languages with minor changes.<\/jats:p>","DOI":"10.3390\/s19030679","type":"journal-article","created":{"date-parts":[[2019,2,7]],"date-time":"2019-02-07T11:50:33Z","timestamp":1549540233000},"page":"679","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["Real-time Controlling Dynamics Sensing in Air Traffic System"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7194-5023","authenticated-orcid":false,"given":"Yi","family":"Lin","sequence":"first","affiliation":[{"name":"National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China"},{"name":"National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University, Chengdu 610065, China"}]},{"given":"Xianlong","family":"Tan","sequence":"additional","affiliation":[{"name":"Southwest Air Traffic Management Bureau, Civil Aviation Administration of China, Chengdu 610000, China"}]},{"given":"Bo","family":"Yang","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China"},{"name":"National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University, Chengdu 610065, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4550-2023","authenticated-orcid":false,"given":"Kai","family":"Yang","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China"},{"name":"National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University, Chengdu 610065, China"}]},{"given":"Jianwei","family":"Zhang","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China"},{"name":"National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University, Chengdu 610065, China"}]},{"given":"Jing","family":"Yu","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China"},{"name":"National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University, Chengdu 610065, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,2,7]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"Bergner, J., and Hassa, O. (2012). Air Traffic Control. Information Ergonomics, Springer.","key":"ref_1","DOI":"10.1007\/978-3-642-25841-1_7"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1016\/j.jairtraman.2012.11.010","article-title":"An analysis of air traffic controller-pilot miscommunication in the NextGen environment","volume":"27","author":"Skaltsas","year":"2013","journal-title":"J. Air Transp. Manag."},{"doi-asserted-by":"crossref","unstructured":"Helmke, H., Ohneiser, O., Muhlhausen, T., and Wies, M. (2016, January 25\u201329). Reducing controller workload with automatic speech recognition. Proceedings of the 2016 IEEE\/AIAA 35th Digital Avionics Systems Conference (DASC), Sacramento, CA, USA.","key":"ref_3","DOI":"10.1109\/DASC.2016.7778024"},{"unstructured":"Jagan Mohan, B., and Ramesh Babu, N. (2014, January 9\u201311). Speech recognition using MFCC and DTW. Proceedings of the 2014 International Conference on Advances in Electrical Engineering (ICAEE), Vellore, India.","key":"ref_4"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1561\/2000000004","article-title":"The Application of Hidden Markov Models in Speech Recognition","volume":"1","author":"Gales","year":"2007","journal-title":"Found. Trends\u00ae Signal Process."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/TASL.2011.2134090","article-title":"Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition","volume":"20","author":"Dahl","year":"2012","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"doi-asserted-by":"crossref","unstructured":"Graves, A., Fern\u00e1ndez, S., Gomez, F., and Schmidhuber, J. (2006, January 25\u201329). Connectionist temporal classification. Proceedings of the 23rd International Conference on Machine Learning-ICML \u201906, Pittsburgh, PA, USA.","key":"ref_7","DOI":"10.1145\/1143844.1143891"},{"unstructured":"Hannun, A.Y., Maas, A.L., Jurafsky, D., and Ng, A.Y. (arXiv, 2014). First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs, arXiv.","key":"ref_8"},{"doi-asserted-by":"crossref","unstructured":"Sainath, T.N., Vinyals, O., Senior, A., and Sak, H. (2015, January 19\u201324). Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, Australia.","key":"ref_9","DOI":"10.1109\/ICASSP.2015.7178838"},{"doi-asserted-by":"crossref","unstructured":"Deng, L., Abdel-Hamid, O., and Yu, D. (2013, January 26\u201331). A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","key":"ref_10","DOI":"10.1109\/ICASSP.2013.6638952"},{"key":"ref_11","first-page":"173","article-title":"Deep Speech 2: End-to-End Speech Recognition in English and Mandarin","volume":"48","author":"Amodei","year":"2016","journal-title":"Int. Conf. Mach. Learn."},{"doi-asserted-by":"crossref","unstructured":"Zhang, Y., Chan, W., and Jaitly, N. (2017, January 5\u20139). Very deep convolutional networks for end-to-end speech recognition. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","key":"ref_12","DOI":"10.1109\/ICASSP.2017.7953077"},{"key":"ref_13","first-page":"1916","article-title":"Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control","volume":"9","author":"Nguyen","year":"2015","journal-title":"Int. J. Comput. Electr. Autom. Control Inf. Eng."},{"unstructured":"Wang, D., and Zhang, X. (2015). THCHS-30: A Free Chinese Speech Corpus. arXiv.","key":"ref_14"},{"unstructured":"(2019, February 02). Open Speech and Language Resources. Available online: http:\/\/www.openslr.org\/7\/.","key":"ref_15"},{"key":"ref_16","first-page":"765","article-title":"Perceptual evaluation of speech quality (PESQ)-The new ITU standard for objective measurement of perceived speech quality, Part II\u2013Psychoacoustic model","volume":"50","author":"Beerends","year":"2002","journal-title":"J. Audio Eng. Soc."},{"unstructured":"ICAO (2010). Manual on the Implementation of ICAO Language Proficiency Requirements, International Civil Aviation Organization.","key":"ref_17"},{"doi-asserted-by":"crossref","unstructured":"Kopald, H.D., Chanen, A., Chen, S., Smith, E.C., and Tarakan, R.M. (2013, January 5\u201310). Applying automatic speech recognition technology to Air Traffic Management. Proceedings of the 2013 IEEE\/AIAA 32nd Digital Avionics Systems Conference (DASC), East Syracuse, NY, USA.","key":"ref_18","DOI":"10.1109\/DASC.2013.6719700"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1016\/j.ast.2011.05.002","article-title":"A speech interface for air traffic control terminals","volume":"21","author":"Ferreiros","year":"2012","journal-title":"Aerosp. Sci. Technol."},{"doi-asserted-by":"crossref","unstructured":"Srinivasamurthy, A., Motlicek, P., Himawan, I., Szasz\u00e1k, G., Oualil, Y., and Helmke, H. (2017, January 20\u201324). Semi-Supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control. Proceedings of the Interspeech 2017, Stockholm, Sweden.","key":"ref_20","DOI":"10.21437\/Interspeech.2017-1446"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1109\/MAES.2006.1705165","article-title":"Air traffic control speech recognition system cross-task and speaker adaptation","volume":"21","author":"Cordoba","year":"2006","journal-title":"IEEE Aerosp. Electron. Syst. Mag."},{"doi-asserted-by":"crossref","unstructured":"Johnson, D.R., Nenov, V.I., and Espinoza, G. (2013, January 5\u201310). Automatic Speech Semantic Recognition and verification in Air Traffic Control. Proceedings of the 2013 IEEE\/AIAA 32nd Digital Avionics Systems Conference (DASC), East Syracuse, NY, USA.","key":"ref_22","DOI":"10.1109\/DASC.2013.6712602"},{"doi-asserted-by":"crossref","unstructured":"Pellegrini, T., Farinas, J., Delpech, E., and Lancelot, F. (arXiv, 2018). The Airbus Air Traffic Control speech recognition 2018 challenge: Towards ATC automatic transcription and call sign detection, arXiv.","key":"ref_23","DOI":"10.21437\/Interspeech.2019-1962"},{"unstructured":"Biadsy, F. (2011). Automatic Dialect and Accent Recognition and its Application to Speech Recognition. [Ph.D. Thesis, Columbia University].","key":"ref_24"},{"unstructured":"Haffner, P., Tur, G., and Wright, J.H. (2003, January 6\u201310). Optimizing SVMs for complex call classification. Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, (ICASSP \u201903), Hong Kong, China.","key":"ref_25"},{"doi-asserted-by":"crossref","unstructured":"Yao, K., Peng, B., Zweig, G., Yu, D., Li, X., and Gao, F. (2014, January 4\u20139). Recurrent conditional random field for language understanding. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","key":"ref_26","DOI":"10.1109\/ICASSP.2014.6854368"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1016\/j.jmateco.2003.11.006","article-title":"On the objective of firms under uncertainty with stock markets","volume":"40","author":"Bonnisseau","year":"2004","journal-title":"J. Math. Econ."},{"doi-asserted-by":"crossref","unstructured":"Yao, K., Peng, B., Zhang, Y., Yu, D., Zweig, G., and Shi, Y. (2014, January 7\u201310). Spoken language understanding using long short-term memory neural networks. Proceedings of the 2014 IEEE Spoken Language Technology Workshop (SLT), South Lake Tahoe, NV, USA.","key":"ref_28","DOI":"10.1109\/SLT.2014.7078572"},{"doi-asserted-by":"crossref","unstructured":"Xu, P., and Sarikaya, R. (2013, January 8\u201312). Convolutional neural network based triangular CRF for joint intent detection and slot filling. Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic.","key":"ref_29","DOI":"10.1109\/ASRU.2013.6707709"},{"doi-asserted-by":"crossref","unstructured":"Guo, D., Tur, G., Yih, W., and Zweig, G. (2014, January 7\u201310). Joint semantic utterance classification and slot filling with recursive neural networks. Proceedings of the 2014 IEEE Spoken Language Technology Workshop (SLT), South Lake Tahoe, NV, USA.","key":"ref_30","DOI":"10.1109\/SLT.2014.7078634"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"doi-asserted-by":"crossref","unstructured":"Chen, K., Yan, Z.-J., and Huo, Q. (2015, January 23\u201326). A context-sensitive-chunk BPTT approach to training deep LSTM\/BLSTM recurrent neural networks for offline handwriting recognition. Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia.","key":"ref_32","DOI":"10.1109\/ICDAR.2015.7333794"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1023\/A:1025012916465","article-title":"Recomposition: Coordinating a Web of Software Dependencies","volume":"12","author":"Grinter","year":"2003","journal-title":"Comput. Support. Coop. Work"},{"key":"ref_34","first-page":"1764","article-title":"Towards End-To-End Speech Recognition with Recurrent Neural Networks","volume":"32","author":"Graves","year":"2014","journal-title":"JMLR Workshop Conf. Proc."},{"doi-asserted-by":"crossref","unstructured":"Liu, B., and Lane, I. (2016, January 13\u201315). Joint Online Spoken Language Understanding and Language Modeling With Recurrent Neural Networks. Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Los Angeles, CA, USA.","key":"ref_35","DOI":"10.18653\/v1\/W16-3603"},{"unstructured":"(2019, February 02). Kaldi. Available online: http:\/\/kaldi-asr.org\/.","key":"ref_36"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/3\/679\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:31:35Z","timestamp":1760185895000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/3\/679"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,7]]},"references-count":36,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,2]]}},"alternative-id":["s19030679"],"URL":"https:\/\/doi.org\/10.3390\/s19030679","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2019,2,7]]}}}