{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:10:32Z","timestamp":1760242232542,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2017,1,19]],"date-time":"2017-01-19T00:00:00Z","timestamp":1484784000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Over the last few decades, a number of reinforcement learning techniques have emerged, and different reinforcement learning-based applications have proliferated. However, such techniques tend to specialize in a particular field. This is an obstacle to their generalization and extrapolation to other areas. Besides, neither the reward-punishment (r-p) learning process nor the convergence of results is fast and efficient enough. To address these obstacles, this research proposes a general reinforcement learning model. This model is independent of input and output types and based on general bioinspired principles that help to speed up the learning process. The model is composed of a perception module based on sensors whose specific perceptions are mapped as perception patterns. In this manner, similar perceptions (even if perceived at different positions in the environment) are accounted for by the same perception pattern. Additionally, the model includes a procedure that statistically associates perception-action pattern pairs depending on the positive or negative results output by executing the respective action in response to a particular perception during the learning process. To do this, the model is fitted with a mechanism that reacts positively or negatively to particular sensory stimuli in order to rate results. 
The model is supplemented by an action module that can be configured depending on the maneuverability of each specific agent. The model has been applied in the air navigation domain, a field with strong safety restrictions, which led us to implement a simulated system equipped with the proposed model. Accordingly, the perception sensors were based on Automatic Dependent Surveillance-Broadcast (ADS-B) technology, which is described in this paper. The results were quite satisfactory, and it outperformed traditional methods existing in the literature with respect to learning reliability and efficiency.<\/jats:p>","DOI":"10.3390\/s17010188","type":"journal-article","created":{"date-parts":[[2017,1,19]],"date-time":"2017-01-19T10:55:40Z","timestamp":1484823340000},"page":"188","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["A Reinforcement Learning Model Equipped with Sensors for Generating Perception Patterns: Implementation of a Simulated Air Navigation System Using ADS-B (Automatic Dependent Surveillance-Broadcast) Technology"],"prefix":"10.3390","volume":"17","author":[{"given":"Santiago","family":"\u00c1lvarez de Toledo","sequence":"first","affiliation":[{"name":"Escuela T\u00e9cnica Superior de Ingenieros Inform\u00e1ticos, Campus de Montegancedo, Technical University of Madrid (UPM), Boadilla del Monte, 28660 Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aurea","family":"Anguera","sequence":"additional","affiliation":[{"name":"Escuela T\u00e9cnica Superior de Ingenier\u00eda de Sistemas Inform\u00e1ticos, Technical University of Madrid (UPM), C\/Alan Turing s\/n (Ctra. de Valencia km. 
7), 28031 Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6626-5835","authenticated-orcid":false,"given":"Jos\u00e9","family":"Barreiro","sequence":"additional","affiliation":[{"name":"Escuela T\u00e9cnica Superior de Ingenieros Inform\u00e1ticos, Campus de Montegancedo, Technical University of Madrid (UPM), Boadilla del Monte, 28660 Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Juan","family":"Lara","sequence":"additional","affiliation":[{"name":"Escuela de Ciencias T\u00e9cnicas e Ingenier\u00eda, Madrid Open University (MOU), Crta. de la Coru\u00f1a km. 38.500, V\u00eda de Servicio, 15, Collado Villalba, 28400 Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7928-5237","authenticated-orcid":false,"given":"David","family":"Lizcano","sequence":"additional","affiliation":[{"name":"Escuela de Ciencias T\u00e9cnicas e Ingenier\u00eda, Madrid Open University (MOU), Crta. de la Coru\u00f1a km. 38.500, V\u00eda de Servicio, 15, Collado Villalba, 28400 Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2017,1,19]]},"reference":[{"key":"ref_1","unstructured":"Simon, H.A. (1983). Machine Learning, Springer."},{"key":"ref_2","unstructured":"Airservices How ADS-B Works. Available online: http:\/\/www.airservicesaustralia.com\/projects\/ads-b\/how-ads-b-works\/."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Rubinstein, R.Y. (1981). Simulation and the Monte Carlo Method, Wiley.","DOI":"10.1002\/9780470316511"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kalos, M.H., and Whitlock, P.A. (1986). Monte Carlo Methods, Wiley.","DOI":"10.1002\/9783527617395"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ulam, S.M. (1991). 
Adventures of a Mathematician, University of California Press.","DOI":"10.1525\/9780520910553"},{"key":"ref_6","unstructured":"Sutton, R.S. (1978). Learning Theory Support for a Single Channel Theory of the Brain,. [Ph.D. Thesis, Stanford University]."},{"key":"ref_7","first-page":"72","article-title":"Single channel theory: A neuronal theory of learning","volume":"4","author":"Sutton","year":"1978","journal-title":"Brain Theory Newslett."},{"key":"ref_8","first-page":"835","article-title":"Neuronlike elements that can solve difficult learning control problems","volume":"13","author":"Barto","year":"1983","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_10","unstructured":"Dale, E., and Michie, D. (1968). BOXES: An Experiment in Adaptive Control, Elsevier\/North-Holland."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00114726","article-title":"Reinforcement learning with replacing eligibility traces","volume":"22","author":"Singh","year":"1996","journal-title":"Mach. Learn."},{"key":"ref_12","unstructured":"Cohen, J.D., Tesauro, G., and Alspector, J. Monte Carlo matrix inversion and reinforcement learning. Proceedings of the 1993 Conference on Advances in Neural Information Processing Systems."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/BF00993306","article-title":"Asynchronous stochastic approximation and Q-Learning","volume":"16","author":"Tsitsiklis","year":"1994","journal-title":"Mach. Learn."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Busoniu, L., Ernst, D., De Schutter, B., and Babuska, R. (July, January 30). Online least-squares policy iteration for reinforcement learning control. 
Proceedings of the 2010 American Control Conference (ACC), Baltimore, MD, USA.","DOI":"10.1109\/ACC.2010.5530856"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1109\/TSMC.1987.289329","article-title":"Building and understanding adaptive systems: A statistical\/numerical approach to factory automation and brain research","volume":"17","author":"Werbos","year":"1987","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_16","unstructured":"Thathachar, M.A.L., and Sastry, P.S. (, January December). Estimator algorithms for learning automata. Proceedings of the Platinum Jubilee Conference on Systems and Signal Processing, Bangalore, India."},{"key":"ref_17","unstructured":"Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. [Ph.D. Thesis, Cambridge University]."},{"key":"ref_18","unstructured":"Platt, J.C., Koller, D., Singer, Y., and Roweis, S.T. (2008). Advances in Neural Information Processing Systems 20, MIT Press."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Farahmand, A.M., Ghavamzadeh, M., Szepesvari, C., and Mannor, S. (2009, January 10\u201312). Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. Proceedings of the 2009 American Control Conference (ACC-09), St. Louis, MO, USA.","DOI":"10.1109\/ACC.2009.5160611"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Szepesvari, C.S., and Smart, W.D. (2004, January 4\u20138). Interpolation-based Q-learning. Proceedings of the 21st International Conference on Machine Learning (ICML-04), Banff, AB, Canada.","DOI":"10.1145\/1015330.1015445"},{"key":"ref_21","unstructured":"Gambardella, L.C., and Dorigo, M. (1995). 
Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, USA, 9\u201312 July 1995, Morgan Kaufmann."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1162\/neco.1994.6.2.215","article-title":"TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play","volume":"6","author":"Tesauro","year":"1994","journal-title":"Neural Comput."},{"key":"ref_23","unstructured":"Crites, R., and Barto, A. (1996). Advances in Neural Information Processing Systems 8, MIT Press."},{"key":"ref_24","unstructured":"Tesauro, G., Das, R., Walsh, W.E., and Kephart, J.O. (2005, January 13\u201316). Utility-Function-Driven Resource Allocation in Autonomic Systems. Proceedings of the 2nd International Conference on Autonomic Computing (ICAC 2005), Seattle, WA, USA."},{"key":"ref_25","unstructured":"Silver, D., Sutton, R., and Muller, M. (2007, January 6\u201312). Reinforcement learning of local shape in the game of Go. Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India."},{"key":"ref_26","unstructured":"Guez, A., Vincent, R.D., Avoli, M., and Pineau, J. (2008, January 13\u201317). Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning. Proceedings of the 23rd AAAI National Conference on Artificial Intelligence, Chicago, IL, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ipek, E., Mutlu, O., Martinez, J.F., and Caruana, R. (2008, January 21\u201325). Self-optimizing memory controllers: A reinforcement learning approach. Proceedings of the 35th International Symposium on Computer Architecture, Beijing, China.","DOI":"10.1109\/ISCA.2008.21"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_29","unstructured":"International Civil Aviation Organization (ICAO) (2012). 
ICAO Doc 9871, Technical Provisions for Mode S and Extended Squitter, ICAO. [2nd ed.]."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/17\/1\/188\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:26:35Z","timestamp":1760207195000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/17\/1\/188"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,1,19]]},"references-count":29,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2017,1]]}},"alternative-id":["s17010188"],"URL":"https:\/\/doi.org\/10.3390\/s17010188","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2017,1,19]]}}}