{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T05:22:37Z","timestamp":1777440157038,"version":"3.51.4"},"reference-count":24,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2011,1,20]],"date-time":"2011-01-20T00:00:00Z","timestamp":1295481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In a network of low-powered wireless sensors, it is essential to capture as many environmental events as possible while still preserving the battery life of the sensor node. This paper focuses on a real-time learning algorithm to extend the lifetime of a sensor node to sense and transmit environmental events. A method commonly adopted in ad-hoc sensor networks is to periodically put the sensor nodes to sleep. The purpose of the learning algorithm is to couple the sensor\u2019s sleeping behavior to the natural statistics of the environment so that it stays in harmony with changes in the environment: the sensors can sleep when the environment is steady and stay awake when it is turbulent. This paper presents theoretical and experimental validation of a reward-based learning algorithm that can be implemented on an embedded sensor. 
The key contribution of the proposed approach is the design and implementation of a reward function that satisfies a trade-off between the above two mutually contradicting objectives, and a linear critic function to approximate the discounted sum of future rewards in order to perform policy learning.<\/jats:p>","DOI":"10.3390\/s110101229","type":"journal-article","created":{"date-parts":[[2011,1,20]],"date-time":"2011-01-20T11:51:03Z","timestamp":1295524263000},"page":"1229-1242","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Intelligent Sensing in Dynamic Environments Using Markov Decision Process"],"prefix":"10.3390","volume":"11","author":[{"given":"Thrishantha","family":"Nanayakkara","sequence":"first","affiliation":[{"name":"Division of Engineering, King\u2019s College, University of London, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Malka N.","family":"Halgamuge","sequence":"additional","affiliation":[{"name":"Melbourne School of Engineering, The University of Melbourne, Melbourne, VIC, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Prasanna","family":"Sridhar","sequence":"additional","affiliation":[{"name":"Microsoft Corp., Redmond, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Asad M.","family":"Madni","sequence":"additional","affiliation":[{"name":"Crocker Capital, San Francisco, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2011,1,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Boulis, A., Ganeriwal, S., and Srivastava, M.B. (2003, January 11). Aggregation in sensor networks: An energy-accuracy trade-off. 
Anchorage, AK, USA.","DOI":"10.1016\/S1570-8705(03)00009-X"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1109\/LCOMM.2003.815663","article-title":"Data aggregation and dilution by modulus addressing in wireless sensor networks","volume":"7","author":"Cayirci","year":"2003","journal-title":"IEEE Commun. Lett"},{"key":"ref_3","unstructured":"Sankarasubramaniam, Y., Akyildiz, I.F., and McLaughlin, S.W. (2003, January 11). Energy efficiency based packet size optimization in wireless sensor networks. Anchorage, AK, USA."},{"key":"ref_4","unstructured":"Halgamuge, M.N., Guru, S.M., and Jennings, A. (March, January 23). Energy efficient cluster formation in wireless sensor networks. Tahity, French Polynesia."},{"key":"ref_5","unstructured":"Halgamuge, M.N., Guru, S.M., and Jennings, A. (2005). Classification and Clustering for Knowledge Discovery, Springer-Verlag."},{"key":"ref_6","unstructured":"Zou, Y., and Chakrabarty, K. (2003, January 11). Target localization based on energy considerations in distributed sensor networks. Anchorage, AK, USA."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Halgamuge, M.N. (2007, January 21\u201323). Efficient battery management for sensor lifetime. 21st International Conference on Advanced Information Networking and Applications Workshops, Niagara Falls, ON, Canada.","DOI":"10.1109\/AINAW.2007.165"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1109\/TWC.2002.804190","article-title":"An application-specific protocol architecture for wireless microsensor networks","volume":"1","author":"Heinzelman","year":"2002","journal-title":"IEEE Tran. Wireless Comm"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Intanagonwiwat, C., Govindan, R., and Estrin, D (2000). Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks, University of Southern California. 
Technical Report 00-732;.","DOI":"10.1145\/345910.345920"},{"key":"ref_10","unstructured":"Ye, W., Heidemann, J., and Estrin, D. (2002, January 23\u201327). An energy-efficient MAC protocol for wireless sensor networks. New York, NY, USA."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"259","DOI":"10.2528\/PIERB08122303","article-title":"An estimation of sensor energy consumption","volume":"12","author":"Halgamuge","year":"2009","journal-title":"Prog. Electromagn. Res. B"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_13","unstructured":"Toussaint, M. (2009). Lecture Notes: Markov Decision Processes, TU Berlin."},{"key":"ref_14","first-page":"679","article-title":"A Markovian Decision Process","volume":"6","author":"Bellman","year":"1957","journal-title":"J. Math. Mech"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1593","DOI":"10.1126\/science.275.5306.1593","article-title":"A neural substrate of prediction and reward","volume":"275","author":"Schultz","year":"1997","journal-title":"Science"},{"key":"ref_16","unstructured":"Watkins, C.J.C.H. (1989). Learning from Delayed Rewards, Ph.D. Dissertation,."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1016\/S0896-6273(02)00963-7","article-title":"Reward, motivation, and reinforcement learning","volume":"36","author":"Dayan","year":"2002","journal-title":"Neuron"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2094","DOI":"10.1152\/jn.00304.2006","article-title":"Rat nucleus accumbens neurons persistently encode locations associated with morphine reward","volume":"97","author":"German","year":"2007","journal-title":"J. 
Neurophysiol"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1152\/jn.1998.80.2.947","article-title":"Influence of reward expectation on behavior\u2013Related neuronal activity in primate striatum","volume":"80","author":"Hollerman","year":"1998","journal-title":"J. Neurophysiol"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1016\/S0896-6273(02)00974-1","article-title":"Neural economics and the biological substrates of valuation","volume":"36","author":"Montague","year":"2002","journal-title":"Neuron"},{"key":"ref_21","unstructured":"Mataric, M.J. Reward functions for accelerated learning. New Brunswick, NJ, USA."},{"key":"ref_22","unstructured":"Ng, A.Y., Harada, D., and Russell, S. Policy invariance under reward transformations: Theory and applications to reward shaping. Bled, Slovenia."},{"key":"ref_23","first-page":"142","article-title":"Multi-dimensional reinforcement learning using a Vector Q-Net-Application to mobile robots","volume":"1","author":"Kiguchi","year":"2003","journal-title":"Int. J. Control, Autom. Syst"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sridhar, P., Nanayakkara, T., Madni, A.M., and Jamshidi, M. (2007, January 4\u20136). Dynamic power management of an embedded sensor network based on Actor-Critic reinforcement based learning. 
Melbourne, VIC, Australia.","DOI":"10.1109\/ICIAFS.2007.4544783"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/11\/1\/1229\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T21:54:59Z","timestamp":1760219699000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/11\/1\/1229"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,1,20]]},"references-count":24,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2011,1]]}},"alternative-id":["s110101229"],"URL":"https:\/\/doi.org\/10.3390\/s110101229","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,1,20]]}}}