{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T15:43:04Z","timestamp":1778341384814,"version":"3.51.4"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2017,5,25]],"date-time":"2017-05-25T00:00:00Z","timestamp":1495670400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001602","name":"Science Foundation Ireland","doi-asserted-by":"crossref","award":["10\/CE\/I1855"],"award-info":[{"award-number":["10\/CE\/I1855"]}],"id":[{"id":"10.13039\/501100001602","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Lero - the Irish Software Engineering Research Centre"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Auton. Adapt. Syst."],"published-print":{"date-parts":[[2017,6,30]]},"abstract":"<jats:p>Multi-agent reinforcement learning (MARL) is a widely researched technique for decentralised control in complex large-scale autonomous systems. Such systems often operate in environments that are continuously evolving and where agents\u2019 actions are non-deterministic, so called inherently non-stationary environments. When there are inconsistent results for agents acting on such an environment, learning and adapting is challenging. In this article, we propose P-MARL, an approach that integrates prediction and pattern change detection abilities into MARL and thus minimises the effect of non-stationarity in the environment. The environment is modelled as a time-series, with future estimates provided using prediction techniques. Learning is based on the predicted environment behaviour, with agents employing this knowledge to improve their performance in realtime. 
We illustrate P-MARL\u2019s performance in a real-world smart grid scenario, where the environment is heavily influenced by non-stationary power demand patterns from residential consumers. We evaluate P-MARL in three different situations, where agents\u2019 action decisions are independent, simultaneous, and sequential. Results show that all methods outperform traditional MARL, with sequential P-MARL achieving best results.<\/jats:p>","DOI":"10.1145\/3070861","type":"journal-article","created":{"date-parts":[[2017,5,25]],"date-time":"2017-05-25T16:16:45Z","timestamp":1495729005000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":50,"title":["Prediction-Based Multi-Agent Reinforcement Learning in Inherently Non-Stationary Environments"],"prefix":"10.1145","volume":"12","author":[{"given":"Andrei","family":"Marinescu","sequence":"first","affiliation":[{"name":"Trinity College Dublin, Dublin, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ivana","family":"Dusparic","sequence":"additional","affiliation":[{"name":"Trinity College Dublin, Dublin, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siobh\u00e1n","family":"Clarke","sequence":"additional","affiliation":[{"name":"Trinity College Dublin, Dublin, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,5,25]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2003998"},{"key":"e_1_2_1_2_1","volume-title":"Jenkins","author":"Box George E. P.","year":"1970","unstructured":"George E. P. Box and Gwilym M . Jenkins . 1970 . Time Series Analysis, Forecasting and Control . San Francisco, CA : Holden Day . George E. P. Box and Gwilym M. Jenkins. 1970. Time Series Analysis, Forecasting and Control. 
San Francisco, CA: Holden Day."},{"key":"e_1_2_1_3_1","volume-title":"Neurofuzzy Adaptive Modelling and Control","author":"Brown Martin","unstructured":"Martin Brown and Christopher John Harris . 1994. Neurofuzzy Adaptive Modelling and Control . Prentice Hall . Martin Brown and Christopher John Harris. 1994. Neurofuzzy Adaptive Modelling and Control. Prentice Hall."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2007.913919"},{"key":"e_1_2_1_5_1","volume-title":"Zhang","author":"Choi Samuel P. M.","year":"2001","unstructured":"Samuel P. M. Choi , Dit-Yan Yeung , and Nevin L . Zhang . 2001 . Hidden-mode Markov decision processes for nonstationary sequential decision making. In Sequence Learning. Springer , 264--287. Samuel P. M. Choi, Dit-Yan Yeung, and Nevin L. Zhang. 2001. Hidden-mode Markov decision processes for nonstationary sequential decision making. In Sequence Learning. Springer, 264--287."},{"key":"e_1_2_1_6_1","unstructured":"Caroline Claus and Craig Boutilier. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI\/IAAI. 746--752.   Caroline Claus and Craig Boutilier. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI\/IAAI. 746--752."},{"key":"e_1_2_1_7_1","unstructured":"Comission for Energy Regulation Ireland. 2011. Smart Meter Trial Data. Retrieved from http:\/\/www.ucd.ie\/issda\/data\/commissionforenergyregulationcer\/.  Comission for Energy Regulation Ireland. 2011. Smart Meter Trial Data. Retrieved from http:\/\/www.ucd.ie\/issda\/data\/commissionforenergyregulationcer\/."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022627411411"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976602753712972"},{"key":"e_1_2_1_10_1","volume-title":"Self-Organizing Architectures","author":"Dusparic Ivana","unstructured":"Ivana Dusparic and Vinny Cahill . 2010. Multi-policy optimization in self-organizing systems . 
In Self-Organizing Architectures . Springer , 101--126. Ivana Dusparic and Vinny Cahill. 2010. Multi-policy optimization in self-organizing systems. In Self-Organizing Architectures. Springer, 101--126."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/SusTech.2013.6617303"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/2615731.2617427"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2011.6161220"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1987.13927"},{"key":"e_1_2_1_16_1","volume-title":"Wong","author":"Hartigan John A.","year":"1979","unstructured":"John A. Hartigan and Manchek A . Wong . 1979 . Algorithm AS 136: A k-means clustering algorithm. Applied Statistics ( 1979), 100--108. John A. Hartigan and Manchek A. Wong. 1979. Algorithm AS 136: A k-means clustering algorithm. Applied Statistics (1979), 100--108."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems","author":"Hernandez Pablo","unstructured":"Pablo Hernandez , E. Munoz de Cote, and L. Enrique Sucar. 2013. Learning Against Non-stationary Opponents . In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems , Saint Paul, MN. Pablo Hernandez, E. Munoz de Cote, and L. Enrique Sucar. 2013. Learning Against Non-stationary Opponents. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems, Saint Paul, MN."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/945365.964288"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1380584.1380585"},{"key":"e_1_2_1_20_1","volume-title":"W-learning: Competition Among Selfish Q-learners. Departmental Technical Report","author":"Humphrys Mark","year":"1995","unstructured":"Mark Humphrys . 1995 . W-learning: Competition Among Selfish Q-learners. Departmental Technical Report . 
University of Cambridge . Mark Humphrys. 1995. W-learning: Competition Among Selfish Q-learners. Departmental Technical Report. University of Cambridge."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(98)00023-X"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1547--1548","author":"Kahlen Micha","year":"2014","unstructured":"Micha Kahlen , Wolfgang Ketter , and Jan van Dalen . 2014 . Agent-coordinated virtual power plants of electric vehicles . In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1547--1548 . Micha Kahlen, Wolfgang Ketter, and Jan van Dalen. 2014. Agent-coordinated virtual power plants of electric vehicles. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1547--1548."},{"key":"e_1_2_1_23_1","volume-title":"Complexity of Computer Computations","author":"Karp Richard M.","unstructured":"Richard M. Karp . 1972. Reducibility among combinatorial problems . In Complexity of Computer Computations . Springer , 85--103. Richard M. Karp. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations. Springer, 85--103."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-32259-7_7"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.58325"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of AAAI-2005 Workshop on Multiagent Learning.","author":"Laum\u00f4nier Julien","year":"2005","unstructured":"Julien Laum\u00f4nier and Brahim Chaib-draa. 2005 . Multiagent Q-learning: Preliminary study on dominance between the Nash and Stackelberg equilibriums . 
In Proceedings of AAAI-2005 Workshop on Multiagent Learning. Julien Laum\u00f4nier and Brahim Chaib-draa. 2005. Multiagent Q-learning: Preliminary study on dominance between the Nash and Stackelberg equilibriums. In Proceedings of AAAI-2005 Workshop on Multiagent Learning."},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Andrei Marinescu Ivana Dusparic Colin Harris Vinny Cahill and Siobh\u00e1n Clarke. 2014a. A dynamic forecasting method for small scale residential electrical demand. In IJCNN. 3767--3774.  Andrei Marinescu Ivana Dusparic Colin Harris Vinny Cahill and Siobh\u00e1n Clarke. 2014a. A dynamic forecasting method for small scale residential electrical demand. In IJCNN. 3767--3774.","DOI":"10.1109\/IJCNN.2014.6889425"},{"key":"e_1_2_1_28_1","volume-title":"Innovative Smart Grid Technologies (ISGT)","author":"Marinescu Andrei","year":"2014","unstructured":"Andrei Marinescu , Collin Harris , Ivana Dusparic , Vinny Cahill , and Siobh\u00e1n Clarke . 2014b. A hybrid approach to very small scale electrical demand forecasting . In Innovative Smart Grid Technologies (ISGT) , 2014 IEEE PES. 1--5. Andrei Marinescu, Collin Harris, Ivana Dusparic, Vinny Cahill, and Siobh\u00e1n Clarke. 2014b. A hybrid approach to very small scale electrical demand forecasting. In Innovative Smart Grid Technologies (ISGT), 2014 IEEE PES. 1--5."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/SE4SG.2013.6596108"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the he 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 5--12","author":"Ramchurn Sarvapali D.","year":"2011","unstructured":"Sarvapali D. Ramchurn , Perukrishnen Vytelingum , Alex Rogers , and Nick Jennings . 2011 . Agent-based control for decentralised demand side management in the smart grid . 
In Proceedings of the he 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 5--12 . Sarvapali D. Ramchurn, Perukrishnen Vytelingum, Alex Rogers, and Nick Jennings. 2011. Agent-based control for decentralised demand side management in the smart grid. In Proceedings of the he 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 5--12."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2659021.2659052"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 2010 13th International IEEE Conference on Intelligent Transportation Systems (ITSC). 531--538","unstructured":"As\u2019ad. Salkham and Vinny Cahill. 2010. Soilse: A decentralized approach to optimization of fluctuating urban traffic using reinforcement learning . In Proceedings of the 2010 13th International IEEE Conference on Intelligent Transportation Systems (ITSC). 531--538 . As\u2019ad. Salkham and Vinny Cahill. 2010. Soilse: A decentralized approach to optimization of fluctuating urban traffic using reinforcement learning. In Proceedings of the 2010 13th International IEEE Conference on Intelligent Transportation Systems (ITSC). 531--538."},{"key":"e_1_2_1_34_1","volume-title":"Multi-agent Reinforcement Learning: A Critical Survey. Technical report","author":"Shoham Yoav","unstructured":"Yoav Shoham , Rob Powers , and Trond Grenager . 2003. Multi-agent Reinforcement Learning: A Critical Survey. Technical report , Stanford University . Yoav Shoham, Rob Powers, and Trond Grenager. 2003. Multi-agent Reinforcement Learning: A Critical Survey. 
Technical report, Stanford University."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143872"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008942012299"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/101883.102055"},{"key":"e_1_2_1_38_1","volume-title":"Barto","author":"Sutton Richard S.","year":"1998","unstructured":"Richard S. Sutton and Andrew G . Barto . 1998 . Introduction to Reinforcement Learning. MIT Press . Richard S. Sutton and Andrew G. Barto. 1998. Introduction to Reinforcement Learning. MIT Press."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICAC.2006.1662383"},{"key":"e_1_2_1_40_1","unstructured":"U.S. Department of Energy at Pacific Northwest National Laboratory. 2014. GridLAB-D. Retrieved from http:\/\/www.gridlabd.org\/.  U.S. Department of Energy at Pacific Northwest National Laboratory. 2014. GridLAB-D. Retrieved from http:\/\/www.gridlabd.org\/."},{"key":"e_1_2_1_41_1","volume-title":"EPA Fuel Economy Information","author":"S.","year":"2014","unstructured":"U. S. EPA Fuel Economy Information . 2014 . Nissan Leaf. Retrieved from http:\/\/www.fueleconomy.gov\/feg\/Find.do?action=sbs&id=32154. U.S. EPA Fuel Economy Information. 2014. Nissan Leaf. Retrieved from http:\/\/www.fueleconomy.gov\/feg\/Find.do?action=sbs&id=32154."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1591--1592","author":"Valogianni Konstantina","year":"2014","unstructured":"Konstantina Valogianni , Wolfgang Ketter , and John Collins . 2014 . Learning to schedule electric vehicle charging given individual customer preferences . In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. 
International Foundation for Autonomous Agents and Multiagent Systems, 1591--1592 . Konstantina Valogianni, Wolfgang Ketter, and John Collins. 2014. Learning to schedule electric vehicle charging given individual customer preferences. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1591--1592."},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the 10th International Conference on Autonomous Agents and Multi-agent Systems-Volume 2. International Foundation for Autonomous Agents and Multiagent Systems, 803--810","author":"Vandael Stijn","year":"2011","unstructured":"Stijn Vandael , Nelis Bouck\u00e9 , Tom Holvoet , Klaas De Craemer , and Geert Deconinck . 2011 . Decentralized coordination of plug-in hybrid vehicles for imbalance reduction in a smart grid . In Proceedings of the 10th International Conference on Autonomous Agents and Multi-agent Systems-Volume 2. International Foundation for Autonomous Agents and Multiagent Systems, 803--810 . Stijn Vandael, Nelis Bouck\u00e9, Tom Holvoet, Klaas De Craemer, and Geert Deconinck. 2011. Decentralized coordination of plug-in hybrid vehicles for imbalance reduction in a smart grid. In Proceedings of the 10th International Conference on Autonomous Agents and Multi-agent Systems-Volume 2. International Foundation for Autonomous Agents and Multiagent Systems, 803--810."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_2_1_45_1","volume-title":"Hypothesis Testing in Time Series Analysis","author":"Whittle Peter","unstructured":"Peter Whittle . 1951. Hypothesis Testing in Time Series Analysis . Vol. 4 . Almqvist & Wiksells . Peter Whittle. 1951. Hypothesis Testing in Time Series Analysis. Vol. 4. 
Almqvist & Wiksells."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018046501280"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-2070(97)00044-7"}],"container-title":["ACM Transactions on Autonomous and Adaptive Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3070861","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3070861","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:30:27Z","timestamp":1750217427000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3070861"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,5,25]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2017,6,30]]}},"alternative-id":["10.1145\/3070861"],"URL":"https:\/\/doi.org\/10.1145\/3070861","relation":{},"ISSN":["1556-4665","1556-4703"],"issn-type":[{"value":"1556-4665","type":"print"},{"value":"1556-4703","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,5,25]]},"assertion":[{"value":"2015-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-05-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}