{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T18:13:00Z","timestamp":1772820780723,"version":"3.50.1"},"reference-count":68,"publisher":"MDPI AG","issue":"20","license":[{"start":{"date-parts":[[2022,10,13]],"date-time":"2022-10-13T00:00:00Z","timestamp":1665619200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61672128"],"award-info":[{"award-number":["61672128"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["DUT20TD107"],"award-info":[{"award-number":["DUT20TD107"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fundamental Research Fund for Central University","award":["61672128"],"award-info":[{"award-number":["61672128"]}]},{"name":"Fundamental Research Fund for Central University","award":["DUT20TD107"],"award-info":[{"award-number":["DUT20TD107"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Electricity demands are increasing significantly and the traditional power grid system is facing huge challenges. As the desired next-generation power grid system, the smart grid can provide secure and reliable power generation and consumption, and can also realize the system\u2019s coordinated and intelligent power distribution. Coordinating grid power distribution usually requires mutual communication between power distributors. However, the power network is complex, its nodes are far apart, and communication bandwidth is often expensive. 
Therefore, reducing the communication bandwidth required by the cooperative power distribution task is crucially important. One way to tackle this problem is to build mechanisms that communicate selectively, allowing distributors to send information only at key moments and states. The distributors in the power grid are modeled as reinforcement learning agents, and the communication bandwidth in the power grid can be reduced by optimizing the communication frequency between agents. In this paper, we therefore propose a model for deciding whether to communicate based on causal inference, the Causal Inference Communication Model (CICM). CICM treats whether to communicate as a binary intervention variable and determines which intervention is more effective by estimating the individual treatment effect (ITE), yielding an optimal strategy for when to send information while ensuring task completion. This method effectively reduces the communication frequency between grid distributors while maximizing the power distribution effect. 
In addition, we test the method in StarCraft II and in the Habitat 3D simulation environment, which fully demonstrates its effectiveness.<\/jats:p>","DOI":"10.3390\/s22207785","type":"journal-article","created":{"date-parts":[[2022,10,14]],"date-time":"2022-10-14T01:44:13Z","timestamp":1665711853000},"page":"7785","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Pruning the Communication Bandwidth between Reinforcement Learning Agents through Causal Inference: An Innovative Approach to Designing a Smart Grid Power System"],"prefix":"10.3390","volume":"22","author":[{"given":"Xianjie","family":"Zhang","sequence":"first","affiliation":[{"name":"Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian 116620, China"}]},{"given":"Yu","family":"Liu","sequence":"additional","affiliation":[{"name":"Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian 116620, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5042-7756","authenticated-orcid":false,"given":"Wenjun","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computing and Information Systems, Singapore Management University, 81 Victoria Street, Singapore 188065, Singapore"}]},{"given":"Chen","family":"Gong","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,10,13]]},"reference":[{"key":"ref_1","unstructured":"Li, D. (2020). Cooperative Communications in Smart Grid Networks. [Ph.D. 
Thesis, University of Sheffield]."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/SURV.2011.122211.00021","article-title":"Smart Grid Communications: Overview of Research Challenges, Solutions, and Standardization Activities","volume":"15","author":"Fan","year":"2013","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1016\/j.rser.2016.06.093","article-title":"Collaborative smart grids\u2014A survey on trends","volume":"65","year":"2016","journal-title":"Renew. Sustain. Energy Rev."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2589","DOI":"10.1016\/j.renene.2019.08.092","article-title":"A survey on smart grid technologies and applications","volume":"146","author":"Dileep","year":"2020","journal-title":"Renew. Energy"},{"key":"ref_5","first-page":"157","article-title":"Energy Saving in Distribution System using Internet of Things in Smart Grid environment","volume":"8","author":"Arya","year":"2019","journal-title":"Int. J. Comput. Digit. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1016\/j.engappai.2018.12.002","article-title":"Resource allocation for smart grid communication based on a multi-swarm artificial bee colony algorithm with cooperative learning","volume":"81","author":"Ma","year":"2019","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_7","unstructured":"Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O.P., and Mordatch, I. (2017, January 4\u20139). Multi-agent actor-critic for mixed cooperative-competitive environments. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1287\/moor.27.4.819.297","article-title":"The complexity of decentralized control of Markov decision processes","volume":"27","author":"Bernstein","year":"2002","journal-title":"Math. Oper. 
Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"383","DOI":"10.1016\/j.neucom.2021.07.014","article-title":"Structural relational inference actor-critic for multi-agent reinforcement learning","volume":"459","author":"Zhang","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.neucom.2022.09.144","article-title":"Common belief multi-agent reinforcement learning based on variational recurrent models","volume":"513","author":"Zhang","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1016\/j.isatra.2012.06.010","article-title":"Optimal control in microgrid using multi-agent reinforcement learning","volume":"51","author":"Li","year":"2012","journal-title":"ISA Trans."},{"key":"ref_12","unstructured":"Bratko, I., and Dzeroski, S. (1999, January 27\u201330). Distributed Value Functions. Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999), Bled, Slovenia."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"808","DOI":"10.1109\/TSG.2014.2363844","article-title":"Balancing Energy in the Smart Grid Using Distributed Value Function (DVF)","volume":"6","author":"Shirzeh","year":"2015","journal-title":"IEEE Trans. Smart Grid"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Riedmiller, M., Moore, A., and Schneider, J. (2001). Reinforcement Learning for Cooperating and Communicating Reactive Agents in Electrical Power Grids. Proceedings of the Balancing Reactivity and Social Deliberation in Multi-Agent Systems, Springer.","DOI":"10.1007\/3-540-44568-4_9"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.apenergy.2018.03.017","article-title":"Fuzzy Q-Learning for multi-agent decentralized energy management in microgrids","volume":"219","author":"Kofinas","year":"2018","journal-title":"Appl. 
Energy"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1109\/TSG.2019.2931753","article-title":"Low-latency communications for community resilience microgrids: A reinforcement learning approach","volume":"11","author":"Elsayed","year":"2019","journal-title":"IEEE Trans. Smart Grid"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Mao, H., Zhang, Z., Xiao, Z., Gong, Z., and Ni, Y. (2020, January 7\u201312). Learning agent communication under limited bandwidth by message pruning. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i04.5957"},{"key":"ref_18","unstructured":"Bogin, B., Geva, M., and Berant, J. (2018). Emergence of Communication in an Interactive World with Consistent Speakers. CoRR."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Patel, S., Wani, S., Jain, U., Schwing, A.G., Lazebnik, S., Savva, M., and Chang, A.X. (2021, January 11). Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01565"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, I.J., Ren, Z., Yeh, R.A., and Schwing, A.G. (October, January 27). Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning. Proceedings of the 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic.","DOI":"10.1109\/IROS51168.2021.9636592"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"3154","DOI":"10.1109\/LRA.2022.3145964","article-title":"Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge","volume":"7","author":"Liu","year":"2022","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_22","unstructured":"Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. 
(2016, January 5\u201310). Learning to Communicate with Deep Multi-Agent Reinforcement Learning. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_23","unstructured":"Sukhbaatar, S., and Fergus, R. (2016, January 5\u201310). Learning multiagent communication with backpropagation. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_24","unstructured":"Peng, P., Yuan, Q., Wen, Y., Yang, Y., Tang, Z., Long, H., and Wang, J. (2017). Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games. CoRR."},{"key":"ref_25","unstructured":"Das, A., Gervet, T., Romoff, J., Batra, D., Parikh, D., Rabbat, M., and Pineau, J. (2019, January 9\u201315). Tarmac: Targeted multi-agent communication. Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA."},{"key":"ref_26","unstructured":"Wang, T., Wang, J., Zheng, C., and Zhang, C. (2020, January 26\u201330). Learning Nearly Decomposable Value Functions Via Communication Minimization. Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia."},{"key":"ref_27","unstructured":"Niu, Y., Paleja, R., and Gombolay, M. (2021). Multi-Agent Graph-Attention Communication and Teaming, International Foundation for Autonomous Agents and Multiagent Systems. AAMAS \u201921."},{"key":"ref_28","unstructured":"Wang, R., He, X., Yu, R., Qiu, W., An, B., and Rabinovich, Z. (2020, January 13\u201318). Learning Efficient Multi-agent Communication: An Information Bottleneck Approach. Proceedings of the 37th International Conference on Machine Learning, Virtual Event."},{"key":"ref_29","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017, January 4\u20139). Causal Effect Inference with Deep Latent-Variable Models. 
Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_30","unstructured":"Shalit, U., Johansson, F.D., and Sontag, D. (2017, January 6\u201311). Estimating individual treatment effect: Generalization bounds and algorithms. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_31","unstructured":"Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (2018, January 3\u20138). Representation Learning for Treatment Effect Estimation from Observational Data. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_32","unstructured":"Kingma, D.P., and Welling, M. (2014, January 14\u201316). Auto-Encoding Variational Bayes. Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Bou Ghosn, S., Ranganathan, P., Salem, S., Tang, J., Loegering, D., and Nygard, K.E. (2010, January 4\u20136). Agent-Oriented Designs for a Self Healing Smart Grid. Proceedings of the 2010 First IEEE International Conference on Smart Grid Communications, Gaithersburg, MD, USA.","DOI":"10.1109\/SMARTGRID.2010.5622085"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Brown, R.E. (2008, January 20\u201324). Impact of Smart Grid on distribution system design. Proceedings of the 2008 IEEE Power and Energy Society General Meeting-Conversion and Delivery of Electrical Energy in the 21st Century, Pittsburgh, PA, USA.","DOI":"10.1109\/PES.2008.4596843"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1109\/JCN.2012.00030","article-title":"Smart grid cooperative communication with smart relay","volume":"14","author":"Ahmed","year":"2012","journal-title":"J. Commun. 
Netw."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.engappai.2014.04.007","article-title":"Agent-based modeling and simulation of a smart grid: A case study of communication effects on frequency control","volume":"33","author":"Kilkki","year":"2014","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1964","DOI":"10.1109\/TSG.2020.3026930","article-title":"Distributed Control of Multi-Energy Storage Systems for Voltage Regulation in Distribution Networks: A Back-and-Forth Communication Framework","volume":"12","author":"Yu","year":"2021","journal-title":"IEEE Trans. Smart Grid"},{"key":"ref_38","unstructured":"Gong, C., Yang, Z., Bai, Y., He, J., Shi, J., Sinha, A., Xu, B., Hou, X., Fan, G., and Lo, D. (2022). Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets. arXiv."},{"key":"ref_39","unstructured":"Hoshen, Y. (2017, January 4\u20139). Vain: Attentional multi-agent predictive modeling. Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA."},{"key":"ref_40","unstructured":"Jiang, J., and Lu, Z. (2018, January 3\u20138). Learning attentional communication for multi-agent cooperation. Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, Montr\u00e9al, QC, Canada."},{"key":"ref_41","unstructured":"Iqbal, S., and Sha, F. (2019, January 9\u201315). Actor-Attention-Critic for Multi-Agent Reinforcement Learning. Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_42","unstructured":"Yang, Y., Hao, J., Liao, B., Shao, K., Chen, G., Liu, W., and Tang, H. (2020). Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning. 
CoRR."},{"key":"ref_43","unstructured":"Bai, Y., Gong, C., Zhang, B., Fan, G., and Hou, X. (2021). Value Function Factorisation with Hypergraph Convolution for Cooperative Multi-agent Reinforcement Learning. arXiv."},{"key":"ref_44","unstructured":"Parnika, P., Diddigi, R.B., Danda, S.K.R., and Bhatnagar, S. (2021, January 3\u20137). Attention Actor-Critic Algorithm for Multi-Agent Constrained Co-Operative Reinforcement Learning. Proceedings of the International Foundation for Autonomous Agents and Multiagent Systems, Online."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Chen, H., Yang, G., Zhang, J., Yin, Q., and Huang, K. (2022). RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning. arXiv.","DOI":"10.1109\/IJCNN55064.2022.9892225"},{"key":"ref_46","unstructured":"Gentzel, A.M., Pruthi, P., and Jensen, D. (2021, January 18\u201324). How and Why to Use Experimental Data to Evaluate Methods for Observational Causal Inference. Proceedings of the 38th International Conference on Machine Learning, Virtual Event."},{"key":"ref_47","first-page":"1","article-title":"Evaluation Methods and Measures for Causal Learning Algorithms","volume":"2022","author":"Cheng","year":"2022","journal-title":"IEEE Trans. Artif. Intell."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Cheng, L., Guo, R., and Liu, H. (2022, January 16\u201319). Causal mediation analysis with hidden confounders. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Chengdu, China.","DOI":"10.1145\/3488560.3498407"},{"key":"ref_49","unstructured":"Bibaut, A., Malenica, I., Vlassis, N., and Van Der Laan, M. (2019, January 9\u201315). More efficient off-policy evaluation through regularized targeted learning. 
Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_50","first-page":"10276","article-title":"Off-Policy Evaluation in Partially Observable Environments","volume":"34","author":"Tennenholtz","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_51","unstructured":"Oberst, M., and Sontag, D. (2019, January 9\u201315). Counterfactual off-policy evaluation with gumbel-max structural causal models. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Jaber, A., Zhang, J., and Bareinboim, E. (2019, January 9\u201315). Causal Identification under Markov Equivalence: Completeness Results. Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA.","DOI":"10.24963\/ijcai.2019\/859"},{"key":"ref_53","unstructured":"Bennett, A., Kallus, N., Li, L., and Mousavi, A. (2021, January 13\u201315). Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders. Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, Virtual."},{"key":"ref_54","first-page":"12113","article-title":"Estimating Identifiable Causal Effects through Double Machine Learning","volume":"35","author":"Jung","year":"2021","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_55","unstructured":"Lu, C., Sch\u00f6lkopf, B., and Hern\u00e1ndez-Lobato, J.M. (2018). Deconfounding Reinforcement Learning in Observational Settings. CoRR."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Lauritzen, S.L. (1996). Graphical Models, Clarendon Press.","DOI":"10.1093\/oso\/9780198522195.001.0001"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Wermuth, N., and Lauritzen, S.L. (1982). 
Graphical and Recursive Models for Contingency Tables, Institut for Elektroniske Systemer, Aalborg Universitetscenter.","DOI":"10.2307\/2336490"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1017\/S1446788700027312","article-title":"Recursive causal models","volume":"36","author":"Kiiveri","year":"1984","journal-title":"J. Aust. Math. Soc."},{"key":"ref_59","unstructured":"Levine, S. (2018). Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. CoRR."},{"key":"ref_60","first-page":"9700","article-title":"MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation","volume":"Volume 33","author":"Larochelle","year":"2020","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Savva, M., Kadian, A., Maksymets, O., Zhao, Y., Wijmans, E., Jain, B., Straub, J., Liu, J., Koltun, V., and Malik, J. (2019, January 27\u201328). Habitat: A Platform for Embodied AI Research. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00943"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Chang, A., Dai, A., Funkhouser, T., Halber, M., Niebner, M., Savva, M., Song, S., Zeng, A., and Zhang, Y. (2017, January 10\u201312). Matterport3D: Learning from RGB-D Data in Indoor Environments. Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China.","DOI":"10.1109\/3DV.2017.00081"},{"key":"ref_63","unstructured":"Samvelyan, M., Rashid, T., de Witt, C.S., Farquhar, G., Nardelli, N., Rudner, T.G.J., Hung, C.M., Torr, P.H.S., Foerster, J., and Whiteson, S. (2019). The StarCraft Multi-Agent Challenge. CoRR."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Ammirato, P., Poirson, P., Park, E., Ko\u0161eck\u00e1, J., and Berg, A.C. (2017, January 29). 
A dataset for developing and benchmarking active vision. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989164"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Jain, U., Weihs, L., Kolve, E., Rastegari, M., Lazebnik, S., Farhadi, A., Schwing, A.G., and Kembhavi, A. (2019, January 15\u201319). Two Body Problem: Collaborative Visual Task Completion. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00685"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Jain, U., Weihs, L., Kolve, E., Farhadi, A., Lazebnik, S., Kembhavi, A., and Schwing, A.G. (2020, January 23\u201328). A Cordial Sync: Going Beyond Marginal Policies for Multi-agent Embodied Tasks. Proceedings of the Computer Vision-ECCV 2020-16th European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58558-7_28"},{"key":"ref_67","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, January 10\u201315). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. 
Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_68","first-page":"741","article-title":"Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model","volume":"Volume 33","author":"Larochelle","year":"2020","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/20\/7785\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:53:39Z","timestamp":1760144019000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/20\/7785"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,13]]},"references-count":68,"journal-issue":{"issue":"20","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["s22207785"],"URL":"https:\/\/doi.org\/10.3390\/s22207785","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,13]]}}}