{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T10:16:05Z","timestamp":1770977765036,"version":"3.50.1"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"ICFP","license":[{"start":{"date-parts":[[2023,8,30]],"date-time":"2023-08-30T00:00:00Z","timestamp":1693353600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Innovation Fund Denmark","award":["DIREC"],"award-info":[{"award-number":["DIREC"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2023,8,30]]},"abstract":"<jats:p>\n            The development process for reinforcement learning applications is still exploratory rather than systematic. This exploratory nature reduces reuse of specifications between applications and increases the chances of introducing programming errors. This paper takes a step towards systematizing the development of reinforcement learning applications. We introduce a formal specification of reinforcement learning problems and algorithms, with a particular focus on temporal difference methods and their definitions in backup diagrams. We further develop a test harness for a large class of reinforcement learning applications based on temporal difference learning, including SARSA and Q-learning. The entire development is rooted in functional programming methods; starting with pure specifications and denotational semantics, ending with property-based testing and using compositional interpreters for a domain-specific term language as a test oracle for concrete implementations. We demonstrate the usefulness of this testing method on a number of examples, and evaluate with mutation testing. We show that our test suite is effective in killing mutants (90% mutants killed for 75% of subject agents). More importantly, almost half of all mutants are killed by generic write-once-use-everywhere tests that apply to\n            <jats:italic>any<\/jats:italic>\n            reinforcement learning problem modeled using our library, without any additional effort from the programmer.\n          <\/jats:p>","DOI":"10.1145\/3607835","type":"journal-article","created":{"date-parts":[[2023,8,31]],"date-time":"2023-08-31T17:40:31Z","timestamp":1693503631000},"page":"125-158","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Formal Specification and Testing for Reinforcement Learning"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4776-883X","authenticated-orcid":false,"given":"Mahsa","family":"Varshosaz","sequence":"first","affiliation":[{"name":"IT University of Copenhagen, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1939-9053","authenticated-orcid":false,"given":"Mohsen","family":"Ghaffari","sequence":"additional","affiliation":[{"name":"IT University of Copenhagen, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5382-3949","authenticated-orcid":false,"given":"Einar Broch","family":"Johnsen","sequence":"additional","affiliation":[{"name":"University of Oslo, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0532-2685","authenticated-orcid":false,"given":"Andrzej","family":"W\u0105sowski","sequence":"additional","affiliation":[{"name":"IT University of Copenhagen, Denmark"}]}],"member":"320","published-online":{"date-parts":[[2023,8,31]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"CoachNet: An Adversarial Sampling Approach for Reinforcement Learning. In NeurIPS2019 Workshop on Safety and Robustness in Decision Making. arXiv. https:\/\/doi.org\/10","author":"Abolfathi Elmira Amirloo","year":"2021","unstructured":"Elmira Amirloo Abolfathi, Jun Luo, Peyman Yadmellat, and Kasra Rezaee. 2021. CoachNet: An Adversarial Sampling Approach for Reinforcement Learning. In NeurIPS2019 Workshop on Safety and Robustness in Decision Making. arXiv. https:\/\/doi.org\/10.48550\/ARXIV.2101.02649"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11797"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","unstructured":"Rajeev Alur Suguman Bansal Osbert Bastani and Kishor Jothimurugan. 2022. A Framework for\u00a0Transforming Specifications in\u00a0Reinforcement Learning. In Principles of Systems Design: Essays Dedicated to Thomas A. Henzinger on the Occasion of His 60th Birthday Jean-Fran\u00e7ois Raskin Krishnendu Chatterjee Laurent Doyen and Rupak Majumdar (Eds.) (Lecture Notes in Computer Science Vol. 13660). Springer. https:\/\/doi.org\/10.1007\/978-3-031-22337-2_29 10.1007\/978-3-031-22337-2_29","DOI":"10.1007\/978-3-031-22337-2_29"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468537"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/357766.351266"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11631"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/C-M.1978.218136"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460319.3464844"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2103.03938"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12107"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-17462-0_28"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/2789272.2886795"},{"key":"e_1_2_1_13_1","volume-title":"Bayesian data analysis","author":"Gelman Andrew","unstructured":"Andrew Gelman, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian data analysis. CRC press."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2044-8317.2011.02037.x"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1977.231145"},{"key":"e_1_2_1_16_1","volume-title":"Logically-Correct Reinforcement Learning. CoRR, abs\/1801.08099","author":"Hasanbeig Mohammadhosein","year":"2018","unstructured":"Mohammadhosein Hasanbeig, Alessandro Abate, and Daniel Kroening. 2018. Logically-Correct Reinforcement Learning. CoRR, abs\/1801.08099 (2018), arXiv:1801.08099. arxiv:1801.08099"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","unstructured":"Sandy Huang Nicolas Papernot Ian Goodfellow Yan Duan and Pieter Abbeel. 2017. Adversarial Attacks on Neural Network Policies. arXiv. https:\/\/doi.org\/10.48550\/ARXIV.1702.02284","DOI":"10.48550\/ARXIV.1702.02284"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3365365.3382216"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.4230\/LIPIcs.CONCUR.2020.3"},{"key":"e_1_2_1_20_1","volume-title":"Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d' Alch\u00e9-Buc","author":"Jothimurugan Kishor","unstructured":"Kishor Jothimurugan, Rajeev Alur, and Osbert Bastani. 2019. A Composable Specification Language for Reinforcement Learning Tasks. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d' Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). 32, Curran Associates, Inc.."},{"key":"e_1_2_1_21_1","volume-title":"Advances in Neural Information Processing Systems","author":"Jothimurugan Kishor","unstructured":"Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, and Rajeev Alur. 2021. Compositional Reinforcement Learning from Logical Specifications. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.). 34, Curran Associates, Inc., 10026\u201310039."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-13188-2_17"},{"key":"e_1_2_1_23_1","volume-title":"Proc. 24th International Conference on Artificial Intelligence and Statistics, Arindam Banerjee and Kenji Fukumizu (Eds.) (Proceedings of Machine Learning Research","volume":"1170","author":"Jothimurugan Kishor","year":"2021","unstructured":"Kishor Jothimurugan, Osbert Bastani, and Rajeev Alur. 2021. Abstract Value Iteration for Hierarchical Reinforcement Learning. In Proc. 24th International Conference on Artificial Intelligence and Statistics, Arindam Banerjee and Kenji Fukumizu (Eds.) (Proceedings of Machine Learning Research, Vol. 130). PMLR, 1162\u20131170."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-49674-9_8"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.301"},{"key":"e_1_2_1_26_1","volume-title":"JAGS, and Stan","author":"Kruschke John","unstructured":"John Kruschke. 2014. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/525"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556962"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-91265-9_8"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2635868.2635920"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1201\/9780429029608"},{"key":"e_1_2_1_33_1","volume-title":"Advances in Neural Information Processing Systems","author":"Oikarinen Tuomas","unstructured":"Tuomas Oikarinen, Wang Zhang, Alexandre Megretski, Luca Daniel, and Tsui-Wei Weng. 2021. Robust Deep Reinforcement Learning through Adversarial Loss. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.). 34, Curran Associates, 26156\u201326167."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534388"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/bs.adcom.2018.03.015"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/503272.503288"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE51524.2021.9678764"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICST53961.2022.00013"},{"key":"e_1_2_1_39_1","volume-title":"Safe Machine Learning workshop at ICLR","author":"Ruderman Avraham","year":"2019","unstructured":"Avraham Ruderman, Richard Everett, Bristy Sikder, Hubert Soyer, Jonathan Uesato, Ananya Kumar, Charlie Beattie, and Pushmeet Kohli. 2019. Uncovering Surprising Behaviors in Reinforcement Learning via Worst-case Analysis. In Safe Machine Learning workshop at ICLR 2019."},{"key":"e_1_2_1_40_1","unstructured":"G. A. Rummery and M. Niranjan. 1994. On-line Q-learning Using Connectionist Systems. Technical Report CUED\/F-INFENF\/TR https:\/\/cir.nii.ac.jp\/crid\/1573668924277769344"},{"key":"e_1_2_1_41_1","unstructured":"Stuart J Russell and Peter Norvig. 2016. Artificial intelligence: A modern approach. Pearson Education Limited."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1707.06347"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3560429"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238172"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022633531479"},{"key":"e_1_2_1_46_1","volume-title":"Barto","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). The MIT Press."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2022\/72"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19849-6_20"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510625"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE51524.2021.9678566"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ADPRL.2009.4927542"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3468891.3468897"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.8083298"},{"key":"e_1_2_1_54_1","unstructured":"Christopher John Cornish Hellaby Watkins. 1989. Learning from delayed rewards."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2019.05.046"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","unstructured":"Shaohua Zhang Shuang Liu Jun Sun Yuqi Chen Wenzhi Huang Jinyi Liu Jian Liu and Jianye Hao. 2021. FIGCPS: Effective Failure-inducing Input Generation for Cyber-Physical Systems with Deep Reinforcement Learning. In 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 555\u2013567. https:\/\/doi.org\/10.1109\/ASE51524.2021.9678832 10.1109\/ASE51524.2021.9678832","DOI":"10.1109\/ASE51524.2021.9678832"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00048"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2023.3269804"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3607835","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3607835","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:05Z","timestamp":1750178225000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3607835"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,30]]},"references-count":57,"journal-issue":{"issue":"ICFP","published-print":{"date-parts":[[2023,8,30]]}},"alternative-id":["10.1145\/3607835"],"URL":"https:\/\/doi.org\/10.1145\/3607835","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,30]]},"assertion":[{"value":"2023-08-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}