{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T10:56:28Z","timestamp":1760784988279,"version":"3.40.3"},"publisher-location":"Cham","reference-count":18,"publisher":"Springer International Publishing","isbn-type":[{"type":"print","value":"9783030798758"},{"type":"electronic","value":"9783030798765"}],"license":[{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,7,5]],"date-time":"2021-07-05T00:00:00Z","timestamp":1625443200000},"content-version":"vor","delay-in-days":185,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We introduce a modular and transparent approach for augmenting the ability of reinforcement learning agents to comply with a given norm base. The normative supervisor module functions as both an event recorder and real-time compliance checker w.r.t. an external norm base. 
We have implemented this module with a theorem prover for defeasible deontic logic, in a reinforcement learning agent that we task with playing a \u201cvegan\u201d version of the arcade game Pac-Man.<\/jats:p>","DOI":"10.1007\/978-3-030-79876-5_32","type":"book-chapter","created":{"date-parts":[[2021,7,7]],"date-time":"2021-07-07T09:20:19Z","timestamp":1625649619000},"page":"565-576","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["A Normative Supervisor for Reinforcement Learning Agents"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5998-3273","authenticated-orcid":false,"given":"Emery","family":"Neufeld","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8004-6601","authenticated-orcid":false,"given":"Ezio","family":"Bartocci","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6947-8772","authenticated-orcid":false,"given":"Agata","family":"Ciabattoni","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9878-2762","authenticated-orcid":false,"given":"Guido","family":"Governatori","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,7,5]]},"reference":[{"key":"32_CR1","unstructured":"Aler Tubella, A., Dignum, V.: The glass box approach: Verifying contextual adherence to values. In: Proc. of AISafety@IJCAI: Workshop on Artificial Intelligence Safety co-located with the 28th International Joint Conference on Artificial Intelligence. CEUR Workshop Proceedings, vol. 2419. CEUR-WS.org (2019), http:\/\/ceur-ws.org\/Vol-2419\/paper_18.pdf"},{"key":"32_CR2","doi-asserted-by":"publisher","unstructured":"Aler Tubella, A., Theodorou, A., Dignum, F., Dignum, V.: Governance by glass-box: Implementing transparent moral bounds for AI behaviour. In: Proc. of IJCAI 2019: the 28th International Joint Conference on Artificial Intelligence. pp. 
5787\u20135793. ijcai.org (2019). https:\/\/doi.org\/10.24963\/ijcai.2019\/802","DOI":"10.24963\/ijcai.2019\/802"},{"key":"32_CR3","unstructured":"Andrighetto, G., Governatori, G., Noriega, P., van der Torre, L.W.N. (eds.): Normative Multi-Agent Systems, Dagstuhl Follow-Ups, vol. 4. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2013), http:\/\/drops.dagstuhl.de\/opus\/portals\/dfu\/index.php?semnr=13003"},{"key":"32_CR4","unstructured":"Boella, G., van der Torre, L.: Regulative and constitutive norms in normative multiagent systems. In: Proc. of KR 2004: the 9th International Conference on Principles of Knowledge Representation and Reasoning. pp. 255\u2013266. AAAI Press (2004), http:\/\/www.aaai.org\/Library\/KR\/2004\/kr04-028.php"},{"issue":"3","key":"32_CR5","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1109\/JPROC.2019.2898267","volume":"107","author":"P Bremner","year":"2019","unstructured":"Bremner, P., Dennis, L.A., Fisher, M., Winfield, A.F.T.: On proactive, transparent, and verifiable ethical reasoning for robots. Proc. IEEE 107(3), 541\u2013561 (2019). https:\/\/doi.org\/10.1109\/JPROC.2019.2898267","journal-title":"Proc. IEEE"},{"key":"32_CR6","doi-asserted-by":"crossref","unstructured":"Broersen, J., Dastani, M., Hulstijn, J., Huang, Z., van der Torre, L.: The boid architecture: conflicts between beliefs, obligations, intentions and desires. In: Proceedings of the fifth international conference on Autonomous agents. pp. 9\u201316 (2001)","DOI":"10.1145\/375735.375766"},{"key":"32_CR7","unstructured":"DeNero, J., Klein, D.: UC Berkeley CS188 intro to AI - course materials (2014)"},{"issue":"6","key":"32_CR8","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1007\/s10992-013-9295-1","volume":"42","author":"Guido Governatori","year":"2013","unstructured":"Governatori, Guido, Olivieri, Francesco, Rotolo, Antonino, Scannapieco, Simone: Computing Strong and Weak Permissions in Defeasible Logic. 
Journal of Philosophical Logic 42(6), 799\u2013829 (2013). https:\/\/doi.org\/10.1007\/s10992-013-9295-1","journal-title":"Journal of Philosophical Logic"},{"issue":"1","key":"32_CR9","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1007\/s10458-008-9030-4","volume":"17","author":"G Governatori","year":"2008","unstructured":"Governatori, G., Rotolo, A.: BIO logical agents: Norms, beliefs, intentions in defeasible logic. Journal of Autonomous Agents and Multi Agent Systems 17(1), 36\u201369 (2008). https:\/\/doi.org\/10.1007\/s10458-008-9030-4","journal-title":"Journal of Autonomous Agents and Multi Agent Systems"},{"key":"32_CR10","doi-asserted-by":"publisher","unstructured":"Hasanbeig, M., Kantaros, Y., Abate, A., Kroening, D., Pappas, G.J., Lee, I.: Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees. In: Proc. of CDC 2019: the 58th IEEE Conference on Decision and Control. pp. 5338\u20135343 (2019). https:\/\/doi.org\/10.1109\/CDC40024.2019.9028919","DOI":"10.1109\/CDC40024.2019.9028919"},{"key":"32_CR11","doi-asserted-by":"publisher","unstructured":"Lam, H.P., Governatori, G.: The making of SPINdle. In: Proc. of RuleML 2009: International Symposium on Rule Interchange and Applications. LNCS, vol.\u00a05858, pp. 315\u2013322. Springer, Heidelberg (2009). https:\/\/doi.org\/10.1007\/978-3-642-04985-9","DOI":"10.1007\/978-3-642-04985-9"},{"key":"32_CR12","doi-asserted-by":"publisher","unstructured":"Lam, H.P., Governatori, G.: Towards a model of UAVs navigation in urban canyon through defeasible logic. Journal of Logic and Computation 23(2), 373\u2013395 (2013). https:\/\/doi.org\/10.1007\/978-3-642-04985-9","DOI":"10.1007\/978-3-642-04985-9"},{"key":"32_CR13","unstructured":"Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. 
Journal of Machine Learning Research 17, 39:1\u201339:40 (2016), http:\/\/jmlr.org\/papers\/v17\/15-522.html"},{"key":"32_CR14","doi-asserted-by":"publisher","unstructured":"Noothigattu, R., Bouneffouf, D., Mattei, N., Chandra, R., Madan, P., Varshney, K.R., Campbell, M., Singh, M., Rossi, F.: Teaching AI agents ethical values using reinforcement learning and policy orchestration. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. pp. 6377\u20136381. International Joint Conferences on Artificial Intelligence Organization (7 2019). https:\/\/doi.org\/10.24963\/ijcai.2019\/891","DOI":"10.24963\/ijcai.2019\/891"},{"issue":"275","key":"32_CR15","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1093\/mind\/LXIX.275.289","volume":"69","author":"PH Nowell-Smith","year":"1960","unstructured":"Nowell-Smith, P.H., Lemmon, E.J.: Escapism: The logical basis of ethics. Mind 69(275), 289\u2013300 (1960)","journal-title":"Mind"},{"issue":"7676","key":"32_CR16","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T.P., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., Hassabis, D.: Mastering the game of Go without human knowledge. Nature 550(7676), 354\u2013359 (2017). https:\/\/doi.org\/10.1038\/nature24270","journal-title":"Nature"},{"key":"32_CR17","doi-asserted-by":"crossref","unstructured":"Von\u00a0Wright, G.H.: An essay in deontic logic and the general theory of action. Acta Philosophica Fennica 21 (1968)","DOI":"10.22201\/iifs.18704905e.1968.50"},{"key":"32_CR18","unstructured":"Watkins, C.J.C.H.: Learning from Delayed Rewards. Ph.D. 
thesis, King\u2019s College, Cambridge, UK (May 1989), http:\/\/www.cs.rhul.ac.uk\/~chrisw\/new_thesis.pdf"}],"container-title":["Lecture Notes in Computer Science","Automated Deduction \u2013 CADE 28"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-79876-5_32","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,3]],"date-time":"2023-01-03T05:31:26Z","timestamp":1672723886000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-79876-5_32"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"ISBN":["9783030798758","9783030798765"],"references-count":18,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-79876-5_32","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2021]]},"assertion":[{"value":"5 July 2021","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"CADE","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Automated Deduction","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2021","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"12 July 2021","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"15 July 2021","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference 
Information"}},{"value":"28","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"cade2021","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/www.cs.cmu.edu\/~mheule\/CADE28\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"EasyChair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"76","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"29","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"38% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference 
organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"5","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"2 invited papers and 7 system descriptions are also included.","order":10,"name":"additional_info_on_review_process","label":"Additional Info on Review Process","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}