{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T17:25:24Z","timestamp":1778347524687,"version":"3.51.4"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T00:00:00Z","timestamp":1675382400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T00:00:00Z","timestamp":1675382400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005416","name":"Norges Forskningsr\u00e5d","doi-asserted-by":"publisher","award":["270940"],"award-info":[{"award-number":["270940"]}],"id":[{"id":"10.13039\/501100005416","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012704","name":"University of Agder","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100012704","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The Tsetlin Machine is a recent supervised learning algorithm that has obtained competitive accuracy- and resource usage results across several benchmarks. It has been used for convolution, classification, and regression, producing interpretable rules in propositional logic. In this paper, we introduce the first framework for reinforcement learning based on the Tsetlin Machine. Our framework integrates the value iteration algorithm with the regression Tsetlin Machine as the value function approximator. 
To obtain accurate off-policy state-value estimation, we propose a modified Tsetlin Machine feedback mechanism that adapts to the dynamic nature of value iteration. In particular, we show that the Tsetlin Machine is able to unlearn and recover from the misleading experiences that often occur at the beginning of training. A key challenge that we address is mapping the intrinsically continuous nature of state-value learning to the propositional Tsetlin Machine architecture, leveraging probabilistic updates. While accurate off-policy, this mechanism learns significantly slower than neural networks on-policy. However, by introducing multi-step temporal-difference learning in combination with high-frequency propositional logic patterns, we are able to close the performance gap. Several gridworld instances document that our framework can outperform comparable neural network models, despite being based on simple one-level AND-rules in propositional logic. Finally, we propose how the class of models learnt by our Tsetlin Machine for the gridworld problem can be translated into a more understandable graph structure. 
The graph structure captures the state-value function approximation and the corresponding policy found by the Tsetlin Machine.<\/jats:p>","DOI":"10.1007\/s10489-022-04297-3","type":"journal-article","created":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T16:03:59Z","timestamp":1675440239000},"page":"8596-8613","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Off-policy and on-policy reinforcement learning with the Tsetlin machine"],"prefix":"10.1007","volume":"53","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2699-9903","authenticated-orcid":false,"given":"Saeed","family":"Rahimi Gorji","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7287-030X","authenticated-orcid":false,"given":"Ole-Christoffer","family":"Granmo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,2,3]]},"reference":[{"key":"4297_CR1","unstructured":"Abeyrathna KD, Bhattarai B, Goodwin M, Gorji S, Granmo OC, Jiao L, Saha R, Yadav RK (2021) Massively parallel and asynchronous Tsetlin machine architecture supporting almost constant-time scaling. In: The thirty-eighth international conference on machine learning (ICML 2021). ICML"},{"key":"4297_CR2","doi-asserted-by":"crossref","unstructured":"Abeyrathna KD, Granmo OC, Zhang X, Jiao L, Goodwin M (2019) The regression Tsetlin machine - a novel approach to interpretable non-linear regression. Phil Trans R Soc A, vol 378","DOI":"10.1098\/rsta.2019.0165"},{"key":"4297_CR3","doi-asserted-by":"publisher","first-page":"8233","DOI":"10.1109\/ACCESS.2021.3049569","volume":"9","author":"KD Abeyrathna","year":"2021","unstructured":"Abeyrathna KD, Granmo OC, Goodwin M (2021) Extending the Tsetlin machine with Integer-Weighted clauses for increased interpretability. 
IEEE Access 9:8233\u20138248","journal-title":"IEEE Access"},{"key":"4297_CR4","doi-asserted-by":"publisher","first-page":"115134","DOI":"10.1109\/ACCESS.2019.2935416","volume":"7","author":"GT Berge","year":"2019","unstructured":"Berge GT, Granmo OC, Tveit T, Goodwin M, Jiao L, Matheussen B (2019) Using the Tsetlin machine to learn human-interpretable rules for high-accuracy text categorization with medical applications. IEEE Access 7:115134\u2013115146","journal-title":"IEEE Access"},{"key":"4297_CR5","doi-asserted-by":"crossref","unstructured":"Bhattarai B, Granmo OC, Jiao L (2022) Word-level human interpretable scoring mechanism for novel text detection using Tsetlin machines. Appl Intell:1\u201325","DOI":"10.1007\/s10489-022-03281-1"},{"key":"4297_CR6","doi-asserted-by":"crossref","unstructured":"Ernst D, Geurts P, Wehenkel L (2003) Iteratively extending time horizon reinforcement learning. In: Machine learning: ECML 2003, pp 96-107. Springer Berlin Heidelberg","DOI":"10.1007\/978-3-540-39857-8_11"},{"key":"4297_CR7","first-page":"503","volume":"6","author":"D Ernst","year":"2005","unstructured":"Ernst D, Geurts P, Wehenkel L (2005) Tree-based batch mode reinforcement learning. J Mach Learn Res 6:503\u2013 556","journal-title":"J Mach Learn Res"},{"key":"4297_CR8","doi-asserted-by":"publisher","unstructured":"Ernst D, Glavic M, Geurts P, Wehenkel L (2005) Approximate value iteration in the reinforcement learning context. Appl Electr Power Syst Contr Int J Emerging Electr Power Syst, vol 3. https:\/\/doi.org\/10.2202\/1553-779X.1066","DOI":"10.2202\/1553-779X.1066"},{"key":"4297_CR9","doi-asserted-by":"crossref","unstructured":"Giri C, Granmo OC, van Hoof H, Blakely CD (2022) Logic-based ai for interpretable board game winner prediction with Tsetlin machine. In: Advances in computational intelligence - IEEE world congress on computational intelligence. 
IEEE, WCCI 2022, Padua, Italy, 18-23 Jul 2022","DOI":"10.1109\/IJCNN55064.2022.9892796"},{"key":"4297_CR10","unstructured":"Granmo OC (2018) The Tsetlin machine - a game theoretic bandit driven approach to optimal pattern recognition with propositional logic. arXiv:1804.01508"},{"key":"4297_CR11","unstructured":"Lavrova DS, Eliseev NN (2020) Network attacks detection based on Tsetlin machine. Inf Secur Prob Comput Syst:17\u201323"},{"key":"4297_CR12","doi-asserted-by":"publisher","unstructured":"Lei J, Rahman T, Shafik R, Wheeldon A, Yakovlev A, Granmo OC, Kawsar F, Mathur A (2021) Low-power audio keyword spotting using Tsetlin machines. J Low Power Electr Appl, vol 11(2). https:\/\/doi.org\/10.3390\/jlpea11020018, https:\/\/www.mdpi.com\/2079-9268\/11\/2\/18","DOI":"10.3390\/jlpea11020018"},{"key":"4297_CR13","unstructured":"Phoulady A, Granmo OC, Rahimi Gorji S, Phoulady HA (2020) The weighted Tsetlin machine: compressed representations with clause weighting. In: Ninth international workshop on statistical relational AI (starAI 2020)"},{"key":"4297_CR14","doi-asserted-by":"crossref","unstructured":"Rahimi Gorji S, Granmo OC, Glimsdal S, Edwards J, Goodwin M (2020) Increasing the inference and learning speed of Tsetlin machines with clause indexing. In: Trends in artificial intelligence theory and applications. Artificial intelligence practices. Springer international publishing, cham, pp 695\u2013708","DOI":"10.1007\/978-3-030-55789-8_60"},{"key":"4297_CR15","doi-asserted-by":"crossref","unstructured":"Rahimi Gorji S, Granmo OC, Phoulady A, Goodwin M (2019) A Tsetlin machine with multigranular clauses. In: Lecture notes in computer science: proceedings of the thirty-ninth international conference on innovative techniques and applications of artificial intelligence (SGAI-2019). 
Springer international publishing, vol 11927","DOI":"10.1007\/978-3-030-34885-4_11"},{"key":"4297_CR16","doi-asserted-by":"publisher","unstructured":"Rahman T, Shafik R, Granmo OC, Yakovlev A (2022) Resilient biomedical systems design under noise using logic-based machine learning. Frontiers Contr Eng, vol 2. https:\/\/doi.org\/10.3389\/fcteg.2021.778118","DOI":"10.3389\/fcteg.2021.778118"},{"key":"4297_CR17","doi-asserted-by":"publisher","first-page":"42200","DOI":"10.1109\/ACCESS.2020.2976199","volume":"8","author":"R Roscher","year":"2020","unstructured":"Roscher R, Bohn B, Duarte MF, Garcke J (2020) Explainable machine learning for scientific insights and discoveries. IEEE Access 8:42200\u201342216. https:\/\/doi.org\/10.1109\/ACCESS.2020.2976199","journal-title":"IEEE Access"},{"key":"4297_CR18","unstructured":"Rosenstein M, Barto A (2002) Supervised learning combined with an actor-critic architecture title2: Tech rep USA"},{"issue":"5","key":"4297_CR19","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","volume":"1","author":"C Rudin","year":"2019","unstructured":"Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Mach Intell 1 (5):206\u2013215. https:\/\/doi.org\/10.1038\/s42256-019-0048-x","journal-title":"Nature Mach Intell"},{"key":"4297_CR20","unstructured":"Rummery GA, Niranjan M (1994) On-line q-learning using connectionist systems. Tech Rep"},{"key":"4297_CR21","doi-asserted-by":"crossref","unstructured":"Saha R, Granmo OC, Goodwin M (2021) Using Tsetlin machine to discover interpretable rules in natural language processing applications. Expert Syst. 
https:\/\/onlinelibrary.wiley.com\/doi\/abs\/10.1111\/exsy.12873","DOI":"10.1111\/exsy.12873"},{"key":"4297_CR22","doi-asserted-by":"crossref","unstructured":"Saha R, Granmo OC, Zadorozhny VI, Goodwin M (2022) A relational Tsetlin machine with applications to natural language understanding. J Intell Inf Syst:1\u201328","DOI":"10.1007\/s10844-021-00682-5"},{"key":"4297_CR23","doi-asserted-by":"crossref","unstructured":"Shafik R, Wheeldon A, Yakovlev A (2020) Explainability and dependability analysis of learning automata based AI hardware. In: IEEE 26th international symposium on on-line testing and robust system design (IOLTS). IEEE, Naples, Italy","DOI":"10.1109\/IOLTS50870.2020.9159725"},{"key":"4297_CR24","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. a bradford book, Cambridge, MA, USA"},{"issue":"11","key":"4297_CR25","doi-asserted-by":"publisher","first-page":"1134","DOI":"10.1145\/1968.1972","volume":"27","author":"LG Valiant","year":"1984","unstructured":"Valiant LG (1984) A theory of the learnable. Commun ACM 27(11):1134\u20131142. https:\/\/doi.org\/10.1145\/1968.1972","journal-title":"Commun ACM"},{"key":"4297_CR26","doi-asserted-by":"crossref","unstructured":"Wheeldon A, Shafik R, Rahman T, Lei J, Yakovlev A, Granmo OC (2020) Learning automata based energy-efficient ai hardware design for IoT. Philosophical transactions of the royal society a","DOI":"10.1098\/rsta.2019.0593"},{"key":"4297_CR27","doi-asserted-by":"crossref","unstructured":"Yadav RK, Jiao L, Granmo OC, Goodwin M (2021) Human-level interpretable learning for aspect-based sentiment analysis. In: Proceedings of AAAI, Vancouver, Canada. AAAI","DOI":"10.1609\/aaai.v35i16.17671"},{"key":"4297_CR28","doi-asserted-by":"crossref","unstructured":"Zhang X, Jiao L, Granmo OC, Goodwin M (2021) On the convergence of Tsetlin machines for the IDENTITY- and NOT operators. 
IEEE Trans Pattern Anal Mach Intell","DOI":"10.1109\/TPAMI.2021.3085591"},{"issue":"5","key":"4297_CR29","doi-asserted-by":"publisher","first-page":"726","DOI":"10.1109\/TETCI.2021.3100641","volume":"5","author":"Y Zhang","year":"2021","unstructured":"Zhang Y, Ti\u0148o P, Leonardis A, Tang K (2021) A survey on neural network interpretability. IEEE Trans Emerging Topics Computat Intell 5(5):726\u2013742. https:\/\/doi.org\/10.1109\/TETCI.2021.3100641","journal-title":"IEEE Trans Emerging Topics Computat Intell"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-04297-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-022-04297-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-04297-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,30]],"date-time":"2023-04-30T09:31:56Z","timestamp":1682847116000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-022-04297-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,3]]},"references-count":29,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["4297"],"URL":"https:\/\/doi.org\/10.1007\/s10489-022-04297-3","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"value":"0924-669X","type":"print"},{"value":"1573-7497","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,3]]},"assertion":[{"value":"27 October 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 February 
2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}