{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,19]],"date-time":"2025-05-19T18:40:02Z","timestamp":1747680002617,"version":"3.40.5"},"reference-count":28,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T00:00:00Z","timestamp":1740960000000},"content-version":"vor","delay-in-days":61,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,2,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of (total functional) transductions. We do so using variants of RASP, a programming language designed to help people \u201cthink like transformers,\u201d as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence transductions and show that it computes exactly the first-order rational transductions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular transductions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular transductions. Finally, we show that masked average-hard attention transformers can simulate S-RASP.<\/jats:p>","DOI":"10.1162\/tacl_a_00736","type":"journal-article","created":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T15:46:43Z","timestamp":1741016803000},"page":"200-219","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":0,"title":["Transformers as Transducers"],"prefix":"10.1162","volume":"13","author":[{"given":"Lena","family":"Strobl","sequence":"first","affiliation":[{"name":"Ume\u00e5 University, Sweden. lena.strobl@umu.se"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dana","family":"Angluin","sequence":"additional","affiliation":[{"name":"Yale University, USA. dana.angluin@yale.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Chiang","sequence":"additional","affiliation":[{"name":"University of Notre Dame, USA. dchiang@nd.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan","family":"Rawski","sequence":"additional","affiliation":[{"name":"San Jos\u00e9 State University, USA. jon.rawski@sjsu.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ashish","family":"Sabharwal","sequence":"additional","affiliation":[{"name":"Allen Institute for AI, USA. ashishs@allenai.org"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2025,2,28]]},"reference":[{"key":"2025051914250880300_bib1","unstructured":"Eric\n              Bakovic\n            \n          . 2000. Harmony, Dominance, and Control. Ph.D. thesis, Rutgers, The State University of New Jersey. 10.7282\/T3TQ60BJ"},{"key":"2025051914250880300_bib2","article-title":"Logical languages accepted by transformer encoders with hard attention","volume-title":"Proceedings of the Twelfth International Conference on Learning Representations (ICLR)","author":"Barcel\u00f3","year":"2024"},{"article-title":"Polyregular functions","year":"2018","author":"Boja\u0144czyk","key":"2025051914250880300_bib3"},{"key":"2025051914250880300_bib4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3531130.3533326","article-title":"Transducers of polynomial growth","volume-title":"Proceedings of the 37th Annual ACM\/IEEE Symposium on Logic in Computer Science (LICS)","author":"Boja\u0144czyk","year":"2022"},{"key":"2025051914250880300_bib5","doi-asserted-by":"publisher","first-page":"113:1\u2013113:14","DOI":"10.4230\/LIPIcs.ICALP.2020.113","article-title":"Single-use automata and transducers for infinite alphabets","volume-title":"Proceedings of the 47th International Colloquium on Automata, Languages, and Programming (ICALP)","author":"Boja\u0144czyk","year":"2020"},{"key":"2025051914250880300_bib6","doi-asserted-by":"publisher","first-page":"160","DOI":"10.4230\/LIPIcs.CSL.2015.160","article-title":"Aperiodic two-way transducers and FO-transductions","volume-title":"24th EACSL Annual Conference on Computer Science Logic (CSL)","author":"Carton","year":"2015"},{"key":"2025051914250880300_bib7","doi-asserted-by":"publisher","first-page":"7654","DOI":"10.18653\/v1\/2022.acl-long.527","article-title":"Overcoming a theoretical limitation of self- attention","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)","author":"Chiang","year":"2022"},{"key":"2025051914250880300_bib8","article-title":"Neural networks and the Chomsky hierarchy","volume-title":"Proceedings of the Eleventh International Conference on Learning Representations (ICLR)","author":"Del\u00e9tang","year":"2023"},{"key":"2025051914250880300_bib9","doi-asserted-by":"publisher","first-page":"387","DOI":"10.1145\/2933575.2934520","article-title":"First-order definability of rational transductions: An algebraic approach","volume-title":"Proceedings of the 31st Annual ACM\/IEEE Symposium on Logic in Computer Science (LICS)","author":"Filiot","year":"2016"},{"key":"2025051914250880300_bib10","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1007\/BF01744431","article-title":"Parity, circuits, and the polynomial-time hierarchy","volume":"17","author":"Furst","year":"1984","journal-title":"Mathematical Systems Theory"},{"key":"2025051914250880300_bib11","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1162\/tacl_a_00306","article-title":"Theoretical limitations of self-attention in neural sequence models","volume":"8","author":"Hahn","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025051914250880300_bib12","doi-asserted-by":"publisher","first-page":"800","DOI":"10.1162\/tacl_a_00490","article-title":"Formal language recognition by hard attention Transformers: Perspectives from circuit complexity","volume":"10","author":"Hao","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"volume-title":"Counter-free Automata","year":"1971","author":"McNaughton","key":"2025051914250880300_bib13"},{"key":"2025051914250880300_bib14","article-title":"The expressive power of transformers with chain of thought","volume-title":"Proceedings of the Twelfth International Conference on Learning Representations (ICLR)","author":"Merrill","year":"2024"},{"key":"2025051914250880300_bib15","doi-asserted-by":"publisher","first-page":"843","DOI":"10.1162\/tacl_a_00493","article-title":"Saturated transformers are constant-depth threshold circuits","volume":"10","author":"Merrill","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"issue":"2","key":"2025051914250880300_bib16","first-page":"269","article-title":"Finite-state transducers in language and speech processing","volume":"23","author":"Mohri","year":"1997","journal-title":"Computational Linguistics"},{"article-title":"Two-way transducers with planar behaviours are aperiodic","year":"2021","author":"Nguy\u1ec5n","key":"2025051914250880300_bib17"},{"article-title":"Two-way automata and transducers with planar behaviours are aperiodic","year":"2023","author":"Nguy\u1ec5n","key":"2025051914250880300_bib18"},{"key":"2025051914250880300_bib19","first-page":"75:1\u201375:35","article-title":"Attention is Turing- complete","volume":"22","author":"P\u00e9rez","year":"2021","journal-title":"Journal of Machine Learning Research"},{"key":"2025051914250880300_bib20","doi-asserted-by":"publisher","DOI":"10.4230\/LIPIcs.ICALP.2020.135","article-title":"Implicit automata in typed \u03bb-calculi I: aperiodicity in a non-commutative logic","volume-title":"47th International Colloquium on Automata, Languages, and Programming (ICALP)","author":"Pradic","year":"2020"},{"volume-title":"Computational Approaches to Morphology and Syntax","year":"2007","author":"Roark","key":"2025051914250880300_bib21"},{"issue":"2","key":"2025051914250880300_bib22","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1016\/S0019-9958(65)90108-7","article-title":"On finite monoids having only trivial subgroups","volume":"8","author":"Sch\u00fctzenberger","year":"1965","journal-title":"Information and Control"},{"key":"2025051914250880300_bib23","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1162\/tacl_a_00663","article-title":"What formal languages can transformers express? A survey","volume":"12","author":"Strobl","year":"2024","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025051914250880300_bib24","doi-asserted-by":"publisher","first-page":"13003","DOI":"10.18653\/v1\/2023.findings-acl.824","article-title":"Challenging BIG-bench tasks and whether chain-of-thought can solve them","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Suzgun","year":"2023"},{"key":"2025051914250880300_bib25","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems 30 (NeurIPS)","author":"Vaswani","year":"2017"},{"key":"2025051914250880300_bib26","first-page":"11080","article-title":"Thinking like transformers","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Weiss","year":"2021"},{"key":"2025051914250880300_bib27","article-title":"Masked hard-attention transformers recognize exactly the star-free languages","volume-title":"Advances in Neural Information Processing 37 (NeurIPS)","author":"Yang","year":"2024"},{"key":"2025051914250880300_bib28","doi-asserted-by":"publisher","first-page":"3770","DOI":"10.18653\/v1\/2021.acl-long.292","article-title":"Self-attention networks can process bounded hierarchical languages","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP)","author":"Yao","year":"2021"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00736\/2506494\/tacl_a_00736.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00736\/2506494\/tacl_a_00736.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,19]],"date-time":"2025-05-19T18:25:19Z","timestamp":1747679119000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00736\/128187\/Transformers-as-Transducers"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":28,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00736","relation":{},"ISSN":["2307-387X"],"issn-type":[{"type":"electronic","value":"2307-387X"}],"subject":[],"published-other":{"date-parts":[[2025]]},"published":{"date-parts":[[2025]]}}}