{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T10:54:03Z","timestamp":1775732043658,"version":"3.50.1"},"reference-count":66,"publisher":"American Association for the Advancement of Science (AAAS)","issue":"98","content-domain":{"domain":["www.science.org"],"crossmark-restriction":true},"short-container-title":["Sci. Robot."],"published-print":{"date-parts":[[2025,1,22]]},"abstract":"<jats:p>Humans excel at applying learned behavior to unlearned situations. A crucial component of this generalization behavior is our ability to compose\/decompose a whole into reusable parts, an attribute known as compositionality. One of the fundamental questions in robotics concerns this characteristic: How can linguistic compositionality be developed concomitantly with sensorimotor skills through associative learning, particularly when individuals only learn partial linguistic compositions and their corresponding sensorimotor patterns? To address this question, we propose a brain-inspired neural network model that integrates vision, proprioception, and language into a framework of predictive coding and active inference on the basis of the free-energy principle. The effectiveness and capabilities of this model were assessed through various simulation experiments conducted with a robot arm. Our results show that generalization in learning to unlearned verb-noun compositions is significantly enhanced when training variations of task composition are increased. We attribute this to self-organized compositional structures in linguistic latent state space being influenced substantially by sensorimotor learning. Ablation studies show that visual attention and working memory are essential to accurately generate visuomotor sequences to achieve linguistically represented goals. These insights advance our understanding of mechanisms underlying development of compositionality through interactions of linguistic and sensorimotor experience.<\/jats:p>","DOI":"10.1126\/scirobotics.adp0751","type":"journal-article","created":{"date-parts":[[2025,1,22]],"date-time":"2025-01-22T18:58:12Z","timestamp":1737572292000},"update-policy":"https:\/\/doi.org\/10.34133\/aaas_crossmark","source":"Crossref","is-referenced-by-count":13,"title":["Development of compositionality through interactive learning of language and action of robots"],"prefix":"10.1126","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6036-651X","authenticated-orcid":true,"given":"Prasanna","family":"Vijayaraghavan","sequence":"first","affiliation":[{"name":"Okinawa Institute of Science and Technology, Okinawa, Japan."}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9725-6971","authenticated-orcid":true,"given":"Jeffrey Frederic","family":"Quei\u00dfer","sequence":"additional","affiliation":[{"name":"Okinawa Institute of Science and Technology, Okinawa, Japan."}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0712-145X","authenticated-orcid":true,"given":"Sergio Verduzco","family":"Flores","sequence":"additional","affiliation":[{"name":"Okinawa Institute of Science and Technology, Okinawa, Japan."}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9131-9206","authenticated-orcid":true,"given":"Jun","family":"Tani","sequence":"additional","affiliation":[{"name":"Okinawa Institute of Science and Technology, Okinawa, Japan."}]}],"member":"221","reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"N. Chomsky Syntactic Structures (Mouton and Co. 1957).","DOI":"10.1515\/9783112316009"},{"key":"e_1_3_2_3_2","unstructured":"G. Evans The Varieties of Reference (Oxford Univ. Press 1982)."},{"key":"e_1_3_2_4_2","unstructured":"G. Frege Collected Papers on Mathematics Logic and Philosophy (Wiley-Blackwell 1991)."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1026542332224"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0140525X16001837"},{"key":"e_1_3_2_7_2","first-page":"757","article-title":"Compositionality decomposed: How do neural networks generalise?","volume":"67","author":"Hupkes D.","year":"2020","unstructured":"D. Hupkes, V. Dankers, M. Mul, E. Bruni, Compositionality decomposed: How do neural networks generalise? J. Artif. Intel. Res. 67, 757\u2013795 (2020).","journal-title":"J. Artif. Intel. Res."},{"key":"e_1_3_2_8_2","unstructured":"C. Lynch A. Wahid J. Thompson T. Ding J. Betker R. Baruch T. Armstrong P. Florence Interactive language: Talking to robots in real time. arXiv:2210.06407 [cs.RO] (2022)."},{"key":"e_1_3_2_9_2","unstructured":"S. Nolfi On the unexpected abilities of large language models. arXiv:2308.09720 [cs.AI] (2023)."},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","unstructured":"M. Abdou A. Kulmizev D. Hershcovich S. Frank E. Pavlick A. Sogaard. Can language models encode perceptual structure without grounding? A case study in color. arXiv:2109.06129 [cs.CV] (2021).","DOI":"10.18653\/v1\/2021.conll-1.9"},{"key":"e_1_3_2_11_2","unstructured":"S. Yousefi L. Betthauser H. Hasanbeig R. Milli\u00e8re I. Momennejad Decoding in-context learning: Neuroscience-inspired analysis of representations in large language models. arXiv:2310.00313 [cs.CL] (2024)."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsta.2022.0041"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1038\/nrn2787"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog2706_2"},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"M. Tomasello \u201cThe usage-based theory of language acquistion\u201d in The Cambridge Handbook of Child Language E. L. Bavin Ed. (Cambridge Univ. Press 2009) pp. 69-87.","DOI":"10.1017\/CBO9780511576164.005"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1162\/1064546053278973"},{"key":"e_1_3_2_17_2","unstructured":"J. Piaget The Language and Thought of the Child (Meridian 1955)."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cogbrainres.2005.02.020"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cortex.2017.10.021"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1038\/nrn2811"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","unstructured":"P. Oudeyer G. Kachergis W. Schueller Computational and robotic models of early language development: A review. arXiv:1903.10246 [cs.CL] (2019).","DOI":"10.4324\/9781315110622-5"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/0167-2789(90)90087-6"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9507.1992.tb00135.x"},{"key":"e_1_3_2_24_2","unstructured":"M. Tomasello First Verbs: A Case Study of Early Grammatical Development (Cambridge Univ. Press 2009)."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2018.02.004"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","unstructured":"L. Raggioli A. Cangelosi Embodied attention in word-object mapping: A developmental cognitive robotics model in 2022 IEEE International Conference on Development and Learning (ICDL) (IEEE 2022) pp. 156\u2013163.","DOI":"10.1109\/ICDL53763.2022.9962189"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1177\/105971230501300102"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAMD.2010.2053034"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1098\/rstb.2017.0131"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1080\/09540091.2017.1318357"},{"key":"e_1_3_2_31_2","unstructured":"A. Akakzia C. Colas P. Y. Oudeyer M. Chetouani O. Sigaud Grounding language to autonomously-acquired skills via goal generation poster presented at the Ninth International Conference on Learning Representations (ICLR) 3 to 7 May 2021; https:\/\/iclr.cc\/virtual\/2021\/poster\/3190."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2852838"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1038\/4580"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1098\/rstb.2008.0300"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0006421"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00422-010-0364-z"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2011.00218"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.3390\/e22050564"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.3390\/e24040469"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco_a_01412"},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","unstructured":"S. R. Sehon Goal-directed action and teleological explanation in Causation and Explanation Topics in Contemporary Philosophy J. Keim Campbell M. O\u2019Rourke H. S. Silverstein Eds. (MIT Press 2007) pp. 155\u2013170.","DOI":"10.7551\/mitpress\/1753.003.0010"},{"key":"e_1_3_2_42_2","first-page":"111","article-title":"One-year-old infants use teleological representations of actions productively","volume":"27","author":"Csibra G.","year":"2003","unstructured":"G. Csibra, S. B\u00edr\u00f3, O. Ko\u00f3s, G. Gergely, One-year-old infants use teleological representations of actions productively. Cogn. Sci. 27, 111\u2013133 (2003).","journal-title":"Cogn. Sci."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco_a_01228"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cell.2020.09.031"},{"key":"e_1_3_2_45_2","unstructured":"T. Ito T. Klinger D. Schultz J. Murray M. Cole M. Rigotti Compositional generalization through abstract representations in human and artificial neural networks in vol. 35 of Advances in Neural Information Processing Systems S. Koyejo S. Mohamed A. Agarwal D. Belgrave K. Cho A. Oh Eds. (Curran Associates 2022) pp. 32225\u201332239."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCA.2003.809171"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2004.05.007"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1162\/089976698300017467"},{"key":"e_1_3_2_49_2","first-page":"28","article-title":"The generalization of \u2019student\u2019s\u2019 problem when several different population variances are involved","volume":"34","author":"Welch B. L.","year":"1947","unstructured":"B. L. Welch, The generalization of \u2019student\u2019s\u2019 problem when several different population variances are involved. Biometrika 34, 28\u201335 (1947).","journal-title":"Biometrika"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2024.1353870"},{"key":"e_1_3_2_51_2","first-page":"3","article-title":"Poverty of stimulus: Unfinished business","volume":"33","author":"Chomsky N.","year":"2012","unstructured":"N. Chomsky, Poverty of stimulus: Unfinished business. Studies Chin. Linguistics 33, 3\u201316 (2012).","journal-title":"Studies Chin. Linguistics"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.3389\/fnbot.2020.00061"},{"key":"e_1_3_2_53_2","unstructured":"A. Radford J. W. Kim C. Hallacy A. Ramesh G. Goh S. Agarwal G. Sastry A. Askell P. Mishkin J. Clark G. Krueger I. Sutskever Learning transferable visual models from natural language supervision. arXiv:2103.00020 [cs.CV] (2021)."},{"key":"e_1_3_2_54_2","doi-asserted-by":"crossref","unstructured":"M. G. C. A. Cimino F. A. Galatolo G. Vaglini Generating images from caption and vice versa via clip-guided generative latent space search in IMPROVE 2021: Proceedings of the International Conference on Image Processing and Vision Engineering (ACM 2021) pp. 166\u2013174.","DOI":"10.5220\/0010503701660174"},{"key":"e_1_3_2_55_2","unstructured":"J.-B. Alayrac J. Donahue P. Luc A. Miech I. Barr Y. Hasson K. Lenc A. Mensch K. Millican M. Reynolds R. Ring E. Rutherford S. Cabi T. Han Z. Gong S. Samangooei M. Monteiro J. L. Menick S. Borgeaud A. Brock A. Nematzadeh S. Sharifzadeh M. Bi\u0144kowski R. Barreira O. Vinyals A. Zisserman K. Simonyan Flamingo: A visual language model for few-shot learning in vol. 35 of Advances in Neural Information Processing Systems S. Koyejo S. Mohamed A. Agarwal D. Belgrave K. Cho A. Oh Eds. (Curran Associates 2022) pp. 23716\u201323736."},{"key":"e_1_3_2_56_2","unstructured":"M. Ahn A. Brohan N. Brown Y. Chebotar O. Cortes B. David C. Finn C. Fu K. Gopalakrishnan K. Hausman A. Herzog D. Ho J. Hsu J. Ibarz B. Ichter A. Irpan E. Jang R. J. Ruano K. Jeffrey S. Jesmonth N. J. Joshi R. Julian D. Kalashnikov Y. Kuang K.-H. Lee S. Levine Y. Lu L. Luu C. Parada P. Pastor J. Quiambao K. Rao J. Rettinghouse D. Reyes P. Sermanet N. Sievers C. Tan A. Toshev V. Vanhoucke F. Xia T. Xiao P. Xu S. Xu M. Yan A. Zeng Do as I can not as I say: Grounding language in robotic affordances. arXiv:2204.01691 [cs.RO] (2022)."},{"key":"e_1_3_2_57_2","unstructured":"D. Driess F. Xia M. S. M. Sajjadi C. Lynch A. Chowdhery B. Ichter A. Wahid J. Tompson Q. Vuong T. Yu W. Huang Y. Chebotar P. Sermanet D. Duckworth S. Levine V. Vanhoucke K. Hausman M. Toussaint K. Greff A. Zeng I. Mordatch P. Florence PaLM-E: An embodied multimodal language model. arXiv:2303.03378 [cs.LG] (2023)."},{"key":"e_1_3_2_58_2","unstructured":"A. Brohan N. Brown J. Carbajal Y. Chebotar X. Chen K. Choromanski T. Ding D. Driess A. Dubey C. Finn P. Florence C. Fu M. G. Arenas K. Gopalakrishnan K. Han K. Hausman A. Herzog J. Hsu B. Ichter A. Irpan N. Joshi R. Julian D. Kalashnikov Y. Kuang I. Leal L. Lee Tsang-Wei Edward Lee S. Levine Y. Lu H. Michalewski I. Mordatch K. Pertsch K. Rao K. Reymann M. Ryoo G. Salazar P. Sanketi P. Sermanet J. Singh A. Singh R. Soricut H. Tran V. Vanhoucke Q. Vuong A. Wahid S. Welker P. Wohlhart J. Wu F. Xia T. Xiao P. Xu S. Xu T. Yu B. Zitkovich Rt-2: Vision-language-action models transfer web knowledge to robotic control. arXiv:2307.15818 [cs.RO] (2023)."},{"key":"e_1_3_2_59_2","unstructured":"D. J. Chalmers The Conscious Mind: In Search of a Fundamental Theory (Oxford Paperbacks 1997)."},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3392663"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2023.10.002"},{"key":"e_1_3_2_62_2","unstructured":"T. Yoshida A. Masumori T. Ikegami From text to motion: Grounding gpt-4 in a humanoid robot \u201calter3.\u201d arXiv:2312.06571 [cs.RO] (2023)."},{"key":"e_1_3_2_63_2","unstructured":"M. Jaderberg K. Simonyan A. Zisserman K. Kavukcuoglu Spatial transformer networks in vol. 28 of Advances in Neural Information Processing Systems C. Cortes N. Lawrence D. Lee M. Sugiyama R. Garnett Eds. (Curran Associates 2015) pp. 2017\u20132025."},{"key":"e_1_3_2_64_2","first-page":"335","article-title":"The hierarchical and functional connectivity of higher-order cognitive mechanisms: Neurorobotic model to investigate the stability and flexibility of working memory","volume":"7","author":"Shibata Alnajjar F.","year":"2013","unstructured":"F. Shibata Alnajjar, Y. Yamashita, J. Tani, The hierarchical and functional connectivity of higher-order cognitive mechanisms: Neurorobotic model to investigate the stability and flexibility of working memory. Front. Neurorobot. 7, 335\u2013346 (2013).","journal-title":"Front. Neurorobot."},{"key":"e_1_3_2_65_2","unstructured":"C. Bishop Pattern Recognition and Machine Learning (Information Science and Statistics (Springer-Verlag 2006)."},{"key":"e_1_3_2_66_2","unstructured":"D. P. Kingma J. Ba Adam: A method for stochastic optimization. arXiv:1412.6980 [cs.LG] (2017)."},{"key":"e_1_3_2_67_2","unstructured":"R. Pascanu T. Mikolov Y. Bengio On the difficulty of training recurrent neural networks in Proceedings of the 30th International Conference on Machine Learning S. Dasgupta D. McAllester Eds. (MLResearchPress 2013) pp. 1310\u20131318."}],"container-title":["Science Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.science.org\/doi\/pdf\/10.1126\/scirobotics.adp0751","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,22]],"date-time":"2025-01-22T18:58:38Z","timestamp":1737572318000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.science.org\/doi\/10.1126\/scirobotics.adp0751"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,22]]},"references-count":66,"journal-issue":{"issue":"98","published-print":{"date-parts":[[2025,1,22]]}},"alternative-id":["10.1126\/scirobotics.adp0751"],"URL":"https:\/\/doi.org\/10.1126\/scirobotics.adp0751","relation":{},"ISSN":["2470-9476"],"issn-type":[{"value":"2470-9476","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,22]]},"assertion":[{"value":"2024-03-25","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"eadp0751"}}