{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T19:23:29Z","timestamp":1773343409122,"version":"3.50.1"},"reference-count":72,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,8,16]],"date-time":"2021-08-16T00:00:00Z","timestamp":1629072000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2021,8,16]],"date-time":"2021-08-16T00:00:00Z","timestamp":1629072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/100000081","name":"Directorate for Education and Human Resources","doi-asserted-by":"publisher","award":["1726550"],"award-info":[{"award-number":["1726550"]}],"id":[{"id":"10.13039\/100000081","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Artif Intell Educ"],"published-print":{"date-parts":[[2022,6]]},"DOI":"10.1007\/s40593-021-00269-9","type":"journal-article","created":{"date-parts":[[2021,8,16]],"date-time":"2021-08-16T14:02:57Z","timestamp":1629122577000},"page":"454-500","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Leveraging Granularity: Hierarchical Reinforcement Learning for Pedagogical Policy Induction"],"prefix":"10.1007","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9401-4854","authenticated-orcid":false,"given":"Guojing","family":"Zhou","sequence":"first","affiliation":[]},
{"given":"Hamoon","family":"Azizsoltani","sequence":"additional","affiliation":[]},{"given":"Markel Sanz","family":"Ausin","sequence":"additional","affiliation":[]},{"given":"Tiffany","family":"Barnes","sequence":"additional","affiliation":[]},{"given":"Min","family":"Chi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,8,16]]},"reference":[{"issue":"1","key":"269_CR1","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1037\/0003-066X.48.1.35","volume":"48","author":"JR Anderson","year":"1993","unstructured":"Anderson, J. R. (1993). Problem solving and learning. American Psychologist, 48(1), 35.","journal-title":"American Psychologist"},{"issue":"2","key":"269_CR2","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1207\/s15327809jls0402_2","volume":"4","author":"JR Anderson","year":"1995","unstructured":"Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. The journal of the learning sciences, 4(2), 167\u2013207.","journal-title":"The journal of the learning sciences"},{"key":"269_CR3","unstructured":"Andrychowicz, M., Baker, B., & et al. (2018). Learning dexterous in-hand manipulation. arXiv:1808.00177."},{"key":"269_CR4","doi-asserted-by":"crossref","unstructured":"Azizsoltani, H., Kim, Y. J., Ausin, M. S., Barnes, T., & Chi, M. (2019). Unobserved is not equal to non-existent: Using gaussian processes to infer immediate rewards across contexts. IJCAI, 1974\u20131980.","DOI":"10.24963\/ijcai.2019\/273"},{"issue":"July","key":"269_CR5","doi-asserted-by":"publisher","first-page":"146","DOI":"10.1016\/j.engappai.2018.06.007","volume":"74","author":"H Azizsoltani","year":"2018","unstructured":"Azizsoltani, H., & Sadeghi, E. (2018). Adaptive sequential strategy for risk estimation of engineering systems using gaussian process regression active learning.
Engineering Applications of Artificial Intelligence, 74(July), 146\u2013165.","journal-title":"Engineering Applications of Artificial Intelligence"},{"issue":"1-2","key":"269_CR6","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/A:1022140919877","volume":"13","author":"AG Barto","year":"2003","unstructured":"Barto, A. G., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete event dynamic systems, 13(1-2), 41\u201377.","journal-title":"Discrete event dynamic systems"},{"issue":"552-557","key":"269_CR7","first-page":"1","volume":"2000","author":"J Beck","year":"2000","unstructured":"Beck, J., Woolf, B. P., & Beal, C. R. (2000). Advisor: a machine learning architecture for intelligent tutor construction. AAAI\/IAAI, 2000(552-557), 1\u20132.","journal-title":"AAAI\/IAAI"},{"key":"269_CR8","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1017\/CBO9780511840975.004","volume":"1","author":"S Chaiklin","year":"2003","unstructured":"Chaiklin, S., et al. (2003). The zone of proximal development in vygotsky\u2019s analysis of learning and instruction. Vygotsky\u2019s educational theory in cultural context, 1, 39\u201364.","journal-title":"Vygotsky\u2019s educational theory in cultural context"},{"key":"269_CR9","first-page":"409","volume":"158","author":"M Chi","year":"2007","unstructured":"Chi, M., & Vanlehn, K. (2007). Accelerated future learning via explicit instruction of a problem solving strategy. Frontiers In Artificial Intelligence And Applications, 158, 409.","journal-title":"Frontiers In Artificial Intelligence And Applications"},{"issue":"1","key":"269_CR10","first-page":"25","volume":"13","author":"M Chi","year":"2010","unstructured":"Chi, M., & VanLehn, K. (2010). Meta-cognitive strategy instruction in intelligent tutoring systems: how, when, and why. 
Educational Technology & Society, 13(1), 25\u201339.","journal-title":"Educational Technology & Society"},{"issue":"1-2","key":"269_CR11","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1007\/s11257-010-9093-1","volume":"21","author":"M Chi","year":"2011","unstructured":"Chi, M., VanLehn, K., Litman, D., & Jordan, P. (2011). Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical strategies. User Modeling and User-Adapted Interaction, 21 (1-2), 137\u2013180.","journal-title":"User Modeling and User-Adapted Interaction"},{"key":"269_CR12","unstructured":"Clement, B., Oudeyer, P. Y., & Lopes, M. (2016). A comparison of automatic teaching strategies for heterogeneous student populations. In EDM 16-9th international conference on educational data mining."},{"key":"269_CR13","unstructured":"Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. Irvington."},{"key":"269_CR14","doi-asserted-by":"crossref","unstructured":"Cuay\u00e1huitl, H., Dethlefs, N., Frommberger, L., Richter, K. F., & Bateman, J. (2010). Generating adaptive route instructions using hierarchical reinforcement learning. In International conference on spatial cognition (pp. 319\u2013334). Springer.","DOI":"10.1007\/978-3-642-14749-4_27"},{"key":"269_CR15","doi-asserted-by":"crossref","unstructured":"Doroudi, S., Aleven, V., & Brunskill, E. (2017). Robust evaluation matrix: Towards a more principled offline exploration of instructional policies. In Proceedings of the fourth (2017) ACM conference on learning@ scale (pp. 3\u201312).","DOI":"10.1145\/3051457.3051463"},{"issue":"4","key":"269_CR16","doi-asserted-by":"publisher","first-page":"568","DOI":"10.1007\/s40593-019-00187-x","volume":"29","author":"S Doroudi","year":"2019","unstructured":"Doroudi, S., Aleven, V., & Brunskill, E. (2019). Where\u2019s the reward? 
International Journal of Artificial Intelligence in Education, 29(4), 568\u2013620. https:\/\/doi.org\/10.1007\/s40593-019-00187-x.","journal-title":"International Journal of Artificial Intelligence in Education"},{"key":"269_CR17","first-page":"116","volume":"512","author":"ML Eaton","year":"1983","unstructured":"Eaton, M. L. (1983). Multivariate statistics: a vector space approach. John Wiley & Sons, Inc., 605 Third Ave., New York, NY 10158, USA, 1983, 512, 116\u2013117.","journal-title":"John Wiley & Sons, Inc., 605 Third Ave., New York, NY 10158, USA, 1983"},{"key":"269_CR18","volume-title":"An introduction to probability theory and its applications, Vol. 2","author":"W Feller","year":"2008","unstructured":"Feller, W. (2008). An introduction to probability theory and its applications Vol. 2. Hoboken: Wiley."},{"key":"269_CR19","unstructured":"Goldberg, P. W., Williams, C. K., & et al. (1998). Regression with input-dependent noise: a gaussian process treatment. In NIPS (pp. 493\u2013499)."},{"issue":"4","key":"269_CR20","doi-asserted-by":"publisher","first-page":"1261","DOI":"10.1109\/TIT.2005.844072","volume":"51","author":"D Guo","year":"2005","unstructured":"Guo, D., Shamai, S., & Verd\u00fa, S. (2005). Mutual information and minimum mean-square error in gaussian channels. IEEE Transactions on Information Theory, 51(4), 1261\u20131282.","journal-title":"IEEE Transactions on Information Theory"},{"key":"269_CR21","unstructured":"Haarnoja, T., Zhou, A., & et al. (2018). Soft actor-critic algorithms and applications. arXiv:1812.05905."},{"issue":"1","key":"269_CR22","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1007\/s10489-008-0115-1","volume":"31","author":"A Iglesias","year":"2009","unstructured":"Iglesias, A., Mart\u00ednez, P., Aler, R., & Fern\u00e1ndez, F. (2009). Learning teaching strategies in an adaptive and intelligent educational system through reinforcement learning. 
Applied Intelligence, 31(1), 89\u2013106.","journal-title":"Applied Intelligence"},{"issue":"4","key":"269_CR23","doi-asserted-by":"publisher","first-page":"266","DOI":"10.1016\/j.knosys.2009.01.007","volume":"22","author":"A Iglesias","year":"2009","unstructured":"Iglesias, A., Mart\u00ednez, P., Aler, R., & Fern\u00e1ndez, F. (2009). Reinforcement learning of pedagogical policies in adaptive and intelligent educational systems. Knowledge-Based Systems, 22(4), 266\u2013270.","journal-title":"Knowledge-Based Systems"},{"issue":"3","key":"269_CR24","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1007\/s11251-009-9102-0","volume":"38","author":"S Kalyuga","year":"2010","unstructured":"Kalyuga, S., & Renkl, A. (2010). Expertise reversal effect and its instructional implications: Introduction to the special issue. Instructional Science, 38 (3), 209\u2013215.","journal-title":"Instructional Science"},{"key":"269_CR25","unstructured":"Kulkarni, T. D., Narasimhan, K., Saeedi, A., & Tenenbaum, J. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in neural information processing systems (pp. 3675\u20133683)."},{"key":"269_CR26","unstructured":"Lillicrap, T. P., Hunt, J. J., & et al. (2015). Continuous control with deep reinforcement learning. arXiv:1509.02971."},{"key":"269_CR27","unstructured":"Liz, B., Dreyfus, T., Mason, J., Tsamir, P., Watson, A., & Zaslavsky, O. (2006). Exemplification in mathematics education. In Proceedings of the 30th conference of the international group for the psychology of mathematics education. ERIC, (Vol. 1 pp. 126\u2013154 )."},{"key":"269_CR28","unstructured":"Mandel, T., Liu, Y. E., Levine, S., Brunskill, E., & Popovic, Z. (2014). Offline policy evaluation across representations with applications to educational games. In Proceedings of the 2014 international conference on autonomous agents and multi-agent systems. 
International Foundation for Autonomous Agents and Multiagent Systems (pp. 1077\u20131084)."},{"key":"269_CR29","doi-asserted-by":"crossref","unstructured":"McLaren, B. M., van Gog, T., Ganoe, C., Yaron, D., & Karabinos, M. (2014). Exploring the assistance dilemma: Comparing instructional support in examples and problems. In Intelligent tutoring systems (pp. 354\u2013361). Springer.","DOI":"10.1007\/978-3-319-07221-0_44"},{"key":"269_CR30","doi-asserted-by":"crossref","unstructured":"McLaren, B. M., & Isotani, S. (2011). When is it best to learn with all worked examples?. In International conference on artificial intelligence in education (pp. 222\u2013229). Springer.","DOI":"10.1007\/978-3-642-21869-9_30"},{"key":"269_CR31","unstructured":"McLaren, B. M., Lim, S. J., & Koedinger, K. R. (2008). When and how often should worked examples be given to students? new results and a summary of the current state of research. In Cogsci (pp. 2176\u20132181)."},{"issue":"7540","key":"269_CR32","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., & et al. (2015). Human-level control through deep reinforcement learning. Nature, 518 (7540), 529.","journal-title":"Nature"},{"key":"269_CR33","doi-asserted-by":"crossref","unstructured":"Najar, A. S., Mitrovic, A., & McLaren, B. M. (2014). Adaptive support versus alternating worked examples and tutored problems: Which leads to better learning?. In UMAP (pp. 171\u2013182). Springer.","DOI":"10.1007\/978-3-319-08786-3_15"},{"issue":"4","key":"269_CR34","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1145\/3072959.3073602","volume":"36","author":"XB Peng","year":"2017","unstructured":"Peng, X. B., Berseth, G., Yin, K., & Van De Panne, M. (2017). 
Deeploco: Dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Transactions on Graphics (TOG), 36(4), 41.","journal-title":"ACM Transactions on Graphics (TOG)"},{"issue":"2","key":"269_CR35","doi-asserted-by":"publisher","first-page":"4064","DOI":"10.1016\/j.sbspro.2010.03.641","volume":"2","author":"P Phobun","year":"2010","unstructured":"Phobun, P., & Vicheanpanya, J. (2010). Adaptive intelligent tutoring systems for e-learning systems. Procedia-Social and Behavioral Sciences, 2(2), 4064\u20134069.","journal-title":"Procedia-Social and Behavioral Sciences"},{"issue":"6","key":"269_CR36","doi-asserted-by":"publisher","first-page":"1290","DOI":"10.1111\/cogs.12290","volume":"40","author":"AN Rafferty","year":"2016","unstructured":"Rafferty, A. N., Brunskill, E., Griffiths, T. L., & Shafto, P. (2016). Faster teaching via pomdp planning. Cognitive science, 40(6), 1290\u20131332.","journal-title":"Cognitive science"},{"key":"269_CR37","doi-asserted-by":"crossref","unstructured":"Rasmussen, C. E. (2004). Gaussian processes in machine learning. In Advanced lectures on machine learning (pp. 63\u201371). Springer.","DOI":"10.1007\/978-3-540-28650-9_4"},{"issue":"4","key":"269_CR38","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1080\/00220970209599510","volume":"70","author":"A Renkl","year":"2002","unstructured":"Renkl, A., Atkinson, R. K., Maier, U. H., & Staley, R. (2002). From example study to problem solving: Smooth transitions help learning. The Journal of Experimental Education, 70(4), 293\u2013315.","journal-title":"The Journal of Experimental Education"},{"key":"269_CR39","unstructured":"Rowe, J., Mott, B., & Lester, J. (2014). Optimizing player experience in interactive narrative planning: a modular reinforcement learning approach. In Tenth artificial intelligence and interactive digital entertainment conference."},{"key":"269_CR40","doi-asserted-by":"crossref","unstructured":"Rowe, J. P., & Lester, J. C. (2015). 
Improving student problem solving in narrative-centered learning environments: a modular reinforcement learning framework. In International conference on artificial intelligence in education (pp. 419\u2013428). Springer.","DOI":"10.1007\/978-3-319-19773-9_42"},{"key":"269_CR41","unstructured":"Ryan, M., & Reid, M. (2000). Learning to fly: an application of hierarchical reinforcement learning. In Proceedings of the 17th international conference on machine learning. Citeseer."},{"issue":"3","key":"269_CR42","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1007\/s11251-009-9107-8","volume":"38","author":"RJ Salden","year":"2010","unstructured":"Salden, R. J., Aleven, V., Schwonke, R., & Renkl, A. (2010). The expertise reversal effect and worked examples in tutored problem solving. Instructional Science, 38(3), 289\u2013307.","journal-title":"Instructional Science"},{"key":"269_CR43","unstructured":"Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2015). Prioritized experience replay. arXiv:1511.05952."},{"key":"269_CR44","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. In International conference on machine learning (pp. 1889\u20131897)."},{"key":"269_CR45","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347."},{"issue":"9-10","key":"269_CR46","doi-asserted-by":"publisher","first-page":"1569","DOI":"10.1007\/s10994-017-5650-8","volume":"106","author":"D Schwab","year":"2017","unstructured":"Schwab, D., & Ray, S. (2017). Offline reinforcement learning with task hierarchies. 
Machine Learning, 106(9-10), 1569\u20131598.","journal-title":"Machine Learning"},{"issue":"2","key":"269_CR47","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1016\/j.chb.2008.12.011","volume":"25","author":"R Schwonke","year":"2009","unstructured":"Schwonke, R., Renkl, A., Krieg, C., Wittwer, J., Aleven, V., & Salden, R. (2009). The worked-example effect: Not an artefact of lousy control conditions. Computers in Human Behavior, 25(2), 258\u2013266.","journal-title":"Computers in Human Behavior"},{"key":"269_CR48","doi-asserted-by":"crossref","unstructured":"Shen, S., Ausin, M. S., Mostafavi, B., & Chi, M. (2018). Improving learning & reducing time: a constrained action-based reinforcement learning approach. In Proceedings of the 26th conference on user modeling, adaptation and personalization (pp. 43\u201351). ACM.","DOI":"10.1145\/3209219.3209232"},{"key":"269_CR49","doi-asserted-by":"crossref","unstructured":"Shen, S., & Chi, M. (2016). Reinforcement learning: the sooner the better, or the later the better?. In Proceedings of the 2016 conference on user modeling adaptation and personalization (pp. 37\u201344). ACM.","DOI":"10.1145\/2930238.2930247"},{"key":"269_CR50","doi-asserted-by":"crossref","unstructured":"Shih, B., Koedinger, K. R., & Scheines, R. (2011). A response time model for bottom-out hints as worked examples. Handbook of educational data mining, 201\u2013212.","DOI":"10.1201\/b10274-17"},{"issue":"7587","key":"269_CR51","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., & et al. (2016). Mastering the game of go with deep neural networks and tree search. 
Nature, 529(7587), 484.","journal-title":"Nature"},{"issue":"6419","key":"269_CR52","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1126\/science.aar6404","volume":"362","author":"D Silver","year":"2018","unstructured":"Silver, D., Hubert, T., Schrittwieser, J., & et al. (2018). A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science, 362(6419), 1140\u20131144.","journal-title":"Science"},{"issue":"2","key":"269_CR53","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1037\/0022-006X.59.2.205","volume":"59","author":"RE Snow","year":"1991","unstructured":"Snow, R. E. (1991). Aptitude-treatment interaction as a framework for research on individual differences in psychotherapy. Journal of Consulting and Clinical Psychology, 59(2), 205.","journal-title":"Journal of Consulting and Clinical Psychology"},{"key":"269_CR54","doi-asserted-by":"crossref","unstructured":"Stamper, J. C., Eagle, M., Barnes, T., & Croy, M. (2011). Experimental evaluation of automatic hint generation for a logic tutor. In International conference on artificial intelligence in education (pp. 345\u2013352). Springer.","DOI":"10.1007\/978-3-642-21869-9_45"},{"issue":"1-2","key":"269_CR55","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/S0004-3702(99)00052-1","volume":"112","author":"RS Sutton","year":"1999","unstructured":"Sutton, R. S., Precup, D., & Singh, S. (1999). Between mdps and semi-mdps: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2), 181\u2013211.","journal-title":"Artificial Intelligence"},{"issue":"1","key":"269_CR56","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1207\/s1532690xci0201_3","volume":"2","author":"J Sweller","year":"1985","unstructured":"Sweller, J., & Cooper, G. A. (1985). The use of worked examples as a substitute for problem solving in learning algebra. 
Cognition and Instruction, 2 (1), 59\u201389.","journal-title":"Cognition and Instruction"},{"issue":"1","key":"269_CR57","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1007\/BF01273901","volume":"29","author":"F Swetz","year":"1995","unstructured":"Swetz, F. (1995). To know and to teach: Mathematical pedagogy from a historical context. Educational Studies in Mathematics, 29(1), 73\u201388.","journal-title":"Educational Studies in Mathematics"},{"key":"269_CR58","unstructured":"Swetz, F. J. (1987). Capitalism and arithmetic: the new math of the 15th century, including the full text of the Treviso arithmetic of 1478, translated by David Eugene Smith Open Court Publishing."},{"issue":"3","key":"269_CR59","doi-asserted-by":"publisher","first-page":"212","DOI":"10.1016\/j.cedpsych.2010.10.004","volume":"36","author":"T Van Gog","year":"2011","unstructured":"Van Gog, T., Kester, L., & Paas, F. (2011). Effects of worked examples, example-problem, and problem-example pairs on novices\u2019 learning. Contemporary Educational Psychology, 36(3), 212\u2013218.","journal-title":"Contemporary Educational Psychology"},{"key":"269_CR60","doi-asserted-by":"crossref","unstructured":"Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double q-learning. In AAAI. Phoenix, AZ, (Vol. 2 p. 5).","DOI":"10.1609\/aaai.v30i1.10295"},{"issue":"3","key":"269_CR61","first-page":"227","volume":"16","author":"K Vanlehn","year":"2006","unstructured":"Vanlehn, K. (2006). The behavior of tutoring systems. IJAIED, 16(3), 227\u2013265.","journal-title":"IJAIED"},{"key":"269_CR62","doi-asserted-by":"crossref","unstructured":"VanLehn, K., Bhembe, D., Chi, M., Lynch, C., Schulze, K., Shelby, R., Taylor, L., Treacy, D., Weinstein, A., & Wintersgill, M. (2004). Implicit versus explicit learning of strategies in a non-procedural cognitive skill. In International conference on intelligent tutoring systems (pp. 521\u2013530). 
Springer.","DOI":"10.1007\/978-3-540-30139-4_49"},{"key":"269_CR63","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","volume":"575","author":"O Vinyals","year":"2019","unstructured":"Vinyals, O., Babuschkin, I., Czarnecki, W., & et al. (2019). Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature, 575, 350.","journal-title":"Nature"},{"key":"269_CR64","doi-asserted-by":"crossref","unstructured":"Wang, X., Chen, W., Wu, J., Wang, Y. F., & Yang Wang, W. (2018). Video captioning via hierarchical reinforcement learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4213\u20134222).","DOI":"10.1109\/CVPR.2018.00443"},{"key":"269_CR65","unstructured":"Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., & De Freitas, N. (2015). Dueling network architectures for deep reinforcement learning. arXiv:1511.06581."},{"key":"269_CR66","doi-asserted-by":"crossref","unstructured":"Williams, J.D. (2008). The best of both worlds: unifying conventional dialog systems and pomdps. In INTERSPEECH (pp. 1173\u20131176).","DOI":"10.21437\/Interspeech.2008-355"},{"key":"269_CR67","doi-asserted-by":"crossref","unstructured":"Zhou, G., Azizsoltani, H., Ausin, M. S., Barnes, T., & Chi, M. (2019). Hierarchical reinforcement learning for pedagogical policy induction. In International conference on artificial intelligence in education.","DOI":"10.1007\/978-3-030-23204-7_45"},{"key":"269_CR68","unstructured":"Zhou, G., & Chi, M. (2017). The impact of decision agency & granularity on aptitude treatment interaction in tutoring. In Proceedings of the 39th annual conference of the cognitive science society (pp. 3652\u20133657)."},{"key":"269_CR69","unstructured":"Zhou, G., Lynch, C., Price, T. W., Barnes, T., & Chi, M. (2016). The impact of granularity on the effectiveness of students\u2019 pedagogical decision. 
In Proceedings of the 38th annual conference of the cognitive science society (pp. 2801\u20132806)."},{"key":"269_CR70","unstructured":"Zhou, G., Price, T. W., Lynch, C., Barnes, T., & Chi, M. (2015). The impact of granularity on worked examples and problem solving. In Proceedings of the 37th annual conference of the cognitive science society (pp. 2817\u20132822)."},{"key":"269_CR71","unstructured":"Zhou, G., Wang, J., Lynch, C., & Chi, M. (2017). Towards closing the loop: Bridging machine-induced pedagogical policies to learning theories. In EDM."},{"key":"269_CR72","doi-asserted-by":"crossref","unstructured":"Zhou, G., Yang, X., Azizsoltani, H., Barnes, T., & Chi, M. (2020). Improving student-tutor interaction through data-driven explanation of hierarchical reinforcement induced pedagogical policies. In Proceedings of the 28th conference on user modeling, adaptation and personalization. ACM.","DOI":"10.1145\/3340631.3394848"}],"container-title":["International Journal of Artificial Intelligence in 
Education"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-021-00269-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40593-021-00269-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-021-00269-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T18:12:13Z","timestamp":1772647933000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40593-021-00269-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,16]]},"references-count":72,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,6]]}},"alternative-id":["269"],"URL":"https:\/\/doi.org\/10.1007\/s40593-021-00269-9","relation":{},"ISSN":["1560-4292","1560-4306"],"issn-type":[{"value":"1560-4292","type":"print"},{"value":"1560-4306","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,16]]},"assertion":[{"value":"18 July 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 August 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}