{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T16:32:58Z","timestamp":1771518778130,"version":"3.50.1"},"reference-count":55,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T00:00:00Z","timestamp":1665964800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T00:00:00Z","timestamp":1665964800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Sequence-wise recommendation, where recommend exercises to each student step by step, is one of the most exciting tasks in the field of intelligent tutoring systems (ITS). It is important to develop a personalized sequence-wise recommendation framework that immerses students in learning and helps them acquire as much necessary knowledge as possible, rather than merely focusing on providing non-mastered exercises, which is referred to optimize a single objective. However, due to the different knowledge levels of students and the large scale of exercise banks, it is difficult to generate a personalized exercise recommendation for each student. To fully exploit the multifaceted beneficial information collected from e-learning platforms, we design a dynamic multi-objective sequence-wise recommendation framework via deep reinforcement learning, i.e., DMoSwR-DRL, which automatically select the most suitable exercises for each student based on the well-designed domain-objective rewards. 
Within this framework, the interaction between students and exercises can be explicitly modeled by integrating the actor\u2013critic network and the state representation component, which greatly helps the agent perform effective reinforcement learning. Specifically, we carefully design a state representation module with a dynamic recurrent mechanism, which integrates concept information and exercise difficulty level, thus generating a continuous state representation of the student. Subsequently, a flexible reward function is designed to simultaneously optimize the four domain-specific objectives of difficulty, novelty, coverage, and diversity, providing students with a sequence-wise recommendation that balances these objectives. For online evaluation, we test DMoSwR-DRL in a simulated environment that models the qualitative development of a student's knowledge level and predicts their performance on a given exercise. Comprehensive experiments are conducted on four classical exercise-answer datasets, and the results show the effectiveness and advantages of DMoSwR-DRL in terms of recommendation quality.<\/jats:p>","DOI":"10.1007\/s40747-022-00871-x","type":"journal-article","created":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T07:02:36Z","timestamp":1665990156000},"page":"1891-1911","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Dynamic multi-objective sequence-wise recommendation framework via deep reinforcement 
learning"],"prefix":"10.1007","volume":"9","author":[{"given":"Xiankun","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Yuhu","family":"Shang","sequence":"additional","affiliation":[]},{"given":"Yimeng","family":"Ren","sequence":"additional","affiliation":[]},{"given":"Kun","family":"Liang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,10,17]]},"reference":[{"key":"871_CR1","doi-asserted-by":"crossref","unstructured":"Koedinger KR, Mclaughlin EA, Kim J et al (2015) Learning is not a spectator sport: doing is better than watching for learning from a MOOC. ACM, 2015. In: Proceedings of the second ACM conference on learning, pp 111\u2013120","DOI":"10.1145\/2724660.2724681"},{"issue":"3","key":"871_CR2","first-page":"243","volume":"2","author":"L Gang","year":"2012","unstructured":"Gang L, Hao T (2012) User-based question recommendation for question answering system. Int J Inf Educ Technol 2(3):243\u2013246","journal-title":"Int J Inf Educ Technol"},{"key":"871_CR3","unstructured":"Chen X, Li S, Li H, Jiang S, Qi Y, Song L (2018) Generative adversarial user model for reinforcement learning based recommendation system. arXiv preprint arXiv:1812.10613"},{"issue":"04","key":"871_CR4","doi-asserted-by":"publisher","first-page":"1075","DOI":"10.1142\/S0219622021500310","volume":"20","author":"T Anwar","year":"2021","unstructured":"Anwar T, Uma V, Srivastava G (2021) Rec-cfsvd++: Implementing recommendation system using collaborative filtering and singular value decomposition (svd)++[J]. Int J Inf Technol Decis Mak 20(04):1075\u20131093","journal-title":"Int J Inf Technol Decis Mak"},{"issue":"3","key":"871_CR5","doi-asserted-by":"publisher","first-page":"1367","DOI":"10.1007\/s40747-021-00274-4","volume":"7","author":"H Xia","year":"2021","unstructured":"Xia H, Luo Y, Liu Y (2021) Attention neural collaboration filtering based on GRU for recommender systems [J]. 
Complex Intell Syst 7(3):1367\u20131379","journal-title":"Complex Intell Syst"},{"key":"871_CR6","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.105618","volume":"195","author":"D Shi","year":"2020","unstructured":"Shi D, Wang T, Xing H et al (2020) A learning path recommendation model based on a multidimensional knowledge graph framework for e-learning [J]. Knowl Based Syst 195:105618","journal-title":"Knowl Based Syst"},{"key":"871_CR7","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1016\/j.ins.2018.02.053","volume":"444","author":"Y Zhou","year":"2018","unstructured":"Zhou Y, Huang C, Hu Q, Zhu J, Tang Y (2018) Personalized learning full-path recommendation model based on LSTM neural networks. Inf Sci 444:135\u2013152","journal-title":"Inf Sci"},{"key":"871_CR8","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1016\/j.ins.2020.08.017","volume":"545","author":"A Xw","year":"2021","unstructured":"Xw A, Xm A, Qh B, Zh B, Cha B (2021) Fine-grained learning performance prediction via adaptive sparse self-attention networks. Inf Sci 545:223\u2013240","journal-title":"Inf Sci"},{"issue":"3","key":"871_CR9","doi-asserted-by":"publisher","first-page":"2183","DOI":"10.1007\/s40747-022-00647-3","volume":"8","author":"AA Mubarak","year":"2022","unstructured":"Mubarak AA, Cao H, Hezam IM et al (2022) Modeling students\u2019 performance using graph convolutional networks [J]. Complex Intell Syst 8(3):2183\u20132201","journal-title":"Complex Intell Syst"},{"key":"871_CR10","volume-title":"Concept-aware deep knowledge tracing and exercise recommendation in an online learning system","author":"F Ai","year":"2019","unstructured":"Ai F, Chen Y, Guo Y, Zhao Y, Wang Z, Fu G, Wang G (2019) Concept-aware deep knowledge tracing and exercise recommendation in an online learning system. 
International Educational Data Mining Society, Washington"},{"key":"871_CR11","doi-asserted-by":"crossref","unstructured":"Zhang J, Shi X, King I, Yeung DY (2017) Dynamic key-value memory networks for knowledge tracing. In: Proceedings of the 26th international conference on World Wide Web, pp 765\u2013774","DOI":"10.1145\/3038912.3052580"},{"key":"871_CR12","doi-asserted-by":"publisher","first-page":"170","DOI":"10.1016\/j.ins.2018.09.054","volume":"474","author":"X Zheng","year":"2019","unstructured":"Zheng X, Wang M, Chen C, Wang Y, Cheng Z (2019) EXPLORE: explainable item-tag CO-recommendation. Inf Sci 474:170\u2013186","journal-title":"Inf Sci"},{"issue":"3","key":"871_CR13","first-page":"19","volume":"3","author":"C Piech","year":"2015","unstructured":"Piech C, Bassen J, Huang J, Ganguli S, Sahami M, Guibas LJ, Sohl-Dickstein J (2015) Deep knowledge tracing. Adv Neural Inf Process Syst 3(3):19\u201323","journal-title":"Adv Neural Inf Process Syst"},{"key":"871_CR14","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.03.003","author":"P Ladosz","year":"2022","unstructured":"Ladosz P, Weng L, Kim M et al (2022) Exploration in deep reinforcement learning: a survey [J]. Inf Fusion. https:\/\/doi.org\/10.1016\/j.inffus.2022.03.003","journal-title":"Inf Fusion"},{"key":"871_CR15","unstructured":"Watkins CJCH (1989) Learning from delayed rewards. Ph.D. thesis, Kings College, Cambridge, England"},{"key":"871_CR16","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TSMC.1983.6313077","volume":"5","author":"AG Barto","year":"1983","unstructured":"Barto AG, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems. 
IEEE Trans Syst Man Cybern 5:834\u2013846","journal-title":"IEEE Trans Syst Man Cybern"},{"key":"871_CR17","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/j.neucom.2020.01.043","volume":"388","author":"R Wu","year":"2020","unstructured":"Wu R, Zhou C, Chao F, Yang L, Lin CM, Shang C (2020) Integration of an actor\u2013critic model and generative adversarial networks for a Chinese calligraphy robot. Neurocomputing 388:12\u201323","journal-title":"Neurocomputing"},{"issue":"2","key":"871_CR18","doi-asserted-by":"publisher","first-page":"51","DOI":"10.4018\/ijdet.2014040103","volume":"12","author":"RY Toledo","year":"2014","unstructured":"Toledo RY, Mota YC (2014) An e-learning collaborative filtering approach to suggest problems to solve in programming online judges. Int J Distance Educ Technol (IJDET) 12(2):51\u201365","journal-title":"Int J Distance Educ Technol (IJDET)"},{"issue":"8","key":"871_CR19","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1109\/MC.2009.263","volume":"42","author":"Y Koren","year":"2009","unstructured":"Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30\u201337","journal-title":"Computer"},{"key":"871_CR20","doi-asserted-by":"crossref","unstructured":"Thai-Nghe N, Drumond L, Horv\u00e1th T, Krohn-Grimberghe A, Nanopoulos A, Schmidt-Thieme L (2012) Factorization techniques for predicting student performance. In: Educational recommender systems and technologies: practices and challenges, pp 129\u2013153","DOI":"10.4018\/978-1-61350-489-5.ch006"},{"key":"871_CR21","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.114935","volume":"177","author":"S Liu","year":"2021","unstructured":"Liu S, Zou R, Sun J, Zhang K, Jiang L, Zhou D, Yang J (2021) A hierarchical memory network for knowledge tracing. 
Expert Syst Appl 177:114935","journal-title":"Expert Syst Appl"},{"issue":"10","key":"871_CR22","first-page":"229","volume":"54","author":"C Jiang","year":"2018","unstructured":"Jiang C, Feng J, Sun X (2018) Personalized exercises recommendation algorithm based on knowledge hierarchical graph. Comput Eng Appl 54(10):229\u2013235","journal-title":"Comput Eng Appl"},{"issue":"1","key":"871_CR23","first-page":"176","volume":"40","author":"T Zhu","year":"2017","unstructured":"Zhu T, Huang Z, Chen E, Liu Q, Wu R, Wu L, Su Y, Chen Z, Hu G (2017) Recommendation method for personalized test questions based on cognitive diagnosis. J Comput 40(1):176\u2013191","journal-title":"J Comput"},{"key":"871_CR24","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106481","volume":"210","author":"Z Wu","year":"2020","unstructured":"Wu Z, Li M, Tang Y, Liang Q (2020) Exercise recommendation based on knowledge concept prediction. Knowl Based Syst 210:106481","journal-title":"Knowl Based Syst"},{"issue":"5","key":"871_CR25","doi-asserted-by":"publisher","first-page":"1403","DOI":"10.1002\/cae.22395","volume":"29","author":"P Lv","year":"2021","unstructured":"Lv P, Wang X, Xu J, Wang J (2021) Intelligent personalized exercise recommendation: a weighted knowledge graph-based approach. Comput Appl Eng Educ 29(5):1403\u20131419","journal-title":"Comput Appl Eng Educ"},{"key":"871_CR26","doi-asserted-by":"publisher","first-page":"266","DOI":"10.1016\/j.ins.2020.03.014","volume":"523","author":"Y Huo","year":"2020","unstructured":"Huo Y, Wong DF, Ni LM, Chao LS, Zhang J (2020) Knowledge modeling via contextualized representations for LSTM-based personalized exercise recommendation. Inf Sci 523:266\u2013278","journal-title":"Inf Sci"},{"key":"871_CR27","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing atari with deep reinforcement learning. 
arXiv preprint arXiv:1312.5602"},{"issue":"7587","key":"871_CR28","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver D, Huang A, Maddison CJ, Guez A, Sifre L et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484\u2013489","journal-title":"Nature"},{"key":"871_CR29","doi-asserted-by":"crossref","unstructured":"Wang P, Fan Y, Xia L, Zhao W X, Niu S, Huang J (2020) KERL: a knowledge-guided reinforcement learning model for sequential recommendation. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, pp 209\u2013218","DOI":"10.1145\/3397271.3401134"},{"key":"871_CR30","doi-asserted-by":"crossref","unstructured":"Zhao X, Zhang L, Ding Z, Xia L, Tang J, Yin D (2018) Recommendations with negative feedback via pairwise deep reinforcement learning. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1040\u20131048","DOI":"10.1145\/3219819.3219886"},{"key":"871_CR31","doi-asserted-by":"crossref","unstructured":"Chen H, Dai X, Cai H, Zhang W, Wang X, Tang R et al (2019) Large-scale interactive recommendation with tree-structured policy gradient. In: Proceedings of the AAAI conference on artificial intelligence, pp 3312\u20133320","DOI":"10.1609\/aaai.v33i01.33013312"},{"key":"871_CR32","doi-asserted-by":"crossref","unstructured":"Zou L, Xia L, Ding Z, Song J, Liu W, Yin D (2019) Reinforcement learning to optimize long-term user engagement in recommender systems. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 2810\u20132818","DOI":"10.1145\/3292500.3330668"},{"key":"871_CR33","doi-asserted-by":"crossref","unstructured":"Massimo D, Ricci F (2018) Harnessing a generalised user behaviour model for next-POI recommendation. 
In: Proceedings of the 12th ACM conference on recommender systems. pp 402\u2013406","DOI":"10.1145\/3240323.3240392"},{"key":"871_CR34","doi-asserted-by":"crossref","unstructured":"Zhou F, Yin R, Zhang K, Trajcevski G, Zhong T, Wu J (2019) Adversarial point-of-interest recommendation. In: Proceedings of the 28th international conference on World Wide Web, pp 3462\u20133468","DOI":"10.1145\/3308558.3313609"},{"key":"871_CR35","doi-asserted-by":"crossref","unstructured":"Liu Y, Shen Z, Zhang Y, Cui L (2021) Diversity-promoting deep reinforcement learning for interactive recommendation. In: The 5th international conference on crowd science and engineering, pp 132\u2013139","DOI":"10.1145\/3503181.3503203"},{"key":"871_CR36","doi-asserted-by":"crossref","unstructured":"Ding Q, Liu Y, Miao C, Cheng F, Tang H (2020) A hybrid bandit framework for diversified recommendation. In: Proceedings of the AAAI conference on artificial intelligence, pp 4036\u20134044","DOI":"10.1609\/aaai.v35i5.16524"},{"key":"871_CR37","doi-asserted-by":"crossref","unstructured":"Zhao D, Zhang L, Zhang B, Zheng L, Bao Y, Yan W (2020) MaHRL: multi-goals abstraction based deep hierarchical reinforcement learning for recommendations. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, pp 871\u2013880","DOI":"10.1145\/3397271.3401170"},{"key":"871_CR38","unstructured":"Maillard O A, Ryabko D, Munos R (2011) Selecting the state-representation in reinforcement learning. In: Advances in neural information processing systems, pp 2627\u20132635"},{"key":"871_CR39","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106170","volume":"205","author":"F Liu","year":"2020","unstructured":"Liu F, Tang R, Li X, Zhang W, Ye Y, Chen H, He X (2020) State representation modeling for deep reinforcement learning based recommendation. 
Knowl Based Syst 205:106170","journal-title":"Knowl Based Syst"},{"key":"871_CR40","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1027\/\/1015-5759.16.1.3","volume":"16","author":"P Hontangas","year":"2000","unstructured":"Hontangas P, Ponsoda V, Olea J, Wise SL (2000) The choice of item difficulty in self-adapted testing. Eur J Psychol Assess 16:3\u201312","journal-title":"Eur J Psychol Assess"},{"key":"871_CR41","unstructured":"Hausknecht M, Stone P (2015) Deep recurrent q-learning for partially observable mdps. In: 2015 aaai fall symposium series. arxiv: 1507.06527"},{"key":"871_CR42","doi-asserted-by":"crossref","unstructured":"Su Y, Liu Q, Liu Q et al (2018) Exercise-enhanced sequential modeling for student performance prediction [C]. In: Proceedings of the AAAI conference on artificial intelligence, vol 32(1)","DOI":"10.1609\/aaai.v32i1.11864"},{"key":"871_CR43","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1016\/j.neucom.2019.04.044","volume":"356","author":"Y Su","year":"2019","unstructured":"Su Y, Kuo CCJ (2019) On extended long short-term memory and dependent bidirectional recurrent neural network. Neurocomputing 356:151\u2013161","journal-title":"Neurocomputing"},{"key":"871_CR44","volume-title":"Reinforcement learning: an introduction [M]","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction [M]. MIT Press, Cambridge"},{"key":"871_CR45","doi-asserted-by":"crossref","unstructured":"Papou\u0161ek J, Stanislav V, Pel\u00e1nek R (2016) Impact of question difficulty on engagement and learning [C]. In: International conference on intelligent tutoring systems. Springer, Cham, pp 267\u2013272","DOI":"10.1007\/978-3-319-39583-8_28"},{"key":"871_CR46","unstructured":"Heffernan P (2010) Assistment-2009\u20132010. https:\/\/sites.google.com\/site\/assistmentsdata\/home\/2009-2010-assistment-data. 
Accessed 12 Feb 2022"},{"key":"871_CR47","unstructured":"Stamper J, Niculescu-Mizil A, Ritter S, Gordon GJ, Koedinger KR (2010) Algebra I 2005\u20132006. Challenge data set from KDD cup 2010 educational data mining challenge. Find it at http:\/\/pslcdatashop.web.cmu.edu\/KDDCup\/downloads.jsp. Accessed 12 Feb 2022"},{"key":"871_CR48","unstructured":"Bier N (2011) Statics2011. https:\/\/pslcdatashop.web.cmu.edu\/. Accessed 12 Feb 2022"},{"key":"871_CR49","unstructured":"Heffernan P (2017) https:\/\/sites.google.com\/view\/assistmentsdataminingassist2017. Accessed 2 May 2022"},{"key":"871_CR50","doi-asserted-by":"crossref","unstructured":"Chiang CL, Cheng MY, Ye TY et al (2019) Convergence Improvement of Q-learning based on a personalized recommendation system [C]. In: 2019 international automatic control conference (CACS). IEEE, pp 1\u20136","DOI":"10.1109\/CACS47674.2019.9024742"},{"issue":"6","key":"871_CR51","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3359554","volume":"13","author":"Y Lei","year":"2019","unstructured":"Lei Y, Li W (2019) Interactive recommendation with user-specific deep reinforcement learning [J]. ACM Trans Knowl Discov Data 13(6):1\u201315","journal-title":"ACM Trans Knowl Discov Data"},{"key":"871_CR52","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1016\/j.neucom.2021.03.072","volume":"447","author":"C Yan","year":"2021","unstructured":"Yan C, Xian J, Wan Y et al (2021) Modeling implicit feedback based on bandit learning for recommendation [J]. Neurocomputing 447:244\u2013256","journal-title":"Neurocomputing"},{"key":"871_CR53","unstructured":"Kingma Diederik P, Adam JB (2014) A method for stochastic optimization [J]. arXiv preprint arXiv:1412.6980"},{"issue":"1","key":"871_CR54","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. 
J Mach Learn Res 15(1):1929\u20131958","journal-title":"J Mach Learn Res"},{"key":"871_CR55","unstructured":"Burda Y, Edwards H, Storkey A, Klimov O (2018) Exploration by random network distillation. arXiv preprint arXiv:1810.12894"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00871-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00871-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00871-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,18]],"date-time":"2023-04-18T09:39:07Z","timestamp":1681810747000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00871-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,17]]},"references-count":55,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["871"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00871-x","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,17]]},"assertion":[{"value":"4 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 August 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 October 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}