{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:43:43Z","timestamp":1760237023217,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2020,2,13]],"date-time":"2020-02-13T00:00:00Z","timestamp":1581552000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Nature Science Foundation of China (NSFC)","award":["61371149"],"award-info":[{"award-number":["61371149"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Reinforcement learning algorithms usually require a large number of empirical samples and converge slowly in practical applications. One solution is to introduce transfer learning: knowledge from well-learned source tasks can be reused to reduce sample requirements and accelerate the learning of target tasks. However, if an unmatched source task is selected, it will slow down or even disrupt the learning procedure. Therefore, it is very important for knowledge transfer to select appropriate source tasks that have a high degree of matching with target tasks. In this paper, a novel task matching algorithm is proposed to derive the latent structures of the value functions of tasks and align these structures for similarity estimation. Through latent structure matching, highly matched source tasks are selected effectively, from which knowledge is then transferred to give action advice and improve the exploration strategies of the target tasks. Experiments are conducted on a simulated navigation environment and the mountain car environment. The results illustrate the significant performance gain of the improved exploration strategy compared with the traditional \u03f5-greedy exploration strategy. 
A theoretical proof is also given to verify the improvement of the exploration strategy based on latent structure matching.<\/jats:p>","DOI":"10.3390\/fi12020036","type":"journal-article","created":{"date-parts":[[2020,2,18]],"date-time":"2020-02-18T10:10:25Z","timestamp":1582020625000},"page":"36","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Latent Structure Matching for Knowledge Transfer in Reinforcement Learning"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9856-7691","authenticated-orcid":false,"given":"Yi","family":"Zhou","sequence":"first","affiliation":[{"name":"School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China"}]},{"given":"Fenglei","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,2,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction. IEEE Trans. Neural Netw., 9.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1016\/j.artint.2015.05.008","article-title":"Transferring knowledge as heuristics in reinforcement learning: A case-based approach","volume":"226","author":"Reinaldo","year":"2015","journal-title":"Artif. 
Intell."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1716","DOI":"10.1038\/s41591-018-0213-5","article-title":"The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care","volume":"24","author":"Komorowski","year":"2018","journal-title":"Nat. Med."},{"key":"ref_5","first-page":"1","article-title":"Deep Direct Reinforcement Learning for Financial Signal Representation and Trading","volume":"28","author":"Deng","year":"2016","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_6","first-page":"1","article-title":"Expert Level control of Ramp Metering based on Multi-task Deep Reinforcement Learning","volume":"99","author":"Belletti","year":"2017","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_7","unstructured":"Abel, D., Jinnai, Y., Guo, Y., Konidaris, G., and Littman, M.L. (2018, January 10\u201315). Policy and Value Transfer in Lifelong Reinforcement Learning. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1007\/978-3-642-27645-3_5","article-title":"Transfer in reinforcement learning: a framework and a survey","volume":"12","author":"Lazaric","year":"2012","journal-title":"Reinf. Learn."},{"key":"ref_10","first-page":"2125","article-title":"Transfer learning via inter-task mappings for temporal difference learning","volume":"8","author":"Taylor","year":"2007","journal-title":"J. Mach. Learn. Res."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Cheng, Q., Wang, X., and Shen, L. (2017, January 26\u201328). Transfer learning via linear multi-variable mapping under reinforcement learning framework. 
Proceedings of the 36th Chinese Control Conference, Liaoning, China.","DOI":"10.23919\/ChiCC.2017.8028754"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Cheng, Q., Wang, X., and Shen, L. (2017, January 5\u20138). An Autonomous Inter-task Mapping Learning Method via Artificial Neural Network for Transfer Learning. Proceedings of the 2017 IEEE International Conference on Robotics and Biomimetics, Macau, China.","DOI":"10.1109\/ROBIO.2017.8324510"},{"key":"ref_13","unstructured":"Fachantidis, A., Partalas, I., Taylor, M.E., and Vlahavas, I. (2011, January 9\u201311). Transfer learning via multiple inter-task mappings. Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning, Athens, Greece."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1177\/1059712314559525","article-title":"Transfer learning with probabilistic mapping selection","volume":"23","author":"Fachantidis","year":"2015","journal-title":"Adapt. Behav."},{"key":"ref_15","unstructured":"Ferns, N., Panangaden, P., and Precup, D. (2004, January 7\u201311). Metrics for finite Markov decision processes. Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, Banff, AB, Canada."},{"key":"ref_16","unstructured":"Taylor, M.E., Kuhlmann, G., and Stone, P. (2008, January 12\u201316). Autonomous transfer for reinforcement learning. Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems, Estoril, Portugal."},{"key":"ref_17","unstructured":"Celiberto, L.A., Matsuura, J.P., De Mantaras, R.L., and Bianchi, R.A. (2011, January 16\u201322). Using cases as heuristics in reinforcement learning: a transfer learning application. Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Menlo Park, CA, USA."},{"key":"ref_18","unstructured":"Carroll, J.L., and Seppi, K. (2005, January 31). 
Task similarity measures for transfer in reinforcement learning task libraries. Proceedings of the IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"889","DOI":"10.1109\/TNNLS.2014.2327636","article-title":"Self-Organizing Neural Networks Integrating Domain Knowledge and Reinforcement Learning","volume":"26","author":"Teng","year":"2015","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_20","unstructured":"Ammar, H.B., Eaton, E., Taylor, M.E., Mocanu, D.C., Driessens, K., Weiss, G., and Tuyls, K. (2014, January 27\u201331). An automated measure of MDP similarity for transfer in reinforcement learning. Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Quebec, QC, Canada."},{"key":"ref_21","unstructured":"Song, J.H., Gao, Y., and Wang, H. (2016, January 9\u201313). Measuring the distance between finite Markov decision processes. Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems, Singapore."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1145\/331499.331504","article-title":"Data clustering: a review","volume":"31","author":"Jain","year":"1999","journal-title":"ACM Comput. Surv."},{"key":"ref_23","first-page":"1131","article-title":"Multi-task reinforcement learning in partially observable stochastic environments","volume":"10","author":"Li","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"ref_24","unstructured":"Liu, M., Chowdhary, G., How, J., and Carin, L. (2012, January 3\u20138). Transfer learning for reinforcement learning with dependent Dirichlet process and Gaussian process. Proceedings of the Twenty-sixth Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_25","unstructured":"Karimpanal, T.G., and Bouffanais, R. (2018). Self-Organizing Maps as a Storage and Transfer Mechanism in Reinforcement Learning. 
arXiv, Available online: https:\/\/arxiv.org\/abs\/1807.07530."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Taylor, M.E., Whiteson, S., and Stone, P. (2007, January 14\u201318). ABSTRACT Transfer via InterTask Mappings in Policy Search Reinforcement Learning. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, Honolulu, HI, USA.","DOI":"10.1145\/1329125.1329170"},{"key":"ref_27","unstructured":"Lazaric, A., and Ghavamzadeh, M. (2010, January 21\u201324). Bayesian multi-task reinforcement learning. Proceedings of the International Conference on Machine Learning, Haifa, Israel."},{"key":"ref_28","unstructured":"Wilson, A., Fern, A., and Tadepalli, P. (2012, January 1\u201316). Transfer learning in sequential decision problems: A hierarchical bayesian approach. Proceedings of the International Conference on Unsupervised and Transfer Learning Workshop, Bellevue, WA, USA."},{"key":"ref_29","first-page":"725","article-title":"Mixture of manifolds clustering via low rank embedding","volume":"8","author":"Liu","year":"2011","journal-title":"J. Inf. Comput. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000016","article-title":"Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers","volume":"3","author":"Boyd","year":"2010","journal-title":"Found. Trends Mach. Learn."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2053","DOI":"10.1109\/TIT.2010.2044061","article-title":"The Power of Convex Relaxation: Near-Optimal Matrix Completion","volume":"56","author":"Tao","year":"2010","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Devis, T., Gustau, C.V., and Zhao, H.D. (2016). Kernel Manifold Alignment for Domain Adaptation. 
PLoS ONE, 11.","DOI":"10.1371\/journal.pone.0148655"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Favaro, P., Vidal, R., and Ravichandran, A. (2011, January 20\u201325). A closed form solution to robust subspace estimation and clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995365"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"2323","DOI":"10.1126\/science.290.5500.2323","article-title":"Nonlinear dimensionality reduction by locally linear embedding","volume":"290","author":"Roweis","year":"2000","journal-title":"Science"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Tokic, M., and Gunther, P. (2011, January 16\u201322). Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax. Proceedings of the Annual Conference on Artificial Intelligence, Berlin, Heidelberg.","DOI":"10.1007\/978-3-642-24455-1_33"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1109\/TNNLS.2014.2320280","article-title":"Generalization Performance of Radial Basis Function Networks","volume":"26","author":"Lei","year":"2015","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1266","DOI":"10.1109\/ACCESS.2016.2548519","article-title":"Biometric Authentication Using Noisy Electrocardiograms Acquired by Mobile Sensors","volume":"4","author":"Soo","year":"2016","journal-title":"IEEE Access"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Ammar, H.B., Mocanu, D.C., and Taylor, M.E. (2013, January 22\u201326). Automatically mapped transfer between reinforcement learning tasks via three-way restricted boltzmann machines. 
Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Prague, Czech Republic.","DOI":"10.1007\/978-3-642-40991-2_29"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"64639","DOI":"10.1109\/ACCESS.2018.2876494","article-title":"Benchmarking projective simulation in navigation problems","volume":"6","author":"Melnikov","year":"2018","journal-title":"IEEE Access"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Yanai, H., Takeuchi, K., and Takane, Y. (2011). Singular Value Decomposition (SVD). Stat. Soc. Behav. Sci., 64639\u201364648.","DOI":"10.1007\/978-1-4419-9887-3_5"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1126\/science.153.3731.34","article-title":"Dynamic programming","volume":"8","author":"Bellman","year":"1966","journal-title":"Science"},{"key":"ref_43","unstructured":"Barreto, A., Dabney, W., Munos, R., Hunt, J.J., Schaul, T., Hasselt, H.V., and Silver, D. (2017, January 4\u20139). Successor features for transfer in reinforcement learning. Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/12\/2\/36\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T08:57:39Z","timestamp":1760173059000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/12\/2\/36"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,13]]},"references-count":43,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2020,2]]}},"alternative-id":["fi12020036"],"URL":"https:\/\/doi.org\/10.3390\/fi12020036","relation":{},"ISSN":["1999-5903"],"issn-type":[{"type":"electronic","value":"1999-5903"}],"subject":[],"published":{"date-parts":[[2020,2,13]]}}}