{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T09:59:53Z","timestamp":1766138393389,"version":"3.37.3"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2023,12,1]],"date-time":"2023-12-01T00:00:00Z","timestamp":1701388800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"name":"National Defense Science and Technology Innovation Special Zone Project","award":["1916311LZ001003"],"award-info":[{"award-number":["1916311LZ001003"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Meta-learning is a pivotal and potentially influential machine learning approach for solving challenging problems in reinforcement learning. However, the costly hyper-parameter tuning needed for stable meta-learning training is a known shortcoming and currently an active research topic. This paper addresses this shortcoming by introducing an online and easily trainable hyper-parameter optimization approach, called Meta Parameters Learning via Meta-Learning (MPML), which integrates an online hyper-parameter adjustment scheme into the meta-learning algorithm and thereby reduces the need to tune hyper-parameters. Specifically, a basic learning rate is introduced for each training task. The proposed algorithm then dynamically adapts the multiple basic learning rates and a shared meta-learning rate by performing gradient descent alongside the initial optimization steps. In addition, the sensitivity of the proposed approach to hyper-parameter choices is discussed in comparison with the model-agnostic meta-learning method. 
The experimental results on reinforcement learning problems demonstrate that the MPML algorithm is easy to implement and delivers more competitive performance than existing meta-learning methods on a diverse set of challenging control tasks.<\/jats:p>","DOI":"10.1093\/comjnl\/bxad089","type":"journal-article","created":{"date-parts":[[2023,12,2]],"date-time":"2023-12-02T10:22:52Z","timestamp":1701512572000},"page":"1645-1651","source":"Crossref","is-referenced-by-count":1,"title":["Online Optimization Method of Learning Process for Meta-Learning"],"prefix":"10.1093","volume":"67","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0996-436X","authenticated-orcid":false,"given":"Zhixiong","family":"Xu","sequence":"first","affiliation":[{"name":"Army Academy of Border and Coastal Defence , Chang\u2019an District, Xi\u2019an, 710100 , China"},{"name":"Army Engineering University of PLA , Qinhuai District, Nanjing, 210001 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Army Academy of Border and Coastal Defence , Chang\u2019an District, Xi\u2019an, 710100 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ailin","family":"Li","sequence":"additional","affiliation":[{"name":"Army Academy of Border and Coastal Defence , Chang\u2019an District, Xi\u2019an, 710100 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Feifei","family":"Zhao","sequence":"additional","affiliation":[{"name":"Army Academy of Border and Coastal Defence , Chang\u2019an District, Xi\u2019an, 710100 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuanyuan","family":"Jing","sequence":"additional","affiliation":[{"name":"Army Academy of Border and Coastal Defence , Chang\u2019an District, Xi\u2019an, 710100 , 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zheng","family":"Wan","sequence":"additional","affiliation":[{"name":"Army Academy of Border and Coastal Defence , Chang\u2019an District, Xi\u2019an, 710100 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6997-8504","authenticated-orcid":false,"given":"Lei","family":"Cao","sequence":"additional","affiliation":[{"name":"Army Engineering University of PLA , Qinhuai District, Nanjing, 210001 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5198-0932","authenticated-orcid":false,"given":"Xiliang","family":"Chen","sequence":"additional","affiliation":[{"name":"Army Engineering University of PLA , Qinhuai District, Nanjing, 210001 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,12,1]]},"reference":[{"key":"2024062312365010100_ref1","first-page":"568","article-title":"Learning a synaptic learning rule","volume-title":"IJCNN-91-Seattle International Joint Conference on Neural Networks, IEEE","author":"Bengio","year":"2002"},{"key":"2024062312365010100_ref2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/978-1-4615-5529-2_1","volume-title":"Learning to Learn","author":"Thrun","year":"1998"},{"key":"2024062312365010100_ref3","first-page":"1","volume-title":"Simple principles of meta-learning","author":"Schmidhuber","year":"1996"},{"article-title":"Learning to reinforcement learn","year":"2016","author":"","key":"2024062312365010100_ref4"},{"key":"2024062312365010100_ref5","first-page":"1842","article-title":"Meta-learning with memory-augmented neural networks","volume-title":"International conference on machine learning. 
(ICML)","author":"","year":"2016"},{"journal-title":"arXiv preprint arXiv:1707.03141","article-title":"A simple neural attentive meta-learner","author":"","key":"2024062312365010100_ref6"},{"article-title":"RL2: fast reinforcement learning via slow reinforcement learning","year":"2016","author":"","key":"2024062312365010100_ref7"},{"key":"2024062312365010100_ref8","first-page":"3981","article-title":"Learning to learn by gradient descent by gradient descent","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"","year":"2016"},{"key":"2024062312365010100_ref9","first-page":"458","article-title":"Optimization as a model for few-shot learning","volume-title":"International Conference on Learning Representations (ICLR)","author":"Ravi","year":"2017"},{"key":"2024062312365010100_ref10","first-page":"523","article-title":"Learning feed-forward one-shot learners","volume-title":"Advances in Neural Information Processing Systems. (NIPS)","author":"Bertinetto","year":"2016"},{"key":"2024062312365010100_ref11","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1007\/s10994-016-5580-x","article-title":"Probabilistic inference for determining options in reinforcement learning","volume":"104","author":"","journal-title":"Machine Learning"},{"key":"2024062312365010100_ref12","first-page":"3486","article-title":"Strategic attentive writer for learning macro-actions","volume-title":"Advances in neural information processing systems (NIPS)","author":"Vezhnevets","year":"2016"},{"key":"2024062312365010100_ref13","first-page":"3540","article-title":"Feudal networks for hierarchical reinforcement learning","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML), JMLR.org","author":"","year":"2017"},{"article-title":"Stochastic neural networks for hierarchical reinforcement 
learning","year":"2017","author":"Florensa","key":"2024062312365010100_ref14"},{"key":"2024062312365010100_ref15","first-page":"456","article-title":"The option-critic architecture","volume-title":"Thirty-First AAAI Conference on Artificial Intelligence","author":"Bacon","year":"2017"},{"article-title":"Hierarchical actor-critic","year":"2017","author":"Levy","key":"2024062312365010100_ref16"},{"article-title":"Continuous adaptation via meta-learning in nonstationary and competitive environments","year":"2017","author":"","key":"2024062312365010100_ref17"},{"key":"2024062312365010100_ref18","first-page":"2402","article-title":"Meta-gradient reinforcement learning","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Xu","year":"2018"},{"first-page":"1","volume-title":"The Workshop on Computer Games","author":"","key":"2024062312365010100_ref19"},{"key":"2024062312365010100_ref20","first-page":"1126","article-title":"Model-agnostic meta-learning for fast adaptation of deep networks","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML). 
JMLR.org","author":"Finn","year":"2017"},{"key":"2024062312365010100_ref21","first-page":"5","article-title":"Adam: a method for stochastic optimization","volume":"23","author":"Kingma","journal-title":"Computer Science"},{"key":"2024062312365010100_ref22","first-page":"456","article-title":"How to train your MAML","volume-title":"In International Conference on Learning Representations (ICLR)","author":"Antoniou","year":"2019"},{"key":"2024062312365010100_ref23","first-page":"234","volume-title":"Sixth International Conference on Learning Representations (ICLR)","author":"Baydin","year":"2018"},{"article-title":"Reptile: a scalable metalearning algorithm","year":"2018","author":"Nichol","key":"2024062312365010100_ref24"},{"article-title":"Recasting gradient-based meta-learning as hierarchical bayes","year":"2018","author":"Grant","key":"2024062312365010100_ref25"},{"key":"2024062312365010100_ref26","first-page":"85","article-title":"Continuous adaptation via meta-learning in nonstationary and competitive environments","volume-title":"International Conference on Learning Representations (ICLR)","author":"Al-Shedivat","year":"2018"},{"key":"2024062312365010100_ref27","first-page":"452","article-title":"Meta-sgd: learning to learn quickly for few shot learning","author":"Li","journal-title":"CoRR"},{"key":"2024062312365010100_ref28","first-page":"256","article-title":"Alpha MAML: adaptive model-agnostic meta-learning","volume-title":"6th ICML Workshop on Automated Machine Learning","author":"Behl","year":"2019"},{"key":"2024062312365010100_ref29","first-page":"236","article-title":"Task agnostic meta-learning for few-shot learning","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Jamal","year":"2019"},{"key":"2024062312365010100_ref30","first-page":"567","article-title":"Auto-\ud835\udf06: disentangling dynamic task relationships","author":"Liu","journal-title":"Transactions on Machine Learning 
Research"},{"key":"2024062312365010100_ref31","first-page":"5026","volume-title":"IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Todorov","year":"2012"},{"key":"2024062312365010100_ref32","first-page":"134","article-title":"Meta-dataset: a dataset of datasets for learning to learn from few examples","volume-title":"7th ICML Workshop on Automated Machine Learning","author":"Triantafillou","year":"2020"},{"key":"2024062312365010100_ref33","first-page":"741","article-title":"Trust region policy optimization","author":"Schulman","journal-title":"ICML"},{"first-page":"195","volume-title":"Introduction to pytorch, Deep learning with python","author":"Ketkar","key":"2024062312365010100_ref34"}],"container-title":["The Computer Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/67\/5\/1645\/58307802\/bxad089.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/67\/5\/1645\/58307802\/bxad089.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,23]],"date-time":"2024-06-23T12:37:18Z","timestamp":1719146238000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/67\/5\/1645\/7457340"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,1]]},"references-count":34,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,12,1]]},"published-print":{"date-parts":[[2024,6,22]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxad089","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"type":"print","value":"0010-4620"},{"type":"electronic","value":"1460-2067"}],"subject":[],"published-other":{"date-parts":[[2024,5]]},"published":{"date-parts":[[2023,12,1]]}}}