{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T07:17:25Z","timestamp":1761808645271,"version":"3.41.2"},"reference-count":15,"publisher":"World Scientific Pub Co Pte Ltd","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Patt. Recogn. Artif. Intell."],"published-print":{"date-parts":[[2023,12,15]]},"abstract":"<jats:p> Reinforcement learning is currently applicable across a range of domains, including robotics, gaming, and natural language processing. However, the approach faces difficulties in environments with sparse rewards. Random network distillation (RND) is a good intrinsic reward solution to this problem. Nevertheless, the RND method\u2019s effectiveness hinges on excellent initialization, and the reliance on random features somewhat constrains the agent\u2019s exploration capabilities. This paper proposes a self-supervised network distillation (SSND) exploration method, addressing the drawbacks of RND\u2019s reliance on initializing random networks while enhancing the agent\u2019s exploration capability in sparse reward environments. The method uses distillation error as intrinsic rewards, with the target network trained using self-supervised learning. During the training of the predictor network, we noticed fluctuations in both loss values and intrinsic rewards, which have a detrimental impact on the performance of the intelligent agent. To resolve this issue, we introduce batch normalization layers to the target network, which helps mitigate intrinsic reward anomalies stemming from the target network\u2019s instability. Experiments show that the self-supervised network distillation is better than RND in terms of exploration speed and performance. <\/jats:p>","DOI":"10.1142\/s0218001423510217","type":"journal-article","created":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T07:49:55Z","timestamp":1702367395000},"source":"Crossref","is-referenced-by-count":1,"title":["Self-Supervised Network Distillation for Exploration"],"prefix":"10.1142","volume":"37","author":[{"given":"Xu","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Software Engineering, Xiamen University of Technology, 600 Ligong Road, Xiamen 361000\/Jimei District, P. R. China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3000-247X","authenticated-orcid":false,"given":"Ruiyu","family":"Dai","sequence":"additional","affiliation":[{"name":"College of Computer and Information Engineering, Xiamen University of Technology, 600 Ligong Road, Xiamen 361000\/Jimei District, P. R. China"}]},{"given":"Weisi","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Software Engineering, Xiamen University of Technology, 600 Ligong Road, Xiamen 361000\/Jimei District, P. R. China"}]},{"given":"Jiguang","family":"Qiu","sequence":"additional","affiliation":[{"name":"College of Software Engineering, Xiamen University of Technology, 600 Ligong Road, Xiamen 361000\/Jimei District, P. R. China"}]}],"member":"219","published-online":{"date-parts":[[2024,1,5]]},"reference":[{"key":"S0218001423510217BIB003","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2013.00907"},{"key":"S0218001423510217BIB004","volume":"29","author":"Bellemare M.","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"S0218001423510217BIB008","first-page":"30","author":"Fu J.","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"S0218001423510217BIB010","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001423510072"},{"key":"S0218001423510217BIB012","first-page":"448","volume-title":"Proc. 32nd Int. Conf. Machine Learning","author":"Ioffe S.","year":"2015"},{"key":"S0218001423510217BIB014","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.03.003"},{"key":"S0218001423510217BIB015","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001422570038"},{"issue":"1","key":"S0218001423510217BIB016","first-page":"857","volume":"35","author":"Liu X.","year":"2021","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"S0218001423510217BIB017","first-page":"25","author":"Lopes M.","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"S0218001423510217BIB018","first-page":"2721","volume-title":"Proc. 34th Int. Conf. Machine Learning","author":"Ostrovski G.","year":"2017"},{"key":"S0218001423510217BIB019","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.70"},{"key":"S0218001423510217BIB022","first-page":"30","author":"Tang H.","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"S0218001423510217BIB023","first-page":"29","author":"Van den Oord A.","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"first-page":"327","volume-title":"2012 Data Compression Conf.","author":"Veness J.","key":"S0218001423510217BIB024"},{"issue":"10","key":"S0218001423510217BIB025","first-page":"2359","volume":"60","author":"Zeng J.","year":"2023","journal-title":"J. Comput. Res. Dev."}],"container-title":["International Journal of Pattern Recognition and Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218001423510217","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,7]],"date-time":"2024-02-07T09:30:35Z","timestamp":1707298235000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218001423510217"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,15]]},"references-count":15,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2023,12,15]]}},"alternative-id":["10.1142\/S0218001423510217"],"URL":"https:\/\/doi.org\/10.1142\/s0218001423510217","relation":{},"ISSN":["0218-0014","1793-6381"],"issn-type":[{"type":"print","value":"0218-0014"},{"type":"electronic","value":"1793-6381"}],"subject":[],"published":{"date-parts":[[2023,12,15]]},"article-number":"2351021"}}