{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T00:23:10Z","timestamp":1719879790625},"reference-count":18,"publisher":"The Russian Academy of Sciences","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ARC"],"published-print":{"date-parts":[[2024,5]]},"abstract":"<jats:p>This paper is devoted to the problem of solving a system of nonlinear equations with an arbitrary but continuous vector function on the left-hand side. By assumption, the values of its components are the only a priori information available about this function. An approximate solution of the system is determined using some iterative method with parameters, and the qualitative properties of the method are assessed in terms of a quadratic residual functional. We propose a self-learning (reinforcement) procedure based on auxiliary Monte Carlo (MC) experiments, an exponential utility function, and a payoff function that implements Bellman\u2019s optimality principle. A theorem on the strict monotonic decrease of the residual functional is proven.<\/jats:p>","DOI":"10.31857\/s0005117924050076","type":"journal-article","created":{"date-parts":[[2024,7,1]],"date-time":"2024-07-01T13:05:42Z","timestamp":1719839142000},"page":"544-548","source":"Crossref","is-referenced-by-count":0,"title":["Iterative Methods with Self-Learning for Solving Nonlinear Equations"],"prefix":"10.31857","volume":"85","author":[{"given":"Yu. 
S.","family":"Popkov","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"name":"Federal Research Center \u201cComputer Science and Control,\u201d Russian Academy of Sciences, Moscow, Russia","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"name":"Trapeznikov Institute of Control Sciences, Russian Academy of Sciences, Moscow, Russia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"17106","reference":[{"key":"ref0","doi-asserted-by":"publisher","unstructured":"1. Krasnosel'skii, M.A., Vainikko, G.M., Zabreiko, P.P., Rutitski, Ja.B., and Stecenko, V.Ja., Approximated Solutions of Operator Equations, Groningen: Wolters-Noordhoff, 1972.","DOI":"10.1007\/978-94-010-2715-1_5"},{"key":"ref1","unstructured":"2. Bakhvalov, N.S., Zhidkov, N.P., and Kobel'kov, G.M., Chislennye metody (Numerical Methods), Moscow: Binom, 2003."},{"key":"ref2","unstructured":"3. Polyak, B.T., Introduction to Optimization, Optimization Software, 1987."},{"key":"ref3","unstructured":"4. Strekalovsky, A.S., Elementy nevypukloi optimizatsii (Elements of Nonconvex Optimization), Novosibirsk: Nauka, 2003."},{"key":"ref4","unstructured":"5. Lyle, C., Rowland, M., Dabney, W., Kwiatkowska, M., and Gal, Y., Learning Dynamics and Generalization in Deep Reinforcement Learning, Proceedings of the 39th International Conference on Machine Learning (PMLR), 2022, vol. 162, pp. 14560-14581."},{"key":"ref5","unstructured":"6. Wang, C., Yuan, S., and Ross, K.W., On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning, Proceedings of the International Conference on Learning Representations (ICLR), 2022."},{"key":"ref6","unstructured":"7. Wasserman, Ph.D., Neural Computing: Theory and Practice, Coriolis Group, 1989."},{"key":"ref7","doi-asserted-by":"publisher","unstructured":"8. 
Kohonen, T., Self-organizing Maps, Berlin-Heidelberg: Springer, 1995.","DOI":"10.1007\/978-3-642-97610-0"},{"key":"ref8","doi-asserted-by":"publisher","unstructured":"9. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., and Fidjeland, A., Human-Level Control through Deep Reinforcement Learning, Nature, 2015, vol. 518, no. 7540, pp. 529-533.","DOI":"10.1038\/nature14236"},{"key":"ref9","unstructured":"10. Sutton, R.S. and Barto, A.G., Introduction to Reinforcement Learning, Cambridge: MIT Press, 1998."},{"key":"ref10","unstructured":"11. Russell, S.J. and Norvig, P., Artificial Intelligence: A Modern Approach, 3rd ed., Upper Saddle River: Prentice Hall, 2010."},{"key":"ref11","doi-asserted-by":"publisher","unstructured":"12. van Hasselt, H., Reinforcement Learning in Continuous State and Action Spaces, in Reinforcement Learning: State-of-the-Art, Wiering, M. and van Otterlo, M., Eds., 2012, Springer, pp. 207-257.","DOI":"10.1007\/978-3-642-27645-3_7"},{"key":"ref12","unstructured":"13. Ivanov, S., Reinforcement Learning Textbook, ArXiv, 2022. https:\/\/doi.org\/10.48550\/arXiv.2201.09746."},{"key":"ref13","doi-asserted-by":"publisher","unstructured":"14. Bozinovski, S., Crossbar Adaptive Array: The First Connectionist Network That Solved the Delayed Reinforcement Learning Problem, in Artificial Neural Nets and Genetic Algorithms, Proc. Int. Conf., Portoroz, Slovenia, Dobnikar, A., Steele, N.C., Pearson, D.W., and Albrecht, R.F., Eds., Springer, 1999, pp. 320-325.","DOI":"10.1007\/978-3-7091-6384-9_54"},{"key":"ref14","doi-asserted-by":"publisher","unstructured":"15. Watkins, C. and Dayan, P., Q-learning, Machine Learning, 1992, vol. 8, no. 3-4, pp. 279-292.","DOI":"10.1023\/A:1022676722315"},{"key":"ref15","doi-asserted-by":"publisher","unstructured":"16. van Hasselt, H., Guez, A., and Silver, D., Deep Reinforcement Learning with Double Q-learning, Proc. AAAI Conf. Artificial Intelligence, 2016, vol. 30, no. 
1, pp. 2094-2100.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref16","unstructured":"17. Bellman, R., Dynamic Programming, Princeton: Princeton University Press, 1957."},{"key":"ref17","doi-asserted-by":"publisher","unstructured":"18. Robbins, H. and Monro, S., A Stochastic Approximation Method, The Annals of Mathematical Statistics, 1951, vol. 22, no. 3, pp. 400-407.","DOI":"10.1214\/aoms\/1177729586"}],"container-title":["Automation and Remote Control"],"original-title":[],"deposited":{"date-parts":[[2024,7,1]],"date-time":"2024-07-01T13:28:13Z","timestamp":1719840493000},"score":1,"resource":{"primary":{"URL":"http:\/\/ait.mtas.ru\/en\/archive\/volume85issue5\/ARC%2005-007Popov.pdf"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5]]},"references-count":18,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5]]}},"URL":"https:\/\/doi.org\/10.31857\/s0005117924050076","relation":{},"ISSN":["0005-1179","1608-3032"],"issn-type":[{"value":"0005-1179","type":"print"},{"value":"1608-3032","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5]]}}}