{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T13:38:54Z","timestamp":1740145134467,"version":"3.37.3"},"reference-count":78,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T00:00:00Z","timestamp":1655856000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T00:00:00Z","timestamp":1655856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Optim Lett"],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective\u2019s gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively-specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak\u2019s heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.<\/jats:p>","DOI":"10.1007\/s11590-022-01895-5","type":"journal-article","created":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T12:02:40Z","timestamp":1655899360000},"page":"657-673","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions"],"prefix":"10.1007","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2772-5840","authenticated-orcid":false,"given":"Michael C.","family":"Burkhart","sequence":"first","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,6,22]]},"reference":[{"key":"1895_CR1","unstructured":"Abdullah, A., Kumar, R., McGregor, A., Vassilvitskii, S., Venkatasubramanian, S.: Sketching, embedding, and dimensionality reduction for information spaces. In: Int. Conf. Artif. Intell. Stat. (2016)"},{"key":"1895_CR2","first-page":"4148","volume":"18","author":"N Agarwal","year":"2017","unstructured":"Agarwal, N., Bullins, B., Hazan, E.: Second-order stochastic optimization for machine learning in linear time. J. Mach. Learn. Res. 18, 4148\u20134187 (2017)","journal-title":"J. Mach. Learn. Res."},{"issue":"8","key":"1895_CR3","doi-asserted-by":"crossref","first-page":"1257","DOI":"10.1109\/LSP.2019.2926926","volume":"26","author":"\u00d6D Aky\u0131ld\u0131z","year":"2019","unstructured":"Aky\u0131ld\u0131z, \u00d6.D., Chouzenoux, \u00c9., Elvira, V., M\u00edguez, J.: A probabilistic incremental proximal gradient method. IEEE Signal Process. Lett. 26(8), 1257\u20131261 (2019)","journal-title":"IEEE Signal Process. Lett."},{"issue":"2","key":"1895_CR4","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1162\/089976698300017746","volume":"10","author":"S Amari","year":"1998","unstructured":"Amari, S.: Natural gradient works efficiently in learning. Neural Comput. 10(2), 251\u2013276 (1998)","journal-title":"Neural Comput."},{"issue":"1","key":"1895_CR5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2140\/pjm.1966.16.1","volume":"16","author":"L Armijo","year":"1966","unstructured":"Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16(1), 1\u20133 (1966)","journal-title":"Pacific J. Math."},{"key":"1895_CR6","unstructured":"Batty, E., Whiteway, M., Saxena, S., Biderman, D., Abe, T., Musall, S., Gillis, W., Markowitz, J., Churchland, A., Cunningham, J.P., Datta, S.R., Linderman, S., Paninski, L.: Behavenet: nonlinear embedding and Bayesian neural decoding of behavioral videos. In: Adv. Neur. Inf. Proc. Sys., pp. 15706\u201315717 (2019)"},{"issue":"4","key":"1895_CR7","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1080\/10556788.2020.1725751","volume":"35","author":"AS Berahas","year":"2020","unstructured":"Berahas, A.S., Bollapragada, R., Nocedal, J.: An investigation of Newton-sketch and subsampled Newton methods. Optim. Methods Softw. 35(4), 661\u2013680 (2020)","journal-title":"Optim. Methods Softw."},{"key":"1895_CR8","unstructured":"Bergou, E., Diouane, Y., Kunc, V., Kungurtsev, V., Royer, C.W.: A subsampling line-search method with second-order results (2018). ArXiv: 1810.07211"},{"issue":"3","key":"1895_CR9","doi-asserted-by":"crossref","first-page":"807","DOI":"10.1137\/S1052623494268522","volume":"6","author":"DP Bertsekas","year":"1996","unstructured":"Bertsekas, D.P.: Incremental least squares methods and the extended Kalman filter. SIAM J. Optim. 6(3), 807\u2013822 (1996)","journal-title":"SIAM J. Optim."},{"issue":"2","key":"1895_CR10","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1007\/s10107-011-0472-0","volume":"129","author":"DP Bertsekas","year":"2011","unstructured":"Bertsekas, D.P.: Incremental proximal methods for large scale convex optimization. Math. Program. 129(2), 163 (2011)","journal-title":"Math. Program."},{"key":"1895_CR11","first-page":"993","volume":"3","author":"DM Blei","year":"2003","unstructured":"Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993\u20131022 (2003)","journal-title":"J. Mach. Learn. Res."},{"issue":"4","key":"1895_CR12","doi-asserted-by":"crossref","first-page":"3312","DOI":"10.1137\/17M1154679","volume":"28","author":"R Bollapragada","year":"2018","unstructured":"Bollapragada, R., Byrd, R.H., Nocedal, J.: Adaptive sampling strategies for stochastic optimization. SIAM J. Optim. 28(4), 3312\u20133343 (2018)","journal-title":"SIAM J. Optim."},{"issue":"2","key":"1895_CR13","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1093\/imanum\/dry009","volume":"39","author":"R Bollapragada","year":"2019","unstructured":"Bollapragada, R., Byrd, R.H., Nocedal, J.: Exact and inexact subsampled Newton methods for optimization. IMA J. Numer. Anal. 39(2), 545\u2013578 (2019)","journal-title":"IMA J. Numer. Anal."},{"key":"1895_CR14","doi-asserted-by":"crossref","unstructured":"Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Int. Conf. Comput. Stat., pp. 177\u2013186 (2010)","DOI":"10.1007\/978-3-7908-2604-3_16"},{"issue":"2","key":"1895_CR15","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1137\/16M1080173","volume":"60","author":"L Bottou","year":"2018","unstructured":"Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223\u2013311 (2018)","journal-title":"SIAM Rev."},{"issue":"11","key":"1895_CR16","doi-asserted-by":"crossref","first-page":"2986","DOI":"10.1162\/neco_a_01129","volume":"30","author":"DM Brandman","year":"2018","unstructured":"Brandman, D.M., Burkhart, M.C., Kelemen, J., Franco, B., Harrison, M.T., Hochberg, L.R.: Robust closed-loop control of a cursor in a person with tetraplegia using Gaussian process regression. Neural Comput. 30(11), 2986\u20133008 (2018)","journal-title":"Neural Comput."},{"key":"1895_CR17","doi-asserted-by":"crossref","unstructured":"Burkhart, M.C.: A discriminative approach to Bayesian filtering with applications to human neural decoding. Ph.D. thesis, Division of Applied Mathematics, Brown University, Providence, USA (2019)","DOI":"10.31237\/osf.io\/4j3fu"},{"issue":"5","key":"1895_CR18","doi-asserted-by":"crossref","first-page":"969","DOI":"10.1162\/neco_a_01275","volume":"32","author":"MC Burkhart","year":"2020","unstructured":"Burkhart, M.C., Brandman, D.M., Franco, B., Hochberg, L.R., Harrison, M.T.: The discriminative Kalman filter for Bayesian filtering with nonlinear and nongaussian observation models. Neural Comput. 32(5), 969\u20131017 (2020)","journal-title":"Neural Comput."},{"issue":"3","key":"1895_CR19","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1137\/10079923X","volume":"21","author":"RH Byrd","year":"2011","unstructured":"Byrd, R.H., Chin, G.M., Neveitt, W., Nocedal, J.: On the use of stochastic Hessian information in optimization methods for machine learning. SIAM J. Optim. 21(3), 977\u2013995 (2011)","journal-title":"SIAM J. Optim."},{"key":"1895_CR20","unstructured":"Chen, Z.: Bayesian filtering: from Kalman filters to particle filters, and beyond. Tech. rep, McMaster U (2003)"},{"key":"1895_CR21","unstructured":"Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley-Interscience (2006)"},{"key":"1895_CR22","unstructured":"Culotta, A., Kulp, D., McCallum, A.: Gene prediction with conditional random fields. Tech. Rep. UM-CS-2005-028, U. Massachusetts Amherst (2005)"},{"key":"1895_CR23","first-page":"1265","volume":"200","author":"G Darmois","year":"1935","unstructured":"Darmois, G.: Sur les lois de probabilites a estimation exhaustive. C. R. Acad. Sci. Paris 200, 1265\u20131266 (1935)","journal-title":"C. R. Acad. Sci. Paris"},{"key":"1895_CR24","first-page":"2121","volume":"12","author":"J Duchi","year":"2011","unstructured":"Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121\u20132159 (2011)","journal-title":"J. Mach. Learn. Res."},{"key":"1895_CR25","first-page":"3052","volume":"28","author":"MA Erdogdu","year":"2015","unstructured":"Erdogdu, M.A., Montanari, A.: Convergence rates of sub-sampled Newton methods. In: Adv. Neur. Inf. Proc. Sys 28, 3052\u20133060 (2015)","journal-title":"In: Adv. Neur. Inf. Proc. Sys"},{"issue":"4","key":"1895_CR26","doi-asserted-by":"crossref","first-page":"A1380","DOI":"10.1137\/110830629","volume":"34","author":"MP Friedlander","year":"2012","unstructured":"Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(4), A1380\u2013A1405 (2012)","journal-title":"SIAM J. Sci. Comput."},{"issue":"2","key":"1895_CR27","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1049\/ip-f-2.1993.0015","volume":"140","author":"NJ Gordon","year":"1993","unstructured":"Gordon, N.J., Salmond, D.J., Smith, A.F.M.: Novel approach to nonlinear\/non-Gaussian Bayesian state estimation. IEE Proc. F - Radar Signal Process 140(2), 107\u2013113 (1993)","journal-title":"IEE Proc. F - Radar Signal Process"},{"issue":"5","key":"1895_CR28","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1080\/00207176908905777","volume":"9","author":"JE Handschin","year":"1969","unstructured":"Handschin, J.E., Mayne, D.Q.: Monte Carlo techniques to estimate the conditional expectation in multi-stage non-linear filtering. Int. J. Control 9(5), 547\u2013559 (1969)","journal-title":"Int. J. Control"},{"key":"1895_CR29","unstructured":"Hernandez-Lobato, J., Houlsby, N., Ghahramani, Z.: Stochastic inference for scalable probabilistic modeling of binary matrices. In: Int. Conf. Mach. Learn. (2014)"},{"key":"1895_CR30","first-page":"856","volume":"23","author":"M Hoffman","year":"2010","unstructured":"Hoffman, M., Bach, F.R., Blei, D.M.: Online learning for latent Dirichlet allocation. In: Adv. Neur. Inf. Proc. Sys 23, 856\u2013864 (2010)","journal-title":"In: Adv. Neur. Inf. Proc. Sys"},{"issue":"4","key":"1895_CR31","first-page":"1303","volume":"14","author":"MD Hoffman","year":"2013","unstructured":"Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14(4), 1303\u20131347 (2013)","journal-title":"J. Mach. Learn. Res."},{"key":"1895_CR32","volume-title":"Matrix Analysis","author":"RA Horn","year":"2013","unstructured":"Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2013)","edition":"2"},{"key":"1895_CR33","first-page":"2114","volume":"27","author":"N Houlsby","year":"2014","unstructured":"Houlsby, N., Blei, D.: A filtering approach to stochastic variational inference. In: Adv. Neur. Inf. Proc. Sys 27, 2114\u20132122 (2014)","journal-title":"In: Adv. Neur. Inf. Proc. Sys"},{"key":"1895_CR34","doi-asserted-by":"crossref","unstructured":"Ito, K., Xiong, K.: Gaussian filters for nonlinear filtering problems. IEEE Trans. Autom. Control pp. 910\u2013927 (2000)","DOI":"10.1109\/9.855552"},{"key":"1895_CR35","first-page":"315","volume":"26","author":"R Johnson","year":"2013","unstructured":"Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Adv. Neur. Inf. Proc. Sys 26, 315\u2013323 (2013)","journal-title":"In: Adv. Neur. Inf. Proc. Sys"},{"key":"1895_CR36","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1117\/12.280797","volume":"3068","author":"SJ Julier","year":"1997","unstructured":"Julier, S.J., Uhlmann, J.K.: New extension of the Kalman filter to nonlinear systems. Proc. SPIE 3068, 182\u2013193 (1997)","journal-title":"Proc. SPIE"},{"issue":"1","key":"1895_CR37","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1115\/1.3662552","volume":"82","author":"RE Kalman","year":"1960","unstructured":"Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82(1), 35\u201345 (1960)","journal-title":"J. Basic Eng."},{"issue":"10","key":"1895_CR38","doi-asserted-by":"crossref","first-page":"1847","DOI":"10.1109\/TPAMI.2009.37","volume":"31","author":"M Kim","year":"2009","unstructured":"Kim, M., Pavlovic, V.: Discriminative learning for dynamic state prediction. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1847\u20131861 (2009)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"1895_CR39","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1090\/S0002-9947-1936-1501854-3","volume":"39","author":"B Koopman","year":"1936","unstructured":"Koopman, B.: On distributions admitting a sufficient statistic. Trans. Amer. Math. Soc. 39, 399\u2013409 (1936)","journal-title":"Trans. Amer. Math. Soc."},{"issue":"5","key":"1895_CR40","doi-asserted-by":"crossref","first-page":"546","DOI":"10.1109\/TAC.1967.1098671","volume":"12","author":"H Kushner","year":"1967","unstructured":"Kushner, H.: Approximations to optimal nonlinear filters. IEEE Trans. Autom. Control 12(5), 546\u2013556 (1967)","journal-title":"IEEE Trans. Autom. Control"},{"key":"1895_CR41","unstructured":"Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Int. Conf. Mach. Learn. (2001)"},{"key":"1895_CR42","doi-asserted-by":"crossref","first-page":"105486","DOI":"10.1016\/j.knosys.2020.105486","volume":"193","author":"B Liu","year":"2020","unstructured":"Liu, B.: Particle filtering methods for stochastic optimization with application to large-scale empirical risk minimization. Knowl. Based. Syst. 193, 105486 (2020)","journal-title":"Knowl. Based. Syst."},{"key":"1895_CR43","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1007\/s10589-020-00220-z","volume":"77","author":"N Loizou","year":"2020","unstructured":"Loizou, N., Richt\u00e1rik, P.: Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods. Comput. Optim. Appl. 77, 653\u2013710 (2020)","journal-title":"Comput. Optim. Appl."},{"key":"1895_CR44","unstructured":"Luo, H., Agarwal, A., Cesa-Bianchi, N., Langford, J.: Efficient second order online learning by sketching. In: Adv. Neur. Inf. Proc. Sys., pp. 910\u2013918 (2016)"},{"issue":"119","key":"1895_CR45","first-page":"1","volume":"18","author":"M Mahsereci","year":"2017","unstructured":"Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. J. Mach. Learn. Res. 18(119), 1\u201359 (2017)","journal-title":"J. Mach. Learn. Res."},{"key":"1895_CR46","unstructured":"Mandt, S., Hoffman, M.D., Blei, D.M.: Stochastic gradient descent as approximate Bayesian inference. J. Mach. Learn. Res. 18 (2017)"},{"key":"1895_CR47","unstructured":"Martens, J.: Deep learning via Hessian-free optimization. In: Int. Conf. Mach. Learn., pp. 735\u2013742 (2010)"},{"key":"1895_CR48","unstructured":"Martens, J.: New insights and perspectives on the natural gradient method. J. Mach. Learn. Res. 21(146) (2020)"},{"key":"1895_CR49","unstructured":"McCallum, A., Freitag, D., Pereira, F.: Maximum entropy Markov models for information extraction and segmentation. In: Int. Conf. Mach. Learn., pp. 591\u2013598 (2000)"},{"key":"1895_CR50","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4899-3242-6","volume-title":"Generalized Linear Models","author":"P McCullagh","year":"1989","unstructured":"McCullagh, P., Nelder, J.: Generalized Linear Models, 2nd edn. Chapman & Hall, Florida (1989)","edition":"2"},{"key":"1895_CR51","unstructured":"van der Merwe, R.: Sigma-point Kalman filters for probabilistic inference in dynamic state-space models. Ph.D. thesis, Oregon Health & Science U., Portland, U.S.A. (2004)"},{"key":"1895_CR52","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/S0377-0427(00)00426-X","volume":"124","author":"SG Nash","year":"2000","unstructured":"Nash, S.G.: A survey of truncated-Newton methods. J. Comput. Appl. Math. 124, 45\u201359 (2000)","journal-title":"J. Comput. Appl. Math."},{"issue":"3","key":"1895_CR53","doi-asserted-by":"crossref","first-page":"370","DOI":"10.2307\/2344614","volume":"135","author":"J Nelder","year":"1972","unstructured":"Nelder, J., Wedderburn, R.: Generalized linear models. J. Roy. Stat. Soc. Ser. A 135(3), 370\u2013384 (1972)","journal-title":"J. Roy. Stat. Soc. Ser. A"},{"key":"1895_CR54","first-page":"841","volume":"14","author":"A Ng","year":"2002","unstructured":"Ng, A., Jordan, M.: On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In: Adv. Neur. Inf. Proc. Sys 14, 841\u2013848 (2002)","journal-title":"In: Adv. Neur. Inf. Proc. Sys"},{"issue":"3","key":"1895_CR55","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1007\/BF00275687","volume":"15","author":"E Oja","year":"1982","unstructured":"Oja, E.: Simplified neuron model as a principal component analyzer. J. Math. Biol. 15(3), 267\u2013273 (1982)","journal-title":"J. Math. Biol."},{"issue":"1","key":"1895_CR56","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/0022-247X(85)90131-3","volume":"106","author":"E Oja","year":"1985","unstructured":"Oja, E., Karhunen, J.: On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. J. Math. Anal. Appl. 106(1), 69\u201384 (1985)","journal-title":"J. Math. Anal. Appl."},{"issue":"1","key":"1895_CR57","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1137\/18M1216250","volume":"30","author":"C Paquette","year":"2020","unstructured":"Paquette, C., Scheinberg, K.: A stochastic line search method with expected complexity analysis. SIAM J. Optim. 30(1), 349\u2013376 (2020)","journal-title":"SIAM J. Optim."},{"issue":"1","key":"1895_CR58","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1162\/neco.1994.6.1.147","volume":"6","author":"BA Pearlmutter","year":"1994","unstructured":"Pearlmutter, B.A.: Fast exact multiplication by the Hessian. Neural Comput. 6(1), 147\u2013160 (1994)","journal-title":"Neural Comput."},{"issue":"1","key":"1895_CR59","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1137\/15M1021106","volume":"27","author":"M Pilanci","year":"2017","unstructured":"Pilanci, M., Wainwright, M.J.: Newton sketch: a near linear-time optimization algorithm with linear-quadratic convergence. SIAM J. Optim. 27(1), 205\u2013245 (2017)","journal-title":"SIAM J. Optim."},{"issue":"4","key":"1895_CR60","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1017\/S0305004100019307","volume":"32","author":"E Pitman","year":"1936","unstructured":"Pitman, E., Wishart, J.: Sufficient statistics and intrinsic accuracy. Math. Proc. Cambr. Philos. Soc. 32(4), 567\u2013579 (1936)","journal-title":"Math. Proc. Cambr. Philos. Soc."},{"issue":"5","key":"1895_CR61","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/0041-5553(64)90137-5","volume":"4","author":"BT Polyak","year":"1964","unstructured":"Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1\u201317 (1964)","journal-title":"USSR Comput. Math. Math. Phys."},{"issue":"4","key":"1895_CR62","doi-asserted-by":"crossref","first-page":"838","DOI":"10.1137\/0330046","volume":"30","author":"BT Polyak","year":"1992","unstructured":"Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30(4), 838\u2013855 (1992)","journal-title":"SIAM J. Control. Optim."},{"issue":"2","key":"1895_CR63","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1093\/genetics\/155.2.945","volume":"155","author":"JK Pritchard","year":"2000","unstructured":"Pritchard, J.K., Stephens, M., Donnelly, P.: Inference of population structure using multilocus genotype data. Genetics 155(2), 945\u2013959 (2000)","journal-title":"Genetics"},{"issue":"3","key":"1895_CR64","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1214\/aoms\/1177729586","volume":"22","author":"H Robbins","year":"1951","unstructured":"Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Statist. 22(3), 400\u2013407 (1951)","journal-title":"Ann. Math. Statist."},{"key":"1895_CR65","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1007\/s10107-018-1346-5","volume":"174","author":"F Roosta-Khorasani","year":"2019","unstructured":"Roosta-Khorasani, F., Mahoney, M.W.: Sub-sampled Newton methods. Math. Program. 174, 293\u2013326 (2019)","journal-title":"Math. Program."},{"key":"1895_CR66","first-page":"2663","volume":"25","author":"N Roux","year":"2012","unstructured":"Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Adv. Neur. Inf. Proc. Sys 25, 2663\u20132671 (2012)","journal-title":"In: Adv. Neur. Inf. Proc. Sys"},{"key":"1895_CR67","unstructured":"Ruppert, D.: Efficient estimations from a slowly convergent Robbins\u2013Monro process. Tech. Rep. 781, Cornell U., Ithaca, U.S.A. (1988)"},{"key":"1895_CR68","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139344203","volume-title":"Bayesian Filtering and Smoothing","author":"S S\u00e4rkk\u00e4","year":"2013","unstructured":"S\u00e4rkk\u00e4, S.: Bayesian Filtering and Smoothing. Cambridge University Press, Cambridge (2013)"},{"issue":"10","key":"1895_CR69","doi-asserted-by":"crossref","first-page":"1839","DOI":"10.1109\/TAC.2000.880982","volume":"45","author":"JC Spall","year":"2000","unstructured":"Spall, J.C.: Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Autom. Control 45(10), 1839\u20131853 (2000)","journal-title":"IEEE Trans. Autom. Control"},{"issue":"4","key":"1895_CR70","doi-asserted-by":"crossref","first-page":"2002","DOI":"10.1016\/j.jcp.2011.11.019","volume":"231","author":"P Stinis","year":"2012","unstructured":"Stinis, P.: Stochastic global optimization as a filtering problem. J. Comput. Phys. 231(4), 2002\u20132014 (2012)","journal-title":"J. Comput. Phys."},{"key":"1895_CR71","first-page":"1139","volume":"28","author":"I Sutskever","year":"2013","unstructured":"Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Int. Conf. Mach. Learn 28, 1139\u20131147 (2013)","journal-title":"In: Int. Conf. Mach. Learn"},{"key":"1895_CR72","unstructured":"Taycher, L., Shakhnarovich, G., Demirdjian, D., Darrell, T.: Conditional random people: Tracking humans with CRFs and grid filters. In: Comput. Vis. Pattern Recogn. (2006)"},{"key":"1895_CR73","first-page":"3732","volume":"32","author":"S Vaswani","year":"2019","unstructured":"Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Adv. Neur. Inf. Proc. Sys 32, 3732\u20133745 (2019)","journal-title":"In: Adv. Neur. Inf. Proc. Sys"},{"key":"1895_CR74","first-page":"1441","volume":"19","author":"J Vinson","year":"2006","unstructured":"Vinson, J., Decaprio, D., Pearson, M., Luoma, S., Galagan, J.: Comparative gene prediction using conditional random fields. In: Adv. Neur. Inf. Proc. Sys 19, 1441\u20131448 (2006)","journal-title":"In: Adv. Neur. Inf. Proc. Sys"},{"key":"1895_CR75","first-page":"1261","volume":"22","author":"O Vinyals","year":"2012","unstructured":"Vinyals, O., Povey, D.: Krylov subspace descent for deep learning. In: Int. Conf. Artif. Intell. Stats 22, 1261\u20131268 (2012)","journal-title":"In: Int. Conf. Artif. Intell. Stats"},{"key":"1895_CR76","unstructured":"Xu, P., Yang, J., Roosta-Khorasani, F., R\u00e9, C., Mahoney, M.W.: Sub-sampled Newton methods with non-uniform sampling. In: Adv. Neur. Inf. Proc. Syst. (2016)"},{"key":"1895_CR77","doi-asserted-by":"crossref","unstructured":"Zhang, C.: A particle system for global optimization. In: IEEE Conf. Decis. Control, pp. 1714\u20131719 (2013)","DOI":"10.1109\/CDC.2013.6760129"},{"issue":"1","key":"1895_CR78","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1109\/TAC.2018.2833060","volume":"64","author":"C Zhang","year":"2019","unstructured":"Zhang, C., Taghvaei, A., Mehta, P.G.: A mean-field optimal control formulation for global optimization. IEEE Trans. Autom. Control 64(1), 282\u2013289 (2019)","journal-title":"IEEE Trans. Autom. Control"}],"container-title":["Optimization Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11590-022-01895-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11590-022-01895-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11590-022-01895-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T12:22:59Z","timestamp":1727439779000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11590-022-01895-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,22]]},"references-count":78,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["1895"],"URL":"https:\/\/doi.org\/10.1007\/s11590-022-01895-5","relation":{},"ISSN":["1862-4472","1862-4480"],"issn-type":[{"type":"print","value":"1862-4472"},{"type":"electronic","value":"1862-4480"}],"subject":[],"published":{"date-parts":[[2022,6,22]]},"assertion":[{"value":"13 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 May 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 June 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}