{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T13:51:07Z","timestamp":1777384267026,"version":"3.51.4"},"reference-count":73,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2020,5,14]],"date-time":"2020-05-14T00:00:00Z","timestamp":1589414400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,5,14]],"date-time":"2020-05-14T00:00:00Z","timestamp":1589414400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003065","name":"University of Vienna","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003065","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Found Comput Math"],"published-print":{"date-parts":[[2021,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to<jats:inline-formula><jats:alternatives><jats:tex-math>$$L^p$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:msup><mml:mi>L<\/mml:mi><mml:mi>p<\/mml:mi><\/mml:msup><\/mml:math><\/jats:alternatives><\/jats:inline-formula>-norms,<jats:inline-formula><jats:alternatives><jats:tex-math>$$0&lt; p &lt; \\infty $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>0<\/mml:mn><mml:mo>&lt;<\/mml:mo><mml:mi>p<\/mml:mi><mml:mo>&lt;<\/mml:mo><mml:mi>\u221e<\/mml:mi><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>, for all practically used activation functions, and also not closed with respect to the<jats:inline-formula><jats:alternatives><jats:tex-math>$$L^\\infty $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:msup><mml:mi>L<\/mml:mi><mml:mi>\u221e<\/mml:mi><\/mml:msup><\/mml:math><\/jats:alternatives><\/jats:inline-formula>-norm for all practically used activation functions except for the ReLU and the parametric ReLU. Finally, the function that maps a family of weights to the function computed by the associated network is not inverse stable for every practically used activation function. 
In other words, if<jats:inline-formula><jats:alternatives><jats:tex-math>$$f_1, f_2$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:msub><mml:mi>f<\/mml:mi><mml:mn>1<\/mml:mn><\/mml:msub><mml:mo>,<\/mml:mo><mml:msub><mml:mi>f<\/mml:mi><mml:mn>2<\/mml:mn><\/mml:msub><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>are two functions realized by neural networks and if<jats:inline-formula><jats:alternatives><jats:tex-math>$$f_1, f_2$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:msub><mml:mi>f<\/mml:mi><mml:mn>1<\/mml:mn><\/mml:msub><mml:mo>,<\/mml:mo><mml:msub><mml:mi>f<\/mml:mi><mml:mn>2<\/mml:mn><\/mml:msub><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>are close in the sense that<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\Vert f_1 - f_2\\Vert _{L^\\infty } \\le \\varepsilon $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mrow><mml:mo>\u2016<\/mml:mo><\/mml:mrow><mml:msub><mml:mi>f<\/mml:mi><mml:mn>1<\/mml:mn><\/mml:msub><mml:mo>-<\/mml:mo><mml:msub><mml:mi>f<\/mml:mi><mml:mn>2<\/mml:mn><\/mml:msub><mml:msub><mml:mrow><mml:mo>\u2016<\/mml:mo><\/mml:mrow><mml:msup><mml:mi>L<\/mml:mi><mml:mi>\u221e<\/mml:mi><\/mml:msup><\/mml:msub><mml:mo>\u2264<\/mml:mo><mml:mi>\u03b5<\/mml:mi><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>for<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\varepsilon &gt; 0$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mi>\u03b5<\/mml:mi><mml:mo>&gt;<\/mml:mo><mml:mn>0<\/mml:mn><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>, it is, regardless of the size of<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\varepsilon $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mi>\u03b5<\/mml:mi><\/mml:math><\/jats:alternatives><\/jats:inline-formula>, usually not possible to find weights<jats:inline-formula><jats:alternatives><jats:tex-math>$$w_1, w_2$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:msub><mml:mi>w<\/mml:mi><mml:mn>1<\/mml:mn><\/mml:msub><mml:mo>,<\/mml:mo><mml:msub><mml:mi>w<\/mml:mi><mml:mn>2<\/mml:mn><\/mml:msub><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>close together such that each<jats:inline-formula><jats:alternatives><jats:tex-math>$$f_i$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:msub><mml:mi>f<\/mml:mi><mml:mi>i<\/mml:mi><\/mml:msub><\/mml:math><\/jats:alternatives><\/jats:inline-formula>is realized by a neural network with weights<jats:inline-formula><jats:alternatives><jats:tex-math>$$w_i$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:msub><mml:mi>w<\/mml:mi><mml:mi>i<\/mml:mi><\/mml:msub><\/mml:math><\/jats:alternatives><\/jats:inline-formula>. 
Overall, our findings identify potential causes for issues in the training procedure of deep learning such as no guaranteed convergence, explosion of parameters, and slow convergence.<\/jats:p>","DOI":"10.1007\/s10208-020-09461-0","type":"journal-article","created":{"date-parts":[[2020,5,14]],"date-time":"2020-05-14T19:02:37Z","timestamp":1589482957000},"page":"375-444","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":47,"title":["Topological Properties of the Set of Functions Generated by Neural Networks of Fixed Size"],"prefix":"10.1007","volume":"21","author":[{"given":"Philipp","family":"Petersen","sequence":"first","affiliation":[]},{"given":"Mones","family":"Raslan","sequence":"additional","affiliation":[]},{"given":"Felix","family":"Voigtlaender","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,5,14]]},"reference":[{"key":"9461_CR1","unstructured":"Z. Allen-Zhu, Y. Li, and Z. Song, A Convergence Theory for Deep Learning via Over-Parameterization, Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 242\u2013252."},{"key":"9461_CR2","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-7643-7480-8","volume-title":"Analysis III","author":"H Amann","year":"2009","unstructured":"H. Amann and J. Escher, Analysis III, Birkh\u00e4user Verlag, Basel, 2009."},{"key":"9461_CR3","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1090\/S0002-9939-1964-0169048-7","volume":"15","author":"PM Anselone","year":"1964","unstructured":"P. M. Anselone and J. Korevaar, Translation Invariant Subspaces of Finite Dimension, Proc. Amer. Math. Soc. 15 (1964), 747\u2013752.","journal-title":"Proc. Amer. Math. Soc."},{"key":"9461_CR4","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511624216","volume-title":"Neural Network Learning: Theoretical Foundations","author":"M Anthony","year":"1999","unstructured":"M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, Cambridge, 1999."},{"issue":"1","key":"9461_CR5","first-page":"629","volume":"18","author":"F Bach","year":"2017","unstructured":"F. Bach, Breaking the Curse of Dimensionality with Convex Neural Networks, J. Mach. Learn. Res. 18 (2017), no. 1, 629\u2013681.","journal-title":"J. Mach. Learn. Res."},{"issue":"1","key":"9461_CR6","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0893-6080(89)90014-2","volume":"2","author":"P Baldi","year":"1988","unstructured":"P. Baldi and K. Hornik, Neural Networks and Principal Component Analysis: Learning from Examples Without Local Minima, Neural Netw. 2 (1988), no. 1, 53\u201358.","journal-title":"Neural Netw."},{"issue":"3","key":"9461_CR7","doi-asserted-by":"crossref","first-page":"930","DOI":"10.1109\/18.256500","volume":"39","author":"AR Barron","year":"1993","unstructured":"A.R. Barron, Universal Approximation Bounds for Superpositions of a Sigmoidal Function, IEEE Trans. Inf. Theory 39 (1993), no. 3, 930\u2013945.","journal-title":"IEEE Trans. Inf. Theory"},{"issue":"1","key":"9461_CR8","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/S0304-3975(01)00057-3","volume":"284","author":"PL Bartlett","year":"2002","unstructured":"P. L. Bartlett and S. Ben-David, Hardness Results for Neural Network Approximation Problems, Theor. Comput. Sci. 284 (2002), no. 1, 53\u201366.","journal-title":"Theor. Comput. 
Sci."},{"key":"9461_CR9","first-page":"6240","volume":"30","author":"PL Bartlett","year":"2017","unstructured":"P. L. Bartlett, D. J. Foster, and M. J. Telgarsky, Spectrally-normalized margin bounds for neural networks, Adv. Neural Inf. Process. Syst. 30, 2017, pp. 6240\u20136249.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"9461_CR10","unstructured":"P. L. Bartlett, N. Harvey, C. Liaw, and A. Mehrabian, Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks., J. Mach. Learn. Res. 20 (2019), no. 63, 1\u201317."},{"key":"9461_CR11","first-page":"463","volume":"3","author":"PL Bartlett","year":"2002","unstructured":"P. L. Bartlett and S. Mendelson, Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, J. Mach. Learn. Res. 3 (2002), 463\u2013482.","journal-title":"J. Mach. Learn. Res."},{"key":"9461_CR12","unstructured":"J. Bergstra, G. Desjardins, P. Lamblin, and Y. Bengio, Quadratic Polynomials Learn Better Image Features, Technical Report 1337, D\u00e9partement d\u2019Informatique et de Recherche Op\u00e9rationnelle, Universit\u00e9 de Montr\u00e9al, 2009."},{"key":"9461_CR13","first-page":"494","volume":"2","author":"A Blum","year":"1989","unstructured":"A. Blum and R.L. Rivest, Training a 3-node neural network is NP-complete, Adv. Neural Inf. Process. Syst. 2, 1989, pp. 494\u2013501.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"9461_CR14","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1137\/18M118709X","volume":"1","author":"H B\u00f6lcskei","year":"2019","unstructured":"H. B\u00f6lcskei, P. Grohs, G. Kutyniok, and P. C. Petersen, Optimal Approximation with Sparsely Connected Deep Neural Networks, SIAM J. Math. Data Sci. 1 (2019), 8\u201345.","journal-title":"SIAM J. Math. Data Sci."},{"key":"9461_CR15","unstructured":"B. Carlile, G. Delamarter, P. Kinney, A. Marti, and B. Whitney, Improving Deep Learn- ing by Inverse Square Root Linear Units (ISRLUs), arXiv preprint arXiv:1710.09967 (2017)."},{"key":"9461_CR16","first-page":"3036","volume":"31","author":"L Chizat","year":"2018","unstructured":"L. Chizat and F. Bach, On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, Adv. Neural Inf. Process. Syst. 31, 2018, pp. 3036\u20133046.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"9461_CR17","unstructured":"D. Clevert, T. Unterthiner, and S. Hochreiter, Fast and accurate deep network learn- ing by exponential linear units (elus), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2\u20134, 2016, Conference Track Proceedings, 2016."},{"key":"9461_CR18","unstructured":"N. Cohen, O. Sharir, and A. Shashua, On the Expressive Power of Deep Learning: A Tensor Analysis, Conference on learning theory, 2016, pp. 698\u2013728."},{"key":"9461_CR19","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4614-6956-8","volume-title":"Measure Theory","author":"DL Cohn","year":"2013","unstructured":"D. L. Cohn, Measure Theory, Birkh\u00e4user Verlag, Basel, 2013."},{"key":"9461_CR20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1090\/S0273-0979-01-00923-5","volume":"39","author":"F Cucker","year":"2002","unstructured":"F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Am. Math. Soc. 39 (2002), 1\u201349.","journal-title":"Bull. Am. Math. 
Soc."},{"issue":"4","key":"9461_CR21","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/BF02551274","volume":"2","author":"G Cybenko","year":"1989","unstructured":"G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signal 2 (1989), no. 4, 303\u2013314.","journal-title":"Math. Control Signal"},{"issue":"1","key":"9461_CR22","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/TASL.2011.2134090","volume":"20","author":"GE Dahl","year":"2012","unstructured":"G. E. Dahl, D. Yu, L. Deng, and A. Acero, Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, IEEE Audio, Speech, Language Process. 20 (2012), no. 1, 30\u201342.","journal-title":"IEEE Audio, Speech, Language Process."},{"key":"9461_CR23","volume-title":"Foundations of Modern Analysis, Pure and Applied Mathematics","author":"J Dieudonn\u00e9","year":"1960","unstructured":"J. Dieudonn\u00e9, Foundations of Modern Analysis, Pure and Applied Mathematics, Vol. X, Academic Press, New York-London, 1960."},{"key":"9461_CR24","doi-asserted-by":"crossref","unstructured":"W. E, J. Han, and A. Jentzen, Deep learning-based numerical methods for high- dimensional parabolic partial differential equations and backward stochastic differential equations, Commun. Math. Stat. 5 (2017), no. 4, 349\u2013380.","DOI":"10.1007\/s40304-017-0117-6"},{"key":"9461_CR25","doi-asserted-by":"crossref","unstructured":"W. E and B. Yu, The Deep Ritz Method: A deep learning-based numerical algorithm for solving variational problems, Communications in Mathematics and Statistics 6 (2018), no. 1, 1\u201312.","DOI":"10.1007\/s40304-018-0127-z"},{"key":"9461_CR26","volume-title":"Real Analysis, Pure and Applied Mathematics (New York)","author":"GB Folland","year":"1999","unstructured":"G. B. Folland, Real Analysis, Pure and Applied Mathematics (New York), Wiley, New York, 1999."},{"key":"9461_CR27","unstructured":"C. D. Freeman and J. Bruna, Topology and Geometry of Half-Rectified Network Opti- mization, 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24\u201326, 2017, conference track proceedings, 2017."},{"issue":"3","key":"9461_CR28","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1007\/BF00195855","volume":"63","author":"F Girosi","year":"1990","unstructured":"F. Girosi and T. Poggio, Networks and the best approximation property, Biol. Cybern. 63 (1990), no. 3, 169\u2013176.","journal-title":"Biol. Cybern."},{"key":"9461_CR29","unstructured":"X. Glorot, A. Bordes, and Y. Bengio, Deep Sparse Rectifier Neural Networks, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 315\u2013323."},{"key":"9461_CR30","unstructured":"I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, 2016. http:\/\/www.deeplearningbook.org."},{"key":"9461_CR31","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-09432-8","volume-title":"Classical Fourier Analysis, Graduate Texts in Mathematics","author":"L Grafakos","year":"2008","unstructured":"L. Grafakos, Classical Fourier Analysis, Graduate Texts in Mathematics, vol. 249, Springer, New York, 2008."},{"key":"9461_CR32","volume-title":"Neural Networks: A Comprehensive Foundation","author":"S Haykin","year":"1998","unstructured":"S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall PTR, Upper Saddle River, 1998."},{"key":"9461_CR33","doi-asserted-by":"crossref","unstructured":"K. He, X. Zhang, S. 
Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human- Level Performance on ImageNet Classification, Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026\u20131034.","DOI":"10.1109\/ICCV.2015.123"},{"issue":"6","key":"9461_CR34","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","volume":"29","author":"G Hinton","year":"2012","unstructured":"G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Process. Mag. 29 (2012), no. 6, 82\u201397.","journal-title":"IEEE Signal Process. Mag."},{"issue":"5","key":"9461_CR35","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1016\/0893-6080(89)90020-8","volume":"2","author":"K Hornik","year":"1989","unstructured":"K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Netw. 2 (1989), no. 5, 359\u2013366.","journal-title":"Neural Netw."},{"key":"9461_CR36","doi-asserted-by":"crossref","unstructured":"G. Huang, Z. Liu, L. van der Maaten, and K. Q.Weinberger, Densely Connected Convolutional Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2261\u20132269.","DOI":"10.1109\/CVPR.2017.243"},{"key":"9461_CR37","first-page":"8571","volume":"31","author":"A Jacot","year":"2018","unstructured":"A. Jacot, F. Gabriel, and C. Hongler, Neural Tangent Kernel: Convergence and Gen- eralization in Neural Networks, Adv. Neural Inf. Process. Syst. 31, 2018, pp. 8571\u2013 8580.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"9461_CR38","unstructured":"S. Judd, Learning in Networks is Hard, Proceedings of IEEE International Conference on Neural Networks, 1987, pp. 685\u2013692."},{"issue":"7","key":"9461_CR39","doi-asserted-by":"crossref","first-page":"695","DOI":"10.1016\/S0893-6080(00)00056-3","volume":"13","author":"P Kainen","year":"2000","unstructured":"P. Kainen, V. Kurkov\u00e1, and A. Vogt, Best approximation by Heaviside perceptron networks, Neural Netw. 13 (2000), no. 7, 695\u2013697.","journal-title":"Neural Netw."},{"issue":"1\u20133","key":"9461_CR40","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/S0925-2312(99)00111-3","volume":"29","author":"PC Kainen","year":"1999","unstructured":"P. C. Kainen, V. Kurkov\u00e1, and A. Vogt, Approximation by neural networks is not continuous, Neurocomputing 29 (1999), no. 1-3, 47\u201356.","journal-title":"Neurocomputing"},{"key":"9461_CR41","first-page":"1097","volume":"25","author":"A Krizhevsky","year":"2012","unstructured":"A. Krizhevsky, I. Sutskever, and G.E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Adv. Neural Inf. Process. Syst. 25, 2012, pp. 1097\u20131105.","journal-title":"Adv. Neural Inf. Process. Syst."},{"issue":"5","key":"9461_CR42","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1109\/72.712178","volume":"9","author":"IE Lagaris","year":"1998","unstructured":"I. E. Lagaris, A. Likas, and D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Trans. Neural Netw. 9 (1998), no. 5, 987\u20131000.","journal-title":"IEEE Trans. Neural Netw."},{"issue":"7553","key":"9461_CR43","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"Y. LeCun, Y. 
Bengio, and G. Hinton, Deep learning, Nature 521 (2015), no. 7553, 436\u2013444.","journal-title":"Nature"},{"key":"9461_CR44","volume-title":"Introduction to Topological Manifolds, Graduate Texts in Mathematics","author":"JM Lee","year":"2011","unstructured":"J. M. Lee, Introduction to Topological Manifolds, Graduate Texts in Mathematics, vol. 202, Springer, New York, 2011."},{"issue":"6","key":"9461_CR45","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/S0893-6080(05)80131-5","volume":"6","author":"M Leshno","year":"1993","unstructured":"M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Netw. 6 (1993), no. 6, 861\u2013867.","journal-title":"Neural Netw."},{"key":"9461_CR46","doi-asserted-by":"crossref","unstructured":"B. Liao, C. Ma, L. Xiao, R. Lu, and L. Ding, An Arctan-Activated WASD Neural Network Approach to the Prediction of Dow Jones Industrial Average, Advances in neural networks - ISNN 2017 - 14th international symposium, ISNN 2017, Sapporo, Hakodate, and Muroran, Hokkaido, Japan, June 21-26, 2017, Proceedings, Part I, 2017, pp. 120\u2013126.","DOI":"10.1007\/978-3-319-59072-1_15"},{"key":"9461_CR47","unstructured":"A. Maas, Y. Hannun, and A. Ng, Rectifier Nonlinearities Improve Neural Network Acoustic Models, ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013."},{"issue":"3","key":"9461_CR48","first-page":"396","volume":"62","author":"VE Maiorov","year":"2010","unstructured":"V. E. Maiorov, Best approximation by ridge functions in $$L_p$$-spaces, Ukra\u00efn. Mat. Zh. 62 (2010), no. 3, 396\u2013408.","journal-title":"Ukra\u00efn. Mat. Zh."},{"key":"9461_CR49","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/BF02478259","volume":"5","author":"W McCulloch","year":"1943","unstructured":"W. McCulloch and W. Pitts, A logical calculus of ideas immanent in nervous activity, Bull. Math. Biophys. 5 (1943), 115\u2013133.","journal-title":"Bull. Math. Biophys."},{"issue":"33","key":"9461_CR50","doi-asserted-by":"crossref","first-page":"E7665","DOI":"10.1073\/pnas.1806579115","volume":"115","author":"S Mei","year":"2018","unstructured":"S. Mei, A. Montanari, and P.-M. Nguyen, A mean field view of the landscape of two-layer neural networks, Proc. Natl. Acad. Sci. USA 115 (2018), no. 33, E7665\u2013E7671.","journal-title":"Proc. Natl. Acad. Sci. USA"},{"issue":"1","key":"9461_CR51","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1007\/BF02070821","volume":"1","author":"HN Mhaskar","year":"1993","unstructured":"H. N. Mhaskar, Approximation properties of a multilayered feedforward artificial neural network, Adv. Comput. Math. 1 (1993), no. 1, 61\u201380.","journal-title":"Adv. Comput. Math."},{"issue":"1","key":"9461_CR52","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1162\/neco.1996.8.1.164","volume":"8","author":"HN Mhaskar","year":"1996","unstructured":"H. N. Mhaskar, Neural networks for optimal approximation of smooth and analytic functions, Neural Comput. 8 (1996), no. 1, 164\u2013177.","journal-title":"Neural Comput."},{"key":"9461_CR53","unstructured":"M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2012."},{"key":"9461_CR54","unstructured":"G. Mont\u00fafar, R. Pascanu, K. Cho, and Y. 
Bengio, On the Number of Linear Regions of Deep Neural Networks, Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014, pp. 2924\u20132932."},{"key":"9461_CR55","unstructured":"V. Nair and G. Hinton, Rectified Linear Units Improve Restricted Boltzmann Machines, Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 807\u2013814."},{"key":"9461_CR56","unstructured":"Q. Nguyen and M. Hein, The Loss Surface of Deep and Wide Neural Networks, Proceedings of the 34th International Conference on Machine Learning-volume 70, 2017, pp. 2603\u20132612."},{"key":"9461_CR57","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.neunet.2018.08.019","volume":"108","author":"PC Petersen","year":"2018","unstructured":"P. C. Petersen and F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks, Neural Netw. 108 (2018), 296\u2013330.","journal-title":"Neural Netw."},{"key":"9461_CR58","unstructured":"PhoemueX (https:\/\/math.stackexchange.com\/users\/151552\/phoemuex), Uncountable closed set A, existence of point at which A accumulates \u201cfrom two sides\u201d of a hyperplane, 2020. URL:https:\/\/math.stackexchange.com\/q\/3513692 (version: 2020-01-18)."},{"key":"9461_CR59","unstructured":"G. M. Rotskoff and E. Vanden-Eijnden, Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error, arXiv preprint arXiv:1805.00915 (2018)."},{"key":"9461_CR60","volume-title":"Real and Complex Analysis","author":"W Rudin","year":"1987","unstructured":"W. Rudin, Real and Complex Analysis, McGraw-Hill Book Co., New York, 1987."},{"key":"9461_CR61","unstructured":"W. Rudin, Functional Analysis, International Series in Pure and Applied Mathematics, McGraw-Hill, Inc., New York, 1991."},{"key":"9461_CR62","unstructured":"I. Safran and O. Shamir, Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks, Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 2979\u20132987."},{"key":"9461_CR63","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","volume":"61","author":"J Schmidhuber","year":"2015","unstructured":"J. Schmidhuber, Deep learning in neural networks: An overview, Neural Netw. 61 (2015), 85\u2013117.","journal-title":"Neural Netw."},{"key":"9461_CR64","unstructured":"D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, arXiv preprint arXiv:1712.01815 (2017)."},{"key":"9461_CR65","unstructured":"K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations, 2015."},{"key":"9461_CR66","unstructured":"N. Usunier, G. Synnaeve, Z. Lin, and S. Chintala, Episodic Exploration for Deep Deterministic Policies for StarCraft Micromanagement, 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017."},{"key":"9461_CR67","unstructured":"L. Venturi, A. S. Bandeira, and J. 
Bruna, Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys, arXiv preprint, arXiv:1802.06384 (2018)."},{"issue":"2","key":"9461_CR68","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1112\/jlms\/s1-8.2.109","volume":"8","author":"AJ Ward","year":"1933","unstructured":"A. J. Ward, The Structure of Non-Enumerable Sets of Points, J. London Math. Soc. 8 (1933), no. 2, 109\u2013112.","journal-title":"J. London Math. Soc."},{"key":"9461_CR69","doi-asserted-by":"crossref","unstructured":"C. Wu, P. Karanasou, M. J. F. Gales, and K. C. Sim, Stimulated Deep Neural Network for Speech Recognition, University of Cambridge, 2016.","DOI":"10.21437\/Interspeech.2016-580"},{"key":"9461_CR70","doi-asserted-by":"crossref","unstructured":"G. N. Yannakakis and J. Togelius, Artificial Intelligence and Games, Springer, 2018.","DOI":"10.1007\/978-3-319-63519-4"},{"key":"9461_CR71","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1016\/j.neunet.2017.07.002","volume":"94","author":"D Yarotsky","year":"2017","unstructured":"D. Yarotsky, Error bounds for approximations with deep ReLU networks, Neural Netw. 94 (2017), 103\u2013114.","journal-title":"Neural Netw."},{"key":"9461_CR72","unstructured":"D. Yarotsky and A. Zhevnerchuk, The phase diagram of approximation rates for deep neural networks, arXiv preprint arXiv:1906.09477 (2019)."},{"key":"9461_CR73","unstructured":"Y. Zhang, P. Liang, and M. J. Wainwright, Convexified Convolutional Neural Networks, Proceedings of the 34th International Conference on Machine Learning-volume 70, 2017, pp. 4044\u20134053."}],"container-title":["Foundations of Computational Mathematics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10208-020-09461-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10208-020-09461-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10208-020-09461-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,1]],"date-time":"2023-10-01T03:05:35Z","timestamp":1696129535000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10208-020-09461-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,14]]},"references-count":73,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,4]]}},"alternative-id":["9461"],"URL":"https:\/\/doi.org\/10.1007\/s10208-020-09461-0","relation":{},"ISSN":["1615-3375","1615-3383"],"issn-type":[{"value":"1615-3375","type":"print"},{"value":"1615-3383","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,14]]},"assertion":[{"value":"8 November 2018","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 January 2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 March 2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 May 2020","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}