{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T21:51:24Z","timestamp":1762033884403},"reference-count":60,"publisher":"MIT Press","issue":"12","content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,11,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Neural networks are versatile tools for computation, having the ability to approximate a broad range of functions. An important problem in the theory of deep neural networks is expressivity; that is, we want to understand the functions that are computable by a given network. We study real, infinitely differentiable (smooth) hierarchical functions implemented by feedforward neural networks via composing simpler functions in two cases: (1) each constituent function of the composition has fewer inputs than the resulting function and (2) constituent functions are in the more specific yet prevalent form of a nonlinear univariate function (e.g., tanh) applied to a linear multivariate function. We establish that in each of these regimes, there exist nontrivial algebraic partial differential equations (PDEs) that are satisfied by the computed functions. These PDEs are purely in terms of the partial derivatives and are dependent only on the topology of the network. Conversely, we conjecture that such PDE constraints, once accompanied by appropriate nonsingularity conditions and perhaps certain inequalities involving partial derivatives, guarantee that the smooth function under consideration can be represented by the network. The conjecture is verified in numerous examples, including the case of tree architectures, which are of neuroscientific interest. Our approach is a step toward formulating an algebraic description of functional spaces associated with specific neural networks, and may provide useful new tools for constructing neural networks.<\/jats:p>","DOI":"10.1162\/neco_a_01441","type":"journal-article","created":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T22:29:51Z","timestamp":1635460191000},"page":"3204-3263","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":2,"title":["On PDE Characterization of Smooth Hierarchical Functions Computed by Neural Networks"],"prefix":"10.1162","volume":"33","author":[{"given":"Khashayar","family":"Filom","sequence":"first","affiliation":[{"name":"Department of Mathematics, University of Michigan, Ann Arbor, MI 48109, U.S.A. filom@umich.edu"}]},{"given":"Roozbeh","family":"Farhoodi","sequence":"additional","affiliation":[{"name":"Departments of Bioengineering and Department of Neuroscience, University of Pennsylvania, Philadelphia, PA 1910, U.S.A. roozbeh@seas.upenn.edu"}]},{"given":"Konrad Paul","family":"Kording","sequence":"additional","affiliation":[{"name":"Departments of Bioengineering and Department of Neuroscience, University of Pennsylvania, Philadelphia, PA 1910, U.S.A. kording@upenn.edu"}]}],"member":"281","published-online":{"date-parts":[[2021,11,12]]},"reference":[{"key":"2021112318123745100_B1","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/978-3-642-01742-1_5","article-title":"On the representation of functions of several variables as a superposition of functions of a smaller number of variables.","author":"Arnold","year":"2009","journal-title":"Collected works: Representations of functions, celestial mechanics and KAM theory, 1957\u20131965"},{"key":"2021112318123745100_B2","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1007\/978-3-642-01742-1_6","article-title":"Representation of continuous functions of three variables by the superposition of continuous functions of two variables.","author":"Arnold","year":"2009","journal-title":"Collected works: Representations of functions, celestial mechanics and KAM theory, 1957\u20131965"},{"key":"2021112318123745100_B3","first-page":"190","volume-title":"Advances in neural information processing systems, 11","author":"Bartlett","year":"1999"},{"issue":"8","key":"2021112318123745100_B4","doi-asserted-by":"publisher","first-page":"1553","DOI":"10.1109\/TNNLS.2013.2293637","article-title":"On the complexity of neural network classifiers: A comparison between shallow and deep architectures","volume":"25","author":"Bianchini","year":"2014","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"2021112318123745100_B5","volume-title":"Elementary differential equations","author":"Boyce","year":"2012","edition":"10th"},{"key":"2021112318123745100_B6","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1007\/978-3-540-36351-4_13","volume-title":"Kolmogorov's heritage in mathematics","author":"Brattka","year":"2007"},{"key":"2021112318123745100_B7","author":"Buck","year":"1976","journal-title":"Approximate complexity and functional representation"},{"issue":"1","key":"2021112318123745100_B8","doi-asserted-by":"publisher","first-page":"280","DOI":"10.1016\/0022-247X(79)90091-X","article-title":"Approximate complexity and functional representation","volume":"70","author":"Buck","year":"1979","journal-title":"J. Math. Anal. Appl."},{"issue":"2","key":"2021112318123745100_B9","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1080\/00029890.1981.11995204","article-title":"Characterization of classes of functions.","volume":"88","author":"Buck","year":"1981","journal-title":"Amer. Math. Monthly"},{"issue":"2","key":"2021112318123745100_B10","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1016\/0022-0396(81)90060-7","article-title":"The solutions to a smooth PDE can be dense in C(I).","volume":"41","author":"Buck","year":"1981","journal-title":"J. Differential Equations"},{"issue":"1","key":"2021112318123745100_B11","article-title":"The analytic domain in the implicit function theorem.","volume":"4","author":"Chang","year":"2003","journal-title":"J. Inequal. Pure Appl. Math."},{"key":"2021112318123745100_B12","author":"Chatziafratis","year":"2019","journal-title":"Depth-width trade-offs for ReLU networks via Sharkovsky's theorem"},{"key":"2021112318123745100_B13","first-page":"698","article-title":"On the expressive power of deep learning: A tensor analysis.","author":"Cohen","year":"2016","journal-title":"Proceedings of the Conference on Learning Theory"},{"key":"2021112318123745100_B14","author":"Coste","year":"2000","journal-title":"An introduction to semialgebraic geometry"},{"issue":"4","key":"2021112318123745100_B15","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/BF02551274","article-title":"Approximation by superpositions of a sigmoidal function","volume":"2","author":"Cybenko","year":"1989","journal-title":"Mathematics of Control, Signals and Systems"},{"key":"2021112318123745100_B16","first-page":"3232","article-title":"Direct estimation of weights and efficient training of deep neural networks without SGD","author":"Dehmamy","year":"2019","journal-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing"},{"key":"2021112318123745100_B17","author":"Du","year":"2018","journal-title":"On the power of over-parametrization in neural networks with quadratic activation"},{"key":"2021112318123745100_B18","first-page":"907","article-title":"The power of depth for feedforward neural networks.","author":"Eldan","year":"2016","journal-title":"Proceedings of the Conference on Learning Theory"},{"key":"2021112318123745100_B19","doi-asserted-by":"publisher","first-page":"2075","DOI":"10.1162\/neco_a_01231","article-title":"On functions computed on trees.","volume":"31","author":"Farhoodi","year":"2019","journal-title":"Neural Computation"},{"key":"2021112318123745100_B20","first-page":"335","volume-title":"Advances in neural information processing systems","author":"Fefferman","year":"1994"},{"key":"2021112318123745100_B21","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.29089","article-title":"Conserved neural circuit structure across drosophila larval development revealed by comparative connectomics","volume":"6","author":"Gerhard","year":"2017","journal-title":"eLife"},{"issue":"1","key":"2021112318123745100_B22","article-title":"Topological characterization of neuronal arbor morphology via sequence representation: I-motif analysis","volume":"16","author":"Gillette","year":"2015","journal-title":"BMC Bioinformatics"},{"issue":"4","key":"2021112318123745100_B23","doi-asserted-by":"publisher","first-page":"465","DOI":"10.1162\/neco.1989.1.4.465","article-title":"Representation properties of networks: Kolmogorov's theorem is irrelevant","volume":"1","author":"Girosi","year":"1989","journal-title":"Neural Computation"},{"key":"2021112318123745100_B24","first-page":"11","article-title":"Kolmogorov's mapping neural network existence theorem.","volume":"3","author":"Hecht-Nielsen","year":"1987","journal-title":"Proceedings of the International Conference on Neural Networks"},{"issue":"10","key":"2021112318123745100_B25","doi-asserted-by":"publisher","first-page":"437","DOI":"10.1090\/S0002-9904-1902-00923-3","article-title":"Mathematical problems","volume":"8","author":"Hilbert","year":"1902","journal-title":"Bulletin of the American Mathematical Society"},{"issue":"2","key":"2021112318123745100_B26","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1016\/0893-6080(91)90009-T","article-title":"Approximation capabilities of multilayer feedforward networks","volume":"4","author":"Hornik","year":"1991","journal-title":"Neural Networks"},{"issue":"5","key":"2021112318123745100_B27","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1016\/0893-6080(89)90020-8","article-title":"Multilayer feedforward networks are universal approximators","volume":"2","author":"Hornik","year":"1989","journal-title":"Neural Networks"},{"key":"2021112318123745100_B28","author":"Kileel","year":"2019","journal-title":"On the expressive power of deep polynomial neural networks"},{"key":"2021112318123745100_B29","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1007\/0-387-30873-3_2","volume-title":"Branching morphogenesis","author":"Kollins","year":"2005"},{"key":"2021112318123745100_B30","first-page":"953","article-title":"On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition","volume":"114","author":"Kolmogorov","year":"1957","journal-title":"Dokl. Akad. Nauk SSSR"},{"key":"2021112318123745100_B31","author":"Krantz","year":"2002","journal-title":"The implicit function theorem"},{"issue":"4","key":"2021112318123745100_B32","doi-asserted-by":"crossref","first-page":"617","DOI":"10.1162\/neco.1991.3.4.617","article-title":"Kolmogorov's theorem is relevant","volume":"3","author":"K\u016frkov\u00e1","year":"1991","journal-title":"Neural Computation"},{"issue":"3","key":"2021112318123745100_B33","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1016\/0893-6080(92)90012-8","article-title":"Kolmogorov's theorem and multilayer neural networks","volume":"5","author":"K\u016frkov\u00e1","year":"1992","journal-title":"Neural Networks"},{"issue":"6","key":"2021112318123745100_B34","doi-asserted-by":"publisher","first-page":"1223","DOI":"10.1007\/s10955-017-1836-5","article-title":"Why does deep and cheap learning work so well?","volume":"168","author":"Lin","year":"2017","journal-title":"Journal of Statistical Physics"},{"key":"2021112318123745100_B35","author":"Lorentz","year":"1966","journal-title":"Approximation of functions"},{"issue":"1","key":"2021112318123745100_B36","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1162\/neco.1996.8.1.164","article-title":"Neural networks for optimal approximation of smooth and analytic functions","volume":"8","author":"Mhaskar","year":"1996","journal-title":"Neural Computation"},{"key":"2021112318123745100_B37","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v31i1.10913","article-title":"When and why are deep networks better than shallow ones?","author":"Mhaskar","year":"2017","journal-title":"Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence."},{"key":"2021112318123745100_B38","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/11301.001.0001","author":"Minsky","year":"2017","journal-title":"Perceptrons: An introduction to computational geometry"},{"key":"2021112318123745100_B39","first-page":"2924","volume-title":"Advances in neural information processing systems, 27","author":"Montufar","year":"2014"},{"key":"2021112318123745100_B40","author":"Narasimhan","year":"1968","journal-title":"Analysis on real and complex manifolds"},{"issue":"3","key":"2021112318123745100_B41","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1007\/BF01206530","article-title":"\u00dcber dirichletsche reihen und algebraische differentialgleichungen","volume":"8","author":"Ostrowski","year":"1920","journal-title":"Mathematische Zeitschrift"},{"key":"2021112318123745100_B42","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1007\/s10208-020-09461-0","article-title":"Topological properties of the set of functions generated by neural networks of fixed size","volume":"21","author":"Petersen","year":"2020","journal-title":"Foundations of Computational Mathematics"},{"key":"2021112318123745100_B43","author":"Poggio","year":"2019","journal-title":"Theoretical issues in deep networks: Approximation, optimization and generalization."},{"issue":"5","key":"2021112318123745100_B44","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1007\/s11633-017-1054-2","article-title":"Why and when can deep\u2014but not shallow\u2014networks avoid the curse of dimensionality: A review","volume":"14","author":"Poggio","year":"2017","journal-title":"International Journal of Automation and Computing"},{"key":"2021112318123745100_B45","author":"P\u00f3lya","year":"1945","journal-title":"Aufgaben und Lehrs\u00e4tze aus der Analysis"},{"key":"2021112318123745100_B46","first-page":"3360","volume-title":"Advances in neural information processing systems","author":"Poole","year":"2016"},{"key":"2021112318123745100_B47","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-21684-3","author":"Pugh","year":"2002","journal-title":"Real mathematical analysis"},{"key":"2021112318123745100_B48","first-page":"2847","article-title":"On the expressive power of deep neural networks","volume":"70","author":"Raghu","year":"2017","journal-title":"Proceedings of the 34th International Conference on Machine Learning"},{"key":"2021112318123745100_B49","author":"Rolnick","year":"2019,","journal-title":"Reverse-engineering deep ReLU networks."},{"issue":"3","key":"2021112318123745100_B50","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1090\/S0273-0979-1981-14910-7","article-title":"A universal differential equation","volume":"4","author":"Rubel","year":"1981","journal-title":"Bull. Amer. Math. Soc. (N.S.)"},{"key":"2021112318123745100_B51","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.12059","article-title":"Quantitative neuroanatomy for connectomics in drosophila","volume":"5","author":"Schneider-Mizell","year":"2016","journal-title":"eLife"},{"issue":"2","key":"2021112318123745100_B52","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1109\/TIT.2018.2854560","article-title":"Theoretical insights into the optimization landscape of over-parameterized shallow neural networks","volume":"65","author":"Soltanolkotabi","year":"2018","journal-title":"IEEE Transactions on Information Theory"},{"key":"2021112318123745100_B53","doi-asserted-by":"publisher","first-page":"340","DOI":"10.2307\/1994273","article-title":"On the structure of continuous functions of several variables","volume":"115","author":"Sprecher","year":"1965","journal-title":"Trans. Amer. Math. Soc."},{"key":"2021112318123745100_B54","author":"Telgarsky","year":"2016","journal-title":"Benefits of depth in neural networks."},{"key":"2021112318123745100_B55","author":"Venturi","year":"2018","journal-title":"Spurious valleys in two-layer neural network optimization landscapes."},{"key":"2021112318123745100_B56","first-page":"701","article-title":"On Hilbert's thirteenth problem","volume":"95","author":"Vitu\u0161kin","year":"1954","journal-title":"Doklady Akad. Nauk SSSR (N.S.)"},{"key":"2021112318123745100_B57","first-page":"1258","article-title":"A proof of the existence of analytic functions of several variables not representable by linear superpositions of continuously differentiable functions of fewer variables","volume":"156","author":"Vitu\u0161kin","year":"1964","journal-title":"Dokl. Akad. Nauk SSSR"},{"issue":"1","key":"2021112318123745100_B58","article-title":"On Hilbert's thirteenth problem and related questions","volume":"59","author":"Vitu\u0161kin","year":"2004","journal-title":"Russian Mathematical Surveys"},{"issue":"133","key":"2021112318123745100_B59","first-page":"77","article-title":"Linear superpositions of functions","volume":"22","author":"Vitu\u0161kin","year":"1967","journal-title":"Uspehi Mat. Nauk"},{"key":"2021112318123745100_B60","first-page":"429","article-title":"Remarks on functional representation.","author":"von Golitschek","year":"1980","journal-title":"Approximation theory, III (Proc. Conf., Univ. Texas, Austin, Tex., 1980)"}],"container-title":["Neural Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/33\/12\/3204\/1974334\/neco_a_01441.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/33\/12\/3204\/1974334\/neco_a_01441.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,14]],"date-time":"2023-01-14T02:15:23Z","timestamp":1673662523000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/33\/12\/3204\/107674\/On-PDE-Characterization-of-Smooth-Hierarchical"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,12]]},"references-count":60,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2021,11,12]]},"published-print":{"date-parts":[[2021,11,12]]}},"URL":"https:\/\/doi.org\/10.1162\/neco_a_01441","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,12]]},"published":{"date-parts":[[2021,11,12]]}}}