{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T21:27:41Z","timestamp":1769549261299,"version":"3.49.0"},"reference-count":74,"publisher":"Cambridge University Press (CUP)","issue":"5","license":[{"start":{"date-parts":[[2022,8,9]],"date-time":"2022-08-09T00:00:00Z","timestamp":1660003200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2022,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep nets are becoming larger and larger in practice, with no respect for (non)-factors that ought to limit growth including the so-called curse of dimensionality (CoD). Donoho suggested that dimensionality can be a blessing as well as a curse. Current practice in industry is well ahead of theory, but there are some recent theoretical results from Weinan E\u2019s group suggesting that errors may be independent of dimensions <jats:inline-formula><jats:alternatives><jats:inline-graphic xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" mime-subtype=\"png\" xlink:href=\"S1351324922000365_inline1.png\"\/><jats:tex-math>$d$<\/jats:tex-math><\/jats:alternatives><\/jats:inline-formula>. 
Current practice suggests an even stronger conjecture: deep nets are not merely immune to CoD, but actually, deep nets thrive on scale.<\/jats:p>","DOI":"10.1017\/s1351324922000365","type":"journal-article","created":{"date-parts":[[2022,8,9]],"date-time":"2022-08-09T09:21:55Z","timestamp":1660036915000},"page":"673-682","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":3,"title":["Emerging trends: Deep nets thrive on scale"],"prefix":"10.1017","volume":"28","author":[{"given":"Kenneth Ward","family":"Church","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2022,8,9]]},"reference":[{"key":"S1351324922000365_ref22","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324921000231"},{"key":"S1351324922000365_ref36","unstructured":"Fant, G. (1973). Speech sounds and features."},{"key":"S1351324922000365_ref19","doi-asserted-by":"publisher","DOI":"10.3389\/frai.2021.625341"},{"key":"S1351324922000365_ref66","unstructured":"Sanh, V. , Debut, L. , Chaumond, J. and Wolf, T. (2019). Distilbert, a distilled version of bert: Smaller, faster, cheaper and lighter. arXiv preprint arXiv: 1910.01108."},{"key":"S1351324922000365_ref21","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324922000043"},{"key":"S1351324922000365_ref68","first-page":"379","volume-title":". In ICML","volume":"99","author":"Scott","year":"1999"},{"key":"S1351324922000365_ref70","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1355"},{"key":"S1351324922000365_ref62","first-page":"1","article-title":"A survey on deep learning: algorithms, techniques, and applications","volume":"51","author":"Pouyanfar","year":"2018","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"S1351324922000365_ref67","volume-title":"The Analysis of Variance","volume":"72","author":"Scheffe","year":"1999"},{"key":"S1351324922000365_ref31","unstructured":"E W. (2020). 
Machine learning and computational mathematics. arXiv preprint arXiv: 2009.14596."},{"key":"S1351324922000365_ref38","volume-title":"Deep Learning","author":"Goodfellow","year":"2016"},{"key":"S1351324922000365_ref72","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"Journal of the Royal Statistical Society Series B-Methodological"},{"key":"S1351324922000365_ref41","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"S1351324922000365_ref29","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2006.871582"},{"key":"S1351324922000365_ref50","volume-title":"The Concentration of Measure Phenomenon","volume":"89","author":"Ledoux","year":"2001"},{"key":"S1351324922000365_ref51","doi-asserted-by":"publisher","DOI":"10.1145\/2996357"},{"key":"S1351324922000365_ref5","doi-asserted-by":"publisher","DOI":"10.1007\/BF02607055"},{"key":"S1351324922000365_ref34","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9868.2008.00674.x"},{"key":"S1351324922000365_ref37","author":"G\u00e9ron","year":"2019"},{"key":"S1351324922000365_ref39","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939754"},{"key":"S1351324922000365_ref42","doi-asserted-by":"publisher","DOI":"10.1109\/5254.708428"},{"key":"S1351324922000365_ref35","first-page":"101","article-title":"A selective overview of variable selection in high dimensional feature space","volume":"20","author":"Fan","year":"2010","journal-title":"Statistica Sinica"},{"key":"S1351324922000365_ref9","volume-title":"Center for Research on Foundation Model","author":"Bommasani"},{"key":"S1351324922000365_ref44","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0026683"},{"key":"S1351324922000365_ref58","doi-asserted-by":"publisher","DOI":"10.1109\/JRPROC.1961.287775"},{"key":"S1351324922000365_ref32","article-title":"Towards a mathematical 
understanding of neural network-based machine learning: what we know and what we don\u2019t","author":"Ma","year":"2020","journal-title":"CoRR"},{"key":"S1351324922000365_ref54","article-title":"Learning overparameterized neural networks via stochastic gradient descent on structured data","volume":"31","author":"Li","year":"2018","journal-title":"Advances in Neural Information Processing Systems"},{"key":"S1351324922000365_ref23","doi-asserted-by":"publisher","DOI":"10.1007\/BF00994018"},{"key":"S1351324922000365_ref2","first-page":"vol. 32","volume-title":"Advances in Neural Information Processing Systems","author":"Allen-Zhu","year":"2019"},{"key":"S1351324922000365_ref26","unstructured":"Devlin, J. , Chang, M.-W. , Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, Minnesota: Association for Computational Linguistics, vol. 1, pp. 
4171\u20134186."},{"key":"S1351324922000365_ref46","doi-asserted-by":"publisher","DOI":"10.1109\/ICACCCN.2018.8748399"},{"key":"S1351324922000365_ref65","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"S1351324922000365_ref6","doi-asserted-by":"publisher","DOI":"10.1214\/08-AOS620"},{"key":"S1351324922000365_ref55","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01247-4"},{"key":"S1351324922000365_ref18","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-tutorials.1"},{"key":"S1351324922000365_ref40","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"S1351324922000365_ref71","doi-asserted-by":"publisher","DOI":"10.1007\/BF02699376"},{"key":"S1351324922000365_ref12","volume-title":"Language Models are Few-Shot Learners","author":"Brown","year":"2020"},{"key":"S1351324922000365_ref8","volume-title":"Pattern Recognition and Machine Learning","author":"Bishop","year":"2006"},{"key":"S1351324922000365_ref33","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1007\/s00365-021-09549-y","article-title":"The Barron space and the flow-induced function spaces for neural network models","volume":"55","author":"Ma","year":"2022","journal-title":"Constructive Approximation"},{"key":"S1351324922000365_ref56","unstructured":"Liu, P. , Yuan, W. , Fu, J. , Jiang, Z. , Hayashi, H. and Neubig, G. (2021). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv: 2107.13586."},{"key":"S1351324922000365_ref60","doi-asserted-by":"publisher","DOI":"10.1109\/JSAIT.2020.2991332"},{"key":"S1351324922000365_ref13","unstructured":"Brutzkus, A. , Globerson, A. , Malach, E. and Shalev-Shwartz, S. (2018). SGD learns over-parameterized networks that provably generalize on linearly separable data. 
In International Conference on Learning Representations."},{"key":"S1351324922000365_ref43","doi-asserted-by":"publisher","DOI":"10.1145\/276698.276876"},{"key":"S1351324922000365_ref24","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324920000601"},{"key":"S1351324922000365_ref45","first-page":"29","volume-title":"Advances in Neural Information Processing Systems","author":"Kawaguchi","year":"2016"},{"key":"S1351324922000365_ref47","doi-asserted-by":"publisher","DOI":"10.6029\/smartcr.2014.03.007"},{"key":"S1351324922000365_ref57","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.1999.790410"},{"key":"S1351324922000365_ref63","doi-asserted-by":"publisher","DOI":"10.1007\/s11431-020-1647-3"},{"key":"S1351324922000365_ref25","doi-asserted-by":"publisher","DOI":"10.3233\/IDA-1997-1302"},{"key":"S1351324922000365_ref16","unstructured":"Chowdhery, A. , Narang, S. , Devlin, J. , Bosma, M. , Mishra, G. , Roberts, A. , Barham, P. , Chung, H. W. , Sutton, C. , Gehrmann, S. , Schuh, P. , Shi, K. , Tsvyashchenko, S. , Maynez, J. , Rao, A. , Barnes, P. , Tay, Y. , Shazeer, N. , Prabhakaran, V. , Reif, E. , Du, N. , Hutchinson, B. , Pope, R. , Bradbury, J. , Austin, J. , Isard, M. , Gur-Ari, G. , Yin, P. , Duke, T. , Levskaya, A. , Ghemawat, S. , Dev, S. , Michalewski, H. , Garcia, X. , Misra, V. , Robinson, K. , Fedus, L. , Zhou, D. , Ippolito, D. , Luan, D. , Lim, H. , Zoph, B. , Spiridonov, A. , Sepassi, R. , Dohan, D. , Agrawal, S. , Omernick, M. , Dai, A. M. , Pillai, T. S. , Pellat, M. , Lewkowycz, A. , Moreira, E. , Child, R. , Polozov, O. , Lee, K. , Zhou, Z. , Wang, X. , Saeta, B. , Diaz, M. , Firat, O. , Catasta, M. , Wei, J. , Meier-Hellstern, K. , Eck, D. , Dean, J. , Petrov, S. and Fiedel, N. (2022). 
Palm: Scaling language modeling with pathways."},{"key":"S1351324922000365_ref48","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"S1351324922000365_ref74","doi-asserted-by":"publisher","DOI":"10.1016\/j.aiopen.2021.01.001"},{"key":"S1351324922000365_ref59","first-page":"2.","volume-title":"Perceptron: An Introduction to Computational Geometry","volume":"19","author":"Minsky","year":"1969"},{"key":"S1351324922000365_ref28","unstructured":"Donoho, D. L. (2000). High-dimensional data analysis: The curses and blessings of dimensionality. Aide-Memoire of a Lecture at AMS Conference on Math Challenges of the 21st Century ."},{"key":"S1351324922000365_ref3","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1907378117"},{"key":"S1351324922000365_ref1","first-page":"893","volume-title":". In ICASSP","volume":"91","author":"Acero","year":"1991"},{"key":"S1351324922000365_ref17","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324921000322"},{"key":"S1351324922000365_ref30","unstructured":"Du, S. , Lee, J. , Li, H. , Wang, L. and Zhai, X. (2019). Gradient descent finds global minima of deep neural networks. In International Conference on Machine Learning. PMLR, pp. 1675\u20131685."},{"key":"S1351324922000365_ref49","first-page":"2","volume-title":"Advances in Neural Information Processing Systems","author":"LeCun","year":"1989"},{"key":"S1351324922000365_ref7","volume-title":"Pattern Recognition and Machine Learning. 
Information Science and Statistics","author":"Bishop","year":"2016"},{"key":"S1351324922000365_ref64","volume-title":"Language Models are Unsupervised Multitask Learners","author":"Radford","year":"2019"},{"key":"S1351324922000365_ref14","first-page":"28811","article-title":"A universal law of robustness via isoperimetry","volume":"34","author":"Bubeck","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"S1351324922000365_ref15","volume-title":"Deep Learning with Python","author":"Chollet","year":"2021"},{"key":"S1351324922000365_ref11","first-page":"398","volume-title":"BMVC","volume":"4","author":"Brown","year":"2002"},{"key":"S1351324922000365_ref73","volume-title":"Scale: The Universal Laws of Life, Growth, and Death in Organisms, Cities, and Companies","author":"West","year":"2018"},{"key":"S1351324922000365_ref53","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150436"},{"key":"S1351324922000365_ref20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1017\/S1351324922000481","article-title":"Emerging trends: General fine-tuning (gft)","author":"Church","year":"2022","journal-title":"Natural Language Engineering"},{"key":"S1351324922000365_ref52","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2007.33.3.305"},{"key":"S1351324922000365_ref4","doi-asserted-by":"publisher","DOI":"10.1126\/science.153.3731.34"},{"key":"S1351324922000365_ref69","first-page":"1","article-title":"The search for invariant acoustic correlates of phonetic features","author":"Stevens","year":"1981","journal-title":"Perspectives on the Study of Speech"},{"key":"S1351324922000365_ref27","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2021.100379"},{"key":"S1351324922000365_ref61","unstructured":"Page, L. , Brin, S. , Motwani, R. and Winograd, T. (1999). 
The pagerank citation ranking: Bringing order to the web, Technical report, Stanford InfoLab."},{"key":"S1351324922000365_ref10","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45123-4_1"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324922000365","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,1]],"date-time":"2024-10-01T01:01:48Z","timestamp":1727744508000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324922000365\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,9]]},"references-count":74,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["S1351324922000365"],"URL":"https:\/\/doi.org\/10.1017\/s1351324922000365","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,9]]},"assertion":[{"value":"\u00a9 The Author(s), 2022. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}