{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T04:17:13Z","timestamp":1761365833731,"version":"build-2065373602"},"reference-count":37,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>In deep learning, achieving the global minimum poses a significant challenge, even for relatively simple architectures such as Multi-Layer Perceptrons (MLPs). To address this challenge, we visualized model states at both local and global optima, thereby identifying the factors that impede the transition of models from local to global minima when employing conventional model training methodologies. Based on these insights, we propose the Lagrange Regressor (LReg), a framework that is mathematically equivalent to MLPs. Rather than updates via optimization techniques, LReg employs a Mesh-Refinement\u2013Coarsening (discrete) process to ensure the convergence of the model\u2019s loss function to the global minimum. LReg achieves faster convergence and overcomes the inherent limitations of neural networks in fitting multi-frequency functions. Experiments conducted on large-scale benchmarks including ImageNet-1K (image classification), GLUE (natural language understanding), and WikiText (language modeling) show that LReg consistently enhances the performance of pre-trained models, significantly lowers test loss, and scales effectively to big data scenarios. These results underscore LReg\u2019s potential as a scalable, optimization-free alternative for deep learning in large and complex datasets, aligning closely with the goals of innovative big data analytics.<\/jats:p>","DOI":"10.3390\/info16100921","type":"journal-article","created":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T15:36:31Z","timestamp":1761060991000},"page":"921","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Globally Optimal Alternative to MLP"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-2111-4047","authenticated-orcid":false,"given":"Zheng","family":"Li","sequence":"first","affiliation":[{"name":"Department of Computer Science, New York Institute of Technology, New York, NY 10023, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3968-9699","authenticated-orcid":false,"given":"Jerry","family":"Cheng","sequence":"additional","affiliation":[{"name":"Department of Computer Science, New York Institute of Technology, New York, NY 10023, USA"}]},{"given":"Huanying Helen","family":"Gu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, New York Institute of Technology, New York, NY 10023, USA"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,21]]},"reference":[{"key":"ref_1","unstructured":"Saxe, A.M., McClelland, J.L., and Ganguli, S. (2013). Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv."},{"key":"ref_2","unstructured":"Dauphin, Y.N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. arXiv."},{"key":"ref_3","unstructured":"Goodfellow, I.J., Vinyals, O., and Saxe, A.M. (2014). Qualitatively characterizing neural network optimization problems. arXiv."},{"key":"ref_4","unstructured":"Choromanska, A., Henaff, M., Mathieu, M., Arous, G.B., and LeCun, Y. (2015, January 9\u201312). The loss surfaces of multilayer networks. Proceedings of the Artificial Intelligence and Statistics, PMLR, San Diego, CA, USA."},{"key":"ref_5","unstructured":"Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. (2020). Scaling laws for neural language models. arXiv."},{"key":"ref_6","unstructured":"Henighan, T., Kaplan, J., Katz, M., Chen, M., Hesse, C., Jackson, J., Jun, H., Brown, T.B., Dhariwal, P., and Gray, S. (2020). Scaling laws for autoregressive generative modeling. arXiv."},{"key":"ref_7","unstructured":"Zienkiewicz, O.C., Taylor, R.L., Nithiarasu, P., and Zhu, J. (1977). The Finite Element Method, Elsevier."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Xu, Z.Q.J., Zhang, Y., and Xiao, Y. (2019, January 12\u201315). Training behavior of deep neural network in frequency domain. Proceedings of the International Conference on Neural Information Processing, Sydney, NSW, Australia.","DOI":"10.1007\/978-3-030-36708-4_22"},{"key":"ref_9","first-page":"61","article-title":"Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods","volume":"10","author":"Platt","year":"1999","journal-title":"Adv. Large Margin Classif."},{"key":"ref_10","first-page":"386","article-title":"A rank-invariant method of linear and polynomial regression analysis. I, II, III","volume":"53","author":"Thiel","year":"1950","journal-title":"Nederl. Akad. Wetensch."},{"key":"ref_11","unstructured":"Cantzler, H. (1981). Random Sample Consensus (Ransac), Institute for Perception, Action and Behaviour, Division of Informatics University of Edinburgh."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zhang, T. (2004, January 4\u20138). Solving large scale linear prediction problems using stochastic gradient descent algorithms. Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada.","DOI":"10.1145\/1015330.1015332"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hilt, D.E., and Seegrist, D.W. (1977). Ridge, a Computer Program for Calculating Ridge Regression Estimates.","DOI":"10.5962\/bhl.title.68934"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/j.2517-6161.1974.tb00994.x","article-title":"Cross-validatory choice and assessment of statistical predictions","volume":"36","author":"Stone","year":"1974","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_15","unstructured":"Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., and Sidford, A. (2018, January 6\u20139). Accelerating stochastic gradient descent for least squares regression. Proceedings of the Conference on Learning Theory, PMLR, Stockholm, Sweden."},{"key":"ref_16","unstructured":"Murphy, K.P. (2012). Machine Learning: A Probabilistic Perspective, MIT Press."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_19","unstructured":"Zhang, X., Zhao, J., and LeCun, Y. (2015). Character-level convolutional networks for text classification. arXiv."},{"key":"ref_20","unstructured":"Wightman, R., Touvron, H., and J\u00e9gou, H. (2021). Resnet strikes back: An improved training procedure in timm. arXiv."},{"key":"ref_21","unstructured":"Touvron, H., Vedaldi, A., Douze, M., and J\u00e9gou, H. (2019, January 8\u201314). Fixing the train-test resolution discrepancy. Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada."},{"key":"ref_22","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 18\u201324). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_24","unstructured":"TorchVision Contributors (2024, August 09). TorchVision Models. Available online: https:\/\/pytorch.org\/vision\/master\/models.html."},{"key":"ref_25","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_28","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_29","unstructured":"Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021). Lora: Low-rank adaptation of large language models. arXiv."},{"key":"ref_30","unstructured":"Hyeon-Woo, N., Ye-Bin, M., and Oh, T.H. (2021). Fedpara: Low-rank hadamard product for communication-efficient federated learning. arXiv."},{"key":"ref_31","first-page":"1950","article-title":"Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning","volume":"35","author":"Liu","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_32","unstructured":"Zaken, E.B., Ravfogel, S., and Goldberg, Y. (2021). Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, A. (2018). Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv.","DOI":"10.18653\/v1\/W18-5446"},{"key":"ref_34","unstructured":"Merity, S., Xiong, C., Bradbury, J., and Socher, R. (2016). Pointer sentinel mixture models. arXiv."},{"key":"ref_35","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"ref_36","unstructured":"Liu, Y. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020, January 16\u201320). Transformers: State-of-the-art natural language processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/10\/921\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T04:11:43Z","timestamp":1761365503000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/10\/921"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"references-count":37,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["info16100921"],"URL":"https:\/\/doi.org\/10.3390\/info16100921","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}