{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T06:44:48Z","timestamp":1780555488346,"version":"3.54.1"},"reference-count":56,"publisher":"MIT Press - Journals","issue":"5","content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Classification problems in the small data regime (with small data statistic T and relatively large feature space dimension D) impose challenges for the common machine learning (ML) and deep learning (DL) tools. The standard learning methods from these areas tend to show a lack of robustness when applied to data sets with significantly fewer data points than dimensions and quickly reach the overfitting bound, thus leading to poor performance beyond the training set. To tackle this issue, we propose eSPA+, a significant extension of the recently formulated entropy-optimal scalable probabilistic approximation algorithm (eSPA). Specifically, we propose to change the order of the optimization steps and replace the most computationally expensive subproblem of eSPA with its closed-form solution. We prove that with these two enhancements, eSPA+ moves from the polynomial to the linear class of complexity scaling algorithms. On several small data learning benchmarks, we show that the eSPA+ algorithm achieves a many-fold speed-up with respect to eSPA and even better performance results when compared to a wide array of ML and DL tools. In particular, we benchmark eSPA+ against the standard eSPA and the main classes of common learning algorithms in the small data regime: various forms of support vector machines, random forests, and long short-term memory algorithms. 
In all the considered applications, the common learning methods and eSPA are markedly outperformed by eSPA+, which achieves significantly higher prediction accuracy with an orders-of-magnitude lower computational cost.<\/jats:p>","DOI":"10.1162\/neco_a_01490","type":"journal-article","created":{"date-parts":[[2022,3,28]],"date-time":"2022-03-28T23:21:42Z","timestamp":1648509702000},"page":"1220-1255","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":24,"title":["eSPA+: Scalable Entropy-Optimal Machine Learning Classification for Small Data Problems"],"prefix":"10.1162","volume":"34","author":[{"given":"Edoardo","family":"Vecchi","sequence":"first","affiliation":[{"name":"Universit\u00e0 della Svizzera Italiana, Faculty of Informatics, TI-6900 Lugano, Switzerland edoardo.vecchi@usi.ch"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Luk\u00e1\u0161","family":"Posp\u00ed\u0161il","sequence":"additional","affiliation":[{"name":"VSB Ostrava, Department of Mathematics, Ludvika Podeste 1875\/17 708 33 Ostrava, Czech Republic lukas.pospisil@vsb.cz"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Steffen","family":"Albrecht","sequence":"additional","affiliation":[{"name":"University Medical Center of the Johannes Gutenberg-Universit\u00e4t, Institute of Physiology, 55128 Mainz, Germany s.albrecht@uni-mainz.de"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Terence J.","family":"O'Kane","sequence":"additional","affiliation":[{"name":"CSIRO Oceans and Atmosphere, Hobart, Tasmania 7001, Australia Terence.O'Kane@csiro.au"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Illia","family":"Horenko","sequence":"additional","affiliation":[{"name":"Universit\u00e0 della Svizzera Italiana, Faculty of Informatics, TI-6900 Lugano, Switzerland 
horenkoi@usi.ch"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","published-online":{"date-parts":[[2022,4,15]]},"reference":[{"issue":"1","key":"2022042221321236200_B1","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1049\/trit.2019.0028","article-title":"Deep learning approach for microarray cancer data classification","volume":"5","author":"Basavegowda","year":"2020","journal-title":"CAAI Trans. Intell. Technol."},{"issue":"3","key":"2022042221321236200_B2","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1016\/j.ecolecon.2007.04.009","article-title":"Are there ENSO signals in the macroeconomy?","volume":"64","author":"Berry","year":"2008","journal-title":"Ecological Economics"},{"key":"2022042221321236200_B3","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1145\/130385.130401","article-title":"A training algorithm for optimal margin classifiers","volume-title":"Proceedings of the Fifth Annual Workshop on Computational Learning Theory","author":"Boser","year":"1992"},{"issue":"1","key":"2022042221321236200_B4","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Machine Learning"},{"key":"2022042221321236200_B5","unstructured":"Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. Boca Raton, FL: CRC press."},{"key":"2022042221321236200_B6","doi-asserted-by":"crossref","unstructured":"Chang, C.-C, & Lin, C.-J (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1\u201327:27. http:\/\/www.csie.ntu.edu.tw\/\u223ccjlin\/libsvm. 
10.1145\/1961189.1961199","DOI":"10.1145\/1961189.1961199"},{"key":"2022042221321236200_B7","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511801389","volume-title":"An introduction to support vector machines and other kernel-based learning methods","author":"Cristianini","year":"2000"},{"key":"2022042221321236200_B8","first-page":"1528","article-title":"A kernel theory of modern data augmentation","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Dao","year":"2019"},{"issue":"3","key":"2022042221321236200_B9","doi-asserted-by":"publisher","first-page":"326","DOI":"10.1145\/212094.212114","article-title":"Overfitting and undercomputing in machine learning","volume":"27","author":"Dietterich","year":"1995","journal-title":"ACM Computing Surveys"},{"issue":"3","key":"2022042221321236200_B10","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1109\/18.382009","article-title":"De-noising by soft-thresholding","volume":"41","author":"Donoho","year":"1995","journal-title":"IEEE Transactions on Information Theory"},{"issue":"1","key":"2022042221321236200_B11","first-page":"1","article-title":"Structural analysis and optimization of convolutional neural networks with a small sample size","volume":"10","author":"D'Souza","year":"2020","journal-title":"Scientific Reports"},{"key":"2022042221321236200_B12","volume-title":"The elements of statistical learning","author":"Friedman","year":"2001"},{"key":"2022042221321236200_B13","doi-asserted-by":"publisher","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Annals of Statistics"},{"issue":"4","key":"2022042221321236200_B14","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1016\/S0167-9473(01)00065-2","article-title":"Stochastic gradient boosting","volume":"38","author":"Friedman","year":"2002","journal-title":"Computational 
Statistics and Data Analysis"},{"issue":"5","key":"2022042221321236200_B15","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.aaw0961","article-title":"Low-cost scalable discretization, prediction, and feature selection for complex systems","volume":"6","author":"Gerber","year":"2020","journal-title":"Science Advances"},{"issue":"7775","key":"2022042221321236200_B16","doi-asserted-by":"publisher","first-page":"568","DOI":"10.1038\/s41586-019-1559-7","article-title":"Deep learning for multi-year ENSO forecasts","volume":"573","author":"Ham","year":"2019","journal-title":"Nature"},{"issue":"1","key":"2022042221321236200_B17","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","article-title":"The meaning and use of the area under a receiver operating characteristic ROC curve","volume":"143","author":"Hanley","year":"1982","journal-title":"Radiology"},{"key":"2022042221321236200_B18","doi-asserted-by":"crossref","unstructured":"Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning. 
New York: Springer.","DOI":"10.1007\/978-0-387-84858-7"},{"issue":"1","key":"2022042221321236200_B19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1021\/ci0342472","article-title":"The problem of overfitting","volume":"44","author":"Hawkins","year":"2004","journal-title":"Journal of Chemical Information and Computer Sciences"},{"issue":"8","key":"2022042221321236200_B20","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"issue":"8","key":"2022042221321236200_B21","doi-asserted-by":"publisher","first-page":"1563","DOI":"10.1162\/neco_a_01296","article-title":"On a scalable entropic breaching of the overfitting barrier for small data problems in machine learning","volume":"32","author":"Horenko","year":"2020","journal-title":"Neural Computation"},{"issue":"20","key":"2022042221321236200_B22","doi-asserted-by":"publisher","first-page":"8179","DOI":"10.1175\/JCLI-D-16-0836.1","article-title":"Extended reconstructed sea surface temperature, version 5 (ERSSTv5), upgrades, validations, and intercomparisons","volume":"30","author":"Huang","year":"2017","journal-title":"Journal of Climate"},{"key":"2022042221321236200_B23","doi-asserted-by":"crossref","unstructured":"Israel, R., Kelly, B. T., & Moskowitz, T. J. (2020). Can machines \u201clearn\u201d finance?SSRN3624052.","DOI":"10.2139\/ssrn.3624052"},{"key":"2022042221321236200_B24","doi-asserted-by":"crossref","unstructured":"Keshari, R., Ghosh, S., Chhabra, S., Vatsa, M., & Singh, R. (2020). Unravelling small sample size problems in the deep learning world. In Proceedings of the 2020 IEEE Sixth International Conference on Multimedia Big Data (pp. 134\u2013143). Piscataway, NJ: IEEE.","DOI":"10.1109\/BigMM50055.2020.00028"},{"key":"2022042221321236200_B25","unstructured":"Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. 
In Proceedings of the 3rd International Conference on Learning Representations."},{"issue":"3","key":"2022042221321236200_B26","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1177\/0962280220970228","article-title":"Small sample sizes: A big data problem in high-dimensional data analysis","volume":"30","author":"Konietschke","year":"2021","journal-title":"Statistical Methods in Medical Research"},{"key":"2022042221321236200_B27","unstructured":"Kuhn, H. W., & Tucker, A. W. (1951). Nonlinear programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (pp. 481\u2013492). Berkeley: University of California Press."},{"issue":"4","key":"2022042221321236200_B28","doi-asserted-by":"publisher","first-page":"1050","DOI":"10.1016\/j.celrep.2019.06.078","article-title":"Translational regulation of non-autonomous mitochondrial stress response promotes longevity","volume":"28","author":"Lan","year":"2019","journal-title":"Cell Reports"},{"key":"2022042221321236200_B29","doi-asserted-by":"crossref","unstructured":"Lata, K., Mayank, D., & Nishanth, K. (2019). Data augmentation using generative adversarial network. 
SSRN.","DOI":"10.2139\/ssrn.3349576"},{"key":"2022042221321236200_B30","first-page":"361","article-title":"Regression trees with unbiased variable selection and interaction detection","volume":"12","author":"Loh","year":"2002","journal-title":"Statistica Sinica"},{"issue":"4","key":"2022042221321236200_B31","doi-asserted-by":"publisher","first-page":"570","DOI":"10.1287\/opre.43.4.570","article-title":"Breast cancer diagnosis and prognosis via linear programming","volume":"43","author":"Mangasarian","year":"1995","journal-title":"Operations Research"},{"issue":"5806","key":"2022042221321236200_B32","doi-asserted-by":"publisher","first-page":"1740","DOI":"10.1126\/science.1132588","article-title":"ENSO as an integrating concept in earth science","volume":"314","author":"McPhaden","year":"2006","journal-title":"Science"},{"issue":"6","key":"2022042221321236200_B33","article-title":"Quantile regression forests","volume":"7","author":"Meinshausen","year":"2006","journal-title":"Journal of Machine Learning Research"},{"key":"2022042221321236200_B34","volume-title":"Numerical optimization","author":"Nocedal","year":"2006"},{"issue":"8","key":"2022042221321236200_B35","doi-asserted-by":"publisher","first-page":"2688","DOI":"10.1109\/TMI.2020.2993291","article-title":"Deep learning COVID-19 features on CXR using limited training data sets","volume":"39","author":"Oh","year":"2020","journal-title":"IEEE Transactions on Medical Imaging"},{"key":"2022042221321236200_B36","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.jcp.2013.10.058","article-title":"ENSO regimes and the late 1970's climate shift: The role of synoptic weather and South Pacific ocean spiciness","volume":"271","author":"O'Kane","year":"2014","journal-title":"Journal of Computational Physics"},{"issue":"10","key":"2022042221321236200_B37","doi-asserted-by":"publisher","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer 
learning","volume":"22","author":"Pan","year":"2009","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"5","key":"2022042221321236200_B38","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.1602548","article-title":"The ground truth about metadata and community detection in networks","volume":"3","author":"Peel","year":"2017","journal-title":"Science Advances"},{"key":"2022042221321236200_B39","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3031898","article-title":"Small data challenges in big data era: A survey of recent progress on unsupervised and semi-supervised methods","author":"Qi","year":"2020","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"2","key":"2022042221321236200_B40","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1177\/0962280207087173","article-title":"Comparison of non-parametric confidence intervals for the area under the ROC curve of a continuous-scale diagnostic test","volume":"17","author":"Qin","year":"2008","journal-title":"Statistical Methods in Medical Research"},{"issue":"3","key":"2022042221321236200_B41","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1109\/34.75512","article-title":"Small sample size effects in statistical pattern recognition: Recommendations for practitioners","volume":"13","author":"Raudys","year":"1991","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"11","key":"2022042221321236200_B42","doi-asserted-by":"publisher","first-page":"2758","DOI":"10.1109\/78.650102","article-title":"Comparing support vector machines with gaussian kernels to radial basis function classifiers","volume":"45","author":"Scholkopf","year":"1997","journal-title":"IEEE Transactions on Signal Processing"},{"issue":"1","key":"2022042221321236200_B43","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-018-0162-3","article-title":"A survey on image data augmentation for deep 
learning","volume":"6","author":"Shorten","year":"2019","journal-title":"Journal of Big Data"},{"key":"2022042221321236200_B44","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1016\/j.procs.2015.04.060","article-title":"Feature selection of gene expression data for cancer classification: A review","volume":"50","author":"Singh","year":"2015","journal-title":"Procedia Computer Science"},{"issue":"1","key":"2022042221321236200_B45","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1023\/A:1005342500057","article-title":"The value of improved ENSO prediction to US agriculture","volume":"39","author":"Solow","year":"1998","journal-title":"Climatic Change"},{"issue":"1","key":"2022042221321236200_B46","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"Journal of Machine Learning Research"},{"key":"2022042221321236200_B47","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1117\/12.148698","volume-title":"Biomedical image processing and biomedical visualization","author":"Street","year":"1993"},{"issue":"18","key":"2022042221321236200_B48","doi-asserted-by":"publisher","DOI":"10.3390\/ijerph17186933","article-title":"Unveiling COVID-19 from chest x-ray with deep learning: A hurdles race with small data","volume":"17","author":"Tartaglione","year":"2020","journal-title":"International Journal of Environmental Research and Public Health"},{"issue":"7715","key":"2022042221321236200_B49","doi-asserted-by":"publisher","first-page":"535","DOI":"10.1038\/s41586-018-0252-6","article-title":"El Ni\u00f1o\u2013southern oscillation complexity","volume":"559","author":"Timmermann","year":"2018","journal-title":"Nature"},{"key":"2022042221321236200_B50","first-page":"281","volume-title":"Advances in neural information processing 
systems","author":"Vapnik","year":"1997"},{"issue":"158","key":"2022042221321236200_B51","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1080\/01621459.1927.10502953","article-title":"Probable inference, the law of succession, and statistical inference","volume":"22","author":"Wilson","year":"1927","journal-title":"Journal of the American Statistical Association"},{"issue":"2\u20133","key":"2022042221321236200_B52","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/0304-3835(94)90099-X","article-title":"Machine learning techniques to diagnose breast cancer from image-processed nuclear features of fine needle aspirates","volume":"77","author":"Wolberg","year":"1994","journal-title":"Cancer Letters"},{"key":"2022042221321236200_B53","article-title":"An overview of overfitting and its solutions","volume":"1168","author":"Ying","year":"2019","journal-title":"Journal of Physics: Conference Series"},{"key":"2022042221321236200_B54","doi-asserted-by":"publisher","DOI":"10.1016\/j.chaos.2020.110121","article-title":"Deep learning methods for forecasting COVID-19 time-series data: A comparative study","volume":"140","author":"Zeroual","year":"2020","journal-title":"Chaos, Solitons and Fractals"},{"key":"2022042221321236200_B55","author":"Zhang","year":"2018","journal-title":"A study on overfitting in deep reinforcement learning."},{"issue":"1","key":"2022042221321236200_B56","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1109\/JPROC.2020.3004555","article-title":"A comprehensive survey on transfer learning","volume":"109","author":"Zhuang","year":"2020","journal-title":"Proceedings of the IEEE"}],"container-title":["Neural 
Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/34\/5\/1220\/2008663\/neco_a_01490.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/34\/5\/1220\/2008663\/neco_a_01490.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,22]],"date-time":"2022-04-22T21:33:21Z","timestamp":1650663201000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/34\/5\/1220\/110047\/eSPA-Scalable-Entropy-Optimal-Machine-Learning"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,15]]},"references-count":56,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,4,15]]},"published-print":{"date-parts":[[2022,4,15]]}},"URL":"https:\/\/doi.org\/10.1162\/neco_a_01490","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,5]]},"published":{"date-parts":[[2022,4,15]]}}}