{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T03:33:49Z","timestamp":1762918429380,"version":"build-2065373602"},"reference-count":47,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T00:00:00Z","timestamp":1727654400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000181","name":"AFOSR","doi-asserted-by":"publisher","award":["FA9550-23-1-0517","FA9550-21-1-0058"],"award-info":[{"award-number":["FA9550-23-1-0517","FA9550-21-1-0058"]}],"id":[{"id":"10.13039\/100000181","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000181","name":"MURI","doi-asserted-by":"publisher","award":["FA9550-23-1-0517","FA9550-21-1-0058"],"award-info":[{"award-number":["FA9550-23-1-0517","FA9550-21-1-0058"]}],"id":[{"id":"10.13039\/100000181","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Misleading or unnecessary data can have out-sized impacts on the health or accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset while ignoring data that are either misleading or bring unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data lead to worse performance and instabilities of the surrogate model, often termed sample-wise \u201cdouble descent\u201d. We find these instabilities are a result of the complexity of the underlying map and are linked to extreme events and heavy tails. Our approach has two key features. First, the selection algorithm dynamically couples the chosen model and data. Data is chosen based on its merits towards improving the selected model, rather than being compared strictly against other data. Second, a natural convergence of the method removes the need for dividing the data into training, testing, and validation sets. Instead, the selection metric inherently assesses testing and validation error through global statistics of the model. This ensures that key information is never wasted in testing or validation. The method is applied using both Gaussian process regression and deep neural network surrogate models.<\/jats:p>","DOI":"10.3390\/e26100835","type":"journal-article","created":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T11:32:05Z","timestamp":1727695925000},"page":"835","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Information FOMO: The Unhealthy Fear of Missing Out on Information\u2014A Method for Removing Misleading Data for Healthier Models"],"prefix":"10.3390","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4485-6359","authenticated-orcid":false,"given":"Ethan","family":"Pickering","sequence":"first","affiliation":[{"name":"Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0302-0691","authenticated-orcid":false,"given":"Themistoklis P.","family":"Sapsis","sequence":"additional","affiliation":[{"name":"Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]}],"member":"1968","published-online":{"date-parts":[[2024,9,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Opper, M. (1995). Statistical mechanics of learning: Generalization. The Handbook of Brain Theory and Neural Networks, MIT Press.","DOI":"10.1007\/978-1-4612-0723-8_5"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"15849","DOI":"10.1073\/pnas.1903070116","article-title":"Reconciling modern machine-learning practice and the classical bias\u2013variance trade-off","volume":"116","author":"Belkin","year":"2019","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"124003","DOI":"10.1088\/1742-5468\/ac3a74","article-title":"Deep double descent: Where bigger models and more data hurt","volume":"2021","author":"Nakkiran","year":"2021","journal-title":"J. Stat. Mech. Theory Exp."},{"key":"ref_4","unstructured":"Nakkiran, P. (2019). More data can hurt for linear regression: Sample-wise double descent. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"474001","DOI":"10.1088\/1751-8121\/ab4c8b","article-title":"A jamming transition from under-to over-parametrization affects generalization in deep learning","volume":"52","author":"Spigler","year":"2019","journal-title":"J. Phys. A Math. Theor."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"012115","DOI":"10.1103\/PhysRevE.100.012115","article-title":"Jamming transition as a paradigm to understand the loss landscape of deep neural networks","volume":"100","author":"Geiger","year":"2019","journal-title":"Phys. Rev. E"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1016\/j.neunet.2020.08.022","article-title":"High-dimensional dynamics of generalization error in neural networks","volume":"132","author":"Advani","year":"2020","journal-title":"Neural Netw."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"949","DOI":"10.1214\/21-AOS2133","article-title":"Surprises in high-dimensional ridgeless least squares interpolation","volume":"50","author":"Hastie","year":"2022","journal-title":"Ann. Stat."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1038\/s43588-022-00376-0","article-title":"Discovering and forecasting extreme events via active learning in neural operators","volume":"2","author":"Pickering","year":"2022","journal-title":"Nat. Comput. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1007\/s00365-006-0663-2","article-title":"On early stopping in gradient descent learning","volume":"26","author":"Yao","year":"2007","journal-title":"Constr. Approx."},{"key":"ref_11","unstructured":"Heckel, R., and Yilmaz, F.F. (2021, January 4). Early Stopping in Deep Networks: Double Descent and How to Eliminate it. Proceedings of the International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_12","unstructured":"Nakkiran, P., Venkat, P., Kakade, S.M., and Ma, T. (2021, January 4). Optimal Regularization can Mitigate Double Descent. Proceedings of the International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1002\/cpa.22008","article-title":"The generalization error of random features regression: Precise asymptotics and the double descent curve","volume":"75","author":"Mei","year":"2022","journal-title":"Commun. Pure Appl. Math."},{"key":"ref_14","first-page":"1","article-title":"Data-dependent sample complexity of deep neural networks via lipschitz augmentation","volume":"32","author":"Wei","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_15","unstructured":"Wei, C., and Ma, T. (2019). Improved sample complexities for deep networks and robust classification via an all-layer margin. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10107-010-0420-4","article-title":"Pegasos: Primal estimated sub-gradient solver for svm","volume":"127","author":"Singer","year":"2011","journal-title":"Math. Program."},{"key":"ref_17","first-page":"567","article-title":"Stochastic dual coordinate ascent methods for regularized loss minimization","volume":"14","author":"Zhang","year":"2013","journal-title":"JMLR"},{"key":"ref_18","unstructured":"Allen-Zhu, Z., Qu, Z., Richt\u00e1rik, P., and Yuan, Y. (2016, January 20\u201322). Even faster accelerated coordinate descent using non-uniform sampling. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1137\/16M1060182","article-title":"Efficiency of the accelerated coordinate descent method on structured optimization problems","volume":"27","author":"Nesterov","year":"2017","journal-title":"SIAM J. Optim."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1007\/s10107-015-0864-7","article-title":"Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm","volume":"155","author":"Needell","year":"2016","journal-title":"Math. Program."},{"key":"ref_21","unstructured":"Katharopoulos, A., and Fleuret, F. (2018, January 10\u201315). Not all samples are created equal: Deep learning with importance sampling. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_22","unstructured":"Zhao, P., and Zhang, T. (2015, January 6\u201311). Stochastic optimization with importance sampling for regularized loss minimization. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_23","unstructured":"Csiba, D., Qu, Z., and Richt\u00e1rik, P. (2015, January 6\u201311). Stochastic dual coordinate ascent with adaptive probabilities. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_24","unstructured":"Perekrestenko, D., Cevher, V., and Jaggi, M. (2017, January 20\u201322). Faster coordinate descent via adaptive importance sampling. Proceedings of the Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA."},{"key":"ref_25","unstructured":"Alain, G., Lamb, A., Sankar, C., Courville, A., and Bengio, Y. (2016). Variance reduction in sgd by distributed importance sampling. arXiv."},{"key":"ref_26","first-page":"1","article-title":"Safe adaptive importance sampling","volume":"30","author":"Stich","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1038\/s42256-021-00302-5","article-title":"Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators","volume":"3","author":"Lu","year":"2021","journal-title":"Nat. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/BF02679124","article-title":"A one-dimensional model for dispersive wave turbulence","volume":"7","author":"Majda","year":"1997","journal-title":"J. Nonlinear Sci."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1137\/20M1347486","article-title":"Output-weighted optimal sampling for Bayesian experimental design and uncertainty quantification","volume":"9","author":"Blanchard","year":"2021","journal-title":"SIAM\/ASA J. Uncertain. Quantif."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"20190834","DOI":"10.1098\/rspa.2019.0834","article-title":"Output-weighted optimal sampling for Bayesian regression and rare event statistics using few samples","volume":"476","author":"Sapsis","year":"2020","journal-title":"Proc. R. Soc. A"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"20210197","DOI":"10.1098\/rsta.2021.0197","article-title":"Optimal criteria and their asymptotic form for data selection in data-driven reduced-order modelling with Gaussian process regression","volume":"380","author":"Sapsis","year":"2022","journal-title":"Philos. Trans. R. Soc. A"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1145\/356914.356918","article-title":"Inductive inference: Theory and methods","volume":"15","author":"Angluin","year":"1983","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1016\/0020-0190(87)90114-1","article-title":"Occam\u2019s razor","volume":"24","author":"Blumer","year":"1987","journal-title":"Inf. Process. Lett."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Vapnik, V. (1999). The Nature of Statistical Learning Theory, Springer Science & Business Media.","DOI":"10.1007\/978-1-4757-3264-1"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Hastie, T., Tibshirani, R., and Friedman, J.H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.","DOI":"10.1007\/978-0-387-84858-7"},{"key":"ref_36","unstructured":"Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. Adv. Neural Inf. Process. Syst., 30."},{"key":"ref_37","unstructured":"Wilson, A.G., and Izmailov, P. (2020, January 6\u201312). Bayesian Deep Learning and a Probabilistic Perspective of Generalization. Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS\u201920, Vancouver, BC, Canada."},{"key":"ref_38","unstructured":"Pickering, E., and Sapsis, T.P. (2022). Structure and Distribution Metric for Quantifying the Quality of Uncertainty: Assessing Gaussian Processes, Deep Neural Nets, and Deep Neural Operators for Regression. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1146\/annurev-fluid-030420-032810","article-title":"Statistics of extreme events in fluid flows and waves","volume":"53","author":"Sapsis","year":"2021","journal-title":"Annu. Rev. Fluid Mech."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"686","DOI":"10.1016\/j.jcp.2018.10.045","article-title":"Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations","volume":"378","author":"Raissi","year":"2019","journal-title":"J. Comput. Phys."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"14216","DOI":"10.1073\/pnas.96.25.14216","article-title":"Spectral bifurcations in dispersive wave turbulence","volume":"96","author":"Cai","year":"1999","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1016\/S0167-2789(01)00194-4","article-title":"Wave turbulence in one-dimensional models","volume":"152","author":"Zakharov","year":"2001","journal-title":"Phys. D Nonlinear Phenom."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.physrep.2004.04.002","article-title":"One-dimensional wave turbulence","volume":"398","author":"Zakharov","year":"2004","journal-title":"Phys. Rep."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/j.physd.2013.01.003","article-title":"Quasibreathers in the MMT model","volume":"248","author":"Pushkarev","year":"2013","journal-title":"Phys. D Nonlinear Phenom."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/j.physd.2014.04.012","article-title":"Quantification and prediction of extreme events in a one-dimensional nonlinear dispersive wave model","volume":"280","author":"Cousins","year":"2014","journal-title":"Phys. D Nonlinear Phenom."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Rasmussen, C.E. (2003). Gaussian processes in machine learning. Summer School on Machine Learning, Springer.","DOI":"10.1007\/978-3-540-28650-9_4"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1109\/JPROC.2015.2494218","article-title":"Taking the human out of the loop: A review of Bayesian optimization","volume":"104","author":"Shahriari","year":"2015","journal-title":"Proc. IEEE"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/10\/835\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:07:54Z","timestamp":1760112474000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/10\/835"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,30]]},"references-count":47,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2024,10]]}},"alternative-id":["e26100835"],"URL":"https:\/\/doi.org\/10.3390\/e26100835","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2024,9,30]]}}}