{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T13:41:11Z","timestamp":1762522871171,"version":"build-2065373602"},"reference-count":45,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T00:00:00Z","timestamp":1706745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Machine learning approaches are currently used to understand or model complex physical systems. In general, a substantial number of samples must be collected to create a model with reliable results. However, collecting numerous data is often relatively time-consuming or expensive. Moreover, the problems of industrial interest tend to be more and more complex, and depend on a high number of parameters. High-dimensional problems intrinsically involve the need for large amounts of data through the curse of dimensionality. That is why new approaches based on smart sampling techniques have been investigated to minimize the number of samples to be given to train the model, such as active learning methods. Here, we propose a technique based on a combination of the Fisher information matrix and sparse proper generalized decomposition that enables the definition of a new active learning informativeness criterion in high dimensions. We provide examples proving the performances of this technique on a theoretical 5D polynomial function and on an industrial crash simulation application. 
The results prove that the proposed strategy outperforms the usual ones.<\/jats:p>","DOI":"10.3390\/computation12020024","type":"journal-article","created":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T09:04:21Z","timestamp":1706778261000},"page":"24","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Data Augmentation for Regression Machine Learning Problems in High Dimensions"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-4094-3744","authenticated-orcid":false,"given":"Clara","family":"Guilhaumon","sequence":"first","affiliation":[{"name":"PIMM, Arts et M\u00e9tiers Institute of Technology, 151 Boulevard de l\u2019Hopital, 75013 Paris, France"},{"name":"UPR EBINNOV, Ecole de Biologie Industrielle, 49 Avenue des Genottes, 95895 Cergy, France"}]},{"given":"Nicolas","family":"Hasco\u00ebt","sequence":"additional","affiliation":[{"name":"PIMM, Arts et M\u00e9tiers Institute of Technology, 151 Boulevard de l\u2019Hopital, 75013 Paris, France"}]},{"given":"Francisco","family":"Chinesta","sequence":"additional","affiliation":[{"name":"PIMM, Arts et M\u00e9tiers Institute of Technology, 151 Boulevard de l\u2019Hopital, 75013 Paris, France"},{"name":"ESI Group, Parc Icade, Immeuble le Seville, 3bis, Saarinen, CEDEX, 94528 Rungis, France"},{"name":"CNRS@CREATE Ltd., 1 Create Way, 08-01 CREATE Tower, Singapore 138602, Singapore"}]},{"given":"Marc","family":"Lavarde","sequence":"additional","affiliation":[{"name":"UPR EBINNOV, Ecole de Biologie Industrielle, 49 Avenue des Genottes, 95895 Cergy, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4001-3132","authenticated-orcid":false,"given":"Fatima","family":"Daim","sequence":"additional","affiliation":[{"name":"ESI Group, Parc Icade, Immeuble le Seville, 3bis, Saarinen, CEDEX, 94528 Rungis, 
France"}]}],"member":"1968","published-online":{"date-parts":[[2024,2,1]]},"reference":[{"key":"ref_1","unstructured":"Mitchell, T. (1997). Machine Learning, McGraw-Hill."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1073\/pnas.97.1.28","article-title":"The theory of everything","volume":"97","author":"Laughlin","year":"2000","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_3","unstructured":"Goupy, J., and Creighton, L. (2006). Introduction to Design of Experiments, Dunod\/L\u2019Usine nouvelle."},{"key":"ref_4","unstructured":"Settles, B. (2009). Active Learning Literature Survey, University of Wisconsin-Madison. Computer Sciences Technical Report."},{"key":"ref_5","first-page":"042144","article-title":"Principle of maximum Fisher information from Hardy\u2019s axioms applied to statistical systems","volume":"88","author":"Frieden","year":"2013","journal-title":"Comput. Sci. Tech. Rep. E"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ib\u00e1\u00f1ez, R., and Abisset-Chavanne, E. (2018). A Multidimensional Data-Driven Sparse Identification Technique: The Sparse Proper Generalized Decomposition, Hindawi.","DOI":"10.1155\/2018\/5608286"},{"key":"ref_7","first-page":"503","article-title":"The Arrangement of Field Experiments","volume":"33","author":"Fisher","year":"1926","journal-title":"J. Minist. Agric. Great Br."},{"key":"ref_8","unstructured":"Box, G.E., and Hunter, W.G.H. (2005). 
Statistics for Experimenters: Design, Innovation, and Discovery, Wiley."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1093\/biomet\/33.4.305","article-title":"The Design of Optimum Multifactorial Experiments","volume":"33","author":"Plackett","year":"1946","journal-title":"Biometrika"},{"key":"ref_10","first-page":"55","article-title":"A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code","volume":"42","author":"McKay","year":"1979","journal-title":"Technometrics Am. Stat. Assoc."},{"key":"ref_11","unstructured":"Nguyen, N.K. (2008). Statistics and Applications, Volume 6, Nos.1 & 2, (New Series), Society of Statistics, Computer and Applications."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1007\/BF00116828","article-title":"Queries Concept Learning","volume":"2","author":"Angluin","year":"1988","journal-title":"Mach.-Mediat. Learn."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Angluin, D. (2001). Queries Revisited, Springer.","DOI":"10.1007\/3-540-45650-3_3"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1613\/jair.295","article-title":"Active learning with statistical models","volume":"4","author":"Cohn","year":"1996","journal-title":"J. Artif. Intell. Res."},{"key":"ref_15","unstructured":"Atlas, L., Cohn, D., Ladner, R., El-Sharkawi, M.A., and Marks, R.J. (1990). Advances in Neural Information Processing Systems 2, Morgan Kaufmann Publishers, Inc."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lewis, D., and Gale, W. (1994, January 3\u20136). A sequential algorithm for training text classifiers. Proceedings of the ACM SIGIR Conference on Research and Development Information Retrieval, Dublin, Ireland.","DOI":"10.1007\/978-1-4471-2099-5_1"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lewis, D., and Catlett, J. (1994, January 10\u201313). 
Heterogeneous uncertainty sampling for supervised learning. Proceedings of the International Conference on Machine Learning (ICML), New Brunswick, NJ, USA.","DOI":"10.1016\/B978-1-55860-335-6.50026-X"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Scheffer, T., Decomain, C., and Wrobel, S. (2001, January 13\u201315). Active hidden Markov models for information extraction. Proceedings of the International Conference on Advances in Intelligent Data Analysis (CAIDA), Cascais, Portugal.","DOI":"10.1007\/3-540-44816-0_31"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Seung, H.S.M.O., and Sompolinsky, H. (1992, January 27\u201329). Query by committee. Proceedings of the ACM Workshop on Computational Learning Theory, Pittsburgh, PA, USA.","DOI":"10.1145\/130385.130417"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Dagan, I., and Engelson, S. (1995, January 9\u201312). Committee-based sampling for training probabilistic classifiers. Proceedings of the International Conference on Machine Learning (ICML), Tahoe City, CA, USA.","DOI":"10.1016\/B978-1-55860-377-6.50027-X"},{"key":"ref_22","unstructured":"McCallum, A., and Nigam, K. (1998, January 24\u201327). Employing EM in pool-based active learning for text classification. Proceedings of the International Conference on Machine Learning (ICML), Madison, WI, USA."},{"key":"ref_23","unstructured":"Seung, H.S.M.O., and Sompolins, H. (2007). Multiple-instance active learning. Adv. Neural Inf. Process. Syst. 20 (Nips), 1289\u20131296."},{"key":"ref_24","unstructured":"Settles, B., Craven, M., and Friedland, L. (2008, January 12). Active learning with real annotation costs. 
Proceedings of the NIPS Workshop on Cost-Sensitive Learning, Whistler, BC, Canada."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1162\/neco.1992.4.4.590","article-title":"Information-based objective functions for active data selection","volume":"4","author":"MacKay","year":"1992","journal-title":"Neural Comput."},{"key":"ref_26","unstructured":"Gal, Y., and Riashat Islam, Z.G. (2017, January 6\u201311). Deep Bayesian Active Learning with Image Data. Proceedings of the International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"103576","DOI":"10.1016\/j.ijplas.2023.103576","article-title":"Deep active learning for constitutive modelling of granular materials: From representative volume elements to implicit finite element modelling","volume":"164","author":"Qu","year":"2023","journal-title":"Int. J. Plast."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"102673","DOI":"10.1016\/j.rcim.2023.102673","article-title":"Learning by doing: A dual-loop implementation architecture of deep active learning and human-machine collaboration for smart robot vision","volume":"86","author":"Deng","year":"2024","journal-title":"Robot. Comput. Integr. Manuf."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"109359","DOI":"10.1016\/j.patcog.2023.109359","article-title":"Meta-learning for dynamic tuning of active learning on stream classification","volume":"138","author":"Martins","year":"2023","journal-title":"Pattern Recognit."},{"key":"ref_30","unstructured":"Ren, M., Triantafillou, E., Ravi, S., Snell, J., Swersky, K., Tenenbaum, J.B., Larochelle, H., and Zemel, R.S. (2018). Meta-Learning for Semi-Supervised Few-Shot Classification. 
Conference paper at ICLR arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/j.neucom.2016.02.007","article-title":"Active learning and data manipulation techniques for generating training examples in meta-learning","volume":"194","author":"Sousa","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1016\/j.future.2022.05.014","article-title":"A survey of human-in-the-loop for machine learning","volume":"135","author":"Wu","year":"2022","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_33","unstructured":"Atkinson, A., Donev, A., and Tobias, R. (2007). SAS, OUP."},{"key":"ref_34","first-page":"48","article-title":"An algorithm for the construction of \u201cD-optimal\u201d experimental designs","volume":"42","author":"Mitchell","year":"2000","journal-title":"Technometrics"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1016\/j.jspi.2010.07.002","article-title":"D-optimal minimax design criterion for two-level fractional factorial designs","volume":"141","author":"Wilmut","year":"2011","journal-title":"J. Stat. Plan. Inference"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.jspi.2018.06.006","article-title":"A method for augmenting supersaturated designs","volume":"199","author":"Zhang","year":"2019","journal-title":"J. Stat. Plan. Inference"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"3529","DOI":"10.1002\/qre.2931","article-title":"Input-response space-filling designs","volume":"37","author":"Lu","year":"2021","journal-title":"Qual. Reliab. Eng. Int."},{"key":"ref_38","unstructured":"Chinesta, F., Huerta, A., Rozza, G., and Willcox, K. (2015). Encyclopedia of Computational Mechanics, John Wiley and Sons. Volume Model Order Reduction."},{"key":"ref_39","unstructured":"Sancarlos, A., Victor Champaney, J.L.D., and Chinesta, F. (2021). 
PGD-based Advanced Nonlinear Multiparametric Regression for Constructing Metamodels at the scarce data limit. arXiv."},{"key":"ref_40","unstructured":"Ibanez, R. (2019). Advanced Physics-Based and Data-Driven Strategies. [Ph.D. Thesis, Universitat Polit\u00e8cnica de Catalunya \u00b7 Barcelona Tech\u2014UPC]."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1007\/s42452-021-04310-3","article-title":"A novel sparse reduced order formulation for modeling electromagnetic forces in electric motors","volume":"3","author":"Sancarlos","year":"2021","journal-title":"SN Appl. Sci."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"979","DOI":"10.1007\/s11831-020-09404-6","article-title":"From ROM of electrochemistry to ai-based battery digital and hybrid twin","volume":"28","author":"Sancarlos","year":"2020","journal-title":"Arch. Comput. Methods Eng."},{"key":"ref_43","unstructured":"Argerich, C. (2020). Study and Development of New Acoustic Technologies for Nacelle Products. [Ph.D. Thesis, Universitat Politecnica de Catalunya]."},{"key":"ref_44","first-page":"309","article-title":"On the mathematical foundations of theoretical statistics","volume":"222","author":"RA","year":"1922","journal-title":"A Contain. Pap. Math. Phys. Character"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"363","DOI":"10.4153\/CJM-1960-030-4","article-title":"The equivalence of two extremum problems","volume":"12","author":"Kiefer","year":"1960","journal-title":"Can. J. 
Math."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/2\/24\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:52:50Z","timestamp":1760104370000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/2\/24"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,1]]},"references-count":45,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,2]]}},"alternative-id":["computation12020024"],"URL":"https:\/\/doi.org\/10.3390\/computation12020024","relation":{},"ISSN":["2079-3197"],"issn-type":[{"type":"electronic","value":"2079-3197"}],"subject":[],"published":{"date-parts":[[2024,2,1]]}}}