{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T19:27:29Z","timestamp":1764962849232,"version":"3.46.0"},"reference-count":47,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,11,29]],"date-time":"2025-11-29T00:00:00Z","timestamp":1764374400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Machine learning approaches are commonly used to model physical phenomena due to their adaptability to complex systems. In general, a substantial number of samples must be collected to create a model with reliable results. However, collecting numerous data points is often costly. Moreover, high-dimensional problems inherently require large amounts of data due to the curse of dimensionality. That is why new approaches based on smart sampling techniques, such as active learning methods, are being investigated to optimize the acquisition of training samples. Initialization is a crucial step in active learning as it influences both performance and computational cost. Moreover, the scenarios used to select the next sample, such as classic pool-based sampling, can be highly resource- and time-consuming. This study focuses on optimizing active learning methods through a comprehensive analysis of initialization strategies and scenario design, proposing and evaluating multiple approaches to determine the optimal configurations. The methods are applied to high-dimensional industrial problems with dimensions ranging from 5 to 15, where challenges associated with high dimensionality are already significant. 
To address this, the proposed study uses an active learning criterion that combines Sparse Proper Generalized Decomposition with Fisher information theory, specifically tailored to high-dimensional industrial settings. We illustrate the effectiveness of these techniques through examples on theoretical 5D and 15D functions, as well as a practical industrial crash simulation application.<\/jats:p>","DOI":"10.3390\/a18120757","type":"journal-article","created":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T18:42:02Z","timestamp":1764960122000},"page":"757","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Optimized Active Learning Method for High-Dimensional Industrial Regression Problems"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-4094-3744","authenticated-orcid":false,"given":"Clara","family":"Guilhaumon","sequence":"first","affiliation":[{"name":"PIMM, Arts et M\u00e9tiers Institute of Technology, 151 Boulevard de l\u2019Hopital, 75013 Paris, France"},{"name":"EBI, Ecole de Biologie Industrielle, 49 Avenue des Genottes, 95895 Cergy, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nicolas","family":"Hascoet","sequence":"additional","affiliation":[{"name":"PIMM, Arts et M\u00e9tiers Institute of Technology, 151 Boulevard de l\u2019Hopital, 75013 Paris, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Francisco","family":"Chinesta","sequence":"additional","affiliation":[{"name":"PIMM, Arts et M\u00e9tiers Institute of Technology, 151 Boulevard de l\u2019Hopital, 75013 Paris, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1186-0295","authenticated-orcid":false,"given":"Marc","family":"Lavarde","sequence":"additional","affiliation":[{"name":"EBI, Ecole de Biologie Industrielle, 49 Avenue des Genottes, 95895 Cergy, 
France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,29]]},"reference":[{"key":"ref_1","unstructured":"Mitchell, T. (1997). Machine Learning, McGraw-Hill."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1073\/pnas.97.1.28","article-title":"The theory of everything","volume":"97","author":"Laughlin","year":"2000","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_3","unstructured":"Goupy, J., and Creighton, L. (2006). Introduction to Design of Experiments, Dunod L\u2019Usine Nouvelle."},{"key":"ref_4","unstructured":"Settles, B. (2009). Active Learning Literature Survey, University of Wisconsin\u2013Madison. Technical Report #1648."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1023\/A:1022821128753","article-title":"Queries and concept learning","volume":"2","author":"Angluin","year":"1988","journal-title":"Mach. Learn."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Angluin, D. (2001). Queries Revisited, Springer.","DOI":"10.1007\/3-540-45650-3_3"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1613\/jair.295","article-title":"Active learning with statistical models","volume":"4","author":"Cohn","year":"1996","journal-title":"J. Artif. Intell. Res."},{"key":"ref_8","unstructured":"Cohn, D., Atlas, L., and Ladner, R. (December, January 30). Training connectionist networks with queries and selective sampling. Proceedings of the NIPS (Neural Information Processing Systems), Denver, CO, USA."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lewis, D., and Gale, W. (1994). A sequential algorithm for training text classifiers. 
SIGIR\u201994, Springer.","DOI":"10.1007\/978-1-4471-2099-5_1"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"110136","DOI":"10.1016\/j.ast.2025.110136","article-title":"A hybrid single-loop approach combining the target beta-hypersphere sampling and active learning Kriging for reliability-based design optimization","volume":"161","author":"Hu","year":"2025","journal-title":"Aerosp. Sci. Technol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"112307","DOI":"10.1016\/j.buildenv.2024.112307","article-title":"Updating surrogate models in early building design via tabular transfer learning","volume":"267","author":"Hinkle","year":"2025","journal-title":"Build. Environ."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/S0378-3758(03)00193-9","article-title":"Budget constrained run orders in optimum design","volume":"124","author":"Tack","year":"2004","journal-title":"J. Stat. Plan. Inference"},{"key":"ref_13","unstructured":"Franco, J. (2008). Planification d\u2019Exp\u00e9riences Num\u00e9riques en Phase Exploratoire Pour la Simulation des Ph\u00e9nom\u00e8nes Complexes. [Ph.D. Thesis, Ecole Nationale Sup\u00e9rieure des Mines de Saint-Etienne]."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"108836","DOI":"10.1016\/j.patcog.2022.108836","article-title":"To actively initialize active learning","volume":"131","author":"Yang","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Guilhaumon, C., Hascoet, N., Chinesta, F., Lavarde, M., and Daim, F. (2024). Data Augmentation for Regression Machine Learning Problems in High Dimensions. 
Computation, 12.","DOI":"10.3390\/computation12020024"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"042144","DOI":"10.1103\/PhysRevE.88.042144","article-title":"Principle of maximum Fisher information from Hardy\u2019s axioms applied to statistical systems","volume":"88","author":"Frieden","year":"2013","journal-title":"Phys. Rev. E"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5608286","DOI":"10.1155\/2018\/5608286","article-title":"A Multidimensional Data-Driven Sparse Identification Technique: The Sparse Proper Generalized Decomposition","volume":"2018","year":"2018","journal-title":"Complexity"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","article-title":"Nearest Neighbor Pattern Classification","volume":"13","author":"Cover","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_19","unstructured":"MacKay, D.J.C. (2003). Information Theory, Inference and Learning Algorithms, Cambridge University Press."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_21","first-page":"503","article-title":"The Arrangement of Field Experiments","volume":"33","author":"Fisher","year":"1926","journal-title":"J. Minist. Agric. Great Br."},{"key":"ref_22","unstructured":"Box, G.E., and Hunter, W.G. (2005). 
Statistics for Experimenters: Design, Innovation and Discovery, Wiley."},{"key":"ref_23","first-page":"239","article-title":"A Comparison of Three Methods","volume":"21","author":"McKay","year":"1979","journal-title":"Technometrics"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1093\/biomet\/33.4.305","article-title":"The design of optimum multifactorial experiments","volume":"33","author":"Burman","year":"1946","journal-title":"Biometrika"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_26","unstructured":"Scheffer, T., Decomain, C., and Wrobel, S. (2001, January 13\u201315). Active hidden Markov models for information extraction. Proceedings of the 4th International Conference, IDA 2001, Cascais, Portugal."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Seung, H.S., Opper, M., and Sompolinsky, H. (1992, January 27\u201329). Query by committee. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA.","DOI":"10.1145\/130385.130417"},{"key":"ref_28","unstructured":"Settles, B., Craven, M., and Ray, S. (2007, January 3\u20139). Multiple-instance active learning. Proceedings of the NIPS (Neural Information Processing Systems) 20, Vancouver, BC, Canada."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1162\/neco.1992.4.4.590","article-title":"Information-based objective functions for active data selection","volume":"4","author":"MacKay","year":"1992","journal-title":"Neural Comput."},{"key":"ref_30","unstructured":"Sancarlos, A., Champaney, V., Duval, J.L., and Chinesta, F. (2021). PGD-based Advanced Nonlinear Multiparametric Regression for Constructing Metamodels at the scarce data limit. 
arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/comjnl\/7.4.308","article-title":"A Simplex Method for Function Minimization","volume":"7","author":"Nelder","year":"1965","journal-title":"Comput. J."},{"key":"ref_32","unstructured":"Wright, M.H. (1996). Direct search methods: Once scorned, now respectable. Numerical Analysis 1995: Dundee Biennial Conference, Chapman and Hall\/CRC."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1190","DOI":"10.1137\/0916069","article-title":"A Limited Memory Algorithm for Bound Constrained Optimization","volume":"16","author":"Byrd","year":"1995","journal-title":"SIAM J. Sci. Stat. Comput."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1145\/279232.279236","article-title":"L-BFGS-B: Algorithm 778","volume":"23","author":"Zhu","year":"1997","journal-title":"ACM Trans. Math. Softw."},{"key":"ref_35","unstructured":"Kraft, D. (1988). A Software Package for Sequential Quadratic Programming, DLR German Aerospace Center\u2014Institute for Flight Mechanics. Technical Report."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gil, A., Segura, J., and Temme, N.M. (2007). Numerical Methods for Special Functions, SIAM.","DOI":"10.1137\/1.9780898717822"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1080\/08982112.2015.1100447","article-title":"Space-filling designs for computer experiments: A review","volume":"28","author":"Joseph","year":"2016","journal-title":"Qual. Eng."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1016\/0378-3758(94)00035-T","article-title":"Exploratory designs for computer experiments","volume":"43","author":"Morris","year":"1995","journal-title":"J. Stat. Plan. 
Inference"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1080\/03610918208812265","article-title":"A distribution-free approach to inducing rank correlation among input variables","volume":"11","author":"Iman","year":"1982","journal-title":"Commun. Stat.-Simul. Comput."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/0378-3758(94)90115-5","article-title":"Optimal Latin-hypercube designs for computer experiments","volume":"39","author":"Park","year":"1994","journal-title":"J. Stat. Plan. Inference"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1016\/j.jspi.2010.07.002","article-title":"D-optimal minimax design criterion for two-level fractional factorial designs","volume":"141","author":"Wilmut","year":"2011","journal-title":"J. Stat. Plan. Inference"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"803","DOI":"10.1016\/j.spl.2005.10.014","article-title":"Halton and Hammersley sequences in multivariate nonparametric regression","volume":"76","author":"Rafajowicz","year":"2006","journal-title":"Stat. Probab. Lett."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Asmussen, S., and Glynn, P.W. (2007). Stochastic Simulation: Algorithms and Analysis, Springer.","DOI":"10.1007\/978-0-387-69033-9"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Drmota, M., and Tichy, R.F. (1997). Sequences, Discrepancies and Applications, Springer. Lecture Notes in Mathematics.","DOI":"10.1007\/BFb0093404"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1145\/355588.365104","article-title":"Algorithm 247: Radical-inverse quasi-random point sequence","volume":"7","author":"Halton","year":"1964","journal-title":"Commun. ACM"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Hammersley, J.M., and Handscomb, D.C. (1964). 
Monte Carlo Methods, Springer.","DOI":"10.1007\/978-94-009-5819-7"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/0041-5553(67)90144-9","article-title":"Distribution of points in a cube and approximate evaluation of integrals","volume":"7","author":"Sobol","year":"1967","journal-title":"USSR Comput. Maths. Math. Phys."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/757\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T18:58:48Z","timestamp":1764961128000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/757"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,29]]},"references-count":47,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["a18120757"],"URL":"https:\/\/doi.org\/10.3390\/a18120757","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,29]]}}}