{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T04:51:49Z","timestamp":1776315109174,"version":"3.50.1"},"reference-count":80,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T00:00:00Z","timestamp":1758585600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Properly exploring a neural architecture search (NAS) space requires tremendous amounts of computational power. This makes exploration of modern NAS search spaces impractical for researchers, given the infrastructure investment required and the time needed to effectively design, train, validate, and evaluate each architecture within the search space. Because early-stopping random search algorithms are competitive with leading NAS methods, this paper examines how much of the search space should be explored by applying various forms of a famous decision-making riddle from optimal stopping theory: the Secretary Problem (SP). A total of 672 unique architectures, each trained and evaluated on the MNIST and CIFAR-10 datasets over 20,000 runs and together producing 6,720 trained models, confirm both theoretically and empirically the need to randomly explore ~37% of the NAS search space before halting on an acceptable neural architecture. Additional SP extensions investigated include a \u201cgood enough\u201d feature and a \u201ccall back\u201d feature, which further reduce exploration of the NAS search space to ~15% and ~4%, respectively. 
Each of these findings was further confirmed statistically on NAS search space populations of 100\u20133,500 neural architectures, in increments of 50, with each population size analyzed over 20,000 runs. The paper details how researchers should implement each of these variants, with caveats, to balance computational resource costs against the desire to conduct sufficient NAS in a reasonable timeframe.<\/jats:p>","DOI":"10.3389\/frai.2025.1643088","type":"journal-article","created":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T05:28:05Z","timestamp":1758605285000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Neural architecture search applying optimal stopping theory"],"prefix":"10.3389","volume":"8","author":[{"given":"Matthew","family":"Sheehan","sequence":"first","affiliation":[]},{"given":"Oleg","family":"Yakimenko","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,9,23]]},"reference":[{"key":"ref1","article-title":"Zero-cost proxies for lightweight NAS","volume-title":"Proceedings of the international conference on learning representations (ICLR)","author":"Abdelfattah","year":"2021"},{"key":"ref2","first-page":"1","volume-title":"Designing neural network architectures using reinforcement learning","author":"Baker","year":"2017"},{"key":"ref3","volume-title":"Understanding and simplifying one-shot architecture search","author":"Bender","year":"2018"},{"key":"ref9001","volume-title":"Kissing the frog: A Mathematician\u2019s guide to mating","author":"Billingham","year":"2008"},{"key":"ref4","article-title":"Once-for-all: train one network and specialize it for efficient deployment","author":"Cai","year":"2019","journal-title":"arXiv:1908.09791"},{"key":"ref5","article-title":"DQNAS: neural architecture search using reinforcement learning","author":"Chauhan","year":"2023","journal-title":"arXiv preprint 
arXiv:2301.06687"},{"key":"ref10","volume-title":"AutoML: automating the design of machine learning models for autonomous driving","author":"Cheng","year":"2019"},{"key":"ref6","article-title":"Searching for efficient multi-scale architectures for dense image prediction.","volume-title":"Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS)","author":"Chen","year":"2018"},{"key":"ref9","first-page":"1294","volume-title":"Progressive differentiable architecture search: bridging the depth gap between search and evaluation","author":"Chen","year":"2019"},{"key":"ref7","first-page":"9497","article-title":"Contrastive neural architecture search with neural architecture comparators","volume-title":"CVPR","author":"Chen","year":"2021"},{"key":"ref8","doi-asserted-by":"publisher","first-page":"13489","DOI":"10.1109\/TPAMI.2023.3293885","article-title":"MNGNAS: distilling adaptive combination of multiple searched networks for one-shot neural architecture search","volume":"45","author":"Chen","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref11","doi-asserted-by":"publisher","first-page":"25217","DOI":"10.1109\/ACCESS.2023.3253818","article-title":"Neural architecture search benchmarks: insights and survey","volume":"11","author":"Chitty-Venkata","year":"2023","journal-title":"IEEE Access"},{"key":"ref12","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1007\/BF02759948","article-title":"Optimal selection based on relative rank","volume":"2","author":"Chow","year":"1964","journal-title":"Isr. J. 
Math."},{"key":"ref9002","doi-asserted-by":"publisher","first-page":"6659","DOI":"10.1109\/ICCV48922.2021.00659","article-title":"Evolving search space for neural architecture search","author":"Ci","year":"2021","journal-title":"Proceedings of the International Conference on Computer Vision (ICCV)"},{"key":"ref13","article-title":"The rising costs of training frontier AI models","author":"Cottier","year":"2024","journal-title":"arXiv preprint arXiv:2405.21015"},{"key":"ref14","first-page":"916","volume-title":"AutoSpeech: neural architecture search for speaker recognition","author":"Ding","year":"2020"},{"key":"ref15","first-page":"627","article-title":"The optimum choice of the instant for stopping a Markov process","volume":"4","author":"Dynkin","year":"1963","journal-title":"Sov. Math. Dokl."},{"key":"ref17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/978-3-030-05318-5_11","article-title":"Neural architecture search: a survey","volume":"20","author":"Elsken","year":"2019","journal-title":"JMLR"},{"key":"ref16","article-title":"Simple and efficient architecture search for convolutional neural networks","author":"Elsken","year":"2017","journal-title":"arXiv preprint arXiv:1711.04528"},{"key":"ref18","volume-title":"Parameter, compute and data trends in machine learning","year":"2025"},{"key":"ref19","doi-asserted-by":"publisher","first-page":"282","DOI":"10.1214\/ss\/1177012493","article-title":"Who solved the secretary problem?","volume":"4","author":"Ferguson","year":"1989","journal-title":"Stat. Sci."},{"key":"ref1900","article-title":"Strategic dating: The 37% rule. Plus Magazine, University of Cambridge.","author":"Freiberger","year":"2017"},{"key":"ref20","doi-asserted-by":"publisher","first-page":"35","DOI":"10.2307\/2283044","article-title":"Recognizing the maximum of a sequence","volume":"61","author":"Gilbert","year":"1966","journal-title":"J. Am. Stat. 
Assoc."},{"key":"ref21","doi-asserted-by":"crossref","first-page":"1588","DOI":"10.1214\/aop\/1176988613","article-title":"Solution to the game of googol","volume":"22","author":"Gnedin","year":"1994","journal-title":"Ann. Probab."},{"key":"ref22","first-page":"3224","article-title":"AutoGAN: neural architecture search for generative adversarial networks","author":"Gong","year":"2019"},{"key":"ref23","volume-title":"Deep learning","author":"Goodfellow","year":"2016"},{"key":"ref24","doi-asserted-by":"publisher","first-page":"1789","DOI":"10.1007\/s11263-021-01453-z","article-title":"Knowledge distillation: a survey","volume":"129","author":"Gou","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref26","doi-asserted-by":"publisher","first-page":"6501","DOI":"10.1109\/TPAMI.2021.3086914","article-title":"Towards accurate and compact architectures via neural architecture transformer","volume":"44","author":"Guo","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref25","first-page":"544","volume-title":"Single path one-shot neural architecture search with uniform sampling","author":"Guo","year":"2020"},{"key":"ref27","doi-asserted-by":"publisher","first-page":"126","DOI":"10.1511\/2009.77.126","article-title":"Knowing when to stop","volume":"97","author":"Hill","year":"2009","journal-title":"Am. 
Sci."},{"key":"ref29","first-page":"1","volume-title":"GPipe: efficient training of giant neural networks using pipeline parallelism","author":"Huang","year":"2019"},{"key":"ref28","first-page":"119","volume-title":"Angle-based search space shrinking for neural architecture search","author":"Hu","year":"2020"},{"key":"ref30","article-title":"A review of meta-reinforcement learning for deep neural networks architecture search","author":"Jaafra","year":"2018","journal-title":"arXiv preprint arXiv:1812.07995"},{"key":"ref31","first-page":"1","volume-title":"ADAM: a method for stochastic optimization","author":"Kingma","year":"2015"},{"key":"ref32","article-title":"Fast Bayesian optimization of machine learning hyperparameters on large datasets","author":"Klein","year":"2016","journal-title":"arXiv preprint arXiv:1605.07079"},{"key":"ref35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3476994","article-title":"FLASH: fast neural architecture search with hardware optimization","volume":"20","author":"Li","year":"2021","journal-title":"ACM Trans. Embed. Comput. Syst."},{"key":"ref33","article-title":"Geometry-aware gradient algorithms for neural architecture search","author":"Li","year":"2021"},{"key":"ref36","first-page":"367","volume-title":"Random search and reproducibility for neural architecture search","author":"Li","year":"2020"},{"key":"ref37","volume-title":"Best practices for scientific research on neural architecture search","author":"Lindauer","year":"2020"},{"key":"ref38","doi-asserted-by":"publisher","first-page":"39","DOI":"10.2307\/2985407","article-title":"Dynamic programming and decision theory","volume":"10","author":"Lindley","year":"1961","journal-title":"Appl. 
Stat."},{"key":"ref39","article-title":"DARTS: differentiable architecture search","author":"Liu","year":"2019","journal-title":"arXiv: 1806.09055"},{"key":"ref41","doi-asserted-by":"publisher","first-page":"100002","DOI":"10.1016\/j.jai.2022.100002","article-title":"A survey on computationally efficient neural architecture search","volume":"1","author":"Liu","year":"","journal-title":"J. Autom. Intell."},{"key":"ref40","article-title":"A survey on evolutionary neural architecture search","author":"Liu","year":"","journal-title":"arXiv preprint arXiv:2008.10937"},{"key":"ref42","first-page":"4750","volume-title":"NSGA-NET: A multi-objective genetic algorithm for neural architecture search","author":"Lu","year":"2018"},{"key":"ref43","article-title":"The AI Index 2024 annual report. Institute for Human-Centered AI. Stanford University, Palo Alto, CA, USA.","author":"Maslej","year":"2024"},{"key":"ref44","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-030-72062-9_37","article-title":"Local search is a remarkably strong baseline for neural architecture search","author":"Ottelander","year":"2021"},{"key":"ref45","first-page":"1","volume-title":"Fast deep neural architecture search for wearable activity recognition by early prediction of converged performance","author":"Pellatt","year":"2021"},{"key":"ref46","article-title":"Efficient neural architecture search via parameters sharing","author":"Pham","year":"2018"},{"key":"ref47","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1016\/0304-4149(87)90020-2","article-title":"The full-information best choice problem with a random number of observations","volume":"24","author":"Porosinski","year":"1987","journal-title":"Stoch. Process. 
Appl."},{"key":"ref48","article-title":"To share or not to share: a comprehensive appraisal of weight-sharing","author":"Pourchot","year":"2020","journal-title":"arXiv preprint arXiv:2002.04289"},{"key":"ref49","first-page":"1882","volume-title":"On network design spaces for visual recognition","author":"Radosavovic","year":"2019"},{"key":"ref50","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3447582","article-title":"A comprehensive survey of neural architecture search: challenges and solutions","volume":"54","author":"Ren","year":"2021","journal-title":"ACM Comput. Surv."},{"key":"ref52","article-title":"NAS-Bench-301 and the case for surrogate benchmarks for neural architecture search","author":"Siems","year":"2020","journal-title":"arXiv preprint arXiv:2008.09777"},{"key":"ref53","first-page":"1","volume-title":"Neural networks designing neural networks: Multi-objective hyper-parameter optimization","author":"Smithson","year":"2016"},{"key":"ref54","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1162\/106365602320169811","article-title":"Evolving neural networks through augmenting topologies","volume":"10","author":"Stanley","year":"2002","journal-title":"Evol. 
Comput."},{"key":"ref55","article-title":"RRR-net: reusing, reducing, and recycling a deep backbone network","author":"Sun","year":"2023","journal-title":"arXiv: 2310.01157"},{"key":"ref56","volume-title":"When to stop dating and settle down, according to math","author":"Swanson","year":"2016"},{"key":"ref57","first-page":"1","volume-title":"MixConv: mixed depthwise convolutional kernels","author":"Tan","year":"2019"},{"key":"ref58","article-title":"The computational limits of deep learning","author":"Thompson","year":"2020","journal-title":"arXiv: 2007.05558"},{"key":"ref59","doi-asserted-by":"publisher","first-page":"1840","DOI":"10.1109\/9.793723","article-title":"Optimal stopping of Markov processes: hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives","volume":"44","author":"Tsitsiklis","year":"1999","journal-title":"IEEE Trans. Automat. Control"},{"key":"ref60","volume-title":"HPE SGI 8600 (Gaffney) user guide","year":"2021"},{"key":"ref61","volume-title":"The theory of optimal stopping","author":"Weber","year":"1975"},{"key":"ref62","first-page":"658","article-title":"Exploring the loss landscape in neural architecture search","author":"White","year":"2021"},{"key":"ref63","article-title":"Neural architecture search: insights from 1000 papers","author":"White","year":"2023","journal-title":"arXiv: 2301.08727"},{"key":"ref64","first-page":"28904","volume-title":"Stronger NAS with weaker predictors.","author":"Wu","year":"2021"},{"key":"ref65","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1016\/j.icte.2023.11.001","article-title":"Training-free neural architecture search: a review","volume":"10","author":"Wu","year":"2024","journal-title":"ICT Express"},{"key":"ref66","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1088\/1757-899X\/750\/1\/012223","article-title":"A survey on one-shot neural architecture 
search","volume":"750","author":"Xiao","year":"2020","journal-title":"IOP Conf. Ser."},{"key":"ref67","article-title":"Efficient evaluation methods for neural architecture search: a survey","author":"Xie","year":"2023","journal-title":"arXiv preprint arXiv:2301.05919"},{"key":"ref68","first-page":"1","volume-title":"PC-DARTS: partial channel connections for memory-efficient architecture search","author":"Xu","year":"2020"},{"key":"ref70","article-title":"NAS evaluation is frustratingly hard","author":"Yang","year":"2020"},{"key":"ref69","article-title":"NAS-Bench-x11 and the power of learning curves","author":"Yan","year":"2021"},{"key":"ref71","first-page":"7105","volume-title":"NAS-Bench-101: Towards reproducible neural architecture search","author":"Ying","year":"2019"},{"key":"ref72","article-title":"How to train your super-net: an analysis of training heuristics in weight-sharing NAS","author":"Yu","year":"","journal-title":"arXiv preprint arXiv:2003.04276"},{"key":"ref51","article-title":"In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia","author":"Yu","year":""},{"key":"ref73","article-title":"Evaluating the search phase of neural architecture search","author":"Yu","year":"2019","journal-title":"arXiv preprint arXiv:1902.08142v3"},{"key":"ref74","article-title":"ADADELTA: an adaptive learning rate method","author":"Zeiler","year":"2012","journal-title":"arXiv:1212.5701v1"},{"key":"ref75","article-title":"NAS-Bench-1Shot1: benchmarking and dissecting one-shot neural architecture search","author":"Zela","year":"2020"},{"key":"ref76","article-title":"Deeper insights into weight sharing in neural architecture search","author":"Zhang","year":"2020","journal-title":"arXiv preprint arXiv:2001.01431"},{"key":"ref77","article-title":"Neural architecture search with reinforcement learning","author":"Zoph","year":"2016","journal-title":"arXiv preprint arXiv:1611.01578"},{"key":"ref78","first-page":"8697","volume-title":"Learning transferable 
architectures for scalable image recognition","author":"Zoph","year":"2018"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1643088\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T05:28:37Z","timestamp":1758605317000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1643088\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,23]]},"references-count":80,"alternative-id":["10.3389\/frai.2025.1643088"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1643088","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,23]]},"article-number":"1643088"}}