{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T17:22:33Z","timestamp":1774891353880,"version":"3.50.1"},"reference-count":327,"publisher":"Emerald","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,2,27]]},"abstract":"<jats:p>Deep learning (DL) has proven to be a highly effective approach for developing models in diverse contexts, including visual perception, speech recognition, and machine translation. However, the end-to-end process for applying DL is not trivial. It requires grappling with problem formulation and context understanding, data engineering, model development, deployment, continuous monitoring and maintenance, and so on. Moreover, each of these steps typically relies heavily on humans, in terms of both knowledge and interactions, which impedes the further advancement and democratization of DL. Consequently, in response to these issues, a new field has emerged over the last few years: automated deep learning (AutoDL). This endeavor seeks to minimize the need for human involvement and is best known for its achievements in neural architecture search (NAS), a topic that has been the focus of several surveys. That stated, NAS is not the be-all and end-all of AutoDL. Accordingly, this review adopts an overarching perspective, examining research efforts into automation across the entirety of an archetypal DL workflow. In so doing, this work also proposes a comprehensive set of ten criteria by which to assess existing work in both individual publications and broader research areas. These criteria are: novelty, solution quality, efficiency, stability, interpretability, reproducibility, engineering quality, scalability, generalizability, and eco-friendliness. 
Thus, ultimately, this review provides an evaluative overview of AutoDL in the early 2020s, identifying where future opportunities for progress may exist.<\/jats:p>","DOI":"10.1561\/2200000119","type":"journal-article","created":{"date-parts":[[2024,2,27]],"date-time":"2024-02-27T04:54:47Z","timestamp":1709009687000},"page":"767-920","source":"Crossref","is-referenced-by-count":13,"title":["Automated Deep Learning: Neural Architecture Search Is Not the End"],"prefix":"10.1561","volume":"17","author":[{"given":"Xuanyi","family":"Dong","sequence":"first","affiliation":[{"name":"University of Technology Sydney Complex Adaptive Systems Lab, and Brain Team, Google Research, USA","place":["Australia"]}]},{"given":"David Jacob","family":"Kedziora","sequence":"additional","affiliation":[{"name":"University of Technology Sydney Complex Adaptive Systems Lab","place":["Australia"]}]},{"given":"Katarzyna","family":"Musial","sequence":"additional","affiliation":[{"name":"University of Technology Sydney Complex Adaptive Systems Lab","place":["Australia"]}]},{"given":"Bogdan","family":"Gabrys","sequence":"additional","affiliation":[{"name":"University of Technology Sydney Complex Adaptive Systems Lab, and Brain Team, Google Research, USA","place":["Australia"]}]}],"member":"140","published-online":{"date-parts":[[2024,2,27]]},"reference":[{"key":"2026033012320252900_ref001","volume-title":"Zero-cost proxies for lightweight NAS","author":"Abdelfattah","year":"2021"},{"key":"2026033012320252900_ref002","first-page":"3981","volume-title":"Placeto: Learning generalizable device placement algorithms for distributed machine learning","author":"Addanki","year":"2019"},{"key":"2026033012320252900_ref003","volume-title":"An Introduction to Categorical Data Analysis","author":"Agresti","year":"2018"},{"key":"2026033012320252900_ref004","doi-asserted-by":"publisher","first-page":"2623","DOI":"10.1145\/3292500.3330701","volume-title":"Optuna: A next-generation hyperparameter 
optimization framework","author":"Akiba","year":"2019"},{"key":"2026033012320252900_ref005","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1007\/978-3-030-29911-8_8","article-title":"A meta-reinforcement learning approach to optimize parameters and hyper-parameters simultaneously","volume-title":"Pricai","author":"Ali","year":"2019"},{"key":"2026033012320252900_ref006","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1016\/j.procs.2018.07.204","article-title":"Cross-domain metalearning for time-series forecasting","volume":"126","author":"Ali","year":"2018","journal-title":"Procedia Computer Science"},{"key":"2026033012320252900_ref007","doi-asserted-by":"publisher","first-page":"5","DOI":"10.21314\/JOR.2001.041","article-title":"Optimal execution of portfolio transactions","volume":"3","author":"Almgren","year":"2001","journal-title":"Journal of Risk"},{"key":"2026033012320252900_ref008","volume-title":"Learning to learn by gradient descent by gradient descent","author":"Andrychowicz","year":"2016"},{"key":"2026033012320252900_ref009","doi-asserted-by":"publisher","first-page":"715","DOI":"10.1145\/3297858.3304049","volume-title":"PUMA: A programmable ultra-efficient memristor- based accelerator for machine learning inference","author":"Ankit","year":"2019"},{"key":"2026033012320252900_ref010","first-page":"354","volume-title":"Bayesian optimization of composite functions","author":"Astudillo","year":"2019"},{"key":"2026033012320252900_ref011","volume-title":"Metalearning with adaptive hyperparameters","author":"Baik","year":"2020"},{"key":"2026033012320252900_ref012","volume-title":"Designing neural network architectures using reinforcement learning","author":"Baker","year":"2017"},{"key":"2026033012320252900_ref013","volume-title":"Accelerating neural architecture search using performance prediction","author":"Baker","year":"2018"},{"key":"2026033012320252900_ref014","first-page":"279","volume-title":"Modular learning in neural 
networks","author":"Ballard","year":"1987"},{"key":"2026033012320252900_ref015","volume-title":"Online learning rate adaptation with hypergradient descent","author":"Baydin","year":"2018"},{"key":"2026033012320252900_ref016","doi-asserted-by":"publisher","first-page":"755","DOI":"10.1007\/978-3-642-04277-5_76","volume-title":"Evolving memory cell structures for sequence learning","author":"Bayer","year":"2009"},{"key":"2026033012320252900_ref017","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/7503.003.0022","volume-title":"Analysis of representations for domain adaptation","author":"Ben-David","year":"2007"},{"key":"2026033012320252900_ref018","first-page":"550","volume-title":"Understanding and simplifying one-shot architecture search","author":"Bender","year":"2018"},{"key":"2026033012320252900_ref019","doi-asserted-by":"publisher","first-page":"14323","DOI":"10.1109\/CVPR42600.2020.01433","volume-title":"Can weight sharing outperform random architecture search? An investigation with tunas","author":"Bender","year":"2020"},{"issue":"8","key":"2026033012320252900_ref020","doi-asserted-by":"publisher","first-page":"1889","DOI":"10.1162\/089976600300015187","article-title":"Gradient-based optimization of hyperparameters","volume":"12","author":"Bengio","year":"2000","journal-title":"Neural Computation"},{"key":"2026033012320252900_ref021","volume-title":"Learning a Synaptic Learning Rule","author":"Bengio","year":"1990"},{"key":"2026033012320252900_ref022","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1145\/1553374.1553380","volume-title":"Curriculum learning","author":"Bengio","year":"2009"},{"key":"2026033012320252900_ref023","volume-title":"Algorithms for hyper-parameter optimization","author":"Bergstra","year":"2011"},{"issue":"2","key":"2026033012320252900_ref024","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"Journal of Machine Learning Research 
(JMLR)"},{"key":"2026033012320252900_ref025","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-05348-2_10","volume-title":"CAVE: Configuration assessment, visualization and evaluation","author":"Biedenkapp","year":"2018"},{"issue":"2","key":"2026033012320252900_ref026","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1111\/j.2517-6161.1964.tb00553.x","article-title":"An analysis of transformations","volume":"26","author":"Box","year":"1964","journal-title":"Journal of the Royal Statistical Society: Series B (Methodological)"},{"issue":"3","key":"2026033012320252900_ref027","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1214\/ss\/1009213726","article-title":"Statistical modeling: The two cultures (with comments and a rejoinder by the author)","volume":"16","author":"Breiman","year":"2001","journal-title":"Statistical Science"},{"key":"2026033012320252900_ref028","volume-title":"SMASH: One-shot model architecture search through hypernetworks","author":"Brock","year":"2018"},{"key":"2026033012320252900_ref029","first-page":"1877","volume-title":"Language models are few-shot learners","author":"Brown","year":"2020"},{"key":"2026033012320252900_ref030","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58592-1_19","volume-title":"BATS: Binary architecture search","author":"Bulat","year":"2020"},{"key":"2026033012320252900_ref031","volume-title":"Once-for-all: Train one network and specialize it for efficient deployment","author":"Cai","year":"2020"},{"key":"2026033012320252900_ref032","doi-asserted-by":"publisher","first-page":"678","DOI":"10.1609\/aaai.v32i1.11709","volume-title":"Path-level network transformation for efficient architecture search","author":"Cai","year":"2018"},{"key":"2026033012320252900_ref033","volume-title":"ProxylessNAS: Direct neural architecture search on target task and 
hardware","author":"Cai","year":"2019"},{"issue":"1","key":"2026033012320252900_ref034","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1016\/S0734-189X(87)80014-2","article-title":"A massively parallel architecture for a self-organizing neural pattern recognition machine","volume":"37","author":"Carpenter","year":"1987","journal-title":"Computer Vision, Graphics, and Image Processing"},{"key":"2026033012320252900_ref035","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1145\/1015330.1015432","volume-title":"Ensemble selection from libraries of models","author":"Caruana","year":"2004"},{"issue":"9","key":"2026033012320252900_ref036","doi-asserted-by":"publisher","first-page":"3067","DOI":"10.1109\/TPAMI.2021.3062900","article-title":"Adaptation strategies for automated machine learning on evolving data","volume":"43","author":"Celik","year":"2021","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"issue":"3","key":"2026033012320252900_ref037","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1541880.1541882","article-title":"Anomaly detection: A survey","volume":"41","author":"Chandola","year":"2009","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"2026033012320252900_ref038","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1007\/978-3-319-71249-9_29","volume-title":"Speeding up hyperparameter optimization by extrapolation of learning curves using previous builds","author":"Chandrashekaran","year":"2017"},{"key":"2026033012320252900_ref039","volume-title":"Neural architecture search on imagenet in four gpu hours: A theoretically inspired perspective","author":"Chen","year":"2021"},{"key":"2026033012320252900_ref040","volume-title":"Net2Net: Accelerating learning via knowledge transfer","author":"Chen","year":"2016"},{"key":"2026033012320252900_ref041","first-page":"748","volume-title":"Learning to learn without gradient descent by gradient 
descent","author":"Chen","year":"2017"},{"issue":"1","key":"2026033012320252900_ref042","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1109\/JSSC.2016.2616357","article-title":"Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks","volume":"52","author":"Chen","year":"2016","journal-title":"IEEE Journal of Solid-State Circuits"},{"key":"2026033012320252900_ref043","doi-asserted-by":"crossref","DOI":"10.52202\/075280-2140","volume-title":"Symbolic discovery of optimization algorithms","author":"Chen","year":"2023"},{"key":"2026033012320252900_ref044","first-page":"578","volume-title":"TVM: An automated end-to-end optimizing compiler for deep learning","author":"Chen","year":"2018"},{"key":"2026033012320252900_ref045","volume-title":"Searching the search space of vision transformer","author":"Chen","year":"2021"},{"key":"2026033012320252900_ref046","first-page":"6642","volume-title":"DetNAS: Backbone search for object detection","author":"Chen","year":"2019"},{"key":"2026033012320252900_ref047","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1109\/DAC18074.2021.9586121","volume-title":"DANCE: Differentiable accelerator\/network co-exploration","author":"Choi","year":"2021"},{"key":"2026033012320252900_ref048","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1007\/978-3-642-40763-5_51","volume-title":"Mitosis detection in breast cancer histology images with deep neural networks","author":"Ciregan","year":"2013"},{"key":"2026033012320252900_ref049","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718768","volume-title":"Introduction to Derivative-Free Optimization","author":"Conn","year":"2009"},{"key":"2026033012320252900_ref050","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1109\/CVPR.2019.00020","volume-title":"AutoAugment: Learning augmentation strategies from 
data","author":"Cubuk","year":"2019"},{"key":"2026033012320252900_ref051","doi-asserted-by":"publisher","first-page":"702","DOI":"10.1109\/CVPRW50498.2020.00359","volume-title":"RandAugment: Practical automated data augmentation with a reduced search space","author":"Cubuk","year":"2020"},{"key":"2026033012320252900_ref052","first-page":"16276","volume-title":"FBNetV3: Joint architecture-recipe search using neural acquisition function","author":"Dai","year":"2021"},{"key":"2026033012320252900_ref053","doi-asserted-by":"publisher","first-page":"2633","DOI":"10.24963\/ijcai.2020\/365","volume-title":"Mixed-variable Bayesian optimization","author":"Daxberger","year":"2020"},{"key":"2026033012320252900_ref054","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1145\/2908961.2926973","volume-title":"Evolutionary computation: A unified approach","author":"De Jong","year":"2016"},{"key":"2026033012320252900_ref055","first-page":"730","volume-title":"Bayesian learning of neural network architectures","author":"Dikov","year":"2019"},{"key":"2026033012320252900_ref056","volume-title":"Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves","author":"Domhan","year":"2015"},{"key":"2026033012320252900_ref057","first-page":"318","volume-title":"Generic methods for optimization-based modeling","author":"Domke","year":"2012"},{"key":"2026033012320252900_ref058","doi-asserted-by":"publisher","first-page":"5840","DOI":"10.1109\/CVPR.2017.205","volume-title":"More is less: A more complicated network with less inference complexity","author":"Dong","year":"2017"},{"key":"2026033012320252900_ref059","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3054824","article-title":"NATS-Bench: Benchmarking NAS algorithms for architecture topology and size","volume-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence 
(TPAMI)","author":"Dong","year":"2021"},{"key":"2026033012320252900_ref060","volume-title":"AutoHAS: Efficient hyperparameter and architecture search","author":"Dong","year":"2021"},{"key":"2026033012320252900_ref061","first-page":"760","volume-title":"Network pruning via transformable architecture search","author":"Dong","year":"2019"},{"key":"2026033012320252900_ref062","doi-asserted-by":"publisher","first-page":"3681","DOI":"10.1109\/ICCV.2019.00378","volume-title":"One-shot neural architecture search via self-evaluated template network","author":"Dong","year":"2019"},{"key":"2026033012320252900_ref063","doi-asserted-by":"publisher","first-page":"1761","DOI":"10.1109\/CVPR.2019.00186","volume-title":"Searching for a robust neural architecture in four GPU hours","author":"Dong","year":"2019"},{"key":"2026033012320252900_ref064","doi-asserted-by":"publisher","first-page":"783","DOI":"10.1109\/ICCV.2019.00087","volume-title":"Teacher supervises students how to learn from partially labeled images for facial landmark detection","author":"Dong","year":"2019"},{"key":"2026033012320252900_ref065","volume-title":"NAS-bench-201: Extending the scope of reproducible neural architecture search","author":"Dong","year":"2020"},{"key":"2026033012320252900_ref066","first-page":"2895","volume-title":"Sequential scenario-specific meta learner for online recommendation","author":"Du","year":"2019"},{"key":"2026033012320252900_ref067","volume-title":"RL2: Fast reinforcement learning via slow reinforcement learning","author":"Duan","year":"2016"},{"key":"2026033012320252900_ref068","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.44494","article-title":"NetPyNE, a tool for data-driven multiscale modeling of brain circuits","volume":"8","author":"Dura-Bernal","year":"2019","journal-title":"eLife"},{"key":"2026033012320252900_ref069","first-page":"24","volume-title":"Surrogate benchmarks for hyperparameter 
optimization","author":"Eggensperger","year":"2014"},{"key":"2026033012320252900_ref070","volume-title":"HPOBench: A collection of reproducible multi-fidelity benchmark problems for HPO","author":"Eggensperger","year":"2021"},{"key":"2026033012320252900_ref071","doi-asserted-by":"publisher","first-page":"1668","DOI":"10.24963\/ijcai.2021\/230","volume-title":"DACBench: A benchmark library for dynamic algorithm configuration","author":"Eimer","year":"2021"},{"issue":"55","key":"2026033012320252900_ref072","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/978-3-030-05318-5_11","article-title":"Neural architecture search: A survey","volume":"20","author":"Elsken","year":"2019","journal-title":"Journal of Machine Learning Research (JMLR)"},{"key":"2026033012320252900_ref073","unstructured":"Epic Games\n           (2019). \u201cUnreal engine\u201d. URL: https:\/\/www.unrealengine.com."},{"key":"2026033012320252900_ref074","volume-title":"Autogluon-tabular: Robust and accurate automl for structured data","author":"Erickson","year":"2020"},{"key":"2026033012320252900_ref075","unstructured":"Facebook AI Research\n           (n.d.). \u201cPapers with code\u201d. URL: https:\/\/paperswithcode.com."},{"key":"2026033012320252900_ref076","unstructured":"Facebook Inc\n          . (2021). \u201cPyTorch V1.10.0\u201d. 
URL: https:\/\/github.com\/pytorch\/pytorch\/releases\/tag\/v1.10.0."},{"key":"2026033012320252900_ref077","first-page":"1437","volume-title":"BOHB: Robust and efficient hyperparameter optimization at scale","author":"Falkner","year":"2018"},{"key":"2026033012320252900_ref078","volume-title":"Learning to teach","author":"Fan","year":"2018"},{"key":"2026033012320252900_ref079","doi-asserted-by":"publisher","first-page":"10628","DOI":"10.1109\/CVPR42600.2020.01064","volume-title":"Densely connected search space for more flexible neural architecture search","author":"Fang","year":"2020"},{"key":"2026033012320252900_ref080","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-3-030-05318-5_1","volume-title":"Automated Machine Learning","author":"Feurer","year":"2019"},{"key":"2026033012320252900_ref081","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.194","volume-title":"Spatially adaptive computation time for residual networks","author":"Figurnov","year":"2017"},{"key":"2026033012320252900_ref082","first-page":"1126","volume-title":"Model-agnostic metalearning for fast adaptation of deep networks","author":"Finn","year":"2017"},{"key":"2026033012320252900_ref083","doi-asserted-by":"publisher","first-page":"3251","DOI":"10.1098\/rspa.2007.1900","article-title":"Multi-fidelity optimization via surrogate modelling","volume":"463","author":"Forrester","year":"2007","journal-title":"Mathematical, Physical and Engineering Sciences"},{"key":"2026033012320252900_ref084","volume-title":"The lottery ticket hypothesis: Finding sparse, trainable neural networks","author":"Frankle","year":"2018"},{"key":"2026033012320252900_ref085","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1016\/j.neunet.2013.01.001","article-title":"Training multi-layered neural network neocognitron","volume":"40","author":"Fukushima","year":"2013","journal-title":"Neural 
Networks"},{"key":"2026033012320252900_ref086","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1007\/978-3-642-46466-9_18","volume-title":"Competition and Cooperation in Neural Nets","author":"Fukushima","year":"1982"},{"issue":"4","key":"2026033012320252900_ref087","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2523813","article-title":"A survey on concept drift adaptation","volume":"46","author":"Gama","year":"2014","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"2026033012320252900_ref088","first-page":"1180","volume-title":"Unsupervised domain adaptation by backpropagation","author":"Ganin","year":"2015"},{"key":"2026033012320252900_ref089","first-page":"1371","volume-title":"Neural approaches to conversational AI","author":"Gao","year":"2018"},{"key":"2026033012320252900_ref090","volume-title":"Mirror Worlds: Or the Day Software Puts the Universe in a Shoebox\u2026 How It Will Happen and What it Will Mean","author":"Gelernter","year":"1993"},{"key":"2026033012320252900_ref091","doi-asserted-by":"publisher","first-page":"4760","DOI":"10.1609\/aaai.v31i2.19107","volume-title":"Automated data cleansing through meta-learning","author":"Gemp","year":"2017"},{"key":"2026033012320252900_ref092","volume-title":"HARK side of deep learning\u2013from grad student descent to automated machine learning","author":"Gencoglu","year":"2019"},{"key":"2026033012320252900_ref093","doi-asserted-by":"publisher","first-page":"7036","DOI":"10.1109\/CVPR.2019.00720","volume-title":"NAS-FPN: Learning scalable feature pyramid architecture for object detection","author":"Ghiasi","year":"2019"},{"issue":"6206","key":"2026033012320252900_ref094","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1126\/science.1259439","article-title":"Amplify scientific discovery with artificial 
intelligence","volume":"346","author":"Gil","year":"2014","journal-title":"Science"},{"key":"2026033012320252900_ref095","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1016\/B978-0-08-050684-5.50008-2","article-title":"A comparative analysis of selection schemes used in genetic algorithms","volume":"1","author":"Goldberg","year":"1991","journal-title":"Foundations of Genetic Algorithms"},{"key":"2026033012320252900_ref096","doi-asserted-by":"publisher","first-page":"1487","DOI":"10.1145\/3097983.3098043","volume-title":"Google vizier: A service for black-box optimization","author":"Golovin","year":"2017"},{"key":"2026033012320252900_ref097","volume-title":"Generative adversarial networks","author":"Goodfellow","year":"2014"},{"key":"2026033012320252900_ref098","unstructured":"Google\n           (2017). \u201cEdge TPU\u201d. URL: https:\/\/cloud.google.com\/edge-tpu."},{"issue":"1","key":"2026033012320252900_ref099","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1109\/TSMC.2020.3041476","article-title":"Toward autonomous adaptive intelligence: Building upon neural models of how brains make minds","volume":"51","author":"Grossberg","year":"2020","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics: Systems (TSMC)"},{"issue":"5","key":"2026033012320252900_ref100","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3236009","article-title":"A survey of methods for explaining black box models","volume":"51","author":"Guidotti","year":"2019","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"2026033012320252900_ref101","volume-title":"Textbooks are all you need","author":"Gunasekar","year":"2023"},{"key":"2026033012320252900_ref102","doi-asserted-by":"publisher","first-page":"544","DOI":"10.1007\/978-3-030-58517-4_32","volume-title":"Single path one-shot neural architecture search with uniform 
sampling","author":"Guo","year":"2020"},{"key":"2026033012320252900_ref103","volume-title":"Hypernetworks","author":"Ha","year":"2017"},{"key":"2026033012320252900_ref104","volume-title":"Dynamic neural networks: A survey","author":"Han","year":"2021"},{"issue":"3","key":"2026033012320252900_ref105","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1145\/3007787.3001163","article-title":"EIE: Efficient inference engine on compressed deep neural network","volume":"44","author":"Han","year":"2016","journal-title":"ACM SIGARCH Computer Architecture News"},{"key":"2026033012320252900_ref106","volume-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding","author":"Han","year":"2016"},{"issue":"2","key":"2026033012320252900_ref107","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1162\/106365601750190398","article-title":"Completely derandomized self-adaptation in evolution strategies","volume":"9","author":"Hansen","year":"2001","journal-title":"Evolutionary Computation"},{"key":"2026033012320252900_ref108","first-page":"2961","volume-title":"Mask r-cnn","author":"He","year":"2017"},{"key":"2026033012320252900_ref109","first-page":"173","volume-title":"Neural collaborative filtering","author":"He","year":"2017"},{"key":"2026033012320252900_ref110","first-page":"770","volume-title":"Deep residual learning for image recognition","author":"He","year":"2016"},{"key":"2026033012320252900_ref111","volume-title":"A baseline for detecting misclassified and out-of-distribution examples in neural networks","author":"Hendrycks","year":"2017"},{"key":"2026033012320252900_ref112","doi-asserted-by":"publisher","DOI":"10.1515\/9780691215518","volume-title":"The Self-Assembling Brain: How Neural Networks Grow Smarter","author":"Hiesinger","year":"2021"},{"key":"2026033012320252900_ref113","first-page":"177","volume-title":"Using fast weights to deblur old 
memories","author":"Hinton","year":"1987"},{"key":"2026033012320252900_ref114","volume-title":"Distilling the knowledge in a neural network","author":"Hinton","year":"2014"},{"issue":"8","key":"2026033012320252900_ref115","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2026033012320252900_ref116","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1007\/3-540-44668-0_13","volume-title":"Learning to learn using gradient descent","author":"Hochreiter","year":"2001"},{"key":"2026033012320252900_ref117","doi-asserted-by":"publisher","first-page":"1171","DOI":"10.1214\/009053607000000677","article-title":"Kernel methods in machine learning","volume-title":"The Annals of Statistics:","author":"Hofmann","year":"2008"},{"key":"2026033012320252900_ref118","doi-asserted-by":"publisher","first-page":"2554","DOI":"10.1073\/pnas.79.8.2554","volume-title":"Neural networks and physical systems with emergent collective computational abilities","author":"Hopfield","year":"1982"},{"issue":"5","key":"2026033012320252900_ref119","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1016\/0893-6080(89)90020-8","article-title":"Multilayer feedforward networks are universal approximators","volume":"2","author":"Hornik","year":"1989","journal-title":"Neural Networks"},{"key":"2026033012320252900_ref120","volume-title":"Evolved policy gradients","author":"Houthooft","year":"2018"},{"key":"2026033012320252900_ref121","volume-title":"Macro neural architecture search revisited","author":"Hu","year":"2018"},{"key":"2026033012320252900_ref122","doi-asserted-by":"publisher","first-page":"4700","DOI":"10.1109\/CVPR.2017.243","volume-title":"Densely connected convolutional 
networks","author":"Huang","year":"2017"},{"key":"2026033012320252900_ref123","doi-asserted-by":"publisher","first-page":"507","DOI":"10.1007\/978-3-642-25566-3_40","volume-title":"Sequential model-based optimization for general algorithm configuration","author":"Hutter","year":"2011"},{"key":"2026033012320252900_ref124","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1613\/jair.2861","article-title":"ParamILS: An automatic algorithm configuration framework","volume":"36","author":"Hutter","year":"2009","journal-title":"Journal of Artificial Intelligence Research (JAIR)"},{"key":"2026033012320252900_ref125","volume-title":"Multiple Objective Decision Making\u2014Methods and Applications: A State-of-the-Art Survey","author":"Hwang","year":"2012"},{"key":"2026033012320252900_ref126","first-page":"448","volume-title":"Batch normalization: Accelerating deep network training by reducing internal covariate shift","author":"Ioffe","year":"2015"},{"key":"2026033012320252900_ref127","doi-asserted-by":"publisher","first-page":"364","DOI":"10.1109\/TSMC.1971.4308320","article-title":"Polynomial theory of complex systems","volume":"4","author":"Ivakhnenko","year":"1971","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics (SMC)"},{"key":"2026033012320252900_ref128","first-page":"8580","volume-title":"Neural tangent kernel: Convergence and generalization in neural networks","author":"Jacot","year":"2018"},{"key":"2026033012320252900_ref129","volume-title":"Population based training of neural networks","author":"Jaderberg","year":"2017"},{"key":"2026033012320252900_ref130","first-page":"240","volume-title":"Non-stochastic best arm identification and hyperparameter optimization","author":"Jamieson","year":"2016"},{"key":"2026033012320252900_ref131","volume-title":"Categorical reparameterization with gumbel-softmax","author":"Jang","year":"2017"},{"key":"2026033012320252900_ref132","first-page":"1655","volume-title":"Bayesian optimization with 
tree-structured dependencies","author":"Jenatton","year":"2017"},{"issue":"1","key":"2026033012320252900_ref133","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","article-title":"3D convolutional neural networks for human action recognition","volume":"35","author":"Ji","year":"2012","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"issue":"12","key":"2026033012320252900_ref134","doi-asserted-by":"publisher","first-page":"4805","DOI":"10.1109\/TCAD.2020.2986127","article-title":"Hardware\/software co-exploration of neural architectures","volume":"39","author":"Jiang","year":"2020","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)"},{"key":"2026033012320252900_ref135","first-page":"1946","volume-title":"Auto-keras: An efficient neural architecture search system","author":"Jin","year":"2019"},{"key":"2026033012320252900_ref136","volume-title":"Hyp-RL: Hyperparameter optimization by reinforcement learning","author":"Jomaa","year":"2019"},{"key":"2026033012320252900_ref137","first-page":"2342","volume-title":"An empirical exploration of recurrent network architectures","author":"Jozefowicz","year":"2015"},{"issue":"4","key":"2026033012320252900_ref138","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1007\/s12293-009-0017-8","article-title":"Architecture for development of adaptive on-line prediction models","volume":"1","author":"Kadlec","year":"2009","journal-title":"Memetic Computing"},{"key":"2026033012320252900_ref139","first-page":"1000","volume-title":"Gaussian process bandit optimisation with multi-fidelity evaluations","author":"Kandasamy","year":"2016"},{"key":"2026033012320252900_ref140","volume-title":"Neural architecture search with Bayesian optimisation and optimal transport","author":"Kandasamy","year":"2018"},{"key":"2026033012320252900_ref141","first-page":"1238","volume-title":"Almost optimal exploration in multi-armed 
bandits","author":"Karnin","year":"2013"},{"key":"2026033012320252900_ref142","volume-title":"AutonoML: Towards an integrated framework for autonomous machine learning","author":"Kedziora","year":"2020"},{"key":"2026033012320252900_ref143","volume-title":"Adam: A method for stochastic optimization","author":"Kingma","year":"2015"},{"key":"2026033012320252900_ref144","doi-asserted-by":"publisher","first-page":"3521","DOI":"10.1073\/pnas.1611835114","volume-title":"Overcoming catastrophic forgetting in neural networks","author":"Kirkpatrick","year":"2017"},{"key":"2026033012320252900_ref145","volume-title":"Improving generalization in meta reinforcement learning using learned objectives","author":"Kirsch","year":"2020"},{"key":"2026033012320252900_ref146","volume-title":"Tabular benchmarks for joint architecture and hyperparameter optimization","author":"Klein","year":"2019"},{"key":"2026033012320252900_ref147","first-page":"4228","volume-title":"Learning active learning from data","author":"Konyushkova","year":"2017"},{"key":"2026033012320252900_ref148","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1016\/j.inffus.2017.02.004","article-title":"Ensemble learning for data stream analysis: A survey","volume":"37","author":"Krawczyk","year":"2017","journal-title":"Information Fusion"},{"key":"2026033012320252900_ref149","volume-title":"Learning Multiple Layers of Features from Tiny Images","author":"Krizhevsky","year":"2009"},{"key":"2026033012320252900_ref150","first-page":"1097","volume-title":"Imagenet classification with deep convolutional neural networks","author":"Krizhevsky","year":"2012"},{"key":"2026033012320252900_ref151","first-page":"62","volume-title":"Design and regularization of neural networks: The optimal use of a validation set","author":"Larsen","year":"1996"},{"key":"2026033012320252900_ref152","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-49430-8_6","article-title":"Adaptive regularization in neural network 
modeling","volume-title":"Neural Networks: Tricks of the Trade","author":"Larsen","year":"2002"},{"key":"2026033012320252900_ref153","doi-asserted-by":"publisher","DOI":"10.21236\/ADA451466","volume-title":"Winner-take-all networks of O(n) complexity","author":"Lazzaro","year":"1988"},{"key":"2026033012320252900_ref154","first-page":"599","volume-title":"A learning scheme for asymmetric threshold networks","author":"LeCun","year":"1985"},{"issue":"7553","key":"2026033012320252900_ref155","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"issue":"4","key":"2026033012320252900_ref156","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Computation"},{"issue":"12","key":"2026033012320252900_ref157","doi-asserted-by":"publisher","first-page":"2935","DOI":"10.1109\/TPAMI.2017.2773081","article-title":"Learning without forgetting","volume":"40","author":"Li","year":"2017","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"2026033012320252900_ref158","doi-asserted-by":"publisher","first-page":"580","DOI":"10.1007\/978-3-030-58542-6_35","volume-title":"DADA: Differentiable automatic data augmentation","author":"Li","year":"2020"},{"issue":"1","key":"2026033012320252900_ref159","first-page":"6765","article-title":"Hyperband: A novel bandit-based approach to hyperparameter optimization","volume":"18","author":"Li","year":"2018","journal-title":"Journal of Machine Learning Research (JMLR)"},{"key":"2026033012320252900_ref160","volume-title":"Full-cycle energy consumption benchmark for low-carbon computer vision","author":"Li","year":"2021"},{"key":"2026033012320252900_ref161","volume-title":"Learning to 
optimize","author":"Li","year":"2017"},{"key":"2026033012320252900_ref162","first-page":"367","article-title":"Random search and reproducibility for neural architecture search","volume-title":"Uncertainty in Artificial Intelligence","author":"Li","year":"2020"},{"key":"2026033012320252900_ref163","volume-title":"HW-NAS-Bench: Hardware-aware neural architecture search benchmark","author":"Li","year":"2021"},{"key":"2026033012320252900_ref164","doi-asserted-by":"publisher","first-page":"1791","DOI":"10.1145\/3292500.3330649","volume-title":"A generalized framework for population based training","author":"Li","year":"2019"},{"key":"2026033012320252900_ref165","volume-title":"Fast autoaugment","author":"Lim","year":"2019"},{"issue":"243","key":"2026033012320252900_ref166","first-page":"1","article-title":"Best practices for scientific research on neural architecture search","volume":"21","author":"Lindauer","year":"2020","journal-title":"Journal of Machine Learning Research (JMLR)"},{"key":"2026033012320252900_ref167","first-page":"6","article-title":"The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors","volume-title":"Master\u2019s Thesis (in Finnish), Univ. 
Helsinki","author":"Linnainmaa","year":"1970"},{"issue":"1-2","key":"2026033012320252900_ref168","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1016\/0025-5564(74)90031-5","article-title":"The existence of persistent states in the brain","volume":"19","author":"Little","year":"1974","journal-title":"Mathematical Biosciences"},{"key":"2026033012320252900_ref169","volume-title":"Evolving normalization-activation layers","author":"Liu","year":"2020"},{"key":"2026033012320252900_ref170","first-page":"82","volume-title":"Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation","author":"Liu","year":"2019"},{"key":"2026033012320252900_ref171","volume-title":"Hierarchical representations for efficient architecture search","author":"Liu","year":"2018"},{"key":"2026033012320252900_ref172","volume-title":"DARTS: Differentiable Architecture Search","author":"Liu","year":"2019"},{"key":"2026033012320252900_ref173","doi-asserted-by":"publisher","first-page":"806","DOI":"10.1109\/CVPR.2015.7298681","volume-title":"Sparse convolutional neural networks","author":"Liu","year":"2015"},{"key":"2026033012320252900_ref174","volume-title":"Isometric propagation network for generalized zero-shot learning","author":"Liu","year":"2021"},{"key":"2026033012320252900_ref175","volume-title":"Learning to propagate for graph meta-learning","author":"Liu","year":"2019"},{"key":"2026033012320252900_ref176","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1007\/978-3-030-01246-5_2","volume-title":"Progressive neural architecture search","author":"Liu","year":"2018"},{"key":"2026033012320252900_ref177","first-page":"1540","volume-title":"Optimizing millions of hyperparameters by implicit differentiation","author":"Lorraine","year":"2020"},{"key":"2026033012320252900_ref178","volume-title":"CMA-ES for hyperparameter optimization of deep neural 
networks","author":"Loshchilov","year":"2016"},{"key":"2026033012320252900_ref179","first-page":"2952","volume-title":"Scalable gradient-based tuning of continuous regularization hyperparameters","author":"Luketina","year":"2016"},{"key":"2026033012320252900_ref180","unstructured":"Ma, E.\n           (2019). \u201cNLP Augmentation\u201d. URL: https:\/\/github.com\/makcedward\/nlpaug."},{"key":"2026033012320252900_ref181","volume-title":"Self-tuning networks: Bilevel optimization of hyperparameters using structured best-response functions","author":"MacKay","year":"2019"},{"key":"2026033012320252900_ref182","first-page":"2113","volume-title":"Gradient-based hyperparameter optimization through reversible learning","author":"Maclaurin","year":"2015"},{"key":"2026033012320252900_ref183","doi-asserted-by":"publisher","DOI":"10.52591\/lxai201812039","volume-title":"Towards AutoML in the presence of drift: First results","author":"Madrid","year":"2018"},{"issue":"6","key":"2026033012320252900_ref184","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1007\/s00158-003-0368-6","article-title":"Survey of multi-objective optimization methods for engineering","volume":"26","author":"Marler","year":"2004","journal-title":"Structural and Multidisciplinary Optimization"},{"key":"2026033012320252900_ref185","article-title":"Software architecture best practices for enterprise artificial intelligence","volume-title":"INFORMATIK","author":"Martel","year":"2021"},{"key":"2026033012320252900_ref186","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1016\/S0079-7421(08)60536-8","volume-title":"Psychology of Learning and Motivation","author":"McCloskey","year":"1989"},{"issue":"4","key":"2026033012320252900_ref187","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1007\/BF02478259","article-title":"A logical calculus of the ideas immanent in nervous activity","volume":"5","author":"McCulloch","year":"1943","journal-title":"The Bulletin of Mathematical 
Biophysics"},{"key":"2026033012320252900_ref188","volume-title":"Neural architecture search without training","author":"Mellor","year":"2021"},{"issue":"1994","key":"2026033012320252900_ref189","first-page":"1","article-title":"Machine learning","volume":"13","author":"Michie","year":"1994","journal-title":"Neural and Statistical Classification"},{"key":"2026033012320252900_ref190","unstructured":"Microsoft Corporation\n           (2018). \u201cNeural Network Intelligence\u201d. URL: https:\/\/github.com\/microsoft\/nni."},{"key":"2026033012320252900_ref191","volume-title":"End-to-end training of differentiable pipelines across machine learning frameworks","author":"Milutinovic","year":"2017"},{"key":"2026033012320252900_ref192","volume-title":"A hierarchical model for device placement","author":"Mirhoseini","year":"2018"},{"issue":"7862","key":"2026033012320252900_ref193","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1038\/s41586-021-03544-w","article-title":"A graph placement methodology for fast chip design","volume":"594","author":"Mirhoseini","year":"2021","journal-title":"Nature"},{"key":"2026033012320252900_ref194","first-page":"2430","volume-title":"Device placement optimization with reinforcement learning","author":"Mirhoseini","year":"2017"},{"key":"2026033012320252900_ref195","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1007\/978-3-662-38527-2_55","volume-title":"On Bayesian methods for seeking the extremum","author":"Mo\u010dkus","year":"1975"},{"key":"2026033012320252900_ref196","doi-asserted-by":"publisher","first-page":"347","DOI":"10.1109\/MASCOTS.2019.00045","volume-title":"Practical design space exploration","author":"Nardi","year":"2019"},{"key":"2026033012320252900_ref197","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN52387.2021.9533431","volume-title":"Exploring opportunistic meta-knowledge to reduce search spaces for automated machine 
learning","author":"Nguyen","year":"2021"},{"key":"2026033012320252900_ref198","doi-asserted-by":"publisher","first-page":"1317","DOI":"10.18653\/v1\/D19-1132","volume-title":"Automatically learning data augmentation policies for dialogue tasks","author":"Niu","year":"2019"},{"key":"2026033012320252900_ref199","doi-asserted-by":"publisher","first-page":"5206","DOI":"10.1109\/ICASSP.2015.7178964","volume-title":"Librispeech: An ASR corpus based on public domain audio books","author":"Panayotov","year":"2015"},{"key":"2026033012320252900_ref200","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1016\/j.neunet.2019.01.012","article-title":"Continual lifelong learning with neural networks: A review","volume":"113","author":"Parisi","year":"2019","journal-title":"Neural Networks"},{"key":"2026033012320252900_ref201","doi-asserted-by":"publisher","DOI":"10.5244\/C.29.41","volume-title":"Deep face recognition","author":"Parkhi","year":"2015"},{"key":"2026033012320252900_ref202","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/ICCAD45719.2019.8942046","volume-title":"PABO: Pseudo agent-based multi-objective Bayesian hyperparameter optimization for efficient neural accelerator design","author":"Parsa","year":"2019"},{"key":"2026033012320252900_ref203","volume-title":"Carbon emissions and large neural network training","author":"Patterson","year":"2021"},{"key":"2026033012320252900_ref204","first-page":"737","volume-title":"Hyperparameter optimization with approximate gradient","author":"Pedregosa","year":"2016"},{"key":"2026033012320252900_ref205","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3190508.3190517","volume-title":"Optimus: An efficient dynamic resource scheduler for deep learning clusters","author":"Peng","year":"2018"},{"key":"2026033012320252900_ref206","volume-title":"PyGlove: Symbolic programming for automated machine 
learning","author":"Peng","year":"2020"},{"key":"2026033012320252900_ref207","first-page":"4095","volume-title":"Efficient neural architecture search via parameters sharing","author":"Pham","year":"2018"},{"key":"2026033012320252900_ref208","first-page":"22","article-title":"Improving reproducibility in machine learning research: A report from the NeurIPS 2019 reproducibility program","volume-title":"Journal of Machine Learning Research (JMLR)","author":"Pineau","year":"2021"},{"key":"2026033012320252900_ref209","first-page":"527","volume-title":"Implications of recursive distributed representations","author":"Pollack","year":"1989"},{"issue":"1","key":"2026033012320252900_ref210","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1007\/BF00116251","article-title":"Induction of decision trees","volume":"1","author":"Quinlan","year":"1986","journal-title":"Machine Learning"},{"key":"2026033012320252900_ref211","doi-asserted-by":"publisher","first-page":"10428","DOI":"10.1109\/CVPR42600.2020.01044","volume-title":"Designing network design spaces","author":"Radosavovic","year":"2020"},{"key":"2026033012320252900_ref212","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1264","volume-title":"Squad: 100,000+ questions for machine comprehension of text","author":"Rajpurkar","year":"2016"},{"key":"2026033012320252900_ref213","volume-title":"Searching for activation functions","author":"Ramachandran","year":"2017"},{"key":"2026033012320252900_ref214","volume-title":"Optimization as a model for few-shot learning","author":"Ravi","year":"2017"},{"key":"2026033012320252900_ref215","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/ISLPED.2017.8009208","volume-title":"A case for efficient accelerator design space exploration via bayesian optimization","author":"Reagen","year":"2017"},{"key":"2026033012320252900_ref216","doi-asserted-by":"publisher","first-page":"4780","DOI":"10.1609\/aaai.v33i01.33014780","volume-title":"Regularized evolution for image 
classifier architecture search","author":"Real","year":"2019"},{"key":"2026033012320252900_ref217","first-page":"8007","volume-title":"AutoML-zero: Evolving machine learning algorithms from scratch","author":"Real","year":"2020"},{"key":"2026033012320252900_ref218","first-page":"2902","volume-title":"Large-scale evolution of image classifiers","author":"Real","year":"2017"},{"issue":"4","key":"2026033012320252900_ref219","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3447582","article-title":"A comprehensive survey of neural architecture search: Challenges and solutions","volume":"54","author":"Ren","year":"2021","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"2026033012320252900_ref220","first-page":"4334","volume-title":"Learning to reweight examples for robust deep learning","author":"Ren","year":"2018"},{"key":"2026033012320252900_ref221","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/978-0-387-85820-3_1","volume-title":"Recommender Systems Handbook","author":"Ricci","year":"2011"},{"key":"2026033012320252900_ref222","first-page":"586","volume-title":"A direct adaptive method for faster backpropagation learning: The RPROP algorithm","author":"Riedmiller","year":"1993"},{"key":"2026033012320252900_ref223","first-page":"2152","volume-title":"An embarrassingly simple approach to zero-shot learning","author":"Romera-Paredes","year":"2015"},{"key":"2026033012320252900_ref224","first-page":"8276","volume-title":"Bayesian optimisation over multiple continuous and categorical inputs","author":"Ru","year":"2020"},{"key":"2026033012320252900_ref225","volume-title":"Neural architecture generator optimization","author":"Ru","year":"2020"},{"key":"2026033012320252900_ref226","doi-asserted-by":"publisher","DOI":"10.21236\/ADA164453","article-title":"Learning internal representations by error propagation","volume-title":"Technical Report. 
California Univ San Diego La Jolla Inst for Cognitive Science","author":"Rumelhart","year":"1985"},{"issue":"3","key":"2026033012320252900_ref227","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"International Journal of Computer Vision (IJCV)"},{"key":"2026033012320252900_ref228","volume-title":"Progressive neural networks","author":"Rusu","year":"2016"},{"issue":"3","key":"2026033012320252900_ref229","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1109\/TKDE.2010.137","article-title":"A generic multilevel architecture for time series prediction","volume":"23","author":"Ruta","year":"2011","journal-title":"IEEE Transactions on Knowledge and Data Engineering (TKDE)"},{"key":"2026033012320252900_ref230","volume-title":"Evolution strategies as a scalable alternative to reinforcement learning","author":"Salimans","year":"2017"},{"key":"2026033012320252900_ref231","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1007\/978-3-319-32034-2_3","volume-title":"Towards automatic composition of multicomponent predictive systems","author":"Salvador","year":"2016"},{"issue":"2","key":"2026033012320252900_ref232","doi-asserted-by":"publisher","first-page":"946","DOI":"10.1109\/TASE.2018.2876430","article-title":"Automatic composition and optimization of multicomponent predictive systems with an extended auto-WEKA","volume":"16","author":"Salvador","year":"2019","journal-title":"IEEE Transactions on Automation Science and Engineering (TASE)"},{"key":"2026033012320252900_ref233","doi-asserted-by":"publisher","first-page":"4510","DOI":"10.1109\/CVPR.2018.00474","volume-title":"MobileNetV2: Inverted residuals and linear 
bottlenecks","author":"Sandler","year":"2018"},{"issue":"6","key":"2026033012320252900_ref234","doi-asserted-by":"publisher","first-page":"4650","DOI":"10.4249\/scholarpedia.4650","article-title":"Metalearning","volume":"5","author":"Schaul","year":"2010","journal-title":"Scholarpedia"},{"key":"2026033012320252900_ref235","volume-title":"Evolutionary principles in self-referential learning","author":"Schmidhuber","year":"1987"},{"issue":"1","key":"2026033012320252900_ref236","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1162\/neco.1992.4.1.131","article-title":"Learning to control fast-weight memories: An alternative to dynamic recurrent networks","volume":"4","author":"Schmidhuber","year":"1992","journal-title":"Neural Computation"},{"key":"2026033012320252900_ref237","volume-title":"Bias-optimal incremental problem solving","author":"Schmidhuber","year":"2002"},{"key":"2026033012320252900_ref238","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","article-title":"Deep learning in neural networks: An overview","volume":"61","author":"Schmidhuber","year":"2015","journal-title":"Neural Networks"},{"key":"2026033012320252900_ref239","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1007\/978-1-4615-5529-2_12","volume-title":"Learning to Learn","author":"Schmidhuber","year":"1998"},{"key":"2026033012320252900_ref240","volume-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017"},{"key":"2026033012320252900_ref241","first-page":"2503","volume-title":"Hidden technical debt in machine learning systems","author":"Sculley","year":"2015"},{"key":"2026033012320252900_ref242","volume-title":"Python Imaging Library (PIL)","author":"Secret Labs AB Lundh, F., A. 
Clark Other Contributors","year":"2011"},{"key":"2026033012320252900_ref243","first-page":"1723","volume-title":"Truncated back-propagation for bilevel optimization","author":"Shaban","year":"2019"},{"issue":"1","key":"2026033012320252900_ref244","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0197-0","article-title":"A survey on image data augmentation for deep learning","volume":"6","author":"Shorten","year":"2019","journal-title":"Journal of Big Data"},{"key":"2026033012320252900_ref245","volume-title":"Meta-weight-net: Learning an explicit mapping for sample weighting","author":"Shu","year":"2019"},{"key":"2026033012320252900_ref246","volume-title":"NAS-Bench-301 and the case for surrogate benchmarks for neural architecture search","author":"Siems","year":"2020"},{"issue":"7587","key":"2026033012320252900_ref247","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"2026033012320252900_ref248","volume-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan","year":"2015"},{"key":"2026033012320252900_ref249","volume-title":"Don\u2019t decay the learning rate, increase the batch size","author":"Smith","year":"2018"},{"issue":"1","key":"2026033012320252900_ref250","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1456650.1456656","article-title":"Cross-disciplinary perspectives on metalearning for algorithm selection","volume":"41","author":"Smith-Miles","year":"2009","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"2026033012320252900_ref251","first-page":"4077","volume-title":"Prototypical networks for few-shot learning","author":"Snell","year":"2017"},{"key":"2026033012320252900_ref252","first-page":"2951","volume-title":"Practical Bayesian optimization of machine learning 
algorithms","author":"Snoek","year":"2012"},{"key":"2026033012320252900_ref253","first-page":"5877","volume-title":"The evolved transformer","author":"So","year":"2019"},{"key":"2026033012320252900_ref254","volume-title":"Primer: Searching for efficient transformers for language modeling","author":"So","year":"2021"},{"key":"2026033012320252900_ref255","volume-title":"Automated machine learning in action","author":"Song","year":"2022"},{"key":"2026033012320252900_ref256","volume-title":"Memory-based parameter adaptation","author":"Sprechmann","year":"2018"},{"key":"2026033012320252900_ref257","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1355","volume-title":"Energy and policy considerations for deep learning in NLP","author":"Strubell","year":"2019"},{"key":"2026033012320252900_ref258","first-page":"9206","volume-title":"Generative teaching networks: Accelerating neural architecture search by learning to generate synthetic training data","author":"Such","year":"2020"},{"key":"2026033012320252900_ref259","first-page":"171","volume-title":"Adapting bias by gradient descent: An incremental version of delta-bar-delta","author":"Sutton","year":"1992"},{"key":"2026033012320252900_ref260","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton","year":"2018"},{"key":"2026033012320252900_ref261","volume-title":"Equivalence in deep neural networks via conjugate matrix ensembles","author":"S\u00fczen","year":"2020"},{"key":"2026033012320252900_ref262","volume-title":"Periodic spectral ergodicity: A complexity measure for deep neural networks and neural architecture search","author":"S\u00fczen","year":"2019"},{"key":"2026033012320252900_ref263","volume-title":"Freeze-thaw Bayesian optimization","author":"Swersky","year":"2014"},{"key":"2026033012320252900_ref264","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/CVPR.2015.7298594","volume-title":"Going deeper with 
convolutions","author":"Szegedy","year":"2015"},{"key":"2026033012320252900_ref265","first-page":"2820","volume-title":"MnasNet: Platform-aware neural architecture search for mobile","author":"Tan","year":"2019"},{"key":"2026033012320252900_ref266","first-page":"10096","volume-title":"EfficientNetV2: Smaller models and faster training","author":"Tan","year":"2021"},{"key":"2026033012320252900_ref267","doi-asserted-by":"publisher","first-page":"847","DOI":"10.1145\/2487575.2487629","volume-title":"Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms","author":"Thornton","year":"2013"},{"issue":"1-2","key":"2026033012320252900_ref268","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1016\/0921-8890(95)00004-Y","article-title":"Lifelong robot learning","volume":"15","author":"Thrun","year":"1995","journal-title":"Robotics and Autonomous Systems"},{"key":"2026033012320252900_ref269","first-page":"6000","volume-title":"Attention is all you need","author":"Vaswani","year":"2017"},{"key":"2026033012320252900_ref270","volume-title":"Discovery of useful questions as auxiliary tasks","author":"Veeriah","year":"2019"},{"key":"2026033012320252900_ref271","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1145\/1390156.1390294","volume-title":"Extracting and composing robust features with denoising autoencoders","author":"Vincent","year":"2008"},{"key":"2026033012320252900_ref272","doi-asserted-by":"publisher","first-page":"12965","DOI":"10.1109\/CVPR42600.2020.01298","volume-title":"FBNetV2: Differentiable neural architecture search for spatial and channel dimensions","author":"Wan","year":"2020"},{"key":"2026033012320252900_ref273","article-title":"Learning to reinforcement learn","volume-title":"Cognitive Science Society (CogSci)","author":"Wang","year":"2017"},{"key":"2026033012320252900_ref274","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2021\/628","volume-title":"Generalizing to unseen domains: A survey on 
domain generalization","author":"Wang","year":"2021"},{"key":"2026033012320252900_ref275","doi-asserted-by":"publisher","first-page":"8612","DOI":"10.1109\/CVPR.2019.00881","volume-title":"HAQ: Hardware-aware automated quantization with mixed precision","author":"Wang","year":"2019"},{"key":"2026033012320252900_ref276","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3290605.3300831","volume-title":"Designing theory-driven user-centric explainable AI","author":"Wang","year":"2019"},{"key":"2026033012320252900_ref277","doi-asserted-by":"publisher","first-page":"2078","DOI":"10.1109\/CVPR42600.2020.00215","volume-title":"APQ: Joint search for network architecture, pruning and quantization policy","author":"Wang","year":"2020"},{"key":"2026033012320252900_ref278","volume-title":"Dataset distillation","author":"Wang","year":"2018"},{"key":"2026033012320252900_ref279","volume-title":"Learning from delayed rewards","author":"Watkins","year":"1989"},{"key":"2026033012320252900_ref280","first-page":"1786","volume-title":"Results of the time series prediction competition at the Santa Fe Institute","author":"Weigend","year":"1993"},{"key":"2026033012320252900_ref281","doi-asserted-by":"publisher","first-page":"660","DOI":"10.1007\/978-3-030-58526-6_39","volume-title":"Neural predictor for neural architecture search","author":"Wen","year":"2020"},{"key":"2026033012320252900_ref282","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i12.17233","volume-title":"BANANAS: Bayesian optimization with neural architectures for neural architecture search","author":"White","year":"2021"},{"key":"2026033012320252900_ref283","first-page":"557","volume-title":"A greedy approach to adapting the trace parameter for temporal difference learning","author":"White","year":"2016"},{"issue":"3-4","key":"2026033012320252900_ref284","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1007\/BF00992696","article-title":"Simple statistical gradient-following algorithms for 
connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Machine Learning"},{"key":"2026033012320252900_ref285","volume-title":"A survey on neural architecture search","author":"Wistuba","year":"2019"},{"issue":"1","key":"2026033012320252900_ref286","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1109\/4235.585893","article-title":"No free lunch theorems for optimization","volume":"1","author":"Wolpert","year":"1997","journal-title":"IEEE Transactions on Evolutionary Computation"},{"key":"2026033012320252900_ref287","first-page":"10734","volume-title":"Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search","author":"Wu","year":"2019"},{"key":"2026033012320252900_ref288","volume-title":"Mixed precision quantization of convnets via differentiable neural architecture search","author":"Wu","year":"2018"},{"issue":"9","key":"2026033012320252900_ref289","doi-asserted-by":"publisher","first-page":"2251","DOI":"10.1109\/TPAMI.2018.2857768","article-title":"Zero-shot learning\u2014A comprehensive evaluation of the good, the bad and the ugly","volume":"41","author":"Xian","year":"2018","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"2026033012320252900_ref290","first-page":"1284","volume-title":"Exploring randomly wired neural networks for image recognition","author":"Xie","year":"2019"},{"key":"2026033012320252900_ref291","volume-title":"SNAS: Stochastic neural architecture search","author":"Xie","year":"2019"},{"key":"2026033012320252900_ref292","first-page":"10514","volume-title":"On the number of linear regions of convolutional neural networks","author":"Xiong","year":"2020"},{"key":"2026033012320252900_ref293","doi-asserted-by":"publisher","first-page":"1901","DOI":"10.1109\/ICCV.2019.00199","article-title":"Resource constrained neural network architecture search: Will a submodularity assumption 
help?","author":"Xiong","year":"2019"},{"key":"2026033012320252900_ref294","volume-title":"Meta-gradient reinforcement learning with an objective discovered online","author":"Xu","year":"2020"},{"key":"2026033012320252900_ref295","volume-title":"Meta-gradient reinforcement learning","author":"Xu","year":"2018"},{"key":"2026033012320252900_ref296","volume-title":"PC-DARTS: Partial channel connections for memory-efficient architecture search","author":"Xu","year":"2020"},{"key":"2026033012320252900_ref297","volume-title":"NAS-Bench-x11 and the power of learning curves","author":"Yan","year":"2021"},{"key":"2026033012320252900_ref298","doi-asserted-by":"publisher","first-page":"5687","DOI":"10.1109\/CVPR.2017.643","volume-title":"Designing energy-efficient convolutional neural networks using energy-aware pruning","author":"Yang","year":"2017"},{"key":"2026033012320252900_ref299","volume-title":"Detecting human actions in surveillance videos","author":"Yang","year":"2009"},{"key":"2026033012320252900_ref300","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1007\/978-3-030-01249-6_18","volume-title":"NetAdapt: Platform-aware neural network adaptation for mobile applications","author":"Yang","year":"2018"},{"key":"2026033012320252900_ref301","volume-title":"Qlib: An AI-oriented quantitative investment platform","author":"Yang","year":"2020"},{"key":"2026033012320252900_ref302","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1016\/j.neucom.2020.07.061","article-title":"On hyperparameter optimization of machine learning algorithms: Theory and practice","volume":"415","author":"Yang","year":"2020","journal-title":"Neurocomputing"},{"key":"2026033012320252900_ref303","volume-title":"Searching for low-bit weights in quantized neural networks","author":"Yang","year":"2020"},{"key":"2026033012320252900_ref304","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/DAC18072.2020.9218676","volume-title":"Co-exploration of neural architectures and 
heterogeneous asic accelerator designs targeting multiple tasks","author":"Yang","year":"2020"},{"key":"2026033012320252900_ref305","volume-title":"Enumerating unique computational graphs via an iterative graph invariant","author":"Ying","year":"2019"},{"key":"2026033012320252900_ref306","first-page":"7105","volume-title":"NAS-Bench-101: Towards reproducible neural architecture search","author":"Ying","year":"2019"},{"key":"2026033012320252900_ref307","doi-asserted-by":"publisher","first-page":"702","DOI":"10.1007\/978-3-030-58571-6_41","volume-title":"BigNAS: Scaling up neural architecture search with big single-stage models","author":"Yu","year":"2020"},{"key":"2026033012320252900_ref308","volume-title":"Evaluating the search phase of neural architecture search","author":"Yu","year":"2020"},{"key":"2026033012320252900_ref309","volume-title":"Slimmable neural networks","author":"Yu","year":"2019"},{"key":"2026033012320252900_ref310","volume-title":"Hyper-parameter optimization: A review of algorithms and applications","author":"Yu","year":"2020"},{"key":"2026033012320252900_ref311","volume-title":"Neural ensemble search for performant and calibrated predictions","author":"Zaidi","year":"2020"},{"key":"2026033012320252900_ref312","volume-title":"Understanding and robustifying differentiable architecture search","author":"Zela","year":"2020"},{"key":"2026033012320252900_ref313","volume-title":"Towards automated deep learning: Efficient joint neural architecture and hyperparameter search","author":"Zela","year":"2018"},{"key":"2026033012320252900_ref314","volume-title":"NAS-Bench-1Shot1: Benchmarking and dissecting one-shot neural architecture search","author":"Zela","year":"2020"},{"key":"2026033012320252900_ref315","first-page":"1114","volume-title":"A reinforcement learning approach to job-shop scheduling","author":"Zhang","year":"1995"},{"key":"2026033012320252900_ref316","first-page":"919","volume-title":"Retiarii: A deep learning exploratory-training 
framework","author":"Zhang","year":"2020"},{"key":"2026033012320252900_ref317","doi-asserted-by":"publisher","first-page":"2423","DOI":"10.1109\/CVPR.2018.00257","volume-title":"Practical block-wise neural network architecture generation","author":"Zhong","year":"2018"},{"key":"2026033012320252900_ref318","volume-title":"Rethinking co-design of neural architectures and hardware accelerators","author":"Zhou","year":"2022"},{"key":"2026033012320252900_ref319","first-page":"2223","volume-title":"Unpaired image-to-image translation using cycle-consistent adversarial networks","author":"Zhu","year":"2017"},{"key":"2026033012320252900_ref320","doi-asserted-by":"publisher","first-page":"1249","DOI":"10.1145\/1553374.1553534","volume-title":"Multi-instance learning by treating instances as non-iid samples","author":"Zhou","year":"2009"},{"key":"2026033012320252900_ref321","first-page":"7603","volume-title":"BayesNAS: A Bayesian approach for neural architecture search","author":"Zhou","year":"2019"},{"key":"2026033012320252900_ref322","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3067763","article-title":"Auto-PyTorch tabular: Multi-fidelity metalearning for efficient and robust AutoDL","volume-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)","author":"Zimmer","year":"2021"},{"issue":"1","key":"2026033012320252900_ref323","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1145\/2408736.2408746","article-title":"Next challenges for adaptive learning systems","volume":"14","author":"Zliobaite","year":"2012","journal-title":"ACM SIGKDD Explorations Newsletter"},{"issue":"2","key":"2026033012320252900_ref324","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1109\/TKDE.2012.147","article-title":"Adaptive preprocessing for streaming data","volume":"26","author":"Zliobaite","year":"2014","journal-title":"IEEE Transactions on Knowledge and Data Engineering 
(TKDE)"},{"key":"2026033012320252900_ref325","doi-asserted-by":"publisher","first-page":"566","DOI":"10.1007\/978-3-030-58583-9_34","volume-title":"Learning data augmentation strategies for object detection","author":"Zoph","year":"2020"},{"key":"2026033012320252900_ref326","volume-title":"Neural architecture search with reinforcement learning","author":"Zoph","year":"2017"},{"key":"2026033012320252900_ref327","first-page":"8697","volume-title":"Learning transferable architectures for scalable image recognition","author":"Zoph","year":"2018"}],"container-title":["Foundations and Trends\u00ae in Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/17\/5\/767\/11154374\/2200000119en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/17\/5\/767\/11154374\/2200000119en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T16:33:15Z","timestamp":1774888395000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftmal\/article\/17\/5\/767\/1332397\/Automated-Deep-Learning-Neural-Architecture-Search"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,27]]},"references-count":327,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,2,27]]}},"URL":"https:\/\/doi.org\/10.1561\/2200000119","relation":{},"ISSN":["1935-8237","1935-8245"],"issn-type":[{"value":"1935-8237","type":"print"},{"value":"1935-8245","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,27]]}}}