{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T19:47:05Z","timestamp":1770493625482,"version":"3.49.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:p>Automated machine learning (AutoML) frameworks have become important tools in the data scientist's arsenal, as they dramatically reduce the manual work devoted to the construction of ML pipelines. Such frameworks intelligently search among millions of possible ML pipelines - typically containing feature engineering, model selection, and hyper parameters tuning steps - and finally output an optimal pipeline in terms of predictive accuracy.<\/jats:p>\n          <jats:p>However, when the dataset is large, each individual configuration takes longer to execute, therefore the overall AutoML running times become increasingly high.<\/jats:p>\n          <jats:p>\n            To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size, rather than configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small yet representative data\n            <jats:italic>subset<\/jats:italic>\n            that preserves a particular characteristic of the full data. It then employs the AutoML tool on the small subset, and finally, it refines the resulting pipeline by executing a restricted, much shorter, AutoML process on the large dataset. Our experimental results, performed on three popular AutoML frameworks, Auto-Sklearn, TPOT, and H2O show that SubStrat reduces their running times by 76.3% (on average), with only a 4.15% average decrease in the accuracy of the resulting ML pipeline.\n          <\/jats:p>","DOI":"10.14778\/3574245.3574261","type":"journal-article","created":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T23:14:12Z","timestamp":1677021252000},"page":"772-780","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["SubStrat"],"prefix":"10.14778","volume":"16","author":[{"given":"Teddy","family":"Lazebnik","sequence":"first","affiliation":[{"name":"University College London"}]},{"given":"Amit","family":"Somech","sequence":"additional","affiliation":[{"name":"Bar-Ilan University"}]},{"given":"Abraham Itzhak","family":"Weinberg","sequence":"additional","affiliation":[{"name":"Bar-Ilan University"}]}],"member":"320","published-online":{"date-parts":[[2023,2,21]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1007\/s00778-015-0389-y","article-title":"Profiling relational data: a survey","volume":"24","author":"Abedjan Ziawasch","year":"2015","unstructured":"Ziawasch Abedjan , Lukasz Golab , and Felix Naumann . 2015 . Profiling relational data: a survey . The VLDB Journal 24 , 4 (2015), 557 -- 581 . Ziawasch Abedjan, Lukasz Golab, and Felix Naumann. 2015. Profiling relational data: a survey. The VLDB Journal 24, 4 (2015), 557--581.","journal-title":"The VLDB Journal"},{"key":"e_1_2_1_2_1","volume-title":"Openml benchmarking suites. arXiv preprint arXiv:1708.03731","author":"Bischl Bernd","year":"2017","unstructured":"Bernd Bischl , Giuseppe Casalicchio , Matthias Feurer , Frank Hutter , Michel Lang , Rafael G Mantovani , Jan N van Rijn , and Joaquin Vanschoren . 2017. Openml benchmarking suites. arXiv preprint arXiv:1708.03731 ( 2017 ). Bernd Bischl, Giuseppe Casalicchio, Matthias Feurer, Frank Hutter, Michel Lang, Rafael G Mantovani, Jan N van Rijn, and Joaquin Vanschoren. 2017. Openml benchmarking suites. arXiv preprint arXiv:1708.03731 (2017)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1080\/01621459.2017.1285773","article-title":"Variational inference: A review for statisticians","volume":"112","author":"Blei David M","year":"2017","unstructured":"David M Blei , Alp Kucukelbir , and Jon D McAuliffe . 2017 . Variational inference: A review for statisticians . Journal of the American statistical Association 112 , 518 (2017), 859 -- 877 . David M Blei, Alp Kucukelbir, and Jon D McAuliffe. 2017. Variational inference: A review for statisticians. Journal of the American statistical Association 112, 518 (2017), 859--877.","journal-title":"Journal of the American statistical Association"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1016\/j.rcim.2005.04.001","article-title":"Optimization of process route by genetic algorithms","volume":"22","author":"Bo Z. W.","year":"2006","unstructured":"Z. W. Bo , L. Z. Hua , and Z. G. Yu . 2006 . Optimization of process route by genetic algorithms . Robotics and Computer-Integrated Manufacturing 22 (2006), 180 -- 188 . Z. W. Bo, L. Z. Hua, and Z. G. Yu. 2006. Optimization of process route by genetic algorithms. Robotics and Computer-Integrated Manufacturing 22 (2006), 180--188.","journal-title":"Robotics and Computer-Integrated Manufacturing"},{"key":"e_1_2_1_5_1","volume-title":"The Conference on Innovative Data Systems Research (CIDR).","author":"Boehm Matthias","year":"2020","unstructured":"Matthias Boehm , Iulian Antonov , Sebastian Baunsgaard , Mark Dokter , Robert Ginth\u00f6r , Kevin Innerebner , Florijan Klezin , Stefanie Lindstaedt , Arnab Phani , Benjamin Rath , 2020 . SystemDS: A declarative machine learning system for the end-to-end data science lifecycle . The Conference on Innovative Data Systems Research (CIDR). Matthias Boehm, Iulian Antonov, Sebastian Baunsgaard, Mark Dokter, Robert Ginth\u00f6r, Kevin Innerebner, Florijan Klezin, Stefanie Lindstaedt, Arnab Phani, Benjamin Rath, et al. 2020. SystemDS: A declarative machine learning system for the end-to-end data science lifecycle. The Conference on Innovative Data Systems Research (CIDR)."},{"key":"e_1_2_1_6_1","volume-title":"International Conference on Modeling Decisions for Artificial Intelligence. Springer, 457--468","author":"Castiello Ciro","year":"2005","unstructured":"Ciro Castiello , Giovanna Castellano , and Anna Maria Fanelli . 2005 . Meta-data: Characterization of input features for meta-learning . In International Conference on Modeling Decisions for Artificial Intelligence. Springer, 457--468 . Ciro Castiello, Giovanna Castellano, and Anna Maria Fanelli. 2005. Meta-data: Characterization of input features for meta-learning. In International Conference on Modeling Decisions for Artificial Intelligence. Springer, 457--468."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.compeleceng.2013.11.024","article-title":"A survey on feature selection methods","volume":"40","author":"Chandrashekar Girish","year":"2014","unstructured":"Girish Chandrashekar and Ferat Sahin . 2014 . A survey on feature selection methods . Computers & Electrical Engineering 40 , 1 (2014), 16 -- 28 . Girish Chandrashekar and Ferat Sahin. 2014. A survey on feature selection methods. Computers & Electrical Engineering 40, 1 (2014), 16--28.","journal-title":"Computers & Electrical Engineering"},{"key":"e_1_2_1_8_1","volume-title":"Era of big data processing: A new approach via tensor networks and tensor decompositions. arXiv preprint arXiv:1403.2048","author":"Cichocki Andrzej","year":"2014","unstructured":"Andrzej Cichocki . 2014. Era of big data processing: A new approach via tensor networks and tensor decompositions. arXiv preprint arXiv:1403.2048 ( 2014 ). Andrzej Cichocki. 2014. Era of big data processing: A new approach via tensor networks and tensor decompositions. arXiv preprint arXiv:1403.2048 (2014)."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 47th annual ACM symposium on Theory of computing. 183--192","author":"Cohen Michael B","year":"2015","unstructured":"Michael B Cohen and Richard Peng . 2015 . Lp row sampling by lewis weights . In Proceedings of the 47th annual ACM symposium on Theory of computing. 183--192 . Michael B Cohen and Richard Peng. 2015. Lp row sampling by lewis weights. In Proceedings of the 47th annual ACM symposium on Theory of computing. 183--192."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the international joint conference on artificial intelligence","author":"Davis L.","year":"1985","unstructured":"L. Davis . 1985 . Applying adaptive algorithms to epistatic domains . Proceedings of the international joint conference on artificial intelligence (1985), 162--164. L. Davis. 1985. Applying adaptive algorithms to epistatic domains. Proceedings of the international joint conference on artificial intelligence (1985), 162--164."},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","first-page":"541","DOI":"10.3390\/e21060541","article-title":"Approximate entropy and sample entropy: A comprehensive tutorial","volume":"21","author":"Delgado-Bonal Alfonso","year":"2019","unstructured":"Alfonso Delgado-Bonal and Alexander Marshak . 2019 . Approximate entropy and sample entropy: A comprehensive tutorial . Entropy 21 , 6 (2019), 541 . Alfonso Delgado-Bonal and Alexander Marshak. 2019. Approximate entropy and sample entropy: A comprehensive tutorial. Entropy 21, 6 (2019), 541.","journal-title":"Entropy"},{"key":"e_1_2_1_12_1","volume-title":"Jorge Piazentin Ono, Kyunghyun Cho, Claudio Silva, and Juliana Freire.","author":"Drori Iddo","year":"2021","unstructured":"Iddo Drori , Yamuna Krishnamurthy , Remi Rampin , Raoni de Paula Lourenco , Jorge Piazentin Ono, Kyunghyun Cho, Claudio Silva, and Juliana Freire. 2021 . AlphaD3M: Machine learning pipeline synthesis. arXiv (2021). Iddo Drori, Yamuna Krishnamurthy, Remi Rampin, Raoni de Paula Lourenco, Jorge Piazentin Ono, Kyunghyun Cho, Claudio Silva, and Juliana Freire. 2021. AlphaD3M: Machine learning pipeline synthesis. arXiv (2021)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1145\/1860702.1860709","article-title":"Search result diversification","volume":"39","author":"Drosou Marina","year":"2010","unstructured":"Marina Drosou and Evaggelia Pitoura . 2010 . Search result diversification . ACM SIGMOD Record 39 , 1 (2010), 41 -- 47 . Marina Drosou and Evaggelia Pitoura. 2010. Search result diversification. ACM SIGMOD Record 39, 1 (2010), 41--47.","journal-title":"ACM SIGMOD Record"},{"key":"e_1_2_1_14_1","volume-title":"Hands-free automl via meta-learning. arXiv preprint arXiv:2007.04074","author":"Feurer Matthias","year":"2020","unstructured":"Matthias Feurer , Katharina Eggensperger , Stefan Falkner , Marius Lindauer , and Frank Hutter . 2020. Auto-sklearn 2.0 : Hands-free automl via meta-learning. arXiv preprint arXiv:2007.04074 ( 2020 ). Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter. 2020. Auto-sklearn 2.0: Hands-free automl via meta-learning. arXiv preprint arXiv:2007.04074 (2020)."},{"key":"e_1_2_1_15_1","volume-title":"Efficient and robust automated machine learning. Advances in neural information processing systems 28","author":"Feurer Matthias","year":"2015","unstructured":"Matthias Feurer , Aaron Klein , Katharina Eggensperger , Jost Springenberg , Manuel Blum , and Frank Hutter . 2015. Efficient and robust automated machine learning. Advances in neural information processing systems 28 ( 2015 ). Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. 2015. Efficient and robust automated machine learning. Advances in neural information processing systems 28 (2015)."},{"key":"e_1_2_1_16_1","volume-title":"Auto-sklearn: Efficient and Robust Automated Machine Learning.","author":"Feurer M.","year":"2019","unstructured":"M. Feurer , A. Klevin , K. Eggensperger , J. T. Springenberg , M. Blum , and F. Hutter . 2019 . Auto-sklearn: Efficient and Robust Automated Machine Learning. M. Feurer, A. Klevin, K. Eggensperger, J. T. Springenberg, M. Blum, and F. Hutter. 2019. Auto-sklearn: Efficient and Robust Automated Machine Learning."},{"key":"e_1_2_1_17_1","volume-title":"An open source AutoML benchmark. arXiv preprint arXiv:1907.00909","author":"Gijsbers Pieter","year":"2019","unstructured":"Pieter Gijsbers , Erin LeDell , Janek Thomas , S\u00e9bastien Poirier , Bernd Bischl , and Joaquin Vanschoren . 2019. An open source AutoML benchmark. arXiv preprint arXiv:1907.00909 ( 2019 ). Pieter Gijsbers, Erin LeDell, Janek Thomas, S\u00e9bastien Poirier, Bernd Bischl, and Joaquin Vanschoren. 2019. An open source AutoML benchmark. arXiv preprint arXiv:1907.00909 (2019)."},{"key":"e_1_2_1_18_1","unstructured":"Elliott Gordon-Rodriguez Gabriel Loaiza-Ganem Geoff Pleiss and John Patrick Cunningham. 2020. Uses and abuses of the cross-entropy loss: Case studies in modern deep learning. (2020).  Elliott Gordon-Rodriguez Gabriel Loaiza-Ganem Geoff Pleiss and John Patrick Cunningham. 2020. Uses and abuses of the cross-entropy loss: Case studies in modern deep learning. (2020)."},{"key":"e_1_2_1_19_1","volume-title":"2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 171--180","author":"Gupta Suyog","year":"2016","unstructured":"Suyog Gupta , Wei Zhang , and Fei Wang . 2016 . Model accuracy and runtime tradeoff in distributed deep learning: A systematic study . In 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 171--180 . Suyog Gupta, Wei Zhang, and Fei Wang. 2016. Model accuracy and runtime tradeoff in distributed deep learning: A systematic study. In 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 171--180."},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","first-page":"106622","DOI":"10.1016\/j.knosys.2020.106622","article-title":"AutoML: A Survey of the State-of-the-Art","volume":"212","author":"He Xin","year":"2021","unstructured":"Xin He , Kaiyong Zhao , and Xiaowen Chu . 2021 . AutoML: A Survey of the State-of-the-Art . Knowledge-Based Systems 212 (2021), 106622 . Xin He, Kaiyong Zhao, and Xiaowen Chu. 2021. AutoML: A Survey of the State-of-the-Art. Knowledge-Based Systems 212 (2021), 106622.","journal-title":"Knowledge-Based Systems"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2103--2113","author":"Heffetz Yuval","year":"2020","unstructured":"Yuval Heffetz , Roman Vainshtein , Gilad Katz , and Lior Rokach . 2020 . Deepline: Automl tool for pipelines generation using deep reinforcement learning and hierarchical actions filtering . In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2103--2113 . Yuval Heffetz, Roman Vainshtein, Gilad Katz, and Lior Rokach. 2020. Deepline: Automl tool for pipelines generation using deep reinforcement learning and hierarchical actions filtering. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2103--2113."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551804"},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1038\/scientificamerican0792-66","article-title":"Genetic Algorithms","volume":"267","author":"Holland J. H.","year":"1992","unstructured":"J. H. Holland . 1992 . Genetic Algorithms . Scientific American 267 , 1 (1992), 66 -- 73 . J. H. Holland. 1992. Genetic Algorithms. Scientific American 267, 1 (1992), 66--73.","journal-title":"Scientific American"},{"key":"e_1_2_1_24_1","volume-title":"International conference on learning and intelligent optimization. Springer, 507--523","author":"Hutter Frank","year":"2011","unstructured":"Frank Hutter , Holger H Hoos , and Kevin Leyton-Brown . 2011 . Sequential model-based optimization for general algorithm configuration . In International conference on learning and intelligent optimization. Springer, 507--523 . Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. 2011. Sequential model-based optimization for general algorithm configuration. In International conference on learning and intelligent optimization. Springer, 507--523."},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3470918","article-title":"AutoML to Date and Beyond: Challenges and Opportunities","volume":"54","author":"Karmaker Shubhra Kanti","year":"2021","unstructured":"Shubhra Kanti Karmaker , Md Mahadi Hassan , Micah J Smith , Lei Xu , Chengxiang Zhai , and Kalyan Veeramachaneni . 2021 . AutoML to Date and Beyond: Challenges and Opportunities . ACM Computing Surveys (CSUR) 54 , 8 (2021), 1 -- 36 . Shubhra Kanti Karmaker, Md Mahadi Hassan, Micah J Smith, Lei Xu, Chengxiang Zhai, and Kalyan Veeramachaneni. 2021. AutoML to Date and Beyond: Challenges and Opportunities. ACM Computing Surveys (CSUR) 54, 8 (2021), 1--36.","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 33rd annual ACM conference on human factors in computing systems. 347--356","author":"Kay Matthew","year":"2015","unstructured":"Matthew Kay , Shwetak N Patel , and Julie A Kientz . 2015 . How good is 85%? A survey tool to connect classifier evaluation to acceptability of accuracy . In Proceedings of the 33rd annual ACM conference on human factors in computing systems. 347--356 . Matthew Kay, Shwetak N Patel, and Julie A Kientz. 2015. How good is 85%? A survey tool to connect classifier evaluation to acceptability of accuracy. In Proceedings of the 33rd annual ACM conference on human factors in computing systems. 347--356."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"35","author":"Killamsetty Krishnateja","year":"2021","unstructured":"Krishnateja Killamsetty , Durga Sivasubramanian , Ganesh Ramakrishnan , and Rishabh Iyer . 2021 . Glister: Generalization based data subset selection for efficient and robust learning . In Proceedings of the AAAI Conference on Artificial Intelligence , Vol. 35 . 8110--8118. Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer. 2021. Glister: Generalization based data subset selection for efficient and robust learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 8110--8118."},{"key":"e_1_2_1_28_1","volume-title":"Young-duck Choi, Yongseok Choi, Dong-Yeon Cho, and Jiwon Kim.","author":"Kim Jaehong","year":"2018","unstructured":"Jaehong Kim , Sangyeul Lee , Sungwan Kim , Moonsu Cha , Jung Kwon Lee , Young-duck Choi, Yongseok Choi, Dong-Yeon Cho, and Jiwon Kim. 2018 . Auto-meta : Automated gradient based meta learner search. arXiv preprint arXiv:1806.06927 (2018). Jaehong Kim, Sangyeul Lee, Sungwan Kim, Moonsu Cha, Jung Kwon Lee, Young-duck Choi, Yongseok Choi, Dong-Yeon Cho, and Jiwon Kim. 2018. Auto-meta: Automated gradient based meta learner search. arXiv preprint arXiv:1806.06927 (2018)."},{"key":"e_1_2_1_29_1","volume-title":"Estimating mutual information. Physical review E 69, 6","author":"Kraskov Alexander","year":"2004","unstructured":"Alexander Kraskov , Harald St\u00f6gbauer , and Peter Grassberger . 2004. Estimating mutual information. Physical review E 69, 6 ( 2004 ), 066138. Alexander Kraskov, Harald St\u00f6gbauer, and Peter Grassberger. 2004. Estimating mutual information. Physical review E 69, 6 (2004), 066138."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the AutoML Workshop at ICML","volume":"2020","author":"LeDell Erin","year":"2020","unstructured":"Erin LeDell and Sebastien Poirier . 2020 . H2o automl: Scalable automatic machine learning . In Proceedings of the AutoML Workshop at ICML , Vol. 2020 . Erin LeDell and Sebastien Poirier. 2020. H2o automl: Scalable automatic machine learning. In Proceedings of the AutoML Workshop at ICML, Vol. 2020."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 7th annual conference on Genetic and evolutionary computation. 771--778","author":"Li Mian","year":"2005","unstructured":"Mian Li , Shapour Azarm , and Vikrant Aute . 2005 . A multi-objective genetic algorithm for robust design optimization . In Proceedings of the 7th annual conference on Genetic and evolutionary computation. 771--778 . Mian Li, Shapour Azarm, and Vikrant Aute. 2005. A multi-objective genetic algorithm for robust design optimization. In Proceedings of the 7th annual conference on Genetic and evolutionary computation. 771--778."},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1145\/3187009.3177737","article-title":"Ease. ml: Towards multi-tenant resource sharing for machine learning workloads","volume":"11","author":"Li Tian","year":"2018","unstructured":"Tian Li , Jie Zhong , Ji Liu , Wentao Wu , and Ce Zhang . 2018 . Ease. ml: Towards multi-tenant resource sharing for machine learning workloads . Proceedings of the VLDB Endowment 11 , 5 (2018), 607 -- 620 . Tian Li, Jie Zhong, Ji Liu, Wentao Wu, and Ce Zhang. 2018. Ease. ml: Towards multi-tenant resource sharing for machine learning workloads. Proceedings of the VLDB Endowment 11, 5 (2018), 607--620.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_33_1","volume-title":"VolcanoML: speeding up end-to-end AutoML via scalable search space decomposition. The VLDB Journal","author":"Li Yang","year":"2022","unstructured":"Yang Li , Yu Shen , Wentao Zhang , Ce Zhang , and Bin Cui . 2022. VolcanoML: speeding up end-to-end AutoML via scalable search space decomposition. The VLDB Journal ( 2022 ), 1--25. Yang Li, Yu Shen, Wentao Zhang, Ce Zhang, and Bin Cui. 2022. VolcanoML: speeding up end-to-end AutoML via scalable search space decomposition. The VLDB Journal (2022), 1--25."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 731--737","author":"Liberty Edo","year":"2020","unstructured":"Edo Liberty , Zohar Karnin , Bing Xiang , Laurence Rouesnel , Baris Coskun , Ramesh Nallapati , Julio Delgado , Amir Sadoughi , Yury Astashonok , Piali Das , 2020 . Elastic machine learning algorithms in amazon sagemaker . In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 731--737 . Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado, Amir Sadoughi, Yury Astashonok, Piali Das, et al. 2020. Elastic machine learning algorithms in amazon sagemaker. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 731--737."},{"key":"e_1_2_1_35_1","volume-title":"The global k-means clustering algorithm. Pattern recognition 36, 2","author":"Likas Aristidis","year":"2003","unstructured":"Aristidis Likas , Nikos Vlassis , and Jakob J Verbeek . 2003. The global k-means clustering algorithm. Pattern recognition 36, 2 ( 2003 ), 451--461. Aristidis Likas, Nikos Vlassis, and Jakob J Verbeek. 2003. The global k-means clustering algorithm. Pattern recognition 36, 2 (2003), 451--461."},{"key":"e_1_2_1_36_1","volume-title":"International Conference on Machine Learning. PMLR, 6950--6960","author":"Mirzasoleiman Baharan","year":"2020","unstructured":"Baharan Mirzasoleiman , Jeff Bilmes , and Jure Leskovec . 2020 . Coresets for data-efficient training of machine learning models . In International Conference on Machine Learning. PMLR, 6950--6960 . Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. 2020. Coresets for data-efficient training of machine learning models. In International Conference on Machine Learning. PMLR, 6950--6960."},{"key":"e_1_2_1_37_1","first-page":"11465","article-title":"Coresets for robust training of deep neural networks against noisy labels","volume":"33","author":"Mirzasoleiman Baharan","year":"2020","unstructured":"Baharan Mirzasoleiman , Kaidi Cao , and Jure Leskovec . 2020 . Coresets for robust training of deep neural networks against noisy labels . Advances in Neural Information Processing Systems 33 (2020), 11465 -- 11477 . Baharan Mirzasoleiman, Kaidi Cao, and Jure Leskovec. 2020. Coresets for robust training of deep neural networks against noisy labels. Advances in Neural Information Processing Systems 33 (2020), 11465--11477.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_38_1","doi-asserted-by":"crossref","first-page":"2751","DOI":"10.14778\/3476311.3476336","article-title":"Assassin: an automatic classification system based on algorithm selection","volume":"14","author":"Mu Tianyu","year":"2021","unstructured":"Tianyu Mu , Hongzhi Wang , Shenghe Zheng , Shaoqing Zhang , Cheng Liang , and Haoyun Tang . 2021 . Assassin: an automatic classification system based on algorithm selection . Proceedings of the VLDB Endowment 14 , 12 (2021), 2751 -- 2754 . Tianyu Mu, Hongzhi Wang, Shenghe Zheng, Shaoqing Zhang, Cheng Liang, and Haoyun Tang. 2021. Assassin: an automatic classification system based on algorithm selection. Proceedings of the VLDB Endowment 14, 12 (2021), 2751--2754.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the genetic and evolutionary computation conference","author":"Olson Randal S","year":"2016","unstructured":"Randal S Olson , Nathan Bartley , Ryan J Urbanowicz , and Jason H Moore . 2016 . Evaluation of a tree-based pipeline optimization tool for automating data science . In Proceedings of the genetic and evolutionary computation conference 2016. 485--492. Randal S Olson, Nathan Bartley, Ryan J Urbanowicz, and Jason H Moore. 2016. Evaluation of a tree-based pipeline optimization tool for automating data science. In Proceedings of the genetic and evolutionary computation conference 2016. 485--492."},{"key":"e_1_2_1_40_1","volume-title":"Workshop on automatic machine learning. PMLR, 66--74","author":"Olson Randal S","year":"2016","unstructured":"Randal S Olson and Jason H Moore . 2016 . TPOT: A tree-based pipeline optimization tool for automating machine learning . In Workshop on automatic machine learning. PMLR, 66--74 . Randal S Olson and Jason H Moore. 2016. TPOT: A tree-based pipeline optimization tool for automating machine learning. In Workshop on automatic machine learning. PMLR, 66--74."},{"key":"e_1_2_1_41_1","volume-title":"TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning. In JMLR: Workshop and Conference Proceedings","volume":"64","author":"Olson R. S.","unstructured":"R. S. Olson and J. H. Moore . 2016 . TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning. In JMLR: Workshop and Conference Proceedings , Vol. 64 . 66--74. R. S. Olson and J. H. Moore. 2016. TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning. In JMLR: Workshop and Conference Proceedings, Vol. 64. 66--74."},{"key":"e_1_2_1_42_1","unstructured":"OpenML. 2022. https:\/\/www.openml.org\/.  OpenML. 2022. https:\/\/www.openml.org\/."},{"key":"e_1_2_1_43_1","volume-title":"2016 IEEE 32nd International Conference on Data Engineering (ICDE). 755--766","author":"Park Y.","year":"2016","unstructured":"Y. Park , M. Cafarella , and B. Mozafari . 2016. Visualization-aware sampling for very large databases . In 2016 IEEE 32nd International Conference on Data Engineering (ICDE). 755--766 . 10.1109\/ICDE. 2016 .7498287 Y. Park, M. Cafarella, and B. Mozafari. 2016. Visualization-aware sampling for very large databases. In 2016 IEEE 32nd International Conference on Data Engineering (ICDE). 755--766. 10.1109\/ICDE.2016.7498287"},{"key":"e_1_2_1_44_1","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa F.","year":"2011","unstructured":"F. Pedregosa , G. Varoquaux , A. Gramfort , V. Michel , B. Thirion , O. Grisel , M. Blondel , P. Prettenhofer , R. Weiss , V. Dubourg , J. Vanderplas , A. Passos , D. Cournapeau , M. Brucher , M. Perrot , and E. Duchesnay . 2011 . Scikit-learn: Machine Learning in Python . Journal of Machine Learning Research 12 (2011), 2825 -- 2830 . F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825--2830.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_45_1","unstructured":"SubStrat Github Repository. 2022. https:\/\/github.com\/teddy4445\/SubStrat.  SubStrat Github Repository. 2022. https:\/\/github.com\/teddy4445\/SubStrat."},{"key":"e_1_2_1_46_1","unstructured":"UCI Machine Learning Repository. 2022. https:\/\/archive.ics.uci.edu\/.  UCI Machine Learning Repository. 2022. https:\/\/archive.ics.uci.edu\/."},{"key":"e_1_2_1_47_1","volume-title":"2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE","author":"Wang Chunnan","year":"2020","unstructured":"Chunnan Wang , Hongzhi Wang , Tianyu Mu , Jianzhong Li , and Hong Gao . 2020 . Auto-model: utilizing research papers and HPO techniques to deal with the cash problem . In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE , 1906--1909. Chunnan Wang, Hongzhi Wang, Tianyu Mu, Jianzhong Li, and Hong Gao. 2020. Auto-model: utilizing research papers and HPO techniques to deal with the cash problem. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1906--1909."},{"key":"e_1_2_1_48_1","first-page":"434","article-title":"FLAML: a fast and lightweight AutoML Library","volume":"3","author":"Wang Chi","year":"2021","unstructured":"Chi Wang , Qingyun Wu , Markus Weimer , and Erkang Zhu . 2021 . FLAML: a fast and lightweight AutoML Library . Proceedings of Machine Learning and Systems 3 (2021), 434 -- 447 . Chi Wang, Qingyun Wu, Markus Weimer, and Erkang Zhu. 2021. FLAML: a fast and lightweight AutoML Library. Proceedings of Machine Learning and Systems 3 (2021), 434--447.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","first-page":"101822","DOI":"10.1016\/j.artmed.2020.101822","article-title":"Automated machine learning: Review of the state-of-the-art and opportunities for healthcare","volume":"104","author":"Waring Jonathan","year":"2020","unstructured":"Jonathan Waring , Charlotta Lindvall , and Renato Umeton . 2020 . Automated machine learning: Review of the state-of-the-art and opportunities for healthcare . Artificial Intelligence in Medicine 104 (2020), 101822 . Jonathan Waring, Charlotta Lindvall, and Renato Umeton. 2020. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare. Artificial Intelligence in Medicine 104 (2020), 101822.","journal-title":"Artificial Intelligence in Medicine"},{"key":"e_1_2_1_50_1","unstructured":"Kaggle Website. 2022. https:\/\/github.com\/teddy4445\/SubStrat.  Kaggle Website. 2022. https:\/\/github.com\/teddy4445\/SubStrat."},{"key":"e_1_2_1_51_1","first-page":"1","article-title":"Selecting a representative decision tree from an ensemble of decision-tree models for fast big data classification","volume":"6","author":"Weinberg Abraham Itzhak","year":"2019","unstructured":"Abraham Itzhak Weinberg and Mark Last . 2019 . Selecting a representative decision tree from an ensemble of decision-tree models for fast big data classification . Journal of Big Data 6 , 1 (2019), 1 -- 17 . Abraham Itzhak Weinberg and Mark Last. 2019. Selecting a representative decision tree from an ensemble of decision-tree models for fast big data classification. Journal of Big Data 6, 1 (2019), 1--17.","journal-title":"Journal of Big Data"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2004.1380157"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"35","author":"Wu Qingyun","year":"2021","unstructured":"Qingyun Wu , Chi Wang , and Silu Huang . 2021 . Frugal optimization for cost-related hyperparameters . In Proceedings of the AAAI Conference on Artificial Intelligence , Vol. 35 . 10347--10354. Qingyun Wu, Chi Wang, and Silu Huang. 2021. Frugal optimization for cost-related hyperparameters. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 10347--10354."},{"key":"e_1_2_1_54_1","first-page":"3166","article-title":"Oracle automl: a fast and predictive automl pipeline","volume":"13","author":"Yakovlev Anatoly","year":"2020","unstructured":"Anatoly Yakovlev , Hesam Fathi Moghadam , Ali Moharrer , Jingxiao Cai , Nikan Chavoshi , Venkatanathan Varadarajan , Sandeep R Agrawal , Sam Idicula , Tomas Karnagel , Sanjay Jinturkar , 2020 . Oracle automl: a fast and predictive automl pipeline . PVLDB 13 , 12 (2020), 3166 -- 3180 . Anatoly Yakovlev, Hesam Fathi Moghadam, Ali Moharrer, Jingxiao Cai, Nikan Chavoshi, Venkatanathan Varadarajan, Sandeep R Agrawal, Sam Idicula, Tomas Karnagel, Sanjay Jinturkar, et al. 2020. Oracle automl: a fast and predictive automl pipeline. PVLDB 13, 12 (2020), 3166--3180.","journal-title":"PVLDB"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3574245.3574261","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T23:14:41Z","timestamp":1677021281000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3574245.3574261"}},"subtitle":["A Subset-Based Optimization Strategy for Faster AutoML"],"short-title":[],"issued":{"date-parts":[[2022,12]]},"references-count":54,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["10.14778\/3574245.3574261"],"URL":"https:\/\/doi.org\/10.14778\/3574245.3574261","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,12]]},"assertion":[{"value":"2023-02-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}