{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T22:34:05Z","timestamp":1780439645405,"version":"3.54.1"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>AutoML systems build machine learning models automatically by performing a search over valid data transformations and learners, along with hyper-parameter optimization for each learner. Many AutoML systems use meta-learning to guide search for optimal pipelines. In this work, we present a novel meta-learning system called KGpip which (1) builds a database of datasets and corresponding pipelines by mining thousands of scripts with program analysis, (2) uses dataset embeddings to find similar datasets in the database based on its content instead of metadata-based features, (3) models AutoML pipeline creation as a graph generation problem, to succinctly characterize the diverse pipelines seen for a single dataset. KGpip's meta-learning is a sub-component for AutoML systems. We demonstrate this by integrating KGpip with two AutoML systems. Our comprehensive evaluation using 121 datasets, including those used by the state-of-the-art systems, shows that KGpip significantly outperforms these systems.<\/jats:p>","DOI":"10.14778\/3551793.3551804","type":"journal-article","created":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T22:25:03Z","timestamp":1664490303000},"page":"2428-2436","source":"Crossref","is-referenced-by-count":12,"title":["A scalable AutoML approach based on graph neural networks"],"prefix":"10.14778","volume":"15","author":[{"given":"Mossad","family":"Helali","sequence":"first","affiliation":[{"name":"Concordia University, Montreal, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Essam","family":"Mansour","sequence":"additional","affiliation":[{"name":"Concordia University, Montreal, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ibrahim","family":"Abdelaziz","sequence":"additional","affiliation":[{"name":"IBM T.J. Watson Research Center"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Julian","family":"Dolby","sequence":"additional","affiliation":[{"name":"IBM T.J. Watson Research Center"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kavitha","family":"Srinivas","sequence":"additional","affiliation":[{"name":"IBM T.J. Watson Research Center"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,9,29]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"A Toolkit for Generating Code Knowledge Graphs. ArXiv","author":"Abdelaziz Ibrahim","year":"2020","unstructured":"Ibrahim Abdelaziz , Julian Dolby , James P. McCusker , and Kavitha Srinivas . 2020. A Toolkit for Generating Code Knowledge Graphs. ArXiv ( 2020 ). https:\/\/arxiv.org\/abs\/2002.09440 Ibrahim Abdelaziz, Julian Dolby, James P. McCusker, and Kavitha Srinivas. 2020. A Toolkit for Generating Code Knowledge Graphs. ArXiv (2020). https:\/\/arxiv.org\/abs\/2002.09440"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3360601"},{"key":"e_1_2_1_3_1","volume-title":"Universal Sentence Encoder. CoRR abs\/1803.11175","author":"Cer Daniel","year":"2018","unstructured":"Daniel Cer , Yinfei Yang , Sheng-yi Kong, Nan Hua , Nicole Limtiaco , Rhomni St. John , Noah Constant , Mario Guajardo-Cespedes , Steve Yuan , Chris Tar , Yun-Hsuan Sung , Brian Strope , and Ray Kurzweil . 2018. Universal Sentence Encoder. CoRR abs\/1803.11175 ( 2018 ). http:\/\/arxiv.org\/abs\/1803.11175 Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. Universal Sentence Encoder. CoRR abs\/1803.11175 (2018). http:\/\/arxiv.org\/abs\/1803.11175"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-55696-3_16"},{"key":"e_1_2_1_5_1","volume-title":"Juliana Freire, and Madeleine Udell.","author":"Drori Iddo","year":"2019","unstructured":"Iddo Drori , Lu Liu , Yi Nian , Sharath C Koorathota , Jung-Shian Li , Antonio Khalil Moretti , Juliana Freire, and Madeleine Udell. 2019 . AutoML using Metadata Language Embeddings. ArXiv ( 2019). https:\/\/arxiv.org\/abs\/1910.03698 Iddo Drori, Lu Liu, Yi Nian, Sharath C Koorathota, Jung-Shian Li, Antonio Khalil Moretti, Juliana Freire, and Madeleine Udell. 2019. AutoML using Metadata Language Embeddings. ArXiv (2019). https:\/\/arxiv.org\/abs\/1910.03698"},{"key":"e_1_2_1_6_1","volume-title":"In Proceedings of the European Conference on Artificial Intelligence (ECAI). 430--434","author":"Engels Robert","year":"1998","unstructured":"Robert Engels and Christiane Theusinger . 1998 . Using a Data Metric for Preprocessing Advice for Data Mining Applications . In In Proceedings of the European Conference on Artificial Intelligence (ECAI). 430--434 . http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.56.7414&rep=rep1&type=pdf Robert Engels and Christiane Theusinger. 1998. Using a Data Metric for Preprocessing Advice for Data Mining Applications. In In Proceedings of the European Conference on Artificial Intelligence (ECAI). 430--434. http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.56.7414&rep=rep1&type=pdf"},{"key":"e_1_2_1_7_1","volume-title":"AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. ArXiv","author":"Erickson Nick","year":"2020","unstructured":"Nick Erickson , Jonas Mueller , Alexander Shirkov , Hang Zhang , Pedro Larroy , Mu Li , and Alexander Smola . 2020. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. ArXiv ( 2020 ). https:\/\/arxiv.org\/abs\/2003.06505 Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. 2020. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. ArXiv (2020). https:\/\/arxiv.org\/abs\/2003.06505"},{"key":"e_1_2_1_8_1","volume-title":"The Next Generation. arXiv","author":"Feurer Matthias","year":"2020","unstructured":"Matthias Feurer , Katharina Eggensperger , Stefan Falkner , Marius Lindauer , and Frank Hutter . 2020. Auto-Sklearn 2.0 : The Next Generation. arXiv ( 2020 ). https:\/\/arxiv.org\/abs\/2007.04074 Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter. 2020. Auto-Sklearn 2.0: The Next Generation. arXiv (2020). https:\/\/arxiv.org\/abs\/2007.04074"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969442.2969547"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327144.3327254"},{"key":"e_1_2_1_11_1","volume-title":"An Open Source AutoML Benchmark. In AutoML Workshop at the International Conference on Machine Learning (ICML). https:\/\/arxiv.org\/abs\/1907","author":"Gijsbers P.","unstructured":"P. Gijsbers , E. LeDell , S. Poirier , J. Thomas , B. Bischl , and J. Vanschoren . 2019 . An Open Source AutoML Benchmark. In AutoML Workshop at the International Conference on Machine Learning (ICML). https:\/\/arxiv.org\/abs\/1907 .00909 P. Gijsbers, E. LeDell, S. Poirier, J. Thomas, B. Bischl, and J. Vanschoren. 2019. An Open Source AutoML Benchmark. In AutoML Workshop at the International Conference on Machine Learning (ICML). https:\/\/arxiv.org\/abs\/1907.00909"},{"key":"e_1_2_1_12_1","first-page":"21","article-title":"A brief Review of the ChaLearn AutoML Challenge: Any-time Any-dataset Learning without Human Intervention","volume":"64","author":"Guyon Isabelle","year":"2016","unstructured":"Isabelle Guyon , Imad Chaabane , Hugo Jair Escalante , Sergio Escalera , Damir Jajetic , James Robert Lloyd , N\u00faria Maci\u00e0 , Bisakha Ray , Lukasz Romaszko , Mich\u00e8le Sebag , Alexander Statnikov , S\u00e9bastien Treguer , and Evelyne Viegas . 2016 . A brief Review of the ChaLearn AutoML Challenge: Any-time Any-dataset Learning without Human Intervention . In Proceedings of Machine Learning Research , Vol. 64. 21 -- 30 . https:\/\/proceedings.mlr.press\/v64\/guyon_review_2016.html Isabelle Guyon, Imad Chaabane, Hugo Jair Escalante, Sergio Escalera, Damir Jajetic, James Robert Lloyd, N\u00faria Maci\u00e0, Bisakha Ray, Lukasz Romaszko, Mich\u00e8le Sebag, Alexander Statnikov, S\u00e9bastien Treguer, and Evelyne Viegas. 2016. A brief Review of the ChaLearn AutoML Challenge: Any-time Any-dataset Learning without Human Intervention. In Proceedings of Machine Learning Research, Vol. 64. 21--30. https:\/\/proceedings.mlr.press\/v64\/guyon_review_2016.html","journal-title":"Proceedings of Machine Learning Research"},{"key":"e_1_2_1_13_1","volume-title":"A Scalable AutoML Approach Based on Graph Neural Networks. ArXiv","author":"Helali Mossad","year":"2022","unstructured":"Mossad Helali , Essam Mansour , Ibrahim Abdelaziz , Julian Dolby , and Kavitha Srinivas . 2022. A Scalable AutoML Approach Based on Graph Neural Networks. ArXiv ( 2022 ). https:\/\/arxiv.org\/abs\/2111.00083 Mossad Helali, Essam Mansour, Ibrahim Abdelaziz, Julian Dolby, and Kavitha Srinivas. 2022. A Scalable AutoML Approach Based on Graph Neural Networks. ArXiv (2022). https:\/\/arxiv.org\/abs\/2111.00083"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1609\/icaps.v30i1.6686"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btz470"},{"key":"e_1_2_1_17_1","volume-title":"AutoML Workshop at the International Conference on Machine Learning (ICML). www.automl.org\/wp-content\/uploads\/2020\/07\/AutoML_2020_paper_61","author":"LeDell Erin","year":"2020","unstructured":"Erin LeDell and Sebastien Poirier . 2020 . H2O AutoML: Scalable Automatic Machine Learning . In AutoML Workshop at the International Conference on Machine Learning (ICML). www.automl.org\/wp-content\/uploads\/2020\/07\/AutoML_2020_paper_61 .pdf Erin LeDell and Sebastien Poirier. 2020. H2O AutoML: Scalable Automatic Machine Learning. In AutoML Workshop at the International Conference on Machine Learning (ICML). www.automl.org\/wp-content\/uploads\/2020\/07\/AutoML_2020_paper_61.pdf"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476270"},{"key":"e_1_2_1_19_1","volume-title":"Learning Deep Generative Models of Graphs. ArXiv","author":"Li Yujia","year":"2018","unstructured":"Yujia Li , Oriol Vinyals , Chris Dyer , Razvan Pascanu , and Peter Battaglia . 2018. Learning Deep Generative Models of Graphs. ArXiv ( 2018 ). https:\/\/arxiv.org\/abs\/1803.03324 Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. 2018. Learning Deep Generative Models of Graphs. ArXiv (2018). https:\/\/arxiv.org\/abs\/1803.03324"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5926"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i10.17077"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2019.00158"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13040-017-0154"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/645529.658105"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/457"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-012-5286-7"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999836"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487629"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2641190.2641198"},{"key":"e_1_2_1_30_1","volume-title":"International Journal of Computer Science & Applications 1","author":"Vilalta Ricardo","year":"2004","unstructured":"Ricardo Vilalta , Christophe Giraud-carrier, Pavel Brazdil , and Carlos Soares . 2004. Using Meta-Learning to Support Data Mining . International Journal of Computer Science & Applications 1 ( 2004 ). https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.105.1351&rep=rep1&type=pdf Ricardo Vilalta, Christophe Giraud-carrier, Pavel Brazdil, and Carlos Soares. 2004. Using Meta-Learning to Support Data Mining. International Journal of Computer Science & Applications 1 (2004). https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.105.1351&rep=rep1&type=pdf"},{"key":"e_1_2_1_31_1","first-page":"434","article-title":"FLAML: A Fast and Lightweight AutoML Library","volume":"3","author":"Wang Chi","year":"2021","unstructured":"Chi Wang , Qingyun Wu , Markus Weimer , and Erkang Zhu . 2021 . FLAML: A Fast and Lightweight AutoML Library . In Proceedings of Machine Learning and Systems (MLSys) , Vol. 3. 434 -- 447 . https:\/\/proceedings.mlsys.org\/paper\/2021\/file\/92cc227532d17e56e07902b254dfad10-Paper.pdf Chi Wang, Qingyun Wu, Markus Weimer, and Erkang Zhu. 2021. FLAML: A Fast and Lightweight AutoML Library. In Proceedings of Machine Learning and Systems (MLSys), Vol. 3. 434--447. https:\/\/proceedings.mlsys.org\/paper\/2021\/file\/92cc227532d17e56e07902b254dfad10-Paper.pdf","journal-title":"Proceedings of Machine Learning and Systems (MLSys)"},{"key":"e_1_2_1_32_1","volume-title":"ML-Plan for Unlimited-Length Machine Learning Pipelines. In AutoML Workshop at the International Conference on Machine Learning (ICML). https:\/\/ris.uni-paderborn.de\/download\/3852\/3853\/38","author":"Wever Marcel","year":"2018","unstructured":"Marcel Wever , Felix Mohr , and Eyke H\u00fcllermeier . 2018 . ML-Plan for Unlimited-Length Machine Learning Pipelines. In AutoML Workshop at the International Conference on Machine Learning (ICML). https:\/\/ris.uni-paderborn.de\/download\/3852\/3853\/38 .pdf Marcel Wever, Felix Mohr, and Eyke H\u00fcllermeier. 2018. ML-Plan for Unlimited-Length Machine Learning Pipelines. In AutoML Workshop at the International Conference on Machine Learning (ICML). https:\/\/ris.uni-paderborn.de\/download\/3852\/3853\/38.pdf"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415542"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330909"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3551793.3551804","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:25:21Z","timestamp":1672223121000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3551793.3551804"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":34,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["10.14778\/3551793.3551804"],"URL":"https:\/\/doi.org\/10.14778\/3551793.3551804","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,7]]}}}