{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T20:21:34Z","timestamp":1774729294254,"version":"3.50.1"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T00:00:00Z","timestamp":1643587200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMOD Rec."],"published-print":{"date-parts":[[2022,1,31]]},"abstract":"<jats:p>Similar to Open Data initiatives, data science as a community has launched initiatives for sharing not only data but entire pipelines, derivatives, artifacts, etc. (Open Data Science). However, the few efforts that exist focus on the technical part on how to facilitate sharing, conversion, etc. This vision paper goes a step further and proposes KEK, an open federated data science platform that does not only allow for sharing data science pipelines and their (meta)data but also provides methods for efficient search and, in the ideal case, even allows for combining and defining pipelines across platforms in a federated manner. In doing so, KEK addresses the so far neglected challenge of actually finding artifacts that are semantically related and that can be combined to achieve a certain goal.<\/jats:p>","DOI":"10.1145\/3516431.3516435","type":"journal-article","created":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T23:31:58Z","timestamp":1643671918000},"page":"16-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Federated Data Science to Break Down Silos [Vision]"],"prefix":"10.1145","volume":"50","author":[{"given":"Essam","family":"Mansour","sequence":"first","affiliation":[{"name":"Concordia University, Canada"}]},{"given":"Kavitha","family":"Srinivas","sequence":"additional","affiliation":[{"name":"IBM Research, USA"}]},{"given":"Katja","family":"Hose","sequence":"additional","affiliation":[{"name":"Aalborg University, Denmark"}]}],"member":"320","published-online":{"date-parts":[[2022,1,31]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"ISWC","author":"Abdallah H.","year":"2021","unstructured":"H. Abdallah , D. Nguyen , K. Nguyen , and E. Mansour . Demonstration of KGNet: a cognitive knowledge graph platform . In ISWC , 2021 . H. Abdallah, D. Nguyen, K. Nguyen, and E. Mansour. Demonstration of KGNet: a cognitive knowledge graph platform. In ISWC, 2021."},{"key":"e_1_2_1_2_1","volume-title":"A Toolkit for generating code knowledge graphs. CoRR, https:\/\/arxiv.org\/abs\/2002.09440","author":"Abdelaziz I.","year":"2020","unstructured":"I. Abdelaziz , J. Dolby , J. P. McCusker , and K. Srinivas . A Toolkit for generating code knowledge graphs. CoRR, https:\/\/arxiv.org\/abs\/2002.09440 , 2020 . I. Abdelaziz, J. Dolby, J. P. McCusker, and K. Srinivas. A Toolkit for generating code knowledge graphs. CoRR, https:\/\/arxiv.org\/abs\/2002.09440, 2020."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3186728.3164144"},{"key":"e_1_2_1_4_1","volume-title":"CIDR","author":"Brachmann M.","year":"2020","unstructured":"M. Brachmann , W. Spoth , O. Kennedy , B. Glavic , H. Mueller , S. Castelo , C. Bautista , and J. Freire . Your notebook is not crumby enough, replace it . In CIDR , 2020 . M. Brachmann, W. Spoth, O. Kennedy, B. Glavic, H. Mueller, S. Castelo, C. Bautista, and J. Freire. Your notebook is not crumby enough, replace it. In CIDR, 2020."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3360601"},{"key":"e_1_2_1_6_1","unstructured":"Canada Data Portal. https:\/\/open.canada.ca\/.  Canada Data Portal. https:\/\/open.canada.ca\/."},{"key":"e_1_2_1_7_1","volume-title":"ProGraML: Graph-based deep learning for program optimization and analysis. CoRR, https:\/\/arxiv.org\/abs\/2003.10536","author":"Cummins C.","year":"2020","unstructured":"C. Cummins , Z. V. Fisches , T. Ben-Nun , T. Hoefler , and H. Leather . ProGraML: Graph-based deep learning for program optimization and analysis. CoRR, https:\/\/arxiv.org\/abs\/2003.10536 , 2020 . C. Cummins, Z. V. Fisches, T. Ben-Nun, T. Hoefler, and H. Leather. ProGraML: Graph-based deep learning for program optimization and analysis. CoRR, https:\/\/arxiv.org\/abs\/2003.10536, 2020."},{"key":"e_1_2_1_8_1","unstructured":"DataLad. http:\/\/www.datalad.org.  DataLad. http:\/\/www.datalad.org."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314037"},{"key":"e_1_2_1_10_1","unstructured":"DVC. https:\/\/dvc.org.  DVC. https:\/\/dvc.org."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737817.2737829"},{"key":"e_1_2_1_12_1","volume-title":"GOOGLE CLOUD","author":"Li Fei-Fei","year":"2018","unstructured":"Fei-Fei Li and Jia Li. Cloud AutoML : Making AI accessible to every business . GOOGLE CLOUD , 2018 . Fei-Fei Li and Jia Li. Cloud AutoML: Making AI accessible to every business. GOOGLE CLOUD, 2018."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"e_1_2_1_14_1","volume-title":"ICDE","author":"Fernandez R. C.","year":"2018","unstructured":"R. C. Fernandez , Z. Abedjan , F. Koko , G. Yuan , S. Madden , and M. Stonebraker . Aurum: A data discovery system . In ICDE , 2018 . R. C. Fernandez, Z. Abedjan, F. Koko, G. Yuan, S. Madden, and M. Stonebraker. Aurum: A data discovery system. In ICDE, 2018."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407800"},{"key":"e_1_2_1_16_1","volume-title":"Auto-Sklearn 2.0: Hands-free AutoML via meta-learning. CoRR, https:\/\/arxiv.org\/abs\/2007.04074","author":"Feurer M.","year":"2020","unstructured":"M. Feurer , K. Eggensperger , S. Falkner , M. Lindauer , and F. Hutter . Auto-Sklearn 2.0: Hands-free AutoML via meta-learning. CoRR, https:\/\/arxiv.org\/abs\/2007.04074 , 2020 . M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter. Auto-Sklearn 2.0: Hands-free AutoML via meta-learning. CoRR, https:\/\/arxiv.org\/abs\/2007.04074, 2020."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.596"},{"key":"e_1_2_1_18_1","unstructured":"Git-lfs. https:\/\/git-lfs.github.com.  Git-lfs. https:\/\/git-lfs.github.com."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476317"},{"key":"e_1_2_1_20_1","volume-title":"A scalable AutoML approach based on graph neural networks. CoRR, https:\/\/arxiv.org\/abs\/2111.00083","author":"Helali M.","year":"2021","unstructured":"M. Helali , E. Mansour , I. Abdelaziz , J. Dolby , and K. Srinivas . A scalable AutoML approach based on graph neural networks. CoRR, https:\/\/arxiv.org\/abs\/2111.00083 , 2021 . M. Helali, E. Mansour, I. Abdelaziz, J. Dolby, and K. Srinivas. A scalable AutoML approach based on graph neural networks. CoRR, https:\/\/arxiv.org\/abs\/2111.00083, 2021."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2766634"},{"key":"e_1_2_1_22_1","volume-title":"A comparative study of similarity-based and GNN-based link prediction approaches. CoRR, https:\/\/arxiv.org\/abs\/2008.08879","author":"Islam M. K.","year":"2020","unstructured":"M. K. Islam , S. Aridhi , and M. Smail-Tabbone . A comparative study of similarity-based and GNN-based link prediction approaches. CoRR, https:\/\/arxiv.org\/abs\/2008.08879 , 2020 . M. K. Islam, S. Aridhi, and M. Smail-Tabbone. A comparative study of similarity-based and GNN-based link prediction approaches. CoRR, https:\/\/arxiv.org\/abs\/2008.08879, 2020."},{"key":"e_1_2_1_23_1","unstructured":"Kaggle Portal. https:\/\/www.kaggle.com\/.  Kaggle Portal. https:\/\/www.kaggle.com\/."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2015.7363784"},{"key":"e_1_2_1_25_1","volume-title":"Leveraging abstract meaning representation for knowledge base question answering. CoRR, https:\/\/arxiv.org\/abs\/2012.01707","author":"Kapanipathi P.","year":"2020","unstructured":"P. Kapanipathi , I. Abdelaziz , S. Ravishankar , and Leveraging abstract meaning representation for knowledge base question answering. CoRR, https:\/\/arxiv.org\/abs\/2012.01707 , 2020 . P. Kapanipathi, I. Abdelaziz, S. Ravishankar, and et al. Leveraging abstract meaning representation for knowledge base question answering. CoRR, https:\/\/arxiv.org\/abs\/2012.01707, 2020."},{"key":"e_1_2_1_26_1","volume-title":"Semantic annotation for tabular data. CoRR, https:\/\/arxiv.org\/abs\/2012.08594","author":"Khurana U.","year":"2020","unstructured":"U. Khurana and S. Galhotra . Semantic annotation for tabular data. CoRR, https:\/\/arxiv.org\/abs\/2012.08594 , 2020 . U. Khurana and S. Galhotra. Semantic annotation for tabular data. CoRR, https:\/\/arxiv.org\/abs\/2012.08594, 2020."},{"key":"e_1_2_1_27_1","volume-title":"CIDR","author":"Kumar A.","year":"2021","unstructured":"A. Kumar , S. Nakandala , Y. Zhang , S. Li , A. Gemawat , and KabirNagrecha. Cerebro : A layered data platform for scalable deep learning . CIDR , 2021 . A. Kumar, S. Nakandala, Y. Zhang, S. Li, A. Gemawat, and KabirNagrecha. Cerebro: A layered data platform for scalable deep learning. CIDR, 2021."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461553"},{"key":"e_1_2_1_29_1","unstructured":"Y. Li O. Vinyals C. Dyer R. Pascanu and P. W. Battaglia. Learning deep generative models of graphs. CoRR http:\/\/arxiv.org\/abs\/1803.03324 2018.  Y. Li O. Vinyals C. Dyer R. Pascanu and P. W. Battaglia. Learning deep generative models of graphs. CoRR http:\/\/arxiv.org\/abs\/1803.03324 2018."},{"key":"e_1_2_1_30_1","volume-title":"WebConf","author":"Noy N.","year":"2019","unstructured":"N. Noy , M. Burgess , and D. Brickley . Google dataset search: Building a search engine for datasets in an open web ecosystem . In WebConf , 2019 . N. Noy, M. Burgess, and D. Brickley. Google dataset search: Building a search engine for datasets in an open web ecosystem. In WebConf, 2019."},{"key":"e_1_2_1_31_1","volume-title":"ISWC","author":"Omar R.","year":"2021","unstructured":"R. Omar , I. Dhall , N. Sheikh , and E. Mansour . A Knowledge Graph Question-Answering Platform Trained Independently of the Graph . In ISWC , 2021 . R. Omar, I. Dhall, N. Sheikh, and E. Mansour. A Knowledge Graph Question-Answering Platform Trained Independently of the Graph. In ISWC, 2021."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476364"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137789"},{"key":"e_1_2_1_34_1","volume-title":"Project CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks. CoRR, https:\/\/arxiv.org\/abs\/2105.12655","author":"Puri R.","year":"2021","unstructured":"R. Puri , D. S. Kung , G. Janssen , W. Zhang , G. Domeniconi , V. Zolotov , J. Dolby , J. Chen , M. R. Choudhury , L. Decker , V. Thost , L. Buratti , S. Pujar , and U. Finkler . Project CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks. CoRR, https:\/\/arxiv.org\/abs\/2105.12655 , 2021 . R. Puri, D. S. Kung, G. Janssen, W. Zhang, G. Domeniconi, V. Zolotov, J. Dolby, J. Chen, M. R. Choudhury, L. Decker, V. Thost, L. Buratti, S. Pujar, and U. Finkler. Project CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks. CoRR, https:\/\/arxiv.org\/abs\/2105.12655, 2021."},{"key":"e_1_2_1_35_1","unstructured":"QRI. https:\/\/qri.io.  QRI. https:\/\/qri.io."},{"key":"e_1_2_1_36_1","unstructured":"Quilt. https:\/\/github.com\/quiltdata\/quilt.  Quilt. https:\/\/github.com\/quiltdata\/quilt."},{"key":"e_1_2_1_37_1","volume-title":"NeurIPS","author":"Rozi\u00e8re B.","year":"2020","unstructured":"B. Rozi\u00e8re , M. Lachaux , L. Chanussot , and G. Lample . Unsupervised translation of programming languages . In NeurIPS , 2020 . B. Rozi\u00e8re, M. Lachaux, L. Chanussot, and G. Lample. Unsupervised translation of programming languages. In NeurIPS, 2020."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415556"},{"key":"e_1_2_1_39_1","volume-title":"Agora: Bringing together datasets, algorithms, models and more in a unified ecosystem [vision]. SIGMOD Rec., 49(4)","author":"Traub J.","year":"2020","unstructured":"J. Traub , Z. Kaoudi , J. Quian\u00e9-Ruiz , and V. Markl . Agora: Bringing together datasets, algorithms, models and more in a unified ecosystem [vision]. SIGMOD Rec., 49(4) , 2020 . J. Traub, Z. Kaoudi, J. Quian\u00e9-Ruiz, and V. Markl. Agora: Bringing together datasets, algorithms, models and more in a unified ecosystem [vision]. SIGMOD Rec., 49(4), 2020."},{"key":"e_1_2_1_40_1","unstructured":"USA Data Portal. https:\/\/www.data.gov\/.  USA Data Portal. https:\/\/www.data.gov\/."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2641190.2641198"},{"issue":"4","key":"e_1_2_1_42_1","first-page":"16","article-title":"MODELDB: opportunities and challenges in managing machine learning models","volume":"41","author":"Vartak M.","year":"2018","unstructured":"M. Vartak and S. Madden . MODELDB: opportunities and challenges in managing machine learning models . IEEE Data Eng. Bull. , 41 ( 4 ): 16 -- 25 , 2018 . M. Vartak and S. Madden. MODELDB: opportunities and challenges in managing machine learning models. IEEE Data Eng. Bull., 41(4):16--25, 2018.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_43_1","volume-title":"MLSys","author":"Wang C.","year":"2020","unstructured":"C. Wang , Q. Wu , M. Weimer , and E. Zhu . FLAML: A fast and lightweight automl library . In MLSys , 2020 . C. Wang, Q. Wu, M. Weimer, and E. Zhu. FLAML: A fast and lightweight automl library. In MLSys, 2020."},{"key":"e_1_2_1_44_1","volume-title":"The FAIR guiding principles for scientific data management and stewardship. Scientific data, 3","author":"Wilkinson M. D.","year":"2016","unstructured":"M. D. Wilkinson , M. Dumontier , I. J. Aalbersberg , G. Appleton , M. Axton , A. Baak , N. Blomberg , J.-W. Boiten , L. B. da Silva Santos , P. E. Bourne , The FAIR guiding principles for scientific data management and stewardship. Scientific data, 3 , 2016 . M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific data, 3, 2016."},{"key":"e_1_2_1_45_1","unstructured":"World Health Organization data portal. https:\/\/www.who.int\/data\/gho.  World Health Organization data portal. https:\/\/www.who.int\/data\/gho."},{"key":"e_1_2_1_46_1","unstructured":"World Trade Organization data portal. https:\/\/data.wto.org\/.  World Trade Organization data portal. https:\/\/data.wto.org\/."},{"key":"e_1_2_1_47_1","article-title":"A comprehensive survey on graph neural networks. The","author":"Wu Z.","year":"2020","unstructured":"Z. Wu , S. Pan , F. Chen , and . A comprehensive survey on graph neural networks. The IEEE Transactions on Neural Networks and Learning Systems, pages 1--21 , 2020 . Z. Wu, S. Pan, F. Chen, and et al. A comprehensive survey on graph neural networks. The IEEE Transactions on Neural Networks and Learning Systems, pages 1--21, 2020.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems, pages 1--21"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389738"},{"issue":"4","key":"e_1_2_1_49_1","first-page":"39","article-title":"Accelerating the machine learning lifecycle with mlflow","volume":"41","author":"Zaharia M.","year":"2018","unstructured":"M. Zaharia , A. Chen , A. Davidson , A. Ghodsi , S. A. Hong , A. Konwinski , S. Murching , T. Nykodym , P. Ogilvie , M. Parkhe , F. Xie , and C. Zumar . Accelerating the machine learning lifecycle with mlflow . The IEEE Data Engineering Bulletin , 41 ( 4 ): 39 -- 45 , 2018 . M. Zaharia, A. Chen, A. Davidson, A. Ghodsi, S. A. Hong, A. Konwinski, S. Murching, T. Nykodym, P. Ogilvie, M. Parkhe, F. Xie, and C. Zumar. Accelerating the machine learning lifecycle with mlflow. The IEEE Data Engineering Bulletin, 41(4):39--45, 2018.","journal-title":"The IEEE Data Engineering Bulletin"},{"key":"e_1_2_1_50_1","volume-title":"Labeling trick: A theory of using graph neural networks for multi-node representation learning. CoRR, https:\/\/arxiv.org\/abs\/2010.16103","author":"Zhang M.","year":"2020","unstructured":"M. Zhang , P. Li , Y. Xia , K. Wang , and L. Jin . Labeling trick: A theory of using graph neural networks for multi-node representation learning. CoRR, https:\/\/arxiv.org\/abs\/2010.16103 , 2020 . M. Zhang, P. Li, Y. Xia, K. Wang, and L. Jin. Labeling trick: A theory of using graph neural networks for multi-node representation learning. CoRR, https:\/\/arxiv.org\/abs\/2010.16103, 2020."}],"container-title":["ACM SIGMOD Record"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3516431.3516435","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3516431.3516435","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:21Z","timestamp":1750188621000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3516431.3516435"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,31]]},"references-count":50,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,1,31]]}},"alternative-id":["10.1145\/3516431.3516435"],"URL":"https:\/\/doi.org\/10.1145\/3516431.3516435","relation":{},"ISSN":["0163-5808"],"issn-type":[{"value":"0163-5808","type":"print"}],"subject":[],"published":{"date-parts":[[2022,1,31]]},"assertion":[{"value":"2022-01-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}