{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,5]],"date-time":"2025-08-05T12:47:41Z","timestamp":1754398061893},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,8,26]],"date-time":"2021-08-26T00:00:00Z","timestamp":1629936000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,8,26]],"date-time":"2021-08-26T00:00:00Z","timestamp":1629936000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Qatar Computing Research Institute"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["The VLDB Journal"],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Knowledge graphs represented as RDF datasets are integral to many machine learning applications. RDF is supported by a rich ecosystem of data management systems and tools, most notably RDF database systems that provide a SPARQL query interface. Surprisingly, machine learning tools for knowledge graphs do not use SPARQL, despite the obvious advantages of using a database system. This is due to the mismatch between SPARQL and machine learning tools in terms of data model and programming style. Machine learning tools work on data in tabular format and process it using an imperative programming style, while SPARQL is declarative and has as its basic operation matching graph patterns to RDF triples. We posit that a good interface to knowledge graphs from a machine learning software stack should use an imperative, navigational programming paradigm based on graph traversal rather than the SPARQL query paradigm based on graph patterns. In this paper, we present RDFFrames, a framework that provides such an interface. 
RDFFrames provides an imperative Python API that gets internally translated to SPARQL, and it is integrated with the PyData machine learning software stack. RDFFrames enables the user to make a sequence of Python calls to define the data to be extracted from a knowledge graph stored in an RDF database system, and it translates these calls into a compact SPARQL query, executes it on the database system, and returns the results in a standard tabular format. Thus, RDFFrames is a useful tool for data preparation that combines the usability of PyData with the flexibility and performance of RDF database systems.<\/jats:p>","DOI":"10.1007\/s00778-021-00690-5","type":"journal-article","created":{"date-parts":[[2021,8,26]],"date-time":"2021-08-26T22:02:33Z","timestamp":1630015353000},"page":"321-346","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["RDFFrames: knowledge graph access for machine learning tools"],"prefix":"10.1007","volume":"31","author":[{"given":"Aisha","family":"Mohamed","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ghadeer","family":"Abuoda","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abdurrahman","family":"Ghanem","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zoi","family":"Kaoudi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ashraf","family":"Aboulnaga","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,8,26]]},"reference":[{"issue":"4","key":"690_CR1","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1145\/3385658.3385668","volume":"48","author":"D Abadi","year":"2019","unstructured":"Abadi, D., et al.: The Seattle report on database research. 
SIGMOD Rec. 48(4), 44\u201353 (2019)","journal-title":"SIGMOD Rec."},{"key":"690_CR2","doi-asserted-by":"crossref","unstructured":"Agrawal, P. et\u00a0al.: Data platform for machine learning. In: SIGMOD (2019)","DOI":"10.1145\/3299869.3314050"},{"key":"690_CR3","unstructured":"Ali, M. et\u00a0al.: PyKEEN 1.0: A python library for training and evaluating knowledge graph embeddings. arXiv preprint arXiv:2007.14175 (2020)"},{"key":"690_CR4","doi-asserted-by":"crossref","unstructured":"Angles, R. et\u00a0al.: G-CORE: a core for future graph query languages. In: SIGMOD (2018)","DOI":"10.1145\/3183713.3190654"},{"key":"690_CR5","doi-asserted-by":"crossref","unstructured":"Baylor, D. et\u00a0al.: TFX: a TensorFlow-based production-scale machine learning platform. In: SIGKDD (2017)","DOI":"10.1145\/3097983.3098021"},{"key":"690_CR6","doi-asserted-by":"publisher","first-page":"706","DOI":"10.1016\/j.jbi.2008.03.004","volume":"41","author":"F Belleau","year":"2008","unstructured":"Belleau, F., Nolin, M.A., Tourigny, N., Rigault, P., Morissette, J.: Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J. Biomed. Inf. 41, 706\u2013716 (2008)","journal-title":"J. Biomed. Inf."},{"key":"690_CR7","unstructured":"Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NIPS (2013)"},{"key":"690_CR8","unstructured":"Costabello, L., et\u00a0al.: AmpliGraph: a library for representation learning on knowledge graphs. https:\/\/doi.org\/10.5281\/zenodo.2595043 (2019)"},{"key":"690_CR9","doi-asserted-by":"publisher","DOI":"10.1002\/0471448354","volume-title":"Exploratory Data Mining and Data Cleaning","author":"T Dasu","year":"2003","unstructured":"Dasu, T., Johnson, T.: Exploratory Data Mining and Data Cleaning. 
Wiley, Hoboken (2003)"},{"key":"690_CR10","first-page":"17","volume":"14","author":"R Davis","year":"1993","unstructured":"Davis, R., Shrobe, H., Szolovits, P.: What is a knowledge representation? AI Mag. 14, 17\u201317 (1993)","journal-title":"AI Mag."},{"key":"690_CR11","doi-asserted-by":"crossref","unstructured":"Dettmers, T., Minervini, P., Stenetorp, P., Riedel, S.: Convolutional 2d knowledge graph embeddings. In: AAAI (2018)","DOI":"10.1609\/aaai.v32i1.11573"},{"key":"690_CR12","doi-asserted-by":"crossref","unstructured":"Doan A (2018) Human-in-the-loop data analysis: a personal perspective. In: Proc. Workshop on Human-In-the-Loop Data Analytics (HILDA)","DOI":"10.1145\/3209900.3209913"},{"key":"690_CR13","doi-asserted-by":"crossref","unstructured":"Dong, XL.: Challenges and innovations in building a product knowledge graph. In: SIGKDD (2018)","DOI":"10.1145\/3219819.3219938"},{"key":"690_CR14","first-page":"1400","volume":"11","author":"JV Dsilva","year":"2018","unstructured":"Dsilva, J.V., De Moor, F., Kemma, B.: AIDA - Abstraction for advanced in-database analytics. PVLDB 11, 1400\u20131413 (2018)","journal-title":"PVLDB"},{"key":"690_CR15","volume-title":"Database Systems: The Complete Book","author":"H Garcia-Molina","year":"2008","unstructured":"Garcia-Molina, H., Ullman, J.D., Widom, J.: Database Systems: The Complete Book, 2nd edn. Pearson, London (2008)","edition":"2"},{"key":"690_CR16","doi-asserted-by":"crossref","unstructured":"Giles, CL., Bollacker, KD., Lawrence, S.: CiteSeer: an automatic citation indexing system. In: ACM DL (1998)","DOI":"10.1145\/276675.276685"},{"key":"690_CR17","doi-asserted-by":"crossref","unstructured":"Govind, Y., et\u00a0al.: Entity matching meets data science: a progress report from the Magellan project. In: SIGMOD (2019)","DOI":"10.1145\/3299869.3314042"},{"key":"690_CR18","doi-asserted-by":"crossref","unstructured":"Haase, P., Broekstra, J., Eberhart, A., Volz, R.: A comparison of RDF query languages. 
In: ISWC (2004)","DOI":"10.1007\/978-3-540-30475-3_35"},{"key":"690_CR19","unstructured":"Hagedorn, S., Kl\u00e4be, S., Sattler, KU.: Putting Pandas in a box. In: CIDR (2021)"},{"key":"690_CR20","doi-asserted-by":"crossref","unstructured":"Han, X., et\u00a0al.: OpenKE: an open toolkit for knowledge embedding. In: EMNLP (2018)","DOI":"10.18653\/v1\/D18-2024"},{"key":"690_CR21","unstructured":"Jenatton, R., Roux, NL., Bordes, A., Obozinski, GR.: A latent factor model for highly multi-relational data. In: NIPS (2012)"},{"key":"690_CR22","unstructured":"Jindal, A., et\u00a0al.: Magpie: Python at speed and scale using cloud backends. In: CIDR (2021)"},{"key":"690_CR23","doi-asserted-by":"crossref","unstructured":"Kaminski, M., Kostylev, EV., Cuenca Grau, B.: Semantics and expressive power of subqueries and aggregates in SPARQL 1.1. In: WWW (2016)","DOI":"10.1145\/2872427.2883022"},{"key":"690_CR24","unstructured":"Kochut, K., Janik, M.: SPARQLeR: Extended SPARQL for semantic association discovery. In: ESWC (2007)"},{"key":"690_CR25","doi-asserted-by":"publisher","first-page":"167","DOI":"10.3233\/SW-140134","volume":"6","author":"J Lehmann","year":"2015","unstructured":"Lehmann, J., et al.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 6, 167\u2013195 (2015)","journal-title":"Semant. Web"},{"key":"690_CR26","unstructured":"Matsumoto, S., Yamanaka, R., Chiba, H.: Mapping RDF graphs to property graphs. In: Proc. Joint Int. Semantic Technology Conf. (JIST) (2018)"},{"issue":"2","key":"690_CR27","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1023\/A:1009953814988","volume":"3","author":"AK McCallum","year":"2000","unstructured":"McCallum, A.K., Nigam, K., Rennie, J., Seymore, K.: Automating the construction of Internet portals with machine learning. Inf. Retr. 3(2), 127\u2013163 (2000)","journal-title":"Inf. 
Retr."},{"key":"690_CR28","doi-asserted-by":"crossref","unstructured":"Mohamed, A., Abuoda, G., Ghanem, A., Kaoudi, Z., Aboulnaga, A.: RDFFrames: knowledge graph access for machine learning tools. PVLDB 13, (Demonstration) (2020)","DOI":"10.1007\/s00778-021-00690-5"},{"key":"690_CR29","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1016\/j.artint.2012.07.001","volume":"193","author":"R Navigli","year":"2012","unstructured":"Navigli, R., Ponzetto, S.P.: Babelnet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell. 193, 217\u2013250 (2012)","journal-title":"Artif. Intell."},{"key":"690_CR30","unstructured":"Nguyen, DQ.: An overview of embedding models of entities and relationships for knowledge base completion. arXiv preprint arXiv:1703.08098 (2017)"},{"key":"690_CR31","unstructured":"Nickel, M., Tresp, V., Kriegel, HP.: A three-way model for collective learning on multi-relational data. In: ICML (2011)"},{"key":"690_CR32","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1109\/JPROC.2015.2483592","volume":"104","author":"M Nickel","year":"2015","unstructured":"Nickel, M., Murphy, K., Tresp, V., Gabrilovich, E.: A review of relational machine learning for knowledge graphs. Proc. IEEE 104, 11\u201333 (2015)","journal-title":"Proc. IEEE"},{"key":"690_CR33","doi-asserted-by":"crossref","unstructured":"Nickel, M., Rosasco, L., Poggio, T.: Holographic embeddings of knowledge graphs. In: AAAI (2016)","DOI":"10.1609\/aaai.v30i1.10314"},{"key":"690_CR34","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1016\/j.websem.2010.01.002","volume":"8","author":"J P\u00e9rez","year":"2010","unstructured":"P\u00e9rez, J., Arenas, M., Guti\u00e9rrez, C.: nSPARQL: a navigational language for RDF. J Web Semant. 8, 255\u2013270 (2010)","journal-title":"J Web Semant."},{"key":"690_CR35","doi-asserted-by":"crossref","unstructured":"Petersohn, D., et al.: Towards scalable dataframe systems. 
PVLDB 13,(2020)","DOI":"10.14778\/3407790.3407807"},{"key":"690_CR36","doi-asserted-by":"crossref","unstructured":"Pirahesh, H., Hellerstein, JM., Hasan, W.: Extensible\/rule based query rewrite optimization in Starburst. In: SIGMOD (1992)","DOI":"10.1145\/130283.130294"},{"key":"690_CR37","doi-asserted-by":"crossref","unstructured":"Pujara, J., Augustine, E., Getoor, L.: Sparsity and noise: where knowledge graph embeddings fall short. In: EMNLP (2017)","DOI":"10.18653\/v1\/D17-1184"},{"key":"690_CR38","doi-asserted-by":"crossref","unstructured":"Rebele, T., et\u00a0al.: YAGO: a multilingual knowledge base from Wikipedia, Wordnet, and Geonames. In: ISWC (2016)","DOI":"10.1007\/978-3-319-46547-0_19"},{"key":"690_CR39","doi-asserted-by":"crossref","unstructured":"Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: optimization techniques for federated query processing on linked data. In: ISWC (2011)","DOI":"10.1007\/978-3-642-21064-8_39"},{"key":"690_CR40","unstructured":"Sculley, D., et\u00a0al.: Hidden technical debt in machine learning systems. In: NIPS (2015)"},{"key":"690_CR41","doi-asserted-by":"crossref","unstructured":"Suchanek, FM., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: WWW(2007)","DOI":"10.1145\/1242572.1242667"},{"key":"690_CR42","doi-asserted-by":"crossref","unstructured":"Vrandecic, D.: Wikidata: a new platform for collaborative data collection. In: WWW (2012)","DOI":"10.1145\/2187980.2188242"},{"key":"690_CR43","first-page":"2724","volume":"29","author":"Q Wang","year":"2017","unstructured":"Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge graph embedding: a survey of approaches and applications. TKDE 29, 2724\u20132743 (2017)","journal-title":"TKDE"},{"key":"690_CR44","doi-asserted-by":"crossref","unstructured":"Wang, Y., Gemulla, R., Li, H.: On multi-relational link prediction with bilinear models. 
In: AAAI (2018)","DOI":"10.1609\/aaai.v32i1.11738"},{"key":"690_CR45","doi-asserted-by":"crossref","unstructured":"West, R., et\u00a0al.: Knowledge base completion via search-based question answering. In: WWW (2014)","DOI":"10.1145\/2566486.2568032"},{"key":"690_CR46","first-page":"1","volume":"40","author":"H Wickham","year":"2011","unstructured":"Wickham, H.: The split-apply-combine strategy for data analysis. J. Stat. Softw. 40, 1\u201329 (2011)","journal-title":"J. Stat. Softw."},{"key":"690_CR47","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1145\/2934664","volume":"59","author":"M Zaharia","year":"2016","unstructured":"Zaharia, M., et al.: Apache spark: a unified engine for big data processing. CACM 59, 56\u201365 (2016)","journal-title":"CACM"},{"key":"690_CR48","first-page":"39","volume":"41","author":"M Zaharia","year":"2018","unstructured":"Zaharia, M., et al.: Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull. 41, 39\u201345 (2018)","journal-title":"IEEE Data Eng. 
Bull."}],"container-title":["The VLDB Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-021-00690-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00778-021-00690-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-021-00690-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,8]],"date-time":"2023-01-08T05:21:42Z","timestamp":1673155302000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00778-021-00690-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,26]]},"references-count":48,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["690"],"URL":"https:\/\/doi.org\/10.1007\/s00778-021-00690-5","relation":{},"ISSN":["1066-8888","0949-877X"],"issn-type":[{"value":"1066-8888","type":"print"},{"value":"0949-877X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,26]]},"assertion":[{"value":"14 September 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 June 2021","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 June 2021","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 August 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}