{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T13:44:11Z","timestamp":1767707051525},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2017,9]]},"abstract":"<jats:p>Distributed SPARQL engines promise to support very large RDF datasets by utilizing shared-nothing computer clusters. Some are based on distributed frameworks such as MapReduce; others implement proprietary distributed processing; and some rely on expensive preprocessing for data partitioning. These systems exhibit a variety of trade-offs that are not well-understood, due to the lack of any comprehensive quantitative and qualitative evaluation. In this paper, we present a survey of 22 state-of-the-art systems that cover the entire spectrum of distributed RDF data processing and categorize them by several characteristics. Then, we select 12 representative systems and perform extensive experimental evaluation with respect to preprocessing cost, query performance, scalability and workload adaptability, using a variety of synthetic and real large datasets with up to 4.3 billion triples. Our results provide valuable insights for practitioners to understand the trade-offs for their usage scenarios. Finally, we publish online our evaluation framework, including all datasets and workloads, for researchers to compare their novel systems against the existing ones.<\/jats:p>","DOI":"10.14778\/3151106.3151109","type":"journal-article","created":{"date-parts":[[2017,10,19]],"date-time":"2017-10-19T12:30:08Z","timestamp":1508416208000},"page":"2049-2060","source":"Crossref","is-referenced-by-count":64,"title":["A survey and experimental comparison of distributed SPARQL engines for very large RDF data"],"prefix":"10.14778","volume":"10","author":[{"given":"Ibrahim","family":"Abdelaziz","sequence":"first","affiliation":[{"name":"King Abdullah University of Science and Technology"}]},{"given":"Razen","family":"Harbi","sequence":"additional","affiliation":[{"name":"Saudi Aramco"}]},{"given":"Zuhair","family":"Khayyat","sequence":"additional","affiliation":[{"name":"King Abdullah University of Science and Technology"}]},{"given":"Panos","family":"Kalnis","sequence":"additional","affiliation":[{"name":"King Abdullah University of Science and Technology"}]}],"member":"320","published-online":{"date-parts":[[2017,9]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"https:\/\/github.com\/ecrc\/rdf-exp.  https:\/\/github.com\/ecrc\/rdf-exp."},{"key":"e_1_2_1_2_1","unstructured":"Apache Hadoop. http:\/\/hadoop.apache.org\/.  Apache Hadoop. http:\/\/hadoop.apache.org\/."},{"key":"e_1_2_1_3_1","unstructured":"Bio2RDF. http:\/\/bio2rdf.org\/.  Bio2RDF. http:\/\/bio2rdf.org\/."},{"key":"e_1_2_1_4_1","unstructured":"Dbpedia. http:\/\/dbpedia.org\/.  Dbpedia. http:\/\/dbpedia.org\/."},{"key":"e_1_2_1_5_1","unstructured":"HBase. http:\/\/hbase.apache.org.  HBase. http:\/\/hbase.apache.org."},{"key":"e_1_2_1_6_1","unstructured":"LUBM. http:\/\/swat.cse.lehigh.edu\/projects\/lubm\/.  LUBM. http:\/\/swat.cse.lehigh.edu\/projects\/lubm\/."},{"key":"e_1_2_1_7_1","unstructured":"PubChemRDF. http:\/\/pubchem.ncbi.nlm.nih.gov\/rdf\/.  PubChemRDF. http:\/\/pubchem.ncbi.nlm.nih.gov\/rdf\/."},{"key":"e_1_2_1_8_1","unstructured":"RDF Primer. http:\/\/www.w3.org\/TR\/rdf-primer\/.  RDF Primer. http:\/\/www.w3.org\/TR\/rdf-primer\/."},{"key":"e_1_2_1_9_1","unstructured":"SPARQL Query Language for RDF. https:\/\/www.w3.org\/TR\/rdf-sparql-query\/.  SPARQL Query Language for RDF. https:\/\/www.w3.org\/TR\/rdf-sparql-query\/."},{"key":"e_1_2_1_10_1","unstructured":"UniProt. http:\/\/www.uniprot.org\/.  UniProt. http:\/\/www.uniprot.org\/."},{"key":"e_1_2_1_11_1","unstructured":"Urika-GD. http:\/\/www.cray.com\/sites\/default\/files\/resources\/Urika-GD-TechSpecs.pdf.  Urika-GD. http:\/\/www.cray.com\/sites\/default\/files\/resources\/Urika-GD-TechSpecs.pdf."},{"key":"e_1_2_1_12_1","unstructured":"WatDiv. http:\/\/db.uwaterloo.ca\/watdiv\/.  WatDiv. http:\/\/db.uwaterloo.ca\/watdiv\/."},{"key":"e_1_2_1_13_1","unstructured":"YAGO2. http:\/\/yago-knowledge.org\/.  YAGO2. http:\/\/yago-knowledge.org\/."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1999299.1999303"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/2977797.2977806"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2720174"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824091"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772696"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2467799"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376726"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453965"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-008-0125-y"},{"key":"e_1_2_1_25_1","volume-title":"J. Quian\u00e9-Ruiz and S. Zampetakis. CliqueSquare: Flat Plans for Massively Parallel RDF Queries. In Proc. of ICDE","author":"Goasdou\u00e9 F.","year":"2015"},{"key":"e_1_2_1_26_1","first-page":"11","volume":"15","author":"Faye D.","year":"2012","journal-title":"A Survey of RDF Storage Approaches. ARIMA Journal"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_2_1_28_1","volume-title":"Proc. of ICDE Workshops","author":"Schenkel K.","year":"2013"},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"J. Huang D. Abadi and K. Ren. Scalable SPARQL Querying of Large RDF Graphs. PVLDB 4(11) 2011.  J. Huang D. Abadi and K. Ren. Scalable SPARQL Querying of Large RDF Graphs. PVLDB 4(11) 2011.","DOI":"10.14778\/3402707.3402747"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.103"},{"key":"e_1_2_1_31_1","volume-title":"Proc. of OSDI","author":"Dean J.","year":"2004"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556571"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/2535570.2488333"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-014-0364-z"},{"key":"e_1_2_1_35_1","volume-title":"Open-Source SQL Engine for Hadoop. In Proc. of CIDR","author":"Kornacker M.","year":"2015"},{"key":"e_1_2_1_36_1","unstructured":"L. Galarraga K. Hose and R. Schenkel. Partout: A Distributed Engine for Efficient RDF Processing. CoRR abs\/1212.5636 2012.  L. Galarraga K. Hose and R. Schenkel. Partout: A Distributed Engine for Efficient RDF Processing. CoRR abs\/1212.5636 2012."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/2002974.2002976"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735703.2735705"},{"key":"e_1_2_1_39_1","first-page":"10","article-title":"Spark","author":"Zaharia M.","year":"2010","journal-title":"Cluster Computing with Working Sets. Proc. of HotCloud"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888916000217"},{"key":"e_1_2_1_41_1","volume-title":"Big Data","author":"Papailiou N.","year":"2013"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2014.6816762"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-016-5554-y"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/322234.322238"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-015-0415-0"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-016-0420-y"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1996014.1996021"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610511"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815948.1815953"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-11964-9_11"},{"key":"e_1_2_1_51_1","volume-title":"VLDB Workshop on Big Graphs Online Querying","author":"Sch\u00e4tzle A.","year":"2015"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.5555\/2075806.2075886"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-009-0165-y"},{"key":"e_1_2_1_54_1","first-page":"35","volume-title":"SSWS","author":"Wilkinson K.","year":"2006"},{"key":"e_1_2_1_55_1","first-page":"131","volume-title":"Proc. of SWDB","author":"Wilkinson K.","year":"2003"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544856"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213895"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3151106.3151109","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:08:26Z","timestamp":1672225706000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3151106.3151109"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,9]]},"references-count":56,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2017,9]]}},"alternative-id":["10.14778\/3151106.3151109"],"URL":"https:\/\/doi.org\/10.14778\/3151106.3151109","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2017,9]]}}}