{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T11:30:44Z","timestamp":1774956644893,"version":"3.50.1"},"reference-count":12,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2014,4]]},"abstract":"<jats:p>\n            Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to have an accurate description of the\n            <jats:italic>structuredness<\/jats:italic>\n            of the data at hand (how well the data conform to the schema).\n          <\/jats:p>\n          <jats:p>In this paper, we have approached the study of the structuredness of an RDF graph in a principled way: we propose a framework for specifying structuredness functions, which gauge the degree to which an RDF graph conforms to a schema. In particular, we first define a formal language for specifying structuredness functions with expressions we call rules. This language allows a user to state a rule to which an RDF graph may fully or partially conform. Then we consider the issue of discovering a refinement of a sort (type) by partitioning the dataset into subsets whose structuredness is over a specified threshold. In particular, we prove that the natural decision problem associated to this refinement problem is NP-complete, and we provide a natural translation of this problem into Integer Linear Programming (ILP). Finally, we test this ILP solution with three real world datasets and three different and intuitive rules, which gauge the structuredness in different ways. 
We show that the rules give meaningful refinements of the datasets, showing that our language can be a powerful tool for understanding the structure of RDF data, and we show that the ILP solution is practical for a large fraction of existing data.<\/jats:p>","DOI":"10.14778\/2732296.2732297","type":"journal-article","created":{"date-parts":[[2015,5,12]],"date-time":"2015-05-12T15:37:52Z","timestamp":1431445072000},"page":"601-612","source":"Crossref","is-referenced-by-count":6,"title":["A principled approach to bridging the gap between graph data and their schemas"],"prefix":"10.14778","volume":"7","author":[{"given":"Marcelo","family":"Arenas","sequence":"first","affiliation":[{"name":"Pontificia Universidad Cat\u00f3lica de Chile and University of Oxford"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gonzalo","family":"D\u00edaz","sequence":"additional","affiliation":[{"name":"Pontificia Universidad Cat\u00f3lica de Chile"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Achille","family":"Fokoue","sequence":"additional","affiliation":[{"name":"IBM T.J. Watson Research Center"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anastasios","family":"Kementsietsidis","sequence":"additional","affiliation":[{"name":"IBM T.J. Watson Research Center"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kavitha","family":"Srinivas","sequence":"additional","affiliation":[{"name":"IBM T.J. Watson Research Center"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2014,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/645480.655281"},{"key":"e_1_2_1_2_1","volume-title":"Apr.","author":"Amato C.","year":"2010","unstructured":"C. d'Amato, N. Fanizzi, and F. Esposito. 
Inductive learning for the semantic web: What does it buy? Semant. web, 1(1,2):53--59, Apr. 2010."},{"key":"e_1_2_1_3_1","series-title":"CEUR Workshop Proceedings","volume-title":"IJCAI 2001 Workshop on Ontology Learning","author":"Delteil A.","year":"2001","unstructured":"A. Delteil, C. Faron-Zucker, and R. Dieng. Learning Ontologies from RDF annotations. In IJCAI 2001 Workshop on Ontology Learning, volume 38 of CEUR Workshop Proceedings. CEUR-WS.org, 2001."},{"key":"e_1_2_1_4_1","volume-title":"First Intl Workshop On Practical And Scalable Semantic Systems","author":"Ding L.","year":"2003","unstructured":"L. Ding, K. Wilkinson, C. Sayers, and H. Kuno. Application-specific schema design for storing large rdf datasets. In First Intl Workshop On Practical And Scalable Semantic Systems, 2003."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989340"},{"key":"e_1_2_1_6_1","first-page":"152","volume-title":"Proceedings of the Third International Semantic Web Conference (ISWC-04)","author":"Grimnes G. A.","unstructured":"G. A. Grimnes, P. Edwards, and A. Preece. Learning Meta-Descriptions of the FOAF Network. In Proceedings of the Third International Semantic Web Conference (ISWC-04), LNCS, pages 152--165, Hiroshima, Japan."},{"key":"e_1_2_1_7_1","volume-title":"Dept of Computer Science","author":"Lee T. Y.","year":"2013","unstructured":"T. Y. 
Lee, D. W. Cheung, J. Chiu, S. D. Lee, H. Zhu, P. Yee, and W. Yuan. Automating relational database schema design for very large semantic datasets. Technical report, Dept of Computer Science, Univ. of Hong Kong, 2013."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICWS.2009.49"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/645806.670157"},{"key":"e_1_2_1_11_1","volume-title":"Department of Computer Science","author":"Pan Z.","year":"2004","unstructured":"Z. Pan and J. Heflin. Dldb: Extending relational databases to support semantic web queries. Technical report, Department of Computer Science, Lehigh University, 2004."},{"key":"e_1_2_1_12_1","first-page":"2837","article-title":"Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance","volume":"9999","author":"Vinh N. X.","year":"2010","unstructured":"N. X. Vinh, J. Epps, and J. Bailey. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. J. Mach. Learn. Res., 9999: 2837--2854, December 2010.","journal-title":"J. Mach. Learn. 
Res."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/2008892.2008904"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2732296.2732297","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:17:23Z","timestamp":1672219043000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2732296.2732297"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,4]]},"references-count":12,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2014,4]]}},"alternative-id":["10.14778\/2732296.2732297"],"URL":"https:\/\/doi.org\/10.14778\/2732296.2732297","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2014,4]]}}}