{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T08:38:48Z","timestamp":1774946328901,"version":"3.50.1"},"reference-count":40,"publisher":"IGI Global","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,4,1]]},"abstract":"<p>Linked Data on the Web is created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types for enhancing the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. 
Both algorithms have been used for building the DBpedia 3.9 release: With SDType, 3.4 million missing type statements have been added, while using SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.<\/p>","DOI":"10.4018\/ijswis.2014040104","type":"journal-article","created":{"date-parts":[[2014,10,6]],"date-time":"2014-10-06T14:53:55Z","timestamp":1412607235000},"page":"63-86","source":"Crossref","is-referenced-by-count":130,"title":["Improving the Quality of Linked Data Using Statistical Distributions"],"prefix":"10.4018","volume":"10","author":[{"given":"Heiko","family":"Paulheim","sequence":"first","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim, Germany"}]},{"given":"Christian","family":"Bizer","sequence":"additional","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim, Germany"}]}],"member":"2432","reference":[{"key":"ijswis.2014040104-0","article-title":"Crowdsourcing linked data quality assessment.","author":"M.Acosta","year":"2013","journal-title":"Proceedings of the 12th International Semantic Web Conference"},{"key":"ijswis.2014040104-1","article-title":"Automatic expansion of DBpedia exploiting Wikipedia cross-language information.","author":"A. P.Aprosio","year":"2013","journal-title":"Proceedings of the 10th Extended Semantic Web Conference (ESWC 2013)"},{"key":"ijswis.2014040104-2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-30284-8_21"},{"key":"ijswis.2014040104-3","unstructured":"Bizer, C., & Cyganiak, R. (2006). D2R Server \u2013 Publishing relational databases on the Semantic Web. 
Poster at the 5th International Semantic Web Conference."},{"key":"ijswis.2014040104-4","doi-asserted-by":"publisher","DOI":"10.4018\/jswis.2009081901"},{"key":"ijswis.2014040104-5","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2009.07.002"},{"key":"ijswis.2014040104-6","author":"A.Carlson","year":"2010","journal-title":"Toward an architecture for never-ending language learning"},{"key":"ijswis.2014040104-7","doi-asserted-by":"crossref","unstructured":"Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. In ACM Computing Surveys 41.3 (2009).","DOI":"10.1145\/1541880.1541882"},{"key":"ijswis.2014040104-8","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-07443-6_20"},{"key":"ijswis.2014040104-9","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35176-1_5"},{"key":"ijswis.2014040104-10","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33876-2_10"},{"key":"ijswis.2014040104-11","doi-asserted-by":"crossref","unstructured":"Getoor, L., & Diehl, C. P. (2005). Link mining: A survey. In ACM SIGKDD Explorations Newsletter, 7(2), 3-12.","DOI":"10.1145\/1117454.1117456"},{"key":"ijswis.2014040104-12","article-title":"Type inference through the analysis of Wikipedia links","author":"A.Giovanni","year":"2012","journal-title":"Linked Data on the Web"},{"key":"ijswis.2014040104-13","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2012.06.001"},{"key":"ijswis.2014040104-14","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-41335-3_12"},{"key":"ijswis.2014040104-15","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21064-8_42"},{"key":"ijswis.2014040104-16","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-17749-1_12"},{"key":"ijswis.2014040104-17","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35176-1_20"},{"key":"ijswis.2014040104-18","doi-asserted-by":"crossref","unstructured":"Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P. N., et al. (2014). 
DBpedia \u2013 A large-scale, multilingual knowledge base extracted from Wikipedia. In Semantic Web Journal.","DOI":"10.3233\/SW-140134"},{"key":"ijswis.2014040104-19","article-title":"An introduction to the syntax and content of cyc.","author":"C.Matuszek","year":"2006","journal-title":"Proceedings of the AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and its Applications to Knowledge Representation and Question Answering"},{"key":"ijswis.2014040104-20","article-title":"Iterative classification in relational data.","author":"J.Neville","year":"2000","journal-title":"Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data"},{"key":"ijswis.2014040104-21","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-72667-8_13"},{"key":"ijswis.2014040104-22","unstructured":"Paulheim, H. (2012). Browsing linked open data with auto complete. In Semantic Web Challenge."},{"key":"ijswis.2014040104-23","article-title":"Identifying wrong links between datasets by multi-dimensional outlier detection.","author":"H.Paulheim","year":"2014","journal-title":"Proceedings of the International Workshop on Debugging Ontologies and Ontology Mappings (WoDOOM 2014)"},{"key":"ijswis.2014040104-24","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-41335-3_32"},{"key":"ijswis.2014040104-25","doi-asserted-by":"publisher","DOI":"10.1145\/505248.506010"},{"key":"ijswis.2014040104-26","article-title":"Classifying the Wikipedia articles in the OpenCyc taxonomy.","author":"A.Pohl","year":"2012","journal-title":"Proceedings of the Web of Linked Entities Workshop (WoLE 2012)"},{"key":"ijswis.2014040104-27","doi-asserted-by":"crossref","unstructured":"Polleres, A., Hogan, A., Harth, A., & Decker, S. Can we ever catch up with the web? 
In: Semantic Web Journal, 1(1,2):45-52, 2010.","DOI":"10.3233\/SW-2010-0016"},{"key":"ijswis.2014040104-28","doi-asserted-by":"publisher","DOI":"10.1007\/11926078_42"},{"key":"ijswis.2014040104-29","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2007.03.004"},{"key":"ijswis.2014040104-30","doi-asserted-by":"publisher","DOI":"10.1109\/ICSC.2013.22"},{"key":"ijswis.2014040104-31","doi-asserted-by":"publisher","DOI":"10.1145\/2362499.2362505"},{"key":"ijswis.2014040104-32","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21034-1_9"},{"key":"ijswis.2014040104-33","unstructured":"Waitelonis, J., Ludwig, N., Knuth, M., & Sack, H. WhoKnows? - Evaluating linked data heuristics with a quiz that cleans up DBpedia. International Journal of Interactive Technology and Smart Education (ITSE)."},{"issue":"4","key":"ijswis.2014040104-34","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1080\/07421222.1996.11518099","article-title":"Beyond accuracy: What data quality means to data consumers.","volume":"12","author":"R. 
Y.Wang","year":"1996","journal-title":"Journal of Management Information Systems"},{"key":"ijswis.2014040104-35","doi-asserted-by":"publisher","DOI":"10.1145\/1141753.1141853"},{"key":"ijswis.2014040104-36","article-title":"Knowledge base completion via search-based question answering.","author":"R.West","year":"2014","journal-title":"International World Wide Web Conference (WWW)"},{"key":"ijswis.2014040104-37","article-title":"Detecting incorrect numerical data in DBpedia.","author":"D.Wienand","year":"2014","journal-title":"Extended Semantic Web Conference"},{"key":"ijswis.2014040104-38","doi-asserted-by":"publisher","DOI":"10.1145\/2506182.2506195"},{"key":"ijswis.2014040104-39","article-title":"Nell2RDF: Read the web, and turn it into RDF.","author":"A.Zimmermann","year":"2013","journal-title":"Proceedings of the 2nd International Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data"}],"container-title":["International Journal on Semantic Web and Information Systems"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=116452","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T17:56:16Z","timestamp":1654106176000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/ijswis.2014040104"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2014,4,1]]},"references-count":40,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2014,4]]}},"URL":"https:\/\/doi.org\/10.4018\/ijswis.2014040104","relation":{},"ISSN":["1552-6283","1552-6291"],"issn-type":[{"value":"1552-6283","type":"print"},{"value":"1552-6291","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,4,1]]}}}