{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T16:12:50Z","timestamp":1774541570733,"version":"3.50.1"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2019,7]]},"abstract":"<jats:p>Estimation of the accuracy of a large-scale knowledge graph (KG) often requires humans to annotate samples from the graph. How to obtain statistically meaningful estimates for accuracy evaluation while keeping human annotation costs low is a problem critical to the development cycle of a KG and its practical applications. Surprisingly, this challenging problem has largely been ignored in prior research. To address the problem, this paper proposes an efficient sampling and evaluation framework, which aims to provide quality accuracy evaluation with strong statistical guarantee while minimizing human efforts. Motivated by the properties of the annotation cost function observed in practice, we propose the use of cluster sampling to reduce the overall cost. We further apply weighted and two-stage sampling as well as stratification for better sampling designs. We also extend our framework to enable efficient incremental evaluation on evolving KG, introducing two solutions based on stratified sampling and a weighted variant of reservoir sampling. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of our proposed solution. 
Compared to baseline approaches, our best solutions can provide up to 60% cost reduction on static KG evaluation and up to 80% cost reduction on evolving KG evaluation, without loss of evaluation quality.<\/jats:p>","DOI":"10.14778\/3342263.3342642","type":"journal-article","created":{"date-parts":[[2019,9,18]],"date-time":"2019-09-18T18:36:11Z","timestamp":1568831771000},"page":"1679-1691","source":"Crossref","is-referenced-by-count":48,"title":["Efficient knowledge graph accuracy evaluation"],"prefix":"10.14778","volume":"12","author":[{"given":"Junyang","family":"Gao","sequence":"first","affiliation":[{"name":"Duke University and Amazon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xian","family":"Li","sequence":"additional","affiliation":[{"name":"Amazon.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yifan Ethan","family":"Xu","sequence":"additional","affiliation":[{"name":"Amazon.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bunyamin","family":"Sisman","sequence":"additional","affiliation":[{"name":"Amazon.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xin Luna","family":"Dong","sequence":"additional","affiliation":[{"name":"Amazon.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Yang","sequence":"additional","affiliation":[{"name":"Duke University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Dbpedia. https:\/\/wiki.dbpedia.org."},{"key":"e_1_2_1_2_1","unstructured":"Imdb. https:\/\/www.imdb.com."},{"key":"e_1_2_1_3_1","unstructured":"Nell. http:\/\/rtw.ml.cmu.edu\/rtw\/resources."},{"key":"e_1_2_1_4_1","unstructured":"Wikidata. https:\/\/www.wikidata.org."},
{"key":"e_1_2_1_5_1","unstructured":"Yago2. https:\/\/www.mpi-inf.mpg.de\/departments\/databases-and-information-systems\/research\/yago-naga."},{"key":"e_1_2_1_6_1","volume-title":"Detecting linked data quality issues via crowdsourcing: A dbpedia study. Semantic Web, (Preprint):1--33","author":"Acosta M.","year":"2016"},{"key":"e_1_2_1_7_1","volume-title":"First AAAI conference on human computation and crowdsourcing","author":"Bragg J.","year":"2013"},{"key":"e_1_2_1_8_1","volume-title":"Probabilistic similarity logic. arXiv preprint arXiv.1203.3469","author":"Brocheler M.","year":"2012"},{"key":"e_1_2_1_9_1","volume-title":"Statistical inference","author":"Casella G.","year":"2002"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-44918-8_6"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2912574"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1959.10501501"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623623"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2005.11.003"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242667"},{"key":"e_1_2_1_16_1","volume-title":"Efficient knowledge graph accuracy evaluation. Technical report","author":"Gao J.","year":"2019"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2015.08.001"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177731356"},{"key":"e_1_2_1_19_1","volume-title":"Linked data: Evolving the web into a global data space. 
Synthesis lectures on the semantic web: theory and technology, 1(1):1--136","author":"Heath T.","year":"2011"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/253262.253291"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2145432.2145494"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-22849-5_5"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-59888-8_29"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137642"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3191513"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1183"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00108"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3147.3165"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610505"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3342263.3342642","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:52:35Z","timestamp":1672221155000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3342263.3342642"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7]]},"references-count":29,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2019,7]]}},"alternative-id":["10.14778\/3342263.3342642"],"URL":"https:\/\/doi.org\/10.14778\/3342263.3342642","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2019,7]]}}}