{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T15:00:22Z","timestamp":1770994822746,"version":"3.50.1"},"reference-count":24,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2009,3,20]],"date-time":"2009-03-20T00:00:00Z","timestamp":1237507200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMOD Rec."],"published-print":{"date-parts":[[2009,3,20]]},"abstract":"<jats:p>This paper gives an overview on the YAGO-NAGA approach to information extraction for building a conveniently searchable, large-scale, highly accurate knowledge base of common facts. YAGO harvests infoboxes and category names of Wikipedia for facts about individual entities, and it reconciles these with the taxonomic backbone of WordNet in order to ensure that all entities have proper classes and the class system is consistent. Currently, the YAGO knowledge base contains about 19 million instances of binary relations for about 1.95 million entities. Based on intensive sampling, its accuracy is estimated to be above 95 percent. The paper presents the architecture of the YAGO extractor toolkit, its distinctive approach to consistency checking, its provisions for maintenance and further growth, and the query engine for YAGO, coined NAGA. 
It also discusses ongoing work on extensions towards integrating fact candidates extracted from natural-language text sources.<\/jats:p>","DOI":"10.1145\/1519103.1519110","type":"journal-article","created":{"date-parts":[[2009,4,6]],"date-time":"2009-04-06T16:34:22Z","timestamp":1239035662000},"page":"41-47","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":59,"title":["The YAGO-NAGA approach to knowledge discovery"],"prefix":"10.1145","volume":"37","author":[{"given":"Gjergji","family":"Kasneci","sequence":"first","affiliation":[{"name":"Max Planck Institute for Informatics, Saarbruecken, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maya","family":"Ramanath","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics, Saarbruecken, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fabian","family":"Suchanek","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics, Saarbruecken, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gerhard","family":"Weikum","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics, Saarbruecken, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2009,3,20]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Eugene Agichtein: Scaling Information Extraction to Large Document Collections. IEEE Data Eng. Bull. 28(4) 2005."},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"S\u00f6ren Auer Christian Bizer Georgi Kobilarov Jens Lehmann Richard Cyganiak Zachary G. Ives: DBpedia: A Nucleus for a Web of Open Data. ISWC\/ASWC 2007.","DOI":"10.1007\/978-3-540-76298-0_52"},
{"key":"e_1_2_1_3_1","volume-title":"IJCAI","author":"Banko Michele","year":"2007"},{"key":"e_1_2_1_4_1","volume-title":"Oren Etzioni: Structured Querying of Web Text Data: A Technical Challenge. CIDR","author":"Cafarella Michael J.","year":"2007"},{"key":"e_1_2_1_5_1","volume-title":"2nd Edition","author":"Cunningham Hamish","year":"2005"},{"key":"e_1_2_1_6_1","volume-title":"VLDB","author":"DeRose Pedro","year":"2007"},{"key":"e_1_2_1_7_1","unstructured":"Minko Dudev Shady Elbassuoni Julia Luxenburger Maya Ramanath Gerhard Weikum: Personalizing the Search for Knowledge. PersDB 2008."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2005.03.001"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1292609.1292611"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497504"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.64"},{"key":"e_1_2_1_12_1","first-page":"39","author":"Liu Xiaoyong","year":"2004","journal-title":"Bruce Croft: Statistical Language Modeling for Information Retrieval. Annual Review of Information Science and Technology"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242584"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497502"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000003"},{"key":"e_1_2_1_16_1","volume-title":"Raghu Ramakrishnan: Declarative Information Extraction Using Datalog with Embedded Extraction Predicates. 
VLDB","author":"Shen Warren","year":"2007"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150492"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242667"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2008.06.001"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321449"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367583"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2004.11.003"},{"key":"e_1_2_1_23_1","unstructured":"Qi Zhang Fabian M. Suchanek Lihua Yue Gerhard Weikum: TOB: Timely Ontologies for Business Relations. WebDB 2008."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150457"}],"container-title":["ACM SIGMOD Record"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1519103.1519110","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1519103.1519110","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:29:53Z","timestamp":1750253393000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1519103.1519110"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,3,20]]},"references-count":24,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2009,3,20]]}},"alternative-id":["10.1145\/1519103.1519110"],"URL":"https:\/\/doi.org\/10.1145\/1519103.1519110","relation":{},"ISSN":["0163-5808"],"issn-type":[{"value":"0163-5808","type":"print"}],"subject":[],"published":{"date-parts":[[2009,3,20]]},"assertion":[{"value":"2009-03-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}