{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T19:48:43Z","timestamp":1759693723529},"reference-count":51,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2023,7,5]],"date-time":"2023-07-05T00:00:00Z","timestamp":1688515200000},"content-version":"vor","delay-in-days":185,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We focus on the factuality property during the extraction of an OpenIE corpus named OpenFact, which contains more than 12 million high-quality knowledge triplets. We break down the factuality property into two important aspects\u2014expressiveness and groundedness\u2014and we propose a comprehensive framework to handle both aspects. To enhance expressiveness, we formulate each knowledge piece in OpenFact based on a semantic frame. We also design templates, extra constraints, and adopt human efforts so that most OpenFact triplets contain enough details. For groundedness, we require the main arguments of each triplet to contain linked Wikidata entities. A human evaluation suggests that the OpenFact triplets are much more accurate and contain denser information compared to OPIEC-Linked (Gashteovski et al., 2019), one recent high-quality OpenIE corpus grounded to Wikidata. 
Further experiments on knowledge base completion and knowledge base question answering show the effectiveness of OpenFact over OPIEC-Linked as supplementary knowledge to Wikidata as the major KG.<\/jats:p>","DOI":"10.1162\/tacl_a_00569","type":"journal-article","created":{"date-parts":[[2023,7,5]],"date-time":"2023-07-05T17:33:48Z","timestamp":1688578428000},"page":"686-702","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":4,"title":["OpenFact: Factuality Enhanced Open Knowledge Extraction"],"prefix":"10.1162","volume":"11","author":[{"given":"Linfeng","family":"Song","sequence":"first","affiliation":[{"name":"Tencent AI Lab, Bellevue, WA, USA. lfsong@global.tencent.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ante","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Informatics, Xiamen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaoman","family":"Pan","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Bellevue, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongming","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Bellevue, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dian","family":"Yu","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Bellevue, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lifeng","family":"Jin","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Bellevue, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haitao","family":"Mi","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Bellevue, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinsong","family":"Su","sequence":"additional","affiliation":[{"name":"School of Informatics, Xiamen University, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yue","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Engineering, Westlake University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dong","family":"Yu","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Bellevue, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2023,6,29]]},"reference":[{"key":"2023070517334332500_bib1","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1145\/1134271.1134284","article-title":"Discovering missing links in Wikipedia","volume-title":"Proceedings of the 3rd International Workshop on Link Discovery","author":"Adafre","year":"2005"},{"key":"2023070517334332500_bib2","first-page":"52","article-title":"Kraken: N-ary facts in open information extraction","volume-title":"Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)","author":"Akbik","year":"2012"},{"key":"2023070517334332500_bib3","doi-asserted-by":"publisher","first-page":"344","DOI":"10.3115\/v1\/P15-1034","article-title":"Leveraging linguistic structure for open domain information extraction","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Angeli","year":"2015"},{"key":"2023070517334332500_bib4","doi-asserted-by":"publisher","first-page":"5185","DOI":"10.18653\/v1\/D19-1522","article-title":"Tucker: Tensor factorization for knowledge graph completion","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing 
(EMNLP-IJCNLP)","author":"Bala\u017eevi\u0107","year":"2019"},{"key":"2023070517334332500_bib5","doi-asserted-by":"publisher","first-page":"55","DOI":"10.18653\/v1\/D16-1006","article-title":"Nested propositions in open information extraction","volume-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing","author":"Bhutani","year":"2016"},{"key":"2023070517334332500_bib6","doi-asserted-by":"publisher","first-page":"1247","DOI":"10.1145\/1376616.1376746","article-title":"Freebase: A collaboratively created graph database for structuring human knowledge","volume-title":"Proceedings of the 2008 ACM SIGMOD international conference on Management of data","author":"Bollacker","year":"2008"},{"key":"2023070517334332500_bib7","article-title":"Translating embeddings for modeling multi-relational data","volume":"26","author":"Bordes","year":"2013","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023070517334332500_bib8","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1162\/tacl_a_00156","article-title":"Large-scale information extraction from textual definitions through deep syntactic and semantic analysis","volume":"3","author":"Bovi","year":"2015","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2023070517334332500_bib9","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v24i1.7519","article-title":"Toward an architecture for never-ending language learning","volume-title":"Twenty-Fourth AAAI Conference on Artificial Intelligence","author":"Carlson","year":"2010"},{"key":"2023070517334332500_bib10","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1145\/1999676.1999697","article-title":"An analysis of open information extraction based on semantic role labeling","volume-title":"Proceedings of the Sixth International Conference on Knowledge 
Capture","author":"Christensen","year":"2011"},{"key":"2023070517334332500_bib11","doi-asserted-by":"publisher","first-page":"407","DOI":"10.18653\/v1\/P18-2065","article-title":"Neural open information extraction","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Cui","year":"2018"},{"key":"2023070517334332500_bib12","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1145\/2488388.2488420","article-title":"Clausie: Clause-based open information extraction","volume-title":"Proceedings of the 22nd International Conference on World Wide Web","author":"Del Corro","year":"2013"},{"key":"2023070517334332500_bib13","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11573","article-title":"Convolutional 2d knowledge graph embeddings","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Dettmers","year":"2018"},{"key":"2023070517334332500_bib14","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"issue":"12","key":"2023070517334332500_bib15","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1145\/1409360.1409378","article-title":"Open information extraction from the web","volume":"51","author":"Etzioni","year":"2008","journal-title":"Communications of the ACM"},{"key":"2023070517334332500_bib16","first-page":"1535","article-title":"Identifying relations for open information extraction","volume-title":"Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing","author":"Fader","year":"2011"},{"key":"2023070517334332500_bib17","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1278","article-title":"Minie: 
Minimizing facts in open information extraction","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Gashteovski","year":"2017"},{"key":"2023070517334332500_bib18","article-title":"OPIEC: An open information extraction corpus","volume-title":"Automated Knowledge Base Construction (AKBC)","author":"Gashteovski","year":"2019"},{"key":"2023070517334332500_bib19","doi-asserted-by":"publisher","first-page":"874","DOI":"10.18653\/v1\/2021.eacl-main.74","article-title":"Leveraging passage retrieval with generative models for open domain question answering","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Izacard","year":"2021"},{"key":"2023070517334332500_bib20","doi-asserted-by":"publisher","first-page":"6769","DOI":"10.18653\/v1\/2020.emnlp-main.550","article-title":"Dense passage retrieval for open-domain question answering","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Karpukhin","year":"2020"},{"key":"2023070517334332500_bib21","article-title":"Adam: A method for stochastic optimization","author":"Kingma","year":"2014","journal-title":"arXiv preprint arXiv:1412.6980"},{"key":"2023070517334332500_bib22","doi-asserted-by":"publisher","first-page":"5871","DOI":"10.18653\/v1\/2020.acl-main.521","article-title":"IMoJIE: Iterative memory-based joint open information extraction","volume-title":"Proceedings of the ACL","author":"Kolluru","year":"2020"},{"key":"2023070517334332500_bib23","first-page":"84","article-title":"Entity linking at web scale","volume-title":"Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)","author":"Lin","year":"2012"},{"key":"2023070517334332500_bib24","article-title":"Decoupled weight decay regularization","volume-title":"International 
Conference on Learning Representations","author":"Loshchilov","year":"2018"},{"key":"2023070517334332500_bib25","doi-asserted-by":"publisher","first-page":"3570","DOI":"10.18653\/v1\/2022.findings-acl.282","article-title":"Do pre-trained models benefit knowledge graph completion? A reliable evaluation and a reasonable approach","volume-title":"Findings of the Association for Computational Linguistics: ACL 2022","author":"Lv","year":"2022"},{"key":"2023070517334332500_bib26","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1016\/j.eswa.2018.07.017","article-title":"OpenIE-based approach for knowledge graph construction from text","volume":"113","author":"Martinez-Rodriguez","year":"2018","journal-title":"Expert Systems with Applications"},{"key":"2023070517334332500_bib27","first-page":"523","article-title":"Open language learning for information extraction","volume-title":"Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning","author":"Mausam","year":"2012"},{"issue":"5","key":"2023070517334332500_bib28","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1145\/3191513","article-title":"Never-ending learning","volume":"61","author":"Mitchell","year":"2018","journal-title":"Communications of the ACM"},{"key":"2023070517334332500_bib29","article-title":"Integrating syntactic and semantic analysis into the open information extraction paradigm","volume-title":"Twenty-Third International Joint Conference on Artificial Intelligence","author":"Moro","year":"2013"},{"key":"2023070517334332500_bib30","first-page":"1135","article-title":"Patty: A taxonomy of relational patterns with semantic types","volume-title":"Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language 
Learning","author":"Nakashole","year":"2012"},{"key":"2023070517334332500_bib31","doi-asserted-by":"publisher","first-page":"1031","DOI":"10.26615\/978-954-452-072-4_116","article-title":"Improving distantly supervised relation extraction with self-ensemble noise filtering","volume-title":"Proceedings of the RANLP 2021","author":"Nayak","year":"2021"},{"key":"2023070517334332500_bib32","article-title":"A three-way model for collective learning on multi-relational data","volume-title":"Proceedings of the 28th International Conference on Machine Learning","author":"Nickel","year":"2011"},{"key":"2023070517334332500_bib33","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-naacl.115","article-title":"UniK-QA: Unified representations of structured and unstructured knowledge for open-domain question answering","volume-title":"Findings of the Association for Computational Linguistics: NAACL 2022","author":"Oguz","year":"2022"},{"issue":"1","key":"2023070517334332500_bib34","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1162\/0891201053630264","article-title":"The proposition bank: An annotated corpus of semantic roles","volume":"31","author":"Palmer","year":"2005","journal-title":"Computational Linguistics"},{"key":"2023070517334332500_bib35","doi-asserted-by":"crossref","first-page":"1946","DOI":"10.18653\/v1\/P17-1178","article-title":"Cross-lingual name tagging and linking for 282 languages","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Pan","year":"2017"},{"issue":"140","key":"2023070517334332500_bib36","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer.","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning 
Research"},{"key":"2023070517334332500_bib37","doi-asserted-by":"publisher","first-page":"3982","DOI":"10.18653\/v1\/D19-1410","article-title":"Sentence-bert: Sentence embeddings using Siamese BERT-networks","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Reimers","year":"2019"},{"key":"2023070517334332500_bib38","doi-asserted-by":"publisher","first-page":"8328","DOI":"10.18653\/v1\/2020.emnlp-main.669","article-title":"Codex: A comprehensive knowledge graph completion benchmark","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Safavi","year":"2020"},{"key":"2023070517334332500_bib39","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.201","article-title":"Sequence-to-sequence knowledge graph completion and question answering","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics","author":"Saxena","year":"2022"},{"key":"2023070517334332500_bib40","doi-asserted-by":"publisher","first-page":"4498","DOI":"10.18653\/v1\/2020.acl-main.412","article-title":"Improving multi-hop question answering over knowledge graphs using knowledge base embeddings","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Saxena","year":"2020"},{"key":"2023070517334332500_bib41","article-title":"Simple BERT models for relation extraction and semantic role labeling","author":"Shi","year":"2019","journal-title":"arXiv preprint arXiv:1904.05255"},{"key":"2023070517334332500_bib42","doi-asserted-by":"publisher","first-page":"303","DOI":"10.3115\/v1\/P15-2050","article-title":"Open IE as an intermediate structure for semantic tasks","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th 
International Joint Conference on Natural Language Processing (Volume 2: Short Papers)","author":"Stanovsky","year":"2015"},{"key":"2023070517334332500_bib43","doi-asserted-by":"publisher","first-page":"885","DOI":"10.18653\/v1\/N18-1081","article-title":"Supervised open information extraction","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Stanovsky","year":"2018"},{"key":"2023070517334332500_bib44","first-page":"2071","article-title":"Complex embeddings for simple link prediction","volume-title":"International Conference on Machine Learning","author":"Trouillon","year":"2016"},{"key":"2023070517334332500_bib45","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1145\/2566486.2568032","article-title":"Knowledge base completion via search-based question answering","volume-title":"Proceedings of the 23rd International Conference on World Wide Web","author":"West","year":"2014"},{"key":"2023070517334332500_bib46","doi-asserted-by":"publisher","first-page":"6397","DOI":"10.18653\/v1\/2020.emnlp-main.519","article-title":"Scalable zero-shot entity linking with dense entity retrieval","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Wu","year":"2020"},{"key":"2023070517334332500_bib47","article-title":"KG-BERT: Bert for knowledge graph completion","author":"Yao","year":"2019","journal-title":"arXiv preprint arXiv:1909.03193"},{"key":"2023070517334332500_bib48","doi-asserted-by":"publisher","first-page":"25","DOI":"10.3115\/1614164.1614177","article-title":"Textrunner: Open information extraction on the web","volume-title":"Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics 
(NAACL-HLT)","author":"Yates","year":"2007"},{"key":"2023070517334332500_bib49","doi-asserted-by":"crossref","first-page":"201","DOI":"10.18653\/v1\/P16-2033","article-title":"The value of semantic parse labeling for knowledge base question answering","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Yih","year":"2016"},{"key":"2023070517334332500_bib50","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12030","article-title":"Scale up event extraction learning via automatic training data generation","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Zeng","year":"2018"},{"key":"2023070517334332500_bib51","doi-asserted-by":"publisher","first-page":"4039","DOI":"10.24963\/ijcai.2020\/559","article-title":"Knowledge graphs enhanced neural machine translation","volume-title":"Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence","author":"Zhao","year":"2021"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00569\/2141030\/tacl_a_00569.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00569\/2141030\/tacl_a_00569.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,5]],"date-time":"2023-07-05T17:34:07Z","timestamp":1688578447000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00569\/116618\/OpenFact-Factuality-Enhanced-Open-Knowledge"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"references-count":51,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00569","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023]]},"published":{"date-parts":[[2023]]}}}