{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:33Z","timestamp":1740185133239,"version":"3.37.3"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2022,6,27]],"date-time":"2022-06-27T00:00:00Z","timestamp":1656288000000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Australian Research Council Discovery Project","award":["DP190101350"],"award-info":[{"award-number":["DP190101350"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,6,24]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Literature-based gene ontology annotations (GOA) are biological database records that use controlled vocabulary to uniformly represent gene function information that is described in the primary literature. Assurance of the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistencies in between literature as evidence and annotated GO terms can be identified; these have not been systematically studied at record level. The existing manual-curation approach to GOA consistency assurance is inefficient and is unable to keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This article presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detect inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported. This is the first study to introduce automatic approaches that are designed to address the challenges in current GOA quality assurance workflows. The data underlying this article are available in Github at https:\/\/github.com\/jiyuc\/AutoGOAConsistency.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac230","type":"journal-article","created":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T11:10:15Z","timestamp":1649934615000},"page":"i273-i281","source":"Crossref","is-referenced-by-count":1,"title":["Exploring automatic inconsistency detection for literature-based gene ontology annotation"],"prefix":"10.1093","volume":"38","author":[{"given":"Jiyu","family":"Chen","sequence":"first","affiliation":[{"name":"School of Computing and Information Systems, The University of Melbourne , Parkville, VIC 3010, Australia"}]},{"given":"Benjamin","family":"Goudey","sequence":"additional","affiliation":[{"name":"School of Computing and Information Systems, The University of Melbourne , Parkville, VIC 3010, Australia"}]},{"given":"Justin","family":"Zobel","sequence":"additional","affiliation":[{"name":"School of Computing and Information Systems, The University of Melbourne , Parkville, VIC 3010, Australia"}]},{"given":"Nicholas","family":"Geard","sequence":"additional","affiliation":[{"name":"School of Computing and Information Systems, The University of Melbourne , Parkville, VIC 3010, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8661-1544","authenticated-orcid":false,"given":"Karin","family":"Verspoor","sequence":"additional","affiliation":[{"name":"School of Computing and Information Systems, The University of Melbourne , Parkville, VIC 3010, Australia"},{"name":"School of Computer Technologies, RMIT University , Melbourne, VIC 3000, Australia"}]}],"member":"286","published-online":{"date-parts":[[2022,6,27]]},"reference":[{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1186\/1471-2105-13-161","article-title":"Concept annotation in the CRAFT corpus","volume":"13","author":"Bada","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"bat054","DOI":"10.1093\/database\/bat054","article-title":"A guide to best practices for gene ontology (GO) manual annotation","volume":"2013","author":"Balakrishnan","year":"2013","journal-title":"Database"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"Bateman","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1007\/978-1-4939-3743-1_13","article-title":"Gene-category analysis","volume":"1446","author":"Bauer","year":"2017","journal-title":"Methods Mol. Biol. (Clifton, NJ)"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1038\/ng0504-431","article-title":"The genetic association database","volume":"36","author":"Becker","year":"2004","journal-title":"Nat. Genet"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"3045","DOI":"10.1093\/bioinformatics\/btp536","article-title":"QuickGO: a web-based tool for gene ontology searching","volume":"25","author":"Binns","year":"2009","journal-title":"Bioinformatics"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"D36","DOI":"10.1093\/nar\/gku1055","article-title":"Gene: a gene-centered information resource at NCBI","volume":"43","author":"Brown","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"D801","DOI":"10.1093\/nar\/gky1056","article-title":"Mouse genome database (MGD) 2019","volume":"47","author":"Bult","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-6-S1-S17","article-title":"An evaluation of go annotation retrieval for biocreative and Goa","volume":"6","author":"Camon","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"D325","DOI":"10.1093\/nar\/gkaa1113","article-title":"The gene ontology resource: enriching a GOld mine","volume":"49","author":"Carbon","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1186\/s12859-021-04479-9","article-title":"Automatic consistency assurance for literature-based gene ontology annotation","volume":"22","author":"Chen","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2023041407540461100_","article-title":"Benchmarks for measurement of duplicate detection methods in nucleotide databases","volume":"2017","author":"Chen","year":"2017","journal-title":"Database"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"baw163","DOI":"10.1093\/database\/baw163","article-title":"Duplicates, redundancies and inconsistencies in the primary nucleotide databases: A descriptive study","author":"Chen","year":"2017","journal-title":"Database,"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"e40519","DOI":"10.1371\/journal.pone.0040519","article-title":"Mining GO annotations for improving annotation consistency","volume":"7","author":"Faria","year":"2012","journal-title":"PLoS One"},{"first-page":"6533","year":"2017","author":"Fout","key":"2023041407540461100_"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1186\/1471-2105-15-59","article-title":"Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters","volume":"15","author":"Funk","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"1641","DOI":"10.1093\/bioinformatics\/18.12.1641","article-title":"Modeling the percolation of annotation errors in a database of protein sequences","volume":"18","author":"Gilks","year":"2002","journal-title":"Bioinformatics"},{"key":"2023041407540461100_","first-page":"1","volume-title":"ACM Transactions on Computing for Healthcare (HEALTH)","author":"Gu","year":"2021"},{"first-page":"1025","year":"2017","author":"Hamilton","key":"2023041407540461100_"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1016\/j.jbi.2013.07.011","article-title":"The DDI corpus: an annotated corpus with pharmacological substances and drug\u2013drug interactions","volume":"46","author":"Herrero-Zazo","year":"2013","journal-title":"J. Biomed. Inform"},{"year":"2019","author":"Hu","key":"2023041407540461100_"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/nar\/gkn923","article-title":"Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists","volume":"37","author":"Huang","year":"2009","journal-title":"Nucleic Acids Res"},{"first-page":"448","year":"2015","author":"Ioffe","key":"2023041407540461100_"},{"first-page":"81","year":"2009","author":"Kolb","key":"2023041407540461100_"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"1417","DOI":"10.3233\/JAD-200207","article-title":"Gene ontology curation of neuroinflammation biology improves the interpretation of Alzheimer\u2019s disease gene expression data","volume":"75","author":"Kramarz","year":"2020","journal-title":"J. Alzheimers. Dis"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-018-2103-8","article-title":"Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature","volume":"19","author":"M\u00fcller","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"43","DOI":"10.2478\/pralin-2018-0002","article-title":"Training tips for the transformer model","volume":"110","author":"Popel","year":"2018","journal-title":"Prague Bull. Math. Linguist"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1007\/978-1-4939-3743-1_4","volume-title":"The Gene Ontology Handbook","author":"Poux","year":"2017"},{"first-page":"1","year":"2005","author":"Rosenstein","key":"2023041407540461100_"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"e1002533","DOI":"10.1371\/journal.pcbi.1002533","article-title":"Quality of computationally inferred gene ontology annotations","volume":"8","author":"\u0160kunca","year":"2012","journal-title":"PLoS Comput. Biol"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"i49","DOI":"10.1093\/bioinformatics\/btx238","article-title":"BIOSSES: a semantic sentence similarity estimation system for the biomedical domain","volume":"33","author":"So\u011fanc Io\u011flu","year":"2017","journal-title":"Bioinformatics"},{"year":"2010","author":"Tanenblatt","key":"2023041407540461100_"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1007\/978-1-4939-3743-1_2","volume-title":"The Gene Ontology Handbook","author":"Thomas","year":"2017"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"1429","DOI":"10.1038\/s41588-019-0500-1","article-title":"Gene ontology causal activity modeling (GO-CAM) moves beyond GO annotations to structured descriptions of biological functions and systems","volume":"51","author":"Thomas","year":"2019","journal-title":"Nat. Genet"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"bau074","DOI":"10.1093\/database\/bau074","article-title":"BC4GO: a full-text corpus for the BioCreative IV GO task","volume":"2014","author":"Van Auken","year":"2014","journal-title":"Database"},{"article-title":"Deep graph library: a graph-centric, highly-performant package for graph neural networks","year":"2019","author":"Wang","key":"2023041407540461100_"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"W518","DOI":"10.1093\/nar\/gkt441","article-title":"Pubtator: a web-based text mining tool for assisting biocuration","volume":"41","author":"Wei","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023041407540461100_","doi-asserted-by":"crossref","first-page":"i457","DOI":"10.1093\/bioinformatics\/bty294","article-title":"Modeling polypharmacy side effects with graph convolutional networks","volume":"34","author":"Zitnik","year":"2018","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_1\/i273\/49886800\/btac230.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_1\/i273\/49886800\/btac230.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,22]],"date-time":"2024-09-22T06:55:48Z","timestamp":1726988148000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/Supplement_1\/i273\/6617491"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,24]]},"references-count":40,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2022,6,24]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac230","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2022,7,1]]},"published":{"date-parts":[[2022,6,24]]}}}