{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T17:02:51Z","timestamp":1759683771773},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1576,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Ontologies are an everyday tool in biomedicine to capture and represent knowledge. However, many ontologies lack a high degree of coverage in their domain and need to improve their overall quality and maturity. Automatically extending sets of existing terms will enable ontology engineers to systematically improve text-based ontologies level by level.<\/jats:p><jats:p>Results: We developed an approach to extend ontologies by discovering new terms which are in a sibling relationship to existing terms of an ontology. For this purpose, we combined two approaches which retrieve new terms from the web. The first approach extracts siblings by exploiting the structure of HTML documents, whereas the second approach uses text mining techniques to extract siblings from unstructured text. Our evaluation against MeSH (Medical Subject Headings) shows that our method for sibling discovery is able to suggest first-class ontology terms and can be used as an initial step towards assessing the completeness of ontologies. The evaluation yields a recall of 80% at a precision of 61% where the two independent approaches are complementing each other. For MeSH in particular, we show that it can be considered complete in its medical focus area. We integrated the work into DOG4DAG, an ontology generation plugin for the editors OBO-Edit and Prot\u00e9g\u00e9, making it the first plugin that supports sibling discovery on-the-fly.<\/jats:p><jats:p>Availability: Sibling discovery for ontology is available as part of DOG4DAG (www.biotec.tu-dresden.de\/research\/schroeder\/dog4dag) for both Prot\u00e9g\u00e9 4.1 and OBO-Edit 2.1.<\/jats:p><jats:p>Contact: \u00a0ms@biotec.tu-dresden.de; goetz.fabian@biotec.tu-dresden.de<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts215","type":"journal-article","created":{"date-parts":[[2012,6,11]],"date-time":"2012-06-11T14:09:18Z","timestamp":1339423758000},"page":"i292-i300","source":"Crossref","is-referenced-by-count":11,"title":["Extending ontologies by finding siblings using set expansion techniques"],"prefix":"10.1093","volume":"28","author":[{"given":"G\u00f6tz","family":"Fabian","sequence":"first","affiliation":[{"name":"Biotechnology Center (BIOTEC), Technische Universit\u00e4t Dresden, 01062 Dresden, Germany"}]},{"given":"Thomas","family":"W\u00e4chter","sequence":"additional","affiliation":[{"name":"Biotechnology Center (BIOTEC), Technische Universit\u00e4t Dresden, 01062 Dresden, Germany"}]},{"given":"Michael","family":"Schroeder","sequence":"additional","affiliation":[{"name":"Biotechnology Center (BIOTEC), Technische Universit\u00e4t Dresden, 01062 Dresden, Germany"}]}],"member":"286","published-online":{"date-parts":[[2012,6,9]]},"reference":[{"key":"2023012512390093500_B1","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene Ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023012512390093500_B2","article-title":"Overview of the TREC 2010 entity track","volume-title":"Proceedings of the Nineteenth Text REtrieval Conference (TREC 2010)","author":"Balog","year":"2011"},{"key":"2023012512390093500_B3","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The Unified Medical Language System (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012512390093500_B4","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1007\/11730262_5","article-title":"Discovering Multi Terms and Co-hyponymy from XHTML Documents with XTREEM","volume-title":"Knowledge Discovery from XML Documents.","author":"Brunzel","year":"2006"},{"key":"2023012512390093500_B5","doi-asserted-by":"crossref","first-page":"W372","DOI":"10.1093\/nar\/gkn252","article-title":"The Ontology Lookup Service: more data and better tools for controlled vocabulary queries","volume":"36","author":"C\u00f4t\u00e9","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012512390093500_B6","doi-asserted-by":"crossref","first-page":"2198","DOI":"10.1093\/bioinformatics\/btm112","article-title":"OBO-Edit\u2013an ontology editor for biologists","volume":"23","author":"Day-Richter","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012512390093500_B7","doi-asserted-by":"crossref","first-page":"W783","DOI":"10.1093\/nar\/gki470","article-title":"GoPubMed: exploring PubMed with the Gene Ontology","volume":"33","author":"Doms","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012512390093500_B8","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.artint.2005.03.001","article-title":"Unsupervised named-entity extraction from the Web: an experimental study","volume":"165","author":"Etzioni","year":"2005","journal-title":"Artif. Intell."},{"key":"2023012512390093500_B9","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/s007999900023","article-title":"Automatic recognition of multi-word terms: the C-value\/NC-value Method","volume":"3","author":"Frantzi","year":"2000","journal-title":"Int. J. Digit. Libr."},{"key":"2023012512390093500_B10","doi-asserted-by":"crossref","first-page":"539","DOI":"10.3115\/992133.992154","article-title":"Automatic acquisition of hyponyms from large text corpora","volume-title":"Proceedings of the 14th Conference on Computational Linguistics.","author":"Hearst","year":"1992"},{"key":"2023012512390093500_B11","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1038\/455047a","article-title":"Big data: the future of biocuration","volume":"455","author":"Howe","year":"2008","journal-title":"Nature"},{"key":"2023012512390093500_B12","first-page":"1048","article-title":"Semantic class learning from the web with hyponym pattern linkage graphs","volume-title":"Proceedings of ACL-08: HLT","author":"Kozareva","year":"2008"},{"key":"2023012512390093500_B13","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1145\/502512.502558","article-title":"Induction of semantic classes from natural language text","volume-title":"Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Lin","year":"2001"},{"key":"2023012512390093500_B14","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/j.jbi.2010.07.006","article-title":"Natural language processing methods and systems for biomedical ontology learning","volume":"44","author":"Liu","year":"2011","journal-title":"J. Biomed. Inform."},{"key":"2023012512390093500_B15","first-page":"214","article-title":"The compositional structure of gene ontology terms","volume-title":"Pacific Symposium on Biocomputing","author":"Ogren","year":"2004"},{"key":"2023012512390093500_B16","first-page":"938","article-title":"Web-scale distributional similarity and entity set expansion","volume-title":"Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing","author":"Pantel","year":"2009"},{"key":"2023012512390093500_B17","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1145\/1031171.1031194","article-title":"Acquisition of categorized named entities for web search","volume-title":"Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management","author":"Pa\u015fca,M.","year":"2004"},{"key":"2023012512390093500_B18","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1186\/1471-2105-10-125","article-title":"Survey-based naming conventions for use in OBO Foundry ontology development","volume":"10","author":"Schober","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012512390093500_B19","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1145\/1458082.1458329","article-title":"Pattern-based semantic class discovery with multi-membership support","volume-title":"Proceeding of the 17th ACM Conference on Information and Knowledge Management","author":"Shi","year":"2008"},{"key":"2023012512390093500_B20","first-page":"993","article-title":"Corpus-based semantic class mining: distributional vs. pattern-based approaches","volume-title":"Proceedings of the 23rd International Conference on Computational Linguistics","author":"Shi","year":"2010"},{"key":"2023012512390093500_B21","first-page":"73","article-title":"Acquiring hyponymy relations from web documents","volume":"2004","author":"Shinzato","year":"2004","journal-title":"Proc. HLT-NAACL"},{"key":"2023012512390093500_B22","doi-asserted-by":"crossref","first-page":"i88","DOI":"10.1093\/bioinformatics\/btq188","article-title":"Semi-automated ontology generation within OBO-Edit","volume":"26","author":"W\u00e4chter","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512390093500_B23","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1109\/ICDM.2007.104","article-title":"Language-independent set expansion of named entities using the web","author":"Wang","year":"2007","journal-title":"2007 Seventh IEEE International Conference on Data Mining"},{"key":"2023012512390093500_B24","doi-asserted-by":"crossref","first-page":"866","DOI":"10.1093\/bioinformatics\/btl005","article-title":"The MGED Ontology: a resource for semantics-based description of microarray experiments","volume":"22","author":"Whetzel","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012512390093500_B25","doi-asserted-by":"crossref","first-page":"W541","DOI":"10.1093\/nar\/gkr469","article-title":"BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications","volume":"39","author":"Whetzel","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012512390093500_B26","doi-asserted-by":"crossref","first-page":"e1001055","DOI":"10.1371\/journal.pcbi.1001055","article-title":"Benchmarking ontologies: bigger or better?","volume":"7","author":"Yao","year":"2011","journal-title":"PLoS Comput. Biol."},{"key":"2023012512390093500_B27","first-page":"459","article-title":"Employing topic models for pattern-based semantic class discovery","author":"Zhang","year":"2009","journal-title":"Proceedings of ACL\/AFNLP 2009"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/12\/i292\/48880252\/bioinformatics_28_12_i292.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/12\/i292\/48880252\/bioinformatics_28_12_i292.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,25]],"date-time":"2024-04-25T06:50:44Z","timestamp":1714027844000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/12\/i292\/268419"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,6,9]]},"references-count":27,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2012,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts215","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,6,15]]},"published":{"date-parts":[[2012,6,9]]}}}