{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T10:52:36Z","timestamp":1740135156988,"version":"3.37.3"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"S23","license":[{"start":{"date-parts":[[2020,12,1]],"date-time":"2020-12-01T00:00:00Z","timestamp":1606780800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2020,12,29]],"date-time":"2020-12-29T00:00:00Z","timestamp":1609200000000},"content-version":"vor","delay-in-days":28,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010663","name":"H2020 European Research Council","doi-asserted-by":"publisher","award":["654021"],"award-info":[{"award-number":["654021"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Entity normalization is an important information extraction task which has gained renewed attention in the last decade, particularly in the biomedical and life science domains. In these domains, and more generally in all specialized domains, this task is still challenging for the latest machine learning-based approaches, which have difficulty handling highly multi-class and few-shot learning problems. To address this issue, we propose C-Norm, a new neural approach which synergistically combines standard and weak supervision, ontological knowledge integration and distributional semantics.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Our approach greatly outperforms all methods evaluated on the Bacteria Biotope datasets of BioNLP Open Shared Tasks 2019, without integrating any manually-designed domain-specific rules.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Our results show that relatively shallow neural network methods can perform well in domains that present highly multi-class and few-shot learning problems.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-020-03886-8","type":"journal-article","created":{"date-parts":[[2020,12,29]],"date-time":"2020-12-29T10:02:47Z","timestamp":1609236167000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["C-Norm: a neural approach to few-shot entity normalization"],"prefix":"10.1186","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9115-8222","authenticated-orcid":false,"given":"Arnaud","family":"Ferr\u00e9","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1399-4828","authenticated-orcid":false,"given":"Louise","family":"Del\u00e9ger","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6652-9319","authenticated-orcid":false,"given":"Robert","family":"Bossy","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8410-4808","authenticated-orcid":false,"given":"Pierre","family":"Zweigenbaum","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0577-0595","authenticated-orcid":false,"given":"Claire","family":"N\u00e9dellec","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,12,29]]},"reference":[{"unstructured":"Faure D, N\u00e9dellec C. A corpus-based conceptual clustering method for verb frames and ontology acquisition. In: LREC workshop on adapting lexical and corpus resources to sublanguages and applications. 1998. p. 5\u201312.","key":"3886_CR1"},{"unstructured":"Hwang CH. Incompletely and imprecisely speaking: using dynamic ontologies for representing and retrieving information. KRDB. 1999. p. 13.","key":"3886_CR2"},{"unstructured":"N\u00e9dellec C, Bossy R, Chaix E, Deleger L. Text-mining and ontologies: new approaches to knowledge discovery of microbial diversity. In: 4th international conference on microbial diversity 2017. Marco Gobetti; 2017.","key":"3886_CR3"},{"unstructured":"Bossy R, Chaix E, Del\u00e9ger L, Ferr\u00e9 A, Ba M, Bessi\u00e8res P, et al. OntoBiotope: une ontologie pour croiser les habitats microbiens avec les analyses de g\u00e9nomes. In: Les journ\u00e9es Bioinformatique de l\u2019INRA. 2016. p. 1.","key":"3886_CR4"},{"unstructured":"Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: 8th international conference on learning representations. ICLR 2016, San Juan, Puerto Rico, May 2\u20134, 2016.","key":"3886_CR5"},{"doi-asserted-by":"crossref","unstructured":"Wang Y, Yao Q, Kwok JT, Ni LM. Generalizing from a few examples: a survey on few-shot learning. ACM Computing Surveys (CSUR). 2019.","key":"3886_CR6","DOI":"10.1145\/3386252"},{"unstructured":"Larochelle H, Erhan D, Bengio Y. Zero-data learning of new tasks. In: 23rd AAAI conference on artificial intelligence. 2008. p. 3.","key":"3886_CR7"},{"unstructured":"Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proceedings of the AMIA symposium. American Medical Informatics Association; 2001. p. 17.","key":"3886_CR8"},{"issue":"1","key":"3886_CR9","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1186\/1471-2105-11-85","volume":"11","author":"M Gerner","year":"2010","unstructured":"Gerner M, Nenadic G, Bergman CM. LINNAEUS: a species name identification system for biomedical literature. BMC Bioinform. 2010;11(1):85.","journal-title":"BMC Bioinform"},{"unstructured":"Lee H-C, Hsu Y-Y, Kao H-Y. An enhanced CRF-based system for disease name entity recognition and normalization on BioCreative V DNER Task. In: Proceedings of the 5th BioCreative challenge evaluation workshop. 2015. p. 226\u201333.","key":"3886_CR10"},{"issue":"Suppl 1","key":"3886_CR11","doi-asserted-by":"publisher","first-page":"S14","DOI":"10.1186\/1471-2105-6-S1-S14","volume":"6","author":"D Hanisch","year":"2005","unstructured":"Hanisch D, Fundel K, Mevissen H-T, Zimmer R, Fluck J. ProMiner: rule-based protein and gene entity recognition. BMC Bioinform. 2005;6(Suppl 1):S14.","journal-title":"BMC Bioinform"},{"issue":"20","key":"3886_CR12","doi-asserted-by":"publisher","first-page":"2768","DOI":"10.1093\/bioinformatics\/btm393","volume":"23","author":"Y Tsuruoka","year":"2007","unstructured":"Tsuruoka Y, McNaught J, Tsujii J, Ananiadou S. Learning string similarity measures for gene\/protein name dictionary look-up using logistic regression. Bioinformatics. 2007;23(20):2768\u201374.","journal-title":"Bioinformatics"},{"doi-asserted-by":"crossref","unstructured":"Ghiasvand O, Kate R. UWM: disorder mention extraction from clinical text using CRFs and normalization using learned edit distance patterns. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014). Dublin, Ireland: Association for Computational Linguistics and Dublin City University; 2014. p. 828\u201332.","key":"3886_CR13","DOI":"10.3115\/v1\/S14-2147"},{"unstructured":"Schuemie MJ, Jelier R, Kors JA. Peregrine: lightweight gene name normalization by dictionary lookup. In: Processing of the 2nd BioCreative challenge evaluation workshop. 2007. p. 131\u20133.","key":"3886_CR14"},{"unstructured":"Golik W, Warnier P, N\u00e9dellec C. Corpus-based extension of termino-ontology by linguistic analysis: a use case in biomedical event extraction. In: WS 2 workshop extended abstracts, international conference on terminology and artificial intelligence (TIA), Paris, France, Nov 2011. 2011. p. 37\u20139.","key":"3886_CR15"},{"key":"3886_CR16","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to information retrieval","author":"C Manning","year":"2008","unstructured":"Manning C, Raghavan P, Sch\u00fctze H. Introduction to information retrieval. Cambridge: Cambridge University Press; 2008."},{"doi-asserted-by":"crossref","unstructured":"Pennington J, Socher R, Manning C. GloVe: Global vectors for word representation. In: 2014 conference on empirical methods in natural language processing EMNLP. 2014. p. 1532\u201343.","key":"3886_CR17","DOI":"10.3115\/v1\/D14-1162"},{"unstructured":"Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:13013781. 2013.","key":"3886_CR18"},{"doi-asserted-by":"crossref","unstructured":"Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, et al. Deep contextualized word representations. In: Proceedings of the 2018 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies NAACL-HLT. 2018. p. 2227\u201337.","key":"3886_CR19","DOI":"10.18653\/v1\/N18-1202"},{"unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. p. 4171\u201386.","key":"3886_CR20"},{"doi-asserted-by":"crossref","unstructured":"Tiftikci M, Sahin H, B\u00fcy\u00fck\u00f6z B, Yay\u0131k\u00e7\u0131 A, Ozg\u00fcr A. Ontology-based categorization of bacteria and habitat entities using information retrieval techniques. In: Proceedings of the 4th BioNLP shared task workshop. 2016. p. 56.","key":"3886_CR21","DOI":"10.18653\/v1\/W16-3007"},{"key":"3886_CR22","first-page":"80","volume":"2017","author":"F Mehryary","year":"2017","unstructured":"Mehryary F, Hakala K, Kaewphan S, Bj\u00f6rne J, Salakoski T, Ginter F. End-to-end system for bacteria habitat extraction. BioNLP. 2017;2017:80.","journal-title":"BioNLP"},{"issue":"1","key":"3886_CR23","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1186\/s12859-019-2678-8","volume":"20","author":"\u0130 Karadeniz","year":"2019","unstructured":"Karadeniz \u0130, \u00d6zg\u00fcr A. Linking entities through an ontology using word embeddings and syntactic re-ranking. BMC Bioinform. 2019;20(1):156.","journal-title":"BMC Bioinform"},{"unstructured":"Roberts K. Assessing the corpus size vs. similarity trade-off for word embeddings in clinical NLP. In: Proceedings of the clinical natural language processing workshop (ClinicalNLP). Osaka, Japan: The COLING 2016 Organizing Committee. 2016. p. 54\u201363.","key":"3886_CR24"},{"doi-asserted-by":"crossref","unstructured":"Faruqui M, Dodge J, Jauhar SK, Dyer C, Hovy E, Smith NA. Retrofitting word vectors to semantic lexicons. In: Proceedings of the 2015 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2014.","key":"3886_CR25","DOI":"10.3115\/v1\/N15-1184"},{"issue":"22","key":"3886_CR26","doi-asserted-by":"publisher","first-page":"2909","DOI":"10.1093\/bioinformatics\/btt474","volume":"29","author":"R Leaman","year":"2013","unstructured":"Leaman R, Islamaj Dogan R, Lu Z. DNorm: disease name normalization with pairwise learning to rank. Bioinformatics. 2013;29(22):2909\u201317.","journal-title":"Bioinformatics"},{"key":"3886_CR27","first-page":"99","volume":"2017","author":"A Ferr\u00e9","year":"2017","unstructured":"Ferr\u00e9 A, Zweigenbaum P, N\u00e9dellec C. Representation of complex terms in a vector space structured by an ontology for a normalization task. BioNLP. 2017;2017:99\u2013106.","journal-title":"BioNLP"},{"doi-asserted-by":"crossref","unstructured":"Sil A, Kundu G, Florian R, Hamza W. Neural cross-lingual entity linking. In: 32nd AAAI conference on artificial intelligence. 2018.","key":"3886_CR28","DOI":"10.1609\/aaai.v32i1.11964"},{"doi-asserted-by":"crossref","unstructured":"Deng P, Chen H, Huang M, Ruan X, Xu L. An ensemble CNN method for biomedical entity normalization. In: Proceedings of the 5th workshop on BioNLP open shared tasks. 2019. p. 143\u20139.","key":"3886_CR29","DOI":"10.18653\/v1\/D19-5721"},{"doi-asserted-by":"crossref","unstructured":"Limsopatham N, Collier N. Normalising medical concepts in social media texts by learning semantic representation. In: ACL 2016. Berlin, Germany: Association for Computational Linguistics; 2016. p. 1014\u201323.","key":"3886_CR30","DOI":"10.18653\/v1\/P16-1096"},{"unstructured":"Ferr\u00e9 A, Del\u00e9ger L, Zweigenbaum P, N\u00e9dellec C. Combining rule-based and embedding-based approaches to normalize textual entities with an ontology. In: Proceedings of the 11th international conference on language resources and evaluation (LREC 2018). 2018.","key":"3886_CR31"},{"issue":"4","key":"3886_CR32","doi-asserted-by":"publisher","first-page":"e1249","DOI":"10.1002\/widm.1249","volume":"8","author":"O Sagi","year":"2018","unstructured":"Sagi O, Rokach L. Ensemble learning: a survey. Wiley Interdiscip Rev Data Mining Knowl Discov. 2018;8(4):e1249.","journal-title":"Wiley Interdiscip Rev Data Mining Knowl Discov"},{"doi-asserted-by":"crossref","unstructured":"Bossy R, Del\u00e9ger L, Chaix E, Ba M, N\u00e9dellec C. Bacteria biotope at BioNLP open shared tasks 2019. In: Proceedings of the 5th workshop on BioNLP open shared tasks. 2019. p. 121\u201331.","key":"3886_CR33","DOI":"10.18653\/v1\/D19-5719"},{"unstructured":"Jin-Dong K, Claire N, Robert B, Louise D. In: Proceedings of the 5th workshop on BioNLP Open Shared Tasks. 2019.","key":"3886_CR34"},{"issue":"2","key":"3886_CR35","doi-asserted-by":"publisher","first-page":"e20","DOI":"10.5808\/GI.2019.17.2.e20","volume":"17","author":"A Ferr\u00e9","year":"2019","unstructured":"Ferr\u00e9 A, Ba M, Bossy R. Improving at BLAH5 the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data. J Genomics Inform. 2019;17(2):e20.","journal-title":"J Genomics Inform"},{"unstructured":"Ferr\u00e9 A, Bossy R, Ba M, Del\u00e9ger L, Lavergne T, Zweigenbaum P, et al. Handling entity normalization with no annotated corpus: weakly supervised methods based on distributional representation and ontological information. In: Proceedings of the 12th language resources and evaluation conference (LREC). 2020. p. 1959\u201366.","key":"3886_CR36"},{"unstructured":"Dozat T. Incorporating Nesterov momentum into Adam. In: 4th international conference on learning representations (ICLR). 2016.","key":"3886_CR37"},{"unstructured":"Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:14126980. 2014.","key":"3886_CR38"},{"unstructured":"Reddi SJ, Kale S, Kumar S. On the convergence of Adam and beyond. arXiv preprint arXiv:190409237. 2019.","key":"3886_CR39"},{"unstructured":"Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models. In: Proc ICML. 2013. p. 3.","key":"3886_CR40"},{"issue":"10","key":"3886_CR41","doi-asserted-by":"publisher","first-page":"1274","DOI":"10.1093\/bioinformatics\/btm087","volume":"23","author":"JZ Wang","year":"2007","unstructured":"Wang JZ, Du Z, Payattakool R, Yu PS, Chen C-F. A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007;23(10):1274\u201381.","journal-title":"Bioinformatics"},{"doi-asserted-by":"crossref","unstructured":"Mao J, Liu W. Integration of deep learning and traditional machine learning for knowledge extraction from biomedical literature. In: Proceedings of the 5th workshop on BioNLP open shared tasks. 2019. p. 168\u2013173.","key":"3886_CR42","DOI":"10.18653\/v1\/D19-5724"},{"doi-asserted-by":"crossref","unstructured":"Karadeniz I, Tuna \u00d6F, \u00d6zg\u00fcr A. BOUN-ISIK participation: an unsupervised approach for the named entity normalization and relation extraction of bacteria biotopes. In: Proceedings of the 5th workshop on BioNLP open shared tasks. 2019. p. 150\u20137.","key":"3886_CR43","DOI":"10.18653\/v1\/D19-5722"},{"doi-asserted-by":"crossref","unstructured":"Del\u00e9ger L, Bossy R, Chaix E, Ba M, Ferr\u00e9 A, Bessi\u00e8res P, et al. Overview of the Bacteria Biotope task at BioNLP shared task 2016. In: Proceedings of the 4th BioNLP shared task workshop. 2016. p. 12\u201322.","key":"3886_CR44","DOI":"10.18653\/v1\/W16-3002"},{"issue":"1","key":"3886_CR45","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","volume":"20","author":"F Scarselli","year":"2008","unstructured":"Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Trans Neural Netw. 2008;20(1):61\u201380.","journal-title":"IEEE Trans Neural Netw"},{"doi-asserted-by":"crossref","unstructured":"Marcheggiani D, Titov I. Encoding sentences with graph convolutional networks for semantic role labeling. arXiv preprint arXiv:170304826. 2017.","key":"3886_CR46","DOI":"10.18653\/v1\/D17-1159"},{"issue":"10","key":"3886_CR47","doi-asserted-by":"publisher","first-page":"1274","DOI":"10.1093\/jamia\/ocy114","volume":"25","author":"A Sarker","year":"2018","unstructured":"Sarker A, Belousov M, Friedrichs J, Hakala K, Kiritchenko S, Mehryary F, et al. Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task. J Am Med Inform Assoc. 2018;25(10):1274\u201383.","journal-title":"J Am Med Inform Assoc"},{"unstructured":"Roberts K, Demner-Fushman D, Tonning JM. Overview of the TAC 2017 adverse reaction extraction from drug labels track. In: Text analysis conference (TAC). 2017.","key":"3886_CR48"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03886-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-020-03886-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03886-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,9]],"date-time":"2022-12-09T20:32:11Z","timestamp":1670617931000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-03886-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":48,"journal-issue":{"issue":"S23","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3886"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-03886-8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2020,12]]},"assertion":[{"value":"12 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 November 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 December 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"579"}}