{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T12:24:54Z","timestamp":1764937494623},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,12,6]],"date-time":"2022-12-06T00:00:00Z","timestamp":1670284800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,6]],"date-time":"2022-12-06T00:00:00Z","timestamp":1670284800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>The roles of antibody and antigen are indispensable in targeted diagnosis, therapy, and biomedical discovery. On top of that, massive numbers of new scientific articles about antibodies and\/or antigens are published each year, which is a precious knowledge resource but has yet been exploited to its full potential. We, therefore, aim to develop a biomedical natural language processing tool that can automatically identify antibody and antigen entities from articles.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We first annotated an antibody-antigen corpus including 3210 relevant PubMed abstracts using a semi-automatic approach. The Inter-Annotator Agreement score of 3 annotators ranges from 91.46 to 94.31%, indicating that the annotations are consistent and the corpus is reliable. We then used the corpus to develop and optimize BiLSTM-CRF-based and BioBERT-based models. The models achieved overall F1 scores of 62.49% and 81.44%, respectively, which showed potential for newly studied entities. The two models served as foundation for development of a named entity recognition (NER) tool that automatically recognizes antibody and antigen names from biomedical literature.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Our antibody-antigen NER models enable users to automatically extract antibody and antigen names from scientific articles without manually scanning through vast amounts of data and information in the literature. The output of NER can be used to automatically populate antibody-antigen databases, support antibody validation, and facilitate researchers with the most appropriate antibodies of interest. The packaged NER model is available at<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/TrangDinh44\/ABAG_BioBERT.git\">https:\/\/github.com\/TrangDinh44\/ABAG_BioBERT.git<\/jats:ext-link>.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-022-04993-4","type":"journal-article","created":{"date-parts":[[2022,12,6]],"date-time":"2022-12-06T14:06:47Z","timestamp":1670335607000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Extract antibody and antigen names from biomedical literature"],"prefix":"10.1186","volume":"23","author":[{"given":"Thuy Trang","family":"Dinh","sequence":"first","affiliation":[]},{"given":"Trang Phuong","family":"Vo-Chanh","sequence":"additional","affiliation":[]},{"given":"Chau","family":"Nguyen","sequence":"additional","affiliation":[]},{"given":"Viet Quoc","family":"Huynh","sequence":"additional","affiliation":[]},{"given":"Nam","family":"Vo","sequence":"additional","affiliation":[]},{"given":"Hoang Duc","family":"Nguyen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,12,6]]},"reference":[{"key":"4993_CR1","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1093\/ilar.46.3.258","volume":"46","author":"NS Lipman","year":"2005","unstructured":"Lipman NS, Jackson LR, Trudel LJ, Weis-Garcia F. Monoclonal versus polyclonal antibodies: distinguishing characteristics, applications, and information resources. ILAR J. 2005;46:258\u201368.","journal-title":"ILAR J"},{"key":"4993_CR2","doi-asserted-by":"publisher","first-page":"38","DOI":"10.3390\/data2040038","volume":"2","author":"S Subramanian","year":"2017","unstructured":"Subramanian S, Ganapathiraju MK. Antibody exchange: information extraction of biological antibody donation and a web-portal to find donors and seekers. Data. 2017;2:38.","journal-title":"Data"},{"key":"4993_CR3","doi-asserted-by":"publisher","first-page":"e1008967","DOI":"10.1371\/journal.pcbi.1008967","volume":"17","author":"C-N Hsu","year":"2021","unstructured":"Hsu C-N, Chang C-H, Poopradubsil T, Lo A, William KA, Lin K-W, et al. Antibody watch: text mining antibody specificity from the literature. PLOS Comput Biol. 2021;17:e1008967.","journal-title":"PLOS Comput Biol"},{"key":"4993_CR4","doi-asserted-by":"publisher","first-page":"D1140","DOI":"10.1093\/nar\/gkt1043","volume":"42","author":"J Dunbar","year":"2014","unstructured":"Dunbar J, Krawczyk K, Leem J, Baker T, Fuchs A, Georges G, et al. SAbDab: the structural antibody database. Nucleic Acids Res. 2014;42:D1140\u20136.","journal-title":"Nucleic Acids Res"},{"key":"4993_CR5","unstructured":"The Antibody Registry. https:\/\/antibodyregistry.org\/. Accessed 11 Feb 2022."},{"key":"4993_CR6","doi-asserted-by":"publisher","first-page":"D261","DOI":"10.1093\/nar\/gkz714","volume":"48","author":"WC Lima","year":"2020","unstructured":"Lima WC, Gasteiger E, Marcatili P, Duek P, Bairoch A, Cosson P. The ABCD database: a repository for chemically defined antibodies. Nucleic Acids Res. 2020;48:D261\u20134.","journal-title":"Nucleic Acids Res"},{"key":"4993_CR7","unstructured":"Li J, Sun A, Han J, Li C. A Survey on Deep Learning for Named Entity Recognition. ArXiv181209449 Cs. 2020."},{"key":"4993_CR8","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436\u201344.","journal-title":"Nature"},{"key":"4993_CR9","doi-asserted-by":"publisher","first-page":"1547","DOI":"10.1093\/bioinformatics\/btx815","volume":"34","author":"Q Zhu","year":"2018","unstructured":"Zhu Q, Li X, Conesa A, Pereira C. GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text. Bioinformatics. 2018;34:1547\u201354.","journal-title":"Bioinformatics"},{"key":"4993_CR10","unstructured":"Li L, Jin L, Jiang Z, Song D, Huang D. Biomedical named entity recognition based on extended recurrent neural networks. In: 2015 IEEE international conference on bioinformatics and biomedicine (BIBM). 2015. p. 649\u201352."},{"key":"4993_CR11","unstructured":"Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv150801991 Cs. 2015."},{"key":"4993_CR12","doi-asserted-by":"crossref","unstructured":"Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. arXiv160301360 Cs. 2016.","DOI":"10.18653\/v1\/N16-1030"},{"key":"4993_CR13","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1007\/978-3-030-51310-8_3","volume-title":"Natural language processing and information systems","author":"F Saad","year":"2020","unstructured":"Saad F, Aras H, Hackl-Sommer R. Improving named entity recognition for biomedical and patent data using Bi-LSTM deep neural network models. In: M\u00e9tais E, Meziane F, Horacek H, Cimiano P, editors. Natural language processing and information systems. Cham: Springer; 2020. p. 25\u201336."},{"key":"4993_CR14","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36:1234\u201340.","journal-title":"Bioinformatics"},{"key":"4993_CR15","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, vol. 1 (long and short papers). Minneapolis: Association for Computational Linguistics; 2019. p. 4171\u201386."},{"key":"4993_CR16","doi-asserted-by":"publisher","first-page":"7673","DOI":"10.1021\/acs.chemrev.6b00851","volume":"117","author":"M Krallinger","year":"2017","unstructured":"Krallinger M, Rabal O, Louren\u00e7o A, Oyarzabal J, Valencia A. Information retrieval and text mining technologies for chemistry. Chem Rev. 2017;117:7673\u2013761.","journal-title":"Chem Rev"},{"key":"4993_CR17","doi-asserted-by":"crossref","unstructured":"Naseem U, Musial K, Eklund P, Prasad M. Biomedical named-entity recognition by hierarchically fusing BioBERT representations and deep contextual-level word-embedding. In: 2020 International joint conference on neural networks (IJCNN). Glasgow, United Kingdom: IEEE; 2020. p. 1\u20138.","DOI":"10.1109\/IJCNN48605.2020.9206808"},{"key":"4993_CR18","doi-asserted-by":"crossref","unstructured":"Gondane S. Neural network to identify personal health experience mention in tweets using BioBERT embeddings. In: Proceedings of the fourth social media mining for health applications (#SMM4H) workshop and shared task. Florence, Italy: Association for Computational Linguistics; 2019. p. 110\u20133.","DOI":"10.18653\/v1\/W19-3218"},{"key":"4993_CR19","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1038\/s41597-019-0342-9","volume":"7","author":"J Legrand","year":"2020","unstructured":"Legrand J, Gogdemir R, Bousquet C, Dalleau K, Devignes M-D, Digan W, et al. PGxCorpus, a manually annotated corpus for pharmacogenomics. Sci Data. 2020;7:3.","journal-title":"Sci Data"},{"key":"4993_CR20","unstructured":"Collier N, Kim J-D. Introduction to the bio-entity recognition task at JNLPBA. In: Proceedings of the international joint workshop on natural language processing in biomedicine and its applications (NLPBA\/BioNLP). Geneva, Switzerland: COLING; 2004. p. 73\u20138."},{"key":"4993_CR21","unstructured":"Papers with Code\u2014JNLPBA Benchmark (Named Entity Recognition). https:\/\/paperswithcode.com\/sota\/named-entity-recognition-ner-on-jnlpba. Accessed 12 Jun 2021."},{"key":"4993_CR22","unstructured":"Faessler E, Modersohn L, Lohr C, Hahn U. ProGene\u2014a large-scale, high-quality protein-gene annotated benchmark corpus. In: Proceedings of the 12th language resources and evaluation conference. Marseille, France: European Language Resources Association; 2020. p. 4585\u201396."},{"key":"4993_CR23","doi-asserted-by":"publisher","first-page":"W523","DOI":"10.1093\/nar\/gky428","volume":"46","author":"D Kwon","year":"2018","unstructured":"Kwon D, Kim S, Wei C-H, Leaman R, Lu Z. ezTag: tagging biomedical concepts via interactive learning. Nucleic Acids Res. 2018;46:W523-9.","journal-title":"Nucleic Acids Res."},{"key":"4993_CR24","doi-asserted-by":"publisher","first-page":"W5","DOI":"10.1093\/nar\/gkaa333","volume":"48","author":"R Islamaj","year":"2020","unstructured":"Islamaj R, Kwon D, Kim S, Lu Z. TeamTat: a collaborative text annotation tool. Nucleic Acids Res. 2020;48:W5-11.","journal-title":"Nucleic Acids Res"},{"key":"4993_CR25","unstructured":"sonvx. anaGo. Python. 2021."},{"key":"4993_CR26","unstructured":"Nakayama H. anaGo. Python. 2021."},{"key":"4993_CR27","doi-asserted-by":"crossref","unstructured":"Fadil I, Yuniarto D, Firmansyah E, Herdiana D, Supriadi F, Rahman A. File training generator for indonesian language in named entity recognition using Anago Library. 2021.","DOI":"10.4108\/eai.11-7-2019.2297618"},{"key":"4993_CR28","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1186\/s12911-021-01395-z","volume":"21","author":"L Campillos-Llanos","year":"2021","unstructured":"Campillos-Llanos L, Valverde-Mateos A, Capllonch-Carri\u00f3n A, Moreno-Sandoval A. A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine. BMC Med Inform Decis Mak. 2021;21:69.","journal-title":"BMC Med Inform Decis Mak"},{"key":"4993_CR29","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","volume":"33","author":"JR Landis","year":"1977","unstructured":"Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159\u201374.","journal-title":"Biometrics"},{"key":"4993_CR30","doi-asserted-by":"publisher","first-page":"914","DOI":"10.1016\/j.jbi.2013.07.011","volume":"46","author":"M Herrero-Zazo","year":"2013","unstructured":"Herrero-Zazo M, Segura-Bedmar I, Mart\u00ednez P, Declerck T. The DDI corpus: an annotated corpus with pharmacological substances and drug\u2013drug interactions. J Biomed Inform. 2013;46:914\u201320.","journal-title":"J Biomed Inform"},{"key":"4993_CR31","doi-asserted-by":"publisher","first-page":"73627","DOI":"10.1109\/ACCESS.2019.2920734","volume":"7","author":"H Wei","year":"2019","unstructured":"Wei H, Gao M, Zhou A, Chen F, Qu W, Wang C, et al. Named entity recognition from biomedical texts using a fusion attention-based BiLSTM-CRF. IEEE Access. 2019;7:73627\u201336.","journal-title":"IEEE Access"},{"key":"4993_CR32","doi-asserted-by":"crossref","unstructured":"Dai X, Karimi S, Hachey B, Paris C. Using Similarity Measures to Select Pretraining Data for NER. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, vol. 1 (long and short papers). Minneapolis, Minnesota: Association for Computational Linguistics; 2019. p. 1460\u201370.","DOI":"10.18653\/v1\/N19-1149"},{"key":"4993_CR33","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1186\/s12859-019-2813-6","volume":"20","author":"W Yoon","year":"2019","unstructured":"Yoon W, So CH, Lee J, Kang J. CollaboNet: collaboration of deep neural networks for biomedical named entity recognition. BMC Bioinform. 2019;20:249.","journal-title":"BMC Bioinform"},{"key":"4993_CR34","doi-asserted-by":"publisher","first-page":"735","DOI":"10.1186\/s12859-019-3321-4","volume":"20","author":"H Cho","year":"2019","unstructured":"Cho H, Lee H. Biomedical named entity recognition using deep neural networks with contextual information. BMC Bioinform. 2019;20:735.","journal-title":"BMC Bioinform"},{"key":"4993_CR35","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1186\/s12859-022-04810-y","volume":"23","author":"I Segura-Bedmar","year":"2022","unstructured":"Segura-Bedmar I, Camino-Perdones D, Guerrero-Aspizua S. Exploring deep learning methods for recognizing rare diseases and their clinical manifestations from texts. BMC Bioinform. 2022;23:263.","journal-title":"BMC Bioinform"},{"key":"4993_CR36","doi-asserted-by":"publisher","first-page":"1381","DOI":"10.1093\/bioinformatics\/btx761","volume":"34","author":"L Luo","year":"2018","unstructured":"Luo L, Yang Z, Yang P, Zhang Y, Wang L, Lin H, et al. An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Bioinformatics. 2018;34:1381\u20138.","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04993-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-022-04993-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04993-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,16]],"date-time":"2023-03-16T04:59:22Z","timestamp":1678942762000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-022-04993-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,6]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["4993"],"URL":"https:\/\/doi.org\/10.1186\/s12859-022-04993-4","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,6]]},"assertion":[{"value":"25 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 October 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publications"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"524"}}