{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T17:18:06Z","timestamp":1776878286112,"version":"3.51.2"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T00:00:00Z","timestamp":1719878400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T00:00:00Z","timestamp":1719878400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["No.32171314"],"award-info":[{"award-number":["No.32171314"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100021171","name":"Guangdong Basic and Applied Basic Research Foundation","doi-asserted-by":"crossref","award":["2022A1515010671"],"award-info":[{"award-number":["2022A1515010671"]}],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Guangzhou Basic and Applied Basic Research Foundation","award":["202201010371"],"award-info":[{"award-number":["202201010371"]}]},{"name":"University Innovative Team Support for Major Chronic Diseases and Drug Development","award":["26330320901"],"award-info":[{"award-number":["26330320901"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Materials science is an interdisciplinary field that studies the properties, structures, and behaviors of different materials. 
A large amount of scientific literature contains rich knowledge in the field of materials science, but manually analyzing these papers to find material-related data is a daunting task. In information processing, named entity recognition (NER) plays a crucial role because it can automatically extract entities in the field of materials science, which are of significant value in tasks such as building knowledge graphs. The sequence labeling methods typically used for named entity recognition in materials science (MatNER) often fail to fully utilize the semantic information in the dataset and cannot effectively extract nested entities. Herein, we propose converting the sequence labeling task into a machine reading comprehension (MRC) task. The MRC method can effectively solve the challenge of extracting multiple overlapping entities by transforming it into answering multiple independent questions. Moreover, by integrating prior knowledge from queries, the MRC framework allows for a more comprehensive understanding of the contextual information and semantic relationships within materials science literature. With the MRC approach, state-of-the-art (SOTA) performance was achieved on the Matscholar, BC4CHEMD, NLMChem, SOFC, and SOFC-Slot datasets, with F1-scores of 89.64%, 94.30%, 85.89%, 85.95%, and 71.73%, respectively. 
By effectively utilizing semantic information and extracting nested entities, this approach holds great significance for knowledge extraction and data analysis in the field of materials science, thus accelerating the development of materials science.<\/jats:p><jats:p><jats:bold>Scientific contribution<\/jats:bold><\/jats:p><jats:p>We have developed an innovative NER method that enhances the efficiency and accuracy of automatic entity extraction in the field of materials science by transforming the sequence labeling task into an MRC task. This approach provides robust support for constructing knowledge graphs and other data analysis tasks.<\/jats:p>","DOI":"10.1186\/s13321-024-00874-5","type":"journal-article","created":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T20:42:13Z","timestamp":1719952933000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Application of machine reading comprehension techniques for named entity recognition in materials 
science"],"prefix":"10.1186","volume":"16","author":[{"given":"Zihui","family":"Huang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liqiang","family":"He","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuhang","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andi","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiwen","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siwei","family":"Wu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yan","family":"He","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xujie","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,7,2]]},"reference":[{"key":"874_CR1","doi-asserted-by":"publisher","first-page":"3692","DOI":"10.1021\/acs.jcim.9b00470","volume":"59","author":"L Weston","year":"2019","unstructured":"Weston L, Tshitoyan V, Dagdelen J, Kononova O, Trewartha A, Persson KA, Ceder G, Jain A (2019) Named entity recognition and normalization applied to large-scale information extraction from the materials science literature. J Chem Inf Model 59:3692\u20133702","journal-title":"J Chem Inf Model"},{"key":"874_CR2","doi-asserted-by":"publisher","first-page":"1207","DOI":"10.1021\/acs.jcim.1c01199","volume":"62","author":"T Isazawa","year":"2022","unstructured":"Isazawa T, Cole JM (2022) Single model for organic and inorganic chemical named entity recognition in ChemDataExtractor. 
J Chem Inf Model 62:1207\u20131213","journal-title":"J Chem Inf Model"},{"key":"874_CR3","doi-asserted-by":"publisher","first-page":"S3","DOI":"10.1186\/1758-2946-7-S1-S3","volume":"7","author":"R Leaman","year":"2015","unstructured":"Leaman R, Wei C, Lu Z (2015) tmChem: a high performance approach for chemical named entity recognition and normalization. J Cheminformatics 7:S3\u2013S3","journal-title":"J Cheminformatics"},{"key":"874_CR4","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1186\/1758-2946-6-17","volume":"6","author":"S Eltyeb","year":"2014","unstructured":"Eltyeb S, Salim N (2014) Chemical named entities recognition: a review on approaches and applications. J Cheminformatics 6:17","journal-title":"J Cheminformatics"},{"key":"874_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41524-022-00734-6","volume":"8","author":"K Choudhary","year":"2022","unstructured":"Choudhary K, DeCost B, Chen C, Jain A, Tavazza F, Cohn R, Park CW, Choudhary A, Agrawal A, Billinge SJ, Holm E (2022) Recent advances and applications of deep learning methods in materials science. NPJ Comput Mater 8:1\u201326","journal-title":"NPJ Comput Mater"},{"key":"874_CR6","doi-asserted-by":"publisher","first-page":"1633","DOI":"10.1093\/bioinformatics\/bts183","volume":"28","author":"T Rockt\u00e4schel","year":"2012","unstructured":"Rockt\u00e4schel T, Weidlich M, Leser U (2012) ChemSpot: a hybrid system for chemical named entity recognition. Bioinformatics 28:1633\u20131640","journal-title":"Bioinformatics"},{"key":"874_CR7","unstructured":"Humphreys K, Gaizauskas R, Azzam S (1998) University of Sheffield: Description of the LaSIE-II System as Used for MUC-7. 
In Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia"},{"key":"874_CR8","doi-asserted-by":"publisher","first-page":"S14","DOI":"10.1186\/1471-2105-6-S1-S14","volume":"6","author":"D Hanisch","year":"2005","unstructured":"Hanisch D, Fundel K, Mevissen H, Zimmer R, Fluck J (2005) ProMiner: rule-based protein and gene entity recognition. BMC Bioinformatics 6:S14","journal-title":"BMC Bioinformatics"},{"key":"874_CR9","doi-asserted-by":"crossref","unstructured":"Quimbaya AP (2016) Named entity recognition over electronic health records through a combined dictionary-based approach. Procedia Comput Sci","DOI":"10.1016\/j.procs.2016.09.123"},{"key":"874_CR10","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1023\/A:1007558221122","volume":"34","author":"DM Bikel","year":"1999","unstructured":"Bikel DM, Schwartz R, Weischedel RM (1999) An algorithm that learns what\u2019s in a name. Mach Learn 34:211\u2013231","journal-title":"Mach Learn"},{"key":"874_CR11","doi-asserted-by":"crossref","unstructured":"Rabiner LR (1989) A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. In","DOI":"10.1016\/B978-0-08-051584-7.50027-9"},{"key":"874_CR12","unstructured":"Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning; pp 282\u2013289"},{"key":"874_CR13","doi-asserted-by":"publisher","first-page":"1381","DOI":"10.1093\/bioinformatics\/btx761","volume":"34","author":"L Luo","year":"2018","unstructured":"Luo L, Yang Z, Yang P, Zhang Y, Wang L, Lin H, Wang J (2018) An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Bioinformatics 34:1381\u20131388","journal-title":"Bioinformatics"},{"key":"874_CR14","unstructured":"Lample G, M. B. S. S., (2016) Bidirectional LSTM-CRF models for sequence tagging. 
In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics; pp 260\u2013270"},{"key":"874_CR15","first-page":"856","volume":"2016","author":"AN Jagannatha","year":"2016","unstructured":"Jagannatha AN, Yu H (2016) Structured prediction models for RNN based sequence labeling in clinical text. Proc Conf Empir Methods Nat Lang Process 2016:856\u2013865","journal-title":"Proc Conf Empir Methods Nat Lang Process"},{"key":"874_CR16","doi-asserted-by":"publisher","first-page":"735","DOI":"10.1186\/s12859-019-3321-4","volume":"20","author":"H Cho","year":"2019","unstructured":"Cho H, Lee H (2019) Biomedical named entity recognition using deep neural networks with contextual information. BMC Bioinformatics 20:735","journal-title":"BMC Bioinformatics"},{"key":"874_CR17","doi-asserted-by":"crossref","unstructured":"Strubell E, Verga P, Belanger D (2017) Fast and accurate entity recognition with iterated dilated convolutions. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing; pp 2670\u20132680","DOI":"10.18653\/v1\/D17-1283"},{"key":"874_CR18","unstructured":"Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of recurrent neural networks for sequence learning. In arXiv: Computation and Language"},{"key":"874_CR19","doi-asserted-by":"crossref","unstructured":"Peters M, Neumann M (2018) Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics; pp 2227\u20132237","DOI":"10.18653\/v1\/N18-1202"},{"key":"874_CR20","unstructured":"Devlin J, Chang MW, Lee K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics; pp 4171\u20134186"},{"key":"874_CR21","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1038\/s41524-022-00784-w","volume":"8","author":"T Gupta","year":"2022","unstructured":"Gupta T, Zaki M, Krishnan NA, Mausam A (2022) MatSciBERT: a materials domain language model for text mining and information extraction. NPJ Comput Mater 8:102","journal-title":"NPJ Comput Mater"},{"key":"874_CR22","doi-asserted-by":"crossref","unstructured":"Shen Y, Huang PS, Gao J (2017) ReasoNet: learning to stop reading in machine comprehension. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; pp 1047\u20131055","DOI":"10.1145\/3097983.3098177"},{"key":"874_CR23","doi-asserted-by":"crossref","unstructured":"Levy O, Seo M, Choi E (2017) Zero-Shot Relation Extraction via Reading Comprehension. In Proceedings of the 21st Conference on Computational Natural Language Learning; pp 333\u2013342","DOI":"10.18653\/v1\/K17-1034"},{"key":"874_CR24","unstructured":"McCann B, Keskar NS, Xiong C (2018) The Natural Language Decathlon: Multitask Learning as Question Answering. In arXiv: Computation and Language"},{"key":"874_CR25","doi-asserted-by":"crossref","unstructured":"Li X, Yin F, Sun Z (2019) Entity-relation extraction as multi-turn question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; pp 1340\u20131350","DOI":"10.18653\/v1\/P19-1129"},{"key":"874_CR26","doi-asserted-by":"crossref","unstructured":"Li X, Feng J, Meng Y (2020) A Unified MRC Framework for Named Entity Recognition. 
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; pp 5849\u20135859","DOI":"10.18653\/v1\/2020.acl-main.519"},{"key":"874_CR27","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2021.103799","volume":"118","author":"C Sun","year":"2021","unstructured":"Sun C, Yang Z, Wang L, Zhang Y, Lin H, Wang J (2021) Biomedical named entity recognition using BERT in the machine reading comprehension framework. J Biomed Inform 118:103799","journal-title":"J Biomed Inform"},{"key":"874_CR28","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1758-2946-7-S1-S2","volume":"7","author":"M Krallinger","year":"2015","unstructured":"Krallinger M, Rabal O, Leitner F, Vazquez M, Salgado D, Lu Z, Leaman R, Lu Y, Ji D, Lowe DM, Sayle RA, Batista-Navarro RT, Rak R, Huber T, Rockt\u00e4schel T, Matos S, Campos D, Tang B, Xu H, Munkhdalai T, Ryu KH, Ramanan SV, Nathan S, \u017ditnik S, Bajec M, Weber L, Irmer M, Akhondi SA, Kors JA, Xu S, An X, Sikdar UKEA (2015) The CHEMDNER corpus of chemicals and drugs and its annotation principles. J Cheminformatics 7:S2","journal-title":"J Cheminformatics"},{"key":"874_CR29","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1038\/s41597-021-00875-1","volume":"8","author":"R Islamaj","year":"2021","unstructured":"Islamaj R, Leaman R, Kim S, Kwon D, Wei C, Comeau DC, Peng Y, Cissel D, Coss C, Fisher C, Guzman R, Kochar PG, Koppel S, Trinh D, Sekiya K, Ward J, Whitman D, Schmidt S, Lu Z (2021) NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature. Sci Data 8:91","journal-title":"Sci Data"},{"key":"874_CR30","doi-asserted-by":"crossref","unstructured":"Friedrich A, Adel H, Tomazic F (2020) The SOFC-Exp corpus and neural approaches to information extraction in the materials science domain. 
In Proceedings of the 58th annual meeting of the association for computational linguistics; pp 1255\u20131268","DOI":"10.18653\/v1\/2020.acl-main.116"},{"key":"874_CR31","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36:1234\u20131240","journal-title":"Bioinformatics"},{"key":"874_CR32","doi-asserted-by":"crossref","unstructured":"Beltagy I, Lo K, Cohan A (2019) In SCIBERT: A Pretrained Language Model for Scientific Text, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, 2019; Hong Kong, pp 3615\u20133620","DOI":"10.18653\/v1\/D19-1371"},{"key":"874_CR33","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1038\/s41524-023-01003-w","volume":"9","author":"P Shetty","year":"2023","unstructured":"Shetty P, Rajan AC, Kuenneth C, Gupta S, Panchumarti LP, Holm L, Zhang C, Ramprasad R (2023) A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing. NPJ Comput Mater 9:52\u201352","journal-title":"NPJ Comput Mater"},{"key":"874_CR34","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1186\/s12859-019-2813-6","volume":"20","author":"W Yoon","year":"2019","unstructured":"Yoon W, So CH, Lee J, Kang J (2019) CollaboNet: collaboration of deep neural networks for biomedical named entity recognition. BMC Bioinformatics 20:55","journal-title":"BMC Bioinformatics"},{"key":"874_CR35","doi-asserted-by":"crossref","unstructured":"Watanabe T, Tamura A, Ninomiya T, Makino T, Iwakura T (2019) Multi-task learning for chemical named entity recognition with chemical compound paraphrasing. 
In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); pp 6244\u20136249","DOI":"10.18653\/v1\/D19-1648"},{"key":"874_CR36","doi-asserted-by":"publisher","first-page":"2839","DOI":"10.1093\/bioinformatics\/btw343","volume":"32","author":"R Leaman","year":"2016","unstructured":"Leaman R, Lu Z (2016) TaggerOne: joint named entity recognition and normalization with semi-Markov Models. Bioinformatics 32:2839\u20132846","journal-title":"Bioinformatics"},{"key":"874_CR37","doi-asserted-by":"crossref","unstructured":"Peng Y (2019) Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. In Proceedings of the 18th BioNLP Workshop and Shared Task. Association for Computational Linguistic; pp 58\u201365","DOI":"10.18653\/v1\/W19-5006"},{"key":"874_CR38","doi-asserted-by":"publisher","first-page":"3976","DOI":"10.1093\/bioinformatics\/btac422","volume":"38","author":"Y Tong","year":"2022","unstructured":"Tong Y, Zhuang F, Zhang H, Fang C, Zhao Y, Wang D, Zhu H, Ni B (2022) Improving biomedical named entity recognition by dynamic caching inter-sentence information. 
Bioinformatics 38:3976\u20133983","journal-title":"Bioinformatics"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00874-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-024-00874-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00874-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T20:49:17Z","timestamp":1719953357000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-024-00874-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,2]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["874"],"URL":"https:\/\/doi.org\/10.1186\/s13321-024-00874-5","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,2]]},"assertion":[{"value":"9 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 June 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 July 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this 
paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"76"}}