{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T04:36:28Z","timestamp":1742963788779,"version":"3.40.3"},"publisher-location":"Cham","reference-count":18,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031657931"},{"type":"electronic","value":"9783031657948"}],"license":[{"start":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T00:00:00Z","timestamp":1704067200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,15]],"date-time":"2024-08-15T00:00:00Z","timestamp":1723680000000},"content-version":"vor","delay-in-days":227,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper describes our systems for the sub-task I in the Software Mention Detection in Scholarly Publications shared-task. We propose three approaches leveraging different pre-trained language models (BERT, SciBERT, and XLM-R) to tackle this challenge. Our best-performing system addresses the named entity recognition (NER) problem through a three-stage framework. (1) Entity Sentence Classification - classifies sentences containing potential software mentions; (2) Entity Extraction - detects mentions within classified sentences; (3) Entity Type Classification - categorizes detected mentions into specific software types. Experiments on the official dataset demonstrate that our three-stage framework achieves competitive performance, surpassing both other participating teams and our alternative approaches. As a result, our framework based on the XLM-R-based model achieves a weighted F1-score of 67.80%, delivering our team the 3rd rank in Sub-task I for the Software Mention Recognition task. We release our source code at this repository (<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/thuynguyen2003\/NER-Three-Stage-Framework-for-Software-Mention-Recognition\">https:\/\/github.com\/thuynguyen2003\/NER-Three-Stage-Framework-for-Software-Mention-Recognition<\/jats:ext-link>).<\/jats:p>","DOI":"10.1007\/978-3-031-65794-8_18","type":"book-chapter","created":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T06:02:44Z","timestamp":1723615364000},"page":"257-266","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Software Mention Recognition with\u00a0a\u00a0Three-Stage Framework Based on\u00a0BERTology Models at\u00a0SOMD 2024"],"prefix":"10.1007","author":[{"given":"Thuy","family":"Nguyen Thi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anh","family":"Nguyen Viet","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thin","family":"Dang Van","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ngan","family":"Luu-Thuy Nguyen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,8,15]]},"reference":[{"key":"18_CR1","doi-asserted-by":"publisher","unstructured":"Arora, J., Park, Y.: Split-NER: named entity recognition via two question-answering-based classifications. In: Rogers, A., Boyd-Graber, J., Okazaki, N. (eds.) Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Toronto, Canada, pp. 416\u2013426. Association for Computational Linguistics (2023). https:\/\/doi.org\/10.18653\/v1\/2023.acl-short.36. https:\/\/aclanthology.org\/2023.acl-short.36","DOI":"10.18653\/v1\/2023.acl-short.36"},{"key":"18_CR2","doi-asserted-by":"publisher","unstructured":"Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3615\u20133620. Association for Computational Linguistics (2019). https:\/\/doi.org\/10.18653\/v1\/D19-1371. https:\/\/aclanthology.org\/D19-1371","DOI":"10.18653\/v1\/D19-1371"},{"key":"18_CR3","doi-asserted-by":"crossref","unstructured":"Chen, T., et al.: RoBERT-Agr: an entity relationship extraction model of massive agricultural text based on the RoBERTa and CRF algorithm. In: 2023 IEEE 8th International Conference on Big Data Analytics (ICBDA), pp. 113\u2013120. IEEE (2023)","DOI":"10.1109\/ICBDA57405.2023.10105090"},{"key":"18_CR4","doi-asserted-by":"publisher","unstructured":"Conneau, A., et al.: Unsupervised cross-lingual representation learning at scale. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 8440\u20138451. Association for Computational Linguistics (2020). https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.747. https:\/\/aclanthology.org\/2020.acl-main.747","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"18_CR5","doi-asserted-by":"publisher","DOI":"10.1016\/j.dajour.2024.100426","volume":"10","author":"A Dash","year":"2024","unstructured":"Dash, A., Darshana, S., Yadav, D.K., Gupta, V.: A clinical named entity recognition model using pretrained word embedding and deep neural networks. Decis. Anal. J. 10, 100426 (2024)","journal-title":"Decis. Anal. J."},{"key":"18_CR6","doi-asserted-by":"publisher","unstructured":"Derczynski, L., Nichols, E., van Erp, M., Limsopatham, N.: Results of the WNUT2017 shared task on novel and emerging entity recognition. In: Proceedings of the 3rd Workshop on Noisy User-Generated Text, Copenhagen, Denmark, pp. 140\u2013147. Association for Computational Linguistics (2017). https:\/\/doi.org\/10.18653\/v1\/W17-4418. https:\/\/www.aclweb.org\/anthology\/W17-4418","DOI":"10.18653\/v1\/W17-4418"},{"key":"18_CR7","doi-asserted-by":"publisher","unstructured":"Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171\u20134186. Association for Computational Linguistics (2019). https:\/\/doi.org\/10.18653\/v1\/N19-1423. https:\/\/aclanthology.org\/N19-1423","DOI":"10.18653\/v1\/N19-1423"},{"key":"18_CR8","unstructured":"Simperl, E., Peter\u00a0Clark, K.K.: Natural scientific language processing and research knowledge graphs. In: Lecture Notes in Artificial Intelligence (2024)"},{"issue":"4","key":"18_CR9","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1016\/j.compbiolchem.2009.07.004","volume":"33","author":"L Li","year":"2009","unstructured":"Li, L., Zhou, R., Huang, D.: Two-phase biomedical named entity recognition using CRFs. Comput. Biol. Chem. 33(4), 334\u2013338 (2009)","journal-title":"Comput. Biol. Chem."},{"key":"18_CR10","doi-asserted-by":"crossref","unstructured":"Lopez, P., Du, C., Cohoon, J., Ram, K., Howison, J.: Mining software entities in scientific literature: document-level NER for an extremely imbalance and large-scale task. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 3986\u20133995 (2021)","DOI":"10.1145\/3459637.3481936"},{"issue":"8","key":"18_CR11","doi-asserted-by":"publisher","first-page":"1381","DOI":"10.1093\/bioinformatics\/btx761","volume":"34","author":"L Luo","year":"2018","unstructured":"Luo, L., et al.: An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Bioinformatics 34(8), 1381\u20131388 (2018)","journal-title":"Bioinformatics"},{"key":"18_CR12","unstructured":"Pradhan, S., et al.: Towards robust linguistic analysis using OntoNotes. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning, Sofia, Bulgaria, pp. 143\u2013152. Association for Computational Linguistics (2013). https:\/\/aclanthology.org\/W13-3516"},{"key":"18_CR13","doi-asserted-by":"crossref","unstructured":"Schindler, D., Bensmann, F., Dietze, S., Kr\u00fcger, F.: SoMeSci-A 5 star open data gold standard knowledge graph of software mentions in scientific articles. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 4574\u20134583 (2021)","DOI":"10.1145\/3459637.3482017"},{"key":"18_CR14","unstructured":"Tunstall, L., Von Werra, L., Wolf, T.: Natural Language Processing with Transformers: Building Language Applications With Hugging Face. O\u2019Reilly (2022)"},{"key":"18_CR15","unstructured":"Wang, S., et al.: GPT-NER: named entity recognition via large language models. arXiv preprint arXiv:2304.10428 (2023)"},{"key":"18_CR16","doi-asserted-by":"crossref","unstructured":"Zhang, H., et al.: Samsung research China-Beijing at SemEval-2023 Task 2: an AL-R model for multilingual complex named entity recognition. In: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pp. 114\u2013120 (2023)","DOI":"10.18653\/v1\/2023.semeval-1.15"},{"key":"18_CR17","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Zhao, Y., Gao, H., Hu, M.: LinkNER: linking local named entity recognition models to large language models using uncertainty. arXiv preprint arXiv:2402.10573 (2024)","DOI":"10.1145\/3589334.3645414"},{"key":"18_CR18","unstructured":"Zhou, W., Zhang, S., Gu, Y., Chen, M., Poon, H.: UniversalNER: targeted distillation from large language models for open named entity recognition. In: The Twelfth International Conference on Learning Representations (2023)"}],"container-title":["Lecture Notes in Computer Science","Natural Scientific Language Processing and Research Knowledge Graphs"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-65794-8_18","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T06:05:54Z","timestamp":1723615554000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-65794-8_18"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"ISBN":["9783031657931","9783031657948"],"references-count":18,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-65794-8_18","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2024]]},"assertion":[{"value":"15 August 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"NSLP","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Workshop on Natural Scientific Language Processing and Research Knowledge Graphs","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Hersonissos, Crete","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Greece","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 May 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 May 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"1","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"nslp2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/nfdi4ds.github.io\/nslp2024\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}