{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T02:57:42Z","timestamp":1768705062944,"version":"3.49.0"},"publisher-location":"Cham","reference-count":13,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031657931","type":"print"},{"value":"9783031657948","type":"electronic"}],"license":[{"start":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T00:00:00Z","timestamp":1704067200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,15]],"date-time":"2024-08-15T00:00:00Z","timestamp":1723680000000},"content-version":"vor","delay-in-days":227,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>With the rapid expansion of academic literature and the proliferation of preprints, researchers face growing challenges in manually organizing and labeling large volumes of articles. The NSLP 2024 FoRC Shared Task I addresses this challenge organized as a competition. The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article. This paper presents our results.<\/jats:p><jats:p>Initially, we enrich the dataset (containing English scholarly articles sourced from ORKG and arXiv), then leverage different pre-trained language Models (PLMs), specifically BERT, and explore their efficacy in transfer learning for this downstream task.\u00a0Our experiments encompass feature-based and fine-tuned transfer learning approaches using diverse PLMs, optimized for scientific tasks, including SciBERT, SciNCL, and SPECTER2. We conduct hyperparameter tuning and investigate the impact of data augmentation from bibliographic databases such as OpenAlex, Semantic Scholar, and Crossref.\u00a0Our results demonstrate that fine-tuning pre-trained models substantially enhances classification performance, with SPECTER2 emerging as the most accurate model. Moreover, enriching the dataset with additional metadata improves classification outcomes significantly, especially when integrating information from S2AG, OpenAlex and Crossref. Our best-performing approach achieves a weighted F1-score of 0.7415. Overall, our study contributes to the advancement of reliable automated systems for scholarly publication categorization, offering a potential solution to the laborious manual curation process, thereby facilitating researchers in efficiently locating relevant resources.<\/jats:p>","DOI":"10.1007\/978-3-031-65794-8_16","type":"book-chapter","created":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T06:02:44Z","timestamp":1723615364000},"page":"234-243","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Enriched BERT Embeddings for\u00a0Scholarly Publication Classification"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9345-8958","authenticated-orcid":false,"given":"Benjamin","family":"Wolff","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7258-0532","authenticated-orcid":false,"given":"Eva","family":"Seidlmayer","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1481-2996","authenticated-orcid":false,"given":"Konrad U.","family":"F\u00f6rstner","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,8,15]]},"reference":[{"key":"16_CR1","doi-asserted-by":"crossref","unstructured":"Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. arXiv preprint arXiv:1903.10676 (2019)","DOI":"10.18653\/v1\/D19-1371"},{"issue":"1","key":"16_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1057\/s41599-021-00903-w","volume":"8","author":"L Bornmann","year":"2021","unstructured":"Bornmann, L., Haunschild, R., Mutz, R.: Growth rates of modern science: a latent piecewise growth curve approach to model publication numbers from established and new literature databases. Humanit. Soc. Sci. Commun. 8(1), 1\u201315 (2021)","journal-title":"Humanit. Soc. Sci. Commun."},{"key":"16_CR3","unstructured":"Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)"},{"key":"16_CR4","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1007\/978-3-030-72113-8_11","volume-title":"Advances in Information Retrieval","author":"A Garcia-Silva","year":"2021","unstructured":"Garcia-Silva, A., Gomez-Perez, J.M.: Classifying scientific publications with BERT - is self-attention a feature selection method? In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) ECIR 2021. LNCS, vol. 12656, pp. 161\u2013175. Springer, Cham (2021). https:\/\/doi.org\/10.1007\/978-3-030-72113-8_11"},{"key":"16_CR5","unstructured":"Gombert, S.: Twin BERT contextualized sentence embedding space learning and gradient-boosted decision tree ensembles for scene segmentation in German literature. In: STSS@ KONVENS, pp. 42\u201348 (2021)"},{"issue":"1","key":"16_CR6","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1080\/02763869.2020.1704597","volume":"39","author":"MB Hoy","year":"2020","unstructured":"Hoy, M.B.: Rise of the Rxivs: how preprint servers are changing the publishing process. Med. Ref. Serv. Q. 39(1), 84\u201389 (2020)","journal-title":"Med. Ref. Serv. Q."},{"key":"16_CR7","doi-asserted-by":"crossref","unstructured":"Lu, W., Jiao, J., Zhang, R.: TwinBERT: distilling knowledge to twin-structured compressed BERT models for large-scale retrieval. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 2645\u20132652 (2020)","DOI":"10.1145\/3340531.3412747"},{"key":"16_CR8","unstructured":"Ostendorff, M., Bourgonje, P., Berger, M., Moreno-Schneider, J., Rehm, G., Gipp, B.: Enriching BERT with knowledge graph embeddings for document classification. arXiv preprint arXiv:1909.08402 (2019)"},{"key":"16_CR9","doi-asserted-by":"crossref","unstructured":"Ostendorff, M., Rethmeier, N., Augenstein, I., Gipp, B., Rehm, G.: Neighborhood contrastive learning for scientific document representations with citation embeddings. arXiv preprint arXiv:2202.06671 (2022)","DOI":"10.18653\/v1\/2022.emnlp-main.802"},{"key":"16_CR10","doi-asserted-by":"crossref","unstructured":"Pappagari, R., \u017belasko, P., Villalba, J., Carmiel, Y., Dehak, N.: Hierarchical transformers for long document classification. Technical report arXiv:1910.10781 (2019)","DOI":"10.1109\/ASRU46091.2019.9003958"},{"key":"16_CR11","unstructured":"Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32 (2019)"},{"key":"16_CR12","doi-asserted-by":"crossref","unstructured":"Singh, A., D\u2019Arcy, M., Cohan, A., Downey, D., Feldman, S.: SciRepEval: a multi-format benchmark for scientific document representations. arXiv preprint arXiv:2211.13308 (2022)","DOI":"10.18653\/v1\/2023.emnlp-main.338"},{"key":"16_CR13","doi-asserted-by":"publisher","unstructured":"Wang, W., Yan, M., Wu, C.: Multi-granularity hierarchical attention fusion networks for reading comprehension and question answering. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1705\u20131714. Association for Computational Linguistics, Melbourne (2018). https:\/\/doi.org\/10.18653\/v1\/P18-1158. https:\/\/aclanthology.org\/P18-1158","DOI":"10.18653\/v1\/P18-1158"}],"container-title":["Lecture Notes in Computer Science","Natural Scientific Language Processing and Research Knowledge Graphs"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-65794-8_16","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T06:05:32Z","timestamp":1723615532000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-65794-8_16"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"ISBN":["9783031657931","9783031657948"],"references-count":13,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-65794-8_16","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024]]},"assertion":[{"value":"15 August 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"NSLP","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Workshop on Natural Scientific Language Processing and Research Knowledge Graphs","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Hersonissos, Crete","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Greece","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 May 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 May 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"1","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"nslp2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/nfdi4ds.github.io\/nslp2024\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}