{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T17:11:59Z","timestamp":1775149919493,"version":"3.50.1"},"reference-count":23,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T00:00:00Z","timestamp":1724716800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"EDF-Discovery"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Achieving carbon neutrality by 2050 requires unprecedented technological, economic, and sociological changes. With time as a scarce resource, it is crucial to base decisions on relevant facts and information to avoid misdirection. This study aims to help decision makers quickly find relevant information related to companies and organizations in the renewable energy sector. In this study, we propose fine-tuning five RNN and transformer models trained for French on a new category, \u201cTECH\u201d. This category is used to classify technological domains and new products. In addition, as the model is fine-tuned on news related to startups, we note an improvement in the detection of startup and company names in the \u201cORG\u201d category. We further explore the capacities of the most effective model to accurately predict entities using a small amount of training data. We show the progression of the model from being trained on several hundred to several thousand annotations. This analysis allows us to demonstrate the potential of these models to extract insights without large corpora, allowing us to reduce the long process of annotating custom training data. This approach is used to automatically extract new company mentions as well as to extract technologies and technology domains that are currently being discussed in the news in order to better analyze industry trends. This approach further allows to group together mentions of specific energy domains with the companies that are actively developing new technologies in the field.<\/jats:p>","DOI":"10.3390\/make6030096","type":"journal-article","created":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T11:58:46Z","timestamp":1724759926000},"page":"1953-1968","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Assessing Fine-Tuned NER Models with Limited Data in French: Automating Detection of New Technologies, Technological Domains, and Startup Names in Renewable Energy"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-5042-4337","authenticated-orcid":false,"given":"Connor","family":"MacLean","sequence":"first","affiliation":[{"name":"INSA Strasbourg, 67000 Strasbourg, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1815-5601","authenticated-orcid":false,"given":"Denis","family":"Cavallucci","sequence":"additional","affiliation":[{"name":"INSA Strasbourg, 67000 Strasbourg, France"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,27]]},"reference":[{"key":"ref_1","unstructured":"Abdullah, M. (2024, April 05). Gnews: Provide an API to Search for Articles on Google News and Returns a Usable JSON Response. Online Resource on GitHub. Available online: https:\/\/github.com\/ranahaani\/GNews."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2019). HuggingFace\u2019s Transformers: State-of-the-art Natural Language Processing. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_3","unstructured":"Honnibal, M., and Montani, I. (2024, April 02). spaCy 2: Natural Language Understanding with Bloom Embeddings, Convolutional Neural Networks, and Incremental Parsing. Published in 2017. Available online: https:\/\/spacy.io."},{"key":"ref_4","unstructured":"Nakayama, H., Kubo, T., Kamura, J., Taniguchi, Y., and Liang, X. (2024, April 02). doccano: Text Annotation Tool for Human. Available online: https:\/\/github.com\/doccano\/doccano."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Weichselbraun, A., Streiff, D., and Scharl, A. (2014, January 2\u20134). Linked Enterprise Data for Fine Grained Named Entity Linking and Web Intelligence. Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS \u201914), Thessaloniki, Greece.","DOI":"10.1145\/2611040.2611052"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1016\/j.jbi.2017.11.007","article-title":"Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition","volume":"76","author":"Unanue","year":"2017","journal-title":"J. Biomed. Inform."},{"key":"ref_7","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2024, April 03). Attention Is All You Need. Advances in Neural Information Processing Systems, 2017; volume 30, pp. 5998\u20136008. Available online: https:\/\/papers.nips.cc\/paper\/7181-attention-is-all-you-need.pdf."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kumar, M., Chaturvedi, K.K., Sharma, A., Arora, A., Farooqi, M.S., Lal, S.B., Lama, A., and Ranjan, R. (2024, April 03). An Algorithm for Automatic Text Annotation for Named Entity Recognition Using the spaCy Framework. Preprints 2023. Available online: https:\/\/typeset.io\/pdf\/an-algorithm-for-automatic-text-annotation-for-named-entity-3r6892x9.pdf.","DOI":"10.21203\/rs.3.rs-2930333\/v1"},{"key":"ref_9","unstructured":"Jayathilake, H.M. (2021). Custom NER Model for Pandemic Outbreak Surveillance Using Twitter. [MSc Thesis, Robert Gordon University]."},{"key":"ref_10","first-page":"74","article-title":"Resume Ranking based on Job Description using SpaCy NER model","volume":"7","author":"Satheesh","year":"2020","journal-title":"Int. Res. J. Eng. Technol."},{"key":"ref_11","unstructured":"Goel, M., Agarwal, A., Agrawal, S., Kapuriya, J., Konam, A.V., Gupta, R., Rastogi, S., and Bagler, G. (2024, April 04). Deep Learning Based Named Entity Recognition Models for Recipes. Preprint. Available online: https:\/\/arxiv.org\/abs\/2402.17447."},{"key":"ref_12","unstructured":"Richardson, L. (2024, April 04). Beautifulsoup4: Screen-Scraping Library. Available online: https:\/\/www.crummy.com\/software\/BeautifulSoup\/bs4\/."},{"key":"ref_13","unstructured":"Pomik\u00e1lek, J. (2024, April 04). jusText: Heuristic-Based Boilerplate Removal Tool. Available online: https:\/\/github.com\/pomikalek\/jusText."},{"key":"ref_14","unstructured":"Korbak, T., Elsahar, H., Kruszewski, G., and Dymetman, M. (2022). Controlling Conditional Language Models without Catastrophic Forgetting. arXiv."},{"key":"ref_15","unstructured":"Ramshaw, L.A., and Marcus, M.P. (1995). Text Chunking Using Transformation-Based Learning. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1016\/j.patcog.2017.10.013","article-title":"Recent advances in convolutional neural networks","volume":"77","author":"Gu","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_17","unstructured":"Moens, M.-F., Huang, X., Specia, L., and Wen-tau Yih, S. (2021). WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER. Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Martin, L., Muller, B., Su\u00e1rez, P.J.O., Dupont, Y., Romary, L., de La Clergerie, \u00c9.V., Seddah, D., and Sagot, B. (2019). CamemBERT: A tasty French language model. arXiv.","DOI":"10.18653\/v1\/2020.acl-main.645"},{"key":"ref_19","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_20","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv."},{"key":"ref_21","unstructured":"Delestre, C., and Amar, A. (2024, April 04). DistilCamemBERT: Une Distillation du Mod\u00e8le Fran\u00e7ais CamemBERT. In CAp (Conf\u00e9rence sur l\u2019Apprentissage Automatique), Vannes, France, July 2022. Available online: https:\/\/hal.archives-ouvertes.fr\/hal-03674695."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1016\/j.artint.2012.03.006","article-title":"Learning multilingual named entity recognition from Wikipedia","volume":"194","author":"Nothman","year":"2013","journal-title":"Artif. Intell."},{"key":"ref_23","unstructured":"Polle, J.B. (2024, April 04). LSTM Model for Email Signature Detection. Medium, 24 September 2021. Available online: https:\/\/medium.com\/@jean-baptiste.polle\/lstm-model-for-email-signature-detection-8e990384fefa."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/3\/96\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:43:40Z","timestamp":1760111020000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/3\/96"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,27]]},"references-count":23,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["make6030096"],"URL":"https:\/\/doi.org\/10.3390\/make6030096","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,27]]}}}