{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:23:21Z","timestamp":1772119401659,"version":"3.50.1"},"reference-count":43,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2023,1,9]],"date-time":"2023-01-09T00:00:00Z","timestamp":1673222400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Faculty of Computer Science and Engineering"},{"name":"Methodius University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Even though named entity recognition (NER) has seen tremendous development in recent years, some domain-specific use-cases still require tagging of unique entities, which is not well handled by pre-trained models. Solutions based on enhancing pre-trained models or creating new ones are efficient, but creating reliable labeled training for them to learn on is still challenging. In this paper, we introduce PharmKE, a text analysis platform tailored to the pharmaceutical industry that uses deep learning at several stages to perform an in-depth semantic analysis of relevant publications. The proposed methodology is used to produce reliably labeled datasets leveraging cutting-edge transfer learning, which are later used to train models for specific entity labeling tasks. By building models for the well-known text-processing libraries spaCy and AllenNLP, this technique is used to find Pharmaceutical Organizations and Drugs in texts from the pharmaceutical domain. The PharmKE platform also incorporates the NER findings to resolve co-references of entities and examine the semantic linkages in each phrase, creating a foundation for further text analysis tasks, such as fact extraction and question answering. Additionally, the knowledge graph created by DBpedia Spotlight for a specific pharmaceutical text is expanded using the identified entities. The obtained results with the proposed methodology result in about a 96% F1-score on the NER tasks, which is up to 2% better than those of the fine-tuned BERT and BioBERT models developed using the same dataset. The ultimate benefits of the platform are that pharmaceutical domain specialists may more easily identify the knowledge extracted from the input texts thanks to the platform\u2019s visualization of the model findings. Likewise, the proposed techniques can be integrated into mobile and pervasive systems to give patients more relevant and comprehensive information from scanned medication guides. Similarly, it can provide preliminary insights to patients and even medical personnel on whether a drug from a different vendor is compatible with the patient\u2019s prescription medication.<\/jats:p>","DOI":"10.3390\/computers12010017","type":"journal-article","created":{"date-parts":[[2023,1,9]],"date-time":"2023-01-09T07:05:09Z","timestamp":1673247909000},"page":"17","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["PharmKE: Knowledge Extraction Platform for Pharmaceutical Texts Using Transfer Learning"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8188-4220","authenticated-orcid":false,"given":"Nasi","family":"Jofche","sequence":"first","affiliation":[{"name":"Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, 1000 Skopje, North Macedonia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3982-3330","authenticated-orcid":false,"given":"Kostadin","family":"Mishev","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, 1000 Skopje, North Macedonia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2067-3467","authenticated-orcid":false,"given":"Riste","family":"Stojanov","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, 1000 Skopje, North Macedonia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7360-8015","authenticated-orcid":false,"given":"Milos","family":"Jovanovik","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, 1000 Skopje, North Macedonia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7664-0168","authenticated-orcid":false,"given":"Eftim","family":"Zdravevski","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, 1000 Skopje, North Macedonia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3105-6010","authenticated-orcid":false,"given":"Dimitar","family":"Trajanov","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, 1000 Skopje, North Macedonia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,9]]},"reference":[{"key":"ref_1","unstructured":"Krishnan, V., and Ganapathy, V. (2022, November 01). Named Entity Recognition. Available online: https:\/\/cs229.stanford.edu\/proj2005\/KrishnanGanapathy-NamedEntityRecognition.pdf."},{"key":"ref_2","unstructured":"Sang, E.F., and De Meulder, F. (2003). Introduction to the CoNLL-2003 Shared task: Language-independent named entity recognition. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/s13326-016-0111-z","article-title":"Consolidating Drug Data on a Global Scale Using Linked Data","volume":"8","author":"Jovanovik","year":"2017","journal-title":"J. Biomed. Semant."},{"key":"ref_4","unstructured":"Jofche, N., Jovanovik, M., and Trajanov, D. (2019, January 29\u201331). Named Entity Discovery for the Drug Domain. Proceedings of the 16th International Conference on Informatics and Information Technologies, Prague, Czech Republic."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Sundermeyer, M., Schl\u00fcter, R., and Ney, H. (2012, January 9\u201313). LSTM Neural Networks for Language Modeling. Proceedings of the Thirteenth Annual Conference of the International Speech Communication Association, Portland, ON, USA.","DOI":"10.21437\/Interspeech.2012-65"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., and Dyer, C. (2016). Neural Architectures for Named Entity Recognition. arXiv.","DOI":"10.18653\/v1\/N16-1030"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1162\/tacl_a_00104","article-title":"Named Entity Recognition with Bidirectional LSTM-CNNs","volume":"4","author":"Chiu","year":"2016","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_8","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_9","unstructured":"Li, J., Sun, A., Han, J., and Li, C. (2018). A Survey on Deep Learning for Named Entity Recognition. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Balasuriya, D., Ringland, N., Nothman, J., Murphy, T., and Curran, J.R. (2009, January 7). Named Entity Recognition in Wikipedia. Proceedings of the 2009 Workshop on The People\u2019s Web Meets NLP: Collaboratively Constructed Semantic Resources (People\u2019s Web), Singapore.","DOI":"10.3115\/1699765.1699767"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1016\/j.procs.2022.07.107","article-title":"Named Entity Recognition and Knowledge Extraction from Pharmaceutical Texts using Transfer Learning","volume":"203","author":"Jofche","year":"2022","journal-title":"Procedia Comput. Sci."},{"key":"ref_12","first-page":"411","article-title":"spaCy 2: Natural Language Understanding with Bloom Embeddings, Convolutional Neural Networks and Incremental Parsing","volume":"7","author":"Honnibal","year":"2017","journal-title":"Appear"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N.F., Peters, M., Schmitz, M., and Zettlemoyer, L.S. (2017). AllenNLP: A Deep Semantic Natural Language Processing Platform. arXiv.","DOI":"10.18653\/v1\/W18-2501"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep Contextualized Word Representations. arXiv.","DOI":"10.18653\/v1\/N18-1202"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: A Pre-Trained Biomedical Language Representation Model for Biomedical Text Mining","volume":"36","author":"Lee","year":"2019","journal-title":"Bioinformatics"},{"key":"ref_16","first-page":"2493","article-title":"Natural Language Processing (Almost) from Scratch","volume":"12","author":"Collobert","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_17","unstructured":"Kuru, O., Can, O.A., and Yuret, D. (2016, January 11\u201316). Charner: Character-Level Named Entity Recognition. Proceedings of the COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Kim, Y., Jernite, Y., Sontag, D., and Rush, A.M. (2016, January 12\u201317). Character-Aware Neural Language Models. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10362"},{"key":"ref_19","first-page":"279","article-title":"Biomedical Named Entity Recognition Based on Deep Neutral Network","volume":"8","author":"Yao","year":"2015","journal-title":"Int. J. Hybrid Inf. Technol"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"i37","DOI":"10.1093\/bioinformatics\/btx228","article-title":"Deep Learning With Word Embeddings Improves Biomedical Named Entity Recognition","volume":"33","author":"Habibi","year":"2017","journal-title":"Bioinformatics"},{"key":"ref_21","first-page":"1","article-title":"Exploring the Limits of Transfer Learning With a Unified Text-To-Text Transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_22","first-page":"1","article-title":"XLNet: Generalized Autoregressive Pretraining for Language Understanding","volume":"32","author":"Yang","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Hakala, K., and Pyysalo, S. (2019, January 4). Biomedical Named Entity Recognition with Multilingual BERT. Proceedings of the 5th Workshop on BioNLP Open Shared Tasks, Hong Kong, China.","DOI":"10.18653\/v1\/D19-5709"},{"key":"ref_24","unstructured":"Souza, F., Nogueira, R., and Lotufo, R. (2019). Portuguese Named Entity Recognition using BERT-CRF. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lamurias, A., and Couto, F.M. (2019, January 1). LasigeBioTM at MEDIQA 2019: Biomedical Question Answering using Bidirectional Transformers and Named Entity Recognition. Proceedings of the 18th BioNLP Workshop and Shared Task, Florence, Italy.","DOI":"10.18653\/v1\/W19-5057"},{"key":"ref_26","first-page":"9079840","article-title":"Minimalistic approach to coreference resolution in Lithuanian medical records","volume":"2019","author":"Butleris","year":"2019","journal-title":"Comput. Math. Methods Med."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lee, K., He, L., Lewis, M., and Zettlemoyer, L. (2017). End-to-End Neural Coreference Resolution. arXiv.","DOI":"10.18653\/v1\/D17-1018"},{"key":"ref_28","unstructured":"Pradhan, S., Moschitti, A., Xue, N., Uryupina, O., and Zhang, Y. (2012, January 12\u201314). CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes. Proceedings of the Joint Conference on EMNLP and CoNLL-Shared Task, Jeju Island, Korea."},{"key":"ref_29","unstructured":"Shi, P., and Lin, J. (2019). Simple BERT Models for Relation Extraction and Semantic Role Labeling. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Daiber, J., Jakob, M., Hokamp, C., and Mendes, P.N. (2013, January 4\u20136). Improving Efficiency and Accuracy in Multilingual Entity Extraction. Proceedings of the 9th International Conference on Semantic Systems (I-Semantics), Graz, Austria.","DOI":"10.1145\/2506182.2506198"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2019). HuggingFace\u2019s Transformers: State-of-the-art Natural Language Processing. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Burtsev, M., Seliverstov, A., Airapetyan, R., Arkhipov, M., Baymurzina, D., Bushkov, N., Gureenkova, O., Khakhulin, T., Kuratov, Y., and Kuznetsov, D. (2018, January 15\u201320). DeepPavlov: Open-Source Library for Dialogue Systems. Proceedings of the ACL 2018, System Demonstrations, Melbourne, Australia.","DOI":"10.18653\/v1\/P18-4021"},{"key":"ref_33","first-page":"100","article-title":"Entity Recognition and Labeling for Medical Literature Based on Neural Network","volume":"6","author":"Ruijie","year":"2022","journal-title":"Data Anal. Knowl. Discov."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Colombo, C.d.S., and Oliveira, E.S.d. (2022, January 16\u201319). Intelligent Information System for Extracting Knowledge from Pharmaceutical Package Inserts. Proceedings of the XVIII Brazilian Symposium on Information Systems, Curitiba, Brazil.","DOI":"10.1145\/3535511.3535558"},{"key":"ref_35","unstructured":"Lassila, O., Swick, R.R., Wide, W., and Consortium, W. (1998). Resource Description Framework (RDF) Model and Syntax Specification, World Wide Web Consortium."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., and Ives, Z. (2007). DBpedia: A Nucleus for a Web of Open Data. The Semantic Web, Springer.","DOI":"10.1007\/978-3-540-76298-0_52"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Bizer, C., Heath, T., Idehen, K., and Berners-Lee, T. (2008, January 21\u201325). Linked Data on the Web (LDOW2008). Proceedings of the 17th International Conference on World Wide Web, Beijing, China.","DOI":"10.1145\/1367497.1367760"},{"key":"ref_38","unstructured":"(2022, November 01). PharmKE Platform: Public Instance. Available online: http:\/\/pharmke.env4health.finki.ukim.mk."},{"key":"ref_39","unstructured":"(2022, November 01). PharmKE Platform: Source Code. Available online: https:\/\/gitlab.com\/jofce.nasi\/pharma-text-analytics."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1745","DOI":"10.1093\/bioinformatics\/bty869","article-title":"Cross-type Biomedical Named Entity Recognition with Deep Multi-task Learning","volume":"35","author":"Wang","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., and McClosky, D. (2014, January 23\u201325). The Stanford CoreNLP Natural Language Processing Toolkit. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, MA, USA.","DOI":"10.3115\/v1\/P14-5010"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Mendes, P.N., Jakob, M., Garc\u00eda-Silva, A., and Bizer, C. (2011, January 7\u20139). DBpedia Spotlight: Shedding Light on the Web of Documents. Proceedings of the 7th International Conference on Semantic Systems, Graz, Austria.","DOI":"10.1145\/2063518.2063519"},{"key":"ref_43","first-page":"13","article-title":"A Survey of Text Similarity Approaches","volume":"68","author":"Gomaa","year":"2013","journal-title":"Int. J. Comput. Appl."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/12\/1\/17\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:04:31Z","timestamp":1760119471000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/12\/1\/17"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,9]]},"references-count":43,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["computers12010017"],"URL":"https:\/\/doi.org\/10.3390\/computers12010017","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,9]]}}}