{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T16:55:45Z","timestamp":1772470545559,"version":"3.50.1"},"reference-count":15,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T00:00:00Z","timestamp":1760659200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Objectives: To guide language model (LM) selection by comparing finetuning vs. zero-shot use, generic pretraining vs. domain-adjacent vs. further domain-specific pretraining, and bidirectional language models (BiLMs) such as BERT vs. unidirectional LMs (LLMs) for clinical classification. Materials and Methods: We evaluated BiLMs (RoBERTa, PathologyBERT, Gatortron) and LLM (Mistral nemo instruct 12B) on three British Columbia Cancer Registry (BCCR) pathology classification tasks varying in difficulty\/data size. We assessed zero-shot vs. finetuned BiLMs, zero-shot LLM, and further BCCR-specific pretraining using macro-average F1 scores. Results: Finetuned BiLMs outperformed zero-shot BiLMs and zero-shot LLM. The zero-shot LLM outperformed zero-shot BiLMs but was consistently outperformed by finetuned BiLMs. Domain-adjacent BiLMs generally outperformed generic BiLMs after finetuning. Further domain-specific pretraining boosted complex\/low-data task performance, with otherwise modest gains. Conclusions: For specialized classification, finetuning BiLMs is crucial, often surpassing zero-shot LLMs. Domain-adjacent pretrained models are recommended. Further domain-specific pretraining provides significant performance boosts, especially for complex\/low-data scenarios. BiLMs remain relevant, offering strong performance\/resource balance for targeted clinical tasks.<\/jats:p>","DOI":"10.3390\/make7040121","type":"journal-article","created":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T10:28:36Z","timestamp":1760696916000},"page":"121","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Small or Large? Zero-Shot or Finetuned? 
"prefix":"10.3390","volume":"7","author":[{"given":"Lovedeep","family":"Gondara","sequence":"first","affiliation":[{"name":"British Columbia Cancer Registry, Provincial Health Services Authority, Vancouver, BC V6H 4C1, Canada"},{"name":"School of Population and Public Health, University of British Columbia, Vancouver, BC V5Z 4E8, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6184-6024","authenticated-orcid":false,"given":"Jonathan","family":"Simkin","sequence":"additional","affiliation":[{"name":"British Columbia Cancer Registry, Provincial Health Services Authority, Vancouver, BC V6H 4C1, Canada"}]},{"given":"Graham","family":"Sayle","sequence":"additional","affiliation":[{"name":"The Data Science Institute, University of British Columbia, Vancouver, BC V5Z 4E8, Canada"}]},{"given":"Shebnum","family":"Devji","sequence":"additional","affiliation":[{"name":"British Columbia Cancer Registry, Provincial Health Services Authority, Vancouver, BC V6H 4C1, Canada"}]},{"given":"Gregory","family":"Arbour","sequence":"additional","affiliation":[{"name":"The Data Science Institute, University of British Columbia, Vancouver, BC V5Z 4E8, Canada"}]},{"given":"Raymond","family":"Ng","sequence":"additional","affiliation":[{"name":"The Data Science Institute, University of British Columbia, Vancouver, BC V5Z 4E8, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,17]]},"reference":[{"key":"ref_1","unstructured":"Qin, L., Chen, Q., Feng, X., Wu, Y., Zhang, Y., Li, Y., Li, M., Che, W., and Yu, P.S. (2024). Large language models meet NLP: A survey. arXiv."},{"key":"ref_2","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, June 2\u20137). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Bedi, S., Liu, Y., Orr-Ewing, L., Dash, D., Koyejo, S., Callahan, A., Fries, J.A., Wornow, M., Swaminathan, A., and Lehmann, L.S. (2024). A systematic review of testing and evaluation of healthcare applications of large language models (LLMs). medRxiv.","DOI":"10.1101\/2024.04.15.24305869"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1007\/s10462-024-10921-0","article-title":"Large language models in medical and healthcare fields: Applications, advances, and challenges","volume":"57","author":"Wang","year":"2024","journal-title":"Artif. Intell. Rev."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"e2400110","DOI":"10.1200\/CCI.24.00110","article-title":"Classifying Tumor Reportability Status From Unstructured Electronic Pathology Reports Using Language Models in a Population-Based Cancer Registry Setting","volume":"8","author":"Gondara","year":"2024","journal-title":"JCO Clin. Cancer Inform."},{"key":"ref_6","unstructured":"Gondara, L., Simkin, J., Devji, S., Arbour, G., and Ng, R. (2025). ELM: Ensemble of Language Models for Predicting Tumor Group from Pathology Reports. arXiv."},{"key":"ref_7","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv."},
arXiv."},{"key":"ref_8","unstructured":"Santos, T., Tariq, A., Das, S., Vayalpati, K., Smith, G.H., Trivedi, H., and Banerjee, I. (2023, January 29). PathologyBERT-pre-trained vs. a new transformer language model for pathology domain. Proceedings of the AMIA Annual Symposium Proceedings, San Francisco, CA, USA."},{"key":"ref_9","unstructured":"Yang, X., Chen, A., PourNejatian, N., Shin, H.C., Smith, K.E., Parisien, C., Compas, C., Martin, C., Flores, M.G., and Zhang, Y. (2022). Gatortron: A large clinical language model to unlock patient information from unstructured electronic health records. arXiv."},{"key":"ref_10","unstructured":"Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D.S., de las Casas, D., Bressand, F., Lengyel, G., Lample, G., and Saulnier, L. (2023). Mistral 7B. arXiv."},{"key":"ref_11","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Howard, J., and Ruder, S. (2018). Universal language model fine-tuning for text classification. arXiv.","DOI":"10.18653\/v1\/P18-1031"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gururangan, S., Marasovi\u0107, A., Swayamdipta, S., Lo, K., Beltagy, I., Downey, D., and Smith, N.A. (2020). Don\u2019t stop pretraining: Adapt language models to domains and tasks. arXiv.","DOI":"10.18653\/v1\/2020.acl-main.740"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"109561","DOI":"10.1016\/j.engappai.2024.109561","article-title":"Fine-tuning language model embeddings to reveal domain knowledge: An explainable artificial intelligence perspective on medical decision making","volume":"139","author":"Harb","year":"2025","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_15","unstructured":"Kokalj, E., \u0160krlj, B., Lavra\u010d, N., Pollak, S., and Robnik-\u0160ikonja, M. (2021, January 19). BERT meets shapley: Extending SHAP explanations to transformer-based classifiers. Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, Online."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/121\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T11:15:49Z","timestamp":1760699749000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/121"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,17]]},"references-count":15,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["make7040121"],"URL":"https:\/\/doi.org\/10.3390\/make7040121","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,17]]}}}