{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T07:56:35Z","timestamp":1773734195688,"version":"3.50.1"},"reference-count":54,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,5,5]],"date-time":"2023-05-05T00:00:00Z","timestamp":1683244800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,5]],"date-time":"2023-05-05T00:00:00Z","timestamp":1683244800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100023699","name":"Health Data Research UK","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100023699","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100010269","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["PIII009, PIII029, PIII032, PIII054"],"award-info":[{"award-number":["PIII009, PIII029, PIII032, PIII054"]}],"id":[{"id":"10.13039\/100010269","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/V050869\/1"],"award-info":[{"award-number":["EP\/V050869\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Advanced Care Research Centre"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                
<jats:title>Background<\/jats:title>\n                <jats:p>Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to identify because few cases are available for machine learning and data annotation requires domain experts.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Methods<\/jats:title>\n                <jats:p>We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT). The ontology-driven framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in the Unified Medical Language System (UMLS), with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach learns a phenotype confirmation model that improves Text-to-UMLS linking without annotated data from domain experts. We evaluated the approach on three annotated clinical datasets from two institutions in the US and the UK: MIMIC-III discharge summaries, MIMIC-III radiology reports, and NHS Tayside brain imaging reports.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>The improvements in precision were pronounced (by over 30% to 50% absolute score for Text-to-UMLS linking), with almost no loss of recall compared to the existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and NHS Tayside were consistent with the discharge summaries. 
The overall pipeline for processing clinical notes can extract rare disease cases that are mostly uncaptured in structured data (manually assigned ICD codes).<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>The study provides empirical evidence for the task by applying a weakly supervised NLP pipeline to clinical notes. The proposed weakly supervised deep learning approach requires no human annotation except for validation and testing, by leveraging ontologies, NER+L tools, and contextual representations. The study also demonstrates that Natural Language Processing (NLP) can complement traditional ICD-based approaches to better estimate rare diseases in clinical notes. We discuss the usefulness and limitations of the weak supervision approach and propose directions for future studies.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12911-023-02181-9","type":"journal-article","created":{"date-parts":[[2023,5,5]],"date-time":"2023-05-05T16:01:48Z","timestamp":1683302508000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Ontology-driven and weakly supervised rare disease identification from clinical 
notes"],"prefix":"10.1186","volume":"23","author":[{"given":"Hang","family":"Dong","sequence":"first","affiliation":[]},{"given":"V\u00edctor","family":"Su\u00e1rez-Paniagua","sequence":"additional","affiliation":[]},{"given":"Huayu","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Minhong","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Arlene","family":"Casey","sequence":"additional","affiliation":[]},{"given":"Emma","family":"Davidson","sequence":"additional","affiliation":[]},{"given":"Jiaoyan","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Beatrice","family":"Alex","sequence":"additional","affiliation":[]},{"given":"William","family":"Whiteley","sequence":"additional","affiliation":[]},{"given":"Honghan","family":"Wu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,5]]},"reference":[{"issue":"2","key":"2181_CR1","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1038\/s41431-019-0508-0","volume":"28","author":"S Nguengang Wakap","year":"2020","unstructured":"Nguengang Wakap S, Lambert DM, Olry A, Rodwell C, Gueydan C, Lanneau V, et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020;28(2):165\u201373.","journal-title":"Eur J Hum Genet."},{"key":"2181_CR2","unstructured":"Department of Health & Social Care. The UK Rare Diseases Framework. 2021. https:\/\/www.gov.uk\/government\/publications\/uk-rare-diseases-framework\/the-uk-rare-diseases-framework. Accessed 8 May 2022."},{"key":"2181_CR3","unstructured":"Scottish Government. Illnesses and long-term conditions. 2021. https:\/\/www.gov.scot\/policies\/illnesses-and-long-term-conditions\/rare-diseases\/. Accessed 22 Mar 2021."},{"key":"2181_CR4","unstructured":"Richesson RL, Fung KW, Bodenreider O. 
Coverage of Rare Disease Names in Clinical Coding Systems and Ontologies and Implications for Electronic Health Records-Based Research. In: Proceedings of the 5th International Conference on Biomedical Ontology. Houston: CEUR Workshop Proceedings (CEUR-WS.org);\u00a02014. p. 78\u201380."},{"key":"2181_CR5","unstructured":"Bearryman E. Does your rare disease have a code? 2016. https:\/\/www.eurordis.org\/news\/does-your-rare-disease-have-code. Accessed 29 July 2021."},{"key":"2181_CR6","doi-asserted-by":"publisher","unstructured":"Dong H, Su\u00e1rez-Paniagua V, Whiteley W, Wu H. Explainable Automated Coding of Clinical Notes using Hierarchical Label-wise Attention Networks and Label Embedding Initialisation. J Biomed Inform. 2021;103728. https:\/\/doi.org\/10.1016\/j.jbi.2021.103728.","DOI":"10.1016\/j.jbi.2021.103728"},{"key":"2181_CR7","doi-asserted-by":"crossref","unstructured":"Dong H, Su\u00e1rez-Paniagua V, Zhang H, Wang M, Whitfield E, Wu H. Rare disease identification from clinical notes with ontologies and weak supervision. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). Online:\u00a0IEEE; 2021. p. 2294\u20138.","DOI":"10.1109\/EMBC46164.2021.9630043"},{"key":"2181_CR8","unstructured":"Kahn\u00a0Jr CE. An Ontology-Based Approach to Estimate the Frequency of Rare Diseases in Narrative-Text Radiology Reports. Stud Health Technol Inf. 2017;245:896\u2013900. MEDINFO 2017: Precision Healthcare through Informatics."},{"key":"2181_CR9","unstructured":"Vasant D, et al. ORDO: an ontology connecting rare disease, epidemiology and genetic data. In: Bio-Ontology @ ISMB 2014. 2014. p. 1-4. 
https:\/\/www.researchgate.net\/publication\/281824026_ORDO_An_Ontology_Connecting_Rare_Disease_Epidemiology_and_Genetic_Data."},{"issue":"1","key":"2181_CR10","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1016\/j.ajhg.2015.05.020","volume":"97","author":"T Groza","year":"2015","unstructured":"Groza T, K\u00f6hler S, Moldenhauer D, Vasilevsky N, Baynam G, Zemojtel T, et al. The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease. Am J Hum Genet. 2015;97(1):111\u201324. https:\/\/doi.org\/10.1016\/j.ajhg.2015.05.020.","journal-title":"Am J Hum Genet."},{"key":"2181_CR11","doi-asserted-by":"publisher","unstructured":"Maiella S, Olry A, Hanauer M, Lanneau V, Lourghi H, Donadille B, et\u00a0al. Harmonising phenomics information for a better interoperability in the rare disease field. European Journal of Medical Genetics. 2018;61(11):706\u2013714. Focus on rare disease research projects supported by the E-Rare ERA-Net program. https:\/\/doi.org\/10.1016\/j.ejmg.2018.01.013.","DOI":"10.1016\/j.ejmg.2018.01.013"},{"issue":"2","key":"2181_CR12","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1109\/TKDE.2014.2327028","volume":"27","author":"W Shen","year":"2015","unstructured":"Shen W, Wang J, Han J. Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions. IEEE Trans Knowl Data Eng. 2015;27(2):443\u201360. https:\/\/doi.org\/10.1109\/TKDE.2014.2327028.","journal-title":"IEEE Trans Knowl Data Eng."},{"issue":"1","key":"2181_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-018-0723-6","volume":"19","author":"Y Wang","year":"2019","unstructured":"Wang Y, Sohn S, Liu S, Shen F, Wang L, Atkinson EJ, et al. A clinical text classification paradigm using weak supervision and deep representation. BMC Med Inf Decis Making. 2019;19(1):1. 
https:\/\/doi.org\/10.1186\/s12911-018-0723-6.","journal-title":"BMC Med Inf Decis Making."},{"key":"2181_CR14","unstructured":"Ratner A, Varma P, Hancock B, R\u00e9 C, other members\u00a0of Hazy\u00a0Lab. Weak Supervision: A New Programming Paradigm for Machine Learning. 2019. http:\/\/ai.stanford.edu\/blog\/weak-supervision\/. Accessed 13 Mar 2021."},{"issue":"5","key":"2181_CR15","doi-asserted-by":"publisher","first-page":"530","DOI":"10.1093\/jamia\/ocx160","volume":"25","author":"H Wu","year":"2018","unstructured":"Wu H, Toti G, Morley KI, Ibrahim ZM, Folarin A, Jackson R, et al. SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research. J Am Med Inform Assoc. 2018;25(5):530\u20137. https:\/\/doi.org\/10.1093\/jamia\/ocx160.","journal-title":"J Am Med Inform Assoc."},{"key":"2181_CR16","doi-asserted-by":"crossref","unstructured":"Wu H, Hodgson K, Dyson S, Morley K, Ibrahim Z, Iqbal E, et al. Efficiently Reusing Natural Language Processing Models for Phenotype Identification in Free-text Electronic Medical Records: Methodological Study. JMIR Med Inf. 2019;7(4):e14782:1-14.","DOI":"10.2196\/14782"},{"key":"2181_CR17","unstructured":"Gorinski PJ, Wu H, Grover C, Tobin R, Talbot C, Whalley H, et\u00a0al. Named Entity Recognition for Electronic Health Records: A Comparison of Rule-based and Machine Learning Approaches. arXiv preprint arXiv:1903.03985. 2019;Comment: 8 pages, presented at HealTAC 2019, Cardiff, 24-25\/04\/2019."},{"key":"2181_CR18","unstructured":"Gorrell G, Song X, Roberts A. Bio-yodie: A named entity linking system for biomedical text. arXiv preprint arXiv:1811.04860. 2018."},{"key":"2181_CR19","doi-asserted-by":"crossref","unstructured":"Peng Y, Yan S, Lu Z. Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. In: Proceedings of the 18th BioNLP Workshop and Shared Task. 
Florence: Association for Computational Linguistics; 2019. p. 58\u201365.","DOI":"10.18653\/v1\/W19-5006"},{"key":"2181_CR20","doi-asserted-by":"publisher","unstructured":"Johnson AEW, Pollard TJ, Shen L, Lehman LwH, Feng M, Ghassemi M, et\u00a0al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3(1):1\u20139. https:\/\/doi.org\/10.1038\/sdata.2016.35.","DOI":"10.1038\/sdata.2016.35"},{"issue":"e2","key":"2181_CR21","doi-asserted-by":"publisher","first-page":"e206","DOI":"10.1136\/amiajnl-2013-002428","volume":"20","author":"J Pathak","year":"2013","unstructured":"Pathak J, Kho AN, Denny JC. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J Am Med Inf Assoc. 2013;20(e2):e206\u201311. https:\/\/doi.org\/10.1136\/amiajnl-2013-002428.","journal-title":"J Am Med Inf Assoc."},{"issue":"e2","key":"2181_CR22","doi-asserted-by":"publisher","first-page":"e253","DOI":"10.1136\/amiajnl-2013-001945","volume":"20","author":"Y Chen","year":"2013","unstructured":"Chen Y, Carroll RJ, Hinz ERM, Shah A, Eyler AE, Denny JC, et al. Applying active learning to high-throughput phenotyping algorithms for electronic health records data. J Am Med Inform Assoc. 2013;20(e2):e253\u20139. https:\/\/doi.org\/10.1136\/amiajnl-2013-001945.","journal-title":"J Am Med Inform Assoc."},{"key":"2181_CR23","doi-asserted-by":"publisher","unstructured":"Searle T, Ibrahim Z, Dobson R. Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset. In: Proceedings of BioNLP. Online: Association for Computational Linguistics. 2020. p. 76\u201385. https:\/\/doi.org\/10.18653\/v1\/2020.bionlp-1.8.","DOI":"10.18653\/v1\/2020.bionlp-1.8"},{"issue":"5","key":"2181_CR24","doi-asserted-by":"publisher","first-page":"1007","DOI":"10.1093\/jamia\/ocv180","volume":"23","author":"E Ford","year":"2016","unstructured":"Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. 
Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Inform Assoc. 2016;23(5):1007\u201315. https:\/\/doi.org\/10.1093\/jamia\/ocv180.","journal-title":"J Am Med Inform Assoc."},{"key":"2181_CR25","doi-asserted-by":"publisher","unstructured":"Harkema H, Dowling JN, Thornblade T, Chapman WW. ConText: An algorithm for determining negation, experiencer, and temporal status from clinical reports. J Biomed Inform. 2009;42(5):839\u2013851. Biomedical Natural Language Processing. https:\/\/doi.org\/10.1016\/j.jbi.2009.05.002.","DOI":"10.1016\/j.jbi.2009.05.002"},{"key":"2181_CR26","doi-asserted-by":"publisher","first-page":"102083","DOI":"10.1016\/j.artmed.2021.102083","volume":"117","author":"Z Kraljevic","year":"2021","unstructured":"Kraljevic Z, Searle T, Shek A, Roguski L, Noor K, Bean D, et al. Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit. Artif Intell Med. 2021;117:102083. https:\/\/doi.org\/10.1016\/j.artmed.2021.102083.","journal-title":"Artif Intell Med."},{"issue":"1","key":"2181_CR27","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1186\/s13326-020-00231-z","volume":"11","author":"MG Kersloot","year":"2020","unstructured":"Kersloot MG, van Putten FJP, Abu-Hanna A, Cornet R, Arts DL. Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies. J Biomed Semant. 2020;11(1):14. https:\/\/doi.org\/10.1186\/s13326-020-00231-z.","journal-title":"J Biomed Semant."},{"key":"2181_CR28","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1016\/j.jpsychires.2021.01.052","volume":"136","author":"M Cusick","year":"2021","unstructured":"Cusick M, Adekkanattu P, Campion TR, Sholle ET, Myers A, Banerjee S, et al. Using weak supervision and deep learning to classify clinical notes for identification of current suicidal ideation. 
J Psychiatr Res. 2021;136:95\u2013102.","journal-title":"J Psychiatr Res."},{"issue":"1","key":"2181_CR29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12911-021-01695-4","volume":"22","author":"Z Shen","year":"2022","unstructured":"Shen Z, Schutte D, Yi Y, Bompelli A, Yu F, Wang Y, et al. Classifying the lifestyle status for Alzheimer\u2019s disease from clinical notes using deep learning with weak supervision. BMC Med Inf Decis Making. 2022;22(1):1\u201311.","journal-title":"BMC Med Inf Decis Making."},{"key":"2181_CR30","doi-asserted-by":"publisher","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of NAACL-HLT. Minneapolis, Minnesota: Association for Computational Linguistics. 2019. p. 4171\u20134186. https:\/\/doi.org\/10.18653\/v1\/N19-1423.","DOI":"10.18653\/v1\/N19-1423"},{"key":"2181_CR31","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Advances in Neural Information Processing Systems 30 (NIPS 2017). Long Beach: NeurIPS Proceedings;\u00a02017. p. 5998\u20136008."},{"key":"2181_CR32","doi-asserted-by":"publisher","unstructured":"Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, et\u00a0al. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans Comput Healthcare. 2021;3(1). https:\/\/doi.org\/10.1145\/3458754.","DOI":"10.1145\/3458754"},{"key":"2181_CR33","doi-asserted-by":"publisher","unstructured":"Liu F, Shareghi E, Meng Z, Basaldella M, Collier N. Self-Alignment Pretraining for Biomedical Entity Representations. In: Proceedings of NAACL-HLT. Online: Association for Computational Linguistics. 2021. p. 4228\u20134238. https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.334.","DOI":"10.18653\/v1\/2021.naacl-main.334"},{"key":"2181_CR34","doi-asserted-by":"publisher","unstructured":"Noy NF. Ontology Mapping. 
In: Staab S, Studer R, editors. Handbook on Ontologies. International Handbooks on Information Systems. Berlin, Heidelberg: Springer. 2009. p. 573\u2013590. https:\/\/doi.org\/10.1007\/978-3-540-92673-3_26.","DOI":"10.1007\/978-3-540-92673-3_26"},{"key":"2181_CR35","doi-asserted-by":"publisher","unstructured":"Euzenat J, Shvaiko P. The Matching Problem. In: Ontology Matching. Berlin, Heidelberg: Springer Berlin Heidelberg. 2013. p. 25\u201354. https:\/\/doi.org\/10.1007\/978-3-642-38721-0_2.","DOI":"10.1007\/978-3-642-38721-0_2"},{"key":"2181_CR36","doi-asserted-by":"publisher","unstructured":"Textoris J, Leone M. Genetic Aspects of Uncommon Diseases. In: Leone M, Martin C, Vincent JL, editors. Uncommon Diseases in the ICU. Cham: Springer International Publishing; 2014. p. 3\u201311. https:\/\/doi.org\/10.1007\/978-3-319-04576-4_1.","DOI":"10.1007\/978-3-319-04576-4_1"},{"key":"2181_CR37","doi-asserted-by":"publisher","unstructured":"Gururangan S, Marasovi\u0107 A, Swayamdipta S, Lo K, Beltagy I, Downey D, et\u00a0al. Don\u2019t Stop Pretraining: Adapt Language Models to Domains and Tasks. In: Proceedings of ACL. Online: Association for Computational Linguistics. 2020. p. 8342\u20138360. https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.740.","DOI":"10.18653\/v1\/2020.acl-main.740"},{"key":"2181_CR38","unstructured":"Ma X, Wang Z, Ng P, Nallapati R, Xiang B. Universal text representation from bert: An empirical study. arXiv preprint arXiv:1910.07973. 2019."},{"key":"2181_CR39","unstructured":"Ministry of Health NZ. Mapping between ICD-10 and ICD-9. 2000. https:\/\/www.health.govt.nz\/nz-health-statistics\/data-references\/mapping-tools\/mapping-between-icd-10-and-icd-9. Accessed 30 Apr 2021."},{"key":"2181_CR40","unstructured":"NCBO BioPortal. International Classification of Diseases, Version 9 - Clinical Modification. 2021. https:\/\/bioportal.bioontology.org\/ontologies\/ICD9CM. 
Accessed 30 Apr 2021."},{"issue":"2","key":"2181_CR41","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1017\/S1351324920000509","volume":"27","author":"D Sykes","year":"2021","unstructured":"Sykes D, Grivas A, Grover C, Tobin R, Sudlow C, Whiteley W, et al. Comparison of rule-based and neural network models for negation detection in radiology reports. Nat Lang Eng. 2021;27(2):203\u201324. https:\/\/doi.org\/10.1017\/S1351324920000509.","journal-title":"Nat Lang Eng."},{"key":"2181_CR42","unstructured":"Xiao H. Serving Google BERT in Production using Tensorflow and ZeroMQ. 2019. https:\/\/hanxiao.io\/2019\/01\/02\/Serving-Google-BERT-in-Production-using-Tensorflow-and-ZeroMQ\/. Accessed 25 Apr 2021."},{"key":"2181_CR43","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825\u201330.","journal-title":"J Mach Learn Res."},{"key":"2181_CR44","unstructured":"Bodnari A. Healthcare gets more productive with new industry-specific AI tools. 2020. https:\/\/cloud.google.com\/blog\/topics\/healthcare-life-sciences\/now-in-preview-healthcare-natural-language-api-and-automl-entity-extraction-for-healthcare. Accessed 15 Mar 2021."},{"issue":"1","key":"2181_CR45","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-021-22328-4","volume":"12","author":"JA Fries","year":"2021","unstructured":"Fries JA, Steinberg E, Khattar S, Fleming SL, Posada J, Callahan A, et al. Ontology-driven weak supervision for clinical entity classification in electronic health records. Nat Commun. 2021;12(1):1\u201311.","journal-title":"Nat Commun."},{"issue":"2","key":"2181_CR46","doi-asserted-by":"publisher","first-page":"709","DOI":"10.1007\/s00778-019-00552-1","volume":"29","author":"A Ratner","year":"2020","unstructured":"Ratner A, Bach SH, Ehrenberg H, Fries J, Wu S, R\u00e9 C. 
Snorkel: Rapid training data creation with weak supervision. VLDB J. 2020;29(2):709\u201330.","journal-title":"VLDB J."},{"key":"2181_CR47","doi-asserted-by":"publisher","unstructured":"Gibaja E, Ventura S. A Tutorial on Multilabel Learning. ACM Comput Surv. 2015;47(3). https:\/\/doi.org\/10.1145\/2716262.","DOI":"10.1145\/2716262"},{"key":"2181_CR48","unstructured":"Monarch RM. Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI. Shelter Island, NY: Manning Publications Company; 2021. Version 11, MEAP Edition (Manning Early Access Program)."},{"key":"2181_CR49","doi-asserted-by":"publisher","unstructured":"Karamanolakis G, Mukherjee S, Zheng G, Awadallah AH. Self-Training with Weak Supervision. In: Proceedings of NAACL-HLT. Online: Association for Computational Linguistics. 2021. p. 845\u2013863. https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.66.","DOI":"10.18653\/v1\/2021.naacl-main.66"},{"key":"2181_CR50","doi-asserted-by":"publisher","unstructured":"Jiang H, Zhang D, Cao T, Yin B, Zhao T. Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data. In: Proceedings of ACL-IJCNLP. Online: Association for Computational Linguistics. 2021. p. 1775\u20131789. https:\/\/doi.org\/10.18653\/v1\/2021.acl-long.140.","DOI":"10.18653\/v1\/2021.acl-long.140"},{"issue":"1","key":"2181_CR51","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13326-018-0187-8","volume":"9","author":"P Kolyvakis","year":"2018","unstructured":"Kolyvakis P, Kalousis A, Smith B, Kiritsis D. Biomedical ontology alignment: an approach based on representation learning. J Biomed Semant. 2018;9(1):1\u201320.","journal-title":"J Biomed Semant."},{"key":"2181_CR52","doi-asserted-by":"publisher","unstructured":"Lison P, Barnes J, Hubin A. skweak: Weak Supervision Made Easy for NLP. In: Proceedings of ACL-IJCNLP: System Demonstrations. Online: Association for Computational Linguistics. 2021. p. 337\u2013346. 
https:\/\/doi.org\/10.18653\/v1\/2021.acl-demo.40.","DOI":"10.18653\/v1\/2021.acl-demo.40"},{"key":"2181_CR53","doi-asserted-by":"publisher","unstructured":"Zhang H, Thygesen J, Wu H. Increased COVID-19 related mortality rate for patients with rare diseases: a retrospective cohort study with data from Genomics England. Lancet. 2021;398:S95. Public Health Science 2021. https:\/\/doi.org\/10.1016\/S0140-6736(21)02638-6.","DOI":"10.1016\/S0140-6736(21)02638-6"},{"issue":"1","key":"2181_CR54","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13023-022-02312-x","volume":"17","author":"H Zhang","year":"2022","unstructured":"Zhang H, Thygesen JH, Shi T, Gkoutos GV, Hemingway H, Guthrie B, et al. Increased COVID-19 mortality rate in rare disease patients: a retrospective cohort study in participants of the Genomics England 100,000 Genomes project. Orphanet J Rare Dis. 2022;17(1):1\u20137.","journal-title":"Orphanet J Rare Dis."}],"container-title":["BMC Medical Informatics and Decision 
Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02181-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-023-02181-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02181-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T13:03:05Z","timestamp":1684328585000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-023-02181-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,5]]},"references-count":54,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["2181"],"URL":"https:\/\/doi.org\/10.1186\/s12911-023-02181-9","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,5]]},"assertion":[{"value":"9 September 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 May 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We were granted access to MIMIC-III through PhysioNet after completing the ethical training in human research subject protections and HIPAA regulations, through the Collaborative Institutional Training Initiative program. 
We have also received NHS Tayside Caldicott Guardian approval (CSAppMW1758) to use the anonymised brain imaging reports for this work. All our methods were carried out in accordance with relevant guidelines and regulations. The approval of both MIMIC-III and NHS Tayside datasets allows us to carry out Natural Language Processing experiments on the reports. All reports have been de-identified and we do not identify any individual patients in the methods and experiments, thus the research is exempt from requiring informed consent from the patients according to the NHS Tayside Caldicott Guardian approval.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"86"}}