{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T02:49:46Z","timestamp":1777517386819,"version":"3.51.4"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2024,4,4]],"date-time":"2024-04-04T00:00:00Z","timestamp":1712188800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"AP-HP Foundation"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,5,20]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>To develop and validate a natural language processing (NLP) pipeline that detects 18 conditions in French clinical notes, including 16 comorbidities of the Charlson index, while exploring a collaborative and privacy-enhancing workflow.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>The detection pipeline relied on both rule-based and machine learning algorithms for named entity recognition and entity qualification, respectively. We used a large language model pre-trained on millions of clinical notes along with annotated clinical notes in the context of 3 cohort studies related to oncology, cardiology, and rheumatology. The overall workflow was conceived to foster collaboration between studies while respecting the privacy constraints of the data warehouse. 
We estimated the added value of the advanced technologies and of the collaborative setting.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The pipeline reached macro-averaged F1-score, positive predictive value, sensitivity, and specificity of 95.7 (95%CI 94.5-96.3), 95.4 (95%CI 94.0-96.3), 96.0 (95%CI 94.0-96.7), and 99.2 (95%CI 99.0-99.4), respectively. F1-scores were superior to those observed using alternative technologies or non-collaborative settings. The models were shared through a secured registry.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusions<\/jats:title>\n                  <jats:p>We demonstrated that a community of investigators working on a common clinical data warehouse could efficiently and securely collaborate to develop, validate and use sensitive artificial intelligence models. In particular, we provided an efficient and robust NLP pipeline that detects conditions mentioned in clinical notes.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocae069","type":"journal-article","created":{"date-parts":[[2024,4,4]],"date-time":"2024-04-04T14:18:17Z","timestamp":1712240297000},"page":"1280-1290","source":"Crossref","is-referenced-by-count":22,"title":["Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions"],"prefix":"10.1093","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4433-442X","authenticated-orcid":false,"given":"Thomas","family":"Petit-Jean","sequence":"first","affiliation":[{"name":"Innovation and Data Unit, IT Department, Assistance Publique-H\u00f4pitaux de Paris , Paris, 75012, 
France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9303-6349","authenticated-orcid":false,"given":"Christel","family":"G\u00e9rardin","sequence":"additional","affiliation":[{"name":"Innovation and Data Unit, IT Department, Assistance Publique-H\u00f4pitaux de Paris , Paris, 75012, France"},{"name":"Institut Pierre-Louis d\u2019Epid\u00e9miologie et de Sant\u00e9 Publique, INSERM, Sorbonne Universit\u00e9 , Paris, 75012, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0418-3110","authenticated-orcid":false,"given":"Emmanuelle","family":"Berthelot","sequence":"additional","affiliation":[{"name":"Department of Cardiology, H\u00f4pital Bic\u00eatre, Assistance Publique-H\u00f4pitaux de Paris , Le Kremlin Bic\u00eatre, 94270, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6373-8956","authenticated-orcid":false,"given":"Gilles","family":"Chatellier","sequence":"additional","affiliation":[{"name":"Innovation and Data Unit, IT Department, Assistance Publique-H\u00f4pitaux de Paris , Paris, 75012, France"},{"name":"Department of Medical Informatics, Assistance Publique-H\u00f4pitaux de Paris, Centre-Universit\u00e9 de Paris (APHP-CUP), Universit\u00e9 de Paris , Paris, 75015, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7929-6523","authenticated-orcid":false,"given":"Marie","family":"Frank","sequence":"additional","affiliation":[{"name":"Department of Medical Informatics, H\u00f4pitaux Universitaires Paris-Saclay, Assistance Publique-H\u00f4pitaux de Paris , Le Kremlin-Bic\u00eatre, 94270, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2452-8868","authenticated-orcid":false,"given":"Xavier","family":"Tannier","sequence":"additional","affiliation":[{"name":"Laboratoire d'Informatique M\u00e9dicale et 
d'Ing\u00e9nierie des Connaissances pour la e-Sant\u00e9 (LIMICS), INSERM, Universit\u00e9 Sorbonne Paris Nord, Sorbonne Universit\u00e9 , Paris, 75005, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9285-1966","authenticated-orcid":false,"given":"Emmanuelle","family":"Kempf","sequence":"additional","affiliation":[{"name":"Laboratoire d'Informatique M\u00e9dicale et d'Ing\u00e9nierie des Connaissances pour la e-Sant\u00e9 (LIMICS), INSERM, Universit\u00e9 Sorbonne Paris Nord, Sorbonne Universit\u00e9 , Paris, 75005, France"},{"name":"Department of Medical Oncology, Henri Mondor and Albert Chenevier Teaching Hospital, Assistance Publique-H\u00f4pitaux de Paris , Cr\u00e9teil, 94000, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6413-5188","authenticated-orcid":false,"given":"Romain","family":"Bey","sequence":"additional","affiliation":[{"name":"Innovation and Data Unit, IT Department, Assistance Publique-H\u00f4pitaux de Paris , Paris, 75012, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,4,4]]},"reference":[{"issue":"1","key":"2024052019464580900_ocae069-B1","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1038\/s41591-018-0300-7","article-title":"High-performance medicine: the convergence of human and artificial intelligence","volume":"25","author":"Topol","year":"2019","journal-title":"Nat Med"},{"issue":"7956","key":"2024052019464580900_ocae069-B2","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1038\/s41586-023-05881-4","article-title":"Foundation models for generalist medical artificial intelligence","volume":"616","author":"Moor","year":"2023","journal-title":"Nature"},{"key":"2024052019464580900_ocae069-B3","author":"National Science and Technology 
Council","year":"2023"},{"key":"2024052019464580900_ocae069-B4","first-page":"578","author":"Lehman"},{"key":"2024052019464580900_ocae069-B5","first-page":"2633","author":"Carlini","year":"2021"},{"key":"2024052019464580900_ocae069-B6","doi-asserted-by":"crossref","first-page":"c4226","DOI":"10.1136\/bmj.c4226","article-title":"Importance of accurately identifying disease in studies using electronic health records","volume":"341","author":"Manuel","year":"2010","journal-title":"BMJ"},{"issue":"2","key":"2024052019464580900_ocae069-B7","doi-asserted-by":"crossref","first-page":"e12239","DOI":"10.2196\/12239","article-title":"Natural language processing of clinical notes on chronic diseases: systematic review","volume":"7","author":"Sheikhalishahi","year":"2019","journal-title":"JMIR Med Inform"},{"issue":"6","key":"2024052019464580900_ocae069-B8","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1016\/j.revmed.2019.12.016","article-title":"Association des comorbidit\u00e9s psychiatriques avec la dur\u00e9e de s\u00e9jour des patients en m\u00e9decine interne d\u2019aval des urgences","volume":"41","author":"Lampros","year":"2020","journal-title":"La Revue de M\u00e9decine Interne"},{"issue":"5","key":"2024052019464580900_ocae069-B9","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1016\/0021-9681(87)90171-8","article-title":"A new method of classifying prognostic comorbidity in longitudinal studies: development and validation","volume":"40","author":"Charlson","year":"1987","journal-title":"J Chronic Dis"},{"issue":"6","key":"2024052019464580900_ocae069-B10","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1016\/0895-4356(92)90133-8","article-title":"Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases","volume":"45","author":"Deyo","year":"1992","journal-title":"J Clin 
Epidemiol"},{"issue":"12","key":"2024052019464580900_ocae069-B11","doi-asserted-by":"crossref","first-page":"1288","DOI":"10.1016\/j.jclinepi.2004.03.012","article-title":"New ICD-10 version of the Charlson comorbidity index predicted in-hospital mortality","volume":"57","author":"Sundararajan","year":"2004","journal-title":"J Clin Epidemiol"},{"key":"2024052019464580900_ocae069-B12","first-page":"160","author":"Chuang","year":"2002"},{"issue":"9","key":"2024052019464580900_ocae069-B13","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1016\/j.mayocp.2012.04.015","article-title":"Derivation and validation of automated electronic search strategies to extract Charlson comorbidities from electronic medical records","volume":"87","author":"Singh","year":"2012","journal-title":"Mayo Clin Proc"},{"issue":"e2","key":"2024052019464580900_ocae069-B14","doi-asserted-by":"crossref","first-page":"e239","DOI":"10.1136\/amiajnl-2013-001889","article-title":"Deriving comorbidities from medical records using natural language processing","volume":"20","author":"Salmasian","year":"2013","journal-title":"J Am Med Inform Assoc"},{"issue":"9","key":"2024052019464580900_ocae069-B15","doi-asserted-by":"crossref","first-page":"1296","DOI":"10.1002\/clc.23687","article-title":"Natural language processing for the assessment of cardiovascular disease comorbidities: the cardio-canary comorbidity project","volume":"44","author":"Berman","year":"2021","journal-title":"Clin Cardiol"},{"issue":"2","key":"2024052019464580900_ocae069-B16","doi-asserted-by":"crossref","first-page":"e23934","DOI":"10.2196\/23934","article-title":"Electronic medical record\u2013based case phenotyping for the charlson conditions: scoping review","volume":"9","author":"Lee","year":"2021","journal-title":"JMIR Med Inform"},{"issue":"3","key":"2024052019464580900_ocae069-B17","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1177\/19322968211000831","article-title":"Using natural language processing to 
measure and improve quality of diabetes care: a systematic review","volume":"15","author":"Turchin","year":"2021","journal-title":"J Diabetes Sci Technol"},{"key":"2024052019464580900_ocae069-B18","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1109\/CBMS.2018.00009","volume-title":"2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS)","author":"Dias Pereira dos Santos","year":"2018"},{"issue":"4","key":"2024052019464580900_ocae069-B19","doi-asserted-by":"crossref","first-page":"e6328","DOI":"10.2196\/medinform.6328","article-title":"Web-based real-time case finding for the population health management of patients with diabetes mellitus: a prospective validation of the natural language processing\u2013based algorithm with statewide electronic medical records","volume":"4","author":"Zheng","year":"2016","journal-title":"JMIR Med Inform"},{"key":"2024052019464580900_ocae069-B20","author":"Dura"},{"issue":"4","key":"2024052019464580900_ocae069-B21","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"Biobert: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"issue":"5","key":"2024052019464580900_ocae069-B22","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1093\/jamia\/ocv180","article-title":"Extracting information from the text of electronic medical records to improve case detection: a systematic review","volume":"23","author":"Ford","year":"2016","journal-title":"J Am Med Inform Assoc"},{"key":"2024052019464580900_ocae069-B23","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1016\/j.jbi.2017.11.011","article-title":"Clinical information extraction applications: a literature review","volume":"77","author":"Wang","year":"2018","journal-title":"J Biomed 
Inform"},{"key":"2024052019464580900_ocae069-B24","doi-asserted-by":"crossref","first-page":"102083","DOI":"10.1016\/j.artmed.2021.102083","article-title":"Multi-domain clinical natural language processing with medcat: the medical concept annotation toolkit","volume":"117","author":"Kraljevic","year":"2021","journal-title":"Artif Intell Med"},{"key":"2024052019464580900_ocae069-B25","author":"Gorinski","year":"2019"},{"issue":"3","key":"2024052019464580900_ocae069-B26","doi-asserted-by":"crossref","first-page":"e17934","DOI":"10.2196\/17934","article-title":"Hybrid deep learning for medication-related information extraction from clinical texts in French: MedExt algorithm development study","volume":"9","author":"Jouffroy","year":"2021","journal-title":"JMIR Med Inform"},{"issue":"1","key":"2024052019464580900_ocae069-B27","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/s13326-018-0179-8","article-title":"Clinical natural language processing in languages other than English: opportunities and challenges","volume":"9","author":"N\u00e9v\u00e9ol","year":"2018","journal-title":"J Biomed Semantics"},{"issue":"3","key":"2024052019464580900_ocae069-B28","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1038\/s43588-021-00048-5","article-title":"We need to talk about the lack of investment in digital research infrastructure","volume":"1","author":"Knowles","year":"2021","journal-title":"Nat Comput Sci"},{"key":"2024052019464580900_ocae069-B29","first-page":"267","author":"Carlini","year":"2019"},{"key":"2024052019464580900_ocae069-B30","author":"The European Parliament and the Council of the European Union","year":"2023"},{"issue":"10","key":"2024052019464580900_ocae069-B31","doi-asserted-by":"crossref","first-page":"e1001885","DOI":"10.1371\/journal.pmed.1001885","article-title":"The reporting of studies conducted using observational routinely-collected health data (record) 
statement","volume":"12","author":"Benchimol","year":"2015","journal-title":"PLoS Med"},{"issue":"8","key":"2024052019464580900_ocae069-B32","doi-asserted-by":"crossref","first-page":"1244","DOI":"10.1093\/jamia\/ocaa096","article-title":"Fold-stratified cross-validation for unbiased and privacy-preserving federated learning","volume":"27","author":"Bey","year":"2020","journal-title":"J Am Med Inform Assoc"},{"issue":"2","key":"2024052019464580900_ocae069-B33","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1136\/jamia.2009.000893","article-title":"Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2)","volume":"17","author":"Murphy","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2024052019464580900_ocae069-B34","author":"Dura"},{"key":"2024052019464580900_ocae069-B35","first-page":"1","author":"Dalloux","year":"2019"},{"key":"2024052019464580900_ocae069-B36","author":"Martin","year":"2019"},{"issue":"12","key":"2024052019464580900_ocae069-B37","doi-asserted-by":"crossref","first-page":"4090","DOI":"10.1111\/ene.15071","article-title":"Machine learning-enabled multitrust audit of stroke comorbidities using natural language processing","volume":"28","author":"Shek","year":"2021","journal-title":"Eur J Neurol"},{"key":"2024052019464580900_ocae069-B38","author":"Labrak","year":"2023"},{"key":"2024052019464580900_ocae069-B39","first-page":"80","article-title":"Solving artificial intelligence\u2019s privacy problem","volume":"17(Special Issue)","author":"de Montjoye","year":"2017","journal-title":"Field Actions Sci Rep"},{"key":"2024052019464580900_ocae069-B40","first-page":"901","author":"Aggarwal","year":"2005"},{"key":"2024052019464580900_ocae069-B41","author":"Tannier"},{"issue":"1","key":"2024052019464580900_ocae069-B42","doi-asserted-by":"crossref","first-page":"3069","DOI":"10.1038\/s41467-019-10933-3","article-title":"Estimating the success of re-identifications in incomplete datasets using 
generative models","volume":"10","author":"Rocher","year":"2019","journal-title":"Nat Commun"},{"issue":"1","key":"2024052019464580900_ocae069-B43","doi-asserted-by":"crossref","first-page":"180286","DOI":"10.1038\/sdata.2018.286","article-title":"On the privacy-conscientious use of mobile phone data","volume":"5","author":"De Montjoye","year":"2018","journal-title":"Sci Data"},{"issue":"3-4","key":"2024052019464580900_ocae069-B44","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1561\/0400000042","article-title":"The algorithmic foundations of differential privacy. Foundations.","volume":"9","author":"Dwork","year":"2013","journal-title":"FNT in Theoretical Computer Science"},{"issue":"1","key":"2024052019464580900_ocae069-B45","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1038\/s41746-020-00323-1","article-title":"The future of digital health with federated learning","volume":"3","author":"Rieke","year":"2020","journal-title":"NPJ Digit Med"},{"key":"2024052019464580900_ocae069-B46","first-page":"56","author":"Fort","year":"2010"},{"key":"2024052019464580900_ocae069-B47","author":"Petit-Jean","year":"2023"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/6\/1280\/57769016\/ocae069.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/6\/1280\/57769016\/ocae069.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,20]],"date-time":"2024-05-20T19:47:53Z","timestamp":1716234473000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/31\/6\/1280\/7640728"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,4]]},"references-count":47,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2024,4,4]]},"published-print":{"date-parts":[[2024,5,20]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocae069","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,6,1]]},"published":{"date-parts":[[2024,4,4]]}}}