{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T20:45:43Z","timestamp":1764103543109,"version":"3.41.2"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T00:00:00Z","timestamp":1731628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Inria, Inria Paris"},{"DOI":"10.13039\/100020806","name":"ANR","doi-asserted-by":"publisher","award":["ANR-22-PESN-0007"],"award-info":[{"award-number":["ANR-22-PESN-0007"]}],"id":[{"id":"10.13039\/100020806","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Summary<\/jats:title>\n                  <jats:p>Phenotyping consists in applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically from a collection of Electronic Health Records (EHRs). Because much of the clinical information in EHRs lies in text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs. However, the heterogeneous and highly specialized nature of both the content and the form of clinical texts makes this task particularly tedious and is the source of time and cost constraints in observational studies.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>\u2002<\/jats:title>\n                  <jats:p>To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. 
It enables users to compose data processing pipelines from easy-to-reuse software bricks, called medkit operations. In addition to the core of the library, we share the operations and pipelines we have already developed and invite the phenotyping community to reuse and enrich them.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>medkit is available at https:\/\/github.com\/medkit-lib\/medkit.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae681","type":"journal-article","created":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T17:05:08Z","timestamp":1731690308000},"source":"Crossref","is-referenced-by-count":3,"title":["Facilitating phenotyping from clinical texts: the medkit library"],"prefix":"10.1093","volume":"40","author":[{"given":"Antoine","family":"Neuraz","sequence":"first","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]},{"name":"H\u00f4pital Necker, Assistance Publique\u2014H\u00f4pitaux de Paris , Paris 75015,","place":["France"]}]},{"given":"Ghislain","family":"Vaillant","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]}]},{"given":"Camila","family":"Arias","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]}]},{"given":"Olivier","family":"Birot","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 
75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]}]},{"given":"Kim-Tam","family":"Huynh","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]}]},{"given":"Thibaut","family":"Fabacher","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]},{"name":"University Hospital of Strasbourg , Strasbourg 67000,","place":["France"]}]},{"given":"Alice","family":"Rogier","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]},{"name":"H\u00f4pital Europ\u00e9en Georges Pompidou, Assistance Publique\u2014H\u00f4pitaux de Paris , Paris 75015,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3326-2811","authenticated-orcid":false,"given":"Nicolas","family":"Garcelon","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]},{"name":"Imagine Institute, Inserm UMR 1163, Universit\u00e9 Paris Cit\u00e9 , Paris 75015,","place":["France"]}]},{"given":"Ivan","family":"Lerner","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne 
Universit\u00e9 , Paris 75006,","place":["France"]},{"name":"H\u00f4pital Europ\u00e9en Georges Pompidou, Assistance Publique\u2014H\u00f4pitaux de Paris , Paris 75015,","place":["France"]}]},{"given":"Bastien","family":"Rance","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]},{"name":"H\u00f4pital Europ\u00e9en Georges Pompidou, Assistance Publique\u2014H\u00f4pitaux de Paris , Paris 75015,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1466-062X","authenticated-orcid":false,"given":"Adrien","family":"Coulet","sequence":"additional","affiliation":[{"name":"Inria Paris , Paris 75013,","place":["France"]},{"name":"Centre de Recherche des Cordeliers, Inserm UMR 1138, Universit\u00e9 Paris Cit\u00e9, Sorbonne Universit\u00e9 , Paris 75006,","place":["France"]}]}],"member":"286","published-online":{"date-parts":[[2024,11,15]]},"reference":[{"first-page":"54","year":"2019","author":"Akbik","key":"2024121400594655200_btae681-B1"},{"key":"2024121400594655200_btae681-B2","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1146\/annurev-biodatasci-080917-013315","article-title":"Advances in electronic phenotyping: from rule-based definitions to machine learning models","volume":"1","author":"Banda","year":"2018","journal-title":"Annu Rev Biomed Data Sci"},{"key":"2024121400594655200_btae681-B3","doi-asserted-by":"publisher","first-page":"622","DOI":"10.1038\/s41597-022-01710-x","article-title":"Introducing the FAIR principles for research software","volume":"9","author":"Barker","year":"2022","journal-title":"Sci Data"},{"volume-title":"Natural Language Processing with Python: Analyzing Text with the Natural Language 
Toolkit","year":"2009","author":"Bird","key":"2024121400594655200_btae681-B4"},{"key":"2024121400594655200_btae681-B5","doi-asserted-by":"publisher","first-page":"W345","DOI":"10.1093\/nar\/gkac247","article-title":"The Galaxy platform for accessible, reproducible and collaborative biomedical analyses","volume":"50","author":"Community","year":"2022","journal-title":"Nucleic Acids Res"},{"first-page":"168","year":"2002","author":"Cunningham","key":"2024121400594655200_btae681-B6"},{"journal-title":"Actes de la Journ\u00e9e D\u2019\u00e9tude Sur la Similarit\u00e9 Entre Patients","article-title":"D\u00e9tection de zones dupliqu\u00e9es dans des comptes rendus m\u00e9dicaux","author":"Fabacher","key":"2024121400594655200_btae681-B7"},{"key":"2024121400594655200_btae681-B8","doi-asserted-by":"publisher","first-page":"768","DOI":"10.3233\/SHTI230263","article-title":"Evaluating the portability of rheumatoid arthritis phenotyping algorithms: case study on French EHRs","volume":"302","author":"Fabacher","year":"2023","journal-title":"Stud Health Technol Inform"},{"article-title":"spaCy2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing","year":"2017","author":"Honnibal","key":"2024121400594655200_btae681-B9"},{"key":"2024121400594655200_btae681-B10","doi-asserted-by":"publisher","first-page":"272","DOI":"10.3233\/SHTI240396","article-title":"Comparing NER approaches on French clinical text, with easy-to-reuse pipelines","volume":"316","author":"Hubert","year":"2024","journal-title":"Stud Health Technol Inform"},{"key":"2024121400594655200_btae681-B11","doi-asserted-by":"publisher","first-page":"1499","DOI":"10.1111\/jgs.15411","article-title":"The value of unstructured electronic health record data in geriatric syndrome case identification","volume":"66","author":"Kharrazi","year":"2018","journal-title":"J Am Geriatr 
Soc"},{"key":"2024121400594655200_btae681-B12","doi-asserted-by":"publisher","first-page":"1046","DOI":"10.1093\/jamia\/ocv202","article-title":"PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability","volume":"23","author":"Kirby","year":"2016","journal-title":"J Am Med Inform Assoc"},{"key":"2024121400594655200_btae681-B13","doi-asserted-by":"publisher","first-page":"102083","DOI":"10.1016\/j.artmed.2021.102083","article-title":"Multi-domain clinical natural language processing with MedCAT: the medical concept annotation toolkit","volume":"117","author":"Kraljevic","year":"2021","journal-title":"Artif Intell Med"},{"key":"2024121400594655200_btae681-B14","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1016\/j.jbi.2017.07.012","article-title":"Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review","volume":"73","author":"Kreimeyer","year":"2017","journal-title":"J Biomed Inform"},{"key":"2024121400594655200_btae681-B15","article-title":"PROV-O: the PROV ontology","volume":"30","author":"Lebo","year":"2013","journal-title":"W3C"},{"key":"2024121400594655200_btae681-B16","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1146\/annurev-statistics-022513-115645","article-title":"A systematic statistical approach to evaluating evidence from observational studies","volume":"1","author":"Madigan","year":"2014","journal-title":"Annu Rev Stat Appl"},{"year":"2018","author":"Mendels","key":"2024121400594655200_btae681-B17"},{"key":"2024121400594655200_btae681-B18","doi-asserted-by":"crossref","first-page":"33","DOI":"10.12688\/f1000research.29032.2","article-title":"Sustainable data analysis with snakemake","volume":"10","author":"M\u00f6lder","year":"2021","journal-title":"F1000Res"},{"key":"2024121400594655200_btae681-B19","doi-asserted-by":"publisher","first-page":"649","DOI":"10.3233\/SHTI231045","article-title":"TAXN: translate align extract 
normalize, a multilingual extraction tool for clinical texts","volume":"310","author":"Neuraz","year":"2024","journal-title":"Stud Health Technol Inform"},{"author":"Nun","key":"2024121400594655200_btae681-B20","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.4869223"},{"year":"2024","author":"Pohyer","key":"2024121400594655200_btae681-B21"},{"key":"2024121400594655200_btae681-B22","doi-asserted-by":"publisher","first-page":"91","DOI":"10.3233\/SHTI220038","article-title":"Using an ontological representation of chemotherapy toxicities for guiding information extraction and integration from EHRs","volume":"290","author":"Rogier","year":"2022","journal-title":"Stud Health Technol Inform"},{"year":"2019","author":"Schuemie","key":"2024121400594655200_btae681-B23"},{"key":"2024121400594655200_btae681-B24","unstructured":"Wajsburt P, Petit-Jean T, Dura B et al EDS-NLP: efficient information extraction from French clinical notes (v0.12.0). Zenodo, 2024. 10.5281\/zenodo.11238626"},{"year":"2019","author":"Wolf","key":"2024121400594655200_btae681-B25"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae681\/60686190\/btae681.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/12\/btae681\/60924520\/btae681.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/12\/btae681\/60924520\/btae681.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,14]],"date-time":"2024-12-14T00:59:57Z","timestamp":1734137997000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae681\/7901218"}},"subtitle":[],"editor":[{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,11,15]]},"references-count":25,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,11,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae681","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2024,12]]},"published":{"date-parts":[[2024,11,15]]},"article-number":"btae681"}}