{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T19:50:04Z","timestamp":1779306604249,"version":"3.51.4"},"reference-count":60,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2020,12,15]],"date-time":"2020-12-15T00:00:00Z","timestamp":1607990400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"funder":[{"name":"ANR PractikPharma","award":["ANR-15-CE23-0028"],"award-info":[{"award-number":["ANR-15-CE23-0028"]}]},{"name":"French Agence Nationale de la Recherche"},{"name":"SIRIC CARPEM research program"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Background<\/jats:title>\n                  <jats:p>The increasing complexity of data streams and computational processes in modern clinical health information systems makes reproducibility challenging. Clinical natural language processing (NLP) pipelines are routinely leveraged for the secondary use of data. 
Workflow management systems (WMS) have been widely used in bioinformatics to handle the reproducibility bottleneck.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>To evaluate if WMS and other bioinformatics practices could impact the reproducibility of clinical NLP frameworks.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>Based on the literature across multiple research fields (NLP, bioinformatics and clinical informatics) we selected articles which (1) review reproducibility practices and (2) highlight a set of rules or guidelines to ensure tool or pipeline reproducibility. We aggregate insight from the literature to define reproducibility recommendations. Finally, we assess the compliance of 7 NLP frameworks to the recommendations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We identified 40 reproducibility features from 8 selected articles. Frameworks based on WMS match more than 50% of features (26 features for LAPPS Grid, 22 features for OpenMinted) compared to 18 features for current clinical NLP framework (cTakes, CLAMP) and 17 features for GATE, ScispaCy, and Textflows.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>34 recommendations are endorsed by at least 2 articles from our selection. Overall, 15 features were adopted by every NLP Framework. 
Nevertheless, frameworks based on WMS had a better compliance with the features.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>NLP frameworks could benefit from lessons learned from the bioinformatics field (eg, public repositories of curated tools and workflows or use of containers for shareability) to enhance the reproducibility in a clinical setting.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocaa261","type":"journal-article","created":{"date-parts":[[2020,11,3]],"date-time":"2020-11-03T20:16:01Z","timestamp":1604434561000},"page":"504-515","source":"Crossref","is-referenced-by-count":23,"title":["Can reproducibility be improved in clinical natural language processing? A study of 7 clinical NLP suites"],"prefix":"10.1093","volume":"28","author":[{"given":"William","family":"Digan","sequence":"first","affiliation":[{"name":"INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Universit\u00e9 de Paris, Universit\u00e9 Sorbonne Paris Cit\u00e9, Paris, France"},{"name":"Department of Medical Informatics, H\u00f4pital Europ\u00e9en Georges Pompidou, Assistance publique\u2013H\u00f4pitaux de Paris, Paris, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aur\u00e9lie","family":"N\u00e9v\u00e9ol","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Paris Saclay, CNRS, LIMSI, Orsay, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antoine","family":"Neuraz","sequence":"additional","affiliation":[{"name":"INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Universit\u00e9 de Paris, Universit\u00e9 Sorbonne Paris Cit\u00e9, Paris, France"},{"name":"Department of Medical Informatics, Necker Children\u2019s Hospital, Assistance publique\u2013H\u00f4pitaux de Paris, Paris, 
France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maxime","family":"Wack","sequence":"additional","affiliation":[{"name":"INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Universit\u00e9 de Paris, Universit\u00e9 Sorbonne Paris Cit\u00e9, Paris, France"},{"name":"Department of Medical Informatics, H\u00f4pital Europ\u00e9en Georges Pompidou, Assistance publique\u2013H\u00f4pitaux de Paris, Paris, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Baudoin","sequence":"additional","affiliation":[{"name":"INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Universit\u00e9 de Paris, Universit\u00e9 Sorbonne Paris Cit\u00e9, Paris, France"},{"name":"Department of Medical Informatics, H\u00f4pital Europ\u00e9en Georges Pompidou, Assistance publique\u2013H\u00f4pitaux de Paris, Paris, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anita","family":"Burgun","sequence":"additional","affiliation":[{"name":"INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Universit\u00e9 de Paris, Universit\u00e9 Sorbonne Paris Cit\u00e9, Paris, France"},{"name":"Department of Medical Informatics, H\u00f4pital Europ\u00e9en Georges Pompidou, Assistance publique\u2013H\u00f4pitaux de Paris, Paris, France"},{"name":"Department of Medical Informatics, Necker Children\u2019s Hospital, Assistance publique\u2013H\u00f4pitaux de Paris, Paris, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bastien","family":"Rance","sequence":"additional","affiliation":[{"name":"INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Universit\u00e9 de Paris, Universit\u00e9 Sorbonne Paris Cit\u00e9, Paris, France"},{"name":"Department of Medical Informatics, H\u00f4pital Europ\u00e9en Georges Pompidou, Assistance publique\u2013H\u00f4pitaux de Paris, Paris, 
France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,12,15]]},"reference":[{"issue":"7604","key":"2021030612203325300_ocaa261-B1","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1038\/533452a","article-title":"1,500 scientists lift the lid on reproducibility","volume":"533","author":"Baker","year":"2016","journal-title":"Nature News"},{"key":"2021030612203325300_ocaa261-B2","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1016\/j.future.2017.01.012","article-title":"Scientific workflows for computational reproducibility in the life sciences: status, challenges and opportunities","volume":"75","author":"Cohen-Boulakia","year":"2017","journal-title":"Future Gen Comput Syst"},{"issue":"1","key":"2021030612203325300_ocaa261-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR Guiding Principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci Data"},{"issue":"3","key":"2021030612203325300_ocaa261-B4","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1145\/2812803","article-title":"Repeatability in computer systems research","volume":"59","author":"Collberg","year":"2016","journal-title":"Commun ACM"},{"key":"2021030612203325300_ocaa261-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3389\/fninf.2017.00069","article-title":"Re-run, repeat, reproduce, reuse, replicate: transforming code into scientific contributions","volume":"11","author":"Benureau","year":"2018","journal-title":"Front Neuroinform"},{"issue":"7","key":"2021030612203325300_ocaa261-B6","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1038\/s41592-020-0886-9","article-title":"When computational pipelines go \u2018clank.\u2019","volume":"17","author":"Marx","year":"2020","journal-title":"Nat 
Methods"},{"issue":"3","key":"2021030612203325300_ocaa261-B7","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1162\/coli.2008.34.3.465","article-title":"Empiricism is not a matter of faith","volume":"34","author":"Pedersen","year":"2008","journal-title":"Comput Linguistics"},{"key":"2021030612203325300_ocaa261-B8","first-page":"1691","author":"Fokkens","year":"2013"},{"key":"2021030612203325300_ocaa261-B9","first-page":"156","article-title":"Three dimensions of reproducibility in natural language processing","volume":"2018","author":"Cohen","year":"2018","journal-title":"LREC Int Conf Lang Resour Eval"},{"issue":"3","key":"2021030612203325300_ocaa261-B10","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1093\/jamia\/ocz007","article-title":"The journey to transparency, reproducibility, and replicability","volume":"26","author":"Bakken","year":"2019","journal-title":"J Am Med Inform Assoc"},{"key":"2021030612203325300_ocaa261-B11","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.jbi.2018.10.005","article-title":"Using clinical natural language processing for health outcomes research: overview and actionable suggestions for future advances","volume":"88","author":"Velupillai","year":"2018","journal-title":"J Biomed Inform"},{"issue":"5","key":"2021030612203325300_ocaa261-B12","doi-asserted-by":"crossref","first-page":"986","DOI":"10.1093\/jamia\/ocx039","article-title":"Challenges in adapting existing clinical natural language processing systems to multiple, diverse health care settings","volume":"24","author":"Carrell","year":"2017","journal-title":"J Am Med Inform Assoc"},{"issue":"10","key":"2021030612203325300_ocaa261-B13","doi-asserted-by":"crossref","first-page":"e1003285","DOI":"10.1371\/journal.pcbi.1003285","article-title":"Ten simple rules for reproducible computational research","volume":"9","author":"Sandve","year":"2013","journal-title":"PLoS Comput 
Biol"},{"issue":"7","key":"2021030612203325300_ocaa261-B14","doi-asserted-by":"crossref","first-page":"e1000424","DOI":"10.1371\/journal.pcbi.1000424","article-title":"A quick guide to organizing computational biology projects","volume":"5","author":"Noble","year":"2009","journal-title":"PLoS Comput Biol"},{"issue":"12","key":"2021030612203325300_ocaa261-B15","doi-asserted-by":"crossref","first-page":"e1006561","DOI":"10.1371\/journal.pcbi.1006561","article-title":"Ten simple rules for documenting scientific software","volume":"14","author":"Lee","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2021030612203325300_ocaa261-B16","doi-asserted-by":"crossref","article-title":"Nextflow enables reproducible computational workflows","author":"Di Tommaso","DOI":"10.1038\/nbt.3820"},{"issue":"19","key":"2021030612203325300_ocaa261-B17","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake\u2013a scalable bioinformatics workflow engine","volume":"28","author":"K\u00f6ster","year":"2012","journal-title":"Bioinformatics"},{"issue":"W1","key":"2021030612203325300_ocaa261-B18","doi-asserted-by":"crossref","first-page":"W3","DOI":"10.1093\/nar\/gkw343","article-title":"The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update","volume":"44","author":"Afgan","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2021030612203325300_ocaa261-B19"},{"key":"2021030612203325300_ocaa261-B20"},{"issue":"5","key":"2021030612203325300_ocaa261-B21","doi-asserted-by":"crossref","first-page":"e0177459","DOI":"10.1371\/journal.pone.0177459","article-title":"Singularity: scientific containers for mobility of compute","volume":"12","author":"Kurtzer","year":"2017","journal-title":"Plos One"},{"key":"2021030612203325300_ocaa261-B22","first-page":"241","author":"B\u00e1n\u00e1ti"},{"key":"2021030612203325300_ocaa261-B23","first-page":"1705","article-title":"ProvCaRe semantic 
provenance knowledgebase: evaluating scientific reproducibility of research studies","volume":"2017","author":"Valdez","year":"2017","journal-title":"AMIA Annu Symp Proc"},{"issue":"11","key":"2021030612203325300_ocaa261-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/gigascience\/giz095","article-title":"Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv","volume":"8","author":"Khan","year":"2019","journal-title":"Gigascience"},{"key":"2021030612203325300_ocaa261-B25","author":"Gaignard","year":"2017"},{"key":"2021030612203325300_ocaa261-B26"},{"key":"2021030612203325300_ocaa261-B27","first-page":"457","author":"Ide","year":"2016"},{"key":"2021030612203325300_ocaa261-B28","author":"Labropoulou","year":"2018"},{"issue":"7","key":"2021030612203325300_ocaa261-B29","doi-asserted-by":"crossref","first-page":"467","DOI":"10.7326\/M18-0850","article-title":"PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation","volume":"169","author":"Tricco","year":"2018","journal-title":"Ann Intern Med"},{"issue":"1","key":"2021030612203325300_ocaa261-B30","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1186\/s12874-017-0377-6","article-title":"Repeat: a framework to assess empirical reproducibility in biomedical research","volume":"17","author":"McIntosh","year":"2017","journal-title":"BMC Med Res Methodol"},{"issue":"4","key":"2021030612203325300_ocaa261-B31","doi-asserted-by":"crossref","first-page":"e1005412","DOI":"10.1371\/journal.pcbi.1005412","article-title":"Ten simple rules for making research software more robust","volume":"13","author":"Taschuk","year":"2017","journal-title":"PLOS Comput Biol"},{"issue":"5","key":"2021030612203325300_ocaa261-B32","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1136\/jamia.2009.001560","article-title":"Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation, and 
applications","volume":"17","author":"Savova","year":"2010","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2021030612203325300_ocaa261-B33","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1093\/jamia\/ocx132","article-title":"CLAMP\u2014a toolkit for efficiently building customized clinical natural language processing pipelines","volume":"25","author":"Soysal","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"2","key":"2021030612203325300_ocaa261-B34","doi-asserted-by":"crossref","first-page":"e1002854","DOI":"10.1371\/journal.pcbi.1002854","article-title":"Getting more out of biomedical documents with GATE\u2019s full lifecycle open source text analytics","volume":"9","author":"Cunningham","year":"2013","journal-title":"PLOS Comput Biol"},{"key":"2021030612203325300_ocaa261-B35","author":"Neumann","year":"2019"},{"key":"2021030612203325300_ocaa261-B36","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1016\/j.scico.2016.01.001","article-title":"TextFlows: A visual programming platform for text mining and natural language processing","volume":"121","author":"Perov\u0161ek","year":"2016","journal-title":"Sci Comput Programming"},{"issue":"2","key":"2021030612203325300_ocaa261-B37","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1023\/A:1014348124664","article-title":"GATE, a General Architecture For Text Engineering","volume":"36","author":"Cunningham","year":"2002","journal-title":"Comput Hum"},{"key":"2021030612203325300_ocaa261-B38"},{"key":"2021030612203325300_ocaa261-B39","first-page":"102","author":"Stenetorp","year":"2012"},{"key":"2021030612203325300_ocaa261-B40","first-page":"307","author":"Carpenter"},{"key":"2021030612203325300_ocaa261-B41","author":"Apache OpenNLP.Text Annotation with OpenNLP and UIMA. 
https:\/\/opennlp.apache.org\/ Accessed Jun 22, 2020."},{"key":"2021030612203325300_ocaa261-B42"},{"key":"2021030612203325300_ocaa261-B43","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1007\/978-3-642-33486-3_54","volume-title":"Machine Learning and Knowledge Discovery in Databases","author":"Kranjc","year":"2012"},{"key":"2021030612203325300_ocaa261-B44","first-page":"69","author":"Bird","year":"2006"},{"key":"2021030612203325300_ocaa261-B45","author":"Pedregosa"},{"key":"2021030612203325300_ocaa261-B46","doi-asserted-by":"crossref","first-page":"55","DOI":"10.3115\/v1\/P14-5010","volume-title":"Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations","author":"Manning","year":"2014"},{"key":"2021030612203325300_ocaa261-B47"},{"issue":"3-4","key":"2021030612203325300_ocaa261-B48","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1017\/S1351324904003523","article-title":"UIMA: an architectural approach to unstructured information processing in the corporate research environment","volume":"10","author":"Ferrucci","year":"2004","journal-title":"Nat Lang Eng"},{"issue":"1","key":"2021030612203325300_ocaa261-B49","doi-asserted-by":"crossref","first-page":"160035","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Sci Data"},{"key":"2021030612203325300_ocaa261-B50","doi-asserted-by":"crossref","first-page":"122","DOI":"10.18653\/v1\/W18-5614","volume-title":"Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis","author":"Grabar","year":"2018"},{"key":"2021030612203325300_ocaa261-B51","author":"N\u00e9v\u00e9ol","year":"2014"},{"issue":"10","key":"2021030612203325300_ocaa261-B52","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1093\/jamia\/ocy114","article-title":"Data and systems for medication-related text classification and concept 
normalization from Twitter: insights from the Social Media Mining for Health (SMM4H) 2017 shared task","volume":"25","author":"Sarker","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2021030612203325300_ocaa261-B53","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1136\/amiajnl-2011-000465","article-title":"Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions","volume":"18","author":"Chapman","year":"2011","journal-title":"J Am Med Inform Assoc"},{"key":"2021030612203325300_ocaa261-B54","first-page":"1","author":"Soldaini"},{"issue":"16","key":"2021030612203325300_ocaa261-B55","doi-asserted-by":"crossref","first-page":"2580","DOI":"10.1093\/bioinformatics\/btx192","article-title":"BioContainers: an open-source and community-driven framework for software standardization","volume":"33","author":"da Veiga Leprevost","year":"2017","journal-title":"Bioinformatics"},{"key":"2021030612203325300_ocaa261-B56","first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program","author":"Aronson","year":"2001","journal-title":"Proc AMIA Symp"},{"issue":"10","key":"2021030612203325300_ocaa261-B57","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1093\/bioinformatics\/btt113","article-title":"EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats","volume":"29","author":"Ison","year":"2013","journal-title":"Bioinformatics"},{"issue":"0","key":"2021030612203325300_ocaa261-B58","doi-asserted-by":"crossref","first-page":"bat064","DOI":"10.1093\/database\/bat064","article-title":"BioC: a minimalist approach to interoperability for biomedical text processing","volume":"2013","author":"Comeau","year":"2013","journal-title":"Database 
(Oxford)"},{"key":"2021030612203325300_ocaa261-B59","doi-asserted-by":"crossref","first-page":"149","DOI":"10.3115\/1596276.1596305","volume-title":"Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)","author":"Buchholz","year":"2006"},{"issue":"3","key":"2021030612203325300_ocaa261-B60","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1038\/s41587-020-0439-x","article-title":"The nf-core framework for community-curated bioinformatics pipelines","volume":"38","author":"Ewels","year":"2020","journal-title":"Nat Biotechnol"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/28\/3\/504\/36428698\/ocaa261.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/28\/3\/504\/36428698\/ocaa261.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,6]],"date-time":"2021-03-06T12:21:13Z","timestamp":1615033273000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/28\/3\/504\/6034902"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,15]]},"references-count":60,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2020,12,15]]},"published-print":{"date-parts":[[2021,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaa261","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,3,1]]},"published":{"date-parts":[[2020,12,15]]}}}