{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T06:56:44Z","timestamp":1778137004656,"version":"3.51.4"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T00:00:00Z","timestamp":1696809600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T00:00:00Z","timestamp":1696809600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["TRR 374"],"award-info":[{"award-number":["TRR 374"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005626","name":"Universit\u00e4t Regensburg","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005626","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["K\u00fcnstl Intell"],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Real world data (RWD) has become an important tool in pharmaceutical research and development. Generated every time patients interact with the healthcare system when diagnoses are developed and medical interventions are selected, RWD are massive and in many regards typical big data. The use of artificial intelligence (AI) to analyze RWD seems an obvious choice. It promises new insights into medical need, drivers of diseases, and new opportunities for pharmacological interventions. When put into practice RWD analyses are challenging. 
The distributed generation of data, under sub-optimally standardized conditions in a patient-oriented but not information maximizing healthcare transaction, leads to a high level of sparseness and uncontrolled biases. We discuss why this needs to be addressed independent of the type of analysis approach. While classical statistical analysis and modeling approaches provide a rigorous framework for the handling of bias and sparseness, AI methods are not necessarily suited when applied naively. Special precautions need to be taken from choice of method until interpretation of results to prevent potentially harmful fallacies. The conscious use of prior medical subject matter expertise may also be required. Based on typical application examples we illustrate challenges and methodological considerations.<\/jats:p>","DOI":"10.1007\/s13218-023-00809-6","type":"journal-article","created":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T18:02:29Z","timestamp":1696874549000},"page":"7-18","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Opportunities and Challenges for AI-Based Analysis of RWD in Pharmaceutical R&amp;D: A Practical 
Perspective"],"prefix":"10.1007","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4255-5237","authenticated-orcid":false,"given":"Merle","family":"Behr","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rolf","family":"Burghaus","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christian","family":"Diedrich","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J\u00f6rg","family":"Lippert","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,10,9]]},"reference":[{"key":"809_CR1","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1016\/j.jclinepi.2018.09.003","volume":"105","author":"M Anderson","year":"2019","unstructured":"Anderson M, Naci H, Morrison D, Osipenko L, Mossialos E (2019) A review of NICE appraisals of pharmaceuticals 2000\u20132016 found variation in establishing comparative clinical effectiveness. J Clin Epidemiol 105:50\u201359","journal-title":"J Clin Epidemiol"},{"key":"809_CR2","doi-asserted-by":"crossref","unstructured":"Athey S, Julie T, Stefan W (2019) Generalized random forests. Ann Stat 47(2)","DOI":"10.1214\/18-AOS1709"},{"issue":"8","key":"809_CR3","doi-asserted-by":"publisher","first-page":"1943","DOI":"10.1073\/pnas.1711236115","volume":"115","author":"S Basu","year":"2018","unstructured":"Basu S, Kumbier K, Brown JB, Bin Yu (2018) Iterative random forests to discover predictive and stable high-order interactions. 
Proc Natl Acad Sci 115(8):1943\u20131948","journal-title":"Proc Natl Acad Sci"},{"issue":"22","key":"809_CR4","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2118636119","volume":"119","author":"Yu Merle Behr","year":"2022","unstructured":"Merle Behr Yu, Wang XL, Bin Yu (2022) Provable Boolean interaction recovery from tree ensemble obtained via random forests. Proc Natl Acad Sci 119(22):e2118636119","journal-title":"Proc Natl Acad Sci"},{"key":"809_CR5","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45:5\u201332","journal-title":"Mach Learn"},{"key":"809_CR6","doi-asserted-by":"crossref","unstructured":"B\u00e9nard C, Biau G, Da\u00a0Veiga S, Scornet E (2021) SIRUS: stable and interpretable RUle set for classification. Electron J Stat 15(1)","DOI":"10.1214\/20-EJS1792"},{"key":"809_CR7","first-page":"1","volume":"20","author":"A Fisher","year":"2019","unstructured":"Fisher A, Rudin C, Dominici F (2019) All models are wrong, but many are useful: learning a variable\u2019s importance by studying an entire class of prediction models simultaneously. J Mach Learn Res 20:1\u201381","journal-title":"J Mach Learn Res"},{"key":"809_CR8","unstructured":"Gan L, Zheng L, Allen GI (2022) Inference for interpretable machine learning: fast, model-agnostic confidence intervals for feature importance. arXiv:2206.02088 [cs, stat]"},{"issue":"3","key":"809_CR9","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1002\/cld.902","volume":"15","author":"E Gochanour","year":"2020","unstructured":"Gochanour E, Jayasekera C, Kowdley K (2020) Primary sclerosing cholangitis: epidemiology, genetics, diagnosis, and current management. 
Clin Liver Dis 15(3):125\u2013128","journal-title":"Clin Liver Dis"},{"key":"809_CR10","volume-title":"Causal inference: what if","author":"MA Hernan","year":"2023","unstructured":"Hernan MA, Robins JM (2023) Causal inference: what if. Chapman & Hall\/CRC, Boca Raton"},{"issue":"8","key":"809_CR11","doi-asserted-by":"publisher","first-page":"758","DOI":"10.1093\/aje\/kwv254","volume":"183","author":"MA Hern\u00e1n","year":"2016","unstructured":"Hern\u00e1n MA, Robins JM (2016) Using big data to emulate a target trial when a randomized trial is not available: table 1. Am J Epidemiol 183(8):758\u2013764","journal-title":"Am J Epidemiol"},{"issue":"9904","key":"809_CR12","doi-asserted-by":"publisher","first-page":"1587","DOI":"10.1016\/S0140-6736(13)60096-3","volume":"382","author":"GM Hirschfield","year":"2013","unstructured":"Hirschfield GM, Karlsen TH, Lindor KD, Adams DH (2013) Primary sclerosing cholangitis. The Lancet 382(9904):1587\u20131599","journal-title":"The Lancet"},{"issue":"1","key":"809_CR13","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1186\/s12874-022-01768-6","volume":"22","author":"F Liu","year":"2022","unstructured":"Liu F, Demosthenes P (2022) Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Med Res Methodol 22(1):287","journal-title":"BMC Med Res Methodol"},{"issue":"7","key":"809_CR14","doi-asserted-by":"publisher","first-page":"858","DOI":"10.1016\/j.jval.2017.03.008","volume":"20","author":"A Makady","year":"2017","unstructured":"Makady A, de Boer A, Hillege H, Klungel O, Goettsch W (2017) What is real-world data? A review of definitions based on literature and stakeholder interviews. Value Health 20(7):858\u2013865","journal-title":"Value Health"},{"key":"809_CR15","doi-asserted-by":"crossref","unstructured":"Mayer I, Sverdrup E, Gauss T, Moyer J-D, Wager S, Josse J (2020) Doubly robust treatment effect estimation with missing attributes. 
arXiv:1910.10624 [stat]","DOI":"10.1214\/20-AOAS1356"},{"issue":"1","key":"809_CR16","doi-asserted-by":"publisher","first-page":"4053","DOI":"10.1038\/s41598-023-30986-1","volume":"13","author":"K Merkelbach","year":"2023","unstructured":"Merkelbach K, Schaper S, Diedrich C, Fritsch SJ, Schuppert A (2023) Novel architecture for gated recurrent unit autoencoder trained on time series from electronic health records enables detection of ICU patient subgroups. Sci Rep 13(1):4053","journal-title":"Sci Rep"},{"key":"809_CR17","unstructured":"Morvan M\u00a0Le, Josse J, Scornet E, Varoquaux G (2021) What\u2019s a good imputation to predict with missing values? arXiv:2106.00311 [cs, stat]"},{"issue":"3","key":"809_CR18","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1038\/nrd3078","volume":"9","author":"SM Paul","year":"2010","unstructured":"Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, Schacht AL (2010) How to improve R&D productivity: the pharmaceutical industry\u2019s grand challenge. Nat Rev Drug Discovery 9(3):203\u2013214","journal-title":"Nat Rev Drug Discovery"},{"issue":"11","key":"809_CR19","first-page":"1343","volume":"10","author":"R Ramaswamy","year":"2021","unstructured":"Ramaswamy R, Wee SN, George K, Ghosh A, Sarkar J, Burghaus R, Lippert J (2021) CKD subpopulations defined by risk-factors: a longitudinal analysis of electronic health records. CPT: Pharmacom Syst Pharmacol 10(11):1343\u20131356","journal-title":"CPT: Pharmacom Syst Pharmacol"},{"key":"809_CR20","first-page":"3145","volume":"70","author":"A Shrikumar","year":"2017","unstructured":"Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. 
Proc Mach Learn Res 70:3145\u20133153","journal-title":"Proc Mach Learn Res"},{"issue":"11","key":"809_CR21","doi-asserted-by":"publisher","first-page":"825","DOI":"10.7326\/0003-4819-158-11-201306040-00007","volume":"158","author":"PE Stevens","year":"2013","unstructured":"Stevens PE (2013) Evaluation and management of chronic kidney disease: synopsis of the kidney disease: improving global outcomes 2012 clinical practice guideline. Ann Int Med 158(11):825","journal-title":"Ann Int Med"},{"issue":"1","key":"809_CR22","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/s12874-020-01191-9","volume":"21","author":"JM Strayhorn","year":"2021","unstructured":"Strayhorn JM (2021) Virtual controls as an alternative to randomized controlled trials for assessing efficacy of interventions. BMC Med Res Methodol 21(1):3","journal-title":"BMC Med Res Methodol"},{"key":"809_CR23","doi-asserted-by":"publisher","first-page":"457","DOI":"10.2147\/CLEP.S242097","volume":"12","author":"K Thorlund","year":"2020","unstructured":"Thorlund K, Dron L, Park JJH, Mills EJ (2020) Synthetic and external controls in clinical trials - a primer for researchers. Clin Epidemiol 12:457\u2013467","journal-title":"Clin Epidemiol"},{"issue":"523","key":"809_CR24","doi-asserted-by":"publisher","first-page":"1228","DOI":"10.1080\/01621459.2017.1319839","volume":"113","author":"S Wager","year":"2018","unstructured":"Wager S, Athey S (2018) Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc 113(523):1228\u20131242","journal-title":"J Am Stat Assoc"},{"key":"809_CR25","doi-asserted-by":"crossref","unstructured":"Wasserman L, Ramdas A, Balakrishnan S (2020) Universal inference using the split likelihood ratio test. 
arXiv:1912.11436","DOI":"10.1073\/pnas.1922664117"},{"issue":"1","key":"809_CR26","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1038\/s41746-022-00617-6","volume":"5","author":"N Zong","year":"2022","unstructured":"Zong N, Wen A, Moon S, Fu S, Wang L, Zhao Y, Yu Y, Huang M, Wang Y, Zheng G, Mielke MM, Cerhan JR, Liu H (2022) Computational drug repurposing based on electronic health records: a scoping review. NPJ Digital Med 5(1):77","journal-title":"NPJ Digital Med"},{"issue":"3","key":"809_CR27","doi-asserted-by":"publisher","first-page":"647","DOI":"10.1007\/s10115-013-0679-x","volume":"41","author":"E \u0160trumbelj","year":"2014","unstructured":"\u0160trumbelj E, Kononenko I (2014) Explaining prediction models and individual predictions with feature contributions. Knowl Inf Syst 41(3):647\u2013665","journal-title":"Knowl Inf Syst"}],"container-title":["KI - K\u00fcnstliche Intelligenz"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13218-023-00809-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13218-023-00809-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13218-023-00809-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,3]],"date-time":"2025-05-03T08:31:52Z","timestamp":1746261112000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13218-023-00809-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,9]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["809"],"URL":"https:\/\/doi.org\/10.1007\/s13218-023-00809-6","relation":{},"ISSN":["0933-1875","1610-1987"
],"issn-type":[{"value":"0933-1875","type":"print"},{"value":"1610-1987","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,9]]},"assertion":[{"value":"14 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 October 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"CD, JL and RB are employees of Bayer AG and make use of RWD analyses as part of their professional roles.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}