{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T05:21:25Z","timestamp":1768454485368,"version":"3.49.0"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T00:00:00Z","timestamp":1704672000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T00:00:00Z","timestamp":1704672000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>The handling of missing data is a challenge for inference and regression modelling. A particular challenge is dealing with missing predictor information, particularly when trying to build and make predictions from models for use in clinical practice.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Methods<\/jats:title>\n                <jats:p>We utilise a flexible Bayesian approach for handling missing predictor information in regression models. This provides practitioners with full posterior predictive distributions for both the missing predictor information (conditional on the observed predictors) and the outcome-of-interest. We apply this approach to a previously proposed counterfactual treatment selection model for type 2 diabetes second-line therapies. Our approach combines a regression model and a Dirichlet process mixture model (DPMM), where the former defines the treatment selection model, and the latter provides a flexible way to model the joint distribution of the predictors.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We show that DPMMs can model complex relationships between predictor variables and can provide powerful means of fitting models to incomplete data (under missing-completely-at-random and missing-at-random assumptions). This framework ensures that the posterior distribution for the parameters and the conditional average treatment effect estimates automatically reflect the additional uncertainties associated with missing data due to the hierarchical model structure. We also demonstrate that in the presence of multiple missing predictors, the DPMM model can be used to explore which variable(s), if collected, could provide the most additional information about the likely outcome.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>When developing clinical prediction models, DPMMs offer a flexible way to model complex covariate structures and handle missing predictor information. DPMM-based counterfactual prediction models can also provide additional information to support clinical decision-making, including allowing predictions with appropriate uncertainty to be made for individuals with incomplete predictor data.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12911-023-02400-3","type":"journal-article","created":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T11:02:35Z","timestamp":1704711755000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Dirichlet process mixture models to impute missing predictor data in counterfactual prediction models: an application to predict optimal type 2 diabetes therapy"],"prefix":"10.1186","volume":"24","author":[{"given":"Pedro","family":"Cardoso","sequence":"first","affiliation":[]},{"given":"John\u00a0M.","family":"Dennis","sequence":"additional","affiliation":[]},{"given":"Jack","family":"Bowden","sequence":"additional","affiliation":[]},{"given":"Beverley\u00a0M.","family":"Shields","sequence":"additional","affiliation":[]},{"given":"Trevelyan\u00a0J.","family":"McKinley","sequence":"additional","affiliation":[]},{"name":"the\u00a0MASTERMIND Consortium","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,1,8]]},"reference":[{"key":"2400_CR1","doi-asserted-by":"crossref","unstructured":"Kent DM, Paulus JK, van Klaveren D, D\u2019Agostino R, Goodman S, Hayward R, et\u00a0al. The predictive approaches to treatment effect heterogeneity (PATH) statement. Ann Intern Med. 2020;172(35).","DOI":"10.7326\/M18-3667"},{"issue":"12","key":"2400_CR2","doi-asserted-by":"publisher","first-page":"e873","DOI":"10.1016\/S2589-7500(22)00174-1","volume":"4","author":"JM Dennis","year":"2022","unstructured":"Dennis JM, Young KG, McGovern AP, Mateen BA, Vollmer SJ, Simpson MD, et al. Development of a treatment selection algorithm for SGLT2 and DPP-4 inhibitor therapies in people with type 2 diabetes: a retrospective cohort study. Lancet Digit Health. 2022;4(12):e873\u201383.","journal-title":"Lancet Digit Health."},{"key":"2400_CR3","doi-asserted-by":"crossref","unstructured":"Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley series in probability and mathematical statistics. Probability and mathematical statistics. Wiley; 2002.","DOI":"10.1002\/9781119013563"},{"key":"2400_CR4","volume-title":"Comprehensive Chemometrics: Chemical and Biochemical Data Analysis","author":"GJ McLachlan","year":"2020","unstructured":"McLachlan GJ, Rathnayake S, Lee SX. Comprehensive Chemometrics: Chemical and Biochemical Data Analysis. 2nd ed. Oxford: Elsevier; 2020.","edition":"2"},{"key":"2400_CR5","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316696","volume-title":"Multiple Imputation for Nonresponse in Surveys","author":"DB Rubin","year":"1987","unstructured":"Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: Wiley; 1987."},{"issue":"1","key":"2400_CR6","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1002\/mpr.329","volume":"20","author":"MJ Azur","year":"2011","unstructured":"Azur MJ, Stuart EA, Frangakis C, Leaf PJ. Multiple imputation by chained equations: what is it and how does it work? Int J Methods Psychiatr Res. 2011;20(1):40\u20139.","journal-title":"Int J Methods Psychiatr Res."},{"issue":"8","key":"2400_CR7","doi-asserted-by":"publisher","first-page":"1461","DOI":"10.1177\/09622802231165001","volume":"32","author":"R Sisk","year":"2023","unstructured":"Sisk R, Sperrin M, Peek N, van Smeden M, Martin GP. Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: A simulation study. Stat Methods Med Res. 2023;32(8):1461\u201377.","journal-title":"Stat Methods Med Res."},{"issue":"10","key":"2400_CR8","doi-asserted-by":"publisher","first-page":"1092","DOI":"10.1016\/j.jclinepi.2006.01.009","volume":"59","author":"KGM Moons","year":"2006","unstructured":"Moons KGM, Donders RART, Stijen T, Harrell FE Jr. Using the outcome for imputation of missing predictor values was preferred. J Clin Epidemiol. 2006;59(10):1092\u2013101.","journal-title":"J Clin Epidemiol."},{"key":"2400_CR9","doi-asserted-by":"publisher","DOI":"10.1201\/b16018","volume-title":"Bayesian Data Analysis","author":"A Gelman","year":"2013","unstructured":"Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. New York: Chapman & Hall\/CRC; 2013."},{"key":"2400_CR10","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1007\/s11222-006-5196-2","volume":"16","author":"JD McAuliffe","year":"2006","unstructured":"McAuliffe JD, Blei DM, Jordan MI. Nonparametric empirical Bayes for the Dirichlet process mixture model. Stat Comput. 2006;16:5\u201314.","journal-title":"Stat Comput."},{"issue":"3","key":"2400_CR11","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1093\/biostatistics\/kxq013","volume":"11","author":"J Molitor","year":"2010","unstructured":"Molitor J, Papathomas M, Jerrett M, Richardson S. Bayesian profile regression with an application to the national survey of children\u2019s health. Biostatistics. 2010;11(3):484\u201398.","journal-title":"Biostatistics."},{"issue":"7","key":"2400_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v064.i07","volume":"64","author":"S Liverani","year":"2015","unstructured":"Liverani S, Hastie DI, Azizi L, Papathomas M, Richardson S. PReMiuM: An R package for profile regression mixture models using Dirichlet processes. J Stat Softw. 2015;64(7):1\u201330.","journal-title":"J Stat Softw."},{"key":"2400_CR13","first-page":"1","volume":"31","author":"A Banerjee","year":"2013","unstructured":"Banerjee A, Murray J, Dunson D. Bayesian learning of joint distributions of objects. Artif Intell Stat. 2013;31:1\u20139.","journal-title":"Artif Intell Stat."},{"issue":"3","key":"2400_CR14","doi-asserted-by":"publisher","first-page":"679","DOI":"10.1214\/16-BA1020","volume":"12","author":"M DeYoreo","year":"2017","unstructured":"DeYoreo M, Reiter JP, Hillygus DS. Bayesian mixture models with focused clustering for mixed ordinal and nominal data. Bayesian Anal. 2017;12(3):679\u2013703.","journal-title":"Bayesian Anal."},{"issue":"2","key":"2400_CR15","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1080\/00031305.2016.1277158","volume":"71","author":"O Akande","year":"2017","unstructured":"Akande O, Li F, Reiter J. An empirical comparison of multiple imputation methods for categorical data. Am Stat. 2017;71(2):162\u201370.","journal-title":"Am Stat."},{"issue":"2","key":"2400_CR16","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1214\/aos\/1176342360","volume":"1","author":"TS Ferguson","year":"1973","unstructured":"Ferguson TS. A Bayesian analysis of some nonparametric problems. Annals Stat. 1973;1(2):209\u201330.","journal-title":"Annals Stat."},{"key":"2400_CR17","doi-asserted-by":"crossref","unstructured":"Favaro S, Walker SG. A generalized constructive definition for the Dirichlet process. Stat Probab Lett. 2010;78(16).","DOI":"10.1016\/j.spl.2008.04.001"},{"key":"2400_CR18","volume-title":"Finite Mixture Models","author":"D Peel","year":"2000","unstructured":"Peel D, McLachlan G. Finite Mixture Models. New York: Wiley; 2000."},{"issue":"1","key":"2400_CR19","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1093\/biomet\/asm086","volume":"95","author":"O Papaspiliopoulos","year":"2008","unstructured":"Papaspiliopoulos O, Roberts GO. Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika. 2008;95(1):169\u201386.","journal-title":"Biometrika."},{"key":"2400_CR20","doi-asserted-by":"crossref","unstructured":"Daniels MJ, Linero AR, Roy J. Bayesian Nonparametrics for Causal Inference and Missing Data. Chapman & Hall\/CRC; 2023.","DOI":"10.1201\/9780429324222"},{"key":"2400_CR21","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1016\/j.jmp.2019.04.004","volume":"91","author":"Y Li","year":"2019","unstructured":"Li Y, Schofield E, G\u00fcnen M. A tutorial on Dirichlet process mixture modeling. J Math Psychol. 2019;91:128\u201344.","journal-title":"J Math Psychol."},{"issue":"11","key":"2400_CR22","doi-asserted-by":"publisher","first-page":"5305","DOI":"10.1016\/j.csda.2006.10.002","volume":"51","author":"MD Zio","year":"2007","unstructured":"Zio MD, Guarnera U, Luzi O. Imputation through finite Gaussian mixture models. Comput Stat Data Anal. 2007;51(11):5305\u201316.","journal-title":"Comput Stat Data Anal."},{"issue":"2","key":"2400_CR23","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1080\/07350015.2014.885435","volume":"31","author":"HJ Kim","year":"2014","unstructured":"Kim HJ, Reiter JP, Wang Q, Cox LH, Karr AF. Multiple imputation of missing or faulty values under linear constraints. J Bus Econ Stat. 2014;31(2):375\u201386.","journal-title":"J Bus Econ Stat."},{"issue":"5","key":"2400_CR24","doi-asserted-by":"publisher","first-page":"499","DOI":"10.3102\/1076998613480394","volume":"38","author":"Y Si","year":"2013","unstructured":"Si Y, Reiter JP. Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys. J Educ Behav Stat. 2013;38(5):499\u2013521.","journal-title":"J Educ Behav Stat."},{"key":"2400_CR25","unstructured":"Wang C, Liao X, Carin L, Dunson DB. Classification with incomplete data using Dirichlet process priors. J Mach Learn Res. 2010;11(12)."},{"issue":"520","key":"2400_CR26","doi-asserted-by":"publisher","first-page":"1708","DOI":"10.1080\/01621459.2016.1231612","volume":"112","author":"D Manrique-Vallier","year":"2017","unstructured":"Manrique-Vallier D, Reiter JP. Bayesian simultaneous edit and imputation for multivariate categorical data. J Am Stat Assoc. 2017;112(520):1708\u201319.","journal-title":"J Am Stat Assoc."},{"issue":"4","key":"2400_CR27","doi-asserted-by":"publisher","first-page":"1193","DOI":"10.1111\/biom.12875","volume":"74","author":"J Roy","year":"2018","unstructured":"Roy J, Lum KJ, Zeldow B, Dworkin JD, Lo Re III V, Daniels MJ. Bayesian nonparametric generative models for causal inference with missing at random covariates. Biometrics. 2018;74(4):1193\u2013202.","journal-title":"Biometrics."},{"issue":"3","key":"2400_CR28","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1214\/ba\/1339616468","volume":"6","author":"S Wade","year":"2011","unstructured":"Wade S, Mongelluzzo S, Petrone S. An enriched conjugate prior for Bayesian nonparametric inference. Bayesian Anal. 2011;6(3):359\u201386.","journal-title":"Bayesian Anal."},{"key":"2400_CR29","first-page":"1041","volume":"15","author":"S Wade","year":"2014","unstructured":"Wade S, Dunson DB, Petrone S, Trippa L. Improving prediction from Dirichlet process mixtures via enrichment. J Mach Learn Res. 2014;15:1041\u201371.","journal-title":"J Mach Learn Res."},{"issue":"1","key":"2400_CR30","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1093\/biomet\/83.1.67","volume":"83","author":"P M\u00fcller","year":"1996","unstructured":"M\u00fcller P, Erkanli A, West M. Bayesian curve fitting using multivariate normal mixtures. Biometrika. 1996;83(1):67\u201379.","journal-title":"Biometrika."},{"key":"2400_CR31","doi-asserted-by":"publisher","first-page":"2075","DOI":"10.2337\/dbi20-0002","volume":"69","author":"J Dennis","year":"2020","unstructured":"Dennis J. Precision medicine in type 2 diabetes: using individualized prediction models to optimise selection of treatment. Diabetes. 2020;69:2075\u201385.","journal-title":"Diabetes."},{"key":"2400_CR32","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1080\/10618600.2016.1172487","volume":"26","author":"P de Valpine","year":"2017","unstructured":"de Valpine P, Turek D, Paciorek CJ, Anderson-Bergman C, Temple Lang D, Bodik R. Programming with models: writing statistical algorithms for general model structures with NIMBLE. J Comput Graph Stat. 2017;26:403\u201313.","journal-title":"J Comput Graph Stat."},{"key":"2400_CR33","unstructured":"de Valpine P, Paciorek C, Turek D, Michaud N, Anderson-Bergman C, Obermeyer F, et\u00a0al. NIMBLE: MCMC, particle filtering, and programmable hierarchical modeling. 2022. R package version 0.12.2. https:\/\/cran.r-project.org\/package=nimble."},{"key":"2400_CR34","unstructured":"R Core Team. R: a language and environment for statistical computing. Vienna, Austria. 2021. https:\/\/www.R-project.org\/."},{"issue":"3","key":"2400_CR35","doi-asserted-by":"publisher","first-page":"827","DOI":"10.1093\/ije\/dyv098","volume":"44","author":"E Herrett","year":"2015","unstructured":"Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data resource profile: clinical practice research datalink (CPRD). Int J Epidemiol. 2015;44(3):827\u201336.","journal-title":"Int J Epidemiol."},{"key":"2400_CR36","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-19425-7","volume-title":"Regression Modeling Strategies","author":"FE Harrell Jr","year":"2015","unstructured":"Harrell Jr FE. Regression Modeling Strategies. New York: Springer International Publishing; 2015."},{"issue":"4","key":"2400_CR37","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1214\/ss\/1177011136","volume":"7","author":"A Gelman","year":"1992","unstructured":"Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat Sci. 1992;7(4):457\u2013511.","journal-title":"Stat Sci."},{"issue":"1","key":"2400_CR38","first-page":"63","volume":"26","author":"PM Bossuyt","year":"2015","unstructured":"Bossuyt PM, Parvin T. Evaluating biomarkers for guiding treatment decisions. EJIFCC. 2015;26(1):63\u201370.","journal-title":"EJIFCC."},{"issue":"509","key":"2400_CR39","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1080\/01621459.2014.969424","volume":"110","author":"AR Linero","year":"2015","unstructured":"Linero AR, Daniels MJ. A flexible Bayesian approach to monotone missing data in longitudinal studies with nonignorable missingness with application to an acute schizophrenia clinical trial. J Am Stat Assoc. 2015;110(509):45\u201355.","journal-title":"J Am Stat Assoc."},{"issue":"1","key":"2400_CR40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v080.i01","volume":"80","author":"PC B\u00fcrkner","year":"2017","unstructured":"B\u00fcrkner PC. brms: an R package for Bayesian multilevel models using Stan. J Stat Softw. 2017;80(1):1\u201328.","journal-title":"J Stat Softw."},{"issue":"3","key":"2400_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v045.i03","volume":"45","author":"S van Buuren","year":"2011","unstructured":"van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011;45(3):1\u201367.","journal-title":"J Stat Softw."},{"issue":"2","key":"2400_CR42","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1016\/j.jeconom.2011.08.003","volume":"165","author":"M van Hasselt","year":"2011","unstructured":"van Hasselt M. Bayesian inference in a sample selection model. J Econ. 2011;165(2):221\u201332.","journal-title":"J Econ."},{"issue":"3","key":"2400_CR43","doi-asserted-by":"publisher","first-page":"965","DOI":"10.1214\/19-BA1195","volume":"15","author":"PR Hahn","year":"2020","unstructured":"Hahn PR, Murray JS, Carvalho CM. Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects (with discussion). Bayesian Anal. 2020;15(3):965\u20131056.","journal-title":"Bayesian Anal."},{"key":"2400_CR44","doi-asserted-by":"crossref","unstructured":"Daniels MJ, Gaskins JT. Bayesian methods for the analysis of mixed categorical and continuous (incomplete) data. In: de Leon AR, Chough KC, editors. Analysis of Mixed Data: Methods and Applications. Chapman & Hall\/CRC; 2013.","DOI":"10.1201\/b14571-14"}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02400-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-023-02400-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02400-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T11:08:32Z","timestamp":1704712112000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-023-02400-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,8]]},"references-count":44,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["2400"],"URL":"https:\/\/doi.org\/10.1186\/s12911-023-02400-3","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,8]]},"assertion":[{"value":"25 January 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 December 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 January 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Approval for CPRD data access and the study protocol was granted by the CPRD Independent Scientific Advisory Committee (Project\u2013ID: ISAC 13_177R), and informed consent was waived for this retrospective study. Individual patients can opt out of sharing their data for CPRD, and CPRD does not collect data for these patients. All methods were carried out in accordance with relevant guidelines and regulations.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"12"}}