{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T00:25:09Z","timestamp":1776212709577,"version":"3.50.1"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,11,11]],"date-time":"2020-11-11T00:00:00Z","timestamp":1605052800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,11,11]],"date-time":"2020-11-11T00:00:00Z","timestamp":1605052800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>High-need, high-cost (HNHC) patients\u2014usually defined as those who account for the top 5% of annual healthcare costs\u2014use as much as half of the total healthcare costs. Accurately predicting future HNHC patients and designing targeted interventions for them has the potential to effectively control rapidly growing healthcare expenditures. To achieve this goal, we used a nationally representative random sample of the working-age population who underwent a screening program in Japan in 2013\u20132016, and developed five machine-learning-based prediction models for HNHC patients in the subsequent year. Predictors include demographics, blood pressure, laboratory tests (e.g., HbA1c, LDL-C, and AST), survey responses (e.g., smoking status, medications, and past medical history), and annual healthcare cost in the prior year. 
Our prediction models for HNHC patients combining clinical data from the national screening program with claims data showed a c-statistic of 0.84 (95% CI, 0.83\u20130.86) and outperformed traditional prediction models relying only on claims data.<\/jats:p>","DOI":"10.1038\/s41746-020-00354-8","type":"journal-article","created":{"date-parts":[[2020,11,11]],"date-time":"2020-11-11T11:02:31Z","timestamp":1605092551000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":54,"title":["Machine-learning-based prediction models for high-need high-cost patients using nationwide clinical and claims data"],"prefix":"10.1038","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7790-9130","authenticated-orcid":false,"given":"Itsuki","family":"Osawa","sequence":"first","affiliation":[]},{"given":"Tadahiro","family":"Goto","sequence":"additional","affiliation":[]},{"given":"Yuji","family":"Yamamoto","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1937-4833","authenticated-orcid":false,"given":"Yusuke","family":"Tsugawa","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,11,11]]},"reference":[{"key":"354_CR1","doi-asserted-by":"publisher","unstructured":"OECD. Health at a Glance 2019. OECD Publishing. https:\/\/doi.org\/10.1787\/4dd50c09-en (2019).","DOI":"10.1787\/4dd50c09-en"},{"key":"354_CR2","unstructured":"Mitchell, E. M. Concentration of Health Expenditures in the U.S. Civilian Noninstitutionalized Population, 2014. https:\/\/meps.ahrq.gov\/data_files\/publications\/st497\/stat497.shtml (2016)."},{"key":"354_CR3","doi-asserted-by":"publisher","first-page":"2572","DOI":"10.1001\/jama.2013.7103","volume":"309","author":"KE Joynt","year":"2013","unstructured":"Joynt, K. E. et al. Contribution of preventable acute care spending to total spending for high-cost. 
JAMA 309, 2572\u20132578 (2013).","journal-title":"JAMA"},{"key":"354_CR4","doi-asserted-by":"publisher","first-page":"909","DOI":"10.1056\/NEJMp1608511","volume":"375","author":"D Blumenthal","year":"2016","unstructured":"Blumenthal, D. et al. Caring for high-need, high-cost patients\u2014an urgent priority. N. Engl. J. Med. 375, 909\u2013911 (2016).","journal-title":"N. Engl. J. Med"},{"key":"354_CR5","doi-asserted-by":"publisher","first-page":"1657","DOI":"10.1001\/jama.2016.12388","volume":"316","author":"D Blumenthal","year":"2016","unstructured":"Blumenthal, D. & Abrams, M. K. Tailoring complex care management for high-need, high-cost patients. JAMA 316, 1657\u20131658 (2016).","journal-title":"JAMA"},{"key":"354_CR6","unstructured":"Das, L. T., Abramson, E. L., Kaushal, R. High-need, high-cost patients offer solutions for improving their care and reducing costs. NEJM Catal. https:\/\/catalyst.nejm.org\/doi\/full\/10.1056\/CAT.19.0015 (2019)."},{"key":"354_CR7","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0217353","volume":"14","author":"M Tanke","year":"2019","unstructured":"Tanke, M. et al. A challenge to all. A primer on inter-country differences of high-need, high-cost patients. PLoS ONE. 14, e0217353 (2019).","journal-title":"PLoS ONE."},{"key":"354_CR8","doi-asserted-by":"publisher","first-page":"1382","DOI":"10.1287\/opre.1080.0619","volume":"56","author":"D Bertsimas","year":"2008","unstructured":"Bertsimas, D. et al. Algorithmic prediction of health-care costs. Oper. Res. 56, 1382\u20131392 (2008).","journal-title":"Oper. Res."},{"key":"354_CR9","doi-asserted-by":"publisher","first-page":"532","DOI":"10.1111\/j.1475-6773.2009.01080.x","volume":"45","author":"JA Fleishman","year":"2010","unstructured":"Fleishman, J. A. & Cohen, J. W. Using information on clinical conditions to predict high-cost patients. Health Serv. Res. 45, 532\u2013552 (2010).","journal-title":"Health Serv. 
Res."},{"key":"354_CR10","first-page":"68","volume":"9","author":"Y Chechulin","year":"2014","unstructured":"Chechulin, Y. et al. Predicting patients with high risk of becoming high-cost healthcare users in Ontario (Canada). Health. Policy 9, 68\u201379 (2014).","journal-title":"Health. Policy"},{"key":"354_CR11","first-page":"e399","volume":"20","author":"LJ Leininger","year":"2014","unstructured":"Leininger, L. J. et al. Predicting high-need cases among new Medicaid enrollees. Am. J. Manag Care. 20, e399\u2013e407 (2014).","journal-title":"Am. J. Manag Care."},{"key":"354_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1136\/bmjopen-2016-011580","volume":"7","author":"S Tamang","year":"2017","unstructured":"Tamang, S. et al. Predicting patient \u201ccost blooms\u201d in Denmark: A longitudinal population-based study. BMJ Open. 7, 1\u201310 (2017).","journal-title":"BMJ Open."},{"key":"354_CR13","unstructured":"Shah, N. R. et al. Predictive analytics determine next year\u2019s highest-cost patients. NEJM Catal. https:\/\/catalyst.nejm.org\/doi\/full\/10.1056\/CAT.17.0542 (2016)."},{"key":"354_CR14","first-page":"e215","volume":"23","author":"PJ Cunningham","year":"2017","unstructured":"Cunningham, P. J. Predicting high-cost privately insured patients based on self-reported health and utilization data. Am. J. Manag Care 23, e215\u2013e222 (2017).","journal-title":"Am. J. Manag Care"},{"key":"354_CR15","first-page":"1312","volume":"2017","author":"MA Morid","year":"2017","unstructured":"Morid, M. A. et al. Supervised learning methods for predicting healthcare costs: systematic literature review and empirical evaluation. AMIA Annu Symp. Proc. 2017, 1312\u20131321 (2017).","journal-title":"AMIA Annu Symp. Proc."},{"key":"354_CR16","doi-asserted-by":"publisher","first-page":"1577","DOI":"10.1177\/1460458219881335","volume":"26","author":"L Luo","year":"2019","unstructured":"Luo, L. et al. 
Using machine learning approaches to predict high-cost chronic obstructive pulmonary disease patients in China. Health Inform. J. 26, 1577\u20131598 (2019).","journal-title":"Health Inform. J."},{"key":"354_CR17","doi-asserted-by":"publisher","DOI":"10.1186\/s12938-018-0568-3","volume":"17","author":"C Yang","year":"2018","unstructured":"Yang, C. et al. Machine learning approaches for predicting high cost high need patient expenditures in health care. Biomed. Eng. Online 17, 131 (2018).","journal-title":"Biomed. Eng. Online"},{"key":"354_CR18","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1089\/big.2018.0096","volume":"7","author":"YJ Kim","year":"2019","unstructured":"Kim, Y. J. & Park, H. Improving prediction of high-cost health care users with medical check-up data. Big Data 7, 163\u2013175 (2019).","journal-title":"Big Data"},{"key":"354_CR19","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1377\/hlthaff.2014.0041","volume":"33","author":"DW Bates","year":"2014","unstructured":"Bates, D. W. et al. Big data in health care: Using analytics to identify and manage high-risk and high-cost patients. Health Aff. 33, 1123\u20131131 (2014).","journal-title":"Health Aff."},{"key":"354_CR20","doi-asserted-by":"publisher","DOI":"10.1136\/bmjopen-2018-023113","volume":"8","author":"JJG Wammes","year":"2018","unstructured":"Wammes, J. J. G. et al. Systematic review of high-cost patients\u2019 characteristics and healthcare utilisation. BMJ Open 8, e023113 (2018).","journal-title":"BMJ Open"},{"key":"354_CR21","doi-asserted-by":"publisher","first-page":"796","DOI":"10.1177\/0962280214558972","volume":"26","author":"PC Austin","year":"2017","unstructured":"Austin, P. C. & Steyerberg, E. W. Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat. Methods Med. Res. 26, 796\u2013808 (2017).","journal-title":"Stat. Methods Med. 
Res."},{"key":"354_CR22","first-page":"9","volume":"4","author":"W Yoo","year":"2014","unstructured":"Yoo, W. et al. A study of effects of multicollinearity in the multivariable analysis. Int. J. Appl Sci. Technol. 4, 9\u201319 (2014).","journal-title":"Int. J. Appl Sci. Technol."},{"key":"354_CR23","doi-asserted-by":"publisher","first-page":"1216","DOI":"10.1056\/NEJMp1606181","volume":"375","author":"Z Obermeyer","year":"2016","unstructured":"Obermeyer, Z. & Emanuel, E. J. Predicting the future\u2014big data, machine learning, and clinical medicine. N. Engl. J. Med. 375, 1216\u20131219 (2016).","journal-title":"N. Engl. J. Med"},{"key":"354_CR24","doi-asserted-by":"publisher","first-page":"2627","DOI":"10.1001\/jama.2016.16885","volume":"316","author":"JL Dieleman","year":"2016","unstructured":"Dieleman, J. L. et al. US spending on personal health care and public health, 1996-2013. JAMA 316, 2627\u20132646 (2016).","journal-title":"JAMA"},{"key":"354_CR25","unstructured":"Ministry of Health, Labor and welfare. Overview of national medical care expenditure. https:\/\/www.mhlw.go.jp\/toukei\/saikin\/hw\/k-iryohi\/17\/dl\/data.pdf (2019)."},{"key":"354_CR26","doi-asserted-by":"publisher","first-page":"764","DOI":"10.1093\/aje\/kwt312","volume":"179","author":"AD Shah","year":"2014","unstructured":"Shah, A. D. et al. Comparison of random forest and parametric imputation models for imputing missing data using MICE: A CALIBER study. Am. J. Epidemiol. 179, 764\u2013774 (2014).","journal-title":"Am. J. Epidemiol."},{"key":"354_CR27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2261-14-120","volume":"14","author":"D Shima","year":"2014","unstructured":"Shima, D. et al. A retrospective, cross-sectional study of real-world values of cardiovascular risk factors using a healthcare database in Japan. BMC Cardiovasc Disord. 
14, 1\u201314 (2014).","journal-title":"BMC Cardiovasc Disord."},{"key":"354_CR28","unstructured":"Ministry of Health, Labor and welfare. Health screening program based on industrial safety and health act. https:\/\/www.mhlw.go.jp\/english\/wp\/wp-hw5\/dl\/23010809e.pdf (2011)."},{"key":"354_CR29","unstructured":"glmnet: Lasso and elastic-net regularized generalized linear models. https:\/\/cran.r-project.org\/web\/packages\/glmnet\/index.html (2018)."},{"key":"354_CR30","unstructured":"ranger: a fast implementation of random forests. https:\/\/cran.r-project.org\/web\/packages\/ranger\/index.html (2018)."},{"key":"354_CR31","doi-asserted-by":"publisher","unstructured":"Chen, T. & Guestrin, C. Xgboost: A scalable tree boosting system. https:\/\/doi.org\/10.1145\/2939672.2939785 (2016).","DOI":"10.1145\/2939672.2939785"},{"key":"354_CR32","unstructured":"R interface to Keras. https:\/\/github.com\/rstudio\/keras (2020)."},{"key":"354_CR33","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. 58, 267\u2013288 (1996).","journal-title":"J. R. Stat. Soc. Ser. B."},{"key":"354_CR34","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L. Random forests. Mach. Learn. 45, 5\u201332 (2001).","journal-title":"Mach. Learn."},{"key":"354_CR35","doi-asserted-by":"publisher","first-page":"21","DOI":"10.3389\/fnbot.2013.00021","volume":"7","author":"A Natekin","year":"2013","unstructured":"Natekin, A. & Knoll, A. Gradient boosting machines, a tutorial. Front Neurorobot. 7, 21 (2013).","journal-title":"Front Neurorobot."},{"key":"354_CR36","unstructured":"caret: classification and regression training. 
https:\/\/cran.r-project.org\/web\/packages\/caret\/index.html (2020)."},{"key":"354_CR37","first-page":"17","volume":"16","author":"C Cao","year":"2018","unstructured":"Cao, C. et al. Deep learning and its applications in biomedicine. Genomics. Proteom. Bioinforma. 16, 17\u201332 (2018).","journal-title":"Proteom. Bioinforma."},{"key":"354_CR38","unstructured":"Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https:\/\/arxiv.org\/abs\/1412.6980 (2017)."},{"key":"354_CR39","unstructured":"missForest: nonparametric missing value imputation using random forest. https:\/\/cran.r-project.org\/web\/packages\/missForest\/index.html (2013)."},{"key":"354_CR40","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1097\/01.ede.0000147512.81966.ba","volume":"16","author":"EF Schisterman","year":"2005","unstructured":"Schisterman, E. F. et al. Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples. Epidemiology 16, 73\u201381 (2005).","journal-title":"Epidemiology"},{"key":"354_CR41","doi-asserted-by":"publisher","first-page":"565","DOI":"10.1177\/0272989X06295361","volume":"26","author":"AJ Vickers","year":"2006","unstructured":"Vickers, A. J. & Elkin, E. B. Decision curve analysis: a novel method for evaluating prediction models. Med. Decis. Mak. 26, 565\u2013574 (2006).","journal-title":"Med. Decis. Mak."},{"key":"354_CR42","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1186\/s41512-019-0064-7","volume":"3","author":"AJ Vickers","year":"2019","unstructured":"Vickers, A. J. et al. A simple, step-by-step guide to interpreting decision curve analysis. Diagn. Progn. Res. 3, 18 (2019).","journal-title":"Diagn. Progn. Res."},{"key":"354_CR43","unstructured":"xgboost: extreme gradient boosting. 
https:\/\/cran.r-project.org\/web\/packages\/xgboost\/index.html (2020)."},{"key":"354_CR44","doi-asserted-by":"publisher","first-page":"837","DOI":"10.2307\/2531595","volume":"44","author":"ER DeLong","year":"1988","unstructured":"DeLong, E. R. et al. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837\u2013845 (1988).","journal-title":"Biometrics"}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-020-00354-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-020-00354-8","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-020-00354-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,17]],"date-time":"2024-08-17T07:22:57Z","timestamp":1723879377000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-020-00354-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,11]]},"references-count":44,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["354"],"URL":"https:\/\/doi.org\/10.1038\/s41746-020-00354-8","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,11]]},"assertion":[{"value":"8 April 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 October 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 November 2020","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"All authors have completed the ICMJE uniform disclosure form and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"148"}}