{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T19:13:43Z","timestamp":1770232423980,"version":"3.49.0"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,4,24]],"date-time":"2022-04-24T00:00:00Z","timestamp":1650758400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,24]],"date-time":"2022-04-24T00:00:00Z","timestamp":1650758400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>In the early stages of the COVID-19 pandemic our institution was interested in forecasting how long surgical patients receiving elective procedures would spend in the hospital. Initial examination of our models indicated that, due to the skewed nature of the length of stay, accurate prediction was challenging and we instead opted for a simpler classification model. In this work we perform a deeper examination of predicting in-hospital length of stay.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Methods<\/jats:title>\n                <jats:p>We used electronic health record data on length of stay from 42,209 elective surgeries. We compare different loss-functions (mean squared error, mean absolute error, mean relative error), algorithms (LASSO, Random Forests, multilayer perceptron) and data transformations (log and truncation). 
We also assess the performance of a two-stage hybrid classification-regression approach.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>Our results show that while it is possible to accurately predict short lengths of stay, predicting longer lengths of stay is extremely challenging. As such, we opt for a two-stage model that first classifies patients into long versus short lengths of stay and then fits a regressor among those predicted to have a short length of stay.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Discussion<\/jats:title>\n                <jats:p>The results indicate both the challenges and the considerations involved in applying machine-learning methods to skewed outcomes.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>Two-stage models allow those developing clinical decision support tools to explicitly acknowledge where they can and cannot make accurate predictions.\n<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12911-022-01855-0","type":"journal-article","created":{"date-parts":[[2022,4,24]],"date-time":"2022-04-24T15:02:23Z","timestamp":1650812543000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data"],"prefix":"10.1186","volume":"22","author":[{"given":"Zhenhui","family":"Xu","sequence":"first","affiliation":[]},{"given":"Congwen","family":"Zhao","sequence":"additional","affiliation":[]},{"suffix":"Jr","given":"Charles D.","family":"Scales","sequence":"additional","affiliation":[]},{"given":"Ricardo","family":"Henao","sequence":"additional","affiliation":[]},{"given":"Benjamin 
A.","family":"Goldstein","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,4,24]]},"reference":[{"issue":"11","key":"1855_CR1","doi-asserted-by":"publisher","DOI":"10.1001\/jamanetworkopen.2020.23547","volume":"3","author":"BA Goldstein","year":"2020","unstructured":"Goldstein BA, Cerullo M, Krishnamoorthy V, et al. Development and performance of a clinical decision support tool to inform resource utilization for elective operations. JAMA Netw Open. 2020;3(11): e2023547. https:\/\/doi.org\/10.1001\/jamanetworkopen.2020.23547.","journal-title":"JAMA Netw Open"},{"issue":"2","key":"1855_CR2","doi-asserted-by":"publisher","first-page":"121","DOI":"10.4258\/hir.2013.19.2.121","volume":"19","author":"PR Hachesu","year":"2013","unstructured":"Hachesu PR, Ahmadi M, Alizadeh S, Sadoughi F. Use of data mining techniques to determine and predict length of stay of cardiac patients. Healthc Inform Res. 2013;19(2):121\u20139. https:\/\/doi.org\/10.4258\/hir.2013.19.2.121.","journal-title":"Healthc Inform Res"},{"key":"1855_CR3","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1038\/s41746-020-0249-z","volume":"3","author":"CB Hilton","year":"2020","unstructured":"Hilton CB, Milinovich A, Felix C, et al. Personalized predictions of patient outcomes during and after hospitalization using artificial intelligence. NPJ Digit Med. 2020;3:51. https:\/\/doi.org\/10.1038\/s41746-020-0249-z.","journal-title":"NPJ Digit Med"},{"issue":"7","key":"1855_CR4","doi-asserted-by":"publisher","first-page":"478","DOI":"10.1016\/j.ejim.2015.06.002","volume":"26","author":"CP Launay","year":"2015","unstructured":"Launay CP, Rivi\u00e8re H, Kabeshova A, Beauchet O. Predicting prolonged length of hospital stay in older emergency department users: use of a novel analysis method, the artificial neural network. Eur J Intern Med. 2015;26(7):478\u201382. 
https:\/\/doi.org\/10.1016\/j.ejim.2015.06.002.","journal-title":"Eur J Intern Med"},{"key":"1855_CR5","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1186\/1472-6947-14-26","volume":"14","author":"EM Carter","year":"2014","unstructured":"Carter EM, Potts HWW. Predicting length of stay from an electronic patient record system: a primary total knee replacement example. BMC Med Inform Decis Mak. 2014;14:26. https:\/\/doi.org\/10.1186\/1472-6947-14-26.","journal-title":"BMC Med Inform Decis Mak"},{"key":"1855_CR6","doi-asserted-by":"publisher","unstructured":"Morton A, Marzban E, Giannoulis G, Patel A, Aparasu R, Kakadiaris IA. A comparison of supervised machine learning techniques for predicting short-term in-hospital length of stay among diabetic patients. In: 2014 13th international conference on machine learning and applications. IEEE; 2014, pp. 428\u2013431. https:\/\/doi.org\/10.1109\/ICMLA.2014.76","DOI":"10.1109\/ICMLA.2014.76"},{"key":"1855_CR7","doi-asserted-by":"publisher","unstructured":"Al Taleb AR, Hoque M, Hasanat A, Khan MB. Application of data mining techniques to predict length of stay of stroke patients. In: 2017 International Conference on Informatics, Health & Technology (ICIHT). IEEE; 2017. pp. 1\u20135. https:\/\/doi.org\/10.1109\/ICIHT.2017.7899004","DOI":"10.1109\/ICIHT.2017.7899004"},{"issue":"8","key":"1855_CR8","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1097\/MLR.0b013e3181e359f3","volume":"48","author":"V Liu","year":"2010","unstructured":"Liu V, Kipnis P, Gould MK, Escobar GJ. Length of stay predictions: improvements through the use of automated laboratory and comorbidity variables. Med Care. 2010;48(8):739\u201344. https:\/\/doi.org\/10.1097\/MLR.0b013e3181e359f3.","journal-title":"Med Care"},{"key":"1855_CR9","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1007\/978-3-030-45688-7_21","volume-title":"Trends and innovations in information systems and technologies. 
Advances in intelligent systems and computing","author":"RN Mekhaldi","year":"2020","unstructured":"Mekhaldi RN, Caulier P, Chaabane S, Chraibi A, Piechowiak S. Using machine learning models to predict the length of stay in a hospital setting. In: Rocha \u00c1, Adeli H, Reis LP, Costanzo S, Orovic I, Moreira F, editors. Trends and innovations in information systems and technologies. Advances in intelligent systems and computing, vol. 1159. Berlin: Springer; 2020. p. 202\u201311. https:\/\/doi.org\/10.1007\/978-3-030-45688-7_21."},{"issue":"5","key":"1855_CR10","doi-asserted-by":"publisher","first-page":"1026","DOI":"10.1097\/ALN.0b013e3181f79a8d","volume":"113","author":"DI Sessler","year":"2010","unstructured":"Sessler DI, Sigl JC, Manberg PJ, Kelley SD, Schubert A, Chamoun NG. Broadly applicable risk stratification system for predicting duration of hospitalization and mortality. Anesthesiology. 2010;113(5):1026\u201337. https:\/\/doi.org\/10.1097\/ALN.0b013e3181f79a8d.","journal-title":"Anesthesiology"},{"issue":"11","key":"1855_CR11","doi-asserted-by":"publisher","first-page":"3058","DOI":"10.1097\/CCM.0b013e31825bc399","volume":"40","author":"SR Levin","year":"2012","unstructured":"Levin SR, Harley ET, Fackler JC, et al. Real-time forecasting of pediatric intensive care unit length of stay using computerized provider orders. Crit Care Med. 2012;40(11):3058\u201364. https:\/\/doi.org\/10.1097\/CCM.0b013e31825bc399.","journal-title":"Crit Care Med"},{"issue":"3\u20134","key":"1855_CR12","doi-asserted-by":"publisher","first-page":"198","DOI":"10.1007\/s10742-017-0169-9","volume":"17","author":"VA Smith","year":"2017","unstructured":"Smith VA, Neelon B, Maciejewski ML, Preisser JS. Two parts are better than one: modeling marginal means of semicontinuous data. Health Serv Outcomes Res Methodol. 2017;17(3\u20134):198\u2013218. 
https:\/\/doi.org\/10.1007\/s10742-017-0169-9.","journal-title":"Health Serv Outcomes Res Methodol"},{"issue":"3","key":"1855_CR13","doi-asserted-by":"publisher","DOI":"10.1001\/jamanetworkopen.2021.3460","volume":"4","author":"RW Moehring","year":"2021","unstructured":"Moehring RW, Phelan M, Lofgren E, et al. Development of a machine learning model using electronic health record data to identify antibiotic use among hospitalized patients. JAMA Netw Open. 2021;4(3): e213460. https:\/\/doi.org\/10.1001\/jamanetworkopen.2021.3460.","journal-title":"JAMA Netw Open"},{"key":"1855_CR14","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1016\/j.ress.2011.10.012","volume":"99","author":"SD Guikema","year":"2012","unstructured":"Guikema SD, Quiring SM. Hybrid data mining-regression for infrastructure risk assessment based on zero-inflated data. Reliab Eng Syst Saf. 2012;99:178\u201382. https:\/\/doi.org\/10.1016\/j.ress.2011.10.012.","journal-title":"Reliab Eng Syst Saf"},{"key":"1855_CR15","doi-asserted-by":"crossref","unstructured":"Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol. 1996:267\u2013288.","DOI":"10.1111\/j.2517-6161.1996.tb02080.x"},{"key":"1855_CR16","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L. Random forests. Mach Learn. 2001;45:5\u201332.","journal-title":"Mach Learn"},{"issue":"2","key":"1855_CR17","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1016\/S0957-4174(00)00026-9","volume":"19","author":"PN SubbaNarasimha","year":"2000","unstructured":"SubbaNarasimha PN, Arinze B, Anandarajan M. The predictive accuracy of artificial neural networks and multiple regression in the case of skewed data: exploration of some issues. Expert Syst Appl. 2000;19(2):117\u201323. 
https:\/\/doi.org\/10.1016\/S0957-4174(00)00026-9.","journal-title":"Expert Syst Appl"},{"issue":"4","key":"1855_CR18","doi-asserted-by":"publisher","first-page":"226","DOI":"10.4103\/ijabmr.IJABMR_370_18","volume":"9","author":"Z Hoodbhoy","year":"2019","unstructured":"Hoodbhoy Z, Noman M, Shafique A, Nasim A, Chowdhury D, Hasan B. Use of machine learning algorithms for prediction of fetal risk using cardiotocographic data. Int J Appl Basic Med Res. 2019;9(4):226\u201330. https:\/\/doi.org\/10.4103\/ijabmr.IJABMR_370_18.","journal-title":"Int J Appl Basic Med Res"},{"key":"1855_CR19","doi-asserted-by":"publisher","unstructured":"Sushmita S, Newman S, Marquardt J, et al. Population cost prediction on public healthcare datasets. In: Proceedings of the 5th international conference on digital health 2015. ACM; 2015. pp. 87\u201394. https:\/\/doi.org\/10.1145\/2750511.2750521","DOI":"10.1145\/2750511.2750521"},{"issue":"2","key":"1855_CR20","doi-asserted-by":"publisher","first-page":"424","DOI":"10.1016\/j.eswa.2005.04.034","volume":"29","author":"U Kumar","year":"2005","unstructured":"Kumar U. Comparison of neural networks and regression analysis: a new insight. Expert Syst Appl. 2005;29(2):424\u201330. https:\/\/doi.org\/10.1016\/j.eswa.2005.04.034.","journal-title":"Expert Syst Appl"},{"key":"1855_CR21","unstructured":"Zhang H, Nettleton D, Zhu Z. Regression-Enhanced Random Forests. Published online April 23, 2019. Accessed 18 Oct 2021. http:\/\/arxiv.org\/abs\/1904.10416"},{"issue":"2","key":"1855_CR22","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1146\/annurev.publhealth.20.1.125","volume":"20","author":"P Diehr","year":"1999","unstructured":"Diehr P, Yanez D, Ash A, Hornbrook M, Lin DY. Methods for analyzing health care utilization and costs. Annu Rev Public Health. 1999;20:125\u201344. 
https:\/\/doi.org\/10.1146\/annurev.publhealth.20.1.125.","journal-title":"Annu Rev Public Health"},{"issue":"1","key":"1855_CR23","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1023\/a:1021908220013","volume":"6","author":"F Cots","year":"2003","unstructured":"Cots F, Elvira D, Castells X, S\u00e1ez M. Relevance of outlier cases in case mix systems and evaluation of trimming methods. Health Care Manag Sci. 2003;6(1):27\u201335. https:\/\/doi.org\/10.1023\/a:1021908220013.","journal-title":"Health Care Manag Sci"},{"issue":"10","key":"1855_CR24","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0109684","volume":"9","author":"IWM Verburg","year":"2014","unstructured":"Verburg IWM, de Keizer NF, de Jonge E, Peek N. Comparison of regression methods for modeling intensive care length of stay. PLoS ONE. 2014;9(10): e109684. https:\/\/doi.org\/10.1371\/journal.pone.0109684.","journal-title":"PLoS ONE"},{"issue":"23","key":"1855_CR25","doi-asserted-by":"publisher","first-page":"4124","DOI":"10.1002\/sim.6986","volume":"35","author":"GS Collins","year":"2016","unstructured":"Collins GS, Ogundimu EO, Cook JA, Manach YL, Altman DG. Quantifying the impact of different approaches for handling continuous predictors on the performance of a prognostic model. Stat Med. 2016;35(23):4124\u201335. https:\/\/doi.org\/10.1002\/sim.6986.","journal-title":"Stat Med"},{"issue":"1","key":"1855_CR26","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1007\/s10651-005-6817-1","volume":"12","author":"D Fletcher","year":"2005","unstructured":"Fletcher D, MacKenzie D, Villouta E. Modelling skewed data with many zeros: a simple approach combining ordinary and logistic regression. Environ Ecol Stat. 2005;12(1):45\u201354. 
https:\/\/doi.org\/10.1007\/s10651-005-6817-1.","journal-title":"Environ Ecol Stat"},{"issue":"2","key":"1855_CR27","doi-asserted-by":"publisher","first-page":"848","DOI":"10.1109\/JBHI.2018.2819646","volume":"23","author":"A Kumar","year":"2019","unstructured":"Kumar A, Anjomshoa H. A two-stage model to predict surgical patients\u2019 lengths of stay from an electronic patient database. IEEE J Biomed Health Inform. 2019;23(2):848\u201356. https:\/\/doi.org\/10.1109\/JBHI.2018.2819646.","journal-title":"IEEE J Biomed Health Inform"}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-022-01855-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-022-01855-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-022-01855-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,24]],"date-time":"2022-04-24T15:02:31Z","timestamp":1650812551000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-022-01855-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,24]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["1855"],"URL":"https:\/\/doi.org\/10.1186\/s12911-022-01855-0","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,24]]},"assertion":[{"value":"30 November 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 April 
2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 April 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All work was performed in accordance with all relevant ethical guidelines. Experimental protocols were approved by the Duke University Health System\u2019s (DUHS) IRB under protocol number: Pro00065513. The consent-to-participate requirement for the study was waived by the DUHS IRB.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors have no conflicts of interest to declare.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"110"}}