{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T12:13:07Z","timestamp":1762863187905,"version":"build-2065373602"},"reference-count":63,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,6,25]],"date-time":"2022-06-25T00:00:00Z","timestamp":1656115200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Preterm birth (PTB) is the leading cause of infant mortality in the U.S. and globally. The goal of this study is to increase understanding of PTB risk factors that are present early in pregnancy by leveraging statistical and machine learning (ML) techniques on big data. The 2016 U.S. birth records were obtained and combined with two other area-level datasets, the Area Health Resources File and the County Health Ranking. Then, we applied logistic regression with elastic net regularization, random forest, and gradient boosting machines to study a cohort of 3.6 million singleton deliveries to identify generalizable PTB risk factors. The response variable is preterm birth, which includes spontaneous and indicated PTB, and we performed a binary classification. Our results show that the most important predictors of preterm birth are gestational and chronic hypertension, interval since last live birth, and history of a previous preterm birth, which explains 10.92, 5.98, and 5.63% of the predictive power, respectively. Parents\u2019 education is one of the influential variables in predicting PTB, explaining 7.89% of the predictive power. The relative importance of race declines when parents are more educated or have received adequate prenatal care. The gradient boosting machines outperformed with an AUC of 0.75 (sensitivity: 0.64, specificity: 0.73) for the validation dataset. In this study, we compare our results with seminal and most related studies to demonstrate the superiority of our results. The application of ML techniques improved the performance measures in the prediction of preterm birth. The results emphasize the importance of socioeconomic factors such as parental education as one of the most important indicators of preterm birth. More research is needed on these mechanisms through which socioeconomic factors affect biological responses.<\/jats:p>","DOI":"10.3390\/info13070310","type":"journal-article","created":{"date-parts":[[2022,6,26]],"date-time":"2022-06-26T09:00:13Z","timestamp":1656234013000},"page":"310","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Identifying the Early Signs of Preterm Birth from U.S. Birth Records Using Machine Learning Techniques"],"prefix":"10.3390","volume":"13","author":[{"given":"Alireza","family":"Ebrahimvandi","sequence":"first","affiliation":[{"name":"Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24060, USA"},{"name":"UCSF Health, San Francisco, CA 94143, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3809-3778","authenticated-orcid":false,"given":"Niyousha","family":"Hosseinichimeh","sequence":"additional","affiliation":[{"name":"Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24060, USA"}]},{"given":"Zhenyu James","family":"Kong","sequence":"additional","affiliation":[{"name":"Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24060, USA"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1742-4755-10-S1-S2","article-title":"Born Too Soon: The global epidemiology of 15 million preterm births","volume":"10","author":"Blencowe","year":"2013","journal-title":"Reprod. Health"},{"key":"ref_2","first-page":"1","article-title":"Infant Mortality Statistics from the 2013 Period: Linked Birth\/Infant Death Data Set","volume":"64","author":"Mathews","year":"2015","journal-title":"Natl. Vital Stat."},{"key":"ref_3","first-page":"1271","article-title":"Understanding State-Level Variations in the US Infant Mortality: 2000 to 2015","volume":"36","author":"Ebrahimvandi","year":"2018","journal-title":"Am. J. Perinatol."},{"key":"ref_4","unstructured":"Butler, A.S., and Behrman, R.E. (2007). Preterm Birth: Causes, Consequences, and Prevention, National Academies Press."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1016\/S0140-6736(08)60136-1","article-title":"An overview of mortality and sequelae of preterm birth from infancy to adulthood","volume":"371","author":"Saigal","year":"2008","journal-title":"Lancet"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1317","DOI":"10.1097\/AOG.0000000000000276","article-title":"Identification of candidates for progesterone: Why, who, how, and when?","volume":"123","author":"Iams","year":"2014","journal-title":"Obstet. Gynecol."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Katz, K.S., Blake, S.M., Milligan, R.A., Sharps, P.W., White, D.B., Rodan, M.F., Rossi, M., and Murray, K.B. (2008). The design, implementation and acceptability of an integrated intervention to address multiple behavioral and psychosocial risk factors among pregnant African American women. BMC Pregnancy Childbirth, 8.","DOI":"10.1186\/1471-2393-8-22"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/S0140-6736(08)60074-4","article-title":"Epidemiology and causes of preterm birth","volume":"371","author":"Goldenberg","year":"2008","journal-title":"Lancet"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1016\/j.placenta.2020.07.021","article-title":"Prevention of preterm birth: Proactive and reactive clinical practice-are we on the right track?","volume":"98","author":"Singh","year":"2020","journal-title":"Placenta"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1097\/AOG.0000000000001195","article-title":"A Core Outcome Set for Evaluation of Interventions to Prevent Preterm Birth","volume":"127","author":"Hooft","year":"2016","journal-title":"Obstet. Gynecol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1002\/sres.2563","article-title":"Using a Stakeholder Analysis to Improve Systems Modelling of Health Issues: The Impact of Progesterone Therapy on Infant Mortality in Ohio","volume":"36","author":"Hosseinichimeh","year":"2019","journal-title":"Syst. Res. Behav. Sci."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"115278","DOI":"10.1016\/j.eswa.2021.115278","article-title":"A DEA evaluation of US States\u2019 healthcare systems in terms of their birth outcomes","volume":"182","author":"Darabi","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"54.e1","DOI":"10.1016\/j.ajog.2013.09.004","article-title":"The short-term prediction of preterm birth: A systematic review and diagnostic metaanalysis","volume":"210","author":"Boots","year":"2014","journal-title":"Am. J. Obstet. Gynecol."},{"key":"ref_14","first-page":"CD004902","article-title":"Risk-scoring systems for predicting preterm birth with the aim of reducing associated adverse outcomes","volume":"2015","author":"Davey","year":"2015","journal-title":"Cochrane Database Syst. Rev."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1097\/AOG.0b013e3181dffcdb","article-title":"Inherited Predisposition to Spontaneous Preterm Delivery","volume":"115","author":"Bhattacharya","year":"2010","journal-title":"Obstet. Gynecol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"131.e1","DOI":"10.1016\/j.ajog.2013.09.014","article-title":"The NICHD Consecutive Pregnancies Study: Recurrent preterm delivery by subtype","volume":"210","author":"Laughon","year":"2013","journal-title":"Am. J. Obstet. Gynecol."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1186\/s12884-014-0368-0","article-title":"Lessons learned from the Philadelphia Collaborative Preterm Prevention Project: The prevalence of risk factors and program participation rates among women in the intervention group","volume":"14","author":"Webb","year":"2014","journal-title":"BMC Pregnancy Childbirth"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2173","DOI":"10.1038\/s41372-021-01109-3","article-title":"Clinical risk models for preterm birth less than 28 weeks and less than 32 weeks of gestation using a large retrospective cohort","volume":"41","author":"Belaghi","year":"2021","journal-title":"J. Perinatol."},{"key":"ref_19","unstructured":"Martin, J.A., Hamilton, B.E., Osterman, M.J., Driscoll, A.K., and Drake, P. (2018). Births: Final Data for 2016, National Vital Statistics Reports."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Fuchs, F., Monet, B., Ducruet, T., Chaillet, N., and Audibert, F. (2018). Effect of maternal age on the risk of preterm birth: A large cohort study. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0191002"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1097\/AOG.0b013e3181842087","article-title":"Preterm prediction study: Comparison of the cervical score and Bishop score for prediction of spontaneous preterm delivery","volume":"112","author":"Newman","year":"2008","journal-title":"Obstet. Gynecol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"770-e20","DOI":"10.1111\/j.1471-0528.2007.01315.x","article-title":"The control of hypertension in pregnancy study pilot trial","volume":"114","author":"Magee","year":"2007","journal-title":"BJOG Int. J. Obstet. Gynaecol."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"472.e1","DOI":"10.1016\/j.ajog.2013.03.005","article-title":"A proposed method to predict preterm birth using clinical data, standard maternal serum screening, and cholesterol","volume":"208","author":"Alleman","year":"2013","journal-title":"Am. J. Obstet. Gynecol."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"783","DOI":"10.1016\/j.annepidem.2018.08.008","article-title":"Application of machine-learning to predict early spontaneous preterm birth among nulliparous non-Hispanic black and white women","volume":"28","author":"Weber","year":"2018","journal-title":"Ann. Epidemiol."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"103334","DOI":"10.1016\/j.jbi.2019.103334","article-title":"Deep learning predicts extreme preterm birth from electronic health records","volume":"100","author":"Gao","year":"2019","journal-title":"J. Biomed. Inform."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1016\/j.jbi.2003.09.020","article-title":"Data mining issues and opportunities for building nursing knowledge","volume":"36","author":"Goodwin","year":"2003","journal-title":"J. Biomed. Inform."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1136\/jamia.1994.95153433","article-title":"Machine Learning for an Expert System to Predict Preterm Birth Risk","volume":"1","author":"Woolery","year":"1994","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"5384","DOI":"10.1016\/j.eswa.2010.10.017","article-title":"Exploring the risk factors of preterm birth using data mining","volume":"38","author":"Chen","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"ref_30","unstructured":"Van Dyne, M., Woolery, L., Gryzmala-Busse, J., and Tsatsoulis, C. (1994, January 1\u20134). Using machine learning and expert systems to predict preterm delivery in pregnant women. Proceedings of the Tenth Conference on Artificial Intelligence for Applications, San Antonia, TX, USA."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"9635526","DOI":"10.1155\/2022\/9635526","article-title":"Machine Learning-Based Prediction Model of Preterm Birth Using Electronic Health Record","volume":"2022","author":"Sun","year":"2022","journal-title":"J. Health Eng."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1177\/10998004211025641","article-title":"Systematic Review of Prediction Models for Preterm Birth Using CHARMS","volume":"23","author":"Kim","year":"2021","journal-title":"Biol. Res. Nurs."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Pereira, G., Regan, A.K., Wong, K., and Tessema, G.A. (2021). Gestational age as a predictor for subsequent preterm birth in New South Wales, Australia. BMC Pregnancy Childbirth, 21.","DOI":"10.1186\/s12884-021-04084-x"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"780389","DOI":"10.3389\/fbioe.2021.780389","article-title":"Using Machine Learning to Predict Complications in Pregnancy: A Systematic Review","volume":"9","author":"Bertini","year":"2022","journal-title":"Front. Bioeng. Biotechnol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"600.e1","DOI":"10.1016\/j.ajog.2017.02.025","article-title":"17-alpha Hydroxyprogesterone caproate did not reduce the rate of recurrent preterm birth in a prospective cohort study","volume":"216","author":"Nelson","year":"2017","journal-title":"Am. J. Obstet. Gynecol."},{"key":"ref_36","unstructured":"Robinson, J.N., and Norwitz, E. (2019, March 02). Preterm Birth: Risk Factors, Interventions for Risk Reduction, and Maternal Prognosis. Available online: https:\/\/www.uptodate.com\/contents\/preterm-birth-risk-factors-interventions-for-risk-reduction-and-maternal-prognosis."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1056\/NEJMcp1103640","article-title":"Prevention of preterm parturition","volume":"370","author":"Iams","year":"2014","journal-title":"N. Engl. J. Med."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"He, J.-R., Ramakrishnan, R., Lai, Y.-M., Li, W.-D., Zhao, X., Hu, Y., Chen, N.-N., Hu, F., Lu, J.-H., and Wei, X.-L. (2018). Predictions of Preterm Birth from Early Pregnancy Characteristics: Born in Guangzhou Cohort Study. J. Clin. Med., 7.","DOI":"10.3390\/jcm7080185"},{"key":"ref_39","unstructured":"Centers for Disease Control and Prevention (CDC) (2019, March 02). Linked Birth\/Infant Death Records 2007\u20132019, Available online: https:\/\/wonder.cdc.gov\/lbd-current.html."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Bengio, Y. (2013, January 29\u201331). Deep learning of representations: Looking forward. Proceedings of the International Conference on Statistical Language and Speech Processing, Tarragona, Spain.","DOI":"10.1007\/978-3-642-39593-2_1"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Goldstein, M., and Uchida, S. (2016). A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data. PLoS ONE, 11.","DOI":"10.1371\/journal.pone.0152173"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"412957","DOI":"10.1155\/2015\/412957","article-title":"One-Class Classification with Extreme Learning Machine","volume":"2015","author":"Leng","year":"2015","journal-title":"Math. Probl. Eng."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Friedman, J., Hastie, T., and Tibshirani, R. (2001). The Elements of Statistical Learning, Springer.","DOI":"10.1007\/978-0-387-21606-5"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1108\/03684921311295547","article-title":"Boosting: Foundations and algorithms","volume":"42","author":"Schapire","year":"2013","journal-title":"Kybernetes"},{"key":"ref_48","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017, January 4\u20139). Lightgbm: A highly efficient gradient boosting decision tree. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"340","DOI":"10.1097\/00006199-200111000-00003","article-title":"Data Mining Methods Find Demographic Predictors of Preterm Birth","volume":"50","author":"Goodwin","year":"2001","journal-title":"Nurs. Res."},{"key":"ref_50","unstructured":"Vovsha, I., Rajan, A., Salleb-Aouissi, A., Raja, A., Radeva, A., Diab, H., Tomar, A., and Wapner, R. (2014, January 24\u201326). Predicting preterm birth is not elusive: Machine learning paves the way to individual wellness. Proceedings of the 2014 AAAI Spring Symposium Series, Palo Alto, CA, USA."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"562","DOI":"10.1016\/S0002-9378(98)70439-9","article-title":"The preterm prediction study: Risk factors for indicated preterm births","volume":"178","author":"Meis","year":"1998","journal-title":"Am. J. Obstet. Gynecol."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1053\/j.semperi.2017.08.010","article-title":"Racial and ethnic differences in preterm birth: A complex, multifactorial problem","volume":"41","author":"Manuck","year":"2017","journal-title":"Semin. Perinatol."},{"key":"ref_53","first-page":"62","article-title":"Closing the Black-White gap in birth outcomes: A life-course approach","volume":"20","author":"Lu","year":"2010","journal-title":"Ethn. Dis."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1016\/j.clp.2011.06.007","article-title":"The Contribution of Maternal Stress to Preterm Birth: Issues and Considerations","volume":"38","author":"Wadhwa","year":"2011","journal-title":"Clin. Perinatol."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1007\/s13224-011-0092-x","article-title":"Placental Insufficiency and Fetal Growth Restriction","volume":"61","author":"Krishna","year":"2011","journal-title":"J. Obstet. Gynecol. India"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1056\/NEJM199504273321701","article-title":"Association of Young Maternal Age with Adverse Reproductive Outcomes","volume":"332","author":"Fraser","year":"1995","journal-title":"N. Engl. J. Med."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1016\/j.ajog.2004.09.021","article-title":"The Preterm Prediction study: Association between maternal body mass index and spontaneous and indicated preterm birth","volume":"192","author":"Hendler","year":"2005","journal-title":"Am. J. Obstet. Gynecol."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.ejogrb.2004.07.041","article-title":"The accuracy of maternal anthropometry measurements as predictor for spontaneous preterm birth\u2014A systematic review","volume":"119","author":"Honest","year":"2005","journal-title":"Eur. J. Obstet. Gynecol. Reprod. Biol."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"1715","DOI":"10.2105\/AJPH.2011.300564","article-title":"Temporal Changes in Socioeconomic Influences on Health: Maternal Education and Preterm Birth","volume":"102","author":"Galea","year":"2012","journal-title":"Am. J. Public Health"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.annepidem.2012.10.007","article-title":"Extreme maternal education and preterm birth: Time-to-event analysis of age and nativity-dependent risks","volume":"23","author":"Auger","year":"2013","journal-title":"Ann. Epidemiol."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"1415","DOI":"10.1503\/cmaj.051096","article-title":"Effect of neighbourhood income and maternal education on birth outcomes: A population-based study","volume":"174","author":"Luo","year":"2006","journal-title":"Can. Med. Assoc. J."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1111\/aogs.13358","article-title":"Prediction models for the risk of spontaneous preterm birth based on maternal characteristics: A systematic review and independent external validation","volume":"97","author":"Meertens","year":"2018","journal-title":"Acta Obstet. Gynecol. Scand."},{"key":"ref_63","first-page":"1","article-title":"Measuring Gestational Age in Vital Statistics Data: Transitioning to the Obstetric Estimate","volume":"64","author":"Martin","year":"2015","journal-title":"Natl. Vital Stat. Rep."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/7\/310\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:40:03Z","timestamp":1760139603000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/7\/310"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,25]]},"references-count":63,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["info13070310"],"URL":"https:\/\/doi.org\/10.3390\/info13070310","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2022,6,25]]}}}