{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T20:34:13Z","timestamp":1775162053127,"version":"3.50.1"},"reference-count":43,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2022,8,22]],"date-time":"2022-08-22T00:00:00Z","timestamp":1661126400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Predicting corporate bankruptcy is one of the fundamental tasks in credit risk assessment. In particular, since the 2007\/2008 financial crisis, it has become a priority for most financial institutions, practitioners, and academics. The recent advancements in machine learning (ML) enabled the development of several models for bankruptcy prediction. The most challenging aspect of this task is dealing with the class imbalance due to the rarity of bankruptcy events in the real economy. Furthermore, a fair comparison in the literature is difficult to make because bankruptcy datasets are not publicly available and because studies often restrict their datasets to specific economic sectors and markets and\/or time periods. In this work, we investigated the design and the application of different ML models to two different tasks related to default events: (a) estimating survival probabilities over time; (b) default prediction using time-series accounting data with different lengths. The entire dataset used for the experiments has been made available to the scientific community for further research and benchmarking purposes. The dataset pertains to 8262 different public companies listed on the American stock market between 1999 and 2018. Finally, in light of the results obtained, we critically discuss the most interesting metrics as proposed benchmarks for future studies.<\/jats:p>","DOI":"10.3390\/fi14080244","type":"journal-article","created":{"date-parts":[[2022,8,22]],"date-time":"2022-08-22T21:30:34Z","timestamp":1661203834000},"page":"244","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":41,"title":["Machine Learning for Bankruptcy Prediction in the American Stock Market: Dataset and Benchmarks"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1808-4487","authenticated-orcid":false,"given":"Gianfranco","family":"Lombardo","sequence":"first","affiliation":[{"name":"Department of Engineering and Architecture, University of Parma, 43124 Parma, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6592-7451","authenticated-orcid":false,"given":"Mattia","family":"Pellegrino","sequence":"additional","affiliation":[{"name":"Department of Engineering and Architecture, University of Parma, 43124 Parma, Italy"}]},{"given":"George","family":"Adosoglou","sequence":"additional","affiliation":[{"name":"Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4669-512X","authenticated-orcid":false,"given":"Stefano","family":"Cagnoni","sequence":"additional","affiliation":[{"name":"Department of Engineering and Architecture, University of Parma, 43124 Parma, Italy"}]},{"given":"Panos M.","family":"Pardalos","sequence":"additional","affiliation":[{"name":"Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3528-0260","authenticated-orcid":false,"given":"Agostino","family":"Poggi","sequence":"additional","affiliation":[{"name":"Department of Engineering and Architecture, University of Parma, 43124 Parma, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,22]]},"reference":[{"key":"ref_1","unstructured":"Danilov, C., and Konstantin, A. (2022, August 14). Corporate Bankruptcy: Assessment, Analysis and Prediction of Financial Distress, Insolvency, and Failure. Available online: https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=2467580."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"990","DOI":"10.1080\/01621459.2012.682806","article-title":"A class of discrete transformation survival models with application to default probability prediction","volume":"107","author":"Ding","year":"2012","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Prusak, B. (2018). Review of research into enterprise bankruptcy prediction in selected central and eastern European countries. Int. J. Financ. Stud., 6.","DOI":"10.3390\/ijfs6030060"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.eswa.2016.04.001","article-title":"Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction","volume":"58","author":"Tomczak","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1016\/j.ejor.2018.10.024","article-title":"Deep learning models for bankruptcy prediction using textual disclosures","volume":"274","author":"Mai","year":"2019","journal-title":"Eur. J. Oper. Res."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"9430919","DOI":"10.1155\/2022\/9430919","article-title":"Lazy Network: A Word Embedding-Based Temporal Financial Network to Avoid Economic Shocks in Asset Pricing Models","volume":"2022","author":"Adosoglou","year":"2022","journal-title":"Complexity"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR Guiding Principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci. Data"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Thakur, N., and Han, C.Y. (2021). A study of fall detection in assisted living: Identifying and improving the optimal machine learning method. J. Sens. Actuator Netw., 10.","DOI":"10.3390\/jsan10030039"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Gandomi, A.H., Chen, F., and Abualigah, L. (2022). Machine learning technologies for big data analytics. Electronics, 11.","DOI":"10.3390\/electronics11030421"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"692","DOI":"10.3846\/jbem.2018.7063","article-title":"Financial health of enterprises introducing safeguard procedure based on bankruptcy models","volume":"19","year":"2018","journal-title":"J. Bus. Econ. Manag."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"113567","DOI":"10.1016\/j.eswa.2020.113567","article-title":"Corporate default forecasting with machine learning","volume":"161","author":"Moscatelli","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3194","DOI":"10.1016\/j.eswa.2014.12.001","article-title":"Selection of Support Vector Machines based classifiers for credit risk domain","volume":"42","author":"Danenas","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1016\/j.ejor.2016.03.008","article-title":"A two-stage classification technique for bankruptcy prediction","volume":"254","year":"2016","journal-title":"Eur. J. Oper. Res."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1016\/j.asoc.2014.08.047","article-title":"A comparative study of classifier ensembles for bankruptcy prediction","volume":"24","author":"Tsai","year":"2014","journal-title":"Appl. Soft Comput."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2353","DOI":"10.1016\/j.eswa.2013.09.033","article-title":"An improved boosting based on feature selection for corporate bankruptcy prediction","volume":"41","author":"Wang","year":"2014","journal-title":"Expert Syst. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1080\/00207721.2012.720293","article-title":"Bankruptcy prediction using SVM models with a new approach to combine features selection and parameter optimisation","volume":"45","author":"Zhou","year":"2014","journal-title":"Int. J. Syst. Sci."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Bottani, E., Mordonini, M., Franchi, B., and Pellegrino, M. (2021). Demand Forecasting for an Automotive Company with Neural Network and Ensemble Classifiers Approaches. IFIP International Conference on Advances in Production Management Systems, Springer.","DOI":"10.1007\/978-3-030-85874-2_14"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1016\/j.ejor.2014.08.016","article-title":"Prediction of financial distress: An empirical study of listed Chinese companies using data mining","volume":"241","author":"Geng","year":"2015","journal-title":"Eur. J. Oper. Res."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.dss.2007.12.002","article-title":"Bankruptcy forecasting: An empirical comparison of AdaBoost and Neural Networks","volume":"45","author":"Alfaro","year":"2008","journal-title":"Decis. Support Syst."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"959","DOI":"10.1016\/j.ejor.2005.05.009","article-title":"Predicting the survival or failure of click-and-mortar corporations: A knowledge discovery approach","volume":"174","author":"Bose","year":"2006","journal-title":"Eur. J. Oper. Res."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1016\/j.jbankfin.2014.12.003","article-title":"Variable selection and corporate bankruptcy forecasts","volume":"52","author":"Tian","year":"2015","journal-title":"J. Bank. Financ."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1016\/j.ejor.2014.06.044","article-title":"Financial distress drivers in Brazilian banks: A dynamic slacks approach","volume":"240","author":"Wanke","year":"2015","journal-title":"Eur. J. Oper. Res."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1111\/j.1540-6261.1968.tb00843.x","article-title":"Financial ratios, discriminant analysis and the prediction of corporate bankruptcy","volume":"23","author":"Altman","year":"1968","journal-title":"J. Financ."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Altman, E.I., Hotchkiss, E., and Wang, W. (2019). Corporate Financial Distress, Restructuring, and Bankruptcy: Analyze Leveraged Finance, Distressed Debt, and Bankruptcy, John Wiley & Sons.","DOI":"10.1002\/9781119541929"},{"key":"ref_25","unstructured":"Kralicek, P. (1991). Fundamentals of Finance: Balance Sheets, Profit and Loss Accounts, Cash Flow, Calculation Bases, Financial Planning, Early Warning Systems, Ueberreuter."},{"key":"ref_26","first-page":"50","article-title":"Going, going, gone\u2013four factors which predict","volume":"88","author":"Taffler","year":"1977","journal-title":"Accountancy"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"109","DOI":"10.2307\/2490395","article-title":"Financial ratios and the probabilistic prediction of bankruptcy","volume":"18","author":"Ohlson","year":"1980","journal-title":"J. Account. Res."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"71","DOI":"10.2307\/2490171","article-title":"Financial ratios as predictors of failure","volume":"4","author":"Beaver","year":"1966","journal-title":"J. Account. Res."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.knosys.2011.06.020","article-title":"Two credit scoring models based on dual strategy ensemble trees","volume":"26","author":"Wang","year":"2012","journal-title":"Knowl.-Based Syst."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3028","DOI":"10.1016\/j.eswa.2008.01.018","article-title":"An experimental comparison of ensemble of classifiers for bankruptcy prediction and credit scoring","volume":"36","author":"Nanni","year":"2009","journal-title":"Expert Syst. Appl."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"3373","DOI":"10.1016\/j.eswa.2009.10.012","article-title":"Ensemble with Neural Networks for bankruptcy prediction","volume":"37","author":"Kim","year":"2010","journal-title":"Expert Syst. Appl."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1016\/j.eswa.2010.06.048","article-title":"A comparative assessment of ensemble learning for credit scoring","volume":"38","author":"Wang","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1016\/j.eswa.2017.04.006","article-title":"Machine-learning models and bankruptcy prediction","volume":"83","author":"Barboza","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1111\/j.1540-6288.1998.tb01367.x","article-title":"An empirical comparison of bankruptcy models","volume":"33","author":"Mossman","year":"1998","journal-title":"Financ. Rev."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1016\/j.jeconom.2012.05.002","article-title":"Multiperiod corporate default prediction\u2014A forward intensity approach","volume":"170","author":"Duan","year":"2012","journal-title":"J. Econom."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Kim, H., Cho, H., and Ryu, D. (2020). Corporate default predictions using machine learning: Literature review. Sustainability, 12.","DOI":"10.3390\/su12166325"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"114053","DOI":"10.1016\/j.eswa.2020.114053","article-title":"Neural Network embeddings on corporate annual filings for portfolio selection","volume":"164","author":"Adosoglou","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2899","DOI":"10.1111\/j.1540-6261.2008.01416.x","article-title":"In search of distress risk","volume":"63","author":"Campbell","year":"2008","journal-title":"J. Financ."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1006\/jcss.1997.1504","article-title":"A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting","volume":"55","author":"Freund","year":"1997","journal-title":"J. Comput. Syst. Sci."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A Gradient Boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_42","unstructured":"Chen, T., and He, T. (2022, August 14). Xgboost: Extreme Gradient Boosting. Available online: https:\/\/cran.microsoft.com\/snapshot\/2017-12-11\/web\/packages\/xgboost\/vignettes\/xgboost.pdf."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/14\/8\/244\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:13:31Z","timestamp":1760141611000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/14\/8\/244"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,22]]},"references-count":43,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["fi14080244"],"URL":"https:\/\/doi.org\/10.3390\/fi14080244","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,22]]}}}