{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T15:18:13Z","timestamp":1767971893886,"version":"3.49.0"},"reference-count":28,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,3,3]],"date-time":"2019-03-03T00:00:00Z","timestamp":1551571200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>The application of machine learning models for prediction and prognosis of disease development has become an irrevocable part of cancer studies aimed at improving the subsequent therapy and management of patients. The application of machine learning models for accurate prediction of survival time in breast cancer on the basis of clinical data is the main objective of the presented study. The paper discusses an approach to the problem in which the main factor used to predict survival time is the originally developed tumor-integrated clinical feature, which combines tumor stage, tumor size, and age at diagnosis. Two datasets from corresponding breast cancer studies are united by applying a data integration approach based on horizontal and vertical integration by using proper document-oriented and graph databases which show good performance and no data losses. Aside from data normalization and classification, the applied machine learning methods provide promising results in terms of accuracy of survival time prediction. The analysis of our experiments shows an advantage of the linear Support Vector Regression, Lasso regression, Kernel Ridge regression, K-neighborhood regression, and Decision Tree regression\u2014these models achieve most accurate survival prognosis results. The cross-validation for accuracy demonstrates best performance of the same models on the studied breast cancer data. As a support for the proposed approach, a Python-based workflow has been developed and the plans for its further improvement are finally discussed in the paper.<\/jats:p>","DOI":"10.3390\/info10030093","type":"journal-article","created":{"date-parts":[[2019,3,4]],"date-time":"2019-03-04T05:45:36Z","timestamp":1551678336000},"page":"93","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":44,"title":["Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies"],"prefix":"10.3390","volume":"10","author":[{"given":"Iliyan","family":"Mihaylov","sequence":"first","affiliation":[{"name":"Faculty of Mathematics and Informatics, Sofia University \u201cSt. Kliment Ohridski\u201d, 5 James Bourchier Blvd., Sofia 1164, Bulgaria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9917-9535","authenticated-orcid":false,"given":"Maria","family":"Nisheva","sequence":"additional","affiliation":[{"name":"Faculty of Mathematics and Informatics, Sofia University \u201cSt. Kliment Ohridski\u201d, 5 James Bourchier Blvd., Sofia 1164, Bulgaria"},{"name":"Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Acad. G.Bonchev Str., Block 8, Sofia 1113, Bulgaria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dimitar","family":"Vassilev","sequence":"additional","affiliation":[{"name":"Faculty of Mathematics and Informatics, Sofia University \u201cSt. Kliment Ohridski\u201d, 5 James Bourchier Blvd., Sofia 1164, Bulgaria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hull, R., Wodtke, W.D., Weissenfels, J., Weikum, G., Patil, R.S., Fikes, R.E., Patel-schneider, P.F., Mckay, D., Finin, T., and Gruber, T.R. (1997, January 11\u201315). Managing Semantic Heterogeneity in Databases: A Theoretical Perspective. Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Tucson, AZ, USA.","DOI":"10.1145\/263661.263668"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Ullman, J. (1997, January 8\u201310). Information Integration Using Logical Views. Proceedings of the International Conference on Database Theory, Delphi, Greece.","DOI":"10.1007\/3-540-62222-5_34"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"439","DOI":"10.3322\/caac.21412","article-title":"Breast cancer statistics, 2017, racial disparity in mortality by state","volume":"67","author":"DeSantis","year":"2017","journal-title":"CA Cancer J. Clin."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Cruz, J.A., and Wishart, D.S. (2006). Applications of Machine Learning in Cancer Prediction and Prognosis. Cancer Inform., 2.","DOI":"10.1177\/117693510600200030"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1021\/pr0499693","article-title":"Systems Biology, Proteomics, and the Future of Health Care: Toward Predictive, Preventative, and Personalized Medicine","volume":"3","author":"Weston","year":"2004","journal-title":"J. Proteome Res."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1005","DOI":"10.1093\/annonc\/mdi211","article-title":"Communicating prognosis in cancer care: A systematic review of the literature","volume":"16","author":"Tattersall","year":"2005","journal-title":"Ann. Oncol."},{"key":"ref_7","first-page":"53","article-title":"Prediction of clinical behaviour and treatment for cancers","volume":"2","author":"Futschik","year":"2003","journal-title":"Appl. Bioinform."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1016\/j.csbj.2014.11.005","article-title":"Machine learning applications in cancer prognosis and prediction","volume":"13","author":"Kourou","year":"2015","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Liu, Y., Wang, C., and Zhang, L. (2009, January 11\u201313). Decision Tree Based Predictive Models for Breast Cancer Survivability on Imbalanced Data. Proceedings of the 3rd International Conference on Bioinformatics and Biomedical Engineering, iCBBE 2009, Beijing, China.","DOI":"10.1109\/ICBBE.2009.5162571"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.artmed.2004.07.002","article-title":"Predicting breast cancer survivability: A comparison of three data mining methods","volume":"34","author":"Delen","year":"2005","journal-title":"Artif. Intell. Med."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1504\/IJCBDD.2008.021422","article-title":"An ensemble machine learning approach to predict survival in breast cancer","volume":"1","author":"Djebbari","year":"2008","journal-title":"Int. J. Comput. Biol. Drug Des."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0933-3657(03)00033-2","article-title":"A Bayesian neural network approach for modelling censored data with an application to prognosis after surgery for breast cancer","volume":"28","author":"Lisboa","year":"2003","journal-title":"Artif. Intell. Med."},{"key":"ref_13","first-page":"433","article-title":"Assessment of nodal involvement and survival analysis in breast cancer patients using image cytometric data: Statistical, neural network and fuzzy approaches","volume":"22","author":"Seker","year":"2002","journal-title":"Anticancer Res."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1007\/s007780100054","article-title":"Answering queries using views: A survey","volume":"10","author":"Halevy","year":"2001","journal-title":"VLDB J."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, H., Guo, Y., Li, Q., George, T.J., Shenkman, E.A., and Bian, J. (2017, January 13\u201316). Data Integration through Ontology-Based Data Access to Support Integrative Data Analysis: A Case Study of Cancer Survival. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, Kansas City, MO, USA.","DOI":"10.1109\/BIBM.2017.8217849"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1109\/TCBB.2014.2377729","article-title":"Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach","volume":"12","author":"Liang","year":"2015","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1248","DOI":"10.1158\/1078-0432.CCR-17-0853","article-title":"Deep Learning\u2013Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer","volume":"24","author":"Chaudhary","year":"2018","journal-title":"Clin. Cancer Res."},{"key":"ref_18","first-page":"52","article-title":"Predicting Breast Cancer Recurrence Using Machine Learning Techniques: A Systematic Review","volume":"49","author":"Abreu","year":"2016","journal-title":"ACM Comput. Surv."},{"key":"ref_19","first-page":"21","article-title":"Different Machine Learning Algorithms for Breast Cancer Diagnosis","volume":"3","author":"Aloraini","year":"2012","journal-title":"Int. J. Artif. Intell. Appl."},{"key":"ref_20","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_21","unstructured":"(2019, March 02). Python Release Python 3.7.0. Available online: https:\/\/www.python.org\/downloads\/release\/python-370\/."},{"key":"ref_22","first-page":"16","article-title":"Article: Type of NOSQL databases and its comparison with relational databases","volume":"5","author":"Nayak","year":"2013","journal-title":"Int. J. Appl. Inf. Sys."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Mihaylov, I., Nisheva, M., and Vassilev, D. (2018, January 12\u201314). Machine Learning Techniques for Survival Time Prediction in Breast Cancer. Proceedings of the 18th International Conference on Artificial Intelligence: Methodology, Systems, Applications, AIMSA 2018, Varna, Bulgaria.","DOI":"10.1007\/978-3-319-99344-7_17"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Gupta, S., Tran, T., Luo, W., Phung, D., Kennedy, R.L., Broad, A., Campbell, D., Kipp, D., Singh, M., and Khasraw, M. (2014). Machine-learning prediction of cancer survival: A retrospective study using electronic administrative records and a cancer registry. BMJ Open, 4.","DOI":"10.1136\/bmjopen-2013-004007"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3107","DOI":"10.1093\/bioinformatics\/btt549","article-title":"Are graph databases ready for bioinformatics?","volume":"29","author":"Have","year":"2013","journal-title":"Bioinformatics"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"McLachlan, G., Do, K., and Ambroise, C. (2004). Analyzing Microarray Gene Expression Data, Wiley.","DOI":"10.1002\/047172842X"},{"key":"ref_27","unstructured":"Lindqvist, N., and Price, T. (2018). Evaluation of Feature Selection Methods for Machine Learning Classification of Breast Cancer, KTH Royal Institute of Technology. Degree Project in Computer Science."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3240","DOI":"10.1016\/j.eswa.2008.01.009","article-title":"Support vector machines combined with feature selection for breast cancer diagnosis","volume":"36","author":"Akay","year":"2009","journal-title":"Expert Syst. Appl."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/3\/93\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:35:57Z","timestamp":1760186157000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/3\/93"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,3]]},"references-count":28,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["info10030093"],"URL":"https:\/\/doi.org\/10.3390\/info10030093","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,3,3]]}}}