{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T14:12:45Z","timestamp":1770819165875,"version":"3.50.1"},"reference-count":32,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,9,2]],"date-time":"2019-09-02T00:00:00Z","timestamp":1567382400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Prostate cancer can be low- or high-risk to the patient\u2019s health. Current screening on the basis of prostate-specific antigen (PSA) levels has a tendency towards both false positives and false negatives, both of which have negative consequences. We obtained a dataset of 35,875 patients from the screening arm of the National Cancer Institute\u2019s Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. We segmented the data into instances without prostate cancer, instances with low-risk prostate cancer, and instances with high-risk prostate cancer. We developed a pipeline to deal with imbalanced data and proposed algorithms to perform preprocessing on such datasets. We evaluated the accuracy of various machine learning algorithms in predicting high-risk prostate cancer. An accuracy of 91.5% can be achieved by the proposed pipeline, using standard scaling, SVMSMOTE sampling method, and AdaBoost for machine learning. We then evaluated the contribution of rate of change of PSA, age, BMI, and filtration by race to this model\u2019s accuracy. 
We identified that including the rate of change of PSA and age in our model increased the area under the curve (AUC) of the model by 6.8%, whereas BMI and race had a minimal effect.<\/jats:p>","DOI":"10.3390\/data4030129","type":"journal-article","created":{"date-parts":[[2019,9,3]],"date-time":"2019-09-03T03:06:14Z","timestamp":1567479974000},"page":"129","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":51,"title":["Predicting High-Risk Prostate Cancer Using Machine Learning Methods"],"prefix":"10.3390","volume":"4","author":[{"given":"Henry","family":"Barlow","sequence":"first","affiliation":[{"name":"School of Computer Science, University of Sydney, 2006 Sydney, Australia"}]},{"given":"Shunqi","family":"Mao","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Sydney, 2006 Sydney, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7792-2327","authenticated-orcid":false,"given":"Matloob","family":"Khushi","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Sydney, 2006 Sydney, Australia"}]}],"member":"1968","published-online":{"date-parts":[[2019,9,2]]},"reference":[{"key":"ref_1","unstructured":"U.S. Preventive Services Task Force (2018). Final Update Summary: Prostate Cancer: Screening, U.S. Preventive Services Task Force."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wang, G., Teoh, J.Y., and Choi, K. (2018, January 17\u201321). Diagnosis of prostate cancer in a Chinese population by using machine learning methods. Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA.","DOI":"10.1109\/EMBC.2018.8513365"},{"key":"ref_3","unstructured":"(2019, June 08). Prostate-Specific Antigen (PSA) Test. 
[4\/10\/2019], Available online: https:\/\/www.cancer.gov\/types\/prostate\/psa-fact-sheet."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"883","DOI":"10.1001\/jama.2018.0154","article-title":"Effect of a low-intensity PSA-based screening intervention on prostate cancer mortality: The CAP randomized clinical trial","volume":"319","author":"Martin","year":"2018","journal-title":"JAMA"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"k3702","DOI":"10.1136\/bmj.k3702","article-title":"What should doctors say to men asking for a PSA test?","volume":"362","author":"Roland","year":"2018","journal-title":"BMJ"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"120","DOI":"10.7326\/0003-4819-157-2-201207170-00459","article-title":"Screening for prostate cancer: U.S. Preventive services task force recommendation statement","volume":"157","author":"Moyer","year":"2012","journal-title":"Ann. Intern. Med."},{"key":"ref_7","unstructured":"Quah, S.R. (2017). Cancer Screening: Theory and Applications. International Encyclopedia of Public Health, Academic Press. [2nd ed.]."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2801","DOI":"10.1002\/cncr.31549","article-title":"Annual report to the Nation on the status of cancer, part II: Recent changes in prostate cancer trends and disease characteristics","volume":"124","author":"Negoita","year":"2018","journal-title":"Cancer"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1038\/nrclinonc.2009.18","article-title":"Is it time to consider a role for MRI before prostate biopsy?","volume":"6","author":"Ahmed","year":"2009","journal-title":"Nat. Rev. Clin. Oncol."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lapa, P., Goncales, I., Rundo, L., and Casteli, M. (2019, January 13\u201317). 
Semantic learning machine improves the CNN-Based detection of prostate cancer in non-contrast-enhanced MRI. Proceedings of the ACM Genetic and Evolutionary Computation Conference Companion, Prague, Czechia.","DOI":"10.1145\/3319619.3326864"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Rundo, L., Militello, C., Russo, G., Garufi, A., Vitabile, S., Gilardi, M.C., and Mauri, G. (2017). Automated prostate gland segmentation based on an unsupervised fuzzy C-means clustering technique using multispectral T1w and T2w MR imaging. Information, 8.","DOI":"10.3390\/info8020049"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1002\/pros.23258","article-title":"Prostate specific antigen-growth curve model to predict high-risk prostate cancer","volume":"77","author":"Shoaibi","year":"2017","journal-title":"Prostate"},{"key":"ref_13","first-page":"1","article-title":"Development and validation of a multiparameterized artificial neural network for prostate cancer risk prediction and stratification","volume":"2","author":"Roffman","year":"2018","journal-title":"JCO Clin. Cancer Inf."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2240","DOI":"10.1200\/JCO.2016.69.4935","article-title":"Prediction of breast and prostate cancer risks in male BRCA1 and BRCA2 mutation carriers using polygenic risk scores","volume":"35","author":"Lecarpentier","year":"2017","journal-title":"J. Clin. Oncol."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Vickers, A.J., Cronin, A.M., Aus, G., Pihl, C.-G., Becker, C., Pettersson, K., Scardino, P.T., Hugosson, J., and Lilja, H. (2008). A panel of kallikrein markers can reduce unnecessary biopsy for prostate cancer: data from the European Randomized Study of Prostate Cancer Screening in G\u00f6teborg, Sweden. 
BMC Med., 6.","DOI":"10.1186\/1741-7015-6-19"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1038\/nrclinonc.2014.68","article-title":"High-risk prostate cancer-classification and therapy","volume":"11","author":"Chang","year":"2014","journal-title":"Nat. Rev. Clin. Oncol."},{"key":"ref_17","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"JMLR"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"24649","DOI":"10.1109\/ACCESS.2019.2899578","article-title":"Variance ranking attributes selection techniques for binary classification problem in imbalance data","volume":"7","author":"Ebenuwa","year":"2019","journal-title":"IEEE Access"},{"key":"ref_19","unstructured":"(2019, June 10). Imbalanced-Learn. Available online: https:\/\/imbalanced-learn.readthedocs.io\/en\/stable\/index.html."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Han, H., Wang, W.-Y., and Mao, B.-H. (2005). Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. International Conference on Intelligent Computing, Springer.","DOI":"10.1007\/11538059_91"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Jeatrakul, P., Wong, K.W., and Fung, C.C. (2010). Classification of imbalanced data by combining the complementary neural network and SMOTE algorithm. 
International Conference on Neural Information Processing, Springer.","DOI":"10.1007\/978-3-642-17534-3_19"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1109\/TSMCB.2008.2002909","article-title":"SVMs modeling for highly imbalanced classification","volume":"39","author":"Tang","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern. Part B"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1109\/MCI.2018.2866730","article-title":"Cross-validation for imbalanced datasets: Avoiding overoptimistic and overfitting approaches","volume":"13","author":"Santos","year":"2018","journal-title":"IEEE Comput. Intell. Mag."},{"key":"ref_25","unstructured":"Brownlee, J. (2019, May 26). How to Train a Final Machine Learning Model. Available online: https:\/\/machinelearningmastery.com\/train-final-machine-learning-model\/."},{"key":"ref_26","unstructured":"(2019, May 26). ROC Curve Analysis. Available online: https:\/\/www.medcalc.org\/manual\/roc-curves.php."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1684","DOI":"10.1093\/jnci\/djt281","article-title":"The prostate, lung, colorectal, and ovarian cancer screening trial and its associated research resource","volume":"105","author":"Zhu","year":"2013","journal-title":"J. Natl. Cancer Inst."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Khushi, M., Dean, I.M., Teber, E.T., Chircop, M., Arthur, J.W., and Flores-Rodriguez, N. (2017). Automated classification and characterization of the mitotic spindle following knockdown of a mitosis-related protein. BMC Bioinform., 18.","DOI":"10.1186\/s12859-017-1966-4"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"8879","DOI":"10.1038\/s41598-017-08786-1","article-title":"MatCol: A tool to measure fluorescence signal colocalisation in biological systems","volume":"7","author":"Khushi","year":"2017","journal-title":"Sci. 
Rep."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"e654","DOI":"10.7717\/peerj.654","article-title":"Bioinformatic analysis of cis-regulatory interactions between progesterone and estrogen receptors in breast cancer","volume":"2","author":"Khushi","year":"2014","journal-title":"Peer J."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"193","DOI":"10.18632\/oncotarget.6220","article-title":"Prostate cancer stem cells: the role of androgen and estrogen receptors","volume":"7","author":"Galasso","year":"2016","journal-title":"Oncotarget"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2","DOI":"10.3389\/fonc.2018.00002","article-title":"Estrogens and their receptors in prostate cancer: Therapeutic implications","volume":"8","author":"Galasso","year":"2018","journal-title":"Front. Oncol."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/4\/3\/129\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:16:12Z","timestamp":1760188572000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/4\/3\/129"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,2]]},"references-count":32,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,9]]}},"alternative-id":["data4030129"],"URL":"https:\/\/doi.org\/10.3390\/data4030129","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,9,2]]}}}