{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T16:22:56Z","timestamp":1772468576641,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,8,10]],"date-time":"2025-08-10T00:00:00Z","timestamp":1754784000000},"content-version":"vor","delay-in-days":40,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"U.S. National Science Foundation","doi-asserted-by":"publisher","award":["IIS-2128307"],"award-info":[{"award-number":["IIS-2128307"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000062","name":"National Institute of Diabetes and Digestive and Kidney Diseases","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000062","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01DK132885"],"award-info":[{"award-number":["R01DK132885"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R37CA277812"],"award-info":[{"award-number":["R37CA277812"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000054","name":"National Cancer Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Classification of patient multicategory survival outcomes is important for personalized cancer treatments. 
Machine learning (ML) algorithms have increasingly been used to inform healthcare decisions, but these models are vulnerable to biases in data collection and algorithm creation. ML models have previously been shown to exhibit racial bias, but their fairness towards patients from different age and sex groups has yet to be studied. Therefore, we compared the multimetric performances of five ML models (random forests, multinomial logistic regression, linear support vector classifier, linear discriminant analysis, and multilayer perceptron) when classifying colorectal cancer patients (n\u2009=\u2009589) of various age, sex, and racial groups using The Cancer Genome Atlas data. All five models exhibited biases for these sociodemographic groups. We then repeated the same process on lung adenocarcinoma (n\u2009=\u2009515) to validate our findings. Surprisingly, most models tended to perform more poorly overall for the largest sociodemographic groups. Methods to optimize model performance, including testing the model on merged age, sex, or racial groups, and creating a model trained on and used for an individual or merged sociodemographic group, show potential to reduce disparities in model performance for different groups. This is supported by our regression analysis showing associations between model choice and methodology used with reduced performance disparities across demographic subgroups. 
Notably, these methods may be used to improve ML fairness while avoiding penalizing the model for exhibiting bias and thus sacrificing overall performance.<\/jats:p>","DOI":"10.1093\/bib\/bbaf398","type":"journal-article","created":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T22:19:49Z","timestamp":1755037189000},"source":"Crossref","is-referenced-by-count":1,"title":["Towards machine learning fairness in classifying multicategory causes of deaths in colorectal or lung cancer patients"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7413-9748","authenticated-orcid":false,"given":"Catherine H","family":"Feng","sequence":"first","affiliation":[{"name":"Department of Molecular and Cellular Biology, Harvard University , 52 Oxford St, Cambridge, MA, 02138","place":["United States"]},{"name":"Department of Statistics , Harvard University, 1 Oxford St, Cambridge, MA 02138,","place":["United States"]}]},{"given":"Fei","family":"Deng","sequence":"additional","affiliation":[{"name":"Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University , 160 Frelinghuysen Rd., Piscataway, NJ 08854,","place":["United States"]}]},{"given":"Mary L","family":"Disis","sequence":"additional","affiliation":[{"name":"UW Medicine Cancer Vaccine Institute University of Washington, 850 Republican St, Seattle, WA 98109 ,","place":["United States"]}]},{"given":"Nan","family":"Gao","sequence":"additional","affiliation":[{"name":"Department of Pharmacology, Physiology, and Neuroscience, New Jersey Medical School, Rutgers University , 185 South Orange Avenue, Newark, NJ 07101,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5436-887X","authenticated-orcid":false,"given":"Lanjing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University , 160 Frelinghuysen Rd., Piscataway, NJ 08854,","place":["United 
States"]},{"name":"Department of Pathology, Princeton Medical Center, One Plainsboro Rd, Plainsboro , NJ 08536,","place":["United States"]},{"name":"Rutgers Cancer Institute of New Jersey , 195 Little Albany St, New Brunswick, NJ 08901,","place":["United States"]},{"name":"Department of Biological Sciences, Rutgers University , 195 University Ave, Newark, NJ 07102,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,8,10]]},"reference":[{"key":"2025081218194010300_ref1","doi-asserted-by":"publisher","first-page":"209","DOI":"10.14218\/erhm.2024.00006","article-title":"Advances in the clinical application of high-throughput proteomics","volume":"9","author":"Cui","year":"2024","journal-title":"Explor Res Hypothesis Med"},{"key":"2025081218194010300_ref2","doi-asserted-by":"publisher","first-page":"1170","DOI":"10.1038\/s41374-022-00830-7","article-title":"High-throughput proteomics: A methodological mini-review","volume":"102","author":"Cui","year":"2022","journal-title":"Lab Investig"},{"key":"2025081218194010300_ref3","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1038\/s41374-018-0125-5","article-title":"Trends in the characteristics of human functional genomic data on the gene expression omnibus, 2001-2017","volume":"99","author":"Liu","year":"2019","journal-title":"Lab Investig"},{"key":"2025081218194010300_ref4","doi-asserted-by":"publisher","first-page":"10","DOI":"10.3322\/caac.21871","article-title":"Cancer statistics, 2025","volume":"75","author":"Siegel","year":"2025","journal-title":"CA Cancer J Clin"},{"key":"2025081218194010300_ref5","doi-asserted-by":"publisher","first-page":"104161","DOI":"10.1016\/j.compbiomed.2020.104161","article-title":"Predict multicategory causes of death in lung cancer patients using clinicopathologic factors","volume":"129","author":"Deng","year":"2021","journal-title":"Comput Biol 
Med"},{"key":"2025081218194010300_ref6","doi-asserted-by":"publisher","first-page":"e205842","DOI":"10.1001\/jamanetworkopen.2020.5842","article-title":"Development and validation of a deep learning model for non-small cell lung cancer survival","volume":"3","author":"She","year":"2020","journal-title":"JAMA Netw Open"},{"key":"2025081218194010300_ref7","first-page":"4624","article-title":"Classify multicategory outcome in patients with lung adenocarcinoma using clinical, transcriptomic and clinico-transcriptomic data: Machine learning versus multinomial models","volume":"10","author":"Deng","year":"2020","journal-title":"Am J Cancer Res"},{"key":"2025081218194010300_ref8","first-page":"1212","article-title":"Histology and oncogenic driver alterations of lung adenocarcinoma in Chinese","volume":"9","author":"Shang","year":"2019","journal-title":"Am J Cancer Res"},{"key":"2025081218194010300_ref9","doi-asserted-by":"publisher","first-page":"100320","DOI":"10.1016\/j.labinv.2023.100320","article-title":"Union with recursive feature elimination: A feature selection framework to improve the classification performance of multicategory causes of death in colorectal cancer","volume":"104","author":"Deng","year":"2024","journal-title":"Lab Investig"},{"key":"2025081218194010300_ref10","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1038\/s41374-021-00662-x","article-title":"Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: Random forest and multinomial logistic regression models","volume":"102","author":"Feng","year":"2021","journal-title":"Lab Investig"},{"key":"2025081218194010300_ref11","doi-asserted-by":"publisher","first-page":"683","DOI":"10.1007\/s10552-020-01313-0","article-title":"Association of KRAS mutation with tumor deposit status and overall survival of colorectal cancer","volume":"31","author":"Zhang","year":"2020","journal-title":"Cancer Causes 
Control"},{"key":"2025081218194010300_ref12","doi-asserted-by":"publisher","first-page":"102111","DOI":"10.1016\/j.labinv.2024.102111","article-title":"Implementation of digital pathology and artificial intelligence in routine pathology practice","volume":"104","author":"Zhang","year":"2024","journal-title":"Lab Investig"},{"key":"2025081218194010300_ref13","doi-asserted-by":"publisher","first-page":"104677","DOI":"10.1016\/j.jbi.2024.104677","article-title":"Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases","volume":"156","author":"Wang","year":"2024","journal-title":"J Biomed Inform"},{"key":"2025081218194010300_ref14","doi-asserted-by":"crossref","DOI":"10.3390\/a17040141","article-title":"Challenges in reducing bias using post-processing fairness for breast cancer stage classification with deep learning","volume":"17","author":"Soltan","year":"2024","journal-title":"Algorithms"},{"key":"2025081218194010300_ref15","doi-asserted-by":"publisher","first-page":"2101","DOI":"10.1002\/cncr.35307","article-title":"Uses and limitations of artificial intelligence for oncology","volume":"130","author":"Kolla","year":"2024","journal-title":"Cancer"},{"key":"2025081218194010300_ref16","doi-asserted-by":"publisher","first-page":"e164","DOI":"10.1016\/S1470-2045(22)00018-3","article-title":"Cancer detection, diagnosis, and treatment for adults with disabilities","volume":"23","author":"Iezzoni","year":"2022","journal-title":"Lancet Oncol"},{"key":"2025081218194010300_ref17","doi-asserted-by":"publisher","first-page":"4750","DOI":"10.1038\/s41467-024-48972-0","article-title":"Fairer AI in ophthalmology via implicit fairness learning for mitigating sexism and ageism","volume":"15","author":"Tan","year":"2024","journal-title":"Nat 
Commun"},{"key":"2025081218194010300_ref18","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1007\/s10729-024-09691-6","article-title":"Evaluating machine learning model bias and racial disparities in non-small cell lung cancer using SEER registry data","volume":"27","author":"Trentz","year":"2024","journal-title":"Health Care Manag Sci"},{"key":"2025081218194010300_ref19","doi-asserted-by":"publisher","first-page":"e2421290","DOI":"10.1001\/jamanetworkopen.2024.21290","article-title":"Fairness in predicting cancer mortality across racial subgroups","volume":"7","author":"Ganta","year":"2024","journal-title":"JAMA Netw Open"},{"key":"2025081218194010300_ref20","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1007\/978-3-031-45249-9_22","article-title":"An investigation into race bias in random Forest models based on breast DCE-MRI derived radiomics features","volume":"14242","author":"Huti","year":"2023","journal-title":"Clin Image Based Proced Fairness AI Med Imaging Ethical Philos Issues Med Imaging (2023)"},{"key":"2025081218194010300_ref21","doi-asserted-by":"publisher","first-page":"430","DOI":"10.1038\/s41374-020-00525-x","article-title":"Performance and efficiency of machine learning algorithms for analyzing rectangular biomedical data","volume":"101","author":"Deng","year":"2021","journal-title":"Lab Investig"},{"key":"2025081218194010300_ref22","doi-asserted-by":"publisher","first-page":"103758","DOI":"10.1016\/j.clon.2025.103758","article-title":"Sex-based bias in artificial intelligence-based segmentation models in clinical oncology","volume":"39","author":"Doo","year":"2025","journal-title":"Clin Oncol (R Coll Radiol)"},{"key":"2025081218194010300_ref23","doi-asserted-by":"publisher","first-page":"1452457","DOI":"10.3389\/fncom.2024.1452457","article-title":"Sex differences in brain MRI using deep learning toward fairer healthcare outcomes","volume":"18","author":"Dibaji","year":"2024","journal-title":"Front Comput 
Neurosci"},{"key":"2025081218194010300_ref24","first-page":"061102","article-title":"Fairness-related performance and explainability effects in deep learning models for brain image analysis","volume":"9","author":"Stanley","year":"2022","journal-title":"J Med Imaging (Bellingham)"},{"key":"2025081218194010300_ref25","first-page":"232","article-title":"CheXclusion: Fairness gaps in deep chest X-ray classifiers","volume":"26","author":"Seyyed-Kalantari","year":"2021","journal-title":"Pac Symp Biocomput"},{"key":"2025081218194010300_ref26","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1515\/ijb-2022-0025","article-title":"Penalized logistic regression with prior information for microarray gene expression classification","volume":"20","author":"Genc","year":"2024","journal-title":"Int J Biostat"},{"key":"2025081218194010300_ref27","doi-asserted-by":"publisher","first-page":"479","DOI":"10.1038\/s41596-019-0251-6","article-title":"Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data","volume":"15","author":"Maros","year":"2020","journal-title":"Nat Protoc"},{"key":"2025081218194010300_ref28","article-title":"Large-scale benchmark study of survival prediction methods using multi-omics data","volume":"22","author":"Herrmann","journal-title":"Brief Bioinform"},{"key":"2025081218194010300_ref29","doi-asserted-by":"publisher","first-page":"pl1","DOI":"10.1126\/scisignal.2004088","article-title":"Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal","volume":"6","author":"Gao","year":"2013","journal-title":"Sci Signal"},{"key":"2025081218194010300_ref30","doi-asserted-by":"publisher","first-page":"1065","DOI":"10.3389\/fonc.2020.01065","article-title":"Integrative network fusion: A multi-omics approach in molecular profiling","volume":"10","author":"Chierici","year":"2020","journal-title":"Front 
Oncol"},{"key":"2025081218194010300_ref31","doi-asserted-by":"publisher","first-page":"635","DOI":"10.1111\/biom.12621","article-title":"Estimation of the optimal regime in treatment of prostate cancer recurrence from observational data using flexible weighting models","volume":"73","author":"Shen","year":"2017","journal-title":"Biometrics"},{"key":"2025081218194010300_ref32","doi-asserted-by":"publisher","DOI":"10.3390\/cancers15071932","article-title":"From head and neck tumour and lymph node segmentation to survival prediction on PET\/CT: An end-to-end framework featuring uncertainty, fairness, and multi-region multi-modal radiomics","volume":"15","author":"Salahuddin","year":"2023","journal-title":"Cancers (Basel)"},{"key":"2025081218194010300_ref33","doi-asserted-by":"publisher","first-page":"e55820","DOI":"10.2196\/55820","article-title":"Mitigating sociodemographic bias in opioid use disorder prediction: Fairness-aware machine learning framework","volume":"3","author":"Yaseliani","year":"2024","journal-title":"JMIR AI"},{"key":"2025081218194010300_ref34","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1007\/s10140-022-02019-3","article-title":"Deep learning prediction of sex on chest radiographs: A potential contributor to biased algorithms","volume":"29","author":"Li","year":"2022","journal-title":"Emerg Radiol"},{"key":"2025081218194010300_ref35","doi-asserted-by":"crossref","first-page":"1240","DOI":"10.3389\/fgene.2019.01240","article-title":"CLARITE facilitates the quality control and analysis process for EWAS of metabolic-related traits","volume":"10","author":"Lucas","year":"2019","journal-title":"Front Genet"},{"key":"2025081218194010300_ref36","doi-asserted-by":"publisher","first-page":"e2211613120","DOI":"10.1073\/pnas.2211613120","article-title":"Bias in machine learning models can be significantly mitigated by careful training: Evidence from neuroimaging studies","volume":"120","author":"Wang","year":"2023","journal-title":"Proc Natl 
Acad Sci USA"},{"key":"2025081218194010300_ref37","doi-asserted-by":"publisher","first-page":"104646","DOI":"10.1016\/j.jbi.2024.104646","article-title":"A survey of recent methods for addressing AI fairness and bias in biomedicine","volume":"154","author":"Yang","year":"2024","journal-title":"J Biomed Inform"},{"key":"2025081218194010300_ref38","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1038\/s41746-023-00918-4","article-title":"A translational perspective towards clinical AI fairness","volume":"6","author":"Liu","year":"2023","journal-title":"NPJ Digit Med"},{"key":"2025081218194010300_ref39","doi-asserted-by":"publisher","first-page":"826","DOI":"10.1111\/biom.13632","article-title":"A joint fairness model with applications to risk predictions for underrepresented populations","volume":"79","author":"Do","year":"2023","journal-title":"Biometrics"},{"key":"2025081218194010300_ref40","doi-asserted-by":"crossref","DOI":"10.3390\/e23091165","article-title":"The problem of fairness in synthetic healthcare data","volume":"23","author":"Bhanot","year":"2021","journal-title":"Entropy (Basel)"},{"key":"2025081218194010300_ref41","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1037\/pas0001228","article-title":"Statistical learning methods and cross-cultural fairness: Trade-offs and implications for risk assessment instruments","volume":"35","author":"Ashford","year":"2023","journal-title":"Psychol Assess"},{"key":"2025081218194010300_ref42","doi-asserted-by":"crossref","first-page":"e100459","DOI":"10.1136\/bmjhci-2021-100459","article-title":"Conceptualising fairness: Three pillars for medical algorithms and health equity","volume":"29","author":"Sikstrom","year":"2022","journal-title":"BMJ Health Care Inform"},{"key":"2025081218194010300_ref43","doi-asserted-by":"publisher","first-page":"e2345050","DOI":"10.1001\/jamanetworkopen.2023.45050","article-title":"Guiding principles to address the impact of algorithm bias on racial and ethnic 
disparities in health and health care","volume":"6","author":"Chin","year":"2023","journal-title":"JAMA Netw Open"},{"key":"2025081218194010300_ref44","doi-asserted-by":"publisher","first-page":"2730","DOI":"10.1093\/jamia\/ocae209","article-title":"Toward a responsible future: Recommendations for AI-enabled clinical decision support","volume":"31","author":"Labkoff","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"2025081218194010300_ref45","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1001\/jama.2024.21700","article-title":"Testing and evaluation of health care applications of large language models: A systematic review","volume":"333","author":"Bedi","year":"2025","journal-title":"JAMA"},{"key":"2025081218194010300_ref46","doi-asserted-by":"publisher","first-page":"104693","DOI":"10.1016\/j.jbi.2024.104693","article-title":"Recommendations to promote fairness and inclusion in biomedical AI research and clinical use","volume":"157","author":"Griffin","year":"2024","journal-title":"J Biomed Inform"},{"key":"2025081218194010300_ref47","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1186\/s12911-025-02862-7","article-title":"Mitigating bias in AI mortality predictions for minority populations: A transfer learning approach","volume":"25","author":"Gu","year":"2025","journal-title":"BMC Med Inform Decis Mak"},{"key":"2025081218194010300_ref48","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/s11604-023-01474-3","article-title":"Fairness of artificial intelligence in healthcare: Review and recommendations","volume":"42","author":"Ueda","year":"2024","journal-title":"Jpn J Radiol"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf398\/63997282\/bbaf398.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf398\/63997282\/bbaf398.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T22:19:51Z","timestamp":1755037191000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf398\/8229714"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":48,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf398","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7]]},"article-number":"bbaf398"}}