{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T18:26:59Z","timestamp":1776882419585,"version":"3.51.2"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"S14","license":[{"start":{"date-parts":[[2020,9,1]],"date-time":"2020-09-01T00:00:00Z","timestamp":1598918400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T00:00:00Z","timestamp":1601424000000},"content-version":"vor","delay-in-days":29,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Machine learning has been utilized to predict cancer drug response from multi-omics data generated from sensitivities of cancer cell lines to different therapeutic compounds. Here, we build machine learning models using gene expression data from patients\u2019 primary tumor tissues to predict whether a patient will respond positively or negatively to two chemotherapeutics: 5-Fluorouracil and Gemcitabine.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We focused on 5-Fluorouracil and Gemcitabine because based on our exclusion criteria, they provide the largest numbers of patients within TCGA. Normalized gene expression data were clustered and used as the input features for the study. We used matching clinical trial data to ascertain the response of these patients via multiple classification methods. Multiple clustering and classification methods were compared for prediction accuracy of drug response. Clara and random forest were found to be the best clustering and classification methods, respectively. The results show our models predict with up to 86% accuracy; despite the study\u2019s limitation of sample size. We also found the genes most informative for predicting drug response were enriched in well-known cancer signaling pathways and highlighted their potential significance in chemotherapy prognosis.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>Primary tumor gene expression is a good predictor of cancer drug response. Investment in larger datasets containing both patient gene expression and drug response is needed to support future work of machine learning models. Ultimately, such predictive models may aid oncologists with making critical treatment decisions.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-020-03690-4","type":"journal-article","created":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T08:05:22Z","timestamp":1601453122000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":41,"title":["Leveraging TCGA gene expression data to build predictive models for cancer drug response"],"prefix":"10.1186","volume":"21","author":[{"given":"Evan A.","family":"Clayton","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Toyya A.","family":"Pujol","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John F.","family":"McDonald","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peng","family":"Qiu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,9,30]]},"reference":[{"issue":"2","key":"3690_CR1","doi-asserted-by":"publisher","first-page":"e81","DOI":"10.1016\/S1470-2045(15)00620-8","volume":"17","author":"V Prasad","year":"2016","unstructured":"Prasad V, Fojo T, Brada M. Precision oncology: origins, optimism, and potential. Lancet Oncol. 2016;17(2):e81\u20136.","journal-title":"Lancet Oncol"},{"issue":"12","key":"3690_CR2","doi-asserted-by":"publisher","first-page":"2284","DOI":"10.1200\/JCO.2004.05.166","volume":"22","author":"M Ayers","year":"2004","unstructured":"Ayers M, et al. Gene expression profiles predict complete pathologic response to neoadjuvant paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide chemotherapy in breast cancer. J Clin Oncol. 2004;22(12):2284\u201393.","journal-title":"J Clin Oncol"},{"issue":"7391","key":"3690_CR3","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1038\/nature11003","volume":"483","author":"J Barretina","year":"2012","unstructured":"Barretina J, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483(7391):603\u20137.","journal-title":"Nature"},{"issue":"12","key":"3690_CR4","doi-asserted-by":"publisher","first-page":"689","DOI":"10.1038\/nchembio840","volume":"2","author":"I Collins","year":"2006","unstructured":"Collins I, Workman P. New approaches to molecular cancer therapeutics. Nat Chem Biol. 2006;2(12):689.","journal-title":"Nat Chem Biol"},{"issue":"6","key":"3690_CR5","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1002\/stem.160413","volume":"16","author":"JS Ross","year":"1998","unstructured":"Ross JS, Fletcher JA. The HER-2\/neu oncogene in breast cancer: prognostic factor, predictive factor, and target for therapy. Stem Cells. 1998;16(6):413\u201328.","journal-title":"Stem Cells"},{"issue":"3","key":"3690_CR6","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1177\/0141076816631154","volume":"109","author":"S-M Tu","year":"2016","unstructured":"Tu S-M, Bilen MA, Tannir NM. Personalised cancer care: promises and challenges of targeted therapy. J R Soc Med. 2016;109(3):98\u2013105.","journal-title":"J R Soc Med"},{"key":"3690_CR7","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1038\/nature11003","volume":"483","author":"J Barretina","year":"2012","unstructured":"Barretina J, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603.","journal-title":"Nature"},{"issue":"1","key":"3690_CR8","doi-asserted-by":"publisher","first-page":"8857","DOI":"10.1038\/s41598-018-27214-6","volume":"8","author":"Y Chang","year":"2018","unstructured":"Chang Y, et al. Cancer drug response profile scan (CDRscan): a deep learning model that predicts drug effectiveness from cancer genomic signature. Sci Rep. 2018;8(1):8857.","journal-title":"Sci Rep"},{"issue":"1","key":"3690_CR9","first-page":"18","volume":"12","author":"Y-C Chiu","year":"2019","unstructured":"Chiu Y-C, et al. Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC Med Genet. 2019;12(1):18.","journal-title":"BMC Med Genet"},{"key":"3690_CR10","doi-asserted-by":"publisher","first-page":"1202","DOI":"10.1038\/nbt.2877","volume":"32","author":"JC Costello","year":"2014","unstructured":"Costello JC, et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol. 2014;32:1202.","journal-title":"Nat Biotechnol"},{"issue":"3","key":"3690_CR11","doi-asserted-by":"publisher","first-page":"R47","DOI":"10.1186\/gb-2014-15-3-r47","volume":"15","author":"P Geeleher","year":"2014","unstructured":"Geeleher P, Cox NJ, Huang RS. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol. 2014;15(3):R47.","journal-title":"Genome Biol"},{"key":"3690_CR12","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1016\/j.omtn.2019.05.017","volume":"17","author":"N-N Guan","year":"2019","unstructured":"Guan N-N, et al. Anticancer drug response prediction in cell lines using weighted graph regularized matrix factorization. Mol Therapy-Nucleic Acids. 2019;17:164\u201374.","journal-title":"Mol Therapy-Nucleic Acids"},{"issue":"2","key":"3690_CR13","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1002\/psp4.2","volume":"4","author":"H Hejase","year":"2015","unstructured":"Hejase H, Chan C. Improving drug sensitivity prediction using different types of data. CPT Pharmacometrics Syst Pharmacol. 2015;4(2):98\u2013105.","journal-title":"CPT Pharmacometrics Syst Pharmacol"},{"issue":"10","key":"3690_CR14","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0186906","volume":"12","author":"C Huang","year":"2017","unstructured":"Huang C, et al. Open source machine-learning algorithms for the prediction of optimal cancer drug therapies. PLoS One. 2017;12(10):e0186906.","journal-title":"PLoS One"},{"key":"3690_CR15","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1016\/j.omtn.2018.09.011","volume":"13","author":"H Liu","year":"2018","unstructured":"Liu H, et al. Anti-cancer drug response prediction using neighbor-based collaborative filtering with global effect removal. Mol Therapy-Nucleic Acids. 2018;13:303\u201311.","journal-title":"Mol Therapy-Nucleic Acids"},{"issue":"22","key":"3690_CR16","doi-asserted-by":"publisher","first-page":"3907","DOI":"10.1093\/bioinformatics\/bty452","volume":"34","author":"C Suphavilai","year":"2018","unstructured":"Suphavilai C, Bertrand D, Nagarajan N. Predicting cancer drug response using a recommender system. Bioinformatics. 2018;34(22):3907\u201314.","journal-title":"Bioinformatics"},{"issue":"1","key":"3690_CR17","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1186\/s12859-019-2608-9","volume":"20","author":"D Wei","year":"2019","unstructured":"Wei D, et al. Comprehensive anticancer drug response prediction based on a simple cell line-drug complex network model. BMC Bioinformatics. 2019;20(1):44.","journal-title":"BMC Bioinformatics"},{"issue":"9","key":"3690_CR18","doi-asserted-by":"publisher","first-page":"1527","DOI":"10.1093\/bioinformatics\/bty848","volume":"35","author":"J Yang","year":"2019","unstructured":"Yang J, et al. A novel approach for drug response prediction in cancer cell lines via network representation learning. Bioinformatics. 2019;35(9):1527\u201335.","journal-title":"Bioinformatics"},{"issue":"5","key":"3690_CR19","first-page":"820","volume":"18","author":"F Azuaje","year":"2016","unstructured":"Azuaje F. Computational models for predicting drug responses in cancer research. Brief Bioinform. 2016;18(5):820\u20139.","journal-title":"Brief Bioinform"},{"key":"3690_CR20","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1146\/annurev-pharmtox-010814-124502","volume":"55","author":"M Vidyasagar","year":"2015","unstructured":"Vidyasagar M. Identifying predictive features in drug response using machine learning: opportunities and challenges. Annu Rev Pharmacol Toxicol. 2015;55:15\u201334.","journal-title":"Annu Rev Pharmacol Toxicol"},{"key":"3690_CR21","volume-title":"OptCluster: an R package for determining the optimal clustering algorithm and optimal number of clusters","author":"MN Sekula","year":"2015","unstructured":"Sekula MN. OptCluster: an R package for determining the optimal clustering algorithm and optimal number of clusters; 2015."},{"issue":"3","key":"3690_CR22","first-page":"18","volume":"2","author":"A Liaw","year":"2002","unstructured":"Liaw A, Wiener M. Classification and regression by randomForest. R news. 2002;2(3):18\u201322.","journal-title":"R news"},{"key":"3690_CR23","doi-asserted-by":"crossref","unstructured":"Cutler A, Cutler DR, Stevens JR. Random Forests. In C. Zhang & Y. Ma (Eds.). Ensemble machine learning: methods and applications. Boston: Springer US; 2012. p. 157\u201375.","DOI":"10.1007\/978-1-4419-9326-7_5"},{"issue":"10","key":"3690_CR24","doi-asserted-by":"publisher","first-page":"1113","DOI":"10.1038\/ng.2764","volume":"45","author":"JN Weinstein","year":"2013","unstructured":"Weinstein JN, et al. The cancer genome atlas pan-cancer analysis project. Nat Genet. 2013;45(10):1113.","journal-title":"Nat Genet"},{"issue":"8","key":"3690_CR25","doi-asserted-by":"publisher","first-page":"1551","DOI":"10.1038\/nprot.2013.092","volume":"8","author":"H Mi","year":"2013","unstructured":"Mi H, et al. Large-scale gene function analysis with the PANTHER classification system. Nat Protoc. 2013;8(8):1551.","journal-title":"Nat Protoc"},{"issue":"2","key":"3690_CR26","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1016\/j.cell.2018.03.022","volume":"173","author":"KA Hoadley","year":"2018","unstructured":"Hoadley KA, et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell. 2018;173(2):291\u2013304. e6.","journal-title":"Cell"},{"issue":"11","key":"3690_CR27","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1000217","volume":"4","author":"E Lee","year":"2008","unstructured":"Lee E, et al. Inferring pathway activity toward precise disease classification. PLoS Comput Biol. 2008;4(11):e1000217.","journal-title":"PLoS Comput Biol"},{"key":"3690_CR28","first-page":"1712.00336","volume-title":"BioMM: biologically-informed multi-stage machine learning for identification of epigenetic fingerprints","author":"J Chen","year":"2017","unstructured":"Chen J, Schwarz E. BioMM: biologically-informed multi-stage machine learning for identification of epigenetic fingerprintsarXiv preprint arXiv; 2017. p. 1712.00336."},{"issue":"1","key":"3690_CR29","doi-asserted-by":"publisher","first-page":"5397","DOI":"10.1038\/s41598-018-23618-6","volume":"8","author":"RC Fong","year":"2018","unstructured":"Fong RC, Scheirer WJ, Cox DD. Using human brain activity to guide machine learning. Sci Rep. 2018;8(1):5397.","journal-title":"Sci Rep"},{"issue":"11","key":"3690_CR30","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1016\/S1364-6613(98)01241-8","volume":"2","author":"RC O'Reilly","year":"1998","unstructured":"O'Reilly RC. Six principles for biologically based computational models of cortical cognition. Trends Cogn Sci. 1998;2(11):455\u201362.","journal-title":"Trends Cogn Sci"},{"key":"3690_CR31","doi-asserted-by":"publisher","unstructured":"Moreno-Layseca P, Icha J, Hamidi H, Ivaska J. Integrin trafficking in cells and tissues. Nat Cell Biol. 2019;21(2):122\u201332. https:\/\/doi.org\/10.1038\/s41556-018-0223-z.","DOI":"10.1038\/s41556-018-0223-z"},{"issue":"4","key":"3690_CR32","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1016\/j.tcb.2014.12.006","volume":"25","author":"L Seguin","year":"2015","unstructured":"Seguin L, et al. Integrins and cancer: regulators of cancer stemness, metastasis, and drug resistance. Trends Cell Biol. 2015;25(4):234\u201340.","journal-title":"Trends Cell Biol"},{"issue":"2","key":"3690_CR33","doi-asserted-by":"publisher","first-page":"68","DOI":"10.4161\/org.4.2.5851","volume":"4","author":"Y Komiya","year":"2008","unstructured":"Komiya Y, Habas R. Wnt signal transduction pathways. Organogenesis. 2008;4(2):68\u201375.","journal-title":"Organogenesis"},{"issue":"2","key":"3690_CR34","doi-asserted-by":"publisher","first-page":"84","DOI":"10.5493\/wjem.v5.i2.84","volume":"5","author":"MA Chiurillo","year":"2015","unstructured":"Chiurillo MA. Role of the Wnt\/\u03b2-catenin pathway in gastric cancer: an in-depth literature review. World J Experimental Med. 2015;5(2):84.","journal-title":"World J Experimental Med"},{"issue":"11","key":"3690_CR35","doi-asserted-by":"publisher","first-page":"2563","DOI":"10.1016\/j.bbamcr.2014.05.014","volume":"1843","author":"MD Turner","year":"2014","unstructured":"Turner MD, et al. Cytokines and chemokines: at the crossroads of cell signalling and inflammatory disease. Biochimica et Biophysica Acta (BBA) - Mol Cell Res. 2014;1843(11):2563\u201382.","journal-title":"Biochimica et Biophysica Acta (BBA) - Mol Cell Res"},{"key":"3690_CR36","volume-title":"American Association for the Advancement of Science","author":"M Hutson","year":"2018","unstructured":"Hutson M. Artificial intelligence faces reproducibility crisis. In: American Association for the Advancement of Science; 2018."},{"key":"3690_CR37","doi-asserted-by":"publisher","unstructured":"Banfield J, Raftery A. Model-based gaussian and non-gaussian clustering. Biometrics. 1993;49(3):803\u201321. https:\/\/doi.org\/10.2307\/2532201.","DOI":"10.2307\/2532201"},{"issue":"3","key":"3690_CR38","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1007\/BF02289588","volume":"32","author":"SC Johnson","year":"1967","unstructured":"Johnson SC. Hierarchical clustering schemes. Psychometrika. 1967;32(3):241\u201354.","journal-title":"Psychometrika"},{"key":"3690_CR39","doi-asserted-by":"crossref","unstructured":"Lloyd, S., Least square quantization in PCM. Bell telephone laboratories paper. Published in journal much later: Lloyd, SP: Least squares quantization in PCM. IEEE trans. Inform. Theor. (1957\/1982) Google Scholar, 1957.","DOI":"10.1109\/TIT.1982.1056489"},{"issue":"1","key":"3690_CR40","first-page":"111","volume":"34","author":"PJ Rousseeuw","year":"1990","unstructured":"Rousseeuw PJ, Kaufman L. Finding groups in data. Ser Probability Mathematical Stat 1990. 1990;34(1):111\u20132.","journal-title":"Ser Probability Mathematical Stat 1990"},{"issue":"6","key":"3690_CR41","doi-asserted-by":"publisher","first-page":"2907","DOI":"10.1073\/pnas.96.6.2907","volume":"96","author":"P Tamayo","year":"1999","unstructured":"Tamayo P, et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci. 1999;96(6):2907\u201312.","journal-title":"Proc Natl Acad Sci"},{"key":"3690_CR42","volume-title":"2018 IEEE 15th international conference on networking, sensing and control (ICNSC)","author":"H Liu","year":"2018","unstructured":"Liu H, et al. Weighted Gini index feature selection method for imbalanced data. In: 2018 IEEE 15th international conference on networking, sensing and control (ICNSC); 2018."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03690-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-020-03690-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03690-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,30]],"date-time":"2021-09-30T00:24:34Z","timestamp":1632961474000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-03690-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9]]},"references-count":42,"journal-issue":{"issue":"S14","published-print":{"date-parts":[[2020,9]]}},"alternative-id":["3690"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-03690-4","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9]]},"assertion":[{"value":"30 September 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"364"}}