{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T20:45:13Z","timestamp":1757623513753,"version":"3.44.0"},"reference-count":0,"publisher":"Milano University Press","license":[{"start":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T00:00:00Z","timestamp":1757289600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ebph"],"abstract":"<jats:p>INTRODUCTION\r\nGut microbiome profiling through 16S rRNA sequencing has emerged as a promising non-invasive tool for colorectal cancer (CRC) detection. Despite their predictive accuracy, machine learning (ML) models often struggle with interpretability, especially when dealing with high-dimensional and correlated microbial data. Ensemble methods such as random forests provide strong classification performance, but their internal mechanisms are opaque. The fuzzy forest (FF) algorithm extends the random forest approach by improving feature selection under multicollinearity, but still lacks direct interpretability of predictions. To address this limitation, explainability techniques such as Partial Dependence Plots (PDPs) can be used to visualize the marginal contribution of key features, enabling better understanding of the relationships between microbial taxa and disease risk.\r\nOBJECTIVES\r\nThis study aims to enhance the interpretability of a microbiome-based classifier applied to Baxter et al.\u2019s 16S rRNA sequencing dataset by using Partial Dependence Plots (PDPs), while also reducing feature importance bias by employing the fuzzy forest (FF) method, which effectively addresses the limitations of Random Forests in handling highly correlated features. 
PDPs allow for the visualization of the marginal effect of each microbial or clinical feature on the predicted probability of CRC. The goal is to offer interpretable insights into the nonlinear and complex relationships captured by the FF model.\r\nMETHODS\r\nWe analysed faecal samples from CRC patients and healthy controls included in the Baxter et al. dataset. After centered log-ratio (clr) transformation of the data, we implemented the fuzzy forest (FF) algorithm for feature selection and classification. FF enhances the standard random forest by incorporating recursive feature elimination and correlation clustering, resulting in an unbiased ranking of features even in the presence of high multicollinearity. We then applied PDPs to the top-ranked microbial and clinical features. These plots allow the visualization of the marginal effect of each feature on the model's predicted probability of CRC, offering a means to interpret the impact of each variable in isolation.\r\nRESULTS\r\nThe PDPs highlighted non-linear and threshold effects for both microbial and clinical predictors in the Baxter dataset (Figure 1). Age showed a biphasic relationship with CRC probability: a decreasing effect up to around 65 years, followed by a marked increase in risk thereafter. Among microbial features, Porphyromonas (ASV 417) was positively associated with CRC in a monotonic pattern, whereas Faecalibacterium (ASV 471) and Paraprevotella (ASV 446) showed threshold behaviour, with CRC probability increasing only beyond certain abundance levels. These results support the use of non-linear models for microbiome-based prediction tasks and highlight biologically plausible patterns in feature-response relationships.\r\nCONCLUSIONS\r\nBy combining fuzzy forest feature selection with Partial Dependence Plots, we constructed an interpretable and robust modelling pipeline for microbiome-based CRC prediction. 
This framework enables a better understanding of the individual contribution of microbial and clinical features to model predictions, enhancing both scientific interpretability and clinical relevance.<\/jats:p>","DOI":"10.54103\/2282-0930\/29480","type":"journal-article","created":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T07:03:28Z","timestamp":1757401408000},"source":"Crossref","is-referenced-by-count":0,"title":["Explainability in Microbiome-Based Models for CRC Prediction via Partial Dependence Plots"],"prefix":"10.54103","author":[{"given":"Annamaria","family":"Porreca","sequence":"first","affiliation":[]},{"given":"Eliana","family":"Ibrahimi","sequence":"additional","affiliation":[]},{"given":"Fabrizio","family":"Maturo","sequence":"additional","affiliation":[]},{"given":"Laura Judith","family":"Marcos Zambrano","sequence":"additional","affiliation":[]},{"given":"Melisa","family":"Meto","sequence":"additional","affiliation":[]},{"given":"Marta B.","family":"Lopes","sequence":"additional","affiliation":[]}],"member":"32390","published-online":{"date-parts":[[2025,9,8]]},"container-title":["Epidemiology, Biostatistics, and Public Health"],"original-title":[],"deposited":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T07:03:28Z","timestamp":1757401408000},"score":1,"resource":{"primary":{"URL":"https:\/\/riviste.unimi.it\/index.php\/ebph\/article\/view\/29480"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,8]]},"references-count":0,"URL":"https:\/\/doi.org\/10.54103\/2282-0930\/29480","relation":{},"ISSN":["2282-0930"],"issn-type":[{"value":"2282-0930","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,8]]}}}