{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T06:25:43Z","timestamp":1759991143695},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The incidence of ageing-related diseases has been constantly increasing in the last decades, raising the need for creating effective methods to analyze ageing-related protein data. These methods should have high predictive accuracy and be easily interpretable by ageing experts. To enable this, one needs interpretable classification models (supervised machine learning) and features with rich biological meaning. In this paper we propose two interpretable feature types based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and compare them with traditional feature types in hierarchical classification (a more challenging classification task regarding predictive performance) and binary classification (a classification task producing easier to interpret classification models). As far as we know, this work is the first to: (i) explore the potential of the KEGG pathway data in the hierarchical classification setting, (ii) use the graph structure of KEGG pathways to create a feature type that quantifies the influence of a current protein on another specific protein within a KEGG pathway graph and (iii) propose a method for interpreting the classification models induced using KEGG features.<\/jats:p>\n               <jats:p>Results: We performed tests measuring predictive accuracy considering hierarchical and binary class labels extracted from the Mouse Phenotype Ontology. 
One of the KEGG feature types leads to the highest predictive accuracy among five individual feature types across three hierarchical classification algorithms. Additionally, the combination of the two KEGG feature types proposed in this work results in one of the best predictive accuracies when using the binary class version of our datasets, at the same time enabling the extraction of knowledge from ageing-related data using quantitative influence information.<\/jats:p>\n               <jats:p>Availability and Implementation: The datasets created in this paper will be freely available after publication.<\/jats:p>\n               <jats:p>Contact: \u00a0ff79@kent.ac.uk<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw363","type":"journal-article","created":{"date-parts":[[2016,6,19]],"date-time":"2016-06-19T00:18:20Z","timestamp":1466295500000},"page":"2988-2995","source":"Crossref","is-referenced-by-count":11,"title":["New KEGG pathway-based interpretable features for classifying ageing-related mouse proteins"],"prefix":"10.1093","volume":"32","author":[{"given":"Fabio","family":"Fabris","sequence":"first","affiliation":[{"name":"School of Computing, University of Kent, CT2 7NF Canterbury, Kent, UK"}]},{"given":"Alex A.","family":"Freitas","sequence":"additional","affiliation":[{"name":"School of Computing, University of Kent, CT2 7NF Canterbury, Kent, UK"}]}],"member":"286","published-online":{"date-parts":[[2016,6,17]]},"reference":[{"key":"2023020113450416900_btw363-B1","first-page":"451","article-title":"Area under the precision-recall curve: point estimates and confidence intervals","volume":"8190","author":"Boyd","year":"2013","journal-title":"Mach. Learn. Knowl. Discov. 
Datab"},{"key":"2023020113450416900_btw363-B2","doi-asserted-by":"crossref","first-page":"9209","DOI":"10.1073\/pnas.1201416109","article-title":"Molecular signaling network complexity is correlated with cancer patient survivability","volume":"109","author":"Breitkreutz","year":"2012","journal-title":"Proc. Natl. Acad. Sci. U. S. A"},{"key":"2023020113450416900_btw363-B3","doi-asserted-by":"crossref","first-page":"8177","DOI":"10.3390\/molecules15118177","article-title":"Analysis of protein pathway networks using hybrid properties","volume":"15","author":"Chen","year":"2010","journal-title":"Molecules"},{"key":"2023020113450416900_btw363-B4","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","author":"Demsar","year":"2006","journal-title":"J. Mach. Learn. Res"},{"key":"2023020113450416900_btw363-B5","doi-asserted-by":"crossref","first-page":"D726","DOI":"10.1093\/nar\/gku967","article-title":"The mouse genome database (MGD): facilitating mouse as a model for human biology and disease","volume":"43","author":"Eppig","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020113450416900_btw363-B6","first-page":"241","author":"Fabris","year":"2014"},{"key":"2023020113450416900_btw363-B7","first-page":"294","author":"Fabris","year":"2015"},{"key":"2023020113450416900_btw363-B8","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1145\/772862.772881","article-title":"Feature engineering for a gene regulation prediction task","volume":"4","author":"Forman","year":"2002","journal-title":"ACM SIGKDD Explor. Newslett"},{"key":"2023020113450416900_btw363-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2594473.2594475","article-title":"Comprehensible classification models \u2013 a position paper","volume":"15","author":"Freitas","year":"2013","journal-title":"ACM SIGKDD Explor. 
Newslett"},{"key":"2023020113450416900_btw363-B10","doi-asserted-by":"crossref","first-page":"1698","DOI":"10.1377\/hlthaff.2013.0052","article-title":"Substantial health and economic returns from delayed aging may warrant a new focus for medical research","volume":"32","author":"Goldman","year":"2013","journal-title":"Health Affairs"},{"key":"2023020113450416900_btw363-B11","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/1656274.1656278","article-title":"The Weka data mining software: an update","volume":"11","author":"Hall","year":"2009","journal-title":"SIGKDD Explorations Newsletter"},{"key":"2023020113450416900_btw363-B12","author":"Hall","year":"1999"},{"key":"2023020113450416900_btw363-B13","first-page":"49","article-title":"Dependency networks for inference, collaborative filtering, and data visualization","volume":"1","author":"Heckerman","year":"2001","journal-title":"J. Mach. Learn. Res"},{"key":"2023020113450416900_btw363-B14","first-page":"1","author":"Jungjit","year":"2014"},{"key":"2023020113450416900_btw363-B15","doi-asserted-by":"crossref","first-page":"D109","DOI":"10.1093\/nar\/gkr988","article-title":"KEGG for integration and interpretation of large-scale molecular data sets","volume":"40","author":"Kanehisa","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020113450416900_btw363-B16","doi-asserted-by":"crossref","first-page":"D457","DOI":"10.1093\/nar\/gkv1070","article-title":"Kegg as a reference resource for gene and protein annotation","volume":"44","author":"Kanehisa","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023020113450416900_btw363-B17","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1093\/dnares\/dsp019","article-title":"Prediction of candidate primary immunodeficiency disease genes using a support vector machine learning approach","volume":"16","author":"Keerthikumar","year":"2009","journal-title":"DNA 
Res"},{"key":"2023020113450416900_btw363-B18","first-page":"80","author":"Salama","year":"2013"},{"key":"2023020113450416900_btw363-B19","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1038\/msb4100129","article-title":"Network-based prediction of protein function","volume":"3","author":"Sharan","year":"2007","journal-title":"Mol. Syst. Biol"},{"key":"2023020113450416900_btw363-B20","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1007\/s10618-010-0175-9","article-title":"A survey of hierarchical classification across different application domains","volume":"44","author":"Silla","year":"2011","journal-title":"Data Mining Knowl. Discov"},{"key":"2023020113450416900_btw363-B21","doi-asserted-by":"crossref","first-page":"979","DOI":"10.3233\/IDA-2011-0505","article-title":"Selecting different protein representations and classification algorithms in hierarchical protein function prediction","volume":"15","author":"Silla","year":"2011","journal-title":"Intell. Data Anal"},{"key":"2023020113450416900_btw363-B22","first-page":"272","author":"Struyf","year":"2005"},{"key":"2023020113450416900_btw363-B23","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/s10994-008-5077-3","article-title":"Decision trees for hierarchical multi-label classification","volume":"73","author":"Vens","year":"2008","journal-title":"Mach. 
Learn"},{"key":"2023020113450416900_btw363-B24","doi-asserted-by":"crossref","first-page":"2342","DOI":"10.1093\/bioinformatics\/btq418","article-title":"Metpa: a web-based metabolomics tool for pathway analysis and visualization","volume":"26","author":"Xia","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020113450416900_btw363-B25","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1186\/s12859-015-0539-7","article-title":"Feature engineering for medline citation categorization with mesh","volume":"16","author":"Yepes","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023020113450416900_btw363-B26","doi-asserted-by":"crossref","first-page":"1470","DOI":"10.1093\/bioinformatics\/btp167","article-title":"Kegggraph: a graph approach to kegg pathway in r and bioconductor","volume":"25","author":"Zhang","year":"2009","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/19\/2988\/49021067\/bioinformatics_32_19_2988.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/19\/2988\/49021067\/bioinformatics_32_19_2988.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T23:50:29Z","timestamp":1675295429000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/19\/2988\/2196584"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,17]]},"references-count":26,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2016,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw363","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subj
ect":[],"published-other":{"date-parts":[[2016,10,1]]},"published":{"date-parts":[[2016,6,17]]}}}