{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,8]],"date-time":"2025-11-08T22:35:32Z","timestamp":1762641332714},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Disease state prediction from biomarker profiling studies is an important problem because more accurate classification models will potentially lead to the discovery of better, more discriminative markers. Data mining methods are routinely applied to such analyses of biomedical datasets generated from high-throughput \u2018omic\u2019 technologies applied to clinical samples from tissues or bodily fluids. Past work has demonstrated that rule models can be successfully applied to this problem, since they can produce understandable models that facilitate review of discriminative biomarkers by biomedical scientists. While many rule-based methods produce rules that make predictions under uncertainty, they typically do not quantify the uncertainty in the validity of the rule itself. This article describes an approach that uses a Bayesian score to evaluate rule models.<\/jats:p>\n               <jats:p>Results: We have combined the expressiveness of rules with the mathematical rigor of Bayesian networks (BNs) to develop and evaluate a Bayesian rule learning (BRL) system. This system utilizes a novel variant of the K2 algorithm for building BNs from the training data to provide probabilistic scores for IF-antecedent-THEN-consequent rules using heuristic best-first search. We then apply rule-based inference to evaluate the learned models during 10-fold cross-validation performed two times. 
The BRL system is evaluated on 24 published \u2018omic\u2019 datasets, and on average it performs on par or better than other readily available rule learning methods. Moreover, BRL produces models that contain on average 70% fewer variables, which means that the biomarker panels for disease prediction contain fewer markers for further verification and validation by bench scientists.<\/jats:p>\n               <jats:p>Contact: \u00a0vanathi@pitt.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq005","type":"journal-article","created":{"date-parts":[[2010,1,16]],"date-time":"2010-01-16T01:14:00Z","timestamp":1263604440000},"page":"668-675","source":"Crossref","is-referenced-by-count":32,"title":["Bayesian rule learning for biomedical data mining"],"prefix":"10.1093","volume":"26","author":[{"given":"Vanathi","family":"Gopalakrishnan","sequence":"first","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh, 200 Meyran Avenue Suite M-183, Pittsburgh, PA 15260, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan L.","family":"Lustgarten","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh, 200 Meyran Avenue Suite M-183, Pittsburgh, PA 15260, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shyam","family":"Visweswaran","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh, 200 Meyran Avenue Suite M-183, Pittsburgh, PA 15260, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gregory F.","family":"Cooper","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh, 200 Meyran Avenue Suite M-183, Pittsburgh, PA 15260, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2010,1,14]]},"reference":[{"key":"2023012511000491400_B1","doi-asserted-by":"crossref","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","article-title":"Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays","volume":"96","author":"Alon","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511000491400_B2","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/ng765","article-title":"MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia","volume":"30","author":"Armstrong","year":"2002","journal-title":"Nat. Genet."},{"key":"2023012511000491400_B3","first-page":"119","article-title":"Increasing the efficiency of data mining algorithms with breadth-first marker propagation","volume-title":"Proceedings of the Third International Conference on Knowledge Discovery and Data Mining.","author":"Aronis","year":"1997"},{"key":"2023012511000491400_B4","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/nm733","article-title":"Gene-expression profiles predict survival of patients with lung adenocarcinoma","volume":"8","author":"Beer","year":"2002","journal-title":"Nat. Med."},{"key":"2023012511000491400_B5","doi-asserted-by":"crossref","first-page":"13790","DOI":"10.1073\/pnas.191502998","article-title":"Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses","volume":"98","author":"Bhattacharjee","year":"2001","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012511000491400_B6","first-page":"80","article-title":"A Bayesian approach to learning Bayesian networks with local structure","volume-title":"Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97).","author":"Chickering","year":"1997"},{"key":"2023012511000491400_B7","first-page":"115","article-title":"Fast effective rule induction","volume-title":"Proceedings of the Twelfth International Conference on Machine Learning.","author":"Cohen","year":"1995"},{"key":"2023012511000491400_B8","first-page":"124","article-title":"Learning to classify English text with ILP methods","volume-title":"Advances in Inductive Logic Programming","author":"Cohen","year":"1996"},{"key":"2023012511000491400_B9","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1007\/BF00994110","article-title":"A Bayesian method for the induction of probabilistic networks from data","volume":"9","author":"Cooper","year":"1992","journal-title":"Mach. Learn."},{"key":"2023012511000491400_B10","first-page":"1022","article-title":"Multi-interval discretization of continuous-valued attributes for classification learning","volume-title":"Proceedings of the Thirteenth International Joint Conference on AI (IJCAI-93).","author":"Fayyad","year":"1993"},{"key":"2023012511000491400_B11","first-page":"256","article-title":"Using prior knowledge and rule induction methods to discover molecular markers of prognosis in lung cancer","volume-title":"AMIA Annual Symposium Proceedings","author":"Frey","year":"2005"},{"key":"2023012511000491400_B12","first-page":"252","article-title":"Learning Bayesian networks with Local Structure","volume-title":"Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence (UAI-96).","author":"Friedman","year":"1996"},{"key":"2023012511000491400_B13","first-page":"70","article-title":"Incremental reduced error pruning","volume-title":"Proceedings of the 11th International Conference on Machine 
Learning.","author":"Furnkranz","year":"1994"},{"key":"2023012511000491400_B14","first-page":"41","article-title":"Text categorization with many redundant features: using aggressive feature selection to make SVMs competitive with C4.5","volume-title":"Proceedings of the 21st International Conference on Machine Learning","author":"Gabrilovich","year":"2004"},{"key":"2023012511000491400_B15","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2023012511000491400_B16","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1007\/11691730_10","article-title":"Rule learning for disease-specific biomarker discovery from clinical proteomic mass spectra","volume":"3916","author":"Gopalakrishnan","year":"2006","journal-title":"Springer Lect. Notes Comput. Sci."},{"key":"2023012511000491400_B17","article-title":"Proteomic data mining challenges in identification of disease-specific biomarkers from variable resolution mass spectra","volume-title":"SIAM Bioinformatics Workshop.","author":"Gopalakrishnan","year":"2004"},{"key":"2023012511000491400_B18","volume-title":"Data Mining: Concepts and Techniques","author":"Han","year":"2006","edition":"2"},{"key":"2023012511000491400_B19","first-page":"9","article-title":"Probabilistic interpretations for MYCIN's Certainty Factor","volume-title":"Proceedings of the Workshop on Uncertainty and Probability in Artificial Intelligence","author":"Heckerman","year":"1985"},{"key":"2023012511000491400_B20","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1056\/NEJM200102223440801","article-title":"Gene-expression profiles in hereditary breast cancer","volume":"344","author":"Hedenfalk","year":"2001","journal-title":"N. Engl. J. 
Med."},{"key":"2023012511000491400_B21","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1016\/S0140-6736(03)12775-4","article-title":"Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection","volume":"361","author":"Iizuka","year":"2003","journal-title":"Lancet"},{"key":"2023012511000491400_B22","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1038\/89044","article-title":"Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks","volume":"7","author":"Khan","year":"2001","journal-title":"Nat. Med."},{"key":"2023012511000491400_B23","article-title":"A Bayesian rule generation framework for \u2018Omic\u2019 biomedical data analysis","volume-title":"PhD Dissertation","author":"Lustgarten","year":"2009"},{"key":"2023012511000491400_B24","first-page":"527","article-title":"An evaluation of discretization methods for learning rules from biomedical datasets","volume-title":"Proceedings of the 2008 International Conference on Bioinformatics and Computational Biology","author":"Lustgarten","year":"2008"},{"key":"2023012511000491400_B25","first-page":"474","article-title":"DrC4.5: improving C4.5 by means of prior knowledge","volume-title":"Proceedings of the 2005 ACM Symposium on Applied Computing.","author":"Miriam","year":"2005"},{"key":"2023012511000491400_B26","volume-title":"Learning Bayesian Networks.","author":"Neapolitan","year":"2004"},{"key":"2023012511000491400_B27","first-page":"1602","article-title":"Gene expression-based classification of malignant gliomas correlates better with survival than histological classification","volume":"63","author":"Nutt","year":"2003","journal-title":"Cancer Res."},{"key":"2023012511000491400_B28","volume-title":"Probabilistic Reasoning in Intelligent Systems: Networks of Plausible 
Inference","author":"Pearl","year":"1988"},{"key":"2023012511000491400_B29","doi-asserted-by":"crossref","first-page":"1576","DOI":"10.1093\/jnci\/94.20.1576","article-title":"Serum proteomic patterns for detection of prostate cancer","volume":"94","author":"Petricoin","year":"2002","journal-title":"J. Natl Cancer Inst."},{"key":"2023012511000491400_B30","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/415436a","article-title":"Prediction of central nervous system embryonal tumour outcome based on gene expression","volume":"415","author":"Pomeroy","year":"2002","journal-title":"Nature"},{"key":"2023012511000491400_B31","doi-asserted-by":"crossref","first-page":"1814","DOI":"10.1002\/cncr.20203","article-title":"Pharmacoproteomic analysis of pre-and post-chemotherapy plasma samples from patients receiving neoadjuvant or adjuvant chemotherapy for breast cancer","volume":"100","author":"Pusztai","year":"2004","journal-title":"Cancer"},{"key":"2023012511000491400_B32","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1007\/BF00116251","article-title":"Induction of decision trees","volume":"1","author":"Quinlan","year":"1986","journal-title":"Mach. Learn."},{"key":"2023012511000491400_B33","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1023\/A:1022645310020","article-title":"C4.5: programs for machine learning","volume":"16","author":"Quinlan","year":"1994","journal-title":"Mach. Learn."},{"key":"2023012511000491400_B34","doi-asserted-by":"crossref","first-page":"15149","DOI":"10.1073\/pnas.211566398","article-title":"Multiclass cancer diagnosis using tumor gene expression signatures","volume":"98","author":"Ramaswamy","year":"2001","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012511000491400_B35","doi-asserted-by":"crossref","first-page":"1461","DOI":"10.1111\/j.1471-4159.2005.03478.x","article-title":"Proteomic profiling of cerebrospinal fluid identifies biomarkers for amyotrophic lateral sclerosis","volume":"95","author":"Ranganathan","year":"2005","journal-title":"J. Neurochem."},{"key":"2023012511000491400_B36","doi-asserted-by":"crossref","first-page":"1937","DOI":"10.1056\/NEJMoa012914","article-title":"The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma","volume":"346","author":"Rosenwald","year":"2002","journal-title":"N. Engl. J. Med."},{"key":"2023012511000491400_B37","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nm0102-68","article-title":"Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning","volume":"8","author":"Shipp","year":"2002","journal-title":"Nat. Med."},{"key":"2023012511000491400_B38","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1016\/0010-4809(75)90009-9","article-title":"Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system","volume":"8","author":"Shortliffe","year":"1975","journal-title":"Comput. Biomed. 
Res."},{"key":"2023012511000491400_B39","doi-asserted-by":"crossref","DOI":"10.1137\/1.9781611972719.16","article-title":"Information theoretic feature crediting in multiclass support vector machines","volume-title":"Proceedings of the 1st SIAM International Conference on Data Mining.","author":"Sindhwani","year":"2001"},{"key":"2023012511000491400_B40","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/S1535-6108(02)00030-2","article-title":"Gene expression correlates of clinical prostate cancer behavior","volume":"1","author":"Singh","year":"2002","journal-title":"Cancer Cell"},{"key":"2023012511000491400_B41","doi-asserted-by":"crossref","first-page":"10787","DOI":"10.1073\/pnas.191368598","article-title":"Chemosensitivity prediction by transcriptional profiling","volume":"98","author":"Staunton","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511000491400_B42","first-page":"7388","article-title":"Molecular classification of human carcinomas by use of gene expression signatures","volume":"61","author":"Su","year":"2001","journal-title":"Cancer Res."},{"key":"2023012511000491400_B43","first-page":"530","article-title":"Gene expression profiling predicts clinical outcome of breast cancer","volume":"415","author":"van't Veer","year":"2002","journal-title":"Nature"},{"key":"2023012511000491400_B44","first-page":"759","article-title":"Patient-Specific Models for Predicting the Outcomes of Patients with Community Acquired Pneumonia","volume-title":"Proceedings of AMIA 2005 Annual Symposium.","author":"Visweswaran","year":"2005"},{"key":"2023012511000491400_B45","doi-asserted-by":"crossref","first-page":"1176","DOI":"10.1073\/pnas.98.3.1176","article-title":"Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer","volume":"98","author":"Welsh","year":"2001","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012511000491400_B46","volume-title":"Data Mining: Practical Machine Learning Tools and Techniques.","author":"Witten","year":"2005"},{"key":"2023012511000491400_B47","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1038\/sj.pcan.4500384","article-title":"Proteinchip(R) surface enhanced laser desorption\/ionization (SELDI) mass spectrometry: a novel protein biochip technology for detection of prostate cancer biomarkers in complex protein mixtures","volume":"2","author":"Wright","year":"1999","journal-title":"Prostate Cancer Prostatic Dis."},{"key":"2023012511000491400_B48","first-page":"868","article-title":"Combination data mining methods with new medical data to predicting outcome of coronary heart disease","volume-title":"Proceedings of the International Conference on Convergence Information Technology.","author":"Xing","year":"2007"},{"key":"2023012511000491400_B49","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/S1535-6108(02)00032-6","article-title":"Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling","volume":"1","author":"Yeoh","year":"2002","journal-title":"Cancer 
Cell"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/5\/668\/48860437\/bioinformatics_26_5_668.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/5\/668\/48860437\/bioinformatics_26_5_668.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T11:01:23Z","timestamp":1674644483000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/5\/668\/212302"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,14]]},"references-count":49,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2010,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq005","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,3,1]]},"published":{"date-parts":[[2010,1,14]]}}}