{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:20:06Z","timestamp":1772166006206,"version":"3.50.1"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,1,9]],"date-time":"2021-01-09T00:00:00Z","timestamp":1610150400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2021,1,9]],"date-time":"2021-01-09T00:00:00Z","timestamp":1610150400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002790","name":"Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"crossref","award":["xxxxx 50503-10275 500"],"award-info":[{"award-number":["xxxxx 50503-10275 500"]}],"id":[{"id":"10.13039\/501100002790","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Statistical data analysis, especially the advanced machine learning (ML) methods, have attracted considerable interest in clinical practices. We are looking for interpretability of the diagnostic\/prognostic results that will bring confidence to doctors, patients and their relatives in therapeutics and clinical practice. When datasets are imbalanced in diagnostic categories, we notice that the ordinary ML methods might produce results overwhelmed by the majority classes diminishing prediction accuracy. 
Hence, methods are needed that can produce explicit, transparent and interpretable results in decision-making, without sacrificing accuracy, even for data with imbalanced groups.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>In order to interpret the clinical patterns and conduct diagnostic prediction of patients with high accuracy, we develop a novel method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which is able to discover patterns (correlated traits\/indicants) and use them to classify clinical data even if the class distribution is imbalanced. In the most general setting, a relational dataset is a large table such that each column represents an attribute (trait\/indicant), and each row contains a set of attribute values (AVs) of an entity (patient). Compared to the existing pattern discovery approaches, cPDD can discover a small succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of the patients even when groups are small and rare.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Experiments on synthetic and thoracic clinical datasets showed that cPDD can 1) discover a smaller set of succinct significant patterns compared to other existing pattern discovery methods; 2) allow the users to interpret succinct sets of patterns coming from uncorrelated sources, even when the groups are rare\/small; and 3) obtain better performance in prediction compared to other interpretable classification approaches.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>In conclusion, cPDD discovers fewer patterns with greater comprehensive coverage to improve the 
interpretability of patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data, displays them precisely and succinctly with statistical support for interpretation and prediction, a capability which the traditional ML methods lack. The success of cPDD as a novel interpretable method in solving the imbalanced class problem shows its great potential to clinical data analysis for years to come.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12911-020-01356-y","type":"journal-article","created":{"date-parts":[[2021,1,9]],"date-time":"2021-01-09T04:10:20Z","timestamp":1610165420000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Explanation and prediction of clinical data with imbalanced class distribution based on pattern discovery and disentanglement"],"prefix":"10.1186","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6651-0079","authenticated-orcid":false,"given":"Pei-Yuan","family":"Zhou","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew K. C.","family":"Wong","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,1,9]]},"reference":[{"issue":"1","key":"1356_CR1","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1186\/s12911-017-0443-3","volume":"17","author":"T Chan","year":"2017","unstructured":"Chan T, Li Y, Chiau C, Zhu J, Jiang J, Huo Y. Imbalanced target prediction with pattern discovery on clinical data repositories. BMC Med Inform Decis Mak. 2017;17(1):47.","journal-title":"BMC Med Inform Decis Mak"},{"issue":"1","key":"1356_CR2","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1038\/s41591-018-0300-7","volume":"25","author":"EJ Topol","year":"2019","unstructured":"Topol EJ. 
High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44\u201356.","journal-title":"Nat Med"},{"key":"1356_CR3","doi-asserted-by":"crossref","unstructured":"Aggarwal C, Sathe S. Bias reduction in outlier ensembles: the guessing game. In:  Outlier ensembles: Springer; 2017.","DOI":"10.1007\/978-3-319-54765-7"},{"issue":"2","key":"1356_CR4","doi-asserted-by":"publisher","first-page":"216","DOI":"10.1093\/bib\/bbt074","volume":"16","author":"S Naulaerts","year":"2015","unstructured":"Naulaerts S, Meysman P, Bittremieux W, Vu TN, Vanden Berghe W, Goethals B, Laukens K. A primer to frequent itemset mining for bioinformatics. Brief Bioinform. 2015;16(2):216\u201331.","journal-title":"Brief Bioinform."},{"key":"1356_CR5","doi-asserted-by":"publisher","unstructured":"Aggarwal C, Bhuiyan M, Hasan M (2014) Frequent pattern mining algorithms: a survey. In: Aggarwal C, Han J, editors. Frequent pattern mining.\nCham: Springer. https:\/\/doi.org\/10.1007\/978-3-319-07821-2_2.","DOI":"10.1007\/978-3-319-07821-2_2"},{"issue":"6","key":"1356_CR6","doi-asserted-by":"publisher","first-page":"877","DOI":"10.1109\/69.649314","volume":"9","author":"AK Wong","year":"1997","unstructured":"Wong AK, Wang Y. High-order pattern discovery from discrete-valued data. IEEE Trans Knowl Syst. 1997;9(6):877\u201393.","journal-title":"IEEE Trans Knowl Syst"},{"issue":"1","key":"1356_CR7","doi-asserted-by":"publisher","first-page":"10","DOI":"10.3390\/proteomes6010010","volume":"6","author":"P-Y Zhou","year":"2018","unstructured":"Zhou P-Y, Lee AE, Sze-To A, Wong AK. Revealing subtle functional subgroups in class A scavenger receptors by pattern discovery and disentanglement of aligned pattern clusters. Proteomes. 
2018;6(1):10.","journal-title":"Proteomes"},{"issue":"1","key":"1356_CR8","doi-asserted-by":"publisher","first-page":"2045","DOI":"10.1038\/s41598-018-20473-3","volume":"8","author":"AK Wong","year":"2018","unstructured":"Wong AK, Sze-To AHY, Johanning GL. Pattern to knowledge: deep knowledge-directed machine learning for residue-residue interaction prediction. Nat Sci Rep. 2018;8(1):2045\u2013322.","journal-title":"Nat Sci Rep"},{"issue":"5","key":"1356_CR9","first-page":"103","volume":"11","author":"P-Y Zhou","year":"2018","unstructured":"Zhou P-Y, Sze-To A, Wong AK. Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics. BMC Med Genet. 2018;11(5):103.","journal-title":"BMC Med Genet"},{"key":"1356_CR10","volume-title":"2017 IEEE international conference on bioinformatics and biomedicine (BIBM)","author":"P-Y Zhou","year":"2017","unstructured":"Zhou P-Y, Wong AK, Sze-To A. Discovery and disentanglement of protein aligned pattern clusters to reveal subtle functional subgroups. In:  2017 IEEE international conference on bioinformatics and biomedicine (BIBM). Kansas City: IEEE; 2017."},{"key":"1356_CR11","unstructured":"Samek W, Wiegand T, M\u00fcller K. Explainable artificial intelligence: understanding, visualizing and interpreting deep learning models; 2017. arXiv preprint arXiv:1708.08296."},{"key":"1356_CR12","doi-asserted-by":"crossref","unstructured":"Voosen P. How AI detectives are cracking open the black box of deep learning. Science;2017. https:\/\/www.sciencemag.org\/news\/2017\/07\/howai-detectives-are-cracking-open-black-box-deep-learning.","DOI":"10.1126\/science.aan7059"},{"issue":"7","key":"1356_CR13","doi-asserted-by":"publisher","first-page":"977","DOI":"10.1109\/TKDE.2008.38","volume":"20","author":"AK Wong","year":"2008","unstructured":"Wong AK, Li GC. Simultaneous pattern and data clustering for pattern cluster analysis. IEEE Trans Knowl Data Eng. 
2008;20(7):977\u201323.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1356_CR14","doi-asserted-by":"publisher","first-page":"7847","DOI":"10.1109\/ACCESS.2016.2624418","volume":"4","author":"P-Y Zhou","year":"2016","unstructured":"Zhou P-Y, Li GC, Wong AK. An effective pattern pruning and summarization method retaining high quality patterns with high area coverage in relational datasets. IEEE Access. 2016;4:7847\u201358.","journal-title":"IEEE Access"},{"key":"1356_CR15","volume-title":"Proc. 13th Int. Conf. Data Min. DMIN\u201917","author":"AK Wong","year":"2017","unstructured":"Wong AK, Zhou P, Sze-To A. Discovering deep knowledge from relational data by attribute-value association. In:  Proc. 13th Int. Conf. Data Min. DMIN\u201917; 2017."},{"key":"1356_CR16","doi-asserted-by":"publisher","unstructured":"Cheng J, Ke Y, Ng W. \u03b4-Tolerance closed frequent itemsets. In:  Sixth international conference on data mining (ICDM'06), Hong Kong;\n2006, p. 139\u201348. https:\/\/doi.org\/10.1109\/ICDM.2006.1. https:\/\/ieeexplore.ieee.org\/abstract\/document\/4053042?casa_token=wN7NYMxevd8AAAAA:0w6-FStj5rjV-QHj7ncpXGvBj4wylQ-hkDFjL_vKq_YywE1KFlCeGdEsOXj0u_uXbASEL2s.","DOI":"10.1109\/ICDM.2006.1"},{"key":"1356_CR17","unstructured":"Li J, Liu G, Wong L. Mining statistically important equivalence classes and delta-discriminative emerging patterns. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining; 2007, p. 430\u20139. https:\/\/dl.acm.org\/doi\/abs\/10.1145\/1281192.1281240?casa_token=gzcpJh2miJEAAAAA%3Abh-XHMSL35m8CR8CThhu8qR0MH5A5lr2xfGAGR2FGFXSKtNgBogO0qAB6T7ozLEw4-Y5kL1goZs."},{"issue":"1","key":"1356_CR18","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1109\/TSMCC.2003.809869","volume":"33","author":"AK Wong","year":"2003","unstructured":"Wong AK, Wang Y. Pattern discovery: a data driven approach to decision support. IEEE Trans Syst Man Cybern Part C Appl Rev. 
2003;33(1):114\u201324.","journal-title":"IEEE Trans Syst Man Cybern Part C Appl Rev"},{"issue":"03","key":"1356_CR19","doi-asserted-by":"publisher","first-page":"1450027","DOI":"10.1142\/S0219649214500270","volume":"13","author":"N Abdelhamid","year":"2014","unstructured":"Abdelhamid N, Thabtah F. Associative classification approaches: review and comparison. J Inf Knowl Manag. 2014;13(03):1450027.","journal-title":"J Inf Knowl Manag"},{"key":"1356_CR20","unstructured":"U. M. L. Repository. Thoracic surgery data data set, 13 November 2013. Available: http:\/\/archive.ics.uci.edu\/ml\/datasets\/Thoracic+Surgery+Data."},{"issue":"2","key":"1356_CR21","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1145\/170036.170072","volume":"22","author":"R Agrawal","year":"1993","unstructured":"Agrawal R, Tomasz I, Arun S. Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 1993;22(2):207\u201316.","journal-title":"ACM SIGMOD Rec"},{"issue":"10","key":"1356_CR22","doi-asserted-by":"publisher","first-page":"719","DOI":"10.1038\/s41551-018-0305-z","volume":"2","author":"K-H Yu","year":"2018","unstructured":"Yu K-H, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng. 2018;2(10):719\u201331.","journal-title":"Nat Biomed Eng"},{"key":"1356_CR23","doi-asserted-by":"publisher","first-page":"433","DOI":"10.1038\/s41591-018-0335-9","volume":"25","author":"HY Liang","year":"2019","unstructured":"Liang HY, Tsui B, Xia H, et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med. 2019;25:433\u20138.","journal-title":"Nat Med"},{"key":"1356_CR24","doi-asserted-by":"publisher","first-page":"116480","DOI":"10.1109\/ACCESS.2019.2932037","volume":"7","author":"L Ali","year":"2019","unstructured":"Ali L, Zhu C, Golilarz NA, Javeed A, Zhou M, Liu Y. 
Reliable Parkinson\u2019s disease detection by analyzing handwritten drawings: construction of an unbiased cascaded learning system based on feature selection and adaptive boosting model. IEEE Access. 2019;7:116480\u20139.","journal-title":"IEEE Access"},{"key":"1356_CR25","unstructured":"Powers DM. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J Mach Learning Technol. 2011;2(1):37\u201363. https:\/\/www.researchgate.net\/publication\/276412348_Evaluation_From_precision_recall_and_Fmeasure_to_ROC_informedness_markedness_correlation."},{"issue":"1","key":"1356_CR26","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1186\/s12864-019-6413-7","volume":"21","author":"D Chicco","year":"2020","unstructured":"Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020;21(1):6.","journal-title":"BMC Genomics"},{"key":"1356_CR27","volume-title":"2010 20th international conference on pattern recognition","author":"KH Brodersen","year":"2010","unstructured":"Brodersen KH, Ong CS, Stephan KE, Buhmann JM. The balanced accuracy and its posterior distribution. In:  2010 20th international conference on pattern recognition; 2010."},{"key":"1356_CR28","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825\u201330.","journal-title":"J Mach Learn Res"},{"key":"1356_CR29","unstructured":"Branco P, Torgo L, Ribeiro R. A survey of predictive modelling under imbalanced distributions; 2015. arXiv preprint arXiv:1505.01658."},{"key":"1356_CR30","volume-title":"Kdd","author":"CX Ling","year":"1998","unstructured":"Ling CX, Li C. Data mining for direct marketing: problems and solutions. 
In:  Kdd; 1998."},{"key":"1356_CR31","unstructured":"He H, Ma Y.  Imbalanced learning: foundations, algorithms, and applications. John Wiley & Sons; 2013. https:\/\/books.google.ca\/books?hl=zh-TW&lr=&id=CVHx-Gp9jzUC&oi=fnd&pg=PT9&dq=Imbalanced+learning:+foundations,+algorithms,+and+applications&ots=2iKpHkIq5m&sig=Zr0x96yUy_-HOJrEmqEL25k3fXk#v=onepage&q=Imbalanced%20learning%3A%20foundations%2C%20algorithms%2C%20and%20applications&f=false."},{"key":"1356_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/978-3-319-98074-4","volume-title":"Learning from imbalanced data sets","author":"A Fern\u00e1ndez","year":"2018","unstructured":"Fern\u00e1ndez A, Garc\u00eda S, Galar M, Prati RC, Krawczyk B, Herrera F. Learning from imbalanced data sets. Berlin: Springer; 2018. p. 1\u2013377."},{"issue":"1","key":"1356_CR33","first-page":"559","volume":"18","author":"G Lema\u00eetre","year":"2017","unstructured":"Lema\u00eetre G, Nogueira F, Aridas CK. Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res. 2017;18(1):559\u201363.","journal-title":"J Mach Learn Res"},{"issue":"3","key":"1356_CR34","doi-asserted-by":"publisher","first-page":"563","DOI":"10.1007\/s10844-015-0368-1","volume":"46","author":"K Napierala","year":"2016","unstructured":"Napierala K, Stefanowski J. Types of minority class examples and their influence on learning classifiers from imbalanced data. J Intell Inf Syst. 2016;46(3):563\u201397.","journal-title":"J Intell Inf Syst"},{"issue":"12","key":"1356_CR35","doi-asserted-by":"publisher","first-page":"2969","DOI":"10.1109\/TKDE.2014.2310219","volume":"26","author":"DE Zhuang","year":"2014","unstructured":"Zhuang DE, Li GC, Wong AK. Discovery of temporal associations in multivariate time series. IEEE Trans Knowl Data Eng. 2014;26(12):2969\u201382.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1356_CR36","unstructured":"Wang S. 
Mining textural features from financial reports for corporate bankruptcy risk assessment. M.Sc. Thesis, Systems Design Engineering, University of Waterloo, Waterloo; 2017."}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-020-01356-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12911-020-01356-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-020-01356-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,12]],"date-time":"2021-01-12T15:55:01Z","timestamp":1610466901000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-020-01356-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,9]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["1356"],"URL":"https:\/\/doi.org\/10.1186\/s12911-020-01356-y","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-28409\/v4","asserted-by":"object"},{"id-type":"doi","id":"10.21203\/rs.3.rs-28409\/v1","asserted-by":"object"},{"id-type":"doi","id":"10.21203\/rs.3.rs-28409\/v2","asserted-by":"object"},{"id-type":"doi","id":"10.21203\/rs.3.rs-28409\/v3","asserted-by":"object"}]},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,9]]},"assertion":[{"value":"8 May 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 November 
2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 January 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not required as the datasets were published retrospective datasets, where the corresponding approval and consent were handled specifically","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"16"}}