{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,2,3]],"date-time":"2024-02-03T08:40:37Z","timestamp":1706949637476},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2006,3,7]],"date-time":"2006-03-07T00:00:00Z","timestamp":1141689600000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high \"surprise score\" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand.<\/jats:p><jats:p>The method automates the acquisition of knowledge, thus reducing dependence on the knowledge elicited from human expert, which is usually a rate-limiting step.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1472-6947-6-13","type":"journal-article","created":{"date-parts":[[2006,3,10]],"date-time":"2006-03-10T19:18:26Z","timestamp":1142018306000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method"],"prefix":"10.1186","volume":"6","author":[{"given":"Mir S","family":"Siadaty","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"William A","family":"Knaus","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2006,3,7]]},"reference":[{"key":"102_CR1","volume-title":"Principles of Knowledge Discovery in Databases, CMPUT690","author":"OR Za\u00efane","year":"1999","unstructured":"Za\u00efane OR: Principles of Knowledge Discovery in Databases, CMPUT690. 1999, University of Alberta, Department of Computing Science"},{"key":"102_CR2","volume-title":"Chicago","author":"R Grossman","year":"1999","unstructured":"Grossman R, Kasif S, Moore R, Rocke D, Ullman J: Data Mining Research: Opportunities and Challenges, A Report of three NSF Workshops on Mining Large, Massive, and Distributed Data. Chicago. 1999, [http:\/\/www.rgrossman.com\/epapers\/dmr-v8-4-5.htm]"},{"key":"102_CR3","unstructured":"Azmy A: SuperQuery; Data Mining for Everyone. Online white paper. [http:\/\/www.azmy.com\/wp1.htm]"},{"key":"102_CR4","volume-title":"The Handbook of Data Mining. In Human Factors and Ergonomics Series: Lea","author":"N Ye","year":"2003","unstructured":"Ye N: The Handbook of Data Mining. In Human Factors and Ergonomics Series: Lea. 2003"},{"key":"102_CR5","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1109\/72.977258","volume":"13","author":"S Mitra","year":"2002","unstructured":"Mitra S, Pal SK, Mitra P: Data mining in soft computing framework: A survey. IEEE Transactions on Neural Networks. 2002, 13: 3-14.","journal-title":"IEEE Transactions on Neural Networks"},{"key":"102_CR6","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3283-2","volume-title":"Knowledge Discovery and Measures of Interest, Kluwer","author":"Hilderman and Hamilton","year":"2001","unstructured":"Hilderman and Hamilton: Knowledge Discovery and Measures of Interest, Kluwer. 2001"},{"issue":"n.6","key":"102_CR7","doi-asserted-by":"publisher","first-page":"970","DOI":"10.1109\/69.553165","volume":"8","author":"A Silberschatz","year":"1996","unstructured":"Silberschatz A, Tuzhilin A: What Makes Patterns Interesting in Knowledge Discovery Systems. IEEE Transactions on Knowledge and Data Engineering. 1996, 8 (n.6): 970-974.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"102_CR8","first-page":"145","volume-title":"Proc. of the 5th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining","author":"R Bayardo","year":"1999","unstructured":"Bayardo R, Agrawal R: Mining the most interesting rules. Proc. of the 5th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. 1999, 145-154."},{"key":"102_CR9","unstructured":"Sahar S: Interestingness Via What Is Not Interesting. KDD'99."},{"key":"102_CR10","volume-title":"IEEE Intellgent Systems","author":"B Liu","year":"2000","unstructured":"Liu B, Hsu W, Chen S, Ma Y: Analyzing the Subjective Interestingness of Association Rules. IEEE Intellgent Systems. 2000"},{"key":"102_CR11","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1016\/S0950-7051(99)00019-2","volume":"12","author":"AA Freitas","year":"1999","unstructured":"Freitas AA: On rule interestingness measures. Knowledge-Based Systems journal. 1999, 12: 309-315.","journal-title":"Knowledge-Based Systems journal"},{"key":"102_CR12","unstructured":"Pohle C: Integrating and Updating Domain Knowledge With Data Mining. Leipzig Graduate School of Management, [http:\/\/www.hhl.de\/fileadmin\/LS\/micro\/Download\/Pohle_2003_IntegratingAndUpdatingB.pdf]"},{"key":"102_CR13","doi-asserted-by":"publisher","DOI":"10.1002\/0471249688","volume-title":"Categorical Data Analysis","author":"A Agresti","year":"2002","unstructured":"Agresti A: Categorical Data Analysis. 2002, New York: Wiley-Interscience"},{"key":"102_CR14","doi-asserted-by":"publisher","DOI":"10.1002\/9781118186435","volume-title":"Robust Statistics : The Approach Based on Influence Functions","author":"FR Hampel","year":"2005","unstructured":"Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA: Robust Statistics : The Approach Based on Influence Functions. 2005, New York: Wiley"},{"key":"102_CR15","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-4541-9","volume-title":"An Introduction to the Bootstrap","author":"B Efron","year":"1993","unstructured":"Efron B, Tibshirani R: An Introduction to the Bootstrap. 1993, New York: Chapman and Hall"},{"key":"102_CR16","volume-title":"An Introduction to Categorical Data Analysis","author":"A Agresti","year":"1996","unstructured":"Agresti A: An Introduction to Categorical Data Analysis. 1996, New York: Wiley-Interscience"},{"key":"102_CR17","unstructured":"Clinical data repository. [http:\/\/cdr.virginia.edu\/cdr\/]"},{"key":"102_CR18","unstructured":"Entrez PubMed. [http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi]"},{"key":"102_CR19","unstructured":"Unified medical language system (UMLS). [http:\/\/www.nlm.nih.gov\/research\/umls\/]"},{"key":"102_CR20","unstructured":"CPAN. [http:\/\/www.cpan.org\/]"},{"key":"102_CR21","unstructured":"The R project for statistical computing. [http:\/\/www.r-project.org]"},{"key":"102_CR22","unstructured":"Nazeri Z, Bloedorn E: Exploiting Available Domain Knowledge to Improve Mining Aviation Safety and Network Security Data. The MITRE Corporation, McLean, Virginia 22102 U.S.A"},{"key":"102_CR23","doi-asserted-by":"crossref","unstructured":"Basu S, Mooney R, Pasupuleti K, Ghosh J: Evaluating the novelty of text-mined rules using lexical knowledge. Proc KDD-2001.","DOI":"10.1145\/502512.502544"},{"key":"102_CR24","doi-asserted-by":"publisher","first-page":"524","DOI":"10.1037\/0033-2909.125.5.524","volume":"125","author":"D Klahr","year":"1999","unstructured":"Klahr D, Simon HA: Studies of scientific discovery: complementary approaches and convergent findings. Psychol Bull. 1999, 125: 524-543.","journal-title":"Psychol Bull"},{"key":"102_CR25","volume-title":"Technical Report TR04-07","author":"ML Antonie","year":"2004","unstructured":"Antonie ML, Zaane O: Mining positive and negative association rules: An approach for confined rules. Technical Report TR04-07. 2004, Dept. of Computing Science, University of Alberta"},{"key":"102_CR26","volume-title":"KDD","author":"PN Tan","year":"2000","unstructured":"Tan PN, Kumar V: Interestingness measures for association patterns: A perspective. KDD. 2000"},{"issue":"1","key":"102_CR27","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1023\/A:1009713703947","volume":"2","author":"C Silverstein","year":"1998","unstructured":"Silverstein C, Brin S, Motwani R: Beyond Market Baskets: Generalizing Association Rules to Dependence Rules. Data Mining and Knowledge Discovery. 1998, 2 (1): 39-68.","journal-title":"Data Mining and Knowledge Discovery"},{"key":"102_CR28","unstructured":"Mullins IM, Siadaty MS, Lyman J, Scully K, Garrett CT, Miller WG, Muller R, Robson B, Apte C, Weiss S, Rigoutsos I, Platt D, Cohen S, Knaus WA: Data mining and clinical data repositories: Insights from a 667,000 patient data set. Comp Biol Med."},{"key":"102_CR29","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1145\/319950.320008","volume-title":"CIKM","author":"S Yoon","year":"1999","unstructured":"Yoon S, Henschen L, Park E, Makki S: Using Domain Knowledge in Knowledge Discovery. CIKM. 1999, 243-250."},{"key":"102_CR30","volume-title":"Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining","author":"MJ Zaki","year":"2000","unstructured":"Zaki MJ: Generating non-redundant association rules. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. 2000"},{"issue":"1","key":"102_CR31","first-page":"69","volume":"25","author":"BG Buchanan","year":"2004","unstructured":"Buchanan BG, Livingston GR: Toward Automated Discovery in the Biological Sciences. AI Magazine. 2004, 25 (1): 69-84.","journal-title":"AI Magazine"},{"key":"102_CR32","unstructured":"Creese G: Duo-Mining -Combining Data and Text Mining. DMReview.com. (September 16, 2004), [http:\/\/www.dmreview.com\/article_sub.cfm?articleId=1010449]"},{"key":"102_CR33","volume-title":"SIGMOD","author":"D Tsur","year":"1998","unstructured":"Tsur D, Ullman JD, Abiteboul S, Clifton C, Motwani R, Nestorov S, Rosenthal A: Query Flocks: A Generalization of Association-Rule Mining. SIGMOD. 1998"},{"issue":"6","key":"102_CR34","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1197\/jamia.M1305","volume":"10","author":"V Maojo","year":"2003","unstructured":"Maojo V, Kulikowski CA: Bioinformatics and medical informatics: collaborations on the road to genomic medicine?. J Am Med Inform Assoc. 2003, 10 (6): 515-22.","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"102_CR35","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1002\/asi.10389","volume":"55","author":"P Srinivasan","year":"2004","unstructured":"Srinivasan P: Text Mining: Generating Hypotheses from MEDLINE. JASIST. 2004, 55 (5): 396-413.","journal-title":"JASIST"},{"issue":"4","key":"102_CR36","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1145\/604596.604597","volume":"2","author":"MD Gordon","year":"2002","unstructured":"Gordon MD, Lindsay R, Fan W: Literature-based discovery on the WWW. ACM Transactions on Internet Technology (TOIT). 2002, 2 (4): 262-275.","journal-title":"ACM Transactions on Internet Technology (TOIT)"}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1472-6947-6-13.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1472-6947-6-13\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1472-6947-6-13","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1472-6947-6-13.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,3]],"date-time":"2024-02-03T08:06:50Z","timestamp":1706947610000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/1472-6947-6-13"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,3,7]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["102"],"URL":"https:\/\/doi.org\/10.1186\/1472-6947-6-13","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,3,7]]},"assertion":[{"value":"27 July 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 March 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 March 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"13"}}