{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T18:29:57Z","timestamp":1764700197810,"version":"3.41.0"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2007,12,1]],"date-time":"2007-12-01T00:00:00Z","timestamp":1196467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2007,12]]},"abstract":"<jats:p>The problem of assessing the significance of data mining results on high-dimensional 0--1 datasets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done by standard statistical tests such as chi-square, or other methods. However, the results of such tests depend only on the specific attributes and not on the dataset as a whole. Moreover, the tests are difficult to apply to sets of patterns or other complex results of data mining algorithms. In this article, we consider a simple randomization technique that deals with this shortcoming. The approach consists of producing random datasets that have the same row and column margins as the given dataset, computing the results of interest on the randomized instances and comparing them to the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, and spectral analysis. To generate random datasets with given margins, we use variations of a Markov chain approach which is based on a simple swap operation. We give theoretical results on the efficiency of different randomization methods, and apply the swap randomization method to several well-known datasets. Our results indicate that for some datasets the structure discovered by the data mining algorithms is expected, given the row and column margins of the datasets, while for other datasets the discovered structure conveys information that is not captured by the margin counts.<\/jats:p>","DOI":"10.1145\/1297332.1297338","type":"journal-article","created":{"date-parts":[[2007,12,7]],"date-time":"2007-12-07T19:19:01Z","timestamp":1197055141000},"page":"14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":148,"title":["Assessing data mining results via swap randomization"],"prefix":"10.1145","volume":"1","author":[{"given":"Aristides","family":"Gionis","sequence":"first","affiliation":[{"name":"Yahoo! Research, Barcelona, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heikki","family":"Mannila","sequence":"additional","affiliation":[{"name":"University of Helsinki and Helsinki University of Technology, Helsinki, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Taneli","family":"Mielik\u00e4inen","sequence":"additional","affiliation":[{"name":"Nokia Research Center, Palo Alto, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Panayiotis","family":"Tsaparas","sequence":"additional","affiliation":[{"name":"Search Labs, Microsoft Research, Mountain View, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2007,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Besag J. 2004. Markov chain Monte Carlo methods for statistical inference. http:\/\/www.ims.nus.edu.sg\/Programs\/mcmc\/files\/besag_tl.pdf.  Besag J. 2004. Markov chain Monte Carlo methods for statistical inference. http:\/\/www.ims.nus.edu.sg\/Programs\/mcmc\/files\/besag_tl.pdf.","DOI":"10.1007\/978-1-4419-9017-4_11"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/76.4.633"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/78.2.301"},{"volume-title":"Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM, 414--423","author":"Bez\u00e1kov\u00e1 I.","key":"e_1_2_1_4_1","unstructured":"Bez\u00e1kov\u00e1 , I. , Bhatnagar , N. , and Vigoda , E . 2006. Sampling binary contingency tables with a greedy start . In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM, 414--423 . Bez\u00e1kov\u00e1, I., Bhatnagar, N., and Vigoda, E. 2006. Sampling binary contingency tables with a greedy start. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM, 414--423."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Bez\u00e1kov\u00e1 I. Sinclair A. Stefankovic D. and Vigoda E. 2006. Negative examples for sequential importance sampling of binary contingency tables. http:\/\/arxiv.org\/abs\/math.ST\/0606650.  Bez\u00e1kov\u00e1 I. Sinclair A. Stefankovic D. and Vigoda E. 2006. Negative examples for sequential importance sampling of binary contingency tables. http:\/\/arxiv.org\/abs\/math.ST\/0606650.","DOI":"10.1007\/11841036_15"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/312129.312241"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/253260.253327"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1055558.1055580"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1198\/016214504000001303"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1080\/00029890.2003.11919964"},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Diaconis P. and Gangolli A. 1995. Rectangular arrays with fixed margins. In Discrete Probability and Algorithms 15--41.  Diaconis P. and Gangolli A. 1995. Rectangular arrays with fixed margins. In Discrete Probability and Algorithms 15--41.","DOI":"10.1007\/978-1-4612-0801-3_3"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/502512.502526"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/780542.780643"},{"key":"e_1_2_1_15_1","unstructured":"Fortelius M. 2006. Neogene of the old world database of fossil mammals (NOW). http:\/\/www.helsinki.fi\/science\/now\/.  Fortelius M. 2006. Neogene of the old world database of fossil mammals (NOW). http:\/\/www.helsinki.fi\/science\/now\/."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3235-1"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/57.1.97"},{"volume-title":"Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), 253--265","author":"Jaroszewicz S.","key":"e_1_2_1_18_1","unstructured":"Jaroszewicz , S. and Simovici , D. A . 2001. A general measure of rule interestingness . In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), 253--265 . Jaroszewicz, S. and Simovici, D. A. 2001. A general measure of rule interestingness. In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), 253--265."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bth163"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/312129.312216"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/502512.502560"},{"volume-title":"Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD)","author":"Megiddo N.","key":"e_1_2_1_22_1","unstructured":"Megiddo , N. and Srikant , R . 1998. Discovering predictive association rules . In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD) , New York, 274--278. Megiddo, N. and Srikant, R. 1998. Discovering predictive association rules. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD), New York, 274--278."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.1699114"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 2nd Workshop on Privacy Preserving Data Mining (PPDM), IEEE Computer Society, 18--23","author":"Mielik\u00e4inen T.","year":"2003","unstructured":"Mielik\u00e4inen , T. 2003 . On inverse frequent set mining . In Proceedings of the 2nd Workshop on Privacy Preserving Data Mining (PPDM), IEEE Computer Society, 18--23 . Mielik\u00e4inen, T. 2003. On inverse frequent set mining. In Proceedings of the 2nd Workshop on Privacy Preserving Data Mining (PPDM), IEEE Computer Society, 18--23."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.298.5594.824"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1137\/S003614450342480"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.4153\/CJM-1957-044-3"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1511\/2000.4.332"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02294482"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775053"},{"key":"e_1_2_1_31_1","unstructured":"Tomkins A. 2006. Private communication.  Tomkins A. 2006. Private communication."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0012-365X(97)00197-0"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150451"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-007-5006-x"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014090"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1297332.1297338","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1297332.1297338","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T15:14:02Z","timestamp":1750259642000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1297332.1297338"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,12]]},"references-count":34,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2007,12]]}},"alternative-id":["10.1145\/1297332.1297338"],"URL":"https:\/\/doi.org\/10.1145\/1297332.1297338","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2007,12]]},"assertion":[{"value":"2007-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}