{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:29:35Z","timestamp":1750307375458,"version":"3.41.0"},"reference-count":21,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2010,5,27]],"date-time":"2010-05-27T00:00:00Z","timestamp":1274918400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGKDD Explor. Newsl."],"published-print":{"date-parts":[[2010,5,27]]},"abstract":"<jats:p>We organized the KDD cup 2009 around a marketing problem with the goal of identifying data mining techniques capable of rapidly building predictive models and scoring new entries on a large database. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offered to participants an opportunity to work on a large marketing database from the French Telecom company Orange. The tasks were to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades\/addons proposed to them to make the sale more profitable (upselling). The challenge, which lasted from March 10 to May 11, 2009, attracted over 450 participants from 46 countries. We attribute its popularity to several factors: (1) A generic problem relevant to the Industry (a classification problem), but presenting a number of scientific and technical challenges, including many missing values (about 60%), a large number of features (15000) and a large number of training examples (50000), unbalanced class proportions (fewer than 10% of the examples of the positive class), noisy data, and the presence of categorical variables with many different values. (2) Prizes (Orange offers 10000 Euros in prizes). 
(3) A well-designed protocol and web site (we benefitted from past experience). (4) An effective advertising campaign using mailings and a teleconference to answer potential participants' questions. The results of the challenge were discussed at the KDD conference (June 28, 2009). The principal conclusions are that ensemble methods are very effective and that ensembles of decision trees offer off-the-shelf solutions to problems with large numbers of samples and attributes, mixed types of variables, and lots of missing values. The data and the platform of the challenge remain available for research and educational purposes at http:\/\/www.kddcup-orange.com\/.<\/jats:p>","DOI":"10.1145\/1809400.1809414","type":"journal-article","created":{"date-parts":[[2010,6,1]],"date-time":"2010-06-01T12:21:35Z","timestamp":1275394895000},"page":"68-76","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Design and analysis of the KDD cup 2009"],"prefix":"10.1145","volume":"11","author":[{"given":"Isabelle","family":"Guyon","sequence":"first","affiliation":[{"name":"Clopinet, Berkeley, California"}]},{"given":"Vincent","family":"Lemaire","sequence":"additional","affiliation":[{"name":"Orange Labs, Lannion, France"}]},{"given":"Marc","family":"Boull\u00e9","sequence":"additional","affiliation":[{"name":"Orange Labs, Lannion, France"}]},{"given":"Gideon","family":"Dror","sequence":"additional","affiliation":[{"name":"Academic College of Tel Aviv-Yaffo, Tel Aviv, Israel"}]},{"given":"David","family":"Vogel","sequence":"additional","affiliation":[{"name":"Data Mining Solutions, Orlando, Florida"}]}],"member":"320","published-online":{"date-parts":[[2010,5,27]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"267","volume-title":"2nd International Symposium on Information Theory","author":"Akaike H.","year":"1973","unstructured":"H. Akaike . Information theory and an extension of the maximum likelihood principle. In B. 
Petrov and F. Csaki, editors , 2nd International Symposium on Information Theory , pages 267 -- 281 . Akademia Kiado, Budapest , 1973 . H. Akaike. Information theory and an extension of the maximum likelihood principle. In B. Petrov and F. Csaki, editors, 2nd International Symposium on Information Theory, pages 267--281. Akademia Kiado, Budapest, 1973."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/130385.130401"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/1314498.1314554"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018054314350"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2006.76"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015432"},{"key":"e_1_2_1_8_1","unstructured":"Clopinet. Challenges in machine learning.  Clopinet. Challenges in machine learning."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1248547.1248548"},{"key":"e_1_2_1_10_1","first-page":"2009","volume-title":"JMLR W&CP","volume":"7","author":"F\u00e9raud R.","year":"2009","unstructured":"R. F\u00e9raud , M. Boull\u00e9 , F. Cl\u00e9rot , F. Fessant , and V. Lemaire . The orange customer analysis platform . In JMLR W&CP , volume 7 , KDD cup 2009 , Paris , 2009 . R. F\u00e9raud, M. Boull\u00e9, F. Cl\u00e9rot, F. Fessant, and V. Lemaire. The orange customer analysis platform. In JMLR W&CP, volume 7, KDD cup 2009, Paris, 2009."},{"key":"e_1_2_1_11_1","first-page":"148","volume-title":"ICML","author":"Freund Y.","year":"1996","unstructured":"Y. Freund and R. E. Schapire . Experiments with a new boosting algorithm . In ICML , pages 148 -- 156 , 1996 . Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. 
In ICML, pages 148--156, 1996."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1013203451"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1937.10503522"},{"key":"e_1_2_1_14_1","volume-title":"IEEE\/INNS conference IJCNN 2006","author":"Guyon I.","year":"2006","unstructured":"I. Guyon , A. Saffari , G. Dror , and J. Buhmann . Performance prediction challenge . In IEEE\/INNS conference IJCNN 2006 , Vancouver, Canada , July 16-21 2006 . I. Guyon, A. Saffari, G. Dror, and J. Buhmann. Performance prediction challenge. In IEEE\/INNS conference IJCNN 2006, Vancouver, Canada, July 16-21 2006."},{"key":"e_1_2_1_15_1","volume-title":"Data Mining, Inference and Prediction","author":"Hastie T.","year":"2000","unstructured":"T. Hastie , R. Tibshirani , and J. Friedman . The Elements of Statistical Learning , Data Mining, Inference and Prediction . Springer Verlag , 2000 . T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Data Mining, Inference and Prediction. Springer Verlag, 2000."},{"key":"e_1_2_1_16_1","first-page":"2009","volume-title":"JMLR W&CP","volume":"7","author":"Research IBM","year":"2009","unstructured":"IBM Research . Winning the KDD cup orange challenge with ensemble selection . In JMLR W&CP , volume 7 , KDD cup 2009 , Paris , 2009 . IBM Research. Winning the KDD cup orange challenge with ensemble selection. In JMLR W&CP, volume 7, KDD cup 2009, Paris, 2009."},{"key":"e_1_2_1_17_1","first-page":"2009","volume-title":"JMLR W&CP","volume":"7","author":"Lo H.-Y.","year":"2009","unstructured":"H.-Y. Lo An ensemble of three classifiers for KDD cup 2009: Expanded linear model, heterogeneous boosting, and selective na\u00efve Bayes . In JMLR W&CP , volume 7 , KDD cup 2009 , Paris , 2009 . H.-Y. Lo et al. An ensemble of three classifiers for KDD cup 2009: Expanded linear model, heterogeneous boosting, and selective na\u00efve Bayes. 
In JMLR W&CP, volume 7, KDD cup 2009, Paris, 2009."},{"key":"e_1_2_1_18_1","first-page":"2009","volume-title":"JMLR W&CP","volume":"7","author":"Miller H.","year":"2009","unstructured":"H. Miller Predicting customer behaviour: The University of Melbourne's KDD cup report . In JMLR W&CP , volume 7 , KDD cup 2009 , Paris , 2009 . H. Miller et al. Predicting customer behaviour: The University of Melbourne's KDD cup report. In JMLR W&CP, volume 7, KDD cup 2009, Paris, 2009."},{"key":"e_1_2_1_19_1","volume-title":"Machine Learning","author":"Mitchell T.","year":"1997","unstructured":"T. Mitchell . Machine Learning . McGraw-Hill Co., Inc. , New York , 1997 . T. Mitchell. Machine Learning. McGraw-Hill Co., Inc., New York, 1997."},{"key":"e_1_2_1_21_1","volume-title":"Statistical Learning Theory","author":"Vapnik V.","year":"1998","unstructured":"V. Vapnik . Statistical Learning Theory . John Wiley and Sons , N.Y. , 1998 . V. Vapnik. Statistical Learning Theory. John Wiley and Sons, N.Y., 1998."},{"key":"e_1_2_1_22_1","first-page":"2009","volume-title":"JMLR W&CP","volume":"7","author":"Xie J.","year":"2009","unstructured":"J. Xie A combination of boosting and bagging for KDD cup 2009 - fast scoring on a large database . In JMLR W&CP , volume 7 , KDD cup 2009 , Paris , 2009 . J. Xie et al. A combination of boosting and bagging for KDD cup 2009 - fast scoring on a large database. 
In JMLR W&CP, volume 7, KDD cup 2009, Paris, 2009."}],"container-title":["ACM SIGKDD Explorations Newsletter"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1809400.1809414","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1809400.1809414","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T11:23:04Z","timestamp":1750245784000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1809400.1809414"}},"subtitle":["fast scoring on a large orange customer database"],"short-title":[],"issued":{"date-parts":[[2010,5,27]]},"references-count":21,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2010,5,27]]}},"alternative-id":["10.1145\/1809400.1809414"],"URL":"https:\/\/doi.org\/10.1145\/1809400.1809414","relation":{},"ISSN":["1931-0145","1931-0153"],"issn-type":[{"type":"print","value":"1931-0145"},{"type":"electronic","value":"1931-0153"}],"subject":[],"published":{"date-parts":[[2010,5,27]]},"assertion":[{"value":"2010-05-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}