{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T06:20:43Z","timestamp":1771482043681,"version":"3.50.1"},"reference-count":47,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2018,12,20]],"date-time":"2018-12-20T00:00:00Z","timestamp":1545264000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Machine learning competitions such as those organized by Kaggle or KDD represent a useful benchmark for data science research. In this work, we present our winning solution to the Game Data Mining competition hosted at the 2017 IEEE Conference on Computational Intelligence and Games (CIG 2017). The contest consisted of two tracks, and participants (more than 250, belonging to both industry and academia) were to predict which players would stop playing the game, as well as their remaining lifetime. The data were provided by a major worldwide video game company, NCSoft, and came from their successful massively multiplayer online game Blade and Soul. Here, we describe the long short-term memory approach and conditional inference survival ensemble model that made us win both tracks of the contest, as well as the validation procedure that we followed in order to prevent overfitting. In particular, choosing a survival method able to deal with censored data was crucial to accurately predict the moment in which each player would leave the game, as censoring is inherent in churn. The selected models proved to be robust against evolving conditions\u2014since there was a change in the business model of the game (from subscription-based to free-to-play) between the two sample datasets provided\u2014and efficient in terms of time cost. 
Thanks to these features and also to their ability to scale to large datasets, our models could be readily implemented in real business settings.<\/jats:p>","DOI":"10.3390\/make1010016","type":"journal-article","created":{"date-parts":[[2018,12,20]],"date-time":"2018-12-20T12:54:36Z","timestamp":1545310476000},"page":"252-264","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["The Winning Solution to the IEEE CIG 2017 Game Data Mining Competition"],"prefix":"10.3390","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0480-1131","authenticated-orcid":false,"given":"Anna","family":"Guitart","sequence":"first","affiliation":[{"name":"Yokozuna Data, a Keywords Studio, 102-0074 Tokyo, Japan"}]},{"given":"Pei Pei","family":"Chen","sequence":"additional","affiliation":[{"name":"Yokozuna Data, a Keywords Studio, 102-0074 Tokyo, Japan"}]},{"given":"\u00c1frica","family":"Peri\u00e1\u00f1ez","sequence":"additional","affiliation":[{"name":"Yokozuna Data, a Keywords Studio, 102-0074 Tokyo, Japan"}]}],"member":"1968","published-online":{"date-parts":[[2018,12,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Peri\u00e1nez, \u00c1., Saas, A., Guitart, A., and Magne, C. (2016, January 17\u201319). Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles. Proceedings of the 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, Canada.","DOI":"10.1109\/DSAA.2016.84"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Bertens, P., Guitart, A., and Peri\u00e1nez, \u00c1. (2017, January 22\u201325). Games and Big Data: A Scalable Multi-Dimensional Churn Prediction Model. Proceedings of the IEEE Conference on Computational Intelligence and Games (CIG), New York, NY, USA.","DOI":"10.1109\/CIG.2017.8080412"},{"key":"ref_3","unstructured":"(2018, December 10). 
Game Data Mining Competition 2017. Available online: https:\/\/cilab.sejong.ac.kr\/gdmc2017\/."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1126\/science.1127647","article-title":"Reducing the dimensionality of data with neural networks","volume":"313","author":"Hinton","year":"2006","journal-title":"Science"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000006","article-title":"Learning deep architectures for AI","volume":"2","author":"Bengio","year":"2009","journal-title":"Found. Trends Mach. Learn."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: A review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","unstructured":"Stober, S., Sternin, A., Owen, A.M., and Grahn, J.A. (arXiv, 2015). Deep feature learning for EEG recordings, arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Deng, L., Seltzer, M.L., Yu, D., Acero, A., Mohamed, A.-R., and Hinton, G. (2010, January 15\u201319). Binary coding of speech spectrograms using a deep auto-encoder. Proceedings of the Eleventh Annual Conference of the International Speech Communication Association, Graz, Austria.","DOI":"10.21437\/Interspeech.2010-487"},{"key":"ref_9","unstructured":"Li, J., Luong, M.-T., and Jurafsky, D. (December, January 27). A hierarchical neural autoencoder for paragraphs and documents. Proceedings of the 7th International Joint Conference on Natural Language Processing, Taipei, Taiwan."},{"key":"ref_10","unstructured":"Larsen, A.B.L., S\u00f8nderby, S.K., Larochelle, H., and Winther, O. (2015, January 19\u201324). Autoencoding beyond pixels using a learned similarity metric. 
Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_11","unstructured":"Bengio, Y., Courville, A.C., and Vincent, P. (2018, December 15). Unsupervised feature learning and deep learning: A review and new perspectives. Available online: https:\/\/pdfs.semanticscholar.org\/f8c8\/619ea7d68e604e40b814b40c72888a755e95.pdf."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Janitza, S., Strobl, C., and Boulesteix, A.-L. (2013). An AUC-based permutation variable importance measure for random forests. BMC Bioinform., 14.","DOI":"10.1186\/1471-2105-14-119"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1080\/01621459.1963.10500855","article-title":"Problems in the analysis of survey data, and a proposal","volume":"58","author":"Morgan","year":"1963","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_14","unstructured":"Breiman, L., Friedman, J., Stone, C.J., and Olshen, R.A. (1984). Classification and Regression Trees, CRC Press."},{"key":"ref_15","unstructured":"Quinlan, J.R. (2014). C4.5: Programs for Machine Learning, Elsevier."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1016\/S0167-9473(01)00067-6","article-title":"Effect of pruning and early stopping on performance of a boosting ensemble","volume":"38","author":"Drucker","year":"2002","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1007\/BF00993345","article-title":"Trading accuracy for simplicity in decision trees","volume":"15","author":"Bohanec","year":"1994","journal-title":"Mach. Learn."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. 
Learn."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/BF00993473","article-title":"Bias and the quantification of stability","volume":"20","author":"Turney","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1016\/j.aca.2011.12.069","article-title":"Identification of human protein complexes from local sub-graphs of protein\u2013protein interaction network based on random forest with topological structure features","volume":"718","author":"Li","year":"2012","journal-title":"Anal. Chim. Acta"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1423","DOI":"10.1021\/ac048561m","article-title":"Boosting partial least squares","volume":"77","author":"Zhang","year":"2005","journal-title":"Anal. Chem."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1021\/ci060164k","article-title":"Random forest models to predict aqueous solubility","volume":"47","author":"Palmer","year":"2007","journal-title":"J. Chem. Inf. Model."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Kretowska, M. (2014, January 13\u201317). Comparison of tree-based ensembles in application to censored data. Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland.","DOI":"10.1007\/978-3-319-07173-2_47"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","article-title":"Extremely randomized trees","volume":"63","author":"Geurts","year":"2006","journal-title":"Mach. Learn."},{"key":"ref_25","unstructured":"Geurts, P., and Louppe, G. (2011, January 11\u201313). Learning to rank with extremely randomized trees. Proceedings of the JMLR: Workshop and Conference Proceedings, Fort Lauderdale, FL, USA."},{"key":"ref_26","unstructured":"Bertens, P., Guitart, A., and Peri\u00e1nez, \u00c1. (arXiv, 2018). 
A Machine-Learning Item Recommendation System for Video Games, arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1198\/106186006X133933","article-title":"Unbiased recursive partitioning: A conditional inference framework","volume":"15","author":"Hothorn","year":"2006","journal-title":"J. Comput. Graph. Stat."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1198\/106186008X319331","article-title":"Model-based recursive partitioning","volume":"17","author":"Zeileis","year":"2008","journal-title":"J. Comput. Graph. Stat."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1093\/biostatistics\/kxj011","article-title":"Survival ensembles","volume":"7","author":"Hothorn","year":"2005","journal-title":"Biostatistics"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1214\/08-AOAS169","article-title":"Random survival forests","volume":"2","author":"Ishwaran","year":"2008","journal-title":"Ann. Appl. Stat"},{"key":"ref_32","first-page":"44","article-title":"A review of survival trees","volume":"5","author":"Larocque","year":"2011","journal-title":"Stat. Surv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Graves, A., Mohamed, A.-R., and Hinton, G. (2013, January 26\u201331). Speech recognition with deep recurrent neural networks. 
Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"ref_35","unstructured":"Sutskever, I., Vinyals, O., and Le, Q.V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gers, F.A., Schmidhuber, J., and Cummins, F. (1999, January 7\u201310). Learning to forget: Continual prediction with LSTM. Proceedings of the 9th International Conference on Artificial Neural Networks: ICANN \u201999, Edinburgh, UK.","DOI":"10.1049\/cp:19991218"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1109\/TASLP.2016.2520371","article-title":"Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval","volume":"24","author":"Palangi","year":"2016","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"602","DOI":"10.1016\/j.neunet.2005.06.042","article-title":"Framewise phoneme classification with bidirectional LSTM and other neural network architectures","volume":"18","author":"Graves","year":"2005","journal-title":"Neural Netw."},{"key":"ref_39","unstructured":"Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (arXiv, 2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. NIPS 2014 Workshop on Deep Learning, arXiv."},{"key":"ref_40","unstructured":"Arai, K., Kapoor, S., and Bhatia, R. (2018). Forecasting Player Behavioral Data and Simulating In-Game Events. Advances in Information and Communication Networks. Future of Information and Communication Conference (FICC). 
Advances in Intelligent Systems and Computing, Springer."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Chen, P.P., Guitart, A., Peri\u00e1nez, \u00c1., and Fern\u00e1ndez del R\u00edo, A. (2018, January 10\u201313). Customer lifetime value in video games using deep learning and parametric models. Proceedings of the IEEE International Conference on Big Data, Seattle, WA, USA.","DOI":"10.1109\/BigData.2018.8622151"},{"key":"ref_42","unstructured":"Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R.R. (arXiv, 2012). Improving neural networks by preventing co-adaptation of feature detectors, arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/j.2517-6161.1974.tb00994.x","article-title":"Cross-validatory choice and assessment of statistical predictions","volume":"36","author":"Stone","year":"1974","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_44","first-page":"2079","article-title":"On over-fitting in model selection and subsequent selection bias in performance evaluation","volume":"11","author":"Cawley","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1016\/S0031-3203(96)00142-2","article-title":"The use of the area under the ROC curve in the evaluation of machine learning algorithms","volume":"30","author":"Bradley","year":"1997","journal-title":"Pattern Recognit."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2529","DOI":"10.1002\/(SICI)1097-0258(19990915\/30)18:17\/18<2529::AID-SIM274>3.0.CO;2-5","article-title":"Assessment and comparison of prognostic classification schemes for survival data","volume":"18","author":"Graf","year":"1999","journal-title":"Stat. 
Med."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v050.i11","article-title":"Evaluating random forests for survival analysis using prediction error curves","volume":"50","author":"Mogensen","year":"2012","journal-title":"J. Stat. Softw."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/16\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:35:04Z","timestamp":1760196904000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/16"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,20]]},"references-count":47,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["make1010016"],"URL":"https:\/\/doi.org\/10.3390\/make1010016","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,12,20]]}}}