{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,18]],"date-time":"2026-02-18T01:17:35Z","timestamp":1771377455701,"version":"3.50.1"},"reference-count":43,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,6,12]],"date-time":"2023-06-12T00:00:00Z","timestamp":1686528000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Football (also known as Soccer), boasts a staggering fan base of 3.5 billion individuals spread across 200 countries, making it the world\u2019s most beloved sport. The widespread adoption of advanced technology in sports has become increasingly prominent, empowering players, coaches, and team management to enhance their performance and refine team strategies. Among these advancements, player substitution plays a crucial role in altering the dynamics of a match. However, due to the absence of proven methods or software capable of accurately predicting substitutions, these decisions are often based on instinct rather than concrete data. The purpose of this research is to explore the potential of employing machine learning algorithms to predict substitutions in Football, and how it could influence the outcome of a match. This study investigates the effect of timely and tactical substitutions in football matches and their influence on the match results. Machine learning techniques such as Logistic Regression (LR), Decision tree (DT), K-nearest Neighbor (KNN), Support Vector Machine (SVM), Multinomial Na\u00efve Bayes (MNB), Random Forest (RF) classifiers were implemented and tested to develop models and to predict player substitutions. 
Relevant data was collected from the Kaggle dataset, which contains data of 51,738 substitutions from 9074 European league football matches in 5 leagues spanning 6 seasons. Machine learning models were trained and tested using an 80-20 data split and it was observed that RF model provided the best accuracy of over 70% and the best F1-score of 0.65 on the test set across all football leagues. SVM model achieved the best Precision of almost 0.8. However, the worst computation time of up to 2 min was consumed. LR showed some overfitting issues with 100% accuracy in the training set, but only 60% accuracy was obtained for the test set. To conclude, based on the time of substitution and match score-line, it was possible to predict the players who can be substituted, which can provide a match advantage. The achieved results provided an effective way to decide on player substitutions for both the team manager and coaches.<\/jats:p>","DOI":"10.3390\/bdcc7020117","type":"journal-article","created":{"date-parts":[[2023,6,12]],"date-time":"2023-06-12T02:28:42Z","timestamp":1686536922000},"page":"117","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Tactically Maximize Game Advantage by Predicting Football Substitutions Using Machine Learning"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-9183-2600","authenticated-orcid":false,"given":"Alex","family":"Mohandas","sequence":"first","affiliation":[{"name":"Enterprise SSD Division, Micron Technology, Bengaluru 560103, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7300-506X","authenticated-orcid":false,"given":"Mominul","family":"Ahsan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of York, Deramore Lane, York YO10 5GH, 
UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7010-8285","authenticated-orcid":false,"given":"Julfikar","family":"Haider","sequence":"additional","affiliation":[{"name":"Department of Engineering, Manchester Metropolitan University, Chester Street, Manchester M1 5GD, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ritzer, G. (2012). The Wiley-Blackwell Encyclopedia of Globalization, Wiley.","DOI":"10.1002\/9780470670590.wbeog260"},{"key":"ref_2","first-page":"e10200188","article-title":"The three and six-substitution rules in football: A preliminary comparative analysis in quantitative replacing, game statistics, win rate and winning probability","volume":"26","author":"Ribeiro","year":"2020","journal-title":"Mot. Rev. Educ. F\u00edsica"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Teoldo, I., Guilherme, J., and Garganta, J. (2021). Football Intelligence: Training and Tactics for Soccer Success, Routledge. [1st ed.].","DOI":"10.4324\/9781003223375"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"128","DOI":"10.14445\/22312803\/IJCTT-V48P126","article-title":"Supervised Machine Learning Algorithms: Classification and Comparison","volume":"48","author":"Osisanwo","year":"2017","journal-title":"Int. J. Comput. Trends Technol."},{"key":"ref_5","unstructured":"Anderson, C., and Sally, D. (2014). The Numbers Game. Why Everything You Know about Football Is Wrong, Penguin Books."},{"key":"ref_6","unstructured":"Hall, M.A. (2022, October 29). Correlation-Based Feature Selection of Discrete and Numeric Class Machine Learning. 
Available online: https:\/\/hdl.handle.net\/10289\/1024."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1831","DOI":"10.1080\/02640414.2014.898852","article-title":"Match Analysis in Football: A Systematic Review","volume":"32","author":"Sarmento","year":"2014","journal-title":"J. Sport. Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1016\/S0378-4371(02)01030-0","article-title":"Football goal distributions and extremal statistics","volume":"316","author":"Greenhough","year":"2002","journal-title":"Phys. A Stat. Mech. Its Appl."},{"key":"ref_9","first-page":"301","article-title":"The Effect of Substitutions on Team Tactical Behaviour in Professional Soccer","volume":"93","author":"Rein","year":"2020","journal-title":"Res. Q. Exerc. Sport"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Carrilho, D., Couceiro, M.S., Brito, J., Figueiredo, P., Lopes, R.J., and Ara\u00fajo, D. (2020). Using Optical Tracking System Data to Measure Team Synergic Behaviour: Synchronization of Player-Ball-Goal Angles in a Football Match. Sensors, 20.","DOI":"10.3390\/s20174990"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Chambers, J. (2008). Software for Data Analysis, Springer. Statistics and Computing.","DOI":"10.1007\/978-0-387-75936-4"},{"key":"ref_12","unstructured":"Raschka, S., and Mirjalili, V. (2017). Python Machine Learning: Machine Learning and Deep Learning with Python, Packt Publishing. [1st ed.]."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"840","DOI":"10.1080\/24748668.2015.11868835","article-title":"Timing and tactical analysis of player substitutions in the UEFA Champions League","volume":"15","author":"Rey","year":"2015","journal-title":"Int. J. Perform. Anal. 
Sport"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2331","DOI":"10.1519\/JSC.0000000000002147","article-title":"Influence of Tactical and Situational Variables on Offensive Sequences During Elite Football Matches","volume":"32","author":"Sarmento","year":"2018","journal-title":"J. Strength Cond. Res."},{"key":"ref_15","first-page":"1114","article-title":"Decision tree analysis on j48 algorithm for data mining","volume":"3","author":"Bhargava","year":"2013","journal-title":"Int. J. Adv. Res. Comput. Sci. Softw. Eng."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/1656274.1656278","article-title":"The WEKA data mining software: An update","volume":"11","author":"Hall","year":"2009","journal-title":"ACM SIGKDD Explor. Newsl."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Dijkhuis, T.B., Kempe, M., and Lemmink, K.A.P.M. (2021). Early Prediction of Physical Performance in Elite Soccer Matches\u2014A Machine Learning Approach to Support Substitutions. Entropy, 23.","DOI":"10.3390\/e23080952"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1350","DOI":"10.1109\/TCSVT.2015.2455713","article-title":"Sentioscope: A Soccer Player Tracking System Using Model Field Particles","volume":"26","author":"Baysal","year":"2015","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"127532","DOI":"10.1016\/j.physa.2022.127532","article-title":"A physics-based algorithm to perform predictions in football leagues","volume":"600","author":"Stock","year":"2022","journal-title":"Phys. A Stat. Mech. Its Appl."},{"key":"ref_20","unstructured":"(2022, September 20). International Federation of Association Football. Available online: https:\/\/www.fifa.com\/."},{"key":"ref_21","unstructured":"(2022, October 23). Kaggle Soccer Analysis Dataset. 
Available online: https:\/\/www.kaggle.com\/code\/angps95\/soccer-analysis\/data."},{"key":"ref_22","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_23","first-page":"1","article-title":"pandas: A foundational Python library for data analysis and statistics","volume":"14","author":"McKinney","year":"2011","journal-title":"Python High Perform. Sci. Comput."},{"key":"ref_24","unstructured":"Franklin, M. (2008). Approaches and Methodologies in the Social Sciences: A Pluralist Perspective, Cambridge University Press."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"7732","DOI":"10.1038\/d41586-018-07196-1","article-title":"Why Jupyter is data scientists\u2019 computational notebook of choice","volume":"563","author":"Perkel","year":"2018","journal-title":"Nature"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1016\/j.procs.2019.11.177","article-title":"A Review on Data Cleansing Methods for Big Data","volume":"161","author":"Ridzuan","year":"2019","journal-title":"Procedia Comput. Sci."},{"key":"ref_27","first-page":"54","article-title":"Ethical considerations in research studies","volume":"23","author":"Connelly","year":"2014","journal-title":"Medsurg Nurs."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Santos, R.J., Bernardino, J., and Vieira, M. (2011, January 21\u201323). A data masking technique for data warehouses. Proceedings of the 15th Symposium on International Database Engineering & Applications, Lisboa, Portugal.","DOI":"10.1145\/2076623.2076632"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Medar, R., Rajpurohit, V.S., and Rashmi, B. (2017, January 17\u201318). Impact of Training and Testing Data Splits on Accuracy of Time Series Forecasting in Machine Learning. 
Proceedings of the International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India.","DOI":"10.1109\/ICCUBEA.2017.8463779"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"358","DOI":"10.2307\/1270048","article-title":"Applied Logistic Regression","volume":"34","author":"Cucchiara","year":"2012","journal-title":"Technometrics"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1016\/j.asoc.2016.07.007","article-title":"The role of decision tree representation in regression problems\u2014An evolutionary perspective","volume":"48","author":"Czajkowski","year":"2016","journal-title":"Appl. Soft Comput."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.neucom.2017.04.018","article-title":"An efficient instance selection algorithm for k nearest neighbor regression","volume":"251","author":"Song","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1137\/120875909","article-title":"Euclidean Distance Geometry and Applications","volume":"56","author":"Liberti","year":"2014","journal-title":"SIAM Rev."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Suthaharan, S. (2016). Machine Learning Models and Algorithms for Big Data Classification, Springer.","DOI":"10.1007\/978-1-4899-7641-3"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Webb, G.I., and Yu, X. (2004). AI 2004: Advances in Artificial Intelligence, Springer. Lecture Notes in Computer Science.","DOI":"10.1007\/b104336"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/s11749-016-0481-7","article-title":"A random forest guided tour","volume":"25","author":"Biau","year":"2016","journal-title":"Test"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhang, C., and Ma, Y. (2012). 
Ensemble Machine Learning: Methods and Applications, Springer Science & Business Media.","DOI":"10.1007\/978-1-4419-9326-7"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Yin, M., Vaughan, J.W., and Wallach, H. (2019, January 4\u20139). Understanding the effect of accuracy on trust in machine learning models. Proceedings of the 2019 Chi Conference on Human Factors in Computing Systems, Glasgow Scotland, UK.","DOI":"10.1145\/3290605.3300509"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Davis, J., and Goadrich, M. (2006, January 25\u201329). The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143874"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Lipton, Z.C., Elkan, C., and Narayanaswamy, B. (2014). Thresholding classifiers to maximize F1 score. arXiv.","DOI":"10.1007\/978-3-662-44851-9_15"},{"key":"ref_41","unstructured":"Susmaga, R. (2004). Intelligent Information Processing and Web Mining, Springer."},{"key":"ref_42","unstructured":"McKinney, W. (2012). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O\u2019Reilly Media, Inc."},{"key":"ref_43","unstructured":"Mohandas, A. (2022). Predicting Substitutions During Football Match Using Machine Learning Models to Tactically Maximize Game Advantage. 
[Master\u2019s Thesis, University of York]."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/7\/2\/117\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:52:56Z","timestamp":1760125976000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/7\/2\/117"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,12]]},"references-count":43,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,6]]}},"alternative-id":["bdcc7020117"],"URL":"https:\/\/doi.org\/10.3390\/bdcc7020117","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,12]]}}}