{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T21:38:58Z","timestamp":1769549938838,"version":"3.49.0"},"reference-count":39,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2025,1,26]],"date-time":"2025-01-26T00:00:00Z","timestamp":1737849600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>In healthcare applications, often it is not possible to record sufficient data as required for deep learning or data-driven classification and feature detection systems due to the patient condition, various clinical or experimental limitations, or time constraints. On the other hand, data imbalance invalidates many of the test results crucial for clinical approvals. Generating synthetic (artificial or dummy) data has become a potential solution to address this issue. Such data should possess adequate information, properties, and characteristics to mimic the real-world data recorded in natural circumstances. Several methods have been proposed for this purpose, and results often show that adding surrogates improves the decision-making accuracy. This article evaluates the most recent surrogate data generation and data synthesis methods to investigate the effects of the number of surrogates on improving the classification results. It is shown that the data analysis\/classification results improve with an increasing number of surrogates, but this no longer continues after a certain number of surrogates. This achievement helps in deciding on the number of surrogates for each strategy, resulting in the alleviation of the computation cost.<\/jats:p>","DOI":"10.3390\/bdcc9020022","type":"journal-article","created":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T06:39:51Z","timestamp":1737959991000},"page":"22","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Evaluating the Effect of Surrogate Data Generation on Healthcare Data Assessment"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1446-5744","authenticated-orcid":false,"given":"Saeid","family":"Sanei","sequence":"first","affiliation":[{"name":"Electrical and Electronic Engineering Department, Imperial College London, London SW7 2AZ, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3450-4317","authenticated-orcid":false,"given":"Tracey K. M.","family":"Lee","sequence":"additional","affiliation":[{"name":"School of Information Technology, Monash University Australia, Malaysia Campus, Subang Jaya 47500, Selangor, Malaysia"},{"name":"School of Electrical and Electronic Engineering, Singapore Polytechnic, Singapore 139651, Singapore"}]},{"given":"Issam","family":"Boukhennoufa","sequence":"additional","affiliation":[{"name":"School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6699-8721","authenticated-orcid":false,"given":"Delaram","family":"Jarchi","sequence":"additional","affiliation":[{"name":"School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1030-8311","authenticated-orcid":false,"given":"Xiaojun","family":"Zhai","sequence":"additional","affiliation":[{"name":"School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6412-8519","authenticated-orcid":false,"given":"Klaus","family":"McDonald-Maier","sequence":"additional","affiliation":[{"name":"School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK"}]}],"member":"1968","published-online":{"date-parts":[[2025,1,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1429","DOI":"10.1007\/s10115-021-01560-w","article-title":"The impact of data difficulty factors on classification of imbalanced and concept drifting data streams","volume":"63","author":"Brzezinski","year":"2021","journal-title":"Knowl. Inf. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"14985","DOI":"10.1109\/ACCESS.2018.2886814","article-title":"Recent advances of generative adversarial networks in computer vision","volume":"7","author":"Cao","year":"2019","journal-title":"IEEE Access"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/j.cviu.2018.10.009","article-title":"Pros and cons of GAN evaluation measures","volume":"179","author":"Borji","year":"2019","journal-title":"Comput. Vis. Image Understand"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wen, Q., Sun, L., Yang, F., Song, X., Gao, J., Wang, X., and Xu, H. (2020). Time series data augmentation for deep learning: A survey. arXiv.","DOI":"10.24963\/ijcai.2021\/631"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.physrep.2018.06.001","article-title":"Surrogate data for hypothesis testing of physical systems","volume":"748","author":"Lancaster","year":"2018","journal-title":"Phys. Rep."},{"key":"ref_6","first-page":"1","article-title":"Testing for nonlinearity in time series: The method of surrogate data","volume":"58","author":"Theiler","year":"1991","journal-title":"Phys. D Nonlinear Phenom."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1103\/PhysRevLett.77.635","article-title":"Improved Surrogate Data for Nonlinearity Tests","volume":"77","author":"Schreiber","year":"1996","journal-title":"Phys. Rev. Lett."},{"key":"ref_8","unstructured":"Cui, Z., Chen, W., and Chen, Y. (2016). Multi-Scale Convolutional Neural Networks for Time Series Classification. arXiv."},{"key":"ref_9","unstructured":"Le Guennec, A., Malinowski, S., and Tavenard, R. (2016, January 19\u201323). Data Augmentation for time series classification using convolutional neural networks. Proceedings of the 2nd ECML\/PKDD Workshop on Advanced Analytics and Learning on Temporal Data, Riva Del Garda, Italy."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Rashid, K.M., and Louis, J. (2019, January 21\u201324). Window-Warping: A time series data augmentation of IMU data for construction equipment activity identification. Proceedings of the 36th International Symposium on Automation and Robotics in Construction (ISARC 2019), Banff, AB, Canada.","DOI":"10.22260\/ISARC2019\/0087"},{"key":"ref_11","unstructured":"Kruskal, B., and Liberman, M. (1983). The Symmetric Time-Warping Problem: From Continuous to Discrete, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley Publishing Company, INC."},{"key":"ref_12","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G. (2012, January 3\u20136). Classification with deep convolutional neural networks. Proceedings of the Conference on Neural Information Processing Systems (NIPS12), Lake Tahoe, NV, USA."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/s40537-016-0043-6","article-title":"A survey of transfer learning","volume":"3","author":"Weiss","year":"2016","journal-title":"Big Data"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lee, T.K.M., Chan, H.W., Leo, K.-H., Chew, E., Zhao, L., and Sanei, S. (2019, January 2\u20136). Surrogate rehabilitative time series data for image-based deep learning. Proceedings of the European Signal Processing Conference EUSIPCO 2019, A Coruna, Spain.","DOI":"10.23919\/EUSIPCO.2019.8903012"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Aldrich, C. (2023). A Comparative Analysis of Image Encoding of Time Series for Anomaly Detection. Time Series Analysis\u2014Recent Advances, New Perspectives and Applications, IntechOpen.","DOI":"10.5772\/intechopen.1002535"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Byeon, Y.H., Pan, S.B., and Kwak, K.C. (2019). Intelligent deep models based on scalograms of electrocardiogram signals for biometrics. Sensors, 19.","DOI":"10.3390\/s19040935"},{"key":"ref_17","first-page":"941","article-title":"Predict Forex Trend via Convolutional Neural Networks","volume":"29","author":"Tsai","year":"2020","journal-title":"Intell. Syst."},{"key":"ref_18","unstructured":"Wang, Z., and Oates, T. (2015, January 25\u201330). Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks. Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lee, T.K.M., Chan, H.W., Leo, K.-H., Chew, E., Zhao, L., and Sanei, S. (2020, January 23\u201325). Surrogate Data for Deep Learning Architectures in Rehabilitative Edge Systems. Proceedings of the 24th Conference on Signal Processing: Algorithms, Architectures, Arrangements, and Applications, SPA 2020, Pozna\u0144, Poland.","DOI":"10.23919\/SPA50552.2020.9241275"},{"key":"ref_20","first-page":"64","article-title":"Interpreting Action Research Arm Test Assessment Scores to Plan Treatment","volume":"39","author":"Grattan","year":"2019","journal-title":"OTJR"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Golyandina, N., and Zhigljavski, A. (2020). Singular Spectrum Analysis for Time Series, Springer. [2nd ed.].","DOI":"10.1007\/978-3-662-62436-4"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lee, T.K.M., Chan, H.W., Leo, K.-H., Chew, E., Zhao, L., and Sanei, S. (2022, January 21\u201322). Improving Rehabilitative Assessment with Statistical and Shape Preserving Surrogate Data and Singular Spectrum Analysis. Proceedings of the IEEE International Conference on Signal Processing Algorithms, Architecture, Arrangements, and Applications, SPA 2022, Poznan, Poland.","DOI":"10.23919\/SPA53010.2022.9927805"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"3373","DOI":"10.1175\/1520-0442(1996)009<3373:MCSDIO>2.0.CO;2","article-title":"Monte Carlo SSA: Detecting irregular oscillations in the Presence of Colored Noise","volume":"9","author":"Allen","year":"1996","journal-title":"J. Clim."},{"key":"ref_24","first-page":"1875","article-title":"On relevant dimensions in kernel feature spaces","volume":"9","author":"Braun","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Iwana, B.K., and Uchida, S. (2021). An empirical survey of data augmentation for time series classification with neural networks. PLoS ONE, 16.","DOI":"10.1371\/journal.pone.0254841"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1736","DOI":"10.1016\/j.clinph.2012.02.062","article-title":"In vivo neuronal firing patterns during human epileptiform discharges replicated by electrical stimulation","volume":"123","author":"Martinez","year":"2012","journal-title":"Clin. Neurophysiol."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Belay, M.A., Blakseth, S.S., Rasheed, A., and Salvo Rossi, P. (2023). Unsupervised anomaly detection for IoT-based multivariate time series: Existing solutions, performance analysis and future directions. Sensors, 23.","DOI":"10.3390\/s23052844"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lee, T.K.M., Chan, H.W., Leo, K.-H., Chew, E., Zhao, L., and Sanei, S. (2023, January 2\u20135). Intrinsic properties of human accelerometer data for machine learning. Proceedings of the IEEE Workshop on Statistical Signal Processing, SSP 2023, Hanoi, Vietnam.","DOI":"10.1109\/SSP53291.2023.10207963"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lee, T.K.M., Chan, H.W., Leo, K.-H., Chew, E., Zhao, L., and Sanei, S. (2023, January 15\u201317). Fidelitous augmentation of human accelerometric data for deep learning. Proceedings of the IEEE International Conference on E-Health Networking, Application, and Services, Healthcom 2023, Chongqing, China.","DOI":"10.1109\/Healthcom56612.2023.10472398"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1821","DOI":"10.1007\/s40279-017-0716-0","article-title":"Accelerometer data collection and processing criteria to assess physical activity and other outcomes: A systematic review and practical considerations","volume":"47","author":"Migueles","year":"2017","journal-title":"Sports Med."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1093\/biomet\/71.3.599","article-title":"Testing for unit roots in autoregressive moving average models of unknown order","volume":"71","author":"Said","year":"1984","journal-title":"Biometrika"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1016\/0304-4076(92)90104-Y","article-title":"Testing the null hypothesis of stationarity against the alternative of a unit root","volume":"54","author":"Kwiatkowski","year":"1992","journal-title":"J. Econom."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1111\/j.1467-9892.1983.tb00373.x","article-title":"Diagnostic checking ARMA time series models using squared residual autocorrelations","volume":"4","author":"McLeod","year":"1983","journal-title":"J. Time Ser. Anal."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"2676","DOI":"10.1109\/TNSRE.2023.3283045","article-title":"TS-SGAN\u2014An approach to generate heterogeneous time series data for post-stroke rehabilitation assessment","volume":"31","author":"Boukhennoufa","year":"2023","journal-title":"IEEE Trans. Neural Syst. Rehabil. Eng."},{"key":"ref_35","unstructured":"Allahyani, M., Alsulami, R., Alwafi, T., Alafif, T., Ammar, H., Sabban, S., and Chen, X. (2017). SD2GAN: A Siamese dual discriminator generative adversarial network for mode collapse reduction. arXiv."},{"key":"ref_36","unstructured":"Weiss, G. (2019). WISDM Smartphone and Smartwatch Activity and Biometrics Dataset, UCI Machine Learning Repository."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1067","DOI":"10.1257\/jel.37.3.1067","article-title":"Nash equilibrium and the history of economic theory","volume":"37","author":"Myerson","year":"1999","journal-title":"J. Econ. Lit."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, Y., Wang, H., Xuan, J., and Leung, D.Y.C. (2020). Powering future body sensor network systems: A review of power sources. Biosens. Bioelectron., 166.","DOI":"10.1016\/j.bios.2020.112410"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1016\/j.ins.2019.11.004","article-title":"Data imbalance in classification: Experimental evaluation","volume":"513","author":"Thabtah","year":"2020","journal-title":"Inf. Sci."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/2\/22\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T10:36:25Z","timestamp":1759919785000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/2\/22"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,26]]},"references-count":39,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["bdcc9020022"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9020022","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,26]]}}}