{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T12:04:25Z","timestamp":1776081865450,"version":"3.50.1"},"reference-count":103,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2025,1,2]],"date-time":"2025-01-02T00:00:00Z","timestamp":1735776000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Patient-level grouped data are prevalent in public health and medical fields, and multiple instance learning (MIL) offers a framework to address the challenges associated with this type of data structure. This study compares four data aggregation methods designed to tackle the grouped structure in classification tasks: post-mean, post-max, post-min, and pre-mean aggregation. We developed a customized AI pipeline that incorporates twelve machine learning algorithms along with the four aggregation methods to detect Parkinson\u2019s disease (PD) using multiple voice recordings from individuals available in the UCI Machine Learning Repository, which includes 756 voice recordings from 188 PD patients and 64 healthy individuals. Seven performance metrics\u2014accuracy, precision, sensitivity, specificity, F1 score, AUC, and MCC\u2014were utilized for model evaluation. Various techniques, such as Bag Over-Sampling (BOS), cross-validation, and grid search, were implemented to enhance classification performance. Among the four aggregation methods, post-mean aggregation combined with XGBoost achieved the highest accuracy (0.880), F1 score (0.922), and MCC (0.672). Furthermore, we identified potential trends in selecting aggregation methods that are suitable for imbalanced data, particularly based on their differences in sensitivity and specificity. These findings provide meaningful implications for the further exploration of grouped imbalanced data.<\/jats:p>","DOI":"10.3390\/data10010004","type":"journal-article","created":{"date-parts":[[2025,1,2]],"date-time":"2025-01-02T07:44:53Z","timestamp":1735803893000},"page":"4","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Optimizing Parkinson\u2019s Disease Prediction: A Comparative Analysis of Data Aggregation Methods Using Multiple Voice Recordings via an Automated Artificial Intelligence Pipeline"],"prefix":"10.3390","volume":"10","author":[{"given":"Zhengxiao","family":"Yang","sequence":"first","affiliation":[{"name":"Biostatistics and Data Science Graduate Program, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, 1440 Canal St., New Orleans, LA 70112, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8293-8592","authenticated-orcid":false,"given":"Hao","family":"Zhou","sequence":"additional","affiliation":[{"name":"Biostatistics and Data Science Graduate Program, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, 1440 Canal St., New Orleans, LA 70112, USA"}]},{"given":"Sudesh","family":"Srivastav","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Data Science, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA 70112, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3941-3772","authenticated-orcid":false,"given":"Jeffrey G.","family":"Shaffer","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Data Science, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA 70112, USA"}]},{"given":"Kuukua E.","family":"Abraham","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, Minnesota State University, Mankato, MN 60001, USA"}]},{"given":"Samuel M.","family":"Naandam","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of Cape Coast, Cape Coast 00233, Ghana"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6362-5126","authenticated-orcid":false,"given":"Samuel","family":"Kakraba","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Data Science, Celia Scott Weatherhead School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA 70112, USA"},{"name":"Tulane Center for Aging, School of Medicine, Tulane University, 1440 Canal St., New Orleans, LA 70112, USA"}]}],"member":"1968","published-online":{"date-parts":[[2025,1,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"100108","DOI":"10.1016\/j.glmedi.2024.100108","article-title":"Artificial intelligence in healthcare delivery: Prospects and pitfalls","volume":"3","author":"Olawade","year":"2024","journal-title":"J. Med. Surg. Public Health"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Alowais, S.A., Alghamdi, S.S., Alsuhebany, N., Alqahtani, T., Alshaya, A.I., Almohareb, S.N., Aldairem, A., Alrashed, M., Bin Saleh, K., and Badreldin, H.A. (2023). Revolutionizing healthcare: The role of artificial intelligence in clinical practice. BMC Med. Educ., 23.","DOI":"10.1186\/s12909-023-04698-z"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Maleki Varnosfaderani, S., and Forouzanfar, M. (2024). The Role of AI in Hospitals and Clinics: Transforming Healthcare in the 21st Century. Bioengineering, 11.","DOI":"10.3390\/bioengineering11040337"},{"key":"ref_4","first-page":"58","article-title":"Significance of machine learning in healthcare: Features, pillars and applications","volume":"3","author":"Javaid","year":"2022","journal-title":"Int. J. Intell. Netw."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"220","DOI":"10.3390\/encyclopedia1010021","article-title":"Machine Learning in Healthcare Communication","volume":"1","author":"Siddique","year":"2021","journal-title":"Encyclopedia"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"9979","DOI":"10.1007\/s11229-021-03233-1","article-title":"The no-free-lunch theorems of supervised learning","volume":"199","author":"Sterkenburg","year":"2021","journal-title":"Synthese"},{"key":"ref_7","first-page":"100179","article-title":"Artificial intelligence: A powerful paradigm for scientific research","volume":"2","author":"Xu","year":"2021","journal-title":"Innovation"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"e26888","DOI":"10.1016\/j.heliyon.2024.e26888","article-title":"Artificial intelligence and machine learning applications in the project lifecycle of the construction industry: A comprehensive review","volume":"10","author":"Datta","year":"2024","journal-title":"Heliyon"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/s10462-024-10884-2","article-title":"Handling imbalanced medical datasets: Review of a decade of research","volume":"57","author":"Salmi","year":"2024","journal-title":"Artif. Intell. Rev."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/j.patcog.2017.10.009","article-title":"Multiple instance learning: A survey of problem characteristics and applications","volume":"77","author":"Carbonneau","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1017\/S026988890999035X","article-title":"A review of multi-instance learning assumptions","volume":"25","author":"Foulds","year":"2010","journal-title":"Knowl. Eng. Rev."},{"key":"ref_12","unstructured":"Zhou, S.K., Rueckert, D., and Fichtinger, G. (2020). Chapter 22\u2014Deep multiple instance learning for digital histopathology. Handbook of Medical Image Computing and Computer Assisted Intervention, Academic Press."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1016\/j.patrec.2019.10.022","article-title":"An embarrassingly simple approach to neural multiple instance classification","volume":"128","author":"Asif","year":"2019","journal-title":"Pattern Recognit. Lett."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1145\/3644076","article-title":"Multi-Instance Learning with One Side Label Noise","volume":"18","author":"Luan","year":"2024","journal-title":"ACM Trans. Knowl. Discov. Data"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"M\u00f8llersen, K., Hardeberg, J.Y., and Godtliebsen, F. (2020). A Probabilistic Bag-to-Class Approach to Multiple-Instance Learning. Data, 5.","DOI":"10.3390\/data5020056"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Herold, F., T\u00f6rpel, A., Hamacher, D., Budde, H., Zou, L., Strobach, T., M\u00fcller, N.G., and Gronwald, T. (2021). Causes and Consequences of Interindividual Response Variability: A Call to Apply a More Rigorous Research Design in Acute Exercise-Cognition Studies. Front. Physiol., 12.","DOI":"10.3389\/fphys.2021.682891"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1007\/s10115-006-0029-3","article-title":"Solving multi-instance problems with classifier ensemble based on constructive clustering","volume":"11","author":"Zhou","year":"2007","journal-title":"Knowl. Inf. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1038\/s41572-021-00280-3","article-title":"Parkinson disease-associated cognitive impairment","volume":"7","author":"Aarsland","year":"2021","journal-title":"Nat. Rev. Dis. Primers"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"200","DOI":"10.3934\/Neuroscience.2023017","article-title":"Depletion of dopamine in Parkinson\u2019s disease and relevant therapeutic options: A review of the literature","volume":"10","author":"Ramesh","year":"2023","journal-title":"AIMS Neurosci."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"24","DOI":"10.37349\/ent.2023.00036","article-title":"Pathophysiology of non-motor signs in Parkinson\u2019s disease: Some recent updating with brief presentation","volume":"3","author":"Radad","year":"2023","journal-title":"Explor. Neuroprotect. Ther."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1016\/S1474-4422(09)70068-7","article-title":"Non-motor symptoms of Parkinson\u2019s disease: Dopaminergic pathophysiology and treatment","volume":"8","author":"Chaudhuri","year":"2009","journal-title":"Lancet Neurol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1586\/14737175.2015.1038244","article-title":"Parkinson\u2019s disease: A review of non-motor symptoms","volume":"15","author":"Rana","year":"2015","journal-title":"Expert Rev. Neurother."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1007\/s00415-009-5240-1","article-title":"Non-motor symptoms in Parkinson\u2019s disease","volume":"256","author":"Park","year":"2009","journal-title":"J. Neurol."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Thach, A., Jones, E., Pappert, E., Pike, J., Wright, J., and Gillespie, A. (2021). Real-world assessment of the impact of \u201cOFF\u201d episodes on health-related quality of life among patients with Parkinson\u2019s disease in the United States. BMC Neurol., 21.","DOI":"10.1186\/s12883-021-02074-2"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"e464","DOI":"10.1016\/S2666-7568(24)00094-1","article-title":"Temporal trends in the prevalence of Parkinson\u2019s disease from 1980 to 2023: A systematic review and meta-analysis","volume":"5","author":"Zhu","year":"2024","journal-title":"Lancet Healthy Longev."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1001\/jamaneurol.2017.3299","article-title":"The Parkinson Pandemic\u2014A Call to Action","volume":"75","author":"Dorsey","year":"2018","journal-title":"JAMA Neurol."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"929","DOI":"10.1001\/jamaneurol.2022.1783","article-title":"Six Action Steps to Address Global Disparities in Parkinson Disease: A World Health Organization Priority","volume":"79","author":"Schiess","year":"2022","journal-title":"JAMA Neurol."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"981","DOI":"10.1001\/jamaneurol.2016.0947","article-title":"Time Trends in the Incidence of Parkinson Disease","volume":"73","author":"Savica","year":"2016","journal-title":"JAMA Neurol."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1002\/mds.25292","article-title":"The current and projected economic burden of Parkinson\u2019s disease in the United States","volume":"28","author":"Kowal","year":"2013","journal-title":"Mov. Disord."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"e1986","DOI":"10.1212\/WNL.0000000000012826","article-title":"Trends in Mortality from Parkinson Disease in the United States, 1999\u20132019","volume":"97","author":"Rong","year":"2021","journal-title":"Neurology"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1383","DOI":"10.1089\/ars.2016.6978","article-title":"Aspirin-Mediated Acetylation Protects Against Multiple Neurodegenerative Pathologies by Impeding Protein Aggregation","volume":"27","author":"Ayyadevara","year":"2017","journal-title":"Antioxid. Redox Signal."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bowroju, S.K., Mainali, N., Ayyadevara, S., Penthala, N.R., Krishnamachari, S., Kakraba, S., Shmookler Reis, R.J., and Crooks, P.A. (2020). Design and Synthesis of Novel Hybrid 8-Hydroxy Quinoline-Indole Derivatives as Inhibitors of A\u03b2 Self-Aggregation and Metal Chelation-Induced A\u03b2 Aggregation. Molecules, 25.","DOI":"10.3390\/molecules25163610"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Kakraba, S., Ayyadevara, S., Mainali, N., Balasubramaniam, M., Bowroju, S., Penthala, N.R., Atluri, R., Barger, S.W., Griffin, S.T., and Crooks, P.A. (2023). Thiadiazolidinone (TDZD) Analogs Inhibit Aggregation-Mediated Pathology in Diverse Neurodegeneration Models, and Extend Life- and Healthspan. Pharmaceuticals, 16.","DOI":"10.3390\/ph16101498"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kakraba, S., Ayyadevara, S., Penthala, N.R., Balasubramaniam, M., Ganne, A., Liu, L., Alla, R., Bommagani, S.B., Barger, S.W., and Griffin, W.S.T. (2019). A Novel Microtubule-Binding Drug Attenuates and Reverses Protein Aggregation in Animal Models of Alzheimer\u2019s Disease. Front. Mol. Neurosci., 12.","DOI":"10.3389\/fnmol.2019.00310"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"e006813","DOI":"10.1136\/bmjopen-2014-006813","article-title":"Good-quality social care for people with Parkinson\u2019s disease: A qualitative study","volume":"6","author":"Tod","year":"2016","journal-title":"BMJ Open"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1016\/j.pneurobio.2013.08.003","article-title":"A review of quality of life after predictive testing for and earlier identification of neurodegenerative diseases","volume":"110","author":"Paulsen","year":"2013","journal-title":"Prog. Neurobiol."},{"key":"ref_37","first-page":"269","article-title":"Parkinson\u2019s Disease: Neurotransmitter Imbalance, Motor Dysfunction, and Nursing Interventions for Quality of Life","volume":"7","author":"Alanazi","year":"2024","journal-title":"J. Int. Crisis Risk Commun. Res."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bu\u017egov\u00e1, R., Koz\u00e1kov\u00e1, R., and Bar, M. (2020). The effect of neuropalliative care on quality of life and satisfaction with quality of care in patients with progressive neurological disease and their family caregivers: An interventional control study. BMC Palliat. Care, 19.","DOI":"10.1186\/s12904-020-00651-9"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Rees, R.N., Acharya, A.P., Schrag, A., and Noyce, A.J. (2018). An early diagnosis is not the same as a timely diagnosis of Parkinson\u2019s disease. F1000Research, 7.","DOI":"10.12688\/f1000research.14528.1"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1013","DOI":"10.2165\/00019053-200119100-00004","article-title":"Health-Related Quality of Life and Healthcare Utilisation in Patients with Parkinson\u2019s Disease","volume":"19","author":"Dodel","year":"2001","journal-title":"Pharmacoeconomics"},{"key":"ref_41","first-page":"135","article-title":"Delivering Multidisciplinary Rehabilitation Care in Parkinson\u2019s Disease: An International Consensus Statement","volume":"14","author":"Goldman","year":"2024","journal-title":"J. Park. Dis."},{"key":"ref_42","unstructured":"Sakar, C., Serbes, G., Gunduz, A., Nizam, H., and Sakar, B. (2024, September 01). Parkinson\u2019s Disease Classification [Dataset]. Available online: https:\/\/archive.ics.uci.edu\/dataset\/470\/parkinson+s+disease+classification."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, M., Zhao, X., Li, F., Wu, L., Li, Y., Tang, R., Yao, J., Lin, S., Zheng, Y., and Ling, Y. (2024). Using sustained vowels to identify patients with mild Parkinson\u2019s disease in a Chinese dataset. Front. Aging Neurosci., 16.","DOI":"10.3389\/fnagi.2024.1377442"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Aich, S., Kim, H.C., Younga, K., Hui, K.L., Al-Absi, A.A., and Sain, M. (2019, January 17\u201320). A Supervised Machine Learning Approach using Different Feature Selection Techniques on Voice Datasets for Prediction of Parkinson\u2019s Disease. Proceedings of the 2019 21st International Conference on Advanced Communication Technology (ICACT), PyeongChang, Republic of Korea.","DOI":"10.23919\/ICACT.2019.8701961"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1155\/1999\/327643","article-title":"Speech impairment in a large sample of patients with Parkinson\u2019s disease","volume":"11","author":"Ho","year":"1998","journal-title":"Behav. Neurol."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1055\/s-0041-1735249","article-title":"Speech Characteristics of Patients with Parkinson\u2019s Disease-Does Dopaminergic Medications Have a Role?","volume":"12","author":"Vandana","year":"2021","journal-title":"J. Neurosci. Rural Pract."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/j.jns.2011.07.020","article-title":"Aspects of speech rate and regularity in Parkinson\u2019s disease","volume":"310","author":"Skodda","year":"2011","journal-title":"J. Neurol. Sci."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Tabari, F., Berger, J.I., Flouty, O., Copeland, B., Greenlee, J.D., and Johari, K. (2024). Speech, voice, and language outcomes following deep brain stimulation: A systematic review. PLoS ONE, 19.","DOI":"10.1371\/journal.pone.0302739"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Krasko, M.N., Hoffmeister, J.D., Schaen-Heacock, N.E., Welsch, J.M., Kelm-Nelson, C.A., and Ciucci, M.R. (2021). Rat Models of Vocal Deficits in Parkinson\u2019s Disease. Brain Sci., 11.","DOI":"10.3390\/brainsci11070925"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Iyer, A., Kemp, A., Rahmatallah, Y., Pillai, L., Glover, A., Prior, F., Larson-Prior, L., and Virmani, T. (2023). A machine learning method to process voice samples for identification of Parkinson\u2019s disease. Sci. Rep., 13.","DOI":"10.1038\/s41598-023-47568-w"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Berus, L., Klancnik, S., Brezocnik, M., and Ficko, M. (2019). Classifying Parkinson\u2019s Disease Based on Acoustic Measures Using Artificial Neural Networks. Sensors, 19.","DOI":"10.3390\/s19010016"},{"key":"ref_52","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_53","unstructured":"Tan, J., Yang, J., Wu, S., Chen, G., and Zhao, J. (2021). A critical look at the current train\/test split in machine learning. arXiv."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Szeghalmy, S., and Fazekas, A. (2023). A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning. Sensors, 23.","DOI":"10.3390\/s23042333"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1007\/s42979-020-0074-0","article-title":"The Effects of Class Imbalance and Training Data Size on Classifier Learning: An Empirical Study","volume":"1","author":"Zheng","year":"2020","journal-title":"SN Comput. Sci."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1016\/j.compmedimag.2018.08.008","article-title":"Efficient multi-kernel multi-instance learning using weakly supervised and imbalanced data for diabetic retinopathy diagnosis","volume":"69","author":"Cao","year":"2018","journal-title":"Comput. Med. Imaging Graph."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Chawla, N., Bowyer, K., Hall, L.O., and Kegelmeyer, W.P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. arXiv.","DOI":"10.1613\/jair.953"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Liu, L. (2018, January 26\u201327). Research on Logistic Regression Algorithm of Breast Cancer Diagnose Data by Machine Learning. Proceedings of the 2018 International Conference on Robots & Intelligent System (ICRIS), Changsha, China.","DOI":"10.1109\/ICRIS.2018.00049"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"20","DOI":"10.38094\/jastt20165","article-title":"Classification Based on Decision Tree Algorithm for Machine Learning","volume":"2","author":"Charbuty","year":"2021","journal-title":"J. Appl. Sci. Technol. Trends"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Liu, Y., Wang, Y., and Zhang, J. (2012). New Machine Learning Algorithm: Random Forest. Information Computing and Applications, Springer.","DOI":"10.1007\/978-3-642-34062-8_32"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Natekin, A., and Knoll, A. (2013). Gradient boosting machines, a tutorial. Front. Neurorobot., 7.","DOI":"10.3389\/fnbot.2013.00021"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_63","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Neural Information Processing Systems, Curran Associates Inc."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"100459","DOI":"10.1016\/j.simpa.2022.100459","article-title":"PL-kNN: A Python-based implementation of a parameterless k-Nearest Neighbors classifier","volume":"15","author":"Jodas","year":"2023","journal-title":"Softw. Impacts"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"81","DOI":"10.48161\/qaj.v1n2a50","article-title":"Machine Learning Applications based on SVM Classification A Review","volume":"1","author":"Abdullah","year":"2021","journal-title":"Qubahan Acad. J."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Lowd, D., and Domingos, P.M. (2005, January 7\u201311). Naive Bayes models for probability estimation. Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany.","DOI":"10.1145\/1102351.1102418"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Denison, D.D., Hansen, M.H., Holmes, C.C., Mallick, B., and Yu, B. (2003). The Boosting Approach to Machine Learning: An Overview. Nonlinear Estimation and Classification, Springer.","DOI":"10.1007\/978-0-387-21579-2"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Orr\u00f9, P.F., Zoccheddu, A., Sassu, L., Mattia, C., Cozza, R., and Arena, S. (2020). Machine Learning Approach Using MLP and SVM Algorithms for the Fault Prediction of a Centrifugal Pump in the Oil and Gas Industry. Sustainability, 12.","DOI":"10.3390\/su12114776"},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"99129","DOI":"10.1109\/ACCESS.2022.3207287","article-title":"A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects","volume":"10","author":"Mienye","year":"2022","journal-title":"IEEE Access"},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"1341","DOI":"10.1007\/s12065-023-00824-4","article-title":"SELF: A stacked-based ensemble learning framework for breast cancer classification","volume":"17","author":"Jakhar","year":"2024","journal-title":"Evol. Intell."},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Shekar, B.H., and Dagnew, G. (2019, January 25\u201328). Grid Search-Based Hyperparameter Tuning and Classification of Microarray Cancer Data. Proceedings of the 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), Gangtok, India.","DOI":"10.1109\/ICACCP.2019.8882943"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1006\/jmps.1999.1279","article-title":"Cross-Validation Methods","volume":"44","author":"Browne","year":"2000","journal-title":"J. Math. Psychol."},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"913","DOI":"10.1111\/ecog.02881","article-title":"Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure","volume":"40","author":"Roberts","year":"2017","journal-title":"Ecography"},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Wardhani, N.W.S., Rochayani, M.Y., Iriany, A., Sulistyono, A.D., and Lestantyo, P. (2019, January 23\u201324). Cross-validation Metrics for Evaluating Classification Performance on Imbalanced Data. Proceedings of the 2019 International Conference on Computer, Control, Informatics and its Applications (IC3INA), Tangerang, Indonesia.","DOI":"10.1109\/IC3INA48034.2019.8949568"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Halimu, C., Kasem, A., and Newaz, S.M. (2019, January 25\u201328). Empirical Comparison of Area under ROC curve (AUC) and Mathew Correlation Coefficient (MCC) for Evaluating Machine Learning Algorithms on Imbalanced Datasets for Binary Classification. Proceedings of the 3rd International Conference on Machine Learning and Soft Computing, Da Lat, Vietnam.","DOI":"10.1145\/3310986.3311023"},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"4900409","DOI":"10.1109\/JTEHM.2021.3066800","article-title":"Detecting Effect of Levodopa in Parkinson\u2019s Disease Patients Using Sustained Phonemes","volume":"9","author":"Pah","year":"2021","journal-title":"IEEE J. Transl. Eng. Health Med."},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Ngo, Q.C., Motin, M.A., Pah, N.D., Drot\u00e1r, P., Kempster, P., and Kumar, D. (2022). Computerized analysis of speech and voice for Parkinson\u2019s disease: A systematic review. Comput. Methods Programs Biomed., 226.","DOI":"10.1016\/j.cmpb.2022.107133"},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"2000410","DOI":"10.1109\/JTEHM.2019.2940900","article-title":"Automated Detection of Parkinson\u2019s Disease Based on Multiple Types of Sustained Phonations Using Linear Discriminant Analysis and Genetically Optimized Neural Network","volume":"7","author":"Ali","year":"2019","journal-title":"IEEE J. Transl. Eng. Health Med."},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Arora, S., and Tsanas, A. (2021). Assessing Parkinson\u2019s Disease at Scale Using Telephone-Recorded Speech: Insights from the Parkinson\u2019s Voice Initiative. Diagnostics, 11.","DOI":"10.3390\/diagnostics11101892"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Azadi, H., Akbarzadeh-T, M.-R., Shoeibi, A., and Kobravi, H.R. (2021). Evaluating the Effect of Parkinson\u2019s Disease on Jitter and Shimmer Speech Features. Adv. Biomed. Res., 10.","DOI":"10.4103\/abr.abr_254_21"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Viswanathan, R., Arjunan, S.P., Bingham, A., Jelfs, B., Kempster, P., Raghav, S., and Kumar, D.K. (2020). Complexity Measures of Voice Recordings as a Discriminative Tool for Parkinson\u2019s Disease. Biosensors, 10.","DOI":"10.3390\/bios10010001"},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"115540","DOI":"10.1109\/ACCESS.2019.2936564","article-title":"Deep Learning-Based Parkinson\u2019s Disease Classification Using Vocal Feature Sets","volume":"7","author":"Gunduz","year":"2019","journal-title":"IEEE Access"},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"109678","DOI":"10.1016\/j.mehy.2020.109678","article-title":"Parkinson disease classification using one against all based data sampling with the acoustic features from the speech signals","volume":"140","author":"Polat","year":"2020","journal-title":"Med. Hypotheses"},{"key":"ref_84","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1016\/j.asoc.2018.10.022","article-title":"A comparative analysis of speech signal processing algorithms for Parkinson\u2019s disease classification and the use of the tunable Q-factor wavelet transform","volume":"74","author":"Sakar","year":"2019","journal-title":"Appl. Soft Comput."},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1109\/JBHI.2013.2245674","article-title":"Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings","volume":"17","author":"Sakar","year":"2013","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1016\/j.cmpb.2017.02.019","article-title":"A two-stage variable selection and classification approach for Parkinson\u2019s disease detection by using voice recording replications","volume":"142","author":"Naranjo","year":"2017","journal-title":"Comput. Methods Programs Biomed."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1016\/j.asoc.2017.11.001","article-title":"Analysis of speaker recognition methodologies and the influence of kinetic changes to automatically detect Parkinson\u2019s Disease","volume":"62","author":"Villalba","year":"2018","journal-title":"Appl. Soft Comput."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.bandl.2016.07.008","article-title":"How language flows when movements don\u2019t: An automated analysis of spontaneous discourse in Parkinson\u2019s disease","volume":"162","author":"Carrillo","year":"2016","journal-title":"Brain Lang."},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1016\/j.patcog.2019.02.023","article-title":"The impact of class imbalance in classification performance metrics based on the binary confusion matrix","volume":"91","author":"Luque","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/S0009-8981(01)00710-0","article-title":"Objective evaluation of data in screening for disease","volume":"315","author":"Sasse","year":"2002","journal-title":"Clin. Chim. Acta"},{"key":"ref_91","doi-asserted-by":"crossref","first-page":"113406","DOI":"10.1016\/j.eswa.2020.113406","article-title":"A comparative evaluation of aggregation methods for machine learning over vertically partitioned data","volume":"152","author":"Trevizan","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1016\/j.eswa.2015.10.034","article-title":"Addressing voice recording replications for Parkinson\u2019s disease detection","volume":"46","author":"Naranjo","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_93","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1080\/136828200410654","article-title":"Voice characteristics in the progression of Parkinson\u2019s disease","volume":"35","author":"Holmes","year":"2000","journal-title":"Int. J. Lang. Commun. Disord."},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Tsanas, A., and Arora, S. (2020, January 24\u201326). Large-scale Clustering of People Diagnosed with Parkinson\u2019s Disease using Acoustic Analysis of Sustained Vowels: Findings in the Parkinson\u2019s Voice Initiative Study. Proceedings of the International Conference on Bio-Inspired Systems and Signal Processing, Valletta, Malta.","DOI":"10.5220\/0009361203690376"},{"key":"ref_95","doi-asserted-by":"crossref","first-page":"109282","DOI":"10.1016\/j.jneumeth.2021.109282","article-title":"A convolutional-recurrent neural network approach to resting-state EEG classification in Parkinson\u2019s disease","volume":"361","author":"Lee","year":"2021","journal-title":"J. Neurosci. Methods"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Shen, M., Mortezaagha, P., and Rahgozar, A. (2024). Explainable Artificial Intelligence to Diagnose Early Parkinson\u2019s Disease via Voice Analysis. medRxiv.","DOI":"10.1101\/2024.09.29.24314580"},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","article-title":"Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups","volume":"29","author":"Hinton","year":"2012","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_98","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1109\/TETCI.2017.2784878","article-title":"Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks","volume":"2","author":"Hou","year":"2018","journal-title":"IEEE Trans. Emerg. Top. Comput. Intell."},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Ye, F., and Yang, J. (2021). A Deep Neural Network Model for Speaker Identification. Appl. Sci., 11.","DOI":"10.3390\/app11083603"},{"key":"ref_100","doi-asserted-by":"crossref","first-page":"e215","DOI":"10.1016\/j.jvoice.2020.05.029","article-title":"Deep Neural Network for Automatic Classification of Pathological Voice Signals","volume":"36","author":"Chen","year":"2022","journal-title":"J. Voice"},{"key":"ref_101","doi-asserted-by":"crossref","first-page":"1301","DOI":"10.1109\/JSTSP.2017.2764438","article-title":"End-to-End Multimodal Emotion Recognition Using Deep Neural Networks","volume":"11","author":"Tzirakis","year":"2017","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_102","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1016\/j.bbe.2022.03.002","article-title":"Voice disorder classification using speech enhancement and deep learning models","volume":"42","author":"Chaiani","year":"2022","journal-title":"Biocybern. Biomed. Eng."},{"key":"ref_103","doi-asserted-by":"crossref","first-page":"1753","DOI":"10.1007\/s11831-021-09647-x","article-title":"Survey on Machine Learning in Speech Emotion Recognition and Vision Systems Using a Recurrent Neural Network (RNN)","volume":"29","author":"Yadav","year":"2022","journal-title":"Arch. Comput. Methods Eng."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/1\/4\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T15:23:41Z","timestamp":1759850621000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/1\/4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,2]]},"references-count":103,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,1]]}},"alternative-id":["data10010004"],"URL":"https:\/\/doi.org\/10.3390\/data10010004","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,2]]}}}