{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"institution":[{"name":"medRxiv"}],"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T09:00:44Z","timestamp":1768554044952,"version":"3.49.0"},"posted":{"date-parts":[[2022,10,7]]},"group-title":"Infectious Diseases (except HIV\/AIDS)","reference-count":59,"publisher":"openRxiv","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"accepted":{"date-parts":[[2023,9,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                <jats:sec>\n                  <jats:title>Background<\/jats:title>\n                  <jats:p>Nowadays, the chance of discovering the best antibody candidates for explaining naturally acquired protection to malaria and detecting exposure to malaria parasites has notably increased due to publicly available multi-sera data. The analysis of these data is typically divided into a feature selection phase followed by a predictive one where several models are constructed for the outcome of interest. A key question in the analysis is to determine which and how each feature should be included in the predictive stage.<\/jats:p>\n                <\/jats:sec>\n                <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>\n                    To answer this question, we developed three approaches for classifying malaria protected and susceptible groups: (i) a basic and simple approach based on selecting antibodies via the nonparametric Mann-Whitney test; (ii) a dichotomization approach where each antibody was selected according to the optimal cut-off via maximization of the \u03c7\n                    <jats:sup>2<\/jats:sup>\n                    statistic for two-way tables; (iii) a hybrid parametric\/non-parametric approach that integrates Box-Cox transformation followed by a t-test, together with the use of finite mixture models and the Mann-Whitney test as a last resort. 
We illustrated the application of these three approaches with published serological data for predicting clinical malaria in 121 Kenyan children. The predictive analysis was based on a Super-Learner where predictions from multiple classifiers were pooled together. Our results led to similar areas under the Receiver Operating Characteristic curves of 0.72 (95% CI = [0.61, 0.82]), 0.80 (95% CI = [0.71, 0.90]), 0.79 (95% CI = [0.7, 0.88]) for the simple, dichotomization and hybrid approaches, respectively.\n                  <\/jats:p>\n                <\/jats:sec>\n                <jats:sec>\n                  <jats:title>Conclusions<\/jats:title>\n                  <jats:p>The three feature selection strategies provided a better predictive performance of the outcome when compared to the previous results relying solely on Random Forests (AUC=0.68). Given the similar predictive performance, we recommend that the three strategies be used in conjunction on the same data set and selected according to their complexity.<\/jats:p>\n                <\/jats:sec>","DOI":"10.1101\/2022.10.06.22280719","type":"posted-content","created":{"date-parts":[[2022,10,7]],"date-time":"2022-10-07T12:06:22Z","timestamp":1665144382000},"source":"Crossref","is-referenced-by-count":0,"title":["Antibody selection strategies and their impact in the analysis of malaria multi-sera 
data"],"prefix":"10.64898","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8249-0354","authenticated-orcid":false,"given":"Andr\u00e9","family":"Fonseca","sequence":"first","affiliation":[]},{"given":"Mikolaj","family":"Spytek","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8423-1823","authenticated-orcid":false,"given":"Przemyslaw","family":"Biecek","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1026-6078","authenticated-orcid":false,"given":"Clara","family":"Cordeiro","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8542-1706","authenticated-orcid":false,"given":"Nuno","family":"Sep\u00falveda","sequence":"additional","affiliation":[]}],"member":"54368","reference":[{"key":"2023090804001099000_2022.10.06.22280719v2.1","doi-asserted-by":"publisher","DOI":"10.1002\/1097-0320(20010901)45:1<27::AID-CYTO1141>3.0.CO;2-I"},{"key":"2023090804001099000_2022.10.06.22280719v2.2","doi-asserted-by":"publisher","DOI":"10.1128\/IAI.01539-07"},{"key":"2023090804001099000_2022.10.06.22280719v2.3","doi-asserted-by":"publisher","DOI":"10.1186\/s12936-018-2365-7"},{"key":"2023090804001099000_2022.10.06.22280719v2.4","doi-asserted-by":"publisher","DOI":"10.1186\/1475-2875-7-108"},{"key":"2023090804001099000_2022.10.06.22280719v2.5","doi-asserted-by":"publisher","DOI":"10.1016\/j.vaccine.2017.01.001"},{"key":"2023090804001099000_2022.10.06.22280719v2.6","doi-asserted-by":"publisher","DOI":"10.3389\/fimmu.2020.00893"},{"key":"2023090804001099000_2022.10.06.22280719v2.7","doi-asserted-by":"publisher","DOI":"10.1074\/mcp.RA118.001256"},{"key":"2023090804001099000_2022.10.06.22280719v2.8","doi-asserted-by":"publisher","DOI":"10.1126\/scitranslmed.3008705"},{"key":"2023090804001099000_2022.10.06.22280719v2.9","doi-asserted-by":"publisher","DOI":"10.1128\/IAI.01585-07"},{"key":"2023090804001099000_2022.10.06.22280719v2.10","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.28673"},{"
key":"2023090804001099000_2022.10.06.22280719v2.11","doi-asserted-by":"publisher","DOI":"10.3389\/fimmu.2020.00928"},{"key":"2023090804001099000_2022.10.06.22280719v2.12","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-020-0841-4"},{"key":"2023090804001099000_2022.10.06.22280719v2.13","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1501705112"},{"key":"2023090804001099000_2022.10.06.22280719v2.14","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1001323107"},{"key":"2023090804001099000_2022.10.06.22280719v2.15","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1005812"},{"key":"2023090804001099000_2022.10.06.22280719v2.16","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-020-57876-0"},{"key":"2023090804001099000_2022.10.06.22280719v2.17","doi-asserted-by":"publisher","DOI":"10.12688\/wellcomeopenres.14950.2"},{"key":"2023090804001099000_2022.10.06.22280719v2.18","doi-asserted-by":"publisher","DOI":"10.1186\/1475-2875-9-317"},{"key":"2023090804001099000_2022.10.06.22280719v2.19","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"2023090804001099000_2022.10.06.22280719v2.20","doi-asserted-by":"publisher","DOI":"10.14419\/ijet.v7i4.11.20790"},{"key":"2023090804001099000_2022.10.06.22280719v2.21","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v077.i01"},{"key":"2023090804001099000_2022.10.06.22280719v2.22","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(96)00142-2"},{"key":"2023090804001099000_2022.10.06.22280719v2.23","unstructured":"Kuhn M. caret: Classification and Regression Training. Published online 2022. Accessed May 26, 2022. 
https:\/\/CRAN.R-project.org\/package=caret"},{"key":"2023090804001099000_2022.10.06.22280719v2.24","doi-asserted-by":"publisher","DOI":"10.20982\/tqmp.04.1.p013"},{"key":"2023090804001099000_2022.10.06.22280719v2.25","doi-asserted-by":"publisher","DOI":"10.3389\/fmed.2021.686736"},{"key":"2023090804001099000_2022.10.06.22280719v2.26","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1902623116"},{"key":"2023090804001099000_2022.10.06.22280719v2.27","doi-asserted-by":"publisher","DOI":"10.1080\/03610918.2014.957839"},{"key":"2023090804001099000_2022.10.06.22280719v2.28","doi-asserted-by":"publisher","DOI":"10.1155\/2015\/738030"},{"key":"2023090804001099000_2022.10.06.22280719v2.29","doi-asserted-by":"publisher","DOI":"10.1101\/2021.03.08.21252807"},{"key":"2023090804001099000_2022.10.06.22280719v2.30","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1013699998"},{"key":"2023090804001099000_2022.10.06.22280719v2.31","doi-asserted-by":"publisher","DOI":"10.2202\/1544-6115.1309"},{"key":"2023090804001099000_2022.10.06.22280719v2.32","unstructured":"Polley E, LeDell E, Kennedy C, Van der Laan M. SuperLearner: Super Learner Prediction. Published online 2021. Accessed March 13, 2023. https:\/\/CRAN.R-project.org\/package=SuperLearner"},{"key":"2023090804001099000_2022.10.06.22280719v2.33","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143874"},{"key":"2023090804001099000_2022.10.06.22280719v2.34","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/1229\/1\/012055"},{"key":"2023090804001099000_2022.10.06.22280719v2.35","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v061.i08"},{"key":"2023090804001099000_2022.10.06.22280719v2.36","doi-asserted-by":"publisher","DOI":"10.1145\/3494672"},{"key":"2023090804001099000_2022.10.06.22280719v2.37","doi-asserted-by":"publisher","DOI":"10.1007\/s44176-022-00006-z"},{"key":"2023090804001099000_2022.10.06.22280719v2.38","unstructured":"R Core Team. R: A Language and Environment for Statistical Computing. 
Published online 2022. Accessed October 26, 2022. https:\/\/www.R-project.org\/"},{"key":"2023090804001099000_2022.10.06.22280719v2.39","doi-asserted-by":"publisher","DOI":"10.1080\/03610918.2016.1204458"},{"key":"2023090804001099000_2022.10.06.22280719v2.40","unstructured":"Microsoft Corporation, Weston S. doParallel: Foreach Parallel Adaptor for the \u201cparallel\u201d Package. Published online 2022. Accessed March 23, 2023. https:\/\/CRAN.R-project.org\/package=doParallel"},{"key":"2023090804001099000_2022.10.06.22280719v2.41","unstructured":"Wickham H, Fran\u00e7ois R, Henry L, M\u00fcller K. dplyr: A Grammar of Data Manipulation. Published online 2021. Accessed March 14, 2022. https:\/\/CRAN.R-project.org\/package=dplyr"},{"key":"2023090804001099000_2022.10.06.22280719v2.42","doi-asserted-by":"crossref","unstructured":"Wickham H. ggplot2: Elegant Graphics for Data Analysis. Published online 2016. Accessed March 13, 2023. https:\/\/ggplot2.tidyverse.org","DOI":"10.1007\/978-3-319-24277-4"},{"key":"2023090804001099000_2022.10.06.22280719v2.43","unstructured":"Slowikowski K. ggrepel: Automatically Position Non-Overlapping Text Labels with \u201cggplot2.\u201d Published online 2023. Accessed April 11, 2023. https:\/\/CRAN.R-project.org\/package=ggrepel"},{"key":"2023090804001099000_2022.10.06.22280719v2.44","unstructured":"Hothorn T, Zeileis A, Farebrother WR, et al. lmtest: Testing Linear Regression Models. Published online March 21, 2022. Accessed January 27, 2023. https:\/\/CRAN.R-project.org\/doc\/Rnews\/"},{"key":"2023090804001099000_2022.10.06.22280719v2.45","doi-asserted-by":"crossref","unstructured":"Venables WN, Ripley BD. Modern Applied Statistics with S. Fourth.; 2002. Accessed April 23, 2022. 
https:\/\/www.stats.ox.ac.uk\/pub\/MASS4\/","DOI":"10.1007\/978-0-387-21706-2"},{"key":"2023090804001099000_2022.10.06.22280719v2.46","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v054.i12"},{"key":"2023090804001099000_2022.10.06.22280719v2.47","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-12-77"},{"key":"2023090804001099000_2022.10.06.22280719v2.48","unstructured":"Azzalini A. sn: The Skew-Normal and Related Distributions Such as the Skew-t and the SUN. Published online April 4, 2023. Accessed May 18, 2022. http:\/\/azzalini.stat.unipd.it\/SN\/"},{"key":"2023090804001099000_2022.10.06.22280719v2.49","unstructured":"Wickham H. tidyr: Tidy Messy Data. Published online 2021. Accessed March 13, 2023. https:\/\/CRAN.R-project.org\/package=tidyr"},{"key":"2023090804001099000_2022.10.06.22280719v2.50","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijpara.2016.06.002"},{"key":"2023090804001099000_2022.10.06.22280719v2.51","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-017-02646-2"},{"key":"2023090804001099000_2022.10.06.22280719v2.52","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0273106"},{"key":"2023090804001099000_2022.10.06.22280719v2.53","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2014.05.042"},{"key":"2023090804001099000_2022.10.06.22280719v2.54","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2005.11.001"},{"key":"2023090804001099000_2022.10.06.22280719v2.55","doi-asserted-by":"publisher","DOI":"10.1145\/980972.980974"},{"key":"2023090804001099000_2022.10.06.22280719v2.56","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm344"},{"key":"2023090804001099000_2022.10.06.22280719v2.57","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2004.01.007"},{"key":"2023090804001099000_2022.10.06.22280719v2.58","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.65776"},{"key":"2023090804001099000_2022.10.06.22280719v2.59","doi-asserted-by":"publisher","DOI":"10.1080\/03610926.2020.1764042"}],"container-title":[
],"original-title":[],"link":[{"URL":"https:\/\/syndication.highwire.org\/content\/doi\/10.1101\/2022.10.06.22280719","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T14:51:49Z","timestamp":1768488709000},"score":1,"resource":{"primary":{"URL":"http:\/\/medrxiv.org\/lookup\/doi\/10.1101\/2022.10.06.22280719"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,7]]},"references-count":59,"URL":"https:\/\/doi.org\/10.1101\/2022.10.06.22280719","relation":{},"subject":[],"published":{"date-parts":[[2022,10,7]]},"subtype":"preprint"}}