{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T17:00:36Z","timestamp":1772643636271,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"vor","delay-in-days":29,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004807","name":"DFG","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100004807","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"German Research Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Germany\u2019s Excellence Strategy","award":["390727645"],"award-info":[{"award-number":["390727645"]}]},{"name":"German Federal Ministry of Education and Research"},{"name":"Training Center Machine Learning, T\u00fcbingen","award":["01-S17054"],"award-info":[{"award-number":["01-S17054"]}]},{"name":"German Federal Ministry of Education and Research"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparum protein antigen candidates exists. The parasite P.falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is an expensive and time-consuming process, applying machine learning methods to support this process has the potential to accelerate the development of drugs and vaccines, which are needed for fighting and controlling malaria.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of P.falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for P.falciparum specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services are unable to provide sufficient performance on identifying protein antigen candidates and are outperformed by our models that were trained on this tailored data.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>PlasmoFAB is publicly available on Zenodo with DOI 10.5281\/zenodo.7433087. Furthermore, all scripts that were used in the creation of PlasmoFAB and the training and evaluation of machine learning models are open source and publicly available on GitHub here: https:\/\/github.com\/msmdev\/PlasmoFAB.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad206","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T08:14:45Z","timestamp":1688112885000},"page":"i86-i93","source":"Crossref","is-referenced-by-count":3,"title":["PlasmoFAB: a benchmark to foster machine learning for <i>Plasmodium falciparum<\/i> protein antigen candidate prediction"],"prefix":"10.1093","volume":"39","author":[{"given":"Jonas C","family":"Ditz","sequence":"first","affiliation":[{"name":"Methods in Medical Informatics, Department of Computer Science, University of T\u00fcbingen , 72076 T\u00fcbingen, Germany"}]},{"given":"Jacqueline","family":"Wistuba-Hamprecht","sequence":"additional","affiliation":[{"name":"Methods in Medical Informatics, Department of Computer Science, University of T\u00fcbingen , 72076 T\u00fcbingen, Germany"}]},{"given":"Timo","family":"Maier","sequence":"additional","affiliation":[{"name":"Methods in Medical Informatics, Department of Computer Science, University of T\u00fcbingen , 72076 T\u00fcbingen, Germany"},{"name":"Computomics GmbH , 72072 T\u00fcbingen, Germany"}]},{"given":"Rolf","family":"Fendel","sequence":"additional","affiliation":[{"name":"Institute of Tropical Medicine, University Hospital T\u00fcbingen , 72074 T\u00fcbingen, Germany"},{"name":"German Center for Infection Research (DZIF), Partner Site T\u00fcbingen , T\u00fcbingen, Germany"}]},{"given":"Nico","family":"Pfeifer","sequence":"additional","affiliation":[{"name":"Methods in Medical Informatics, Department of Computer Science, University of T\u00fcbingen , 72076 T\u00fcbingen, Germany"}]},{"given":"Bernhard","family":"Reuter","sequence":"additional","affiliation":[{"name":"Methods in Medical Informatics, Department of Computer Science, University of T\u00fcbingen , 72076 T\u00fcbingen, Germany"}]}],"member":"286","published-online":{"date-parts":[[2023,6,30]]},"reference":[{"key":"2023063008143135900_btad206-B1","doi-asserted-by":"crossref","first-page":"3387","DOI":"10.1093\/bioinformatics\/btx431","article-title":"Deeploc: prediction of protein subcellular localization using deep learning","volume":"33","author":"Almagro Armenteros","year":"2017","journal-title":"Bioinformatics"},{"key":"2023063008143135900_btad206-B2","doi-asserted-by":"crossref","first-page":"D898","DOI":"10.1093\/nar\/gkab929","article-title":"Veupathdb: the eukaryotic pathogen, vector and host bioinformatics resource center","volume":"50","author":"Amos","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023063008143135900_btad206-B3","doi-asserted-by":"crossref","first-page":"962","DOI":"10.1111\/cbdd.13821","article-title":"Plasmodium falciparum fikk9. 1 is a monomeric serine-threonine protein kinase with features to exploit as a drug target","volume":"97","author":"Anil Kumar","year":"2021","journal-title":"Chem Biol Drug Des"},{"key":"2023063008143135900_btad206-B4","doi-asserted-by":"crossref","first-page":"D154","DOI":"10.1093\/nar\/gki070","article-title":"The universal protein resource (uniprot)","volume":"33","author":"Bairoch","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023063008143135900_btad206-B5","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1038\/nmeth0610-429","article-title":"Making membrane proteins for structures: a trillion tiny tweaks","volume":"7","author":"Baker","year":"2010","journal-title":"Nat Methods"},{"key":"2023063008143135900_btad206-B6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-019-6413-7","article-title":"The advantages of the Matthews correlation coefficient (MCC) over f1 score and accuracy in binary classification evaluation","volume":"21","author":"Chicco","year":"2020","journal-title":"BMC Genomics"},{"key":"2023063008143135900_btad206-B7","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: Toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2023063008143135900_btad206-B8","doi-asserted-by":"crossref","first-page":"e108","DOI":"10.1002\/cpbi.108","article-title":"Protein sequence analysis using the MPI bioinformatics toolkit","volume":"72","author":"Gabler","year":"2020","journal-title":"Curr Protoc Bioinformatics"},{"key":"2023063008143135900_btad206-B9","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1038\/nature01097","article-title":"Genome sequence of the human malaria parasite Plasmodium falciparum","volume":"419","author":"Gardner","year":"2002","journal-title":"Nature"},{"key":"2023063008143135900_btad206-B10","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1145\/3458723","article-title":"Datasheets for datasets","volume":"64","author":"Gebru","year":"2021","journal-title":"Commun ACM"},{"key":"2023063008143135900_btad206-B11","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.drup.2014.10.004","article-title":"The conserved clag multigene family of malaria parasites: essential roles in host\u2013pathogen interaction","volume":"18","author":"Gupta","year":"2015","journal-title":"Drug Resist Updat"},{"key":"2023063008143135900_btad206-B12","author":"Hallgren","year":"2022"},{"key":"2023063008143135900_btad206-B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-022-30133-w","article-title":"Malaria in 2022: increasing challenges, cautious optimism","volume":"13","author":"Jagannathan","year":"2022","journal-title":"Nat Commun"},{"key":"2023063008143135900_btad206-B14","doi-asserted-by":"crossref","first-page":"664","DOI":"10.1016\/j.pt.2021.04.009","article-title":"Defining the essential exportome of the malaria parasite","volume":"37","author":"Jonsdottir","year":"2021","journal-title":"Trends Parasitol"},{"key":"2023063008143135900_btad206-B15","doi-asserted-by":"crossref","first-page":"W429","DOI":"10.1093\/nar\/gkm256","article-title":"Advantages of combined transmembrane topology and signal peptide prediction\u2014the phobius web server","volume":"35","author":"K\u00e4ll","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023063008143135900_btad206-B16","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1006\/jmbi.2000.4315","article-title":"Predicting transmembrane protein topology with a hidden markov model: application to complete genomes","volume":"305","author":"Krogh","year":"2001","journal-title":"J Mol Biol"},{"key":"2023063008143135900_btad206-B17","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1186\/1471-2105-5-169","article-title":"Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites","volume":"5","author":"Meinicke","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023063008143135900_btad206-B18","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1038\/nature21060","article-title":"Sterile protection against human malaria by chemoattenuated PfSPZ vaccine","volume":"542","author":"Mordm\u00fcller","year":"2017","journal-title":"Nature"},{"key":"2023063008143135900_btad206-B19","doi-asserted-by":"crossref","first-page":"e00027","DOI":"10.1128\/mSphereDirect.00027-19","article-title":"Antibody biomarkers associated with sterile protection induced by controlled human malaria infection under chloroquine prophylaxis","volume":"4","author":"Obiero","year":"2019","journal-title":"Msphere"},{"key":"2023063008143135900_btad206-B20","doi-asserted-by":"crossref","first-page":"1111","DOI":"10.1056\/NEJMoa1207564","article-title":"Four-year efficacy of RTS, S\/AS01e and its interaction with malaria exposure","volume":"368","author":"Olotu","year":"2013","journal-title":"N Engl J Med"},{"key":"2023063008143135900_btad206-B21","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.molbiopara.2010.01.003","article-title":"The host targeting motif in exported plasmodium proteins is cleaved in the parasite endoplasmic reticulum","volume":"171","author":"Osborne","year":"2010","journal-title":"Mol Biochem Parasitol"},{"key":"2023063008143135900_btad206-B22","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2023063008143135900_btad206-B23","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res"},{"key":"2023063008143135900_btad206-B24","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1038\/nm.3083","article-title":"Immune mechanisms in malaria: new insights in vaccine development","volume":"19","author":"Riley","year":"2013","journal-title":"Nat Med"},{"key":"2023063008143135900_btad206-B25","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023063008143135900_btad206-B26","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/S0140-6736(15)60721-8","article-title":"Efficacy and safety of RTS, S\/AS01 malaria vaccine with or without a booster dose in infants and children in Africa: final results of a phase 3, individually randomised, controlled trial","volume":"386","author":"RTS,S Clinical Trials Partnership","year":"2015","journal-title":"Lancet"},{"key":"2023063008143135900_btad206-B27","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1038\/s41592-019-0437-4","article-title":"Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold","volume":"16","author":"Steinegger","year":"2019","journal-title":"Nat Methods"},{"key":"2023063008143135900_btad206-B28","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","article-title":"Uniref clusters: a comprehensive and scalable alternative for improving sequence similarity searches","volume":"31","author":"Suzek","year":"2015","journal-title":"Bioinformatics"},{"key":"2023063008143135900_btad206-B29","doi-asserted-by":"crossref","first-page":"e1005606","DOI":"10.1371\/journal.ppat.1005606","article-title":"Interrogating the plasmodium sporozoite surface: identification of surface-exposed proteins and demonstration of glycosylation on CSP and TRAP by mass spectrometry-based proteomics","volume":"12","author":"Swearingen","year":"2016","journal-title":"PLoS Pathog"},{"key":"2023063008143135900_btad206-B30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12880-015-0068-x","article-title":"Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool","volume":"15","author":"Taha","year":"2015","journal-title":"BMC Med Imaging"},{"key":"2023063008143135900_btad206-B31","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1016\/j.molbiopara.2014.07.011","article-title":"A conserved domain targets exported PHISTb family proteins to the periphery of plasmodium infected erythrocytes","volume":"196","author":"Tarr","year":"2014","journal-title":"Mol Biochem Parasitol"},{"key":"2023063008143135900_btad206-B32","doi-asserted-by":"crossref","first-page":"D523","DOI":"10.1093\/nar\/gkac1052","article-title":"Uniprot: the universal protein knowledgebase in 2023","volume":"51","author":"The UniProt Consortium","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2023063008143135900_btad206-B33","doi-asserted-by":"crossref","first-page":"W228","DOI":"10.1093\/nar\/gkac278","article-title":"Deeploc 2.0: multi-label subcellular localization prediction using protein language models","volume":"50","author":"Thumuluri","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023063008143135900_btad206-B34","doi-asserted-by":"crossref","first-page":"4670","DOI":"10.1111\/j.1742-4658.2007.05997.x","article-title":"Malaria\u2014an overview","volume":"274","author":"Tuteja","year":"2007","journal-title":"FEBS J"},{"key":"2023063008143135900_btad206-B35","doi-asserted-by":"crossref","first-page":"D339","DOI":"10.1093\/nar\/gky1006","article-title":"The immune epitope database (IEDB): 2018 update","volume":"47","author":"Vita","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023063008143135900_btad206-B36","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1038\/nrmicro.2017.47","article-title":"Variant surface antigens of Plasmodium falciparum and their roles in severe malaria","volume":"15","author":"Wahlgren","year":"2017","journal-title":"Nat Rev Microbiol"},{"key":"2023063008143135900_btad206-B37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2016.18","article-title":"The fair guiding principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci Data"},{"key":"2023063008143135900_btad206-B38","doi-asserted-by":"crossref","DOI":"10.30875\/6c551ba0-en","volume-title":"World Malaria Report 2021","author":"World Health Organization","year":"2021"},{"key":"2023063008143135900_btad206-B39","volume-title":"World Malaria Report 2022","author":"World Health Organization","year":"2022"},{"key":"2023063008143135900_btad206-B40","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1053\/j.semdp.2019.04.014","article-title":"Evaluation of the sick returned traveler","volume":"36","author":"Wu","year":"2019","journal-title":"Semin Diagn Pathol"},{"key":"2023063008143135900_btad206-B41","first-page":"23519","article-title":"Towards a theoretical framework of out-of-distribution generalization","volume":"34","author":"Ye","year":"2021","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023063008143135900_btad206-B42","doi-asserted-by":"crossref","first-page":"2237","DOI":"10.1016\/j.jmb.2017.12.007","article-title":"A completely reimplemented MPI bioinformatics toolkit with a new hhpred server at its core","volume":"430","author":"Zimmermann","year":"2018","journal-title":"J Mol Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/Supplement_1\/i86\/50741386\/btad206.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/Supplement_1\/i86\/50741386\/btad206.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T08:15:38Z","timestamp":1688112938000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/39\/Supplement_1\/i86\/7210434"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,1]]},"references-count":42,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2023,6,30]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad206","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,6,1]]},"published":{"date-parts":[[2023,6,1]]}}}