{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T14:58:51Z","timestamp":1777561131665,"version":"3.51.4"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2024,2,2]],"date-time":"2024-02-02T00:00:00Z","timestamp":1706832000000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"publisher","award":["12371290"],"award-info":[{"award-number":["12371290"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Effective drug delivery systems are paramount in enhancing pharmaceutical outcomes, particularly through the use of cell-penetrating peptides (CPPs). These peptides are gaining prominence due to their ability to penetrate eukaryotic cells efficiently without inflicting significant damage to the cellular membrane, thereby ensuring optimal drug delivery. However, the identification and characterization of CPPs remain a challenge due to the laborious and time-consuming nature of conventional methods, despite advances in proteomics. Current computational models, however, are predominantly tailored for balanced datasets, an approach that falls short in real-world applications characterized by a scarcity of known positive CPP instances.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To navigate this shortfall, we introduce PractiCPP, a novel deep-learning framework tailored for CPP prediction in highly imbalanced data scenarios. Uniquely designed with the integration of hard negative sampling and a sophisticated feature extraction and prediction module, PractiCPP facilitates an intricate understanding and learning from imbalanced data. Our extensive computational validations highlight PractiCPP\u2019s exceptional ability to outperform existing state-of-the-art methods, demonstrating remarkable accuracy, even in datasets with an extreme positive-to-negative ratio of 1:1000. Furthermore, through methodical embedding visualizations, we have established that models trained on balanced datasets are not conducive to practical, large-scale CPP identification, as they do not accurately reflect real-world complexities. In summary, PractiCPP potentially offers new perspectives in CPP prediction methodologies. Its design and validation, informed by real-world dataset constraints, suggest its utility as a valuable tool in supporting the acceleration of drug delivery advancements.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code of PractiCPP is available on Figshare at https:\/\/doi.org\/10.6084\/m9.figshare.25053878.v1.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae058","type":"journal-article","created":{"date-parts":[[2024,2,2]],"date-time":"2024-02-02T14:31:16Z","timestamp":1706884276000},"source":"Crossref","is-referenced-by-count":23,"title":["PractiCPP: a deep learning approach tailored for extremely imbalanced datasets in cell-penetrating peptide prediction"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1276-1053","authenticated-orcid":false,"given":"Kexin","family":"Shi","sequence":"first","affiliation":[{"name":"Syneron Technology , Guangzhou 510000, China"},{"name":"Individualized Interdisciplinary Program (Data Science and Analytics), The Hong Kong University of Science and Technology , Hong Kong SAR, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0393-6184","authenticated-orcid":false,"given":"Yuanpeng","family":"Xiong","sequence":"additional","affiliation":[{"name":"Syneron Technology , Guangzhou 510000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3526-9494","authenticated-orcid":false,"given":"Yu","family":"Wang","sequence":"additional","affiliation":[{"name":"Syneron Technology , Guangzhou 510000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1339-2253","authenticated-orcid":false,"given":"Yifan","family":"Deng","sequence":"additional","affiliation":[{"name":"Syneron Technology , Guangzhou 510000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9219-0494","authenticated-orcid":false,"given":"Wenjia","family":"Wang","sequence":"additional","affiliation":[{"name":"Data Science and Analytics Thrust, The Hong Kong University of Science and Technology (Guangzhou) , Nansha, Guangzhou, 511400, Guangdong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8876-1570","authenticated-orcid":false,"given":"Bingyi","family":"Jing","sequence":"additional","affiliation":[{"name":"Department of Statistics and Data Science, Southern University of Science and Technology , Shenzhen 518000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7108-3574","authenticated-orcid":false,"given":"Xin","family":"Gao","sequence":"additional","affiliation":[{"name":"Syneron Technology , Guangzhou 510000, China"},{"name":"Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST) , Thuwal 23955, Saudi Arabia"},{"name":"Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST) , Thuwal 23955, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,2,1]]},"reference":[{"key":"2024062814445439900_btae058-B1","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1038\/nrg3356","article-title":"Next-generation proteomics: towards an integrative view of proteome dynamics","volume":"14","author":"Altelaar","year":"2013","journal-title":"Nat Rev Genet"},{"key":"2024062814445439900_btae058-B2","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1007\/s10822-020-00307-z","article-title":"TargetCPP: accurate prediction of cell-penetrating peptides from optimized multi-scale features using gradient boost decision tree","volume":"34","author":"Arif","year":"2020","journal-title":"J Comput Aided Mol Des"},{"key":"2024062814445439900_btae058-B3","doi-asserted-by":"crossref","first-page":"W39","DOI":"10.1093\/nar\/gkv416","article-title":"The meme suite","volume":"43","author":"Bailey","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2024062814445439900_btae058-B4","doi-asserted-by":"crossref","first-page":"1512","DOI":"10.1038\/s41588-023-01465-0","article-title":"Genome-wide prediction of disease variant effects with a deep protein language model","volume":"55","author":"Brandes","year":"2023","journal-title":"Nat Genet"},{"key":"2024062814445439900_btae058-B5","doi-asserted-by":"crossref","first-page":"1378","DOI":"10.1021\/acs.molpharmaceut.1c00924","article-title":"Discovery of a cyclic cell-penetrating peptide with improved endosomal escape and cytosolic delivery efficiency","volume":"19","author":"Buyanova","year":"2022","journal-title":"Mol Pharm"},{"key":"2024062814445439900_btae058-B6","doi-asserted-by":"crossref","first-page":"1184","DOI":"10.1016\/j.bbamem.2006.04.006","article-title":"Tryptophan- and arginine-rich antimicrobial peptides: structures and mechanisms of action","volume":"1758","author":"Chan","year":"2006","journal-title":"Biochim Biophys Acta"},{"key":"2024062814445439900_btae058-B7","doi-asserted-by":"crossref","first-page":"696","DOI":"10.1093\/bib\/bbv066","article-title":"Drug\u2013target interaction prediction: databases, web servers and computational models","volume":"17","author":"Chen","year":"2016","journal-title":"Brief Bioinform"},{"key":"2024062814445439900_btae058-B8","doi-asserted-by":"crossref","first-page":"D506","DOI":"10.1093\/nar\/gky1049","article-title":"Uniprot: a worldwide hub of protein knowledge","volume":"47","author":"Consortium, U","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024062814445439900_btae058-B9","doi-asserted-by":"crossref","first-page":"7628","DOI":"10.1038\/s41598-021-87134-w","article-title":"Predicting cell-penetrating peptides using machine learning algorithms and navigating in their chemical space","volume":"11","author":"de Oliveira","year":"2021","journal-title":"Sci Rep"},{"key":"2024062814445439900_btae058-B10","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1038\/embor.2008.56","article-title":"Peptideatlas: a resource for target selection for emerging targeted proteomics workflows","volume":"9","author":"Deutsch","year":"2008","journal-title":"EMBO Rep"},{"key":"2024062814445439900_btae058-B11","doi-asserted-by":"crossref","first-page":"10241","DOI":"10.1021\/acs.chemrev.9b00008","article-title":"Understanding cell penetration of cyclic peptides","volume":"119","author":"Dougherty","year":"2019","journal-title":"Chem Rev"},{"key":"2024062814445439900_btae058-B12","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2024062814445439900_btae058-B13","doi-asserted-by":"crossref","first-page":"3028","DOI":"10.1093\/bioinformatics\/btaa131","article-title":"StackCPPred: a stacking and pairwise energy content-based prediction of cell-penetrating peptides and their uptake efficiency","volume":"36","author":"Fu","year":"2020","journal-title":"Bioinformatics"},{"key":"2024062814445439900_btae058-B14","doi-asserted-by":"crossref","first-page":"bas015","DOI":"10.1093\/database\/bas015","article-title":"Cppsite: a curated database of cell penetrating peptides","volume":"2012","author":"Gautam","year":"2012","journal-title":"Database (Oxford)"},{"key":"2024062814445439900_btae058-B15","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1186\/1479-5876-11-74","article-title":"In silico approaches for designing highly effective cell penetrating peptides","volume":"11","author":"Gautam","year":"2013","journal-title":"J Transl Med"},{"key":"2024062814445439900_btae058-B16","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/978-1-4939-2806-4_4","article-title":"Computer-aided virtual screening and designing of cell-penetrating peptides","volume":"1324","author":"Gautam","year":"2015","journal-title":"Methods Mol Biol"},{"key":"2024062814445439900_btae058-B17","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/btt518","article-title":"CPPpred: prediction of cell penetrating peptides","volume":"29","author":"Holton","year":"2013","journal-title":"Bioinformatics"},{"key":"2024062814445439900_btae058-B18","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1038\/sj.mt.6300346","article-title":"Characterization of a novel cytotoxic cell-penetrating peptide derived from P14ARF protein","volume":"16","author":"Johansson","year":"2008","journal-title":"Mol Ther"},{"key":"2024062814445439900_btae058-B19","doi-asserted-by":"crossref","first-page":"166703","DOI":"10.1016\/j.jmb.2020.11.002","article-title":"Cppsite 2.0: an available database of experimentally validated cell-penetrating peptides predicting their secondary and tertiary structures","volume":"433","author":"Kardani","year":"2021","journal-title":"J Mol Biol"},{"key":"2024062814445439900_btae058-B20","first-page":"500902","article-title":"Language models of protein sequences at the scale of evolution enable accurate structure prediction","volume":"2022","author":"Lin","year":"2022","journal-title":"BioRxiv"},{"key":"2024062814445439900_btae058-B21","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1124\/dmd.32.1.132","article-title":"Development of a computational approach to predict blood\u2013brain barrier permeability","volume":"32","author":"Liu","year":"2004","journal-title":"Drug Metab Dispos"},{"key":"2024062814445439900_btae058-B22","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1021\/acs.jcim.9b00225","article-title":"PERMM: a web tool and database for analysis of passive membrane permeability and translocation pathways of bioactive molecules","volume":"59","author":"Lomize","year":"2019","journal-title":"J Chem Inf Model"},{"key":"2024062814445439900_btae058-B23","doi-asserted-by":"crossref","first-page":"167604","DOI":"10.1016\/j.jmb.2022.167604","article-title":"MLCPP 2.0: an updated cell-penetrating peptides and their uptake efficiency predictor","volume":"434","author":"Manavalan","year":"2022","journal-title":"J Mol Biol"},{"key":"2024062814445439900_btae058-B24","doi-asserted-by":"crossref","first-page":"850","DOI":"10.1016\/j.drudis.2012.03.002","article-title":"Cell-penetrating peptides: classes, origin, and current landscape","volume":"17","author":"Milletti","year":"2012","journal-title":"Drug Discov Today"},{"key":"2024062814445439900_btae058-B25","doi-asserted-by":"crossref","first-page":"4034","DOI":"10.1021\/bi5004102","article-title":"Early endosomal escape of a cyclic cell-penetrating peptide allows effective cytosolic cargo delivery","volume":"53","author":"Qian","year":"2014","journal-title":"Biochemistry"},{"key":"2024062814445439900_btae058-B26","doi-asserted-by":"crossref","first-page":"943","DOI":"10.1016\/S1074-5521(02)00189-8","article-title":"Cellular import mediated by nuclear localization signal peptide sequences","volume":"9","author":"Ragin","year":"2002","journal-title":"Chem Biol"},{"key":"2024062814445439900_btae058-B27","first-page":"273","author":"Rendle"},{"key":"2024062814445439900_btae058-B28","doi-asserted-by":"crossref","first-page":"12690","DOI":"10.1002\/chem.201702117","article-title":"Bicyclic peptides as next-generation therapeutics","volume":"23","author":"Rhodes","year":"2017","journal-title":"Chemistry"},{"key":"2024062814445439900_btae058-B29","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1074\/jbc.M209548200","article-title":"Cell-penetrating peptides: a reevaluation of the mechanism of cellular uptake","volume":"278","author":"Richard","year":"2003","journal-title":"J Biol Chem"},{"key":"2024062814445439900_btae058-B30","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J Chem Inf Model"},{"key":"2024062814445439900_btae058-B31","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/j.jconrel.2014.07.055","article-title":"Distal phenylalanine modification for enhancing cellular delivery of fluorophores, proteins and quantum dots by cell penetrating peptides","volume":"195","author":"Sayers","year":"2014","journal-title":"J Control Release"},{"key":"2024062814445439900_btae058-B32","doi-asserted-by":"crossref","first-page":"1806","DOI":"10.1016\/j.febslet.2009.11.046","article-title":"Arginine-rich cell-penetrating peptides","volume":"584","author":"Schmidt","year":"2010","journal-title":"FEBS Lett"},{"key":"2024062814445439900_btae058-B33","doi-asserted-by":"crossref","first-page":"408","DOI":"10.1093\/bib\/bby124","article-title":"Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools","volume":"21","author":"Su","year":"2020","journal-title":"Brief Bioinform"},{"key":"2024062814445439900_btae058-B34","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1016\/j.bbrc.2016.06.035","article-title":"Prediction of cell-penetrating peptides with feature selection techniques","volume":"477","author":"Tang","year":"2016","journal-title":"Biochem Biophys Res Commun"},{"key":"2024062814445439900_btae058-B35","doi-asserted-by":"crossref","first-page":"1930","DOI":"10.1038\/s41591-023-02448-8","article-title":"Large language models in medicine","volume":"29","author":"Thirunavukarasu","year":"2023","journal-title":"Nat Med"},{"key":"2024062814445439900_btae058-B36","first-page":"30","article-title":"Attention is all you need","author":"Vaswani","year":"2017","journal-title":"31st International Conference on Neural Information Processing Systems (NIPS'17)"},{"key":"2024062814445439900_btae058-B37","doi-asserted-by":"crossref","first-page":"8293","DOI":"10.1038\/s41598-017-08963-2","article-title":"Cell surface binding, uptaking and anticancer activity of l-k6, a lysine\/leucine-rich peptide, on human breast cancer mcf-7 cells","volume":"7","author":"Wang","year":"2017","journal-title":"Sci Rep"},{"key":"2024062814445439900_btae058-B38","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1186\/s12864-017-4128-1","article-title":"SkipCPP-pred: an improved and promising sequence-based predictor for predicting cell-penetrating peptides","volume":"18","author":"Wei","year":"2017","journal-title":"BMC Genomics"},{"key":"2024062814445439900_btae058-B39","doi-asserted-by":"crossref","first-page":"2044","DOI":"10.1021\/acs.jproteome.7b00019","article-title":"CPPred-rf: a sequence-based predictor for identifying cell-penetrating peptides and their uptake efficiency","volume":"16","author":"Wei","year":"2017","journal-title":"J Proteome Res"},{"key":"2024062814445439900_btae058-B40","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1093\/bioinformatics\/btv550","article-title":"Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data","volume":"32","author":"Yang","year":"2016","journal-title":"Bioinformatics"},{"key":"2024062814445439900_btae058-B41","doi-asserted-by":"crossref","first-page":"bbac545","DOI":"10.1093\/bib\/bbac545","article-title":"SiameseCPP: a sequence-based siamese network to predict cell-penetrating peptides by contrastive learning","volume":"24","author":"Zhang","year":"2023","journal-title":"Brief Bioinform"},{"key":"2024062814445439900_btae058-B42","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1016\/j.addr.2012.10.003","article-title":"Advanced materials and processing for drug delivery: the past and the future","volume":"65","author":"Zhang","year":"2013","journal-title":"Adv Drug Deliv Rev"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae058\/56555935\/btae058.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/2\/btae058\/58357988\/btae058.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/2\/btae058\/58357988\/btae058.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T17:27:42Z","timestamp":1719595662000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae058\/7596624"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2024,2,1]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae058","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,2,1]]},"published":{"date-parts":[[2024,2,1]]},"article-number":"btae058"}}