{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T06:21:53Z","timestamp":1773901313875,"version":"3.50.1"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T00:00:00Z","timestamp":1672272000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Australian Government Research Training Program Scholarship"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Subcellular localization of human proteins is essential to comprehend their functions and roles in physiological processes, which in turn helps in diagnostic and prognostic studies of pathological conditions and impacts clinical decision-making. Since proteins reside at multiple locations at the same time and a few subcellular locations host far more proteins than other locations, the computational task for their subcellular localization is to train a multilabel classifier while handling data imbalance. In imbalanced data, minority classes are underrepresented, thus leading to a heavy bias towards the majority classes and the degradation of predictive capability for the minority classes. 
Furthermore, data imbalance in multilabel settings is an even more complex problem due to the coexistence of majority and minority classes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Our studies reveal that based on the extent of concurrence of majority and minority classes, oversampling of minority samples through appropriate data augmentation techniques holds promising scope for boosting the classification performance for the minority classes. We measured the magnitude of data imbalance per class and the concurrence of majority and minority classes in the dataset. Based on the obtained values, we identified minority and medium classes, and a new oversampling method is proposed that includes non-linear mixup, geometric and colour transformations for data augmentation and a sampling approach to prepare minibatches. Performance evaluation on the Human Protein Atlas Kaggle challenge dataset shows that the proposed method is capable of achieving better predictions for minority classes than existing methods.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Data used in this study are available at https:\/\/www.kaggle.com\/competitions\/human-protein-atlas-image-classification\/data. 
Source code is available at https:\/\/github.com\/priyarana\/Protein-subcellular-localisation-method.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac841","type":"journal-article","created":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T08:26:11Z","timestamp":1672302371000},"source":"Crossref","is-referenced-by-count":20,"title":["Imbalanced classification for protein subcellular localization with multilabel oversampling"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6468-8236","authenticated-orcid":false,"given":"Priyanka","family":"Rana","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales , Sydney, NSW 2052, Australia"}]},{"given":"Arcot","family":"Sowmya","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales , Sydney, NSW 2052, Australia"}]},{"given":"Erik","family":"Meijering","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales , Sydney, NSW 2052, Australia"}]},{"given":"Yang","family":"Song","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales , Sydney, NSW 2052, Australia"}]}],"member":"286","published-online":{"date-parts":[[2022,12,29]]},"reference":[{"key":"2023010802584335900_btac841-B1","doi-asserted-by":"crossref","first-page":"83591","DOI":"10.1109\/ACCESS.2022.3197189","article-title":"A convolutional neural network-based framework for classification of protein localization using confocal microscopy images","volume":"10","author":"Aggarwal","year":"2022","journal-title":"IEEE 
Access"},{"key":"2023010802584335900_btac841-B2","first-page":"241","author":"Arcamone","year":"2021"},{"key":"2023010802584335900_btac841-B3","first-page":"4413","author":"Berman","year":"2018"},{"key":"2023010802584335900_btac841-B4","first-page":"150","author":"Charte","year":"2013"},{"key":"2023010802584335900_btac841-B5","first-page":"110","author":"Charte","year":"2014"},{"key":"2023010802584335900_btac841-B6","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.neucom.2014.08.091","article-title":"Addressing imbalance in multilabel classification: measures and random resampling algorithms","volume":"163","author":"Charte","year":"2015","journal-title":"Neurocomputing"},{"key":"2023010802584335900_btac841-B7","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.knosys.2015.07.019","article-title":"MLSMOTE: approaching imbalanced multilabel learning through synthetic instance generation","volume":"89","author":"Charte","year":"2015","journal-title":"Knowl. Based Syst"},{"key":"2023010802584335900_btac841-B8","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1016\/j.neucom.2016.08.158","article-title":"Dealing with difficult minority labels in imbalanced mutilabel data sets","volume":"326-327","author":"Charte","year":"2019","journal-title":"Neurocomputing"},{"key":"2023010802584335900_btac841-B9","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.neucom.2017.01.118","article-title":"REMEDIAL-HwR: tackling multilabel imbalance through label decoupling and data resampling hybridization","author":"Charte","year":"2019","journal-title":"Neurocomputing"},{"key":"2023010802584335900_btac841-B10","first-page":"95","author":"Chou","year":"2020"},{"key":"2023010802584335900_btac841-B11","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1002\/jcb.20879","article-title":"Predicting protein subcellular location by fusing multiple classifiers","volume":"99","author":"Chou","year":"2006","journal-title":"J. Cell. 
Biochem"},{"key":"2023010802584335900_btac841-B12","doi-asserted-by":"crossref","first-page":"i7","DOI":"10.1093\/bioinformatics\/btq220","article-title":"Quantifying the distribution of probes between subcellular locations using unsupervised pattern unmixing","volume":"26","author":"Coelho","year":"2010","journal-title":"Bioinformatics"},{"key":"2023010802584335900_btac841-B13","first-page":"4690","author":"Deng","year":"2019"},{"key":"2023010802584335900_btac841-B14","doi-asserted-by":"crossref","first-page":"2993","DOI":"10.1016\/j.patcog.2015.04.005","article-title":"Deep feature learning with relative distance comparison for person re-identification","volume":"48","author":"Ding","year":"2015","journal-title":"Patt. Recogn"},{"key":"2023010802584335900_btac841-B15","author":"Elisseeff","year":"2001"},{"key":"2023010802584335900_btac841-B16","first-page":"323","author":"Galdran","year":"2021"},{"key":"2023010802584335900_btac841-B17","first-page":"770","author":"He","year":"2016"},{"key":"2023010802584335900_btac841-B18","doi-asserted-by":"crossref","first-page":"193907","DOI":"10.1109\/ACCESS.2020.3031549","article-title":"Contrastive representation learning: a framework and review","volume":"8","author":"Le-Khac","year":"2020","journal-title":"IEEE Access"},{"key":"2023010802584335900_btac841-B19","first-page":"2980","author":"Lin","year":"2017"},{"key":"2023010802584335900_btac841-B20","doi-asserted-by":"crossref","first-page":"1254","DOI":"10.1038\/s41592-019-0658-6","article-title":"Analysis of the human protein atlas image classification competition","volume":"16","author":"Ouyang","year":"2019","journal-title":"Nat. 
Methods"},{"key":"2023010802584335900_btac841-B21","doi-asserted-by":"crossref","first-page":"2944","DOI":"10.1073\/pnas.0912090107","article-title":"Determining the distribution of probes between different subcellular locations through automated unmixing of subcellular patterns","volume":"107","author":"Peng","year":"2010","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023010802584335900_btac841-B22","first-page":"1929","author":"Rana","year":"2021"},{"key":"2023010802584335900_btac841-B23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-022-22882-x","article-title":"Data augmentation with improved regularisation and sampling for imbalanced blood cell image classification","volume":"12","author":"Rana","year":"2022","journal-title":"Sci. Rep"},{"key":"2023010802584335900_btac841-B24","first-page":"1","author":"Rana","year":"2022"},{"key":"2023010802584335900_btac841-B25","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1214\/aoms\/1177729586","article-title":"A stochastic approximation method","volume":"22","author":"Robbins","year":"1951","journal-title":"Ann. Math. Statist"},{"key":"2023010802584335900_btac841-B26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-019-0197-0","article-title":"A survey on image data augmentation for deep learning","volume":"6","author":"Shorten","year":"2019","journal-title":"J. Big Data"},{"key":"2023010802584335900_btac841-B27","first-page":"464","author":"Smith","year":"2017"},{"key":"2023010802584335900_btac841-B28","doi-asserted-by":"crossref","first-page":"107965","DOI":"10.1016\/j.patcog.2021.107965","article-title":"A review of methods for imbalanced multi-label classification","volume":"118","author":"Tarekegn","year":"2021","journal-title":"Patt. 
Recogn"},{"key":"2023010802584335900_btac841-B29","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1002\/pro.3307","article-title":"The human protein atlas: a spatial map of the human proteome","volume":"27","author":"Thul","year":"2018","journal-title":"Protein Sci"},{"key":"2023010802584335900_btac841-B30","doi-asserted-by":"crossref","first-page":"bbab605","DOI":"10.1093\/bib\/bbab605","article-title":"SIFLoc: a self-supervised pre-training method for enhancing the recognition of protein subcellular localization in immunofluorescence microscopic images","volume":"23","author":"Tu","year":"2022","journal-title":"Brief. Bioinformatics"},{"key":"2023010802584335900_btac841-B31","first-page":"6438","author":"Verma","year":"2019"},{"key":"2023010802584335900_btac841-B32","first-page":"230","author":"Wang","year":"2020"},{"key":"2023010802584335900_btac841-B33","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1007\/s11704-016-6309-5","article-title":"Bioimage-based protein subcellular location prediction: a comprehensive review","volume":"12","author":"Xu","year":"2018","journal-title":"Front. Comput. Sci"},{"key":"2023010802584335900_btac841-B34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-021-04196-3","article-title":"Multi-labelled proteins recognition for high-throughput microscopy images using deep convolutional neural networks","volume":"22","author":"Zhang","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2023010802584335900_btac841-B35","first-page":"1","author":"Zhang","year":"2018"},{"key":"2023010802584335900_btac841-B36","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1007\/s11063-009-9095-3","article-title":"ML-RBF: RBF neural networks for multi-label learning","volume":"29","author":"Zhang","year":"2009","journal-title":"Neural Process. 
Lett"},{"key":"2023010802584335900_btac841-B37","doi-asserted-by":"crossref","first-page":"1338","DOI":"10.1109\/TKDE.2006.162","article-title":"Multilabel neural networks with applications to functional genomics and text categorization","volume":"18","author":"Zhang","year":"2006","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"2023010802584335900_btac841-B38","doi-asserted-by":"crossref","first-page":"2038","DOI":"10.1016\/j.patcog.2006.12.019","article-title":"ML-KNN: a lazy learning approach to multi-label learning","volume":"40","author":"Zhang","year":"2007","journal-title":"Patt. Recogn"},{"key":"2023010802584335900_btac841-B39","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1049\/cje.2020.00.330","article-title":"Prediction of protein subcellular localization based on microscopic images via multi-task multi-instance learning","volume":"31","author":"Zhang","year":"2022","journal-title":"Chin. J. Electron"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac841\/48443101\/btac841.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac841\/48514064\/btac841.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac841\/48514064\/btac841.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,7]],"date-time":"2023-01-07T21:59:23Z","timestamp":1673128763000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btac841\/6965017"}},"subtitle":[],"editor":[{"given":"Hanchuan","family":"Peng","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,12,29]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac841","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.09.12.507675","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,1,1]]},"published":{"date-parts":[[2022,12,29]]},"article-number":"btac841"}}