{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T19:44:22Z","timestamp":1773431062267,"version":"3.50.1"},"reference-count":53,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,2,18]],"date-time":"2022-02-18T00:00:00Z","timestamp":1645142400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"FEDER---PT2020 partnership agreement","award":["UIDB\/50008\/2020"],"award-info":[{"award-number":["UIDB\/50008\/2020"]}]},{"name":"FCT\/MEC through national funds","award":["FCT\/MEC"],"award-info":[{"award-number":["FCT\/MEC"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Large-scale labeled datasets are generally necessary for successfully training a deep neural network in the computer vision domain. In order to avoid the costly and tedious work of manually annotating image datasets, self-supervised learning methods have been proposed to learn general visual features automatically. In this paper, we first focus on image colorization with generative adversarial networks (GANs) because of their ability to generate the most realistic colorization results. Then, via transfer learning, we use this as a proxy task for visual understanding. Particularly, we propose to use conditional GANs (cGANs) for image colorization and transfer the gained knowledge to two other downstream tasks, namely, multilabel image classification and semantic segmentation. This is the first time that GANs have been used for self-supervised feature learning through image colorization. Through extensive experiments with the COCO and Pascal datasets, we show an increase of 5% for the classification task and 2.5% for the segmentation task. This demonstrates that image colorization with conditional GANs can boost other downstream tasks\u2019 performance without the need for manual annotation.<\/jats:p>","DOI":"10.3390\/s22041599","type":"journal-article","created":{"date-parts":[[2022,2,21]],"date-time":"2022-02-21T08:34:47Z","timestamp":1645432487000},"page":"1599","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["GAN-Based Image Colorization for Self-Supervised Visual Feature Learning"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3101-8495","authenticated-orcid":false,"given":"Sandra","family":"Treneska","sequence":"first","affiliation":[{"name":"Faculty of Computer Science and Engineering, University Ss. Cyril and Methodius, 1000 Skopje, North Macedonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7664-0168","authenticated-orcid":false,"given":"Eftim","family":"Zdravevski","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, University Ss. Cyril and Methodius, 1000 Skopje, North Macedonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3394-6762","authenticated-orcid":false,"given":"Ivan Miguel","family":"Pires","sequence":"additional","affiliation":[{"name":"Instituto de Telecomunica\u00e7\u00f5es, Universidade da Beira Interior, 6200-001 Covilh\u00e3, Portugal"},{"name":"Escola de Ci\u00eancias e Tecnologias, University of Tr\u00e1s-os-Montes e Alto Douro, Quinta de Prados, 5001-801 Vila Real, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5336-1796","authenticated-orcid":false,"given":"Petre","family":"Lameski","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, University Ss. Cyril and Methodius, 1000 Skopje, North Macedonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3411-2399","authenticated-orcid":false,"given":"Sonja","family":"Gievska","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, University Ss. Cyril and Methodius, 1000 Skopje, North Macedonia"}]}],"member":"1968","published-online":{"date-parts":[[2022,2,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_3","first-page":"91","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. (2015, January 7\u201312). Show and tell: A neural image caption generator. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Li, F.-F. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1452","DOI":"10.1109\/TPAMI.2017.2723009","article-title":"Places: A 10 million image database for scene recognition","volume":"40","author":"Zhou","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","first-page":"17","article-title":"Transfer learning. Handbook of Research on Machine Learning Applications","volume":"3","author":"Torrey","year":"2009","journal-title":"IGI Glob."},{"key":"ref_12","unstructured":"Beyer, L., H\u00e9naff, O.J., Kolesnikov, A., Zhai, X., and van den Oord, A. (2020). Are we done with imagenet?. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"4037","DOI":"10.1109\/TPAMI.2020.2992393","article-title":"Self-supervised visual feature learning with deep neural networks: A survey","volume":"43","author":"Jing","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., and Efros, A.A. (2016, January 27\u201330). Context encoders: Feature learning by inpainting. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.278"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ledig, C., Theis, L., Husz\u00e1r, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., and Wang, Z. (2017, January 21\u201326). Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.19"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, R., Isola, P., and Efros, A.A. (2016, January 11\u201314). Colorful image colorization. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46487-9_40"},{"key":"ref_17","unstructured":"Mirza, M., and Osindero, S. (2014). Conditional generative adversarial nets. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4018\/jdwm.2007070101","article-title":"Multi-label classification: An overview","volume":"3","author":"Tsoumakas","year":"2007","journal-title":"Int. J. Data Warehous. Min. (IJDWM)"},{"key":"ref_19","unstructured":"Thoma, M. (2016). A survey of semantic segmentation. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Noroozi, M., Vinjimoor, A., Favaro, P., and Pirsiavash, H. (2018, January 16\u201323). Boosting self-supervised learning via knowledge transfer. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00975"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Noroozi, M., and Favaro, P. (2016, January 11\u201314). Unsupervised learning of visual representations by solving jigsaw puzzles. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46466-4_5"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Misra, I., Zitnick, C.L., and Hebert, M. (2016, January 11\u201314). Shuffle and learn: Unsupervised learning using temporal order verification. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_32"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Pathak, D., Girshick, R., Doll\u00e1r, P., Darrell, T., and Hariharan, B. (2017, January 21\u201326). Learning features by watching objects move. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.638"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ren, Z., and Lee, Y.J. (2018, January 16\u201323). Cross-domain self-supervised multi-task feature learning using synthetic imagery. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00086"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Agrawal, P., Carreira, J., and Malik, J. (2015, January 7\u201313). Learning to see by moving. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.13"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Sayed, N., Brattoli, B., and Ommer, B. (2018, January 9\u201312). Cross and learn: Cross-modal self-supervision. Proceedings of the German Conference on Pattern Recognition, Stuttgart, Germany.","DOI":"10.1007\/978-3-030-12939-2_17"},{"key":"ref_27","unstructured":"Korbar, B., Tran, D., and Torresani, L. (2018). Cooperative learning of audio and video models from self-supervised synchronization. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, C.L., Sohn, K., Yoon, J., and Pfister, T. (2021, January 20\u201325). Cutpaste: Self-supervised learning for anomaly detection and localization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00954"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Jin, X., Chen, Z., Lin, J., Chen, Z., and Zhou, W. (2019, January 22\u201325). Unsupervised single image deraining with self-supervised constraints. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803238"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Larsson, G., Maire, M., and Shakhnarovich, G. (2016, January 11\u201314). Learning representations for automatic colorization. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_35"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Larsson, G., Maire, M., and Shakhnarovich, G. (2017, January 21\u201326). Colorization as a proxy task for visual understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.96"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhang, R., Isola, P., and Efros, A.A. (2017, January 21\u201326). Split-brain autoencoders: Unsupervised learning by cross-channel prediction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.76"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Isola, P., Zhu, J.Y., Zhou, T., and Efros, A.A. (2017, January 21\u201326). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.632"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Nazeri, K., Ng, E., and Ebrahimi, M. (2018, January 12\u201313). Image colorization using generative adversarial networks. Proceedings of the International Conference on Articulated Motion and Deformable Objects, Palma de Mallorca, Spain.","DOI":"10.1007\/978-3-319-94544-6_9"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Cao, Y., Zhou, Z., Zhang, W., and Yu, Y. (2017, January 18\u201322). Unsupervised diverse colorization via generative adversarial networks. Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Skopje, North Macedonia.","DOI":"10.1007\/978-3-319-71249-9_10"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Kiani, L., Saeed, M., and Nezamabadi-pour, H. (2020, January 18\u201320). Image Colorization Using Generative Adversarial Networks and Transfer Learning. Proceedings of the 2020 International Conference on Machine Vision and Image Processing (MVIP), Qom, Iran.","DOI":"10.1109\/MVIP49855.2020.9116882"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Deshpande, A., Rock, J., and Forsyth, D. (2015, January 7\u201313). Learning large-scale automatic image colorization. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.72"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2897824.2925974","article-title":"Let there be color! Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification","volume":"35","author":"Iizuka","year":"2016","journal-title":"ACM Trans. Graph. (ToG)"},{"key":"ref_39","unstructured":"Baldassarre, F., Mor\u00edn, D.G., and Rod\u00e9s-Guirao, L. (2017). Deep koalarization: Image colorization using cnns and inception-resnet-v2. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Kalajdjieski, J., Zdravevski, E., Corizzo, R., Lameski, P., Kalajdziski, S., Pires, I.M., Garcia, N.M., and Trajkovik, V. (2020). Air Pollution Prediction with Multi-Modal Data and Deep Neural Networks. Remote Sens., 12.","DOI":"10.3390\/rs12244142"},{"key":"ref_41","first-page":"114332R","article-title":"Refined image colorization using capsule generative adversarial networks","volume":"Volume 11433","author":"Hosni","year":"2020","journal-title":"Proceedings of the Twelfth International Conference on Machine Vision (ICMV 2019)"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Vitoria, P., Raad, L., and Ballester, C. (2020, January 1\u20135). Chromagan: Adversarial picture colorization with semantic class distribution. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093389"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Yoo, S., Bahng, H., Chung, S., Lee, J., Chang, J., and Choo, J. (2019, January 15\u201320). Coloring with limited data: Few-shot colorization via memory augmented networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01154"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"21604","DOI":"10.1109\/ACCESS.2021.3055575","article-title":"Double-Channel Guided Generative Adversarial Network for Image Colorization","volume":"9","author":"Du","year":"2021","journal-title":"IEEE Access"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/j.neucom.2019.02.003","article-title":"Survey on semantic segmentation using deep learning techniques","volume":"338","author":"Lateef","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_48","unstructured":"Treneska, S. (2022, January 26). Image Colorization. Available online: https:\/\/github.com\/sandratreneska\/Image-colorization."},{"key":"ref_49","unstructured":"Treneska, S. (2022, January 26). Self-Supervised Visual Feature Learning. Available online: https:\/\/github.com\/sandratreneska\/Self-supervised-visual-feature-learning."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Lameski, J., Jovanov, A., Zdravevski, E., Lameski, P., and Gievska, S. (2019, January 1\u20134). Skin lesion segmentation with deep learning. Proceedings of the IEEE EUROCON 2019-18th International Conference on Smart Technologies, Novi Sad, Serbia.","DOI":"10.1109\/EUROCON.2019.8861636"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"11591","DOI":"10.1038\/s41598-019-48004-8","article-title":"iW-Net: An automatic and minimalistic interactive lung nodule segmentation deep network","volume":"9","author":"Aresta","year":"2019","journal-title":"Sci. Rep."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"106164","DOI":"10.1016\/j.asoc.2020.106164","article-title":"From Big Data to business analytics: The case study of churn prediction","volume":"90","author":"Zdravevski","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"100203","DOI":"10.1016\/j.bdr.2021.100203","article-title":"Cost optimization for big data workloads based on dynamic scheduling and cluster-size tuning","volume":"25","author":"Grzegorowski","year":"2021","journal-title":"Big Data Res."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/4\/1599\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:22:18Z","timestamp":1760134938000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/4\/1599"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,18]]},"references-count":53,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["s22041599"],"URL":"https:\/\/doi.org\/10.3390\/s22041599","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,18]]}}}