{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T13:44:14Z","timestamp":1767015854907,"version":"3.41.2"},"reference-count":37,"publisher":"World Scientific Pub Co Pte Ltd","issue":"09","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072076"],"award-info":[{"award-number":["62072076"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J CIRCUIT SYST COMP"],"published-print":{"date-parts":[[2022,6]]},"abstract":"<jats:p> Self-supervised learning is a promising paradigm to address the problem of manual-annotation through effectively leveraging unlabeled videos. By solving self-supervised pretext tasks, powerful video representations can be discovered automatically. However, recent pretext tasks for videos rely on utilizing the temporal properties of videos, ignoring the crucial supervisory signals from the spatial subspace of videos. Therefore, we present a new self-supervised pretext task called Multi-Label Transformation Prediction (MLTP) to sufficiently utilize the spatiotemporal information in videos. In MLTP, all videos are jointly transformed by a set of geometric and color-space transformations, such as rotation, cropping, and color-channel split. We formulate the pretext as a multi-label prediction task. The 3D-CNN is trained to predict a composition of underlying transformations as multiple outputs. Thereby, transformation invariant video features can be learned in a self-supervised manner. Experimental results verify that 3D-CNNs pre-trained using MLTP yield video representations with improved generalization performance for action recognition downstream tasks on UCF101 ([Formula: see text]) and HMDB51 ([Formula: see text]) datasets. <\/jats:p>","DOI":"10.1142\/s0218126622501596","type":"journal-article","created":{"date-parts":[[2022,2,20]],"date-time":"2022-02-20T06:09:11Z","timestamp":1645337351000},"source":"Crossref","is-referenced-by-count":8,"title":["Self-Supervised Multi-Label Transformation Prediction for Video Representation Learning"],"prefix":"10.1142","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2815-7993","authenticated-orcid":false,"given":"Maregu","family":"Assefa","sequence":"first","affiliation":[{"name":"School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P.\u00a0R.\u00a0China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P.\u00a0R.\u00a0China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Getinet","family":"Yilma","sequence":"additional","affiliation":[{"name":"School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P.\u00a0R.\u00a0China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bulbula","family":"Kumeda","sequence":"additional","affiliation":[{"name":"School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P.\u00a0R.\u00a0China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Melese","family":"Ayalew","sequence":"additional","affiliation":[{"name":"School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P.\u00a0R.\u00a0China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammed","family":"Seid","sequence":"additional","affiliation":[{"name":"School of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, P.\u00a0R.\u00a0China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2022,2,18]]},"reference":[{"key":"S0218126622501596BIB001","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2992393"},{"key":"S0218126622501596BIB002","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00028"},{"key":"S0218126622501596BIB003","first-page":"1","volume-title":"Int. Conf. Learning Representations","author":"Ryoo M. S.","year":"2020"},{"key":"S0218126622501596BIB004","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01301"},{"key":"S0218126622501596BIB005","first-page":"1","volume-title":"Int. Conf. Learning Representations","author":"Gidaris S.","year":"2018"},{"key":"S0218126622501596BIB006","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.167"},{"key":"S0218126622501596BIB007","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46466-4_5"},{"key":"S0218126622501596BIB009","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2019.00025"},{"key":"S0218126622501596BIB010","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3025661"},{"key":"S0218126622501596BIB011","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_30"},{"key":"S0218126622501596BIB012","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00658"},{"key":"S0218126622501596BIB013","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58604-1_26"},{"key":"S0218126622501596BIB014","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.79"},{"key":"S0218126622501596BIB015","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018545"},{"key":"S0218126622501596BIB016","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2901464"},{"key":"S0218126622501596BIB018","first-page":"6","volume-title":"Proc. IEEE Int. Conf. Computer Vision","volume":"4","author":"Jhuang H.","year":"2011"},{"key":"S0218126622501596BIB019","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.96"},{"key":"S0218126622501596BIB020","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_32"},{"key":"S0218126622501596BIB021","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01058"},{"key":"S0218126622501596BIB022","volume":"32","author":"Huang J.","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"S0218126622501596BIB023","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3084840"},{"key":"S0218126622501596BIB024","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00413"},{"key":"S0218126622501596BIB025","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6840"},{"key":"S0218126622501596BIB027","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01105"},{"key":"S0218126622501596BIB028","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00689"},{"key":"S0218126622501596BIB029","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2019.105820"},{"key":"S0218126622501596BIB030","first-page":"1","volume-title":"Proc. Neural Information Processing Systems (NIPS)","author":"Simonyan K.","year":"2015"},{"key":"S0218126622501596BIB031","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"S0218126622501596BIB032","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"S0218126622501596BIB033","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00685"},{"key":"S0218126622501596BIB034","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00675"},{"key":"S0218126622501596BIB035","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2935128"},{"key":"S0218126622501596BIB036","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009982220290"},{"key":"S0218126622501596BIB037","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526738"},{"volume-title":"Modern Information Retrieval","year":"1999","author":"Baeza-Yates R.","key":"S0218126622501596BIB038"},{"key":"S0218126622501596BIB039","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"S0218126622501596BIB040","first-page":"2579","volume":"9","author":"Maaten L. V. D.","year":"2008","journal-title":"J. Mach. Learn. Res."}],"container-title":["Journal of Circuits, Systems and Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218126622501596","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,27]],"date-time":"2022-05-27T08:15:34Z","timestamp":1653639334000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218126622501596"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,18]]},"references-count":37,"journal-issue":{"issue":"09","published-print":{"date-parts":[[2022,6]]}},"alternative-id":["10.1142\/S0218126622501596"],"URL":"https:\/\/doi.org\/10.1142\/s0218126622501596","relation":{},"ISSN":["0218-1266","1793-6454"],"issn-type":[{"type":"print","value":"0218-1266"},{"type":"electronic","value":"1793-6454"}],"subject":[],"published":{"date-parts":[[2022,2,18]]},"article-number":"2250159"}}