{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T17:15:35Z","timestamp":1767374135046,"version":"3.37.3"},"reference-count":60,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2023,9,19]],"date-time":"2023-09-19T00:00:00Z","timestamp":1695081600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,9,19]],"date-time":"2023-09-19T00:00:00Z","timestamp":1695081600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100008205","name":"Auckland University of Technology","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100008205","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Multimed Tools Appl"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Semi-supervised learning offers a solution to the high cost and limited availability of manually labeled samples in supervised learning. In semi-supervised visual object detection, the use of unlabeled data can significantly enhance the performance of deep learning models. In this paper, we introduce an end-to-end framework, named CISO (Co-Iteration Semi-Supervised Learning for Object Detection), which integrates a knowledge distillation approach and a collaborative, iterative semi-supervised learning strategy. To maximize the utilization of pseudo-label data and address the scarcity of pseudo-label data due to high threshold settings, we propose a mean iteration approach where all unlabeled data is applied to each training iteration. Pseudo-label data with high confidence is extracted based on an ever-changing threshold (average intersection over union of all pseudo-labeled data). 
This strategy not only ensures the accuracy of the pseudo-label but also optimizes the use of unlabeled data. Subsequently, we apply a weak-strong data augmentation strategy to update the model. Lastly, we evaluate CISO using a Swin Transformer model and conduct comprehensive experiments on MS-COCO. Our framework outperforms state-of-the-art methods by 2.16 mAP and 1.54 mAP with 10% and 5% labeled data, respectively.<\/jats:p>","DOI":"10.1007\/s11042-023-16915-4","type":"journal-article","created":{"date-parts":[[2023,9,19]],"date-time":"2023-09-19T08:45:10Z","timestamp":1695113110000},"page":"33941-33957","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["CISO: Co-iteration semi-supervised learning for visual object detection"],"prefix":"10.1007","volume":"83","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0391-2315","authenticated-orcid":false,"given":"Jianchun","family":"Qi","sequence":"first","affiliation":[]},{"given":"Minh","family":"Nguyen","sequence":"additional","affiliation":[]},{"given":"Wei Qi","family":"Yan","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,9,19]]},"reference":[{"key":"16915_CR1","doi-asserted-by":"crossref","unstructured":"Arazo E, Ortego D, Albert P, O\u2019Connor NE, McGuinness K (2020) Pseudo-labeling and confirmation bias in deep semi-supervised learning. International Joint Conference on Neural Networks, pp 1\u20138","DOI":"10.1109\/IJCNN48605.2020.9207304"},{"key":"16915_CR2","unstructured":"Bachman P, Alsharif O, Precup D (2014) Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems, pp 3365\u20133373"},{"key":"16915_CR3","doi-asserted-by":"crossref","unstructured":"Bar A, Wang X, Kantorov V, Reed CJ, Herzig R, Chechik G, Rohrbach A, Darrell T, Globerson A (2022) DETReg: unsupervised pretraining with region priors for object detection. 
IEEE International Conference on Computer Vision, pp 14605\u201314615","DOI":"10.1109\/CVPR52688.2022.01420"},{"key":"16915_CR4","unstructured":"Berthelot D, Carlini N, Goodfellow I, Papernot N, Oliver A, Raffel CA (2019) Mixmatch: a holistic approach to semi-supervised learning. In Advances in Neural Information Processing Systems, pp 5050\u20135060"},{"key":"16915_CR5","doi-asserted-by":"crossref","unstructured":"Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. ECCV, pp 213\u2013229","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"16915_CR6","unstructured":"Chapelle O, Sch\u00f6lkopf B, Zien A (2010) Semi-supervised learning. Adaptive Computation and Machine Learning. MIT Press 21(1):2"},{"key":"16915_CR7","doi-asserted-by":"crossref","unstructured":"Chen B, Chen W, Yang S, Xuan Y, Song J, Xie D, Pu S, Song M, Zhuang Y (2022) Label matching semi-supervised object detection. IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 14381\u201314390","DOI":"10.1109\/CVPR52688.2022.01398"},{"key":"16915_CR8","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","volume":"88","author":"M Everingham","year":"2010","unstructured":"Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88:303\u2013338","journal-title":"Int J Comput Vision"},{"key":"16915_CR9","doi-asserted-by":"crossref","unstructured":"Girshick R (2015) Fast R-CNN. IEEE International Conference on Computer Vision, pp 1440\u20131448","DOI":"10.1109\/ICCV.2015.169"},{"key":"16915_CR10","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask R-CNN. 
IEEE International Conference on Computer Vision, pp 2961\u20132969","DOI":"10.1109\/ICCV.2017.322"},{"key":"16915_CR11","doi-asserted-by":"crossref","unstructured":"Heo B, Kim J, Yun S, Park H, Kwak N, Choi JY (2019) A comprehensive overhaul of feature distillation. IEEE International Conference on Computer Vision, pp 1921\u20131930","DOI":"10.1109\/ICCV.2019.00201"},{"key":"16915_CR12","unstructured":"Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network.  In NIPS Deep Learning and Representation Learning Workshop."},{"key":"16915_CR13","doi-asserted-by":"crossref","unstructured":"Iscen A, Tolias G, Avrithis Y, Chum O (2019) Label propagation for deep semi-supervised learning. IEEE International Conference on Computer Vision, pp 5070\u20135079","DOI":"10.1109\/CVPR.2019.00521"},{"issue":"3","key":"16915_CR14","doi-asserted-by":"publisher","first-page":"685","DOI":"10.1007\/s12525-021-00475-2","volume":"31","author":"C Janiesch","year":"2021","unstructured":"Janiesch C, Zschech P, Heinrich K (2021) Machine learning and deep learning. Electron Mark 31(3):685\u2013695","journal-title":"Electron Mark"},{"key":"16915_CR15","unstructured":"Jeong J, Lee S, Kim J, Kwak N (2019) Consistency-based semi-supervised learning for object detection. In Advances in Neural Information Processing Systems, pp 10759\u201310768"},{"key":"16915_CR16","doi-asserted-by":"crossref","unstructured":"Joseph KJ, Khan S, Khan FS, Balasubramanian VN (2021) Towards open world object detection. IEEE International Conference on Computer Vision, pp 5830\u20135840","DOI":"10.1109\/CVPR46437.2021.00577"},{"key":"16915_CR17","unstructured":"Kim J, Hur Y, Park S, Yang E, Hwang SJ, Shin J (2020) Distribution aligning refinery of pseudo-label for imbalanced semi-supervised learning. 
In Advances in Neural Information Processing Systems, pp 14567\u201314579"},{"key":"16915_CR18","unstructured":"Komodakis N, Zagoruyko S (2017) Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. International Conference on Learning Representations."},{"issue":"6","key":"16915_CR19","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3065386","volume":"60","author":"A Krizhevsky","year":"2017","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84\u201390","journal-title":"Commun ACM"},{"issue":"7553","key":"16915_CR20","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436\u2013444","journal-title":"Nature"},{"key":"16915_CR21","unstructured":"Lee DH (2013) Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. ICML, pp 896"},{"key":"16915_CR22","doi-asserted-by":"crossref","unstructured":"Li K, Liu C, Zhao H, Zhang Y, Fu Y (2021) Ecacl: a holistic framework for semi-supervised domain adaptation. IEEE\/CVF International Conference on Computer Vision, pp 8578\u20138587","DOI":"10.1109\/ICCV48922.2021.00846"},{"key":"16915_CR23","doi-asserted-by":"crossref","unstructured":"Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft COCO: common objects in context. ECCV, pp 740\u2013755","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"16915_CR24","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. 
ECCV, pp 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"16915_CR25","doi-asserted-by":"crossref","unstructured":"Liu YC, Ma CY, He Z, Kuo CW, Chen K, Zhang P, Wu B, Kira Z, Vajda P (2022) Unbiased teacher for semi-supervised object detection.  International Conference on Learning Representations, pp 1\u201314","DOI":"10.1109\/CVPR52688.2022.00959"},{"key":"16915_CR26","doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. IEEE International Conference on Computer Vision, pp 10012\u201310022","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"16915_CR27","first-page":"1979","volume":"8","author":"T Miyato","year":"2018","unstructured":"Miyato T, Maeda SI, Koyama M, Ishii S (2018) Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Trans Pattern Anal Mach Intell 8:1979\u20131993","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"16915_CR28","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1023\/A:1008162616689","volume":"38","author":"C Papageorgiou","year":"2000","unstructured":"Papageorgiou C, Poggio T (2000) A trainable system for object detection. Int J Comput Vis 38:15\u201333","journal-title":"Int J Comput Vis"},{"key":"16915_CR29","doi-asserted-by":"crossref","unstructured":"Park W, Kim D, Lu Y, Cho M (2019) Relational knowledge distillation. 
IEEE International Conference on Computer Vision, pp 3967\u20133976","DOI":"10.1109\/CVPR.2019.00409"},{"issue":"2","key":"16915_CR30","first-page":"5","volume":"1","author":"N Passalis","year":"2018","unstructured":"Passalis N, Tefas A (2021) Probabilistic knowledge transfer for deep representation learning. IEEE Transactions on Neural Networks and Learning Systems, 32(5)","journal-title":"CoRR"},{"key":"16915_CR31","unstructured":"Rasmus A, Berglund M, Honkala M, Valpola H, Raiko T (2015) Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pp 3546\u20133554"},{"key":"16915_CR32","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. IEEE International Conference on Computer Vision, pp 779\u2013788","DOI":"10.1109\/CVPR.2016.91"},{"key":"16915_CR33","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pp 91\u201399"},{"key":"16915_CR34","unstructured":"Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) FitNets: hints for thin deep nets. International Conference on Learning Representations."},{"key":"16915_CR35","doi-asserted-by":"crossref","unstructured":"Sajjadi M, Javanmardi M, Tasdizen T (2016) Mutual exclusivity loss for semi-supervised deep learning. IEEE Int Conf Image Process (ICIP), pp 1908\u20131912","DOI":"10.1109\/ICIP.2016.7532690"},{"key":"16915_CR36","unstructured":"Sajjadi M, Javanmardi M, Tasdizen T (2016) Regularization with stochastic transformations and perturbations for deep semi-supervised learning. 
In Advances in Neural Information Processing Systems, pp 1163\u20131171"},{"key":"16915_CR37","doi-asserted-by":"crossref","unstructured":"Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining. IEEE International Conference on Computer Vision, pp 761\u2013769","DOI":"10.1109\/CVPR.2016.89"},{"key":"16915_CR38","doi-asserted-by":"crossref","unstructured":"Simard PY, Steinkraus D, Platt JC (2003) Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of International Conference on Document Analysis and Recognition, p 958","DOI":"10.1109\/ICDAR.2003.1227801"},{"key":"16915_CR39","unstructured":"Sohn K, Berthelot D, Carlini N, Zhang Z, Zhang H, Raffel CA, Cubuk ED, Kurakin A, Li CL (2020) Fixmatch: simplifying semi-supervised learning with consistency and confidence. In Advances in Neural Information Processing Systems, pp 596\u2013608"},{"key":"16915_CR40","unstructured":"Sohn K, Zhang Z, Li CL, Zhang H, Lee CY, Pfister T (2021) A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2102.09480"},{"key":"16915_CR41","doi-asserted-by":"crossref","unstructured":"Suzuki T (2022) Teachaugment: data augmentation optimization using teacher knowledge. IEEE International Conference on Computer Vision, pp 10904\u201310914","DOI":"10.1109\/CVPR52688.2022.01063"},{"key":"16915_CR42","doi-asserted-by":"crossref","unstructured":"Tang Y, Chen W, Luo Y, Zhang Y (2021) Humble teachers teach better students for semi-supervised object detection. IEEE International Conference on Computer Vision, pp 3132\u20133141","DOI":"10.1109\/CVPR46437.2021.00315"},{"key":"16915_CR43","unstructured":"Tarvainen A, Valpola H (2017) Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. 
In Advances in Neural Information Processing Systems, pp 1195\u20131204"},{"key":"16915_CR44","doi-asserted-by":"crossref","unstructured":"Tian Z, Shen C, Chen H, He T (2019) Fcos: Fully convolutional one-stage object detection. IEEE International Conference on Computer Vision, pp 9627\u20139636","DOI":"10.1109\/ICCV.2019.00972"},{"key":"16915_CR45","doi-asserted-by":"crossref","unstructured":"Wang CY, Bochkovskiy A, Liao HYM (2022) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 7464\u20137475","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"16915_CR46","doi-asserted-by":"crossref","unstructured":"Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. IEEE International Conference on Computer Vision, pp 3156\u20133164","DOI":"10.1109\/CVPR.2017.683"},{"key":"16915_CR47","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1016\/j.neucom.2020.01.085","volume":"396","author":"X Wu","year":"2020","unstructured":"Wu X, Sahoo D, Hoi SC (2020) Recent advances in deep learning for object detection. Neurocomputing 396:39\u201364","journal-title":"Neurocomputing"},{"key":"16915_CR48","unstructured":"Xie Q, Dai Z, Hovy E, Luong T, Le Q (2020) Unsupervised data augmentation for consistency training. In Advances in Neural Information Processing Systems, pp 6256\u20136268"},{"key":"16915_CR49","doi-asserted-by":"crossref","unstructured":"Xie Q, Luong MT, Hovy E, Le QV (2020) Self-training with noisy student improves imagenet classification. IEEE International Conference on Computer Vision, pp 10687\u201310698","DOI":"10.1109\/CVPR42600.2020.01070"},{"key":"16915_CR50","doi-asserted-by":"crossref","unstructured":"Xu M, Zhang Z, Hu H, Wang J, Wang L, Wei F, Bai X, Liu Z (2021) End-to-end semi-supervised object detection with soft teacher. 
IEEE\/CVF International Conference on Computer Vision, pp 3060\u20133069","DOI":"10.1109\/ICCV48922.2021.00305"},{"key":"16915_CR51","doi-asserted-by":"crossref","unstructured":"Yang F, Wu K, Zhang S, Jiang G, Liu Y, Zheng F, Zhang W, Wang C, Zeng L (2022) Class-aware contrastive semi-supervised learning. IEEE International Conference on Computer Vision, pp 14421\u201314430","DOI":"10.1109\/CVPR52688.2022.01402"},{"key":"16915_CR52","doi-asserted-by":"crossref","unstructured":"Yang Q, Wei X, Wang B, Hua XS, Zhang L (2021) Interactive self-training with mean teachers for semi-supervised object detection. IEEE International Conference on Computer Vision, pp 5941\u20135950","DOI":"10.1109\/CVPR46437.2021.00588"},{"key":"16915_CR53","doi-asserted-by":"crossref","unstructured":"Yang X, Song Z, King I, Xu Z (2022) A survey on deep semi-supervised learning. IEEE Trans Knowl Data Eng:1\u201320","DOI":"10.1109\/TKDE.2022.3151315"},{"key":"16915_CR54","doi-asserted-by":"crossref","unstructured":"Yim J, Joo D, Bae J, Kim J (2017) A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. IEEE International Conference on Computer Vision, pp 4133\u20134141","DOI":"10.1109\/CVPR.2017.754"},{"key":"16915_CR55","doi-asserted-by":"crossref","unstructured":"Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Bisenet: Bilateral segmentation network for real-time semantic segmentation. ECCV: 325\u2013341","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"16915_CR56","doi-asserted-by":"crossref","unstructured":"Yun S, Han D, Oh SJ, Chun S, Choe J, Yoo Y (2019) Cutmix: regularization strategy to train strong classifiers with localizable features. IEEE International Conference on Computer Vision, pp 6023\u20136032","DOI":"10.1109\/ICCV.2019.00612"},{"key":"16915_CR57","doi-asserted-by":"crossref","unstructured":"Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. 
ACM International Conference on Multimedia, pp 516\u2013520","DOI":"10.1145\/2964284.2967274"},{"key":"16915_CR58","doi-asserted-by":"publisher","first-page":"3212","DOI":"10.1109\/TNNLS.2018.2876865","volume":"30","author":"ZQ Zhao","year":"2019","unstructured":"Zhao ZQ, Zheng P, Xu ST, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30:3212\u20133232","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"16915_CR59","doi-asserted-by":"crossref","unstructured":"Zhai X, Oliver A, Kolesnikov A, Beyer L (2019) S4l: Self-supervised semi-supervised learning. IEEE International Conference on Computer Vision, pp 1476\u20131485","DOI":"10.1109\/ICCV.2019.00156"},{"key":"16915_CR60","doi-asserted-by":"crossref","unstructured":"Zhou Q, Yu C, Wang Z, Qian Q, Li H (2021) Instant-teaching: an end-to-end semi-supervised object detection framework. IEEE International Conference on Computer Vision, pp 4081\u20134090","DOI":"10.1109\/CVPR46437.2021.00407"}],"container-title":["Multimedia Tools and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-023-16915-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11042-023-16915-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-023-16915-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,8]],"date-time":"2024-03-08T06:50:10Z","timestamp":1709880610000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11042-023-16915-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,19]]},"references-count":60,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2024,3]]}},"alternative-id":["16915"],"URL":"https:\/\/doi.org\/10.1007\/s11042-023-16915-4","relation":{},"ISSN":["1573-7721"],"issn-type":[{"type":"electronic","value":"1573-7721"}],"subject":[],"published":{"date-parts":[[2023,9,19]]},"assertion":[{"value":"25 April 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 June 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 September 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 September 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no financial or other conflicts of interests to 
declare.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"The dataset used has no ethical risk and is a public dataset.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical and informed consent for data used"}}]}}