{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,5]],"date-time":"2026-04-05T01:07:10Z","timestamp":1775351230972,"version":"3.50.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T00:00:00Z","timestamp":1643241600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Open Research Project of the State Key Laboratory of Media Convergence and Communication, Communication University of China","award":["SKLMCC2020KF004"],"award-info":[{"award-number":["SKLMCC2020KF004"]}]},{"DOI":"10.13039\/501100009592","name":"Beijing Municipal Science & Technology Commission","doi-asserted-by":"crossref","award":["Z191100007119002"],"award-info":[{"award-number":["Z191100007119002"]}],"id":[{"id":"10.13039\/501100009592","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Key Research Program of Frontier Sciences, CAS","award":["ZDBS-LY-7024"],"award-info":[{"award-number":["ZDBS-LY-7024"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62006221"],"award-info":[{"award-number":["62006221"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,1,31]]},"abstract":"<jats:p>As a basic component in multimedia applications, object detectors are generally trained on a fixed set of classes that are pre-defined. However, new object classes often emerge after the models are trained in practice. 
Modern object detectors based on Convolutional Neural Networks (CNNs) suffer from catastrophic forgetting when fine-tuned on new classes without the original training data. Therefore, it is critical to improve the incremental learning capability of object detection. In this article, we propose a novel Residual-Distillation-based Incremental learning method on Object Detection (RD-IOD). Our approach rests on the creation of a triple-network based on Faster R-CNN. To enable continuous learning from new classes, we use the original model as well as a residual model to guide the learning of the incremental model on new classes while maintaining the previously learned knowledge. To better maintain the discrimination between the features of old and new classes, the residual model is jointly trained with the incremental model on new classes in the incremental learning procedure. In addition, a two-level distillation scheme is designed to guide the training process, which consists of (1) a general distillation for imitating the original model in feature space along with a residual distillation on the features at both the image level and the instance level, and (2) a joint classification distillation on the output layers. To better preserve the learned knowledge, we design a 2-threshold training strategy to guide the learning of a Region Proposal Network and a detection head. Extensive experiments conducted on VOC2007 and COCO demonstrate that the proposed method can effectively learn to incrementally detect objects of new classes, and the problem of catastrophic forgetting is mitigated. 
Our code is available at https:\/\/github.com\/yangdb\/RD-IOD.<\/jats:p>","DOI":"10.1145\/3472393","type":"journal-article","created":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T19:44:21Z","timestamp":1643312661000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["RD-IOD: Two-Level Residual-Distillation-Based Triple-Network for Incremental Object Detection"],"prefix":"10.1145","volume":"18","author":[{"given":"Dongbao","family":"Yang","sequence":"first","affiliation":[{"name":"Chinese Academy of Sciences and University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Yu","family":"Zhou","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences, Beijing, China"}]},{"given":"Wei","family":"Shi","sequence":"additional","affiliation":[{"name":"Carleton University, Ottawa, Canada"}]},{"given":"Dayan","family":"Wu","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences, Beijing, China"}]},{"given":"Weiping","family":"Wang","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences, Beijing, 
China"}]}],"member":"320","published-online":{"date-parts":[[2022,1,27]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01219-9_9"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.753"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.49"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00644"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.5555\/3008751.3008808"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2019.8851980"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.5555\/2986459.2986733"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0275-4"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/S1364-6613(99)01294-2"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_1_12_2","article-title":"An empirical investigation of catastrophic forgetting in gradient-based neural networks","author":"Goodfellow Ian J.","year":"2013","unstructured":"Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. 2013. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211 (2013).","journal-title":"arXiv preprint arXiv:1312.6211"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3323873.3325033"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2019.00009"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_16_2","article-title":"Distilling the knowledge in a neural network","author":"Hinton Geoffrey","year":"2015","unstructured":"Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. 
arXiv preprint arXiv:1503.02531 (2015).","journal-title":"arXiv preprint arXiv:1503.02531"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00092"},{"key":"e_1_3_1_18_2","article-title":"Less-forgetting learning in deep neural networks","author":"Jung Heechul","year":"2016","unstructured":"Heechul Jung, Jeongwoo Ju, Minju Jung, and Junmo Kim. 2016. Less-forgetting learning in deep neural networks. arXiv preprint arXiv:1607.00122 (2016).","journal-title":"arXiv preprint arXiv:1607.00122"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.5555\/3504035.3504446"},{"key":"e_1_3_1_20_2","article-title":"Supervised contrastive learning","volume":"33","author":"Khosla Prannay","year":"2020","unstructured":"Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised contrastive learning. In Advances in Neural Information Processing Systems 33.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999257"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.431"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3318216.3363317"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2773081"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2954747"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6840"},{"key":"e_1_3_1_31_2","first-page":"109","volume-title":"Psychology of Learning 
and Motivation","author":"McCloskey Michael","year":"1989","unstructured":"Michael McCloskey and Neal J. Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation. Vol. 24. Elsevier, 109\u2013165."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.83"},{"key":"e_1_3_1_33_2","article-title":"Incremental few-shot object detection","author":"Perez-Rua Juan-Manuel","year":"2020","unstructured":"Juan-Manuel Perez-Rua, Xiatian Zhu, Timothy Hospedales, and Tao Xiang. 2020. Incremental few-shot object detection. arXiv preprint arXiv:2003.04668 (2020).","journal-title":"arXiv preprint arXiv:2003.04668"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/5326.983933"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01354"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9413821"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2019.00095"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.148"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.587"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969250"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.368"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/3504035.3504538"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2018.2841046"},{"key":"e_1_3_1_44_2","article-title":"R-Net: A relationship network for efficient and accurate scene text detection","author":"Wang Yuxin","year":"2020","unstructured":"Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Youliang Tian, Zilong Fu, and Yongdong Zhang. 2020. R-Net: A relationship network for efficient and accurate scene text detection. 
IEEE Transactions on Multimedia 23 (2020), 1316\u20131329.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00658"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.5555\/3305890.3306093"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093365"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR48806.2021.9412301"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2885238"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3004267"},{"key":"e_1_3_1_51_2","article-title":"Objects as points","author":"Zhou Xingyi","year":"2019","unstructured":"Xingyi Zhou, Dequan Wang, and Philipp Kr\u00e4henb\u00fchl. 2019. Objects as points. arXiv preprint arXiv:1904.07850 (2019).","journal-title":"arXiv preprint arXiv:1904.07850"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_26"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00851"},{"key":"e_1_3_1_54_2","doi-asserted-by":"crossref","unstructured":"Xin Wang Thomas E. Huang Trevor Darrell Joseph E. Gonzalez and Fisher Yu. 2020. Frustratingly simple few-shot object detection. arXiv preprint arXiv:2003.06957 .","DOI":"10.1109\/ICCV.2019.00851"},{"key":"e_1_3_1_55_2","unstructured":"Dingwen Zhang Haibin Tian and Jungong Han. 2021. Few-cost salient object detection with adversarial-paced learning. 
arXiv preprint arXiv:2104.01928 ."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472393","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472393","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:10Z","timestamp":1750193290000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472393"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,27]]},"references-count":54,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1,31]]}},"alternative-id":["10.1145\/3472393"],"URL":"https:\/\/doi.org\/10.1145\/3472393","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,27]]},"assertion":[{"value":"2020-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}