{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T15:08:31Z","timestamp":1777734511153,"version":"3.51.4"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,10,22]],"date-time":"2021-10-22T00:00:00Z","timestamp":1634860800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2019YFB2102100"],"award-info":[{"award-number":["2019YFB2102100"]}]},{"name":"Science and Technology Development Fund of Macau SAR","award":["0015\/2019\/AKP"],"award-info":[{"award-number":["0015\/2019\/AKP"]}]},{"DOI":"10.13039\/501100021171","name":"GuangDong Basic and Applied Basic Research Foundation","doi-asserted-by":"crossref","award":["2020B515130004"],"award-info":[{"award-number":["2020B515130004"]}],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Key-Area Research and Development Program of Guangdong Province","award":["2020B010164003"],"award-info":[{"award-number":["2020B010164003"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2022,6,30]]},"abstract":"<jats:p>\n            Transfer learning through fine-tuning a pre-trained neural network with an extremely large dataset, such as ImageNet, can significantly improve and accelerate training while the accuracy is frequently bottlenecked by the limited dataset size of the new target task. To solve the problem, some regularization methods, constraining the outer layer weights of the target network using the starting point as references (SPAR), have been studied. 
In this article, we propose a novel regularized transfer learning framework\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"TeX\" version=\"MathJax\">\\operatorname{DELTA}<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , namely\n            <jats:italic>\n              <jats:underline>DE<\/jats:underline>\n              ep\n              <jats:underline>L<\/jats:underline>\n              earning\n              <jats:underline>T<\/jats:underline>\n              ransfer using Feature Map with\n              <jats:underline>A<\/jats:underline>\n              ttention\n            <\/jats:italic>\n            . Instead of constraining the weights of neural network,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"TeX\" version=\"MathJax\">\\operatorname{DELTA}<\/jats:tex-math>\n            <\/jats:inline-formula>\n            aims at preserving the outer layer outputs of the source network. Specifically, in addition to minimizing the empirical loss,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"TeX\" version=\"MathJax\">\\operatorname{DELTA}<\/jats:tex-math>\n            <\/jats:inline-formula>\n            aligns the outer layer outputs of two networks, through constraining a subset of feature maps that are precisely selected by attention that has been learned in a supervised learning manner. 
We evaluate\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"TeX\" version=\"MathJax\">\\operatorname{DELTA}<\/jats:tex-math>\n            <\/jats:inline-formula>\n            with the state-of-the-art algorithms, including\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"TeX\" version=\"MathJax\">L^2<\/jats:tex-math>\n            <\/jats:inline-formula>\n            and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"TeX\" version=\"MathJax\">\\emph {L}^2\\text{-}SP<\/jats:tex-math>\n            <\/jats:inline-formula>\n            . The experiment results show that our method outperforms these baselines with higher accuracy for new tasks. Code has been made publicly available.\n            <jats:xref ref-type=\"fn\">\n              <jats:sup>1<\/jats:sup>\n            <\/jats:xref>\n          <\/jats:p>","DOI":"10.1145\/3473912","type":"journal-article","created":{"date-parts":[[2021,10,23]],"date-time":"2021-10-23T04:28:40Z","timestamp":1634963320000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Knowledge Distillation with Attention for Deep Transfer Learning of Convolutional Networks"],"prefix":"10.1145","volume":"16","author":[{"given":"Xingjian","family":"Li","sequence":"first","affiliation":[{"name":"Baidu, Inc., China and University of Macau, Macau, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haoyi","family":"Xiong","sequence":"additional","affiliation":[{"name":"Baidu, Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zeyu","family":"Chen","sequence":"additional","affiliation":[{"name":"Baidu, Inc., Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Huan","sequence":"additional","affiliation":[{"name":"StylingAI Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ji","family":"Liu","sequence":"additional","affiliation":[{"name":"Baidu, Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cheng-Zhong","family":"Xu","sequence":"additional","affiliation":[{"name":"University of Macau, Macau, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dejing","family":"Dou","sequence":"additional","affiliation":[{"name":"Baidu, Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,22]]},"reference":[{"key":"e_1_3_2_2_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Li Xingjian","year":"2019","unstructured":"Xingjian Li, Haoyi Xiong, Hanchao Wang, Yuxuan Rao, Liping Liu, and Jun Huan. 2019. DELTA: Deep learning transfer using feature map with attention for convolutional networks. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2745702"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2477040"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2571625"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2839534"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2571999"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2751969"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2645404"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969033.2969197"},{"key":"e_1_3_2_11_2","unstructured":"Minyoung Huh, Pulkit Agrawal, and Alexei A. Efros. 2016. 
What makes ImageNet good for transfer learning? arXiv:1608.08614. Retrieved from https:\/\/arxiv.org\/abs\/1608.08614."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2959426"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2922125"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2818329"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2482819"},{"key":"e_1_3_2_16_2","unstructured":"Hong Liu, Mingsheng Long, Jianmin Wang, and Michael I. Jordan. 2019. Towards understanding the transferability of deep representations. arXiv:1909.12031. Retrieved from https:\/\/arxiv.org\/abs\/1909.12031."},{"key":"e_1_3_2_17_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Frankle Jonathan","year":"2019","unstructured":"Jonathan Frankle and Michael Carbin. 2019. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2773081"},{"key":"e_1_3_2_19_2","first-page":"814","volume-title":"Lecture notes for course 18S997","author":"Rigollet Philippe","year":"2015","unstructured":"Philippe Rigollet and Jan-Christian H\u00fctter. 2015. High dimensional statistics. Lecture notes for course 18S997 813 (2015), 814."},{"key":"e_1_3_2_20_2","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"Li Xuhong","year":"2018","unstructured":"Xuhong Li, Yves Grandvalet, and Franck Davoine. 2018. Explicit inductive bias for transfer learning with convolutional networks. 
In Proceedings of the 35th International Conference on Machine Learning."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.754"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2017.309"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01096"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_25_2","volume-title":"Proceedings of the First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition","author":"Khosla Aditya","year":"2011","unstructured":"Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. 2011. Novel dataset for fine-grained image categorization. In Proceedings of the First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2006.18.7.1527"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.5555\/2976456.2976476"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_29_2","first-page":"3319","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Bau David","year":"2017","unstructured":"David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Network Dissection: Quantifying interpretability of deep visual representations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). 
3319\u20133327."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007379606734"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.5555\/3044805.3044879"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.9"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00432"},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1007\/978-3-030-16145-3_7","volume-title":"Pacific-Asia Conference on Knowledge Discovery and Data Mining","author":"Zhang Yinghua","year":"2019","unstructured":"Yinghua Zhang, Yu Zhang, and Qiang Yang. 2019. Parameter transfer unit for deep neural networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 82\u201395."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3454458"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2019.00068"},{"key":"e_1_3_2_38_2","first-page":"6010","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Li Xingjian","year":"2020","unstructured":"Xingjian Li, Haoyi Xiong, Haozhe An, Cheng-Zhong Xu, and Dejing Dou. 2020. RIFLE: Backpropagation in depth for deep transfer learning through re-initializing the fully-connected LayEr. In Proceedings of the International Conference on Machine Learning. 6010\u20136019."},{"key":"e_1_3_2_39_2","unstructured":"Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531. 
Retrieved from https:\/\/arxiv.org\/abs\/1503.02531."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2873305"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150464"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969033.2969123"},{"key":"e_1_3_2_43_2","unstructured":"Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2014. Fitnets: Hints for thin deep nets. arXiv:1412.6550. Retrieved from https:\/\/arxiv.org\/abs\/1412.6550."},{"key":"e_1_3_2_44_2","volume-title":"International Conference on Learning Representations (ICLR\u201917)","author":"Zagoruyko Sergey","year":"2017","unstructured":"Sergey Zagoruyko and Nikos Komodakis. 2017. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In International Conference on Learning Representations (ICLR\u201917)."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2903448"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2951463"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969033.2969073"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045336"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2648498"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2928494"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1080\/135062800394667"},{"key":"e_1_3_2_53_2","volume-title":"International Conference on Learning Representations (ICLR\u201915)","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. 
In International Conference on Learning Representations (ICLR\u201915)."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.5555\/3297863.3297977"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2966830"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2864148"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.10"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.557"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.476"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3473912","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3473912","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:46Z","timestamp":1750193326000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3473912"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,22]]},"references-count":58,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,6,30]]}},"alternative-id":["10.1145\/3473912"],"URL":"https:\/\/doi.org\/10.1145\/3473912","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,22]]},"assertion":[{"value":"2021-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-10-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}