{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:14:51Z","timestamp":1760238891189,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T00:00:00Z","timestamp":1600041600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61806215","61671459"],"award-info":[{"award-number":["61806215","61671459"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Fine-grained image classification has seen a great improvement benefiting from the advantages of deep learning techniques. Most fine-grained image classification methods focus on extracting discriminative features and combining the global features with the local ones. However, the accuracy is limited due to the inter-class similarity and the inner-class divergence as well as the lack of enough labelled images to train a deep network which can generalize to fine-grained classes. To deal with these problems, we develop an algorithm which combines Maximizing the Mutual Information (MMI) with the Learning Attention (LA). We make use of MMI to distill knowledge from the image pairs which contain the same object. Meanwhile we take advantage of the LA mechanism to find the salient region of the image to enhance the information distillation. Our model can extract more discriminative semantic features and improve the performance on fine-grained image classification. 
Our model has a symmetric structure, in which the paired images are inputted into the same network to extract the local and global features for the subsequent MMI and LA modules. We train the model by maximizing the mutual information and minimizing the cross-entropy stage by stage alternatively. Experiments show that our model can improve the performance of the fine-grained image classification effectively.<\/jats:p>","DOI":"10.3390\/sym12091511","type":"journal-article","created":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T09:04:53Z","timestamp":1600074293000},"page":"1511","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Convolutional Attention Network with Maximizing Mutual Information for Fine-Grained Image Classification"],"prefix":"10.3390","volume":"12","author":[{"given":"Fenglei","family":"Wang","sequence":"first","affiliation":[{"name":"Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410003, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hao","family":"Zhou","sequence":"additional","affiliation":[{"name":"Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410003, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuohao","family":"Li","sequence":"additional","affiliation":[{"name":"Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410003, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Lei","sequence":"additional","affiliation":[{"name":"Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410003, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410003, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,9,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Wu, J., Chen, T., Wu, H., Yang, Z., Luo, G., and Lin, L. (2020). Fine-Grained Image Captioning with Global-Local Discriminative Objective. arXiv.","DOI":"10.1109\/TMM.2020.3011317"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Xie, S., Kirillov, A., Girshick, R.B., and He, K. (2019, January 27\u201328). Exploring Randomly Wired Neural Networks for Image Recognition. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00137"},{"key":"ref_3","unstructured":"Wei, X.S., Wu, J., and Cui, Q. (2019). Deep Learning for Fine-Grained Image Analysis: A Survey. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1007\/s11633-017-1053-3","article-title":"A survey on deep learning-based fine-grained object classification and semantic segmentation","volume":"14","author":"Zhao","year":"2017","journal-title":"Int. J. Autom. Comput."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"727","DOI":"10.1007\/s11063-020-10246-3","article-title":"A New Supervised Clustering Framework Using Multi Discriminative Parts and Expectation-Maximization Approach for a Fine-Grained Animal Breed Classification (SC-MPEM)","volume":"52","author":"Sundaram","year":"2020","journal-title":"Neural Process. 
Lett."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1791","DOI":"10.1109\/TCYB.2018.2813971","article-title":"Deep Attention-Based Spatially Recursive Networks for Fine-Grained Visual Recognition","volume":"49","author":"Wu","year":"2019","journal-title":"IEEE Trans. Cybern."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zheng, M., Li, Q., ao Geng, Y., Yu, H., Wang, J., Gan, J., and Xue, W. (2018, January 12\u201316). A Survey of Fine-Grained Image Categorization. Proceedings of the 2018 14th IEEE International Conference on Signal Processing (ICSP), Beijing, China.","DOI":"10.1109\/ICSP.2018.8652307"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zheng, H., Fu, J., Zha, Z., and Luo, J. (2019, January 16\u201320). Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00515"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the Inception Architecture for Computer Vision. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"Lecun","year":"2015","journal-title":"Nature"},{"key":"ref_12","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 7\u20139). 
Very deep convolutional networks for large-scale image recognition. Proceedings of the International Conference on Learning Representations, ICLR 2015\u2014Conference Track Proceedings, San Diego, CA, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016, January 27\u201330). Learning Deep Features for Discriminative Localization. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.319"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhang, H., Xu, T., Elhoseiny, M., Huang, X., Zhang, S., Elgammal, A., and Metaxas, D. (2016, January 27\u201330). SPDA-CNN: Unifying semantic part detection and abstraction for fine-grained recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.129"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Huang, S., Xu, Z., Tao, D., and Zhang, Y. (2016, January 27\u201330). Part-stacked CNN for fine-grained visual categorization. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.132"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zheng, H., Fu, J., Mei, T., and Luo, J. (2017, January 22\u201329). Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition. Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.557"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Liu, C., Xie, H., Zha, Z.j., Ma, L., Yu, L., and Zhang, Y. (2020, January 7\u201312). Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization. 
Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6822"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ding, Y., Zhou, Y., Zhu, Y., Ye, Q., and Jiao, J. (2019, January 27\u201328). Selective sparse sampling for fine-grained image recognition. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00670"},{"key":"ref_19","unstructured":"Jetley, S., Lord, N.A., Lee, N., and Torr, P. (May, January 30). Learn to Pay Attention. Proceedings of the International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ge, Z., Bewley, A., McCool, C., Corke, P., Upcroft, B., and Sanderson, C. (2016, January 7\u20139). Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks. Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016, Lake Placid, NY, USA.","DOI":"10.1109\/WACV.2016.7477700"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"7006","DOI":"10.1109\/TIP.2020.2996736","article-title":"Bi-Modal Progressive Mask Attention for Fine-Grained Recognition","volume":"29","author":"Song","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chen, T., Lin, L., Chen, R., Wu, Y., and Luo, X. (2018, January 13\u201319). Knowledge-embedded representation learning for fine-grained image recognition. Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/87"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Xu, H., Qi, G., Li, J., Wang, M., Xu, K., and Gao, H. (2018, January 13\u201319). Fine-grained image classification by visual-semantic embedding. 
Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/145"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"113819","DOI":"10.1016\/j.eswa.2020.113819","article-title":"Generative Adversarial Networks and Markov Random Fields for oversampling very small training sets","volume":"163","author":"Salazar","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Cui, Y., Song, Y., Sun, C., Howard, A., and Belongie, S. (2018, January 18\u201322). Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00432"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhang, F., Li, M., Zhai, G., and Liu, Y. (2020). Multi-branch and Multi-Scale Attention Learning for Fine-Grained Visual Categorization. arXiv.","DOI":"10.1007\/978-3-030-67832-6_12"},{"key":"ref_27","first-page":"3371","article-title":"Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion","volume":"11","author":"Vincent","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_28","unstructured":"Rezende, D.J., Mohamed, S., Danihelka, I., Gregor, K., and Wierstra, D. (2016, January 18\u201320). One-Shot Generalization in Deep Generative Models. Proceedings of the 33rd International Conference on Machine Learning, ICML\u201916, Oxford, UK."},{"key":"ref_29","unstructured":"Donahue, J., Kr\u00e4henb\u00fchl, P., and Darrell, T. (2016). Adversarial Feature Learning. 
arXiv."},{"key":"ref_30","first-page":"531","article-title":"Mutual Information Neural Estimation","volume":"Volume 80","author":"Dy","year":"2018","journal-title":"Proceedings of the 35th International Conference on Machine Learning"},{"key":"ref_31","unstructured":"Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., and Bengio, Y. (2019, January 6\u20139). Learning deep representations by mutual information estimation and maximization. Proceedings of the International Conference on Learning Representations, ICLR2019, New Orleans, LA, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ji, X., Henriques, J.F., and Vedaldi, A. (2019, January 27\u201328). Invariant Information Clustering for Unsupervised Image Classification and Segmentation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00996"},{"key":"ref_33","unstructured":"L\u00e9on, B. (2010, January 22\u201327). Large-scale machine learning with stochastic gradient descent. Proceedings of the COMPSTAT\u20192010, Physica-Verlag HD, Paris, France."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Golik, P., Doetsch, P., and Ney, H. (2013, January 25\u201329). Cross-entropy vs. Squared error training: A theoretical and experimental comparison. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Lyon, France.","DOI":"10.21437\/Interspeech.2013-436"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1137\/030601296","article-title":"A robust gradient sampling algorithm for nonsmooth, nonconvex optimization","volume":"15","author":"Burke","year":"2005","journal-title":"SIAM J. Optim."},{"key":"ref_36","unstructured":"Allen-Zhu, Z., Li, Y., and Song, Z. (2019, January 10\u201315). A convergence theory for deep learning via over-parameterization. 
Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"He, K., and Sun, J. (2015, January 7\u201312). Convolutional neural networks at constrained time cost. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299173"},{"key":"ref_38","unstructured":"Krizhevsky, A., and Hinton, G. (2009). Learning Multiple Layers of Features from tiny Images, University of Toronto. Technical Report TR-2009."},{"key":"ref_39","unstructured":"Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 Dataset, California Institute of Technology. Computation & Neural Systems Technical Report, CNS-TR-2011-001."},{"key":"ref_40","unstructured":"Zagoruyko, S., and Komodakis, N. (2017, January 24\u201326). Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. Proceedings of the International Conference on Learning Representation, ICLR 2017, Toulon, France."},{"key":"ref_41","unstructured":"Li, Y., Wei, C., and Ma, T. (2019, January 8\u201314). Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks. Proceedings of the Conference on Neural Information Processing Systems NIPS 2019, Vancouver, BC, Canada."},{"key":"ref_42","first-page":"3221","article-title":"Accelerating t-SNE using tree-based algorithms","volume":"15","year":"2014","journal-title":"J. Mach. Learn. 
Res."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/9\/1511\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:09:48Z","timestamp":1760177388000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/9\/1511"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,14]]},"references-count":42,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2020,9]]}},"alternative-id":["sym12091511"],"URL":"https:\/\/doi.org\/10.3390\/sym12091511","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2020,9,14]]}}}