{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T16:13:31Z","timestamp":1774541611753,"version":"3.50.1"},"reference-count":96,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,10,20]],"date-time":"2024-10-20T00:00:00Z","timestamp":1729382400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,10,20]],"date-time":"2024-10-20T00:00:00Z","timestamp":1729382400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/X028631\/1"],"award-info":[{"award-number":["EP\/X028631\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2025,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>This paper presents a novel approach for Fine-Grained Visual Classification (FGVC) by exploring Graph Neural Networks (GNNs) to facilitate high-order feature interactions, with a specific focus on constructing both inter- and intra-region graphs. Unlike previous FGVC techniques that often isolate global and local features, our method combines both features seamlessly during learning via graphs. Inter-region graphs capture long-range dependencies to recognize global patterns, while intra-region graphs delve into finer details within specific regions of an object by exploring high-dimensional convolutional features. 
A key innovation is the use of shared GNNs with an attention mechanism coupled with the Approximate Personalized Propagation of Neural Predictions (APPNP) message-passing algorithm, enhancing information propagation efficiency for better discriminability and simplifying the model architecture for computational efficiency. Additionally, the introduction of residual connections improves performance and training stability. Comprehensive experiments showcase state-of-the-art results on benchmark FGVC datasets, affirming the efficacy of our approach. This work underscores the potential of GNN in modeling high-level feature interactions, distinguishing it from previous FGVC methods that typically focus on singular aspects of feature representation. Our source code is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/Arindam-1991\/I2-HOFI\" ext-link-type=\"uri\">https:\/\/github.com\/Arindam-1991\/I2-HOFI<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s11263-024-02260-y","type":"journal-article","created":{"date-parts":[[2024,10,20]],"date-time":"2024-10-20T07:01:54Z","timestamp":1729407714000},"page":"1755-1779","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Interweaving Insights: High-Order Feature Interaction for Fine-Grained Visual 
Recognition"],"prefix":"10.1007","volume":"133","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5697-0060","authenticated-orcid":false,"given":"Arindam","family":"Sikdar","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3774-2134","authenticated-orcid":false,"given":"Yonghuai","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Siddhardha","family":"Kedarisetty","sequence":"additional","affiliation":[]},{"given":"Yitian","family":"Zhao","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7749-7911","authenticated-orcid":false,"given":"Amr","family":"Ahmed","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0276-9000","authenticated-orcid":false,"given":"Ardhendu","family":"Behera","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,10,20]]},"reference":[{"issue":"12","key":"2260_CR1","doi-asserted-by":"crossref","first-page":"5771","DOI":"10.1109\/TIP.2019.2922100","volume":"28","author":"M Ali","year":"2019","unstructured":"Ali, M., Gao, J., & Antolovich, M. (2019). Parametric classification of Bingham distributions based on Grassmann manifolds. IEEE Transactions on Image Processing, 28(12), 5771\u20135784.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR2","doi-asserted-by":"crossref","unstructured":"Barz, B., Denzler, J. (2020). Deep learning on small datasets without pre-training using cosine loss. In IEEE Winter conference on applications of computer vision (pp. 1371\u20131380).","DOI":"10.1109\/WACV45572.2020.9093286"},{"key":"2260_CR3","doi-asserted-by":"crossref","unstructured":"Behera, A., Wharton, Z., Hewage, P., & Bera, A. (2021). Context-aware attentional pooling (CAP) for fine-grained visual classification. In Proceedings of 35th AAAI conference on artificial intelligence (pp. 
929\u2013937).","DOI":"10.1609\/aaai.v35i2.16176"},{"key":"2260_CR4","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1109\/TAFFC.2020.3031841","volume":"14","author":"A Behera","year":"2020","unstructured":"Behera, A., Wharton, Z., Liu, Y., Ghahremani, M., Kumar, S., & Bessis, N. (2020). Regional attention network (RAN) for head pose and fine-grained gesture recognition. IEEE Transactions on Affective Computing, 14, 549\u2013562.","journal-title":"IEEE Transactions on Affective Computing"},{"key":"2260_CR5","doi-asserted-by":"crossref","first-page":"3691","DOI":"10.1109\/TIP.2021.3064256","volume":"30","author":"A Bera","year":"2021","unstructured":"Bera, A., Wharton, Z., Liu, Y., Bessis, N., & Behera, A. (2021). Attend and guide (ag-net): A keypoints-driven attention-based deep network for image recognition. IEEE Transactions on Image Processing, 30, 3691\u20133704.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR6","doi-asserted-by":"crossref","first-page":"6017","DOI":"10.1109\/TIP.2022.3205215","volume":"31","author":"A Bera","year":"2022","unstructured":"Bera, A., Wharton, Z., Liu, Y., Bessis, N., & Behera, A. (2022). SR-GNN: Spatial relation-aware graph neural network for fine-grained image categorization. IEEE Transactions on Image Processing, 31, 6017\u20136031.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR7","doi-asserted-by":"crossref","unstructured":"Cai, S., Zuo, W., & Zhang, L. (2017). Higher-order integration of hierarchical convolutional activations for fine-grained visual categorization. In Proceedings of the IEEE international conference on computer vision (pp. 511\u2013520).","DOI":"10.1109\/ICCV.2017.63"},{"key":"2260_CR8","doi-asserted-by":"crossref","unstructured":"Chai, Y., Lempitsky, V., & Zisserman, A. (2013). Symbiotic segmentation and part localization for fine-grained categorization. In Proceeding of the IEEE international conference on computer vision (pp. 
321\u2013328).","DOI":"10.1109\/ICCV.2013.47"},{"key":"2260_CR9","doi-asserted-by":"crossref","unstructured":"Chang, D., Pang, K., Zheng, Y., Ma, Z., Song, Y.-Z., & Guo, J. (2021). Your \u201cflamingo\u201d is my \u201cbird\u201d: Fine-grained, or not. In Proceedings of IEEE\/CVF conference on computer vision and pattern recognition (pp. 11476\u201311485).","DOI":"10.1109\/CVPR46437.2021.01131"},{"key":"2260_CR10","doi-asserted-by":"crossref","first-page":"4683","DOI":"10.1109\/TIP.2020.2973812","volume":"29","author":"D Chang","year":"2020","unstructured":"Chang, D., Ding, Y., Xie, J., Bhunia, A. K., Li, X., Ma, Z., Wu, M., Guo, J., & Song, Y.-Z. (2020). The devil is in the channels: Mutual-channel loss for fine-grained image classification. IEEE Transactions on Image Processing, 29, 4683\u20134695.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR11","unstructured":"Chaudhuri, A., Mancini, M., Akata, Z., & Dutta, A. (2024). Transitivity recovering decompositions: Interpretable and robust fine-grained relationships. Advances in Neural Information Processing Systems, 36."},{"key":"2260_CR12","first-page":"31145","volume":"35","author":"A Chaudhuri","year":"2022","unstructured":"Chaudhuri, A., Mancini, M., Akata, Z., & Dutta, A. (2022). Relational proxies: Emergent relationships as fine-grained discriminators. Advances in Neural Information Processing Systems, 35, 31145\u201331157.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2260_CR13","doi-asserted-by":"crossref","unstructured":"Chen, Y., Bai, Y., Zhang, W., & Mei, T. (2019). Destruction and construction learning for fine-grained image recognition. In IEEE conference on computer vision and pattern recognition (pp. 
5157\u20135166).","DOI":"10.1109\/CVPR.2019.00530"},{"key":"2260_CR14","doi-asserted-by":"crossref","first-page":"110265","DOI":"10.1016\/j.patcog.2024.110265","volume":"149","author":"H Chen","year":"2024","unstructured":"Chen, H., Zhang, H., Liu, C., An, J., Gao, Z., & Qiu, J. (2024). FET-FGVC: Feature-enhanced transformer for fine-grained visual classification. Pattern Recognition, 149, 110265.","journal-title":"Pattern Recognition"},{"key":"2260_CR15","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In IEEE conference on computer vision and pattern recognition (pp. 1251\u20131258).","DOI":"10.1109\/CVPR.2017.195"},{"key":"2260_CR16","doi-asserted-by":"crossref","unstructured":"Cui, Y., Song, Y., Sun, C., Howard, A., & Belongie, S. (2018). Large scale fine-grained categorization and domain-specific transfer learning. In IEEE conference on computer vision and pattern recognition (pp. 4109\u20134118).","DOI":"10.1109\/CVPR.2018.00432"},{"key":"2260_CR17","doi-asserted-by":"crossref","unstructured":"Cui, Y., Zhou, F., Wang, J., Liu, X., Lin, Y., & Belongie, S. (2017). Kernel pooling for convolutional neural networks. In IEEE conference on computer vision and pattern recognition (pp. 2921\u20132930).","DOI":"10.1109\/CVPR.2017.325"},{"key":"2260_CR18","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., & Wei, Y. (2017). Deformable convolutional networks. In Proceedings of the IEEE international conference on computer vision (pp. 764\u2013773).","DOI":"10.1109\/ICCV.2017.89"},{"key":"2260_CR19","doi-asserted-by":"crossref","unstructured":"Ding, Y., Zhou, Y., Zhu, Y., Ye, Q., & Jiao, J. (2019). Selective sparse sampling for fine-grained image recognition. In Proceedings of the IEEE International Conference on Computer Vision (pp. 
6599\u20136608).","DOI":"10.1109\/ICCV.2019.00670"},{"key":"2260_CR20","doi-asserted-by":"crossref","first-page":"2826","DOI":"10.1109\/TIP.2021.3055617","volume":"30","author":"Y Ding","year":"2021","unstructured":"Ding, Y., Ma, Z., Wen, S., Xie, J., Chang, D., Si, Z., Wu, M., & Ling, H. (2021). AP-CNN: Weakly supervised attention pyramid convolutional neural network for fine-grained visual classification. IEEE Transactions on Image Processing, 30, 2826\u20132836.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR21","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., & Gelly, S. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of international conference learning representations (ICLR)."},{"key":"2260_CR22","doi-asserted-by":"crossref","unstructured":"Engin, M., Wang, L., Zhou, L., & Liu, X. (2018). Deepkspd: Learning kernel-matrix-based spd representation for fine-grained image recognition. In Proceedings of the European conference on computer vision (pp. 612\u2013627).","DOI":"10.1007\/978-3-030-01216-8_38"},{"key":"2260_CR23","doi-asserted-by":"crossref","unstructured":"Ge, W., & Yu, Y. (2017). Borrowing treasures from the wealthy: Deep transfer learning through selective joint fine-tuning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1086\u20131095).","DOI":"10.1109\/CVPR.2017.9"},{"key":"2260_CR24","doi-asserted-by":"crossref","unstructured":"Ge, W., Lin, X., & Yu, Y. (2019). Weakly supervised complementary parts models for fine-grained image classification from the bottom up. In Proceedings IEEE conference on computer vision and pattern recognition (pp. 3034\u20133043).","DOI":"10.1109\/CVPR.2019.00315"},{"key":"2260_CR25","first-page":"8291","volume":"35","author":"K Han","year":"2022","unstructured":"Han, K., Wang, Y., Guo, J., Tang, Y., & Wu, E. 
(2022). Vision GNN: An image is worth graph of nodes. Advances in Neural Information Processing Systems, 35, 8291\u20138303.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2260_CR26","doi-asserted-by":"crossref","unstructured":"He, X., & Peng, Y. (2017). Fine-grained image classification via combining vision and language. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5994\u20136002).","DOI":"10.1109\/CVPR.2017.775"},{"key":"2260_CR27","doi-asserted-by":"crossref","unstructured":"He, J., Chen, J., Lin, M.-X., Yu, Q., & Yuille, A.L. (2023). Compositor: Bottom-up clustering and compositing for robust part and object segmentation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 11259\u201311268).","DOI":"10.1109\/CVPR52729.2023.01083"},{"key":"2260_CR28","doi-asserted-by":"crossref","unstructured":"He, J., Chen, J.-N., Liu, S., Kortylewski, A., Yang, C., Bai, Y., Wang, C., & Yuille, A. (2021). Transfg: A transformer architecture for fine-grained recognition. arXiv preprint arXiv:2103.07976.","DOI":"10.1609\/aaai.v36i1.19967"},{"key":"2260_CR29","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770\u2013778).","DOI":"10.1109\/CVPR.2016.90"},{"issue":"9","key":"2260_CR30","doi-asserted-by":"crossref","first-page":"1235","DOI":"10.1007\/s11263-019-01176-2","volume":"127","author":"X He","year":"2019","unstructured":"He, X., Peng, Y., & Zhao, J. (2019). Which and how many regions to gaze: Focus discriminative regions for fine-grained visual categorization. 
International Journal of Computer Vision, 127(9), 1235\u20131255.","journal-title":"International Journal of Computer Vision"},{"key":"2260_CR31","doi-asserted-by":"crossref","unstructured":"Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., & Vasudevan, V. (2019). Searching for mobilenetv3. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1314\u20131324).","DOI":"10.1109\/ICCV.2019.00140"},{"key":"2260_CR32","unstructured":"Hu, T., Qi, H., Huang, Q., & Lu, Y. (2019). See better before looking closer: Weakly supervised data augmentation network for fine-grained visual classification. arXiv preprint arXiv:1901.09891"},{"key":"2260_CR33","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der\u00a0Maaten, L., & Weinberger, K.Q. (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700\u20134708).","DOI":"10.1109\/CVPR.2017.243"},{"key":"2260_CR34","doi-asserted-by":"crossref","unstructured":"Huang, S., Xu, Z., Tao, D., & Zhang, Y. (2016). Part-stacked CNN for fine-grained visual categorization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1173\u20131182).","DOI":"10.1109\/CVPR.2016.132"},{"issue":"4","key":"2260_CR35","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1109\/TMM.2016.2631122","volume":"19","author":"C Huang","year":"2016","unstructured":"Huang, C., Li, H., Xie, Y., Wu, Q., & Luo, B. (2016). PBC: Polygon-based classifier for fine-grained categorization. IEEE Transactions on Multimedia, 19(4), 673\u2013684.","journal-title":"IEEE Transactions on Multimedia"},{"key":"2260_CR36","doi-asserted-by":"crossref","unstructured":"H\u00fcbler, C., Kriegel, H.-P., Borgwardt, K., & Ghahramani, Z. (2008). Metropolis algorithms for representative subgraph sampling. In 2008 Eighth IEEE international conference on data mining (pp. 
283\u2013292).","DOI":"10.1109\/ICDM.2008.124"},{"key":"2260_CR37","doi-asserted-by":"crossref","unstructured":"Hung, W.-C., Jampani, V., Liu, S., Molchanov, P., Yang, M.-H., & Kautz, J. (2019). Scops: Self-supervised co-part segmentation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 869\u2013878).","DOI":"10.1109\/CVPR.2019.00096"},{"key":"2260_CR38","doi-asserted-by":"crossref","first-page":"109305","DOI":"10.1016\/j.patcog.2023.109305","volume":"137","author":"X Ke","year":"2023","unstructured":"Ke, X., Cai, Y., Chen, B., Liu, H., & Guo, W. (2023). Granularity-aware distillation and structure modeling region proposal network for fine-grained image classification. Pattern Recognition, 137, 109305.","journal-title":"Pattern Recognition"},{"key":"2260_CR39","unstructured":"Khosla, A., Jayadevaprakash, N., Yao, B., & Li, F.-F. (2011). Novel dataset for fine-grained image categorization: Stanford dogs. In Proceedings of CVPR workshop on fine-grained visual categorization (Vol. 2)."},{"key":"2260_CR40","unstructured":"Kipf, T.N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In International conference on learning representations."},{"key":"2260_CR41","doi-asserted-by":"crossref","unstructured":"Klicpera, J., Bojchevski, A., & G\u00fcnnemann, S. (2019). Predict then propagate: Graph neural networks meet personalized pagerank. In International conference on learning representations (ICLR).","DOI":"10.1145\/3394486.3403296"},{"key":"2260_CR42","doi-asserted-by":"crossref","unstructured":"Kong, S., & Fowlkes, C. (2017). Low-rank bilinear pooling for fine-grained classification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 365\u2013374).","DOI":"10.1109\/CVPR.2017.743"},{"key":"2260_CR43","doi-asserted-by":"crossref","unstructured":"Korsch, D., Bodesheim, P., & Denzler, J. (2019). 
Classification-specific parts for improving fine-grained visual categorization. In German conference on pattern recognition (pp. 62\u201375). Springer","DOI":"10.1007\/978-3-030-33676-9_5"},{"key":"2260_CR44","doi-asserted-by":"crossref","unstructured":"Krause, J., Stark, M., Deng, J., & Fei-Fei, L. (2013). 3D object representations for fine-grained categorization. In Proceedings of the IEEE international conference on computer vision workshops (pp. 554\u2013561).","DOI":"10.1109\/ICCVW.2013.77"},{"key":"2260_CR45","unstructured":"Li, Y., Tarlow, D., Brockschmidt, M., & Zemel, R. (2016). Gated graph sequence neural networks. In International Conference on Learning Representations (ICLR)."},{"key":"2260_CR46","doi-asserted-by":"crossref","unstructured":"Li, Y., Wang, N., Liu, J., & Hou, X. (2017). Factorized bilinear models for image recognition. In Proceedings of the IEEE international conference on computer vision (pp. 2079\u20132087).","DOI":"10.1109\/ICCV.2017.229"},{"key":"2260_CR47","doi-asserted-by":"crossref","unstructured":"Li, P., Xie, J., Wang, Q., & Zuo, W. (2017). Is second-order information helpful for large-scale visual recognition? In Proceedings of the IEEE international conference on computer vision (pp. 2070\u20132078).","DOI":"10.1109\/ICCV.2017.228"},{"key":"2260_CR48","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., RoyChowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. In Proceedings of the IEEE international conference on computer vision (pp. 1449\u20131457).","DOI":"10.1109\/ICCV.2015.170"},{"issue":"1","key":"2260_CR49","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1109\/TNNLS.2020.3027603","volume":"33","author":"D Lin","year":"2020","unstructured":"Lin, D., Wang, Y., Liang, L., Li, P., & Chen, C. P. (2020). Deep LSAC for fine-grained recognition. 
IEEE Transactions on Neural Networks and Learning Systems, 33(1), 200\u2013214.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"2260_CR50","doi-asserted-by":"crossref","first-page":"2902","DOI":"10.1109\/TMM.2021.3090274","volume":"24","author":"H Liu","year":"2021","unstructured":"Liu, H., Li, J., Li, D., See, J., & Lin, W. (2021). Learning scale-consistent attention part network for fine-grained image recognition. IEEE Transactions on Multimedia, 24, 2902\u20132913.","journal-title":"IEEE Transactions on Multimedia"},{"key":"2260_CR51","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of international conference on computer vision (pp. 10012\u201310022).","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"2260_CR52","doi-asserted-by":"crossref","first-page":"1785","DOI":"10.1109\/TMM.2019.2954747","volume":"22","author":"C Liu","year":"2019","unstructured":"Liu, C., Xie, H., Zha, Z.-J., Yu, L., Chen, Z., & Zhang, Y. (2019). Bidirectional attention-recognition model for fine-grained object classification. IEEE Transactions on Multimedia, 22, 1785\u20131795.","journal-title":"IEEE Transactions on Multimedia"},{"key":"2260_CR53","doi-asserted-by":"crossref","first-page":"748","DOI":"10.1109\/TIP.2021.3135477","volume":"31","author":"M Liu","year":"2021","unstructured":"Liu, M., Zhang, C., Bai, H., Zhang, R., & Zhao, Y. (2021). Cross-part learning for fine-grained image classification. IEEE Transactions on Image Processing, 31, 748\u2013758.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR54","unstructured":"Maji, S., Rahtu, E., Kannala, J., Blaschko, M., & Vedaldi, A. (2013). Fine-grained visual classification of aircraft. 
arXiv preprint: arXiv:1306.5151"},{"key":"2260_CR55","doi-asserted-by":"crossref","first-page":"1983","DOI":"10.1109\/LSP.2021.3114622","volume":"28","author":"Z Miao","year":"2021","unstructured":"Miao, Z., Zhao, X., Wang, J., Li, Y., & Li, H. (2021). Complemental attention multi-feature fusion network for fine-grained classification. IEEE Signal Processing Letters, 28, 1983\u20131987.","journal-title":"IEEE Signal Processing Letters"},{"key":"2260_CR56","doi-asserted-by":"crossref","unstructured":"Nilsback, M.-E., & Zisserman, A. (2008). Automated flower classification over a large number of classes. In Indian conference on computer vision, graphics & image processing (pp. 722\u2013729).","DOI":"10.1109\/ICVGIP.2008.47"},{"issue":"3","key":"2260_CR57","doi-asserted-by":"crossref","first-page":"1487","DOI":"10.1109\/TIP.2017.2774041","volume":"27","author":"Y Peng","year":"2018","unstructured":"Peng, Y., He, X., & Zhao, J. (2018). Object-part attention model for fine-grained image classification. IEEE Transactions on Image Processing, 27(3), 1487\u20131500.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR58","doi-asserted-by":"crossref","unstructured":"Pham, N., & Pagh, R. (2013). Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 239\u2013247).","DOI":"10.1145\/2487575.2487591"},{"key":"2260_CR59","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1007\/s11263-013-0636-x","volume":"105","author":"J S\u00e1nchez","year":"2013","unstructured":"S\u00e1nchez, J., Perronnin, F., Mensink, T., & Verbeek, J. (2013). Image classification with the fisher vector: Theory and practice. 
International Journal of Computer Vision, 105, 222\u2013245.","journal-title":"International Journal of Computer Vision"},{"issue":"3","key":"2260_CR60","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1109\/TPAMI.2018.2885764","volume":"42","author":"M Simon","year":"2020","unstructured":"Simon, M., Rodner, E., Darrell, T., & Denzler, J. (2020). The whole is more than its parts? from explicit to implicit pose normalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(3), 749\u2013763.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"3","key":"2260_CR61","first-page":"3554","volume":"45","author":"Y Song","year":"2022","unstructured":"Song, Y., Sebe, N., & Wang, W. (2022). On the eigenvalues of global covariance pooling for fine-grained visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3), 3554\u20133566.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2260_CR62","doi-asserted-by":"crossref","first-page":"7006","DOI":"10.1109\/TIP.2020.2996736","volume":"29","author":"K Song","year":"2020","unstructured":"Song, K., Wei, X.-S., Shu, X., Song, R.-J., & Lu, J. (2020). Bi-modal progressive mask attention for fine-grained recognition. IEEE Transactions on Image Processing, 29, 7006\u20137018.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR63","doi-asserted-by":"crossref","unstructured":"Van\u00a0Horn, G., Branson, S., Farrell, R., Haber, S., Barry, J., Ipeirotis, P., Perona, P., & Belongie, S. (2015). Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In IEEE conference on computer vision and pattern recognition (pp. 595\u2013604).","DOI":"10.1109\/CVPR.2015.7298658"},{"key":"2260_CR64","unstructured":"Veli\u010dkovi\u0107, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2018). 
Graph attention networks. In International conference on learning representations (ICLR)."},{"key":"2260_CR65","unstructured":"Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD birds-200-2011 dataset."},{"key":"2260_CR66","doi-asserted-by":"crossref","unstructured":"Wang, Q., Li, P., & Zhang, L. (2017). G2denet: Global gaussian distribution embedding network and its application to visual recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2730\u20132739).","DOI":"10.1109\/CVPR.2017.689"},{"key":"2260_CR67","doi-asserted-by":"crossref","unstructured":"Wang, Z., Wang, S., Li, H., Dou, Z., & Li, J. (2020). Graph-propagation based correlation learning for weakly supervised fine-grained image classification. In AAAI (pp. 12289\u201312296).","DOI":"10.1609\/aaai.v34i07.6912"},{"key":"2260_CR68","doi-asserted-by":"crossref","unstructured":"Wang, S., Wang, Z., Li, H., & Ouyang, W. (2020). Category-specific semantic coherency learning for fine-grained image recognition. In Proceedings of the 28th ACM international conference on multimedia (pp. 174\u2013183).","DOI":"10.1145\/3394171.3413871"},{"key":"2260_CR69","doi-asserted-by":"crossref","unstructured":"Wang, Z., Wang, S., Zhang, P., Li, H., Zhong, W., & Li, J. (2019). Weakly supervised fine-grained image classification via correlation-guided discriminative learning. In Proceedings of the 27th ACM international conference on multimedia (pp. 1851\u20131860).","DOI":"10.1145\/3343031.3350976"},{"key":"2260_CR70","doi-asserted-by":"crossref","unstructured":"Wang, L., Zhang, J., Zhou, L., Tang, C., & Li, W. (2015). Beyond covariance: Feature representation with nonlinear kernel matrices. In Proceedings of international conference on computer vision (pp. 
4570\u20134578).","DOI":"10.1109\/ICCV.2015.519"},{"key":"2260_CR71","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1007\/s11263-023-01873-z","volume":"132","author":"S Wang","year":"2023","unstructured":"Wang, S., Wang, Z., Li, H., Chang, J., Ouyang, W., & Tian, Q. (2023). Accurate fine-grained object recognition with structure-driven relation graph networks. International Journal of Computer Vision, 132, 137\u2013160.","journal-title":"International Journal of Computer Vision"},{"issue":"8","key":"2260_CR72","first-page":"2582","volume":"43","author":"Q Wang","year":"2020","unstructured":"Wang, Q., Xie, J., Zuo, W., Zhang, L., & Li, P. (2020). Deep CNNs meet global covariance pooling: Better representation and generalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(8), 2582\u20132597.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2260_CR73","doi-asserted-by":"crossref","first-page":"8927","DOI":"10.1109\/TPAMI.2021.3126648","volume":"44","author":"X-S Wei","year":"2021","unstructured":"Wei, X.-S., Song, Y.-Z., Mac Aodha, O., Wu, J., Peng, Y., Tang, J., Yang, J., & Belongie, S. (2021). Fine-grained image analysis with deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 8927\u20138948.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2260_CR74","unstructured":"Wei, X.-S., Wu, J., & Cui, Q. (2019). Deep learning for fine-grained image analysis: A survey. arXiv preprint arXiv:1907.03069"},{"key":"2260_CR75","doi-asserted-by":"crossref","unstructured":"Wei, X., Zhang, Y., Gong, Y., Zhang, J., & Zheng, N. (2018). Grassmann pooling as compact homogeneous bilinear pooling for fine-grained visual classification. In Proceedings of the European conference on computer vision (pp. 
355\u2013370).","DOI":"10.1007\/978-3-030-01219-9_22"},{"key":"2260_CR76","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/TNNLS.2020.2978386","volume":"32","author":"Z Wu","year":"2020","unstructured":"Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Philip, S. Y. (2020). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32, 4\u201324.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"8","key":"2260_CR77","doi-asserted-by":"crossref","first-page":"4499","DOI":"10.1109\/TNNLS.2021.3116209","volume":"34","author":"Z Xie","year":"2023","unstructured":"Xie, Z., Zhang, W., Sheng, B., Li, P., & Chen, C. P. (2023). BaGFN: Broad attentive graph fusion network for high-order feature interactions. IEEE Transactions on Neural Networks and Learning Systems, 34(8), 4499\u20134513.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"2260_CR78","doi-asserted-by":"crossref","unstructured":"Xie, L., Zheng, L., Wang, J., Yuille, A.L., & Tian, Q. (2016). Interactive: Inter-layer activeness propagation. In IEEE conference on computer vision and pattern recognition (pp. 270\u2013279).","DOI":"10.1109\/CVPR.2016.36"},{"key":"2260_CR79","unstructured":"Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2019). How powerful are graph neural networks? In International conference on learning representations (ICLR)."},{"key":"2260_CR80","doi-asserted-by":"crossref","first-page":"3488","DOI":"10.1109\/TNNLS.2021.3112768","volume":"34","author":"K Xu","year":"2021","unstructured":"Xu, K., Lai, R., Gu, L., & Li, Y. (2021). Multiresolution discriminative mixup network for fine-grained visual categorization. 
IEEE Transactions on Neural Networks and Learning Systems, 34, 3488\u20133500.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"2260_CR81","doi-asserted-by":"crossref","first-page":"110042","DOI":"10.1016\/j.patcog.2023.110042","volume":"146","author":"Y Xu","year":"2024","unstructured":"Xu, Y., Wu, S., Wang, B., Yang, M., Wu, Z., Yao, Y., & Wei, Z. (2024). Two-stage fine-grained image classification model based on multi-granularity feature fusion. Pattern Recognition, 146, 110042.","journal-title":"Pattern Recognition"},{"key":"2260_CR82","first-page":"118","volume":"54","author":"S Yan","year":"2017","unstructured":"Yan, S., Smith, J. S., & Zhang, B. (2017). Action recognition from still images based on deep VLAD spatial pyramids. Signal Processing: Image Communication, 54, 118\u2013129.","journal-title":"Signal Processing: Image Communication"},{"issue":"1","key":"2260_CR83","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1109\/TIP.2017.2751960","volume":"27","author":"H Yao","year":"2017","unstructured":"Yao, H., Zhang, S., Yan, C., Zhang, Y., Li, J., & Tian, Q. (2017). Autobd: Automated bi-level description for scalable fine-grained visual categorization. IEEE Transactions on Image Processing, 27(1), 10\u201323.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR84","doi-asserted-by":"crossref","unstructured":"Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., & Sang, N. (2018). Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European conference on computer vision (ECCV) (pp. 325\u2013341).","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"2260_CR85","doi-asserted-by":"crossref","unstructured":"Yu, C., Zhao, X., Zheng, Q., Zhang, P., & You, X. (2018). Hierarchical bilinear pooling for fine-grained visual recognition. In European conference on computer vision (pp. 
574\u2013589).","DOI":"10.1007\/978-3-030-01270-0_35"},{"key":"2260_CR86","doi-asserted-by":"crossref","unstructured":"Zeiler, M.D., Taylor, G.W., & Fergus, R. (2011). Adaptive deconvolutional networks for mid and high level feature learning. In Proceedings of international conference on computer vision (pp. 2018\u20132025).","DOI":"10.1109\/ICCV.2011.6126474"},{"key":"2260_CR87","doi-asserted-by":"crossref","unstructured":"Zhang, N., Donahue, J., Girshick, R., & Darrell, T. (2014). Part-based R-CNNs for fine-grained category detection. In European conference on computer vision (pp. 834\u2013849). Springer.","DOI":"10.1007\/978-3-319-10590-1_54"},{"key":"2260_CR88","doi-asserted-by":"crossref","unstructured":"Zhang, L., Huang, S., Liu, W., & Tao, D. (2019). Learning a mixture of granularity-specific experts for fine-grained categorization. In Proceedings of IEEE international conference on computer vision (pp. 8331\u20138340).","DOI":"10.1109\/ICCV.2019.00842"},{"issue":"4","key":"2260_CR89","doi-asserted-by":"crossref","first-page":"1713","DOI":"10.1109\/TIP.2016.2531289","volume":"25","author":"Y Zhang","year":"2016","unstructured":"Zhang, Y., Wei, X.-S., Wu, J., Cai, J., Lu, J., Nguyen, V.-A., & Do, M. N. (2016). Weakly supervised fine-grained categorization with part-based image representation. IEEE Transactions on Image Processing, 25(4), 1713\u20131725.","journal-title":"IEEE Transactions on Image Processing"},{"key":"2260_CR90","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Yan, K., Huang, F., & Li, J. (2021). Graph-based high-order relation discovery for fine-grained recognition. In Proceedings IEEE\/CVF conference on computer vision and pattern recognition (pp. 15079\u201315088).","DOI":"10.1109\/CVPR46437.2021.01483"},{"key":"2260_CR91","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Yan, K., Huang, F., & Li, J. (2021). Graph-based high-order relation discovery for fine-grained recognition. 
In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 15079\u201315088).","DOI":"10.1109\/CVPR46437.2021.01483"},{"key":"2260_CR92","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1016\/j.neunet.2023.01.050","volume":"161","author":"P Zhao","year":"2023","unstructured":"Zhao, P., Li, Y., Tang, B., Liu, H., & Yao, S. (2023). Feature relocation network for fine-grained image classification. Neural Networks, 161, 306\u2013317.","journal-title":"Neural Networks"},{"key":"2260_CR93","doi-asserted-by":"crossref","unstructured":"Zheng, H., Fu, J., Zha, Z.-J., & Luo, J. (2019). Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5012\u20135021).","DOI":"10.1109\/CVPR.2019.00515"},{"key":"2260_CR94","doi-asserted-by":"crossref","unstructured":"Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2020). Random erasing data augmentation. Proceedings of AAAI Conference on Artificial Intelligence, 34, 13001\u201313008.","DOI":"10.1609\/aaai.v34i07.7000"},{"key":"2260_CR95","doi-asserted-by":"crossref","first-page":"13130","DOI":"10.1609\/aaai.v34i07.7016","volume":"34","author":"P Zhuang","year":"2020","unstructured":"Zhuang, P., Wang, Y., & Qiao, Y. (2020). Learning attentive pairwise interaction for fine-grained classification. Proceedings of AAAI Conference on Artificial Intelligence, 34, 13130\u201313137.","journal-title":"Proceedings of AAAI Conference on Artificial Intelligence"},{"key":"2260_CR96","doi-asserted-by":"crossref","unstructured":"Zoph, B., Vasudevan, V., Shlens, J., & Le, Q.V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 
8697\u20138710).","DOI":"10.1109\/CVPR.2018.00907"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-024-02260-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-024-02260-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-024-02260-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,30]],"date-time":"2025-03-30T22:11:08Z","timestamp":1743372668000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-024-02260-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,20]]},"references-count":96,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,4]]}},"alternative-id":["2260"],"URL":"https:\/\/doi.org\/10.1007\/s11263-024-02260-y","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,20]]},"assertion":[{"value":"2 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 September 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 October 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}