{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T21:02:12Z","timestamp":1771966932781,"version":"3.50.1"},"reference-count":0,"publisher":"TechForum Publishing Group","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Bull. Comput. Data Sci."],"published-print":{"date-parts":[[2023,12,30]]},"abstract":"<jats:p>Fine-grained visual classification (FGVC) becomes especially challenging when categories are organized hierarchically and the discriminative cues shrink from global shapes (order\/family) to tiny parts (genus\/species). Existing hierarchy-aware methods such as CHRF learn level-specific attentions implicitly, but they only use human gaze as a post-hoc validation signal, leaving a rich source of supervision unused. In this work we introduce GS-HAN, a gaze-supervised hierarchical attention network that explicitly aligns model attention with human fixation patterns at every level of the taxonomy. GS-HAN builds on a backbone feature extractor and CHRF-style region feature mining, but augments each hierarchy level with gaze-conditioned attention heads and a Hierarchical Gaze Alignment Loss that combines KL divergence and cosine similarity to match human gaze distributions. We further retain cross-hierarchical orthogonal fusion so that coarse-level, gaze-aligned context can enhance fine-level recognition. Evaluations on CUB-200-2011 with ARISTO gaze, as well as on Butterfly-200, VegFru, FGVC-Aircraft, and Stanford Cars, show that GS-HAN consistently outperforms strong FGVC baselines and hierarchy-aware methods, achieving 90.8% on CUB and clear gains at the most fine-grained (species) level. Ablations verify that (i) direct gaze supervision\u2014not just hierarchy\u2014drives the improvements, (ii) our loss improves quantitative gaze\u2013attention similarity, and (iii) even partial gaze availability yields benefits. The results demonstrate that human gaze is an effective, underexploited supervisory signal for hierarchical FGVC, improving both accuracy and interpretability.<\/jats:p>","DOI":"10.71448\/bcds2341-1","type":"journal-article","created":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T20:10:17Z","timestamp":1771963817000},"page":"1-14","source":"Crossref","is-referenced-by-count":0,"title":["Gaze-Supervised Hierarchical Attention Networks for Fine-Grained Visual Classification"],"prefix":"10.71448","volume":"4","author":[{"name":"School of Computer Science and Engineering, State Key Laboratory of Software, University of York, U.K.","sequence":"first","affiliation":[]},{"given":"Edwin R.","family":"Hancock","sequence":"first","affiliation":[]}],"member":"52394","published-online":{"date-parts":[[2023,12,30]]},"container-title":["Bulletin of Computer and Data Sciences"],"original-title":[],"deposited":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T20:10:18Z","timestamp":1771963818000},"score":1,"resource":{"primary":{"URL":"https:\/\/bcds.ch\/gaze-supervised-hierarchical-attention-networks-for-fine-grained-visual-classification\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,30]]},"references-count":0,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12,30]]},"published-print":{"date-parts":[[2023,12,30]]}},"URL":"https:\/\/doi.org\/10.71448\/bcds2341-1","relation":{},"ISSN":["3072-2926"],"issn-type":[{"value":"3072-2926","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,30]]}}}