{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:12:39Z","timestamp":1757617959923,"version":"3.44.0"},"reference-count":51,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T00:00:00Z","timestamp":1747872000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T00:00:00Z","timestamp":1747872000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["SH 1682\/1-1"],"award-info":[{"award-number":["SH 1682\/1-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007569","name":"Carl-Zeiss-Stiftung","doi-asserted-by":"publisher","award":["Breakthroughs: Exploring Intelligent Systems"],"award-info":[{"award-number":["Breakthroughs: Exploring Intelligent Systems"]}],"id":[{"id":"10.13039\/100007569","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["1LC1903E"],"award-info":[{"award-number":["1LC1903E"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In fine-grained classification, which is classifying images into subcategories within a common broader category, it is crucial to have precise visual explanations of the classification model\u2019s decision. 
While commonly used attention- or gradient-based methods deliver either too coarse or too noisy explanations unsuitable for highlighting subtle visual differences reliably, perturbation-based methods can precisely locate pixels causally responsible for the predicted category. The <jats:italic>fill-in of the dropout<\/jats:italic> (FIDO) algorithm is one of those methods, which utilizes <jats:italic>concrete dropout<\/jats:italic> (CD) to sample a set of attribution masks and updates the sampling parameters based on the output of the classification model. In this paper, we present a solution against the high variance in the gradient estimates, a known problem of the FIDO algorithm that has been mitigated until now by large mini-batch updates of the sampling parameters. First, our solution allows for estimating the parameters with smaller mini-batch sizes without losing the quality of the estimates but with a reduced computational effort. Next, our method produces finer and more coherent attribution masks. 
Finally, we use the resulting attribution masks to improve the classification performance on three fine-grained datasets without additional fine-tuning steps and achieve results that are otherwise only achieved if ground truth bounding boxes are used.<\/jats:p>","DOI":"10.1007\/s11263-025-02453-z","type":"journal-article","created":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T00:28:43Z","timestamp":1747873723000},"page":"5857-5871","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Simplified Concrete Dropout - Improving the Generation of Attribution Masks for Fine-grained Classification"],"prefix":"10.1007","volume":"133","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7187-1151","authenticated-orcid":false,"given":"Dimitri","family":"Korsch","sequence":"first","affiliation":[]},{"given":"Maha","family":"Shadaydeh","sequence":"additional","affiliation":[]},{"given":"Joachim","family":"Denzler","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,22]]},"reference":[{"key":"2453_CR1","unstructured":"Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.(2018). Sanity checks for saliency maps. Advances in neural information processing systems 31"},{"key":"2453_CR2","unstructured":"Ahmed, S., Sarofeen, C., Ruberry, M., Yan, E., Gimelshein, N., Carilli, M., Migacz, S., Bialecki, P., Micikevicius, P., Stosic, D., Yang, D., & Maruyama, N. (2022). What Every User Should Know About Mixed Precision Training in PyTorch. Retrieved 16 Jan 2025 from https:\/\/pytorch.org\/blog\/what-every-user-should-know-about-mixed-precision-training-in-pytorch\/#picking-the-right-approach"},{"issue":"5","key":"2453_CR3","doi-asserted-by":"publisher","first-page":"2055","DOI":"10.1214\/15-AOS1337","volume":"43","author":"RF Barber","year":"2015","unstructured":"Barber, R. F., & Cand\u00e8s, E. J. (2015). 
Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43(5), 2055\u20132085. https:\/\/doi.org\/10.1214\/15-AOS1337","journal-title":"The Annals of Statistics"},{"issue":"2","key":"2453_CR4","doi-asserted-by":"publisher","first-page":"343","DOI":"10.3390\/s21020343","volume":"21","author":"K Bjerge","year":"2021","unstructured":"Bjerge, K., Nielsen, J. B., Sepstrup, M. V., Helsing-Nielsen, F., & H\u00f8ye, T. T. (2021). An automated light trap to monitor moths (lepidoptera) using computer vision-based tracking and deep learning. Sensors, 21(2), 343.","journal-title":"Sensors"},{"key":"2453_CR5","doi-asserted-by":"publisher","unstructured":"Brust, C.-A., Burghardt, T., Groenenberg, M., K\u00e4ding, C., K\u00fchl, H., Manguette, M., Denzler, J.( 2017). Towards automated visual monitoring of individual gorillas in the wild. In: ICCV Workshop on Visual Wildlife Monitoring (ICCV-WS), pp. 2820\u2013 2830 . https:\/\/doi.org\/10.1109\/ICCVW.2017.333","DOI":"10.1109\/ICCVW.2017.333"},{"key":"2453_CR6","unstructured":"Chang, C.-H., Creager, E., Goldenberg, A., Duvenaud, D.( 2018). Explaining image classifiers by counterfactual generation. In: International Conference on Learning Representations"},{"key":"2453_CR7","doi-asserted-by":"publisher","unstructured":"Cui, Y., Song, Y., Sun, C., Howard, A., Belongie, S.( 2018). Large scale fine-grained categorization and domain-specific transfer learning. In: Proceedings of CVPR . https:\/\/doi.org\/10.1109\/cvpr.2018.00432","DOI":"10.1109\/cvpr.2018.00432"},{"key":"2453_CR8","unstructured":"Dabkowski, P., Gal, Y.(2017). Real time image saliency for black box classifiers. Advances in neural information processing systems 30"},{"key":"2453_CR9","doi-asserted-by":"crossref","unstructured":"Fong, R.C., Vedaldi, A.( 2017). Interpretable explanations of black boxes by meaningful perturbation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 
3429\u2013 3437","DOI":"10.1109\/ICCV.2017.371"},{"key":"2453_CR10","unstructured":"Gal, Y., Hron, J., Kendall, A.(2017). Concrete dropout. Advances in neural information processing systems 30"},{"issue":"11","key":"2453_CR11","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1145\/3422622","volume":"63","author":"I Goodfellow","year":"2020","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11), 139\u2013144.","journal-title":"Communications of the ACM"},{"key":"2453_CR12","doi-asserted-by":"crossref","unstructured":"He, J., Chen, J.-N., Liu, S., Kortylewski, A., Yang, C., Bai, Y., Wang, C.( 2022). Transfg: A transformer architecture for fine-grained recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 852\u2013 860","DOI":"10.1609\/aaai.v36i1.19967"},{"key":"2453_CR13","doi-asserted-by":"crossref","unstructured":"He, X., Peng, Y., Zhao, J.(2019). Which and how many regions to gaze: Focus discriminative regions for fine-grained visual categorization. IJCV, 1\u201321","DOI":"10.1007\/s11263-019-01176-2"},{"key":"2453_CR14","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J.( 2016). Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770\u2013 778","DOI":"10.1109\/CVPR.2016.90"},{"key":"2453_CR15","unstructured":"Hu, T., Qi, H., Huang, Q., Lu, Y.(2019). See better before looking closer: Weakly supervised data augmentation network for fine-grained visual classification. arXiv preprint arXiv:1901.09891"},{"issue":"3","key":"2453_CR16","doi-asserted-by":"publisher","first-page":"542","DOI":"10.1007\/s11263-016-0961-y","volume":"122","author":"B Hughes","year":"2017","unstructured":"Hughes, B., & Burghardt, T. (2017). 
Automated visual fin identification of individual great white sharks. International Journal of Computer Vision, 122(3), 542\u2013557.","journal-title":"International Journal of Computer Vision"},{"key":"2453_CR17","unstructured":"Hui, Z., Li, J., Wang, X., Gao, X.(2020). Image fine-grained inpainting. arXiv:2002.02609"},{"key":"2453_CR18","unstructured":"Jang, E., Gu, S., Poole, B.( 2017). Categorical reparameterization with gumbel-softmax. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, ??? . https:\/\/openreview.net\/forum?id=rkE3y85ee"},{"key":"2453_CR19","unstructured":"K\u00e4ding, C., Rodner, E., Freytag, A., Mothes, O., Barz, B., Denzler, J. (2018). Active learning for regression tasks with expected model output changes. In: British Machine Vision Conference (BMVC)"},{"key":"2453_CR20","unstructured":"Khosla, A., Jayadevaprakash, N., Yao, B., Fei-Fei, L.( 2011). Novel dataset for fine-grained image categorization. In: First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO"},{"key":"2453_CR21","doi-asserted-by":"crossref","unstructured":"Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.-Y., et al.( 2023) Segment anything. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 4015\u2013 4026","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"2453_CR22","unstructured":"Korsch, D., Bodesheim, P., Brehm, G., Denzler, J. ( 2022) Automated visual monitoring of nocturnal insects with light-based camera traps. In: CVPR Workshop on Fine-grained Visual Classification (CVPR-WS)"},{"key":"2453_CR23","doi-asserted-by":"publisher","unstructured":"Korsch, D., Bodesheim, P., Denzler, J.( 2021). End-to-end learning of fisher vector encodings for part features in fine-grained recognition. 
In: German Conference on Pattern Recognition (DAGM-GCPR), pp. 142\u2013 158 . https:\/\/doi.org\/10.1007\/978-3-030-92659-5_9","DOI":"10.1007\/978-3-030-92659-5_9"},{"key":"2453_CR24","doi-asserted-by":"crossref","unstructured":"Korsch, D., Bodesheim, P., Denzler, J.(2019). Classification-specific parts for improving fine-grained visual categorization. In: Proceedings of the German Conference on Pattern Recognition, pp. 62\u2013 75","DOI":"10.1007\/978-3-030-33676-9_5"},{"key":"2453_CR25","doi-asserted-by":"crossref","unstructured":"K\u00f6rschens, M., Denzler, J.( 2019).Elpephants: A fine-grained dataset for elephant re-identification. In: ICCV Workshop on Computer Vision for Wildlife Conservation (ICCV-WS)","DOI":"10.1109\/ICCVW.2019.00035"},{"key":"2453_CR26","doi-asserted-by":"crossref","unstructured":"Krause, J., Sapp, B., Howard, A., Zhou, H., Toshev, A., Duerig, T., Philbin, J., Fei-Fei, L.(2016). The unreasonable effectiveness of noisy data for fine-grained recognition. In: ECCV, pp. 301\u2013 320 . Springer","DOI":"10.1007\/978-3-319-46487-9_19"},{"key":"2453_CR27","doi-asserted-by":"publisher","unstructured":"Krause, J., Stark, M., Deng, J., Fei-Fei, L.(2013). 3d object representations for fine-grained categorization. In: 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13) https:\/\/doi.org\/10.1109\/iccvw.2013.77","DOI":"10.1109\/iccvw.2013.77"},{"key":"2453_CR28","volume-title":"MNIST handwritten digit database","author":"Y LeCun","year":"2010","unstructured":"LeCun, Y., Cortes, C., Burges, C., et al. (2010). MNIST handwritten digit database. NJ, USA: Florham Park."},{"key":"2453_CR29","doi-asserted-by":"publisher","unstructured":"Lin, T.-Y., RoyChowdhury, A., Maji, S.( 2015). Bilinear cnn models for fine-grained visual recognition. In: Proceedings of ICCV, pp. 1449\u2013 1457 . 
https:\/\/doi.org\/10.1109\/iccv.2015.170","DOI":"10.1109\/iccv.2015.170"},{"key":"2453_CR30","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S. (2022). A convnet for the 2020s. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"2453_CR31","unstructured":"Loshchilov, I., Hutter, F. ( 2018).Decoupled weight decay regularization. In: International Conference on Learning Representations"},{"key":"2453_CR32","unstructured":"Maddison, C., Mnih, A., Teh, Y.( 2017). The concrete distribution: A continuous relaxation of discrete random variables. In: Proceedings of the International Conference on Learning Representations . International Conference on Learning Representations"},{"key":"2453_CR33","unstructured":"Popescu, O.-I., Shadaydeh, M., Denzler, J.(2021). Counterfactual generation with knockoffs. arXiv preprint arXiv:2102.00951"},{"key":"2453_CR34","unstructured":"Redmon, J., Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767"},{"key":"2453_CR35","doi-asserted-by":"crossref","unstructured":"Reimers, C., Penzel, N., Bodesheim, P., Runge, J., Denzler, J.( 2021). Conditional dependence tests reveal the usage of abcd rule features and bias variables in automatic skin lesion classification. In: CVPR ISIC Skin Image Analysis Workshop (CVPR-WS), pp. 1810\u2013 1819","DOI":"10.1109\/CVPRW53098.2021.00200"},{"key":"2453_CR36","unstructured":"Rodner, E., Simon, M., Brehm, G., Pietsch, S., W\u00e4gele, J.W., Denzler, J.( 2015). Fine-grained recognition datasets for biodiversity analysis. 
In: CVPR Workshop on Fine-grained Visual Classification (CVPR-WS)"},{"issue":"3","key":"2453_CR37","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","volume":"115","author":"O Russakovsky","year":"2015","unstructured":"Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. (2015). Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3), 211\u2013252.","journal-title":"International journal of computer vision"},{"key":"2453_CR38","unstructured":"Sakib, F., Burghardt, T.( 2021). Visual recognition of great ape behaviours in the wild. In: International Conference on Pattern Recognition (ICPR) Workshop on Visual Observation and Analysis of Vertebrate And Insect Behavior"},{"key":"2453_CR39","unstructured":"Shrikumar, A., Greenside, P., Kundaje, A.( 2017). Learning important features through propagating activation differences. In: International Conference on Machine Learning, pp. 3145\u2013 3153 , PMLR"},{"key":"2453_CR40","doi-asserted-by":"publisher","unstructured":"Simon, M., Rodner, E., Darell, T., Denzler, J.(2018) . The whole is more than its parts? from explicit to implicit pose normalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1\u201313 https:\/\/doi.org\/10.1109\/TPAMI.2018.2885764","DOI":"10.1109\/TPAMI.2018.2885764"},{"key":"2453_CR41","unstructured":"Simonyan, K., Vedaldi, A., Zisserman, A.( 2014). Deep inside convolutional networks: visualising image classification models and saliency maps. In: Proceedings of the International Conference on Learning Representations (ICLR) , ICLR"},{"key":"2453_CR42","unstructured":"Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.( 2015). Striving for simplicity: The all convolutional net. In: ICLR (workshop Track)"},{"key":"2453_CR43","unstructured":"Sundararajan, M., Taly, A., Yan, Q.( 2017). Axiomatic attribution for deep networks. 
In: International Conference on Machine Learning, pp. 3319\u2013 3328 , PMLR"},{"key":"2453_CR44","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.( 2016). Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","DOI":"10.1109\/CVPR.2016.308"},{"key":"2453_CR45","unstructured":"Tran, B.(2024) Bird detection by Yolo-V3. https:\/\/github.com\/xmba15\/yolov3_pytorch. [Online; accessed 27-May-2024]"},{"key":"2453_CR46","doi-asserted-by":"crossref","unstructured":"Van\u00a0Horn, G., Branson, S., Farrell, R., Haber, S., Barry, J., Ipeirotis, P., Perona, P., Belongie, S.( 2015). Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 595\u2013 604","DOI":"10.1109\/CVPR.2015.7298658"},{"key":"2453_CR47","doi-asserted-by":"crossref","unstructured":"Van\u00a0Horn, G., Mac\u00a0Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S.( 2018) The inaturalist species classification and detection dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8769\u2013 8778","DOI":"10.1109\/CVPR.2018.00914"},{"key":"2453_CR48","unstructured":"Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.(2011). The caltech-ucsd birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology"},{"key":"2453_CR49","doi-asserted-by":"crossref","unstructured":"Yang, X., Mirmehdi, M., Burghardt, T.( 2019). Great ape detection in challenging jungle camera trap footage via attention-based spatial and temporal feature blending. 
In: Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops","DOI":"10.1109\/ICCVW.2019.00034"},{"key":"2453_CR50","doi-asserted-by":"crossref","unstructured":"Yu, W., Luo, M., Zhou, P., Si, C., Zhou, Y., Wang, X., Feng, J., Yan, S.(2022). Metaformer is actually what you need for vision. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 10819\u2013 10829","DOI":"10.1109\/CVPR52688.2022.01055"},{"key":"2453_CR51","doi-asserted-by":"crossref","unstructured":"Zhang, L., Huang, S., Liu, W., Tao, D.( 2019). Learning a mixture of granularity-specific experts for fine-grained categorization. In: Proceedings of ICCV, pp. 8331\u2013 8340","DOI":"10.1109\/ICCV.2019.00842"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02453-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-025-02453-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02453-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T15:12:26Z","timestamp":1757171546000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-025-02453-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,22]]},"references-count":51,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["2453"],"URL":"https:\/\/doi.org\/10.1007\/s11263-025-02453-z","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"type":"print","value":"0920-5691"},{"type":"electronic","value":"1573-1405"}],"subject":[],"published":{"date-parts":[[2025,5,22]]},"assertion":[{"value":"27 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 April 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 May 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}