{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,16]],"date-time":"2026-06-16T08:29:38Z","timestamp":1781598578591,"version":"3.54.5"},"reference-count":70,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T00:00:00Z","timestamp":1757030400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"FAPESP","award":["2023\/10442-2"],"award-info":[{"award-number":["2023\/10442-2"]}]},{"name":"FAPESP","award":["2024\/00530-4"],"award-info":[{"award-number":["2024\/00530-4"]}]},{"name":"FAPESP","award":["2023\/04583-2"],"award-info":[{"award-number":["2023\/04583-2"]}]},{"name":"FAPESP","award":["2022\/03668-1"],"award-info":[{"award-number":["2022\/03668-1"]}]},{"name":"FAPESP","award":["2018\/22214-6"],"award-info":[{"award-number":["2018\/22214-6"]}]},{"name":"FAPESP","award":["2021\/08325-2"],"award-info":[{"award-number":["2021\/08325-2"]}]},{"name":"FAPESP","award":["88887. 631085\/2021-00"],"award-info":[{"award-number":["88887. 631085\/2021-00"]}]},{"name":"FAPESP","award":["305610\/2022-8"],"award-info":[{"award-number":["305610\/2022-8"]}]},{"name":"CAPES","award":["2023\/10442-2"],"award-info":[{"award-number":["2023\/10442-2"]}]},{"name":"CAPES","award":["2024\/00530-4"],"award-info":[{"award-number":["2024\/00530-4"]}]},{"name":"CAPES","award":["2023\/04583-2"],"award-info":[{"award-number":["2023\/04583-2"]}]},{"name":"CAPES","award":["2022\/03668-1"],"award-info":[{"award-number":["2022\/03668-1"]}]},{"name":"CAPES","award":["2018\/22214-6"],"award-info":[{"award-number":["2018\/22214-6"]}]},{"name":"CAPES","award":["2021\/08325-2"],"award-info":[{"award-number":["2021\/08325-2"]}]},{"name":"CAPES","award":["88887. 631085\/2021-00"],"award-info":[{"award-number":["88887. 
631085\/2021-00"]}]},{"name":"CAPES","award":["305610\/2022-8"],"award-info":[{"award-number":["305610\/2022-8"]}]},{"name":"Flemish Government","award":["2023\/10442-2"],"award-info":[{"award-number":["2023\/10442-2"]}]},{"name":"Flemish Government","award":["2024\/00530-4"],"award-info":[{"award-number":["2024\/00530-4"]}]},{"name":"Flemish Government","award":["2023\/04583-2"],"award-info":[{"award-number":["2023\/04583-2"]}]},{"name":"Flemish Government","award":["2022\/03668-1"],"award-info":[{"award-number":["2022\/03668-1"]}]},{"name":"Flemish Government","award":["2018\/22214-6"],"award-info":[{"award-number":["2018\/22214-6"]}]},{"name":"Flemish Government","award":["2021\/08325-2"],"award-info":[{"award-number":["2021\/08325-2"]}]},{"name":"Flemish Government","award":["88887. 631085\/2021-00"],"award-info":[{"award-number":["88887. 631085\/2021-00"]}]},{"name":"Flemish Government","award":["305610\/2022-8"],"award-info":[{"award-number":["305610\/2022-8"]}]},{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"publisher","award":["2023\/10442-2"],"award-info":[{"award-number":["2023\/10442-2"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"publisher","award":["2024\/00530-4"],"award-info":[{"award-number":["2024\/00530-4"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"publisher","award":["2023\/04583-2"],"award-info":[{"award-number":["2023\/04583-2"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"publisher","award":["2022\/03668-1"],"award-info":[{"award-number":["2022\/03668-1"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"publisher","awa
rd":["2018\/22214-6"],"award-info":[{"award-number":["2018\/22214-6"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"publisher","award":["2021\/08325-2"],"award-info":[{"award-number":["2021\/08325-2"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"publisher","award":["88887. 631085\/2021-00"],"award-info":[{"award-number":["88887. 631085\/2021-00"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"publisher","award":["305610\/2022-8"],"award-info":[{"award-number":["305610\/2022-8"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Texture, a significant visual attribute in images, plays an important role in many pattern recognition tasks. While Convolutional Neural Networks (CNNs) have been among the most effective methods for texture analysis, alternative architectures such as Vision Transformers (ViTs) have recently demonstrated superior performance on a range of visual recognition problems. However, the suitability of ViTs for texture recognition remains underexplored. In this work, we investigate the capabilities and limitations of ViTs for texture recognition by analyzing 25 different ViT variants as feature extractors and comparing them to CNN-based and hand-engineered approaches. Our evaluation encompasses both accuracy and efficiency, aiming to assess the trade-offs involved in applying ViTs to texture analysis. Our results indicate that ViTs generally outperform CNN-based and hand-engineered models, particularly when using strong pre-training and in-the-wild texture datasets. 
Notably, BeiTv2-B\/16 achieves the highest average accuracy (85.7%), followed by ViT-B\/16-DINO (84.1%) and Swin-B (80.8%), outperforming the ResNet50 baseline (75.5%) and the hand-engineered baseline (73.4%). As a lightweight alternative, EfficientFormer-L3 attains a competitive average accuracy of 78.9%. In terms of efficiency, although ViT-B and BeiT(v2) have a higher number of GFLOPs and parameters, they achieve significantly faster feature extraction on GPUs compared to ResNet50. These findings highlight the potential of ViTs as a powerful tool for texture analysis while also pointing to areas for future exploration, such as efficiency improvements and domain-specific adaptations.<\/jats:p>","DOI":"10.3390\/jimaging11090304","type":"journal-article","created":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T14:53:01Z","timestamp":1757083981000},"page":"304","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3986-7747","authenticated-orcid":false,"given":"Leonardo","family":"Scabini","sequence":"first","affiliation":[{"name":"S\u00e3o Carlos Institute of Physics, University of S\u00e3o Paulo, S\u00e3o Carlos 13560-970, SP, Brazil"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andre","family":"Sacilotti","sequence":"additional","affiliation":[{"name":"Institute of Mathematics and Computer Sciences, University of S\u00e3o Paulo, S\u00e3o Carlos 13566-590, SP, Brazil"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kallil M.","family":"Zielinski","sequence":"additional","affiliation":[{"name":"S\u00e3o Carlos Institute of Physics, University of S\u00e3o Paulo, S\u00e3o Carlos 13560-970, SP, 
Brazil"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2490-180X","authenticated-orcid":false,"given":"Lucas C.","family":"Ribas","sequence":"additional","affiliation":[{"name":"Institute of Biosciences, Humanities and Exact Sciences, S\u00e3o Paulo State University, S\u00e3o Jos\u00e9 do Rio Preto 15054-000, SP, Brazil"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3876-620X","authenticated-orcid":false,"given":"Bernard","family":"De Baets","sequence":"additional","affiliation":[{"name":"KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2945-1556","authenticated-orcid":false,"given":"Odemir M.","family":"Bruno","sequence":"additional","affiliation":[{"name":"S\u00e3o Carlos Institute of Physics, University of S\u00e3o Paulo, S\u00e3o Carlos 13560-970, SP, Brazil"},{"name":"Institute of Mathematics and Computer Sciences, University of S\u00e3o Paulo, S\u00e3o Carlos 13566-590, SP, Brazil"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1016\/j.ins.2019.11.042","article-title":"Spatio-spectral networks for color-texture analysis","volume":"515","author":"Scabini","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1007\/s11263-018-1125-z","article-title":"From BoW to CNN: Two decades of texture representation for texture classification","volume":"127","author":"Liu","year":"2019","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"8975","DOI":"10.1109\/ACCESS.2018.2890743","article-title":"Texture feature extraction methods: A survey","volume":"7","year":"2019","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Pietik\u00e4inen, M., and Ojala, T. (1996). Texture analysis in industrial applications. Image Technology: Advances in Image Processing, Multimedia and Machine Vision, Springer.","DOI":"10.1007\/978-3-642-58288-2_13"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"809","DOI":"10.3174\/ajnr.A2061","article-title":"Texture analysis: A review of neurologic MR imaging applications","volume":"31","author":"Kassner","year":"2010","journal-title":"Am. J. Neuroradiol."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1016\/j.ins.2019.02.060","article-title":"Multilayer complex network descriptors for color\u2013texture characterization","volume":"491","author":"Scabini","year":"2019","journal-title":"Inf. Sci."},{"key":"ref_10","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). 
An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Touvron, H., Cord, M., and J\u00e9gou, H. (2022, January 23\u201327). Deit iii: Revenge of the vit. Proceedings of the Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel. Proceedings, Part XXIV.","DOI":"10.1007\/978-3-031-20053-3_30"},{"key":"ref_12","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is All You Need. Proceedings of the Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_13","unstructured":"Bao, H., Dong, L., Piao, S., and Wei, F. (2022, January 25\u201329). BEiT: BERT Pre-Training of Image Transformers. Proceedings of the International Conference on Learning Representations, Online."},{"key":"ref_14","unstructured":"Peng, Z., Dong, L., Bao, H., Ye, Q., and Wei, F. (2022). BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers. arXiv."},{"key":"ref_15","first-page":"26183","article-title":"You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection","volume":"Volume 34","author":"Ranzato","year":"2021","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Hong, W., Lao, J., Ren, W., Wang, J., Chen, J., and Chu, W. (2022, January 18\u201324). Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00462"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, H., Xie, S., Lin, L., Iwamoto, Y., Han, X.H., Chen, Y.W., and Tong, R. (2022, January 23\u201327). 
Mixed Transformer U-Net for Medical Image Segmentation. Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746172"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Lu, Z., Li, J., Liu, H., Huang, C., Zhang, L., and Zeng, T. (2021). Transformer for Single Image Super-Resolution. arXiv.","DOI":"10.1109\/CVPRW56347.2022.00061"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"971","DOI":"10.1109\/TPAMI.2002.1017623","article-title":"Multiresolution gray-scale and rotation invariant texture classification with local binary patterns","volume":"24","author":"Ojala","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1016\/j.sigpro.2004.10.009","article-title":"Color texture measurement and segmentation","volume":"85","author":"Hoang","year":"2005","journal-title":"Signal Process."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1984","DOI":"10.1016\/j.patcog.2011.11.009","article-title":"Color texture analysis based on fractal descriptors","volume":"45","author":"Backes","year":"2012","journal-title":"Pattern Recognit."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"735","DOI":"10.1016\/S0031-3203(01)00074-7","article-title":"Brief review of invariant texture analysis methods","volume":"35","author":"Zhang","year":"2002","journal-title":"Pattern Recognit."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1109\/TCYB.2018.2873135","article-title":"Importance of vertices in complex networks applied to texture analysis","volume":"50","author":"Cantero","year":"2018","journal-title":"IEEE Trans. Cybern."},{"key":"ref_24","unstructured":"Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. (2014, January 22\u201324). 
Decaf: A deep convolutional activation feature for generic visual recognition. Proceedings of the International Conference on Machine Learning, PMLR, Beijing, China."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., and Vedaldi, A. (2014, January 23\u201328). Describing textures in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.461"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"109802","DOI":"10.1016\/j.patcog.2023.109802","article-title":"RADAM: Texture Recognition through Randomized Aggregated Encoding of Deep Activation Maps","volume":"143","author":"Scabini","year":"2023","journal-title":"Pattern Recognit."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, H., Xue, J., and Dana, K. (2017, January 21\u201326). Deep TEN: Texture encoding network. Proceedings of the Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.309"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"118223","DOI":"10.1016\/j.eswa.2022.118223","article-title":"DFAEN: Double-order knowledge fusion and attentional encoding network for texture recognition","volume":"209","author":"Yang","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_29","first-page":"12116","article-title":"Do vision transformers see like convolutional neural networks?","volume":"34","author":"Raghu","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_30","first-page":"26831","article-title":"Are transformers more robust than cnns?","volume":"34","author":"Bai","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yang, F., Yang, H., Fu, J., Lu, H., and Guo, B. (2020, January 13\u201319). Learning texture transformer network for image super-resolution. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00583"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yao, C., Zhang, S., Yang, M., Liu, M., and Qi, J. (2021, January 5\u20139). Depth super-resolution by texture-depth transformer. Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China.","DOI":"10.1109\/ICME51207.2021.9428393"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4653","DOI":"10.1109\/JSTARS.2022.3179415","article-title":"MSFusion: Multistage for remote sensing image spatiotemporal fusion based on texture transformer and convolutional neural network","volume":"15","author":"Yang","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhang, X., and Saniie, J. (2021, January 14\u201315). Material Texture Recognition using Ultrasonic Images with Transformer Neural Networks. Proceedings of the 2021 IEEE International Conference on Electro Information Technology (EIT), Mt. Pleasant, MI, USA.","DOI":"10.1109\/EIT51626.2021.9491908"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Soleymani, M., Bonyani, M., Mahami, H., and Nasirzadeh, F. (2021). Construction material classification on imbalanced datasets using Vision Transformer (ViT) architecture. arXiv.","DOI":"10.21203\/rs.3.rs-1948162\/v1"},{"key":"ref_36","first-page":"1","article-title":"ViTALnet: Anomaly on Industrial Textured Surfaces with Hybrid Transformer","volume":"72","author":"Tao","year":"2023","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Xu, W., Xu, Y., Chang, T., and Tu, Z. (2021, January 10\u201317). Co-scale conv-attentional image transformers. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00983"},{"key":"ref_38","unstructured":"Mehta, S., and Rastegari, M. (2021, January 4). MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. Proceedings of the International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_39","unstructured":"Mehta, S., and Rastegari, M. (2022). Separable Self-attention for Mobile Vision Transformers. arXiv."},{"key":"ref_40","unstructured":"Li, Y., Yuan, G., Wen, Y., Hu, E., Evangelidis, G., Tulyakov, S., Wang, Y., and Ren, J. (2022). EfficientFormer: Vision Transformers at MobileNet Speed. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Caron, M., Touvron, H., Misra, I., J\u00e9gou, H., Mairal, J., Bojanowski, P., and Joulin, A. (2021, January 10\u201317). Emerging properties in self-supervised vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"ref_42","unstructured":"Chen, X., Hsieh, C.J., and Gong, B. (2022, January 25\u201329). When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations. Proceedings of the International Conference on Learning Representations, Online."},{"key":"ref_43","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 18\u201324). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Chen, C.F.R., Fan, Q., and Panda, R. (2021, January 10\u201317). Crossvit: Cross-attention multi-scale vision transformer for image classification. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00041"},{"key":"ref_45","unstructured":"d\u2019Ascoli, S., Touvron, H., Leavitt, M.L., Morcos, A.S., Biroli, G., and Sagun, L. (2021, January 18\u201324). Convit: Improving vision transformers with soft convolutional inductive biases. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_46","unstructured":"Hatamizadeh, A., Yin, H., Kautz, J., and Molchanov, P. (2022). Global context vision transformers. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Li, Y., Wu, C.Y., Fan, H., Mangalam, K., Xiong, B., Malik, J., and Feichtenhofer, C. (2022, January 18\u201324). MViTv2: Improved Multiscale Vision Transformers for Classification and Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00476"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Touvron, H., Cord, M., Sablayrolles, A., Synnaeve, G., and J\u00e9gou, H. (2021, January 10\u201317). Going deeper with image transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00010"},{"key":"ref_49","first-page":"20014","article-title":"Xcit: Cross-covariance image transformers","volume":"34","author":"Ali","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_51","unstructured":"Li, J., Xia, X., Li, W., Li, H., Wang, X., Xiao, X., Wang, R., Zheng, M., and Pan, X. (2022). Next-vit: Next generation vision transformer for efficient deployment in realistic industrial scenarios. arXiv."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Yun, S., and Ro, Y. (2024, January 16\u201322). Shvit: Single-head vision transformer with memory efficient macro design. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00550"},{"key":"ref_53","unstructured":"Tschannen, M., Gritsenko, A., Wang, X., Naeem, M.F., Alabdulmohsin, I., Parthasarathy, N., Evans, T., Beyer, L., Xia, Y., and Mustafa, B. (2025). Siglip 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features. arXiv."},{"key":"ref_54","unstructured":"Alabdulmohsin, I., Zhai, X., Kolesnikov, A., and Beyer, L. (2023). Getting vit in shape: Scaling laws for compute-optimal model design, 2024. arXiv."},{"key":"ref_55","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Fix, E., and Hodges, J.L. (1951). Discriminatory Analysis, Nonparametric Estimation: Consistency Properties, USAF School of Aviation Medicine. Report 4, Project n\u00ba 21\u201349.","DOI":"10.1037\/e471672008-001"},{"key":"ref_57","unstructured":"Ripley, B.D. (2007). Pattern Recognition and Neural Networks, Cambridge University Press."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. 
Learn."},{"key":"ref_59","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Proceedings of the Advances in Neural Information Processing Systems 32, Vancouver, BC, Canada."},{"key":"ref_60","unstructured":"Wightman, R. (2024, November 10). PyTorch Image Models. Available online: https:\/\/github.com\/rwightman\/pytorch-image-models."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Perronnin, F., S\u00e1nchez, J., and Mensink, T. (2010, January 5\u201311). Improving the fisher kernel for large-scale image classification. Proceedings of the Computer Vision\u2013ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece. Proceedings, Part IV 11.","DOI":"10.1007\/978-3-642-15561-1_11"},{"key":"ref_62","first-page":"867","article-title":"\u03b8 (1) time complexity parallel local binary pattern feature extractor on a graphical processing unit","volume":"13","author":"Badanidiyoor","year":"2019","journal-title":"ICIC Express Lett."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Beyer, L., Zhai, X., Royer, A., Markeeva, L., Anil, R., and Kolesnikov, A. (2022, January 18\u201324). Knowledge distillation: A good teacher is patient and consistent. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01065"},{"key":"ref_65","unstructured":"Huovinen, S., Pietik\u00e4inen, M., Ojala, T., Kyll\u00f6nen, J., Viertola, J., and M\u00e4enp\u00e4\u00e4, T. 
(2002, January 11\u201315). Outex\u2013New Framework for Empirical Evaluation of Texture Analysis Algorithms. Proceedings of the 16th International Conference on Pattern Recognition, Quebec, QC, Canada."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"784","DOI":"10.1167\/9.8.784","article-title":"Material perception: What can you see in a brief glance?","volume":"9","author":"Sharan","year":"2010","journal-title":"J. Vis."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Caputo, B., Hayman, E., and Mallikarjuna, P. (2005, January 17\u201321). Class-specific material categorisation. Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV\u201905) Volume 1, Beijing, China.","DOI":"10.1109\/ICCV.2005.54"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"111282","DOI":"10.1016\/j.asoc.2024.111282","article-title":"A multilevel pooling scheme in convolutional neural networks for texture image recognition","volume":"152","author":"Lyra","year":"2024","journal-title":"Appl. Soft Comput."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"109959","DOI":"10.1016\/j.patcog.2023.109959","article-title":"Enhancing texture representation with deep tracing pattern encoding","volume":"146","author":"Chen","year":"2024","journal-title":"Pattern Recognit."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"122978","DOI":"10.1016\/j.eswa.2023.122978","article-title":"Fractal pooling: A new strategy for texture recognition using convolutional neural networks","volume":"243","author":"Florindo","year":"2024","journal-title":"Expert Syst. 
Appl."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/9\/304\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:40:21Z","timestamp":1760035221000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/9\/304"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,5]]},"references-count":70,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["jimaging11090304"],"URL":"https:\/\/doi.org\/10.3390\/jimaging11090304","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,5]]}}}