{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:30:51Z","timestamp":1760059851086,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,7,16]],"date-time":"2025-07-16T00:00:00Z","timestamp":1752624000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Innovate UK and Unilever PLC","award":["KTP13113","TSI-100927-2023-1"],"award-info":[{"award-number":["KTP13113","TSI-100927-2023-1"]}]},{"name":"Ministry for Digital Transformation and the Civil Service","award":["KTP13113","TSI-100927-2023-1"],"award-info":[{"award-number":["KTP13113","TSI-100927-2023-1"]}]},{"name":"University of Granada","award":["KTP13113","TSI-100927-2023-1"],"award-info":[{"award-number":["KTP13113","TSI-100927-2023-1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Generalising deep-learning models to perform well on unseen data domains with minimal retraining remains a significant challenge in computer vision. Even when the target task\u2014such as quantifying the number of elements in an image\u2014stays the same, data quality, shape, or form variations can deviate from the training conditions, often necessitating manual intervention. As a real-world industry problem, we aim to automate stock level estimation in retail cabinets. As technology advances, new cabinet models with varying shapes emerge alongside new camera types. This evolving scenario poses a substantial obstacle to deploying long-term, scalable solutions. To surmount the challenge of generalising to new cabinet models and cameras with minimal amounts of sample images, this research introduces a new solution. This paper proposes a novel ensemble model that combines DenseNet-201 and Vision Transformer (ViT-B\/8) architectures to achieve generalisation in stock-level classification. The novelty aspect of our solution comes from the fact that we combine a transformer with a DenseNet model in order to capture both the local, hierarchical details and the long-range dependencies within the images, improving generalisation accuracy with less data. Key contributions include (i) a novel DenseNet-201 + ViT-B\/8 feature-level fusion, (ii) an adaptation workflow that needs only two images per class, (iii) a balanced layer-unfreezing schedule, (iv) a publicly described domain-shift benchmark, and (v) a 47 pp accuracy gain over four standard few-shot baselines. Our approach leverages fine-tuning techniques to adapt two pre-trained models to the new retail cabinets (i.e., standing or horizontal) and camera types using only two images per class. Experimental results demonstrate that our method achieves high accuracy rates of 91% on new cabinets with the same camera and 89% on new cabinets with different cameras, significantly outperforming standard few-shot learning methods.<\/jats:p>","DOI":"10.3390\/make7030066","type":"journal-article","created":{"date-parts":[[2025,7,16]],"date-time":"2025-07-16T10:46:05Z","timestamp":1752662765000},"page":"66","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Generalising Stock Detection in Retail Cabinets with Minimal Data Using a DenseNet and Vision Transformer Ensemble"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1047-9903","authenticated-orcid":false,"given":"Babak","family":"Rahi","sequence":"first","affiliation":[{"name":"Ice Cream Research and Development, Unilever PLC, Bedford MK44 1LQ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Deniz","family":"Sagmanli","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Nottingham, Nottingham NG8 1BB, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Felix","family":"Oppong","sequence":"additional","affiliation":[{"name":"Ice Cream Research and Development, Unilever PLC, Bedford MK44 1LQ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Direnc","family":"Pekaslan","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Nottingham, Nottingham NG8 1BB, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0150-0651","authenticated-orcid":false,"given":"Isaac","family":"Triguero","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Artificial Intelligence, Andalusian Research Institute, Data Science and Computational Intelligence (DaSCI), University of Granada, 18071 Granada, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,7,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"\u0160iki\u0107, F., Kalafati\u0107, Z., Suba\u0161i\u0107, M., and Lon\u010dari\u0107, S. (2024). Enhanced Out-of-Stock Detection in Retail Shelf Images Based on Deep Learning. Sensors, 24.","DOI":"10.3390\/s24020693"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Clarke, C., and Cox, A. (2024). Science of Ice Cream, Royal Society of Chemistry.","DOI":"10.1039\/9781837673032"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"102135","DOI":"10.1016\/j.inffus.2023.102135","article-title":"General Purpose Artificial Intelligence Systems (GPAIS): Properties, definition, taxonomy, societal implications and responsible governance","volume":"103","author":"Triguero","year":"2024","journal-title":"Inf. Fusion"},{"key":"ref_4","unstructured":"Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., and Brunskill, E. (2021). On the opportunities and risks of foundation models. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_6","first-page":"3965","article-title":"CoAtNet: Marrying Convolution and Attention for All Data Sizes","volume":"34","author":"Dai","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_7","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2024, September 21). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations (ICLR). Available online: https:\/\/openreview.net\/forum?id=YicbFdNTTy."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely Connected Convolutional Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_9","first-page":"1","article-title":"Generalising from a Few Examples: A Survey on Few-Shot Learning","volume":"53","author":"Wang","year":"2020","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Savit, A., and Damor, A. (2023, January 4\u20136). Revolutionizing Retail Stores with Computer Vision and Edge AI: A Novel Shelf Management System. Proceedings of the 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India.","DOI":"10.1109\/ICAAIC56838.2023.10140947"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Majdi, M.A., Dewantara, B.S.B., and Bachtiar, M.M. (2020, January 29\u201330). Product Stock Management Using Computer Vision. Proceedings of the 2020 International Electronics Symposium (IES), Surabaya, Indonesia.","DOI":"10.1109\/IES50839.2020.9231673"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Xu, J., and Ma, J. (2022, January 20\u201322). Auto Parts Defect Detection Based on Few-Shot Learning. Proceedings of the 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China.","DOI":"10.1109\/CVIDLICCEA56201.2022.9823993"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Kong, L., Gong, L., Wang, G., and Liu, S. (2023, January 1\u20133). DP-ProtoNet: An Interpretable Dual Path Prototype Network for Medical Image Diagnosis. Proceedings of the 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Exeter, UK.","DOI":"10.1109\/TrustCom60117.2023.00390"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"5652","DOI":"10.1109\/TIP.2018.2861573","article-title":"A Unified Approach for Conventional Zero-Shot, Generalized Zero-Shot, and Few-Shot Learning","volume":"27","author":"Rahman","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, Y., Chang, G., Fu, G., Wei, Y., Lan, J., and Liu, J. (2022, January 15\u201317). Self-Attention Based Siamese Neural Network Recognition Model. Proceedings of the 2022 34th Chinese Control and Decision Conference (CCDC), Hefei, China.","DOI":"10.1109\/CCDC55256.2022.10034228"},{"key":"ref_16","first-page":"3320","article-title":"How Transferable are Features in Deep Neural Networks?","volume":"27","author":"Yosinski","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Howard, J., and Ruder, S. (2018). Universal Language Model Fine-Tuning for Text Classification. arXiv.","DOI":"10.18653\/v1\/P18-1031"},{"key":"ref_18","first-page":"4077","article-title":"Prototypical Networks for Few-shot Learning","volume":"30","author":"Snell","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst. (NeurIPS)"},{"key":"ref_19","first-page":"3630","article-title":"Matching Networks for One Shot Learning","volume":"29","author":"Vinyals","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst. (NeurIPS)"},{"key":"ref_20","unstructured":"Koch, G., Zemel, R., and Salakhutdinov, R. (2015, January 6\u201311). Siamese Neural Networks for One-shot Image Recognition. Proceedings of the 32nd International Conference on Machine Learning (ICML) Deep Learning Workshop, Lille, France."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., and Hospedales, T.M. (2018, January 18\u201323). Learning to Compare: Relation Network for Few-Shot Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00131"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chen, Y., Liu, Z., Xu, H., Darrell, T., and Wang, X. (2021, January 10\u201317). Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning. Proceedings of the IEEE\/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00893"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Tzeng, E., Hoffman, J., Saenko, K., and Darrell, T. (2017, January 21\u201326). Adversarial discriminative domain adaptation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.316"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Csurka, G. (2017). Domain adaptation for visual applications: A comprehensive survey. arXiv.","DOI":"10.1007\/978-3-319-58347-1"}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/66\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:10:27Z","timestamp":1760033427000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/66"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,16]]},"references-count":24,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["make7030066"],"URL":"https:\/\/doi.org\/10.3390\/make7030066","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2025,7,16]]}}}