{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T06:31:53Z","timestamp":1773729113972,"version":"3.50.1"},"reference-count":65,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T00:00:00Z","timestamp":1739232000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Convolutional Neural Networks (CNNs) have proven to be very effective in image classification due to their status as a powerful feature learning algorithm. Traditional approaches have considered the problem of multiclass classification, where the goal is to classify a set of objects at once. However, co-occurrence can make the discriminative features of the target less salient and may lead to overfitting of the model, resulting in lower performance. To address this, we propose a multi-label classification ensemble model including a Vision Transformer (ViT) and CNN for directly detecting one or multiple objects in an image. First, we improve the MobileNetV2 and DenseNet201 models using extra convolutional layers to strengthen image classification. In detail, three convolutional layers are applied in parallel at the end of both models. ViT can learn dependencies among distant positions and local detail, making it an effective tool for multi-label classification. Finally, an ensemble learning algorithm is used to combine the classification predictions of the ViT, the modified MobileNetV2, and DenseNet201 branches for increased image classification accuracy using a voting system. 
The performance of the proposed model is examined on four benchmark datasets, achieving accuracies of 98.24%, 98.89%, 99.91%, and 96.69% on PASCAL VOC 2007, PASCAL VOC 2012, MS-COCO, and NUS-WIDE 318, respectively, showing that our framework can enhance current state-of-the-art methods.<\/jats:p>","DOI":"10.3390\/bdcc9020039","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T11:01:08Z","timestamp":1739271668000},"page":"39","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["A Deep Ensemble Learning Approach Based on a Vision Transformer and Neural Network for Multi-Label Image Classification"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3277-4505","authenticated-orcid":false,"given":"Anas W.","family":"Abulfaraj","sequence":"first","affiliation":[{"name":"Department of Information Systems, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9560-092X","authenticated-orcid":false,"given":"Faisal","family":"Binzagr","sequence":"additional","affiliation":[{"name":"Department of Computer Science, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"24495","DOI":"10.1109\/ACCESS.2017.2762354","article-title":"Self-organizing hierarchical particle swarm optimization of correlation filters for object recognition","volume":"5","author":"Tehsin","year":"2017","journal-title":"IEEE Access"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Tehsin, S., Rehman, S., Awan, A.B., Chaudry, Q., Abbas, M., Young, R., and Asif, A. (2016, January 20\u201321). Improved maximum average correlation height filter with adaptive log base selection for object recognition. 
Proceedings of the Optical Pattern Recognition XXVII, Baltimore, MD, USA.","DOI":"10.1117\/12.2223621"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Tehsin, S., Rehman, S., Riaz, F., Saeed, O., Hassan, A., Khan, M., and Alam, M.S. (2017, January 12\u201313). Fully invariant wavelet enhanced minimum average correlation energy filter for object recognition in cluttered and occluded environments. Proceedings of the Pattern Recognition and Tracking XXVIII, Bellingham, WA, USA.","DOI":"10.1117\/12.2262434"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Tehsin, S., Rehman, S., Bilal, A., Chaudry, Q., Saeed, O., Abbas, M., and Young, R. (2017, January 12\u201313). Comparative analysis of zero aliasing logarithmic mapped optimal trade-off correlation filter. Proceedings of the Pattern Recognition and Tracking XXVIII, Bellingham, WA, USA.","DOI":"10.1117\/12.2261439"},{"key":"ref_5","unstructured":"Akbar, N., Tehsin, S., Bilal, A., Rubab, S., Rehman, S., and Young, R. (May, January 27). Detection of moving human using optimized correlation filters in homogeneous environments. Proceedings of the Pattern Recognition and Tracking XXXI, Online."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Akbar, N., Tehsin, S., ur Rehman, H., Rehman, S., and Young, R. (2019, January 15\u201316). Hardware design of correlation filters for target detection. Proceedings of the Pattern Recognition and Tracking XXX, Baltimore, MD, USA.","DOI":"10.1117\/12.2519497"},{"key":"ref_7","first-page":"1017","article-title":"A hybrid deep learning architecture for the classification of superhero fashion products: An application for medical-tech classification","volume":"124","author":"Alhaisoni","year":"2020","journal-title":"Comput. Model. Eng. Sci."},{"key":"ref_8","first-page":"141","article-title":"A blockchain based framework for stomach abnormalities recognition","volume":"67","author":"Khan","year":"2021","journal-title":"Comput. Mater. 
Contin."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"136","DOI":"10.2174\/1573405616666200423085826","article-title":"An optimized approach for breast cancer classification for histopathological images based on hybrid feature set","volume":"17","author":"Nasir","year":"2021","journal-title":"Curr. Med. Imaging"},{"key":"ref_10","first-page":"59","article-title":"Customer prioritization for medical supply chain during COVID-19 pandemic","volume":"70","author":"Mushtaq","year":"2021","journal-title":"Comput. Mater. Contin."},{"key":"ref_11","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst., 25, Available online: https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2012\/file\/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., and Xu, C. (2020, January 13\u201319). Ghostnet: More features from cheap operations. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00165"},{"key":"ref_14","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zagoruyko, S. (2016). Wide residual networks. arXiv.","DOI":"10.5244\/C.30.87"},{"key":"ref_16","unstructured":"Tan, M., and Le, Q. (2019, January 9\u201315). Efficientnet: Rethinking model scaling for convolutional neural networks. 
Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1016\/j.ijar.2011.12.011","article-title":"The impact of diversity on the accuracy of evidential classifier ensembles","volume":"53","author":"Bi","year":"2012","journal-title":"Int. J. Approx. Reason."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"109706","DOI":"10.1016\/j.compag.2024.109706","article-title":"FLTrans-Net: Transformer-based feature learning network for wheat head detection","volume":"229","author":"Yousafzai","year":"2025","journal-title":"Comput. Electron. Agric."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.asoc.2017.04.058","article-title":"Considering diversity and accuracy simultaneously for ensemble pruning","volume":"58","author":"Dai","year":"2017","journal-title":"Appl. Soft Comput."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Nasir, I.M., Tehsin, S., Dama\u0161evi\u010dius, R., and Maskeli\u016bnas, R. (2024). Integrating Explanations into CNNs by Adopting Spiking Attention Block for Skin Cancer Detection. Algorithms, 17.","DOI":"10.3390\/a17120557"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"107212","DOI":"10.1016\/j.asoc.2021.107212","article-title":"Balancing accuracy and diversity in ensemble learning using a two-phase artificial bee colony approach","volume":"105","author":"Shiue","year":"2021","journal-title":"Appl. Soft Comput."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Nasir, I.M., Alrasheedi, M.A., and Alreshidi, N.A. (2024). MFAN: Multi-Feature Attention Network for Breast Cancer Classification. 
Mathematics, 12.","DOI":"10.3390\/math12233639"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1108\/IJICC-04-2024-0184","article-title":"X-News dataset for online news categorization","volume":"17","author":"Yousafzai","year":"2024","journal-title":"Int. J. Intell. Comput. Cybern."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Saad, S.M., Bilal, A., Tehsin, S., and Rehman, S. (2020, January 9\u201313). Spoof detection for fake biometric images using feature-based techniques. Proceedings of the SPIE Future Sensing Technologies, Online.","DOI":"10.1117\/12.2576873"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tehsin, S., Hassan, A., and Riaz, F. (2024, January 19\u201320). Ensemble Learning for Offline Signature Verification using Fused Deep Features. Proceedings of the 2024 5th International Conference on Advancements in Computational Sciences (ICACS), Lahore, Pakistan.","DOI":"10.1109\/ICACS60934.2024.10473290"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Tehsin, S., Nasir, I.M., Dama\u0161evi\u010dius, R., and Maskeli\u016bnas, R. (2024). DaSAM: Disease and spatial attention module-based explainable model for brain tumor detection. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8090097"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"107805","DOI":"10.1016\/j.compeleceng.2022.107805","article-title":"HAREDNet: A deep learning based architecture for autonomous video surveillance by recognizing human actions","volume":"99","author":"Nasir","year":"2022","journal-title":"Comput. Electr. Eng."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Nasir, I.M., Raza, M., Shah, J.H., Khan, M.A., and Rehman, A. (2021, January 6\u20137). Human action recognition using machine learning in uncontrolled environment. 
Proceedings of the 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia.","DOI":"10.1109\/CAIDA51941.2021.9425202"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"120311","DOI":"10.1016\/j.eswa.2023.120311","article-title":"ENGA: Elastic net-based genetic algorithm for human action recognition","volume":"227","author":"Nasir","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_30","first-page":"2667","article-title":"Improved shark smell optimization algorithm for human action recognition","volume":"76","author":"Nasir","year":"2023","journal-title":"Comput. Mater. Contin."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Nasir, I.M., Khan, M.A., Yasmin, M., Shah, J.H., Gabryel, M., Scherer, R., and Dama\u0161evi\u010dius, R. (2020). Pearson correlation-based feature selection for document classification using balanced training. Sensors, 20.","DOI":"10.3390\/s20236793"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Nasir, I.M., Khan, M.A., Armghan, A., and Javed, M.Y. (2020, January 13\u201315). SCNN: A secure convolutional neural network using blockchain. Proceedings of the 2020 2nd International Conference on Computer and Information Sciences (ICCIS), Sakaka, Saudi Arabia.","DOI":"10.1109\/ICCIS49240.2020.9257635"},{"key":"ref_33","first-page":"3903","article-title":"Fast intra mode selection in HEVC using statistical model","volume":"70","author":"Tariq","year":"2022","journal-title":"Comput. Mater. Contin."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Malik, D.S., Shah, T., Tehsin, S., Nasir, I.M., Fitriyani, N.L., and Syafrudin, M. (2024). Block Cipher Nonlinear Component Generation via Hybrid Pseudo-Random Binary Sequence for Image Encryption. Mathematics, 12.","DOI":"10.3390\/math12152302"},{"key":"ref_35","unstructured":"Asfia, Y., Tehsin, S., Shahzeen, A., and Khan, U.S. (2019, January 5\u20138). 
Visual person identification device using raspberry Pi. Proceedings of the Conference of Open Innovations Association, FRUCT, Helsinki, Finland."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1016\/j.future.2019.05.074","article-title":"A new approach for mobile robot localization based on an online IoT system","volume":"100","author":"Junior","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Tehsin, S., Hassan, A., Riaz, F., Nasir, I.M., Fitriyani, N.L., and Syafrudin, M. (2024). Enhancing Signature Verification Using Triplet Siamese Similarity Networks in Digital Documents. Mathematics, 12.","DOI":"10.3390\/math12172757"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.neunet.2019.04.021","article-title":"Redundant feature pruning for accelerated inference in deep neural networks","volume":"118","author":"Ayinde","year":"2019","journal-title":"Neural Netw."},{"key":"ref_39","unstructured":"Liu, S., Ren, B., Shen, X., and Wang, Y. (2020). CoCoPIE: Making Mobile AI Sweet As PIE\u2013Compression-Compilation Co-Design Goes a Long Way. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"5113","DOI":"10.1007\/s10462-020-09816-7","article-title":"A comprehensive survey on model compression and acceleration","volume":"53","author":"Choudhary","year":"2020","journal-title":"Artif. Intell. Rev."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Ahmed, W.S. (2020, January 16\u201318). The impact of filter size and number of filters on classification accuracy in CNN. Proceedings of the 2020 International conference on computer science and software engineering (CSASE), Duhok, Iraq.","DOI":"10.1109\/CSASE48920.2020.9142089"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Chen, S., Wang, Y., and Huan, W. (2020, January 12\u201314). 
Review of research on lightweight convolutional neural networks. Proceedings of the 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China.","DOI":"10.1109\/ITOEC49072.2020.9141847"},{"key":"ref_43","unstructured":"Yu, F. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_44","unstructured":"Yang, F., and Xiao, X. (2020). Msdu-net: A multi-scale dilated u-net for blur detection. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"124087","DOI":"10.1109\/ACCESS.2019.2927169","article-title":"A dilated CNN model for image classification","volume":"7","author":"Lei","year":"2019","journal-title":"IEEE Access"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Mahbod, A., Schaefer, G., Wang, C., Dorffner, G., Ecker, R., and Ellinger, I. (2020). Transfer learning using a multi-scale and multi-network ensemble for skin lesion classification. Comput. Methods Programs Biomed., 193.","DOI":"10.1016\/j.cmpb.2020.105475"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"107098","DOI":"10.1016\/j.patcog.2019.107098","article-title":"Multi-model ensemble with rich spatial information for object detection","volume":"99","author":"Xu","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"982","DOI":"10.1109\/TPAMI.2019.2943860","article-title":"Nonlinear regression via deep negative correlation learning","volume":"43","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Yuan, Z., Zhang, K., and Huang, T. (2024, January 15\u201319). Positive label is all you need for multi-label classification. 
Proceedings of the 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, ON, Canada.","DOI":"10.1109\/ICME57554.2024.10687587"},{"key":"ref_50","first-page":"1137","article-title":"Pervasive Attentive Neural Network for Intelligent Image Classification Based on N-CDE\u2019s","volume":"79","author":"Abulfaraj","year":"2024","journal-title":"Comput. Mater. Contin."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.patrec.2024.08.020","article-title":"A semantic guidance-based fusion network for multi-label image classification","volume":"185","author":"Wang","year":"2024","journal-title":"Pattern Recognit. Lett."},{"key":"ref_52","unstructured":"Arya, S., Xiang, Y., and Gogate, V. (2024, January 2\u20134). Deep Dependency Networks and Advanced Inference Schemes for Multi-Label Classification. Proceedings of the International Conference on Artificial Intelligence and Statistics, Valencia, Spain."},{"key":"ref_53","first-page":"405","article-title":"A network model based on scrap metal classification and grading","volume":"Volume 13105","author":"Yu","year":"2024","journal-title":"Proceedings of the International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2023)"},{"key":"ref_54","unstructured":"Chong, C.F., Guo, J., Yang, X., Ke, W., and Wang, Y. (2024). Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification. arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Popordanoska, T., Tiulpin, A., and Blaschko, M.B. (2024, January 3\u20138). Beyond classification: Definition and density-based estimation of calibration in object detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV57701.2024.00064"},{"key":"ref_56","unstructured":"Shah, M., and Bhalgat, Y. (2024). 
Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification. arXiv."},{"key":"ref_57","unstructured":"Jiu, M., Zhu, H., and Sahbi, H. (2024). Multi-label Classification using Deep Multi-order Context-aware Kernel Networks. arXiv."},{"key":"ref_58","unstructured":"Liu, Y., Luo, W., Chen, Z., and Naseem, M.L. (2024). Showing Many Labels in Multi-label Classification Models: An Empirical Study of Adversarial Examples. arXiv."},{"key":"ref_59","unstructured":"Zhu, X., Liu, J., Tang, D., Ge, J., Liu, W., Liu, B., and Cao, J. (2024). Query-Based Knowledge Sharing for Open-Vocabulary Multi-Label Classification. arXiv."},{"key":"ref_60","unstructured":"Shi, L., Tang, C., Deng, H., Xu, C., Xing, L., and Chen, B. (2024). Generalized Trusted Multi-view Classification Framework with Hierarchical Opinion Aggregation. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"112077","DOI":"10.1016\/j.knosys.2024.112077","article-title":"Diverse and tailored image generation for zero-shot multi-label classification","volume":"299","author":"Zhang","year":"2024","journal-title":"Knowl.-Based Syst."},{"key":"ref_62","first-page":"1","article-title":"Pascal VOC 2008 challenge","volume":"24","author":"Hoiem","year":"2009","journal-title":"World Lit. Today"},{"key":"ref_63","unstructured":"Everingham, M., Van Gool, L., Williams, C.K., Winn, J., and Zisserman, A. (2012, January 8). The pascal visual object classes challenge 2012 (voc2012). Proceedings of the Results, Wuhan, China."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Lee, C.W., Fang, W., Yeh, C.K., and Wang, Y.C.F. (2018, January 18\u201323). Multi-label zero-shot learning with structured knowledge graphs. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00170"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., and Zheng, Y. (2009, January 8\u201310). Nus-wide: A real-world web image database from national university of singapore. Proceedings of the ACM International Conference on Image and Video Retrieval, Santorini Island, Greece.","DOI":"10.1145\/1646396.1646452"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/2\/39\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:31:21Z","timestamp":1760027481000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/2\/39"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,11]]},"references-count":65,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["bdcc9020039"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9020039","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,11]]}}}