{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T15:31:50Z","timestamp":1768923110344,"version":"3.49.0"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T00:00:00Z","timestamp":1698364800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T00:00:00Z","timestamp":1698364800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int. J. Mach. Learn. &amp; Cyber."],"published-print":{"date-parts":[[2024,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Since the advent of Transformers followed by Vision Transformers (ViTs), enormous success has been achieved by researchers in the field of computer vision and object detection. The difficulty mechanism of splitting images into fixed patches posed a serious challenge in this arena and resulted in loss of useful information at the time of object detection and classification. To overcome the challengers, we propose an innovative Intelligent-based patching mechanism and integrated it seamlessly into the conventional Patch-based ViT framework. The proposed method enables the utilization of patches with flexible sizes to capture and retain essential semantic content from input images and therefore increases the performance compared with conventional methods. Our method was evaluated with three renowned datasets Microsoft Common Objects in Context (MSCOCO-2017), Pascal VOC (Visual Object Classes Challenge) and Cityscapes upon object detection and classification. The experimental results showed promising improvements in specific metrics, particularly in higher confidence thresholds, making it a notable performer in object detection and classification tasks.<\/jats:p>","DOI":"10.1007\/s13042-023-01996-2","type":"journal-article","created":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T04:01:40Z","timestamp":1698379300000},"page":"1767-1778","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["IntelPVT: intelligent patch-based pyramid vision transformers for object detection and classification"],"prefix":"10.1007","volume":"15","author":[{"given":"Divya","family":"Nimma","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6789-166X","authenticated-orcid":false,"given":"Zhaoxian","family":"Zhou","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,10,27]]},"reference":[{"key":"1996_CR1","doi-asserted-by":"crossref","unstructured":"Bhojanapalli, S., Chakrabarti, A., Glasner, D., Li, D., Unterthiner, T., Veit, A. (2021) Understanding robustness of transformers for image classification. CoRR abs\/2103.14586 https:\/\/arxiv.org\/abs\/2103.14586","DOI":"10.1109\/ICCV48922.2021.01007"},{"key":"1996_CR2","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1007\/978-3-030-58452-8_13","volume-title":"Computer Vision \u2013 ECCV 2020","author":"N Carion","year":"2020","unstructured":"Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: Vedaldi A, Bischof H, Brox T, Frahm J-M (eds) Computer Vision \u2013 ECCV 2020. Springer, Cham, pp 213\u2013229"},{"key":"1996_CR3","doi-asserted-by":"crossref","unstructured":"Strudel, R., Garcia, R., Laptev, I., Schmid, C. (2021) Segmentary: Transformer for semantic segmentation. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), 7262\u20137272","DOI":"10.1109\/ICCV48922.2021.00717"},{"key":"1996_CR4","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y. (2017) Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764\u2013773","DOI":"10.1109\/ICCV.2017.89"},{"key":"1996_CR5","doi-asserted-by":"crossref","unstructured":"Yuan, L., Chen, Y., Wang, T., Yu, W., Shi, Y., Tay, F.E.H., Feng, J., Yan, S. (2021) Tokens-to-token vit: Training vision transformers from scratch on ImageNet. CoRR abs\/2101.11986 https:\/\/arxiv.org\/abs\/2101. 11986","DOI":"10.1109\/ICCV48922.2021.00060"},{"key":"1996_CR6","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., Shao, L. (2021) Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), pp. 568\u2013578","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"1996_CR7","first-page":"367","volume":"2019","author":"P Dhruv","year":"2020","unstructured":"Dhruv P, Naskar S (2020) Image classification using convolutional neural network (CNN) and recurrent neural network (RNN): a review. Machine Learn Inform Proces: Proceed of ICMLIP 2019:367\u2013381","journal-title":"Machine Learn Inform Proces: Proceed of ICMLIP"},{"key":"1996_CR8","unstructured":"Chu, X., Tian, Z., Zhang, B., Wang, X., Shen, C. (2021) Conditional positional encodings for vision transformers. In: The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum? id=3KWnuT-R1bh"},{"key":"1996_CR9","doi-asserted-by":"crossref","unstructured":"d\u2019Ascoli, S., Touvron, H., Leavitt, M.L., Morcos, A.S., Biroli, G., Sagun, L.: Convit: (2021) Improving vision transformers with soft convolutional inductive biases. In: International Conference on Machine Learning, pp. 2286\u20132296. PMLR","DOI":"10.1088\/1742-5468\/ac9830"},{"key":"1996_CR10","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Cui, Y., Srinivas, A., Qian, R., Lin, T.-Y., Cubuk, E.D., Le, Q.V., Zoph, B. (2021) Simple copy-paste is a strong data augmentation metho for instance segmentation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 2918\u20132928","DOI":"10.1109\/CVPR46437.2021.00294"},{"key":"1996_CR11","doi-asserted-by":"publisher","unstructured":"Fu, J., Zheng, H., Mei, T. (2017) Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4476\u20134484. https:\/\/doi.org\/10.1109\/CVPR.2017.476","DOI":"10.1109\/CVPR.2017.476"},{"key":"1996_CR12","doi-asserted-by":"crossref","unstructured":"Gao, P., Zheng, M., Wang, X., Dai, J., Li, H. (2021) Fast convergence of DETR with spatially modulated co-attention. CoRR abs\/2101.07448 https:\/\/arxiv.org\/abs\/2101.07448","DOI":"10.1109\/ICCV48922.2021.00360"},{"key":"1996_CR13","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J. (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"1996_CR14","doi-asserted-by":"publisher","unstructured":"Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132\u20137141 (2018). https:\/\/doi.org\/10.1109\/CVPR.2018.00745","DOI":"10.1109\/CVPR.2018.00745"},{"key":"1996_CR15","doi-asserted-by":"publisher","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollar, P. (2017) Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999\u20133007. https:\/\/doi.org\/10.1109\/ICCV. 2017.324","DOI":"10.1109\/ICCV"},{"key":"1996_CR16","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. CoRR abs\/2103.14030 https:\/\/arxiv.org\/abs\/2103.14030","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"1996_CR17","unstructured":"Ren, S., He, K., Girshick, R.B., Sun, J. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. CoRR abs\/1506.01497 https:\/\/arxiv.org\/abs\/1506.01497"},{"key":"1996_CR18","doi-asserted-by":"crossref","unstructured":"Srinivas, A., Lin, T., Parmar, N., Shlens, J., Abbeel, P., Vaswani, A. (2021) Bottleneck transformers for visual recognition. CoRR abs\/2101.11605 https:\/\/arxiv.org\/abs\/2101.11605","DOI":"10.1109\/CVPR46437.2021.01625"},{"key":"1996_CR19","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H.S., Zhang, L. (2020) Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. CoRR abs\/2012.15840 https:\/\/arxiv.org\/abs\/2012.15840"},{"key":"1996_CR20","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-020-02623-6","author":"AA Movassagh","year":"2021","unstructured":"Movassagh AA, Alzubi JA, Gheisari M, Rahimi M, Mohan S, Abbasi AA, Nabipour N (2021) Artificial neural networks training algorithm integrating invasive weed optimization with differential evolutionary model. J Ambient Intell Human Comput. https:\/\/doi.org\/10.1007\/s12652-020-02623-6","journal-title":"J Ambient Intell Human Comput"},{"key":"1996_CR21","doi-asserted-by":"publisher","first-page":"16091","DOI":"10.1007\/s00521-020-04761-6","volume":"32","author":"OA Alzubi","year":"2020","unstructured":"Alzubi OA, Alzubi JA, Alweshah M, Qiqieh I, Al-Shami S, Ramachandran M (2020) An optimal pruning algorithm of classifier ensembles: dynamic programming approach. Neural Comput Appl 32:16091\u201316107","journal-title":"Neural Comput Appl"},{"key":"1996_CR22","doi-asserted-by":"crossref","unstructured":"Alzubi JA, (2015) Diversity based improved bagging algorithm. In: Proceedings of the International Conference on Engineering & MIS, pp. 1\u20135","DOI":"10.1145\/2832987.2833043"},{"issue":"1","key":"1996_CR23","first-page":"76","volume":"15","author":"OA Alzubi","year":"2018","unstructured":"Alzubi OA, Alzubi JAA, Tedmori S, Rashaideh H, Almomani O (2018) Consensus-based combining method for classifier ensembles. Int Arab J Inf Technol 15(1):76\u201386","journal-title":"Int Arab J Inf Technol"},{"issue":"12","key":"1996_CR24","doi-asserted-by":"publisher","first-page":"1336","DOI":"10.19026\/rjaset.11.2241","volume":"11","author":"JA Alzubi","year":"2015","unstructured":"Alzubi JA (2015) Research article optimal classifier ensemble design based on cooperative game theory. Res J Appl Sci Eng Technol 11(12):1336\u20131343","journal-title":"Res J Appl Sci Eng Technol"},{"key":"1996_CR25","volume-title":"Computer Vision \u2013 ECCV","author":"T-Y Lin","year":"2014","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollar P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision \u2013 ECCV. Springer, Cham"},{"issue":"1","key":"1996_CR26","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","volume":"111","author":"M Everingham","year":"2015","unstructured":"Everingham M, Eslami SMA, Van Gool L, Williams CKI, Winn J, Zisserman A (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vision 111(1):98\u2013136. https:\/\/doi.org\/10.1007\/s11263-014-0733-5","journal-title":"Int J Comput Vision"},{"key":"1996_CR27","doi-asserted-by":"crossref","unstructured":"Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B, (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213\u20133223","DOI":"10.1109\/CVPR.2016.350"}],"updated-by":[{"DOI":"10.1007\/s13042-023-02052-9","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2023,12,9]],"date-time":"2023-12-09T00:00:00Z","timestamp":1702080000000}}],"container-title":["International Journal of Machine Learning and Cybernetics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13042-023-01996-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13042-023-01996-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13042-023-01996-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,12]],"date-time":"2024-04-12T14:20:11Z","timestamp":1712931611000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13042-023-01996-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,27]]},"references-count":27,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5]]}},"alternative-id":["1996"],"URL":"https:\/\/doi.org\/10.1007\/s13042-023-01996-2","relation":{"correction":[{"id-type":"doi","id":"10.1007\/s13042-023-02052-9","asserted-by":"object"}]},"ISSN":["1868-8071","1868-808X"],"issn-type":[{"value":"1868-8071","type":"print"},{"value":"1868-808X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,27]]},"assertion":[{"value":"3 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 October 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 December 2023","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A Correction to this paper has been published:","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"https:\/\/doi.org\/10.1007\/s13042-023-02052-9","URL":"https:\/\/doi.org\/10.1007\/s13042-023-02052-9","order":7,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}]}}