{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T23:37:02Z","timestamp":1780443422668,"version":"3.54.1"},"reference-count":60,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2023,4,30]],"date-time":"2023-04-30T00:00:00Z","timestamp":1682812800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Research Foundation of Korea (NRF)","award":["NRF-2020M3C1C2A01080819"],"award-info":[{"award-number":["NRF-2020M3C1C2A01080819"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Object detection is a fundamental task in computer vision. Over the past several years, convolutional neural network (CNN)-based object detection models have significantly improved detection accuracy in terms of average precision (AP). Furthermore, feature pyramid networks (FPNs) are essential modules for object detection models to consider various object scales. However, the AP for small objects is lower than the AP for medium and large objects. It is difficult to recognize small objects because they do not have sufficient information, and information is lost in deeper CNN layers. This paper proposes a new FPN model named ssFPN (scale sequence (S2) feature-based feature pyramid network) to detect multi-scale objects, especially small objects. We propose a new scale sequence (S2) feature that is extracted by 3D convolution on the level of the FPN. It is defined and extracted from the FPN to strengthen the information on small objects based on scale-space theory. Motivated by this theory, the FPN is regarded as a scale space and extracts a scale sequence (S2) feature by three-dimensional convolution on the level axis of the FPN. 
The defined feature is basically scale-invariant and is built on a high-resolution pyramid feature map for small objects. Additionally, the designed S2 feature can be extended to most object detection models based on FPNs. We also designed a feature-level super-resolution approach to show the efficiency of the scale sequence (S2) feature. We verified that the scale sequence (S2) feature could improve the classification accuracy for low-resolution images by training a feature-level super-resolution model. To demonstrate the effect of the scale sequence (S2) feature, experiments on the scale sequence (S2) feature built-in object detection approach including both one-stage and two-stage models were conducted on the MS COCO dataset. For the two-stage object detection models Faster R-CNN and Mask R-CNN with the S2 feature, AP improvements of up to 1.6% and 1.4%, respectively, were achieved. Additionally, the APS of each model was improved by 1.2% and 1.1%, respectively. Furthermore, the one-stage object detection models in the YOLO series were improved. For YOLOv4-P5, YOLOv4-P6, YOLOR-P6, YOLOR-W6, and YOLOR-D6 with the S2 feature, 0.9%, 0.5%, 0.5%, 0.1%, and 0.1% AP improvements were observed. For small object detection, the APS increased by 1.1%, 1.1%, 0.9%, 0.4%, and 0.1%, respectively. Experiments using the feature-level super-resolution approach with the proposed scale sequence (S2) feature were conducted on the CIFAR-100 dataset. 
By training the feature-level super-resolution model, we verified that ResNet-101 with the S2 feature trained on LR images achieved a 55.2% classification accuracy, which was 1.6% higher than for ResNet-101 trained on HR images.<\/jats:p>","DOI":"10.3390\/s23094432","type":"journal-article","created":{"date-parts":[[2023,5,1]],"date-time":"2023-05-01T12:12:11Z","timestamp":1682943131000},"page":"4432","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":50,"title":["ssFPN: Scale Sequence (S2) Feature-Based Feature Pyramid Network for Object Detection"],"prefix":"10.3390","volume":"23","author":[{"given":"Hye-Jin","family":"Park","sequence":"first","affiliation":[{"name":"Department of Artificial Intelligence Engineering, Sookmyung Women\u2019s University, 100 Chungpa-ro 47 gil, Yongsna-gu, Seoul 04310, Republic of Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7622-0817","authenticated-orcid":false,"given":"Ji-Woo","family":"Kang","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence Engineering, Sookmyung Women\u2019s University, 100 Chungpa-ro 47 gil, Yongsna-gu, Seoul 04310, Republic of Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6555-3464","authenticated-orcid":false,"given":"Byung-Gyu","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence Engineering, Sookmyung Women\u2019s University, 100 Chungpa-ro 47 gil, Yongsna-gu, Seoul 04310, Republic of Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1016\/j.patrec.2022.11.014","article-title":"Group-based Bi-Directional Recurrent Wavelet Neural Network for Efficient Video Super-Resolution 
(VSR)","volume":"164","author":"Choi","year":"2022","journal-title":"Pattern Recognit. Lett."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"7512","DOI":"10.1007\/s10489-022-03867-9","article-title":"Deepfake detection algorithm based on improved vision transformer","volume":"53","author":"Heo","year":"2023","journal-title":"Appl. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Park, S.J., Kim, B.G., and Chilamkurti, N. (2021). A robust facial expression recognition algorithm based on multi-rate feature fusion scheme. Sensors, 21.","DOI":"10.3390\/s21216954"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1089\/big.2020.0274","article-title":"Residual-based graph convolutional network for emotion recognition in conversation for smart Internet of Things","volume":"9","author":"Choi","year":"2021","journal-title":"Big Data"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Choi, Y.J., Lee, Y.W., and Kim, B.G. (2021, January 10\u201315). Wavelet attention embedding networks for video super-resolution. Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), Milan, Italy.","DOI":"10.1109\/ICPR48806.2021.9412623"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"85","DOI":"10.33851\/JMIS.2021.8.2.85","article-title":"Frontal face generation algorithm from multi-view images based on generative adversarial network","volume":"8","author":"Heo","year":"2021","journal-title":"J. Multimed. Inf. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Jeong, D., Kim, B.G., and Dong, S.Y. (2020). Deep joint spatiotemporal network (DJSTN) for efficient facial expression recognition. 
Sensors, 20.","DOI":"10.3390\/s20071936"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"105246","DOI":"10.1016\/j.jobe.2022.105246","article-title":"Vision-based concrete crack detection using a hybrid framework considering noise effect","volume":"61","author":"Yu","year":"2022","journal-title":"J. Build. Eng."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"115066","DOI":"10.1016\/j.engstruct.2022.115066","article-title":"Torsional capacity evaluation of RC beams using an improved bird swarm algorithm optimised 2D convolutional neural network","volume":"273","author":"Yu","year":"2022","journal-title":"Eng. Struct."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1882","DOI":"10.1109\/TIP.2022.3148876","article-title":"Siamese implicit region proposal network with compound attention for visual tracking","volume":"31","author":"Chan","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"108793","DOI":"10.1016\/j.patcog.2022.108793","article-title":"Online multiple object tracking using joint detection and embedding network","volume":"130","author":"Chan","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1016\/j.isprsjprs.2022.06.008","article-title":"UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery","volume":"190","author":"Wang","year":"2022","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Bulat, A., Kossaifi, J., Tzimiropoulos, G., and Pantic, M. (2020, January 16\u201320). Toward fast and accurate human pose estimation via soft-gated skip connections. 
Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), Buenos Aires, Argentina.","DOI":"10.1109\/FG47880.2020.00014"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zheng, W., Tang, W., Jiang, L., and Fu, C.W. (2021, January 20\u201325). SE-SSD: Self-ensembling single-stage object detector from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01426"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, Y., Wang, T., Zhang, X., and Sun, J. (2022). Petr: Position embedding transformation for multi-view 3d object detection. arXiv.","DOI":"10.1007\/978-3-031-19812-0_31"},{"key":"ref_16","unstructured":"Huang, Y., Chen, J., and Huang, D. (March, January 22). UFPMP-Det: Toward accurate and efficient object detection on drone imagery. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Virtual."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_18","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst., 28."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"3388","DOI":"10.1109\/TPAMI.2020.2981890","article-title":"Imbalance problems in object detection: A review","volume":"43","author":"Oksuz","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"103910","DOI":"10.1016\/j.imavis.2020.103910","article-title":"Recent advances in small object detection based on deep learning: A review","volume":"97","author":"Tong","year":"2020","journal-title":"Image Vis. Comput."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201323). Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_24","unstructured":"Wang, C.Y., Yeh, I.H., and Liao, H.Y.M. (2021). You only learn one representation: Unified network for multiple tasks. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lowe, D.G. (1999, January 20\u201325). Object recognition from local scale-invariant features. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Corfu, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_26","unstructured":"Lindeberg, T. (2013). 
Scale-Space Theory in Computer Vision, Springer Science & Business Media."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, X., Zhang, S., Yu, Z., Feng, L., and Zhang, W. (2020, January 13\u201319). Scale-equalizing pyramid convolution for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01337"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning spatiotemporal features with 3d convolutional networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Dai, X., Chen, Y., Xiao, B., Chen, D., Liu, M., Yuan, L., and Zhang, L. (2021, January 20\u201325). Dynamic head: Unifying object detection heads with attentions. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00729"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Azimi, S.M., Vig, E., Bahmanyar, R., K\u00f6rner, M., and Reinartz, P. (2018, January 2\u20136). 
Towards multi-class object detection in unconstrained remote sensing imagery. Proceedings of the Asian Conference on Computer Vision (ACCV), Perth, Australia.","DOI":"10.1007\/978-3-030-20893-6_10"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, Z., Gao, G., Sun, L., and Fang, Z. (2021, January 5\u20139). HRDNet: High-resolution detection network for small objects. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China.","DOI":"10.1109\/ICME51207.2021.9428241"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Yang, Y., Wang, J., Xu, W., and Yuille, A.L. (2016, January 27\u201330). Attention to scale: Scale-aware semantic image segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.396"},{"key":"ref_35","unstructured":"Tao, A., Sapra, K., and Catanzaro, B. (2020). Hierarchical multi-scale attention for semantic segmentation. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Lin, T.Y., and Le, Q.V. (2019, January 15\u201320). Nas-fpn: Learning scalable feature pyramid architecture for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00720"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2020, January 13\u201319). Efficientdet: Scalable and efficient object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Liu, Y., Wang, Y., Wang, S., Liang, T., Zhao, Q., Tang, Z., and Ling, H. (2020, January 7\u201312). Cbnet: A novel composite backbone network architecture for object detection. 
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6834"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"6893","DOI":"10.1109\/TIP.2022.3216771","article-title":"Cbnet: A composite backbone network architecture for object detection","volume":"31","author":"Liang","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wang, Y., Xie, H., Fu, Z., and Zhang, Y. (2019, January 10\u201316). DSRN: A Deep Scale Relationship Network for Scene Text Detection. Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), Macao, China.","DOI":"10.24963\/ijcai.2019\/133"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1109\/TPAMI.2015.2439281","article-title":"Image super-resolution using deep convolutional networks","volume":"38","author":"Dong","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Dong, C., Loy, C.C., and Tang, X. (2016, January 11\u201314). Accelerating the super-resolution convolutional neural network. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_25"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Kim, J., Lee, J.K., and Lee, K.M. (2016, January 27\u201330). Accurate image super-resolution using very deep convolutional networks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.182"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Lim, B., Son, S., Kim, H., Nah, S., and Mu Lee, K. (2017, January 21\u201326). Enhanced deep residual networks for single image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.151"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.jvcir.2011.06.004","article-title":"Evaluation of image resolution and super-resolution on face recognition performance","volume":"23","author":"Fookes","year":"2012","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Haris, M., Shakhnarovich, G., and Ukita, N. (2021, January 8\u201312). Task-driven super resolution: Object detection in low-resolution images. Proceedings of the Neural Information Processing: 28th International Conference, ICONIP 2021, Sanur, Indonesia.","DOI":"10.1007\/978-3-030-92307-5_45"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., and Yan, S. (2017, January 21\u201326). Perceptual generative adversarial networks for small object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.211"},{"key":"ref_49","unstructured":"Noh, J., Bae, W., Lee, W., Seo, J., and Kim, G. (November, January 27). Better to follow, follow to be better: Towards precise supervision of feature super-resolution for small object detection. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1145\/3422622","article-title":"Generative adversarial networks","volume":"63","author":"Goodfellow","year":"2020","journal-title":"Commun. ACM"},{"key":"ref_51","unstructured":"Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B., and Vijayanarasimhan, S. (2016). Youtube-8m: A large-scale video classification benchmark. arXiv."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_53","unstructured":"Xu, B., Wang, N., Chen, T., and Li, M. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2021, January 20\u201325). Scaled-yolov4: Scaling cross stage partial network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01283"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"He, T., Zhang, Z., Zhang, H., Zhang, Z., Xie, J., and Li, M. (2019, January 15\u201320). Bag of tricks for image classification with convolutional neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00065"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_57","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (2022). CenterNet++ for Object Detection. arXiv."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Qiao, S., Chen, L.C., and Yuille, A. (2021, January 20\u201325). Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01008"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Fang, Y., Yang, S., Wang, X., Li, Y., Fang, C., Shan, Y., Feng, B., and Liu, W. (2021, January 11\u201317). Instances as queries. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00683"},{"key":"ref_60","unstructured":"Krizhevsky, A. (2022, September 01). Learning Multiple Layers of Features from Tiny Images. 
Available online: https:\/\/www.cs.toronto.edu\/~kriz\/learning-features-2009-TR.pdf."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/9\/4432\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:27:29Z","timestamp":1760124449000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/9\/4432"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,30]]},"references-count":60,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["s23094432"],"URL":"https:\/\/doi.org\/10.3390\/s23094432","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,30]]}}}