{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T10:09:23Z","timestamp":1760609363129,"version":"build-2065373602"},"reference-count":49,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2021,6,28]],"date-time":"2021-06-28T00:00:00Z","timestamp":1624838400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41971424, 61701191"],"award-info":[{"award-number":["41971424, 61701191"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003392","name":"Natural Science Foundation of Fujian Province","doi-asserted-by":"publisher","award":["2020J01701"],"award-info":[{"award-number":["2020J01701"]}],"id":[{"id":"10.13039\/501100003392","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fujian Provincial Science and Technology Program 426 Project","award":["JAT190318"],"award-info":[{"award-number":["JAT190318"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Object detection is a challenging computer vision task with numerous real-world applications. In recent years, the concept of the object relationship model has become helpful for object detection and has been verified and realized in deep learning. Nonetheless, most approaches to modeling object relations are limited to using the anchor-based algorithms; they cannot be directly migrated to the anchor-free frameworks. The reason is that the anchor-free algorithms are used to eliminate the complex design of anchors and predict heatmaps to represent the locations of keypoints of different object categories, without considering the relationship between keypoints. Therefore, to better fuse the information between the heatmap channels, it is important to model the visual relationship between keypoints. In this paper, we present a knowledge-driven network (KDNet)\u2014a new architecture that can aggregate and model keypoint relations to augment object features for detection. Specifically, it processes a set of keypoints simultaneously through interactions between their local and geometric features, thereby allowing the modeling of their relationship. Finally, the updated heatmaps were used to obtain the corners of the objects and determine their positions. The experimental results conducted on the RIDER dataset confirm the effectiveness of the proposed KDNet, which significantly outperformed other state-of-the-art object detection methods.<\/jats:p>","DOI":"10.3390\/a14070195","type":"journal-article","created":{"date-parts":[[2021,6,28]],"date-time":"2021-06-28T13:39:22Z","timestamp":1624887562000},"page":"195","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Knowledge-Driven Network for Object Detection"],"prefix":"10.3390","volume":"14","author":[{"given":"Yundong","family":"Wu","sequence":"first","affiliation":[{"name":"Computer Engineering College, Jimei University, Xiamen 361021, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8562-2743","authenticated-orcid":false,"given":"Jiajia","family":"Liao","sequence":"additional","affiliation":[{"name":"Computer Engineering College, Jimei University, Xiamen 361021, China"}]},{"given":"Yujun","family":"Liu","sequence":"additional","affiliation":[{"name":"Computer Engineering College, Jimei University, Xiamen 361021, China"}]},{"given":"Kaiming","family":"Ding","sequence":"additional","affiliation":[{"name":"Computer Engineering College, Jimei University, Xiamen 361021, China"}]},{"given":"Shimin","family":"Li","sequence":"additional","affiliation":[{"name":"Computer Engineering College, Jimei University, Xiamen 361021, China"}]},{"given":"Zhilin","family":"Zhang","sequence":"additional","affiliation":[{"name":"Computer Engineering College, Jimei University, Xiamen 361021, China"}]},{"given":"Guorong","family":"Cai","sequence":"additional","affiliation":[{"name":"Computer Engineering College, Jimei University, Xiamen 361021, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1707-5685","authenticated-orcid":false,"given":"Jinhe","family":"Su","sequence":"additional","affiliation":[{"name":"Computer Engineering College, Jimei University, Xiamen 361021, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,28]]},"reference":[{"unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv.","key":"ref_1"},{"key":"ref_2","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","key":"ref_3","DOI":"10.1109\/CVPR.2015.7298594"},{"doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","key":"ref_4","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"610","DOI":"10.1109\/TSMC.1973.4309314","article-title":"Textural Features for Image Classification","volume":"6","author":"Haralick","year":"1973","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1901","DOI":"10.1109\/TPAMI.2015.2491929","article-title":"HCP: A Flexible CNN Framework for Multi-Label Image Classification","volume":"38","author":"Wei","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"doi-asserted-by":"crossref","unstructured":"Hershey, S., Chaudhuri, S., Ellis, D.P., Gemmeke, J.F., Jansen, A., Moore, R.C., Plakal, M., Platt, D., Saurous, R.A., and Seybold, B. (2017, January 5\u20139). CNN architectures for large-scale audio classification. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","key":"ref_7","DOI":"10.1109\/ICASSP.2017.7952132"},{"unstructured":"Ren, S., He, K., Girshick, R., and Sun, J.J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv.","key":"ref_8"},{"doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","key":"ref_9","DOI":"10.1109\/ICCV.2015.169"},{"doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","key":"ref_10","DOI":"10.1109\/CVPR.2016.91"},{"doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","key":"ref_11","DOI":"10.1007\/978-3-319-46448-0_2"},{"doi-asserted-by":"crossref","unstructured":"G\u00fcler, R.A., Neverova, N., and Kokkinos, I. (2018, January 18\u201322). Densepose: Dense human pose estimation in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_12","DOI":"10.1109\/CVPR.2018.00762"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1109\/TPAMI.2019.2929257","article-title":"OpenPose: Realtime multi-person 2D pose estimation using Part Affinity Fields","volume":"43","author":"Cao","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). Cornernet: Detecting objects as paired keypoints. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","key":"ref_14","DOI":"10.1007\/978-3-030-01264-9_45"},{"unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (November, January 27). Centernet: Keypoint triplets for object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","key":"ref_15"},{"unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as points. arXiv.","key":"ref_16"},{"doi-asserted-by":"crossref","unstructured":"Zhou, X., Zhuo, J., and Krahenbuhl, P. (2019, January 15\u201320). Bottom-up object detection by grouping extreme and center points. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_17","DOI":"10.1109\/CVPR.2019.00094"},{"doi-asserted-by":"crossref","unstructured":"Galleguillos, C., Rabinovich, A., and Belongie, S. (2008, January 23\u201328). Object categorization using co-occurrence, location and appearance. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","key":"ref_18","DOI":"10.1109\/CVPR.2008.4587799"},{"doi-asserted-by":"crossref","unstructured":"Torralba, A., Murphy, K.P., Freeman, W.T., and Rubin, M.A. (2003, January 18\u201320). Context-based vision system for place and object recognition. Proceedings of the Computer Vision, IEEE International Conference on IEEE Computer Society, Madison, WI, USA.","key":"ref_19","DOI":"10.1109\/ICCV.2003.1238354"},{"unstructured":"Tu, Z. (2008, January 23\u201328). Auto-context and its application to high-level vision tasks. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","key":"ref_20"},{"doi-asserted-by":"crossref","unstructured":"Mottaghi, R., Chen, X., Liu, X., Cho, N.-G., Lee, S.-W., Fidler, S., Urtasun, R., and Yuille, A. (2014, January 23\u201328). The role of context for object detection and semantic segmentation in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","key":"ref_21","DOI":"10.1109\/CVPR.2014.119"},{"doi-asserted-by":"crossref","unstructured":"Hu, H., Gu, J., Zhang, Z., Dai, J., and Wei, Y. (2018, January 18\u201322). Relation networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_22","DOI":"10.1109\/CVPR.2018.00378"},{"doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","key":"ref_23","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective Search for Object Recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","key":"ref_25","DOI":"10.1109\/ICCV.2017.322"},{"doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201322). Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_26","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1016\/j.isprsjprs.2020.09.022","article-title":"Oriented objects as pairs of middle lines","volume":"169","author":"Wei","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"doi-asserted-by":"crossref","unstructured":"Voigtlaender, P., Luiten, J., Torr, P.H., and Leibe, B. (2020, January 14\u201319). Siam r-cnn: Visual tracking by re-detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_28","DOI":"10.1109\/CVPR42600.2020.00661"},{"doi-asserted-by":"crossref","unstructured":"Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., and Ouyang, W. (2019, January 15\u201320). Hybrid task cascade for instance segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_29","DOI":"10.1109\/CVPR.2019.00511"},{"doi-asserted-by":"crossref","unstructured":"Wu, Y., Chen, Y., Yuan, L., Liu, Z., Wang, L., Li, H., and Fu, Y. (2020, January 14\u201319). Rethinking classification and localization for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_30","DOI":"10.1109\/CVPR42600.2020.01020"},{"doi-asserted-by":"crossref","unstructured":"Song, G., Liu, Y., and Wang, X. (2020, January 14\u201319). Revisiting the sibling head in object detector. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_31","DOI":"10.1109\/CVPR42600.2020.01158"},{"doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 22\u201329). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Venice, Italy.","key":"ref_32","DOI":"10.1109\/CVPR.2017.690"},{"unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv.","key":"ref_33"},{"doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","key":"ref_34","DOI":"10.1109\/ICCV.2017.324"},{"doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","key":"ref_35","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1109\/LGRS.2019.2923564","article-title":"AVDNet: A small-sized vehicle detection network for aerial visual data","volume":"17","author":"Mandal","year":"2019","journal-title":"IEEE Geosci. Remote. Sens. Lett."},{"unstructured":"Yang, X., Liu, Q., Yan, J., Li, A., Zhang, Z., and Yu, G. (2019). R3det: Refined single-stage detector with feature refinement for rotating object. arXiv.","key":"ref_37"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"7389","DOI":"10.1109\/TIP.2020.3002345","article-title":"Foveabox: Beyound anchor-based object detection","volume":"29","author":"Kong","year":"2020","journal-title":"IEEE Trans. Image Process."},{"doi-asserted-by":"crossref","unstructured":"Mylavarapu, S.K., Choudhuri, S., Shrivastava, A., Lee, J., and Givargis, T. (2020, January 9\u201313). FSAF: File system aware flash translation layer for NAND flash memories. Proceedings of the 2009 Design, Automation & Test in Europe Conference & Exhibition, Grenoble, France.","key":"ref_39","DOI":"10.1109\/DATE.2009.5090696"},{"doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (2019, January 15\u201320). Fcos: Fully convolutional one-stage object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","key":"ref_40","DOI":"10.1109\/ICCV.2019.00972"},{"doi-asserted-by":"crossref","unstructured":"Yu, J., Jiang, Y., Wang, Z., Cao, Z., and Huang, T. (2016, January 7\u201310). Unitbox: An advanced object detection network. Proceedings of the 24th ACM International Conference on Multimedia, Palo Alto, CA, USA.","key":"ref_41","DOI":"10.1145\/2964284.2967274"},{"unstructured":"Law, H., Teng, Y., Russakovsky, O., and Deng, J. (2019). Cornernet-lite: Efficient keypoint based object detection. arXiv.","key":"ref_42"},{"doi-asserted-by":"crossref","unstructured":"Pang, Y., Xie, J., Khan, M.H., Anwer, R.M., Khan, F.S., and Shao, L. (2019, January 15\u201320). Mask-guided attention network for occluded pedestrian detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","key":"ref_43","DOI":"10.1109\/ICCV.2019.00507"},{"unstructured":"Du, X., Shi, X., and Huang, R. (2019). Repgn: Object detection with relational proposal graph network. arXiv.","key":"ref_44"},{"doi-asserted-by":"crossref","unstructured":"Zhou, P., and Chi, M. (2019, January 15\u201320). Relation parsing neural network for human-object interaction detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","key":"ref_45","DOI":"10.1109\/ICCV.2019.00093"},{"doi-asserted-by":"crossref","unstructured":"Deng, J., Pan, Y., Yao, T., Zhou, W., Li, H., and Mei, T. (2019, January 15\u201320). Relation distillation networks for video object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","key":"ref_46","DOI":"10.1109\/ICCV.2019.00712"},{"doi-asserted-by":"crossref","unstructured":"Yang, J., Lu, J., Lee, S., Batra, D., and Parikh, D. (2018, January 8\u201314). Graph r-cnn for scene graph generation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","key":"ref_47","DOI":"10.1007\/978-3-030-01246-5_41"},{"doi-asserted-by":"crossref","unstructured":"Xu, H., Jiang, C., Liang, X., and Li, Z. (2019, January 15\u201320). Spatial-aware graph relation network for large-scale object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_48","DOI":"10.1109\/CVPR.2019.00952"},{"unstructured":"Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., and Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv.","key":"ref_49"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/7\/195\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:26:13Z","timestamp":1760163973000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/7\/195"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,28]]},"references-count":49,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["a14070195"],"URL":"https:\/\/doi.org\/10.3390\/a14070195","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2021,6,28]]}}}