{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:48:26Z","timestamp":1760233706773,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2021,2,15]],"date-time":"2021-02-15T00:00:00Z","timestamp":1613347200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61876155"],"award-info":[{"award-number":["61876155"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018556","name":"Science and Technology Program of Suzhou","doi-asserted-by":"publisher","award":["SYG201712, SZS201613"],"award-info":[{"award-number":["SYG201712, SZS201613"]}],"id":[{"id":"10.13039\/501100018556","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu Province","doi-asserted-by":"publisher","award":["BK20181189"],"award-info":[{"award-number":["BK20181189"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Object detection has wide applications in intelligent systems and sensor applications. Compared with two stage detectors, recent one stage counterparts are capable of running more efficiently with comparable accuracy, which satisfy the requirement of real-time processing. To further improve the accuracy of one stage single shot detector (SSD), we propose a novel Multi-Path fusion Single Shot Detector (MPSSD). Different from other feature fusion methods, we exploit the connection among different scale representations in a pyramid manner. We propose feature fusion module to generate new feature pyramids based on multiscale features in SSD, and these pyramids are sent to our pyramid aggregation module for generating final features. These enhanced features have both localization and semantics information, thus improving the detection performance with little computation cost. A series of experiments on three benchmark datasets PASCAL VOC2007, VOC2012, and MS COCO demonstrate that our approach outperforms many state-of-the-art detectors both qualitatively and quantitatively. In particular, for input images with size 512 \u00d7 512, our method attains mean Average Precision (mAP) of 81.8% on VOC2007 test, 80.3% on VOC2012 test, and 33.1% mAP on COCO test-dev 2015.<\/jats:p>","DOI":"10.3390\/s21041360","type":"journal-article","created":{"date-parts":[[2021,2,15]],"date-time":"2021-02-15T02:35:23Z","timestamp":1613356523000},"page":"1360","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Multipath Fusion Strategy Based Single Shot Detector"],"prefix":"10.3390","volume":"21","author":[{"given":"Shuyi","family":"Qu","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Liverpool, Liverpool L69 7ZX, UK"},{"name":"School of Advanced Technology, Xi\u2019an Jiaotong-Liverpool University, Suzhou 215123, China"}]},{"given":"Kaizhu","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Advanced Technology, Xi\u2019an Jiaotong-Liverpool University, Suzhou 215123, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8080-082X","authenticated-orcid":false,"given":"Amir","family":"Hussain","sequence":"additional","affiliation":[{"name":"School of Computing, Edinburgh Napier University, Edinburgh EH11 4BN, UK"}]},{"given":"Yannis","family":"Goulermas","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Liverpool, Liverpool L69 7ZX, UK"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1007\/s11263-019-01247-4","article-title":"Deep learning for generic object detection: A survey","volume":"128","author":"Liu","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 24\u201327). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_5","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016). Ssd: Single shot multibox detector. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_7","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016). R-fcn: Object detection via region-based fully convolutional networks. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Qu, S., Huang, K., Hussain, A., and Goulermas, Y. (2019, January 14\u201319). MPSSD: Multi-Path Fusion Single Shot Detector. Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary.","DOI":"10.1109\/IJCNN.2019.8852053"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft coco: Common objects in context. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ghodrati, A., Diba, A., Pedersoli, M., Tuytelaars, T., and Van Gool, L. (2015, January 13\u201316). Deepproposal: Hunting objects by cascading deep convolutional layers. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.296"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_13","unstructured":"Fu, C.Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A.C. (2017). DSSD: Deconvolutional single shot detector. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Eigen, D., and Fergus, R. (2015, January 13\u201316). Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.304"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ochoa-Zezzatti, A., Mej\u00eda, J., Contreras-Masse, R., Mart\u00ednez, E., and Hern\u00e1ndez, A. (2020). Ethnic Characterization in Amalgamated People for Airport Security Using a Repository of Images and Pigeon-Inspired Optimization (PIO) Algorithm for the Improvement of Their Results. Applications of Hybrid Metaheuristic Algorithms for Image Processing, Springer.","DOI":"10.1007\/978-3-030-40977-7_5"},{"key":"ref_16","unstructured":"Li, Z., and Zhou, F. (2017). FSSD: Feature Fusion Single Shot Multibox Detector. arXiv."},{"key":"ref_17","unstructured":"Csurka, G., Dance, C., Fan, L., Willamowski, J., and Bray, C. Visual categorization with bags of keypoints. Workshop on Statistical Learning in Computer Vision, ECCV. Available online: https:\/\/people.eecs.berkeley.edu\/~efros\/courses\/AP06\/Papers\/csurka-eccv-04.pdf."},{"key":"ref_18","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, X., Han, T.X., and Yan, S. (October, January 29). An HOG-LBP human detector with partial occlusion handling. Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan.","DOI":"10.1109\/ICCV.2009.5459207"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Perronnin, F., S\u00e1nchez, J., and Mensink, T. (2010). Improving the fisher kernel for large-scale image classification. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-15561-1_11"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Felzenszwalb, P.F., Girshick, R.B., and McAllester, D. (2010, January 13\u201318). Cascade object detection with deformable part models. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539906"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_23","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Huang, K., Hussain, A., Wang, Q.F., and Zhang, R. (2019). Deep Learning: Fundamentals, Theory and Applications, Springer.","DOI":"10.1007\/978-3-030-06073-2"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2892","DOI":"10.1016\/j.neucom.2017.10.043","article-title":"Siamese network ensemble for visual tracking","volume":"275","author":"Jiang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1007\/s12559-017-9515-z","article-title":"Reducing and stretching deep convolutional activation features for accurate image classification","volume":"10","author":"Zhong","year":"2018","journal-title":"Cogn. Comput."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"778","DOI":"10.1007\/s12559-018-9566-9","article-title":"A Novel Deep Density Model for Unsupervised Learning","volume":"11","author":"Yang","year":"2019","journal-title":"Cogn. Comput."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_32","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_33","unstructured":"Kong, T., Yao, A., Chen, Y., and Sun, F. (July, January 26). Hypernet: Towards accurate region proposal generation and joint object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_34","unstructured":"Bell, S., Lawrence Zitnick, C., Bala, K., and Girshick, R. (July, January 26). Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Woo, S., Hwang, S., and Kweon, I.S. (2018, January 12\u201315). Stairnet: Top-down semantic aggregation for accurate one shot detection. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00125"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Liu, S., and Huang, D. (2018, January 8\u201314). Receptive field block net for accurate and fast object detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01252-6_24"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Pang, Y., Wang, T., Anwer, R.M., Khan, F.S., and Shao, L. (2019, January 16\u201320). Efficient featurized image pyramid network for single shot detector. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00751"},{"key":"ref_38","unstructured":"Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., and Ling, H. (February, January 27). M2det: A single-shot object detector based on multi-level feature pyramid network. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 13\u201316). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hariharan, B., Arbel\u00e1ez, P., Bourdev, L., Maji, S., and Malik, J. (2011, January 6\u201313). Semantic contours from inverse detectors. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126343"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/4\/1360\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:24:15Z","timestamp":1760160255000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/4\/1360"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,15]]},"references-count":42,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["s21041360"],"URL":"https:\/\/doi.org\/10.3390\/s21041360","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,2,15]]}}}