{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T15:56:41Z","timestamp":1774367801906,"version":"3.50.1"},"reference-count":60,"publisher":"MDPI AG","issue":"20","license":[{"start":{"date-parts":[[2019,10,13]],"date-time":"2019-10-13T00:00:00Z","timestamp":1570924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection in remote sensing images on a satellite or aircraft has important economic and military significance and is full of challenges. This task requires not only accurate and efficient algorithms, but also a high-performance, low-power hardware architecture. However, existing deep learning based object detection algorithms require further optimization in small-object detection, computational complexity, and parameter size. Meanwhile, general-purpose processors are limited in power efficiency, and previous deep learning processor designs still have untapped potential for parallelism. To address these issues, we propose an efficient context-based feature fusion single shot multi-box detector (CBFF-SSD) framework, using lightweight MobileNet as the backbone network to reduce parameters and computational complexity, adding feature fusion units and detecting feature maps to enhance the recognition of small objects and improve detection accuracy. Based on the analysis and optimization of the calculation of each layer in the algorithm, we propose an efficient hardware architecture for a deep learning processor with multiple neural processing units (NPUs) composed of 2-D processing elements (PEs), which can simultaneously calculate multiple output feature maps. 
The parallel architecture, hierarchical on-chip storage organization, and local registers are used to achieve parallel processing and the sharing and reuse of data, making the processor's computation more efficient. Extensive experiments and comprehensive evaluations on the public NWPU VHR-10 dataset and comparisons with some state-of-the-art approaches demonstrate the effectiveness and superiority of the proposed framework. Moreover, to evaluate the performance of the proposed hardware architecture, we implement it on a Xilinx XC7Z100 field programmable gate array (FPGA) and test it on the proposed CBFF-SSD and VGG16 models. Experimental results show that our processor is more power efficient than general purpose central processing units (CPUs) and graphics processing units (GPUs), and has better performance density than other state-of-the-art FPGA-based designs.<\/jats:p>","DOI":"10.3390\/rs11202376","type":"journal-article","created":{"date-parts":[[2019,10,14]],"date-time":"2019-10-14T03:54:13Z","timestamp":1571025253000},"page":"2376","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":46,"title":["Efficient Object Detection Framework and Hardware Architecture for Remote Sensing Images"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3183-808X","authenticated-orcid":false,"given":"Lin","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Northwestern Polytechnical University, Xi\u2019an 710072, China"},{"name":"Fourth Design Department, Beijing Institute of Microelectronics Technology, Beijing 100076, China"}]},{"given":"Shengbing","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"given":"Juan","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Animation and Software, 
Xi\u2019an Vocational and Technical College, Xi\u2019an 710077, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,10,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.isprsjprs.2016.03.014","article-title":"A survey on object detection in optical remote sensing images","volume":"117","author":"Cheng","year":"2016","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Xu, Y., Zhu, M., and Li, S. (2018). End-to-end airport detection in remote sensing images combining cascade region proposal networks and multi-threshold detection networks. Remote Sens., 10.","DOI":"10.3390\/rs10101516"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Zhu, M., Xu, Y., Ma, S., Li, S., Ma, H., and Han, Y. (2019). Effective airplane detection in remote sensing images based on multilayer feature fusion and improved nonmaximal suppression algorithm. Remote Sens., 11.","DOI":"10.3390\/rs11091062"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2795","DOI":"10.1109\/TGRS.2010.2043109","article-title":"Vehicle detection in very high resolution satellite images of city areas","volume":"48","author":"Leitloff","year":"2010","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"He, H., Yang, D., Wang, S.C., Wang, S.Y., and Li, Y. (2019). Road extraction by using atrous spatial pyramid pooling integrated encoder-decoder network and structural similarity loss. Remote Sens., 11.","DOI":"10.3390\/rs11091015"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"8331","DOI":"10.1080\/01431161.2010.540587","article-title":"Semi-automated road tracking by template matching and distance transformation in urban areas","volume":"32","author":"Zhang","year":"2011","journal-title":"Int. J. 
Remote Sens."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1186","DOI":"10.1016\/j.patrec.2013.03.031","article-title":"Interactive geospatial object extraction in high resolution remote sensing images using shape-based global minimization active contour model","volume":"34","author":"Liu","year":"2013","journal-title":"Pattern Recog. Lett."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1701","DOI":"10.1109\/TGRS.2012.2207123","article-title":"Automated detection of arbitrarily shaped buildings in complex environments from monocular VHR optical satellite imagery","volume":"51","author":"Ok","year":"2013","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1016\/j.isprsjprs.2015.01.013","article-title":"Water flow based geometric active deformable model for road network","volume":"102","author":"Leninisha","year":"2015","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1080\/01431160512331326675","article-title":"Model and context-driven building extraction in dense urban aerial images","volume":"26","author":"Peng","year":"2005","journal-title":"Int. J. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.isprsjprs.2013.03.006","article-title":"Change detection from remotely sensed images: From pixel-based to object-based approaches","volume":"80","author":"Hussain","year":"2013","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1175","DOI":"10.1080\/01431161.2013.876120","article-title":"Mapping vegetation morphology types in a dry savanna ecosystem: Integrating hierarchical object-based image analysis with Random Forest","volume":"35","author":"Mishra","year":"2014","journal-title":"Int. J. 
Remote Sens."},{"key":"ref_13","first-page":"219","article-title":"Systematic evaluation of fuzzy operators for object-based landslide mapping","volume":"3","author":"Feizizadeh","year":"2014","journal-title":"South East. Eur. J. Earth Obs. Geomat."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lowe, D.G. (1999, January 20\u201327). Object recognition from local scale-invariant features. Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_15","unstructured":"Dalal, N., and Triggs, B. (2005, January 21\u201323). Histograms of oriented gradients for human detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1109\/LGRS.2011.2161569","article-title":"Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model","volume":"9","author":"Sun","year":"2012","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"3446","DOI":"10.1109\/TGRS.2010.2046330","article-title":"A novel hierarchical method of ship detection from spaceborne optical image based on shape and texture features","volume":"48","author":"Zhu","year":"2010","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.isprsjprs.2010.11.001","article-title":"Support vector machines in remote sensing: A review","volume":"66","author":"Mountrakis","year":"2011","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1023\/A:1013912006537","article-title":"Logistic regression, adaboost and bregman distances","volume":"48","author":"Collins","year":"2002","journal-title":"Mach. 
Learn."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ali, A., Olaleye, O.G., and Bayoumi, M. (2016, January 16\u201319). Fast region-based DPM object detection for autonomous vehicles. Proceedings of the 2016 IEEE 59th International Midwest Symposium on Circuits and Systems, Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/MWSCAS.2016.7870113"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1109\/JSTARS.2010.2053521","article-title":"Building detection from one orthophoto and high-resolution InSAR data using conditional random fields","volume":"4","author":"Wegner","year":"2011","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Cheng, G., Han, J., Zhou, P., Yao, X., Zhang, D., and Guo, L. (2014, January 11\u201314). Sparse coding based airport detection from medium resolution Landsat-7 satellite remote sensing images. Proceedings of the 2014 3rd International Workshop on Earth Observation and Remote Sensing Applications, Changsha, China.","DOI":"10.1109\/EORSA.2014.6927883"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1016\/j.jag.2006.05.001","article-title":"Road detection from high-resolution satellite images using artificial neural networks","volume":"9","author":"Mokhtarzade","year":"2007","journal-title":"Int. J. Appl. Earth Observ. Geoinform."},{"key":"ref_24","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). ImageNet classification with deep convolutional neural networks. 
Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_26","first-page":"219","article-title":"Research on the infrastructure target detection of remote sensing image based on deep learning","volume":"48","author":"Wang","year":"2018","journal-title":"Radio Eng."},{"key":"ref_27","unstructured":"Jiao, L., Zhao, J., Yang, S., and Liu, F. (2017). Deep Learning, Optimization and Recognition, Tsinghua University Press. [1st ed.]."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Los Alamitos, CA, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2016, January 21\u201326). Feature pyramid networks for object detection. 
Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2015). You Only Look Once: Unified, Real-Time Object Detection. arXiv.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). SSD: Single Shot MultiBox Detector. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_34","unstructured":"Hennessy, J.L., and Patterson, D.A. (2019). Computer Architecture: A Quantitative Approach, Morgan Kaufmann. [6th ed.]."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Farabet, C., Poulet, C., Han, J.Y., and LeCun, Y. (September, January 31). CNP: An FPGA based processor for convolutional networks. Proceedings of the 2009 International Conference on Field Programmable Logic and Applications, Prague, Czech Republic.","DOI":"10.1109\/FPL.2009.5272559"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Farabet, C., Martini, B., Corda, B., Akselrod, P., Culurciello, E., and LeCun, Y. (2011, January 20\u201325). NeuFlow: A runtime reconfigurable dataflow processor for vision. Proceedings of the 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPRW.2011.5981829"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Peemen, M., Setio, A.A.A., Mesman, B., and Corporaal, H. (2013, January 6\u20139). Memory-centric accelerator design for convolutional neural networks. 
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, Asheville, NC, USA.","DOI":"10.1109\/ICCD.2013.6657019"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Alwani, M., Chen, H., Ferdman, M., and Milder, P. (2016, January 15\u201319). Fused-layer CNN accelerators. Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture, Taipei, Taiwan.","DOI":"10.1109\/MICRO.2016.7783725"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1145\/2644865.2541967","article-title":"DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning","volume":"49","author":"Chen","year":"2014","journal-title":"ACM Sigplan Not."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1145\/2872887.2750389","article-title":"ShiDianNao: Shifting vision processing closer to the sensor","volume":"43","author":"Du","year":"2015","journal-title":"SIGARCH Comput. Archit. News"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., and Cong, J. (2015, January 22\u201324). Optimizing FPGA-based accelerator design for deep convolutional neural networks. Proceedings of the 2015 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/2684746.2689060"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3140659.3080246","article-title":"In-datacenter performance analysis of a tensor processing unit","volume":"45","author":"Jouppi","year":"2017","journal-title":"SIGARCH Comput. Archit. News"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Li, L., Zhang, S.B., and Wu, J. (2017, January 27\u201330). Design and realization of deep learning coprocessor oriented to image recognition. 
Proceedings of the 2017 17th IEEE International Conference on Communication Technology, Chengdu, China.","DOI":"10.1109\/ICCT.2017.8359892"},{"key":"ref_44","unstructured":"Chang, J.W., Kang, K.W., and Kang, S.J. (2018). An energy-efficient FPGA-based deconvolutional neural networks accelerator for single image super-resolution. IEEE Trans. Circuits Sys. Video Tech."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Han, X., Zhong, Y., and Zhang, L. (2017). An efficient and robust integrated geospatial object detection framework for high spatial resolution remote sensing imagery. Remote Sens., 9.","DOI":"10.3390\/rs9070666"},{"key":"ref_46","unstructured":"Etten, A.V. (2018). You Only Look Twice: Rapid Multi-Scale Object Detection in Satellite Imagery. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhu, K., Chen, G., Tan, X., Zhang, L., Dai, F., Liao, P., and Gong, Y. (2019). Geospatial object detection on high resolution remote sensing imagery based on double multi-scale feature pyramid network. Remote Sens., 11.","DOI":"10.3390\/rs11070755"},{"key":"ref_48","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv."},{"key":"ref_49","unstructured":"Fu, C.Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A.C. (2017). DSSD: Deconvolutional Single Shot Detector. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Li, L., Zhang, S.B., and Wu, J. (2018, January 27\u201329). An efficient hardware architecture for activation function in deep learning processor. Proceedings of the 2018 3rd IEEE International Conference on Image, Vision and Computing, Chongqing, China.","DOI":"10.1109\/ICIVC.2018.8492754"},{"key":"ref_51","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). 
Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014, January 3\u20137). Caffe: Convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2654889"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The PASCAL visual object classes (VOC) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"IJCV"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Qiu, J., Wang, J., Yao, S., Guo, K., Li, B., Zhou, E., Yu, J., Tang, T., Xu, N., and Song, S. (2016, January 21\u201323). Going deeper with embedded FPGA platform for convolutional neural network. Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/2847263.2847265"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Ma, Y., Cao, Y., Vrudhula, S., and Seo, J.S. (2017, January 22\u201324). Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks. Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/3020078.3021736"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Liu, Z., Chow, P., Xu, J., Jiang, J., Dou, Y., and Zhou, J. (2019). A uniform architecture design for accelerating 2D and 3D CNNs on FPGAs. Electronics, 8.","DOI":"10.3390\/electronics8010065"},{"key":"ref_57","unstructured":"Courbariaux, M., Bengio, Y., and David, J.P. (2015, January 7\u201312). 
Binaryconnect: Training deep neural networks with binary weights during propagations. Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, Canada."},{"key":"ref_58","first-page":"1","article-title":"Quantized neural networks: Training neural networks with low precision weights and activations","volume":"18","author":"Hubara","year":"2018","journal-title":"J. Mach. Learn. Res."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., and Kalenichenko, D. (2018, January 18\u201323). Quantization and training of neural networks for efficient integer-arithmetic-only inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00286"},{"key":"ref_60","unstructured":"(2018, December 12). Jetson AGX Xavier. Available online: https:\/\/developer.nvidia.com\/embedded\/jetson-agx-xavier."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/20\/2376\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:25:55Z","timestamp":1760189155000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/20\/2376"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,13]]},"references-count":60,"journal-issue":{"issue":"20","published-online":{"date-parts":[[2019,10]]}},"alternative-id":["rs11202376"],"URL":"https:\/\/doi.org\/10.3390\/rs11202376","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,13]]}}}