{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T14:58:47Z","timestamp":1769266727000,"version":"3.49.0"},"reference-count":30,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2021,7,31]],"date-time":"2021-07-31T00:00:00Z","timestamp":1627689600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Defense Industrial Technology Development Program","award":["JCKY2019602C015"],"award-info":[{"award-number":["JCKY2019602C015"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Despite the breakthroughs in accuracy and efficiency of object detection using deep neural networks, the performance of small object detection is far from satisfactory. Gaze estimation has developed significantly due to the development of visual sensors. Combining object detection with gaze estimation can significantly improve the performance of small object detection. This paper presents a centered multi-task generative adversarial network (CMTGAN), which combines small object detection and gaze estimation. To achieve this, we propose a generative adversarial network (GAN) capable of image super-resolution and two-stage small object detection. We exploit a generator in CMTGAN for image super-resolution and a discriminator for object detection. We introduce an artificial texture loss into the generator to retain the original feature of small objects. We also use a centered mask in the generator to make the network focus on the central part of images where small objects are more likely to appear in our method. We propose a discriminator with detection loss for two-stage small object detection, which can be adapted to other GANs for object detection. Compared with existing interpolation methods, the super-resolution images generated by CMTGAN are more explicit and contain more information. Experiments show that our method exhibits a better detection performance than mainstream methods.<\/jats:p>","DOI":"10.3390\/s21155194","type":"journal-article","created":{"date-parts":[[2021,8,1]],"date-time":"2021-08-01T21:44:32Z","timestamp":1627854272000},"page":"5194","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Centered Multi-Task Generative Adversarial Network for Small Object Detection"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4260-8642","authenticated-orcid":false,"given":"Hongfeng","family":"Wang","sequence":"first","affiliation":[{"name":"School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianzhong","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kemeng","family":"Bai","sequence":"additional","affiliation":[{"name":"School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yong","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,7,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Fischer, T., Chang, H.J., and Demiris, Y. (2018, January 8\u201314). Rt-gene: Real-time eye gaze estimation in natural environments. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_21"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Jaques, N., Conati, C., Harley, J.M., and Azevedo, R. (2014). Predicting affect from gaze data during interaction with an intelligent tutoring system. International Conference on Intelligent Tutoring Systems, Springer.","DOI":"10.1007\/978-3-319-07221-0_4"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"558","DOI":"10.1109\/ACCESS.2016.2520093","article-title":"A novel eye-gaze-controlled wheelchair system for navigating unknown environments: Case study with a person with ALS","volume":"4","author":"Eid","year":"2016","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1007\/s11257-017-9192-3","article-title":"Adaptive user modelling in car racing games using behavioural and physiological data","volume":"27","author":"Georgiou","year":"2017","journal-title":"User Model. User Adapt. Interact."},{"key":"ref_5","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_6","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016). Ssd: Single shot multibox detector. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Bai, Y., Zhang, Y., Ding, M., and Ghanem, B. (2018, January 8\u201314). Finding tiny faces in the wild with generative adversarial network. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1109\/CVPR.2018.00010"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Bai, Y., Zhang, Y., Ding, M., and Ghanem, B. (2018, January 8\u201314). Sod-mtgan: Small object detection via multi-task generative adversarial network. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_13"},{"key":"ref_11","first-page":"558","article-title":"A Self-Labeling Feature Matching Algorithm for Instance Recognition on Multi-Sensor Images","volume":"41","author":"Zhang","year":"2021","journal-title":"Trans. Beijing Inst. Technol."},{"key":"ref_12","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Liu, H., Fan, K., Ouyang, Q., and Li, N. (2021). Real-Time Small Drones Detection Based on Pruned YOLOv4. Sensors, 21.","DOI":"10.3390\/s21103374"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Xiang, X., Tian, Y., Zhang, Y., Fu, Y., Allebach, J.P., and Xu, C. (2020, January 14\u201319). Zooming slow-mo: Fast and accurate one-stage space-time video super-resolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00343"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Su, R., Zhong, B., Ji, J., and Ma, K.K. (2020, January 25\u201328). Single Image Super-Resolution Via A Progressive Mixture Model. Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/ICIP40778.2020.9190772"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1109\/MSP.2017.2765202","article-title":"Generative adversarial networks: An overview","volume":"35","author":"Creswell","year":"2018","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ledig, C., Theis, L., Husz\u00e1r, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., and Wang, Z. (2017, January 21\u201326). Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.19"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., and Change Loy, C. (2018, January 8\u201314). ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-11021-5_5"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Feng, H., Guo, J., Xu, H., and Ge, S.S. (2021). SharpGAN: Dynamic Scene Deblurring Method for Smart Ship Based on Receptive Field Block and Generative Adversarial Networks. Sensors, 21.","DOI":"10.3390\/s21113641"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Marnerides, D., Bashford-Rogers, T., and Debattista, K. (2021). Deep HDR Hallucination for Inverse Tone Mapping. Sensors, 21.","DOI":"10.3390\/s21124032"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Isola, P., Zhu, J.Y., Zhou, T., and Efros, A.A. (2017, January 21\u201326). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.632"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhu, J.Y., Park, T., Isola, P., and Efros, A.A. (2017, January 21\u201326). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/ICCV.2017.244"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., and Yan, S. (2017, January 21\u201326). Perceptual generative adversarial networks for small object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.211"},{"key":"ref_24","first-page":"295","article-title":"Double-Channel GAN with Multi-Level Semantic Correlation for Event Detection","volume":"41","author":"Pan","year":"2021","journal-title":"Trans. Beijing Inst. Technol."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Truong, N.Q., Lee, Y.W., Owais, M., Nguyen, D.T., Batchuluun, G., Pham, T.D., and Park, K.R. (2020). SlimDeblurGAN-based motion deblurring and marker detection for autonomous drone landing. Sensors, 20.","DOI":"10.3390\/s20143918"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1109\/TPAMI.2015.2439281","article-title":"Image super-resolution using deep convolutional networks","volume":"38","author":"Dong","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Shi, W., Caballero, J., Husz\u00e1r, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., and Wang, Z. (2016, January 27\u201330). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.207"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Dong, Z., Xu, K., Yang, Y., Bao, H., Xu, W., and Lau, R.W. (2020). Location-aware Single Image Reflection Removal. arXiv.","DOI":"10.1109\/ICCV48922.2021.00497"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ma, C., Rao, Y., Cheng, Y., Chen, C., Lu, J., and Zhou, J. (2020, January 14\u201319). Structure-preserving super resolution with gradient guidance. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00779"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/15\/5194\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:38:03Z","timestamp":1760164683000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/15\/5194"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,31]]},"references-count":30,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2021,8]]}},"alternative-id":["s21155194"],"URL":"https:\/\/doi.org\/10.3390\/s21155194","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,31]]}}}