{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T15:53:24Z","timestamp":1776441204766,"version":"3.51.2"},"reference-count":51,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2021,11,21]],"date-time":"2021-11-21T00:00:00Z","timestamp":1637452800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61702323"],"award-info":[{"award-number":["61702323"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61972240"],"award-info":[{"award-number":["61972240"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Key Laboratory of Sustainable Exploitation of Oceanic Fisheries Resources Ministry of Education","award":["A1-2006-00-301104"],"award-info":[{"award-number":["A1-2006-00-301104"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>A challenging and attractive task in computer vision is underwater object detection. Although object detection techniques have achieved good performance in general datasets, problems of low visibility and color bias in the complex underwater environment have led to generally poor image quality; besides this, problems with small targets and target aggregation have led to less extractable information, which makes it difficult to achieve satisfactory results. 
Past deep-learning research on underwater object detection has mostly focused on improving detection accuracy with large networks, while lightweight object detection for marine environments has rarely received attention, resulting in large models and slow detection. Object detection in marine environments therefore needs better real-time and lightweight performance. To address this, a lightweight underwater object detection method based on MobileNet v2, the You Only Look Once (YOLO) v4 algorithm, and attentional feature fusion is proposed, aiming to balance accuracy and speed for target detection in marine environments. In our work, MobileNet v2 is combined with depth-wise separable convolution to reduce the number of model parameters and the model size. The Modified Attentional Feature Fusion (AFFM) module aims to better fuse features with inconsistent semantics and scales, improving accuracy. Experiments indicate that the proposed method obtained a mean average precision (mAP) of 81.67% on the PASCAL VOC dataset and 92.65% on the Brackish dataset, and reached a processing speed of 44.22 frames per second (FPS) on the Brackish dataset. 
Moreover, the number of model parameters and the model size were compressed to 16.76% and 19.53% of YOLO v4, respectively, which achieved a good tradeoff between time and accuracy for underwater object detection.<\/jats:p>","DOI":"10.3390\/rs13224706","type":"journal-article","created":{"date-parts":[[2021,11,21]],"date-time":"2021-11-21T21:00:50Z","timestamp":1637528450000},"page":"4706","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":190,"title":["Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion"],"prefix":"10.3390","volume":"13","author":[{"given":"Minghua","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Information Technology, Shanghai Ocean University, Shanghai 201306, China"}]},{"given":"Shubo","family":"Xu","sequence":"additional","affiliation":[{"name":"College of Information Technology, Shanghai Ocean University, Shanghai 201306, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0604-5563","authenticated-orcid":false,"given":"Wei","family":"Song","sequence":"additional","affiliation":[{"name":"College of Information Technology, Shanghai Ocean University, Shanghai 201306, China"}]},{"given":"Qi","family":"He","sequence":"additional","affiliation":[{"name":"College of Information Technology, Shanghai Ocean University, Shanghai 201306, China"}]},{"given":"Quanmiao","family":"Wei","sequence":"additional","affiliation":[{"name":"East China Sea Bureau, Ministry of Natural Resources, Shanghai 200137, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"2346","DOI":"10.1364\/AO.57.002346","article-title":"Active Underwater Detection with an Array of Atomic Magnetometers","volume":"57","author":"Deans","year":"2018","journal-title":"Appl. 
Opt."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Pydyn, A., Popek, M., Kubacka, M., and Janowski, \u0141. (2021). Exploration and Reconstruction of a Medieval Harbour Using Hydroacoustics, 3-D Shallow Seismic and Underwater Photogrammetry: A Case Study from Puck, Southern Baltic Sea. Archaeol. Prospect., 1\u201316.","DOI":"10.1002\/arp.1823"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1485","DOI":"10.1016\/j.scitotenv.2017.10.165","article-title":"Deep Sea Habitats in the Chemical Warfare Dumping Areas of the Baltic Sea","volume":"616","author":"Czub","year":"2018","journal-title":"Sci. Total Environ."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Leibe, B., Matas, J., Sebe, N., and Welling, M. SSD: Single Shot MultiBox Detector. Proceedings of the Computer Vision\u2014ECCV 2016.","DOI":"10.1007\/978-3-319-46478-7"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path Aggregation Network for Instance Segmentation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Dai, Y., Gieseke, F., Oehmcke, S., Wu, Y., and Barnard, K. (2021, January 5\u20139). Attentional Feature Fusion. 
Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00360"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, C.-Y., Mark Liao, H.-Y., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., and Yeh, I.-H. (2020, January 14\u201319). CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. 
Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018, January 18\u201323). MobileNetV2: Inverted Residuals and Linear Bottlenecks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_15","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_17","unstructured":"Pedersen, M., Haurum, J.B., Gade, R., Moeslund, T.B., and Madsen, N. (2019, January 15\u201320). Detection of Marine Animals in a New Underwater Dataset with Varying Visibility. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely Connected Convolutional Networks. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_20","unstructured":"Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., and Keutzer, K. (2016). SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhou, X., Lin, M., and Sun, J. (2018, January 18\u201323). ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00716"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Lin, W.-H., Zhong, J.-X., Liu, S., Li, T., and Li, G. (2020, January 4\u20138). ROIMIX: Proposal-Fusion Among Multiple Images for Underwater Object Detection. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053829"},{"key":"ref_24","unstructured":"Uplavikar, P., Wu, Z., and Wang, Z. (2019). All-In-One Underwater Image Enhancement Using Domain-Adversarial Learning. 
arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"897","DOI":"10.1016\/j.neucom.2017.09.044","article-title":"Transferring Deep Knowledge for Object Recognition in Low-Quality Underwater Videos","volume":"275","author":"Sun","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"3637","DOI":"10.1007\/s00521-020-05217-7","article-title":"Scale-Aware Feature Pyramid Architecture for Marine Object Detection","volume":"33","author":"Xu","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1007\/s11760-020-01818-w","article-title":"Multi-Scale ResNet for Real-Time Underwater Object Detection","volume":"15","author":"Pan","year":"2021","journal-title":"Signal Image Video Process."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1295","DOI":"10.1093\/icesjms\/fsz025","article-title":"Automatic Fish Detection in Underwater Videos by a Deep Neural Network-Based Hybrid Motion Learning System","volume":"77","author":"Salman","year":"2019","journal-title":"ICES J. Mar. Sci."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chen, L., Liu, Z., Tong, L., Jiang, Z., Wang, S., Dong, J., and Zhou, H. (2020, January 19\u201324). Underwater Object Detection Using Invert Multi-Class Adaboost with Deep Learning. Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK.","DOI":"10.1109\/IJCNN48605.2020.9207506"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"5476142","DOI":"10.1155\/2020\/5476142","article-title":"A Marine Object Detection Algorithm Based on SSD and Feature Enhancement","volume":"2020","author":"Hu","year":"2020","journal-title":"Complexity"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017, January 21\u201326). Xception: Deep Learning with Depthwise Separable Convolutions. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.195"},{"key":"ref_32","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv."},{"key":"ref_33","unstructured":"Wang, R.J., Li, X., and Ling, C.X. (2019). Pelee: A Real-Time Object Detection System on Mobile Devices. arXiv."},{"key":"ref_34","unstructured":"Li, Y., Li, J., Lin, W., and Li, J. (2018). Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages. arXiv."},{"key":"ref_35","unstructured":"Simonyan, K., and Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv."},{"key":"ref_36","unstructured":"Misra, D. (2020). Mish: A Self Regularized Non-Monotonic Activation Function. arXiv."},{"key":"ref_37","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv."},{"key":"ref_38","unstructured":"Kingma, D.P., and Ba, J. (2017). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_39","unstructured":"Loshchilov, I., and Hutter, F. (2017). SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1007\/978-3-319-10602-1_48","article-title":"Microsoft COCO: Common Objects in Context","volume":"Volume 8693","author":"Fleet","year":"2014","journal-title":"Computer Vision\u2014ECCV 2014"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y.M. (2021). Scaled-YOLOv4: Scaling Cross Stage Partial Network. arXiv.","DOI":"10.1109\/CVPR46437.2021.01283"},{"key":"ref_42","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016). R-FCN: Object Detection via Region-Based Fully Convolutional Networks. 
arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Singh, B., Li, H., Sharma, A., and Davis, L.S. (2018, January 18\u201323). R-FCN-3000 at 30fps: Decoupling Detection and Classification. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00119"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Kong, T., Sun, F., Yao, A., Liu, H., Lu, M., and Chen, Y. (2017, January 21\u201326). RON: Reverse Connection with Objectness Prior Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.557"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zhou, P., Ni, B., Geng, C., Hu, J., and Xu, Y. (2018, January 18\u201323). Scale-Transferrable Object Detection. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00062"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Shen, Z., Liu, Z., Li, J., Jiang, Y.-G., Chen, Y., and Xue, X. (2017, October 22\u201329). DSOD: Learning Deeply Supervised Object Detectors from Scratch. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.212"},{"key":"ref_47","unstructured":"Han, S., Mao, H., and Dally, W.J. (2016). Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv."},{"key":"ref_48","unstructured":"Molchanov, P., Tyree, S., Karras, T., Aila, T., and Kautz, J. (2017). Pruning Convolutional Neural Networks for Resource Efficient Inference. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"He, Y., Zhang, X., and Sun, J. (2017, October 22\u201329). Channel Pruning for Accelerating Very Deep Neural Networks. 
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.155"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Kisantal, M., Wojna, Z., Murawski, J., Naruniec, J., and Cho, K. (2019, January 21\u201322). Augmentation for Small Object Detection. Proceedings of the 9th International Conference on Advances in Computing and Information Technology (ACITY 2019), Sydney, Australia.","DOI":"10.5121\/csit.2019.91713"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., and Yan, S. (2017, January 21\u201326). Perceptual Generative Adversarial Networks for Small Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.211"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/22\/4706\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:33:36Z","timestamp":1760168016000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/22\/4706"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,21]]},"references-count":51,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["rs13224706"],"URL":"https:\/\/doi.org\/10.3390\/rs13224706","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,21]]}}}