{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T10:29:07Z","timestamp":1778495347704,"version":"3.51.4"},"reference-count":39,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2021,9,24]],"date-time":"2021-09-24T00:00:00Z","timestamp":1632441600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["71774134"],"award-info":[{"award-number":["71774134"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1811462"],"award-info":[{"award-number":["U1811462"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Southwest Minzu University Research Startup Funds","award":["RQD2021061"],"award-info":[{"award-number":["RQD2021061"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>With the continuous development of artificial intelligence, embedding object detection algorithms into autonomous underwater detectors for marine garbage cleanup has become an emerging application area. Considering the complexity of the marine environment and the low resolution of the images taken by underwater detectors, this paper proposes an improved algorithm based on Mask R-CNN, with the aim of achieving high accuracy marine garbage detection and instance segmentation. First, the idea of dilated convolution is introduced in the Feature Pyramid Network to enhance feature extraction ability for small objects. Secondly, the spatial-channel attention mechanism is used to make features learn adaptively. It can effectively focus attention on detection objects. Third, the re-scoring branch is added to improve the accuracy of instance segmentation by scoring the predicted masks based on the method of Generalized Intersection over Union. Finally, we train the proposed algorithm in this paper on the Transcan dataset, evaluating its effectiveness by various metrics and comparing it with existing algorithms. The experimental results show that compared to the baseline provided by the Transcan dataset, the algorithm in this paper improves the mAP indexes on the two tasks of garbage detection and instance segmentation by 9.6 and 5.0, respectively, which significantly improves the algorithm performance. Thus, it can be better applied in the marine environment and achieve high precision object detection and instance segmentation.<\/jats:p>","DOI":"10.3390\/s21196391","type":"journal-article","created":{"date-parts":[[2021,9,27]],"date-time":"2021-09-27T22:16:38Z","timestamp":1632780998000},"page":"6391","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":48,"title":["An Embeddable Algorithm for Automatic Garbage Detection Based on Complex Marine Environment"],"prefix":"10.3390","volume":"21","author":[{"given":"Hongjie","family":"Deng","sequence":"first","affiliation":[{"name":"Key Laboratory of Electronic and Information Engineering, Southwest Minzu University, State Ethnic Affairs Commission, Chengdu 610041, China"}]},{"given":"Daji","family":"Ergu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Electronic and Information Engineering, Southwest Minzu University, State Ethnic Affairs Commission, Chengdu 610041, China"}]},{"given":"Fangyao","family":"Liu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Electronic and Information Engineering, Southwest Minzu University, State Ethnic Affairs Commission, Chengdu 610041, China"}]},{"given":"Bo","family":"Ma","sequence":"additional","affiliation":[{"name":"Key Laboratory of Electronic and Information Engineering, Southwest Minzu University, State Ethnic Affairs Commission, Chengdu 610041, China"}]},{"given":"Ying","family":"Cai","sequence":"additional","affiliation":[{"name":"Key Laboratory of Electronic and Information Engineering, Southwest Minzu University, State Ethnic Affairs Commission, Chengdu 610041, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Madricardo, F., Ghezzo, M., Nesto, N., Mc Kiver, W.J., Faussone, G.C., Fiorin, R., Riccato, F., Mackelworth, P.C., Basta, J., and De Pascalis, F. (2020). How to Deal with Seafloor Marine Litter: An Overview of the State-of-the-Art and Future Perspectives. Front. Mar. Sci., 7.","DOI":"10.3389\/fmars.2020.505134"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"116088","DOI":"10.1016\/j.image.2020.116088","article-title":"Underwater image processing and analysis: A review","volume":"91","author":"Jian","year":"2020","journal-title":"Signal Process. Image Commun."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 8\u201316). SSD: Single shot multi-box detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_6","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_7","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 24\u201327). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective Search for Object Recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_16","unstructured":"Yu, F., and Koltun, V. (2015). Multi-Scale Context Aggregation by Dilated Convolutions. arXiv."},{"key":"ref_17","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention Is All You Need. Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S. (2019, January 16\u201320). Generalized Intersection Over union: A metric and a Loss for Bounding Box Regression. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00075"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Valdenegro-Toro, M. (2016, January 18\u201320). Submerged marine debris detection with autonomous underwater vehicles. Proceedings of the 2016 International Conference on Robotics and Automation for Humanitarian Applications (RAHA), Kollam, India.","DOI":"10.1109\/RAHA.2016.7931907"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"17091","DOI":"10.1007\/s11356-019-05148-4","article-title":"Identifying floating plastic marine debris using a deep learning approach","volume":"26","author":"Kylili","year":"2019","journal-title":"Environ. Sci. Pollut. Res."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Tharani, M., Wahab Amin, A., Maaz, M., and Taj, M. (2020). Attention Neural Network for Trash Detection on Water Channels. arXiv.","DOI":"10.1007\/978-3-030-92185-9_31"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fulton, M., Hong, J., Jahidul Islam, M., and Sattar, J. (2018). Robotic Detection of Marine Litter Using Deep Visual Detection Models. arXiv.","DOI":"10.1109\/ICRA.2019.8793975"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1117\/1.JRS.13.024511","article-title":"Underwater and airborne monitoring of marine ecosystems and debris","volume":"13","author":"Yang","year":"2019","journal-title":"J. Appl. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"103234","DOI":"10.1016\/j.engappai.2019.09.003","article-title":"Complex object detection using deep proposal mechanism","volume":"87","author":"Tan","year":"2020","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_25","unstructured":"Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1016\/j.isprsjprs.2020.12.015","article-title":"PBNet: Part-based convolutional neural network for complex composite object detection in remote sensing imagery","volume":"173","author":"Sun","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3311","DOI":"10.1007\/s10489-020-01949-0","article-title":"Mask-guided SSD for small-object detection","volume":"51","author":"Sun","year":"2021","journal-title":"Appl. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3897","DOI":"10.1109\/TIP.2021.3065822","article-title":"Regularized Densely-Connected Pyramid Network for Salient Instance Segmentation","volume":"30","author":"Wu","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1016\/j.biosystemseng.2020.03.008","article-title":"Instance segmentation of apple flowers using the improved mask R\u2013CNN model","volume":"193","author":"Tian","year":"2020","journal-title":"Biosyst. Eng."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"He, K.M., Zhang, X.Y., Ren, S.Q., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Xie, S.N., Girshick, R., Dollar, P., Tu, Z.W., and He, K.M. (2017, January 21\u201326). Aggregated Residual Transformations for Deep Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_32","unstructured":"Hong, J., Fulton, M., and Sattar, J. (2020). TrashCan: A Semantically-Segmented Dataset towards Visual Detection of Marine Debris. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, T., Goyal, P., Girshick, R., He, K.M., and Dollar, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (2019, January 27\u201328). FCOS: Fully convolutional one-stage object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Wang, X., Kong, T., Shen, C., Jiang, Y., and Li, L. (2019). SOLO: Segmenting Objects by Locations. arXiv.","DOI":"10.1007\/978-3-030-58523-5_38"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., and Chen, H. (2020, January 23\u201328). Conditional convolutions for instance segmentation. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_17"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"133581","DOI":"10.1016\/j.scitotenv.2019.133581","article-title":"Anthropogenic Marine Debris assessment with Unmanned Aerial Vehicle imagery and deep learning: A case study along the beaches of the Republic of Maldives","volume":"693","author":"Fallati","year":"2019","journal-title":"Sci. Total Environ."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"111127","DOI":"10.1016\/j.marpolbul.2020.111127","article-title":"Estimation of plastic marine debris volumes on beaches using unmanned aerial vehicles and image processing based on deep learning","volume":"155","author":"Kako","year":"2020","journal-title":"Mar. Pollut. Bull."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"111974","DOI":"10.1016\/j.marpolbul.2021.111974","article-title":"Automatic detection of seafloor marine litter using towed camera images and deep learning","volume":"164","author":"Politikos","year":"2021","journal-title":"Mar. Pollut. Bull."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/19\/6391\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:04:42Z","timestamp":1760166282000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/19\/6391"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,24]]},"references-count":39,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["s21196391"],"URL":"https:\/\/doi.org\/10.3390\/s21196391","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,24]]}}}