{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T16:17:19Z","timestamp":1761581839669,"version":"build-2065373602"},"reference-count":46,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,6,10]],"date-time":"2019-06-10T00:00:00Z","timestamp":1560124800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61775139","61332009"],"award-info":[{"award-number":["61775139","61332009"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["2017M610230"],"award-info":[{"award-number":["2017M610230"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>In this paper, we propose a semantic segmentation method based on superpixel region merging and convolutional neural network (CNN), referred to as regional merging neural network (RMNN). Image annotation has always been an important role in weakly-supervised semantic segmentation. Most methods use manual labeling. In this paper, super-pixels with similar features are combined using the relationship between each pixel after super-pixel segmentation to form a plurality of super-pixel blocks. Rough predictions are generated by the fully convolutional networks (FCN) so that certain super-pixel blocks will be labeled. We perceive and find other positive areas in an iterative way through the marked areas. This reduces the feature extraction vector and reduces the data dimension due to super-pixels. 
The algorithm not only uses superpixel merging to narrow down the target\u2019s range but also compensates for the lack of weakly-supervised semantic segmentation at the pixel level. In the training of the network, we use the method of region merging to improve the accuracy of contour recognition. Our extensive experiments demonstrated the effectiveness of the proposed method with the PASCAL VOC 2012 dataset. In particular, evaluation results show that the mean intersection over union (mIoU) score of our method reaches as high as 44.6%. Because the cavity convolution is in the pooled downsampling operation, it does not degrade the network\u2019s receptive field, thereby ensuring the accuracy of image semantic segmentation. The findings of this work thus open the door to leveraging the dilated convolution to improve the recognition accuracy of small objects.<\/jats:p>","DOI":"10.3390\/bdcc3020031","type":"journal-article","created":{"date-parts":[[2019,6,10]],"date-time":"2019-06-10T11:39:47Z","timestamp":1560166787000},"page":"31","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Weakly-Supervised Image Semantic Segmentation Based on Superpixel Region Merging"],"prefix":"10.3390","volume":"3","author":[{"given":"Quanchun","family":"Jiang","sequence":"first","affiliation":[{"name":"Shanghai Key Lab of Modern Optical Systems, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9114-5974","authenticated-orcid":false,"given":"Olamide Timothy","family":"Tawose","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Songwen","family":"Pei","sequence":"additional","affiliation":[{"name":"Shanghai Key Lab of Modern Optical Systems, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaodong","family":"Chen","sequence":"additional","affiliation":[{"name":"Information Science and Technology Research, Shanghai Advanced Research Institute, Chinese Academy of Sciences, No. 99 Haike Rd., Zhangjiang, Pudong, Shanghai 201210, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Linhua","family":"Jiang","sequence":"additional","affiliation":[{"name":"Shanghai Key Lab of Modern Optical Systems, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiayao","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Key Lab of Modern Optical Systems, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dongfang","family":"Zhao","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557, USA"},{"name":"Department of Computer Science, University of California, Davis, CA 95661, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,6,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"42210","DOI":"10.1109\/ACCESS.2019.2904620","article-title":"An Intrusion Detection Model Based on Feature Reduction and Convolutional Neural Networks","volume":"7","author":"Xiao","year":"2019","journal-title":"IEEE 
Access"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"834","DOI":"10.20965\/jaciii.2017.p0834","article-title":"Pedestrian Detection Algorithm Based on Improved Convolutional Neural Network","volume":"21","author":"Qin","year":"2013","journal-title":"J. Adv. Comput. Intell. Intell. Inform."},{"key":"ref_3","first-page":"121","article-title":"Article Users Activity Gesture Recognition on Kinect Sensor Using Convolutional Neural Networks and FastDTW for Controlling Movements of a Mobile Robot","volume":"22","author":"Pfitscher","year":"2019","journal-title":"Intell. Artif."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Abdelwahab, M.A. (2019, January 2\u20134). Accurate Vehicle Counting Approach Based on Deep Neural Networks. Proceedings of the 2019 International Conference on Innovative Trends in Computer Engineering (ITCE), Aswan, Egypt.","DOI":"10.1109\/ITCE.2019.8646549"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully Convolutional Networks for Semantic Segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1109\/TPAMI.2016.2578328","article-title":"Object Instance Segmentation and Fine-Grained Localization Using Hypercolumns","volume":"39","author":"Hariharan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1007\/978-3-319-10584-0_20","article-title":"Simultaneous Detection and Segmentation","volume":"8695","author":"Hariharan","year":"2014","journal-title":"Lect. Notes Comput. 
Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Mostajabi, M., Yadollahpour, P., and Shakhnarovich, G. (2015, January 7\u201312). Feedforward semantic segmentation with zoom-out features. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298959"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1915","DOI":"10.1109\/TPAMI.2012.231","article-title":"Learning Hierarchical Features for Scene Labeling","volume":"35","author":"Farabet","year":"2013","journal-title":"IEEE Trans. Pattern Anal."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","unstructured":"Papandreou, G., Chen, L.-C., Murphy, K., and Yuille, A.L. (2015). Weakly-and semi-supervised learning of a dcnn for semantic image segmentation. arXiv."},{"key":"ref_13","first-page":"109","article-title":"Deep convolutional networks for scene parsing","volume":"3","author":"Grangier","year":"2009","journal-title":"ICML Deep Learn. Workshop"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lin, D., Dai, J.F., Jia, J.Y., He, K.M., and Sun, J. (July, January 26). ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.344"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Dai, J.F., He, K.M., and Sun, J. (2015, January 7\u201313). BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Washington, DC, USA.","DOI":"10.1109\/ICCV.2015.191"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2983","DOI":"10.1016\/j.patcog.2015.04.019","article-title":"CRF learning with CNN features for image segmentation","volume":"48","author":"Liu","year":"2015","journal-title":"Pattern Recogn."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1007\/s11263-007-0109-1","article-title":"TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context","volume":"81","author":"Shotton","year":"2009","journal-title":"Int. J. Comput. Vis."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1007\/s11263-008-0202-0","article-title":"Robust Higher Order Potentials for Enforcing Label Consistency","volume":"82","author":"Kohli","year":"2009","journal-title":"Int. J. Comput. Vis."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Fulkerson, B., Vedaldi, A., and Soatto, S. (October, January 27). Class Segmentation and Object Localization with Superpixel Neighborhoods. Proceedings of the 2009 IEEE 12th International Conference on Computer Vision (ICCV), Kyoto, Japan.","DOI":"10.1109\/ICCV.2009.5459175"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A. (2011, January 20\u201325). Real-Time Human Pose Recognition in Parts from Single Depth Images. 
Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995316"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Shotton, J., Johnson, M., and Cipolla, R. (2008, January 24\u201326). Semantic texton forests for image categorization and segmentation. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587503"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, X., Yan, S., Luo, J., Tang, J., Huang, Z., and Jin, H. (2010, January 13\u201318). Nonparametric Label-to-Region by search. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540033"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1109\/TMM.2011.2174780","article-title":"Weakly Supervised Graph Propagation Towards Collective Image Parsing","volume":"14","author":"Liu","year":"2012","journal-title":"IEEE Trans. Multimed."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Aminpour, A., and Razzaghi, P. (2018, January 8\u201310). Weakly Supervised Semantic Segmentation Using Hierarchical Multi-Image Model. Proceedings of the 2018 26th Iranian Conference on Electrical Engineering (ICEE), Mashhad, Iran.","DOI":"10.1109\/ICEE.2018.8472438"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"15297","DOI":"10.1109\/ACCESS.2018.2814568","article-title":"Improving Semantic Image Segmentation with a Probabilistic Superpixel-Based Dense Conditional Random Field","volume":"6","author":"Zhang","year":"2018","journal-title":"IEEE Access"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowl. 
Data Eng."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Oquab, M., Bottou, L., Laptev, I., and Sivic, J. (2014, January 23\u201328). Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks. Proceedings of the Computer Vision & Pattern Recognition 2014, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.222"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2270","DOI":"10.1109\/TNSRE.2017.2748388","article-title":"Seizure Classification from EEG Signals using Transfer Learning, Semi-Supervised Learning and TSK Fuzzy System","volume":"25","author":"Jiang","year":"2017","journal-title":"IEEE Trans. Neural Syst. Rehabil. Eng."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Seker, A. (2018, January 28\u201330). Evaluation of Fabric Defect Detection Based on Transfer Learning with Pre-trained AlexNet. Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey.","DOI":"10.1109\/IDAP.2018.8620888"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","article-title":"The PASCAL Visual Object Classes Challenge: A Retrospective","volume":"111","author":"Everingham","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhao, W., Zhang, H., Yan, Y., Fu, Y., and Wang, H. (2018). A Semantic Segmentation Algorithm Using FCN with Combination of BSLIC. Appl. Sci., 8.","DOI":"10.3390\/app8040500"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2274","DOI":"10.1109\/TPAMI.2012.120","article-title":"SLIC Superpixels Compared to State-of-the-Art Superpixel Methods","volume":"34","author":"Achanta","year":"2012","journal-title":"IEEE Trans. Pattern Anal."},{"key":"ref_33","unstructured":"Wang, S., Lu, H.C., Yang, F., and Yang, M.H. (2011, January 6\u201313). Superpixel Tracking. 
Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Pinheiro, P.O., and Collobert, R. (2015, January 7\u201312). From Image-level to Pixel-level Labeling with Convolutional Networks. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298780"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Pathak, D., Krahenbuhl, P., and Darrell, T. (2015, January 7\u201313). Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.209"},{"key":"ref_36","first-page":"1502","article-title":"A Simple Algorithm of Superpixel Segmentation with Boundary Constraint","volume":"27","author":"Zhang","year":"2017","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1684","DOI":"10.1109\/83.730380","article-title":"Hybrid image segmentation using watersheds and fast region merging","volume":"7","author":"Haris","year":"1998","journal-title":"IEEE Trans. Image Process."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1471","DOI":"10.1080\/01431160903475308","article-title":"Segmentation of multispectral high-resolution satellite imagery based on integrated feature distributions","volume":"31","author":"Wang","year":"2010","journal-title":"Int. J. Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1109\/LGRS.2012.2194693","article-title":"A Spatially-Constrained Color-Texture Model for Hierarchical VHR Image Segmentation","volume":"10","author":"Hu","year":"2013","journal-title":"IEEE Geosci. 
Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1109\/TPAMI.2013.107","article-title":"Entropy-Rate Clustering: Cluster Analysis via Maximizing a Submodular Function Subject to a Matroid Constraint","volume":"36","author":"Liu","year":"2014","journal-title":"IEEE Trans. Pattern Anal."},{"key":"ref_43","unstructured":"Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017). A review on deep learning techniques applied to semantic segmentation. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1111\/j.1469-8137.1912.tb05611.x","article-title":"The Distribution of Flora in the Alpine Zone","volume":"11","author":"Jaccard","year":"2010","journal-title":"New Phytol."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Mottaghi, R., Chen, X., Liu, X., Cho, N.G., Lee, S.W., Fidler, S., Urtasun, R., and Yuille, A. (2014, January 23\u201328). The role of context for object detection and semantic segmentation in the wild. 
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.119"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Mehta, P., Dorkenwald, S., Zhao, D., Kaftan, T., Cheung, A., Balazinska, M., and AlSayyad, Y. (2017, January 10\u201311). Comparative evaluation of big-data systems on scientific image analytics workloads. Proceedings of the 2017 VLDB Endow, Washington, DC, USA.","DOI":"10.14778\/3137628.3137634"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/3\/2\/31\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:57:14Z","timestamp":1760187434000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/3\/2\/31"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,10]]},"references-count":46,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["bdcc3020031"],"URL":"https:\/\/doi.org\/10.3390\/bdcc3020031","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2019,6,10]]}}}