{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:59:14Z","timestamp":1760147954216,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2023,3,14]],"date-time":"2023-03-14T00:00:00Z","timestamp":1678752000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Structural and Investment","award":["POCI-01-0247-FEDER-047264","2020.06434.BD"],"award-info":[{"award-number":["POCI-01-0247-FEDER-047264","2020.06434.BD"]}]},{"name":"Portuguese funding agency","award":["POCI-01-0247-FEDER-047264","2020.06434.BD"],"award-info":[{"award-number":["POCI-01-0247-FEDER-047264","2020.06434.BD"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Semantic segmentation consists of classifying each pixel according to a set of classes. Conventional models spend as much effort classifying easy-to-segment pixels as they do classifying hard-to-segment pixels. This is inefficient, especially when deploying to situations with computational constraints. In this work, we propose a framework wherein the model first produces a rough segmentation of the image, and then patches of the image estimated as hard to segment are refined. The framework is evaluated in four datasets (autonomous driving and biomedical), across four state-of-the-art architectures. 
Our method accelerates inference by a factor of four, with additional gains in training time, at the cost of some output quality.<\/jats:p>","DOI":"10.3390\/s23063092","type":"journal-article","created":{"date-parts":[[2023,3,14]],"date-time":"2023-03-14T03:04:46Z","timestamp":1678763086000},"page":"3092","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Two-Stage Framework for Faster Semantic Segmentation"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5189-6228","authenticated-orcid":false,"given":"Ricardo","family":"Cruz","sequence":"first","affiliation":[{"name":"Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal"},{"name":"INESC TEC\u2014Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal"}]},{"given":"Diana Teixeira e","family":"Silva","sequence":"additional","affiliation":[{"name":"Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal"},{"name":"INESC TEC\u2014Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4744-9174","authenticated-orcid":false,"given":"Tiago","family":"Gon\u00e7alves","sequence":"additional","affiliation":[{"name":"Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal"},{"name":"INESC TEC\u2014Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal"}]},{"given":"Diogo","family":"Carneiro","sequence":"additional","affiliation":[{"name":"Bosch Car Multimedia, 4705-820 Braga, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3760-2473","authenticated-orcid":false,"given":"Jaime S.","family":"Cardoso","sequence":"additional","affiliation":[{"name":"Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal"},{"name":"INESC TEC\u2014Institute for Systems and Computer Engineering, Technology and 
Science, 4200-465 Porto, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"SegNet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_4","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wang, C., Zhao, Z., Ren, Q., Xu, Y., and Yu, Y. (2019). Dense U-Net based on patch-based learning for retinal vessel segmentation. Entropy, 21.","DOI":"10.3390\/e21020168"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kondaveeti, H.K., Bandi, D., Mathe, S.E., Vappangi, S., and Subramanian, M. (2022, January 25\u201326). A review of image processing applications based on Raspberry-Pi. 
Proceedings of the 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India.","DOI":"10.1109\/ICACCS54159.2022.9784958"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Fernandes, K., Cruz, R., and Cardoso, J.S. (2018, January 8\u201313). Deep image segmentation by quality inference. Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil.","DOI":"10.1109\/IJCNN.2018.8489696"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kim, J.U., Kim, H.G., and Ro, Y.M. (2017, January 11\u201315). Iterative deep convolutional encoder-decoder network for medical image segmentation. Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju Island, Republic of Korea.","DOI":"10.1109\/EMBC.2017.8036917"},{"key":"ref_9","unstructured":"Wang, W., Yu, K., Hugonot, J., Fua, P., and Salzmann, M. (November, January 27). Recurrent U-Net for resource-constrained segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_10","unstructured":"Banino, A., Balaguer, J., and Blundell, C. (2021, January 23\u201324). PonderNet: Learning to Ponder. Proceedings of the 8th ICML Workshop on Automated Machine Learning (AutoML), Virtual."},{"key":"ref_11","unstructured":"Silva, D.T., Cruz, R., Gon\u00e7alves, T., and Carneiro, D. (2022, January 18\u201320). Two-stage Semantic Segmentation in Neural Networks. Proceedings of the Fifteenth International Conference on Machine Vision (ICMV 2022), Rome, Italy."},{"key":"ref_12","unstructured":"Google AI Blog (2023, February 12). Accurate Alpha Matting for Portrait Mode Selfies on Pixel 6. 
Available online: https:\/\/ai.googleblog.com\/2022\/01\/accurate-alpha-matting-for-portrait.html."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Miangoleh, S.M.H., Dille, S., Mai, L., Paris, S., and Aksoy, Y. (2021, January 20\u201325). Boosting monocular depth estimation models to high-resolution via content-adaptive multi-resolution merging. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00956"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yu, Q., Wang, H., Kim, D., Qiao, S., Collins, M., Zhu, Y., Adam, H., Yuille, A., and Chen, L.C. (2022, January 18\u201324). CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00259"},{"key":"ref_15","unstructured":"Mnih, V., Heess, N., and Graves, A. (2014, January 8\u201313). Recurrent models of visual attention. Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_16","unstructured":"Ba, J., Mnih, V., and Kavukcuoglu, K. (2015, January 7\u20139). Multiple object recognition with visual attention. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., and Darrell, T. (2018, January 13\u201319). BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. 
Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00271"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1007\/s11263-018-1070-x","article-title":"Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes","volume":"126","author":"Alhaija","year":"2018","journal-title":"Int. J. Comput. Vis. (IJCV)"},{"key":"ref_20","unstructured":"Kaggle (2023, February 12). 2018 Data Science Bowl. Available online: https:\/\/www.kaggle.com\/c\/data-science-bowl-2018."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Mendon\u00e7a, T., Ferreira, P.M., Marques, J.S., Marcal, A.R., and Rozeira, J. (2013, January 3\u20137). PH2\u2013A dermoscopic image database for research and benchmarking. Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan.","DOI":"10.1109\/EMBC.2013.6610779"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis. (IJCV)"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Marcel, S., and Rodriguez, Y. (2010, January 25\u201329). Torchvision the machine-vision package of torch. Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy.","DOI":"10.1145\/1873951.1874254"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/6\/3092\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:54:26Z","timestamp":1760122466000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/6\/3092"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,14]]},"references-count":24,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["s23063092"],"URL":"https:\/\/doi.org\/10.3390\/s23063092","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,3,14]]}}}