{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T09:41:05Z","timestamp":1780047665304,"version":"3.53.1"},"reference-count":41,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2021,1,3]],"date-time":"2021-01-03T00:00:00Z","timestamp":1609632000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004744","name":"Innoviris","doi-asserted-by":"publisher","award":["research project DRIvINg"],"award-info":[{"award-number":["research project DRIvINg"]}],"id":[{"id":"10.13039\/501100004744","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The paper proposes a novel instance segmentation method for traffic videos devised for deployment on real-time embedded devices. A novel neural network architecture is proposed using a multi-resolution feature extraction backbone and improved network designs for the object detection and instance segmentation branches. A novel post-processing method is introduced to ensure a reduced rate of false detection by evaluating the quality of the output masks. An improved network training procedure is proposed based on a novel label assignment algorithm. An ablation study on speed-vs.-performance trade-off further modifies the two branches and replaces the conventional ResNet-based performance-oriented backbone with a lightweight speed-oriented design. The proposed architectural variations achieve real-time performance when deployed on embedded devices. The experimental results demonstrate that the proposed instance segmentation method for traffic videos outperforms the you only look at coefficients algorithm, the state-of-the-art real-time instance segmentation method. 
The proposed architecture achieves qualitative results with 31.57 average precision on the COCO dataset, while its speed-oriented variations achieve speeds of up to 66.25 frames per second on the Jetson AGX Xavier module.<\/jats:p>","DOI":"10.3390\/s21010275","type":"journal-article","created":{"date-parts":[[2021,1,3]],"date-time":"2021-01-03T19:54:46Z","timestamp":1609703686000},"page":"275","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Real-Time Instance Segmentation of Traffic Videos for Embedded Devices"],"prefix":"10.3390","volume":"21","author":[{"given":"Ruben","family":"Panero Martinez","sequence":"first","affiliation":[{"name":"Department of Electronics and Informatics, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2202-1163","authenticated-orcid":false,"given":"Ionut","family":"Schiopu","sequence":"additional","affiliation":[{"name":"Department of Electronics and Informatics, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0688-8173","authenticated-orcid":false,"given":"Bruno","family":"Cornelis","sequence":"additional","affiliation":[{"name":"Department of Electronics and Informatics, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium"},{"name":"Macq S.A., Rue de l\u2019A\u00e9ronef 2, 1140 Brussels, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7290-0428","authenticated-orcid":false,"given":"Adrian","family":"Munteanu","sequence":"additional","affiliation":[{"name":"Department of Electronics and Informatics, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, 
Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,1,3]]},"reference":[{"key":"ref_1","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the International Conference on Neural Information Processing Systems, Stateline, NV, USA. Available online: https:\/\/dl.acm.org\/doi\/10.1145\/3065386."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016). The Cityscapes Dataset for Semantic Urban Scene Understanding. arXiv.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Kirillov, A., He, K., Girshick, R.B., Rother, C., and Doll\u00e1r, P. (2018). Panoptic Segmentation. arXiv.","DOI":"10.1109\/CVPR.2019.00963"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R.B. (2017). Mask R-CNN. arXiv.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Bolya, D., Zhou, C., Xiao, F., and Lee, Y.J. (2019, October). YOLACT: Real-Time Instance Segmentation. Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea. Available online: https:\/\/openaccess.thecvf.com\/content_ICCV_2019\/html\/Bolya_YOLACT_Real-Time_Instance_Segmentation_ICCV_2019_paper.html.","DOI":"10.1109\/ICCV.2019.00925"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, X., Kong, T., Shen, C., Jiang, Y., and Li, L. (2019). SOLO: Segmenting Objects by Locations. arXiv.","DOI":"10.1007\/978-3-030-58523-5_38"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Li, Y., Qi, H., Dai, J., Ji, X., and Wei, Y. (2016). Fully Convolutional Instance-aware Semantic Segmentation. 
arXiv.","DOI":"10.1109\/CVPR.2017.472"},{"key":"ref_8","unstructured":"NVIDIA (2020, August 16). NVIDIA Jetson TX2 Delivers Twice the Intelligence to the Edge | NVIDIA Developer Blog. Available online: https:\/\/developer.nvidia.com\/blog\/jetson-tx2-delivers-twice-intelligence-edge."},{"key":"ref_9","unstructured":"NVIDIA (2020, August 16). AI-Powered Autonomous Machines at Scale | NVIDIA Jetson AGX Xavier. Available online: www.nvidia.com\/en-us\/autonomous-machines\/embedded-systems\/jetson-agx-xavier."},{"key":"ref_10","unstructured":"Ren, S., He, K., Girshick, R.B., and Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv."},{"key":"ref_11","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016). R-FCN: Object Detection via Region-based Fully Convolutional Networks. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lin, T., Doll\u00e1r, P., Girshick, R.B., He, K., Hariharan, B., and Belongie, S.J. (2016). Feature Pyramid Networks for Object Detection. arXiv.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Huang, Z., Huang, L., Gong, Y., Huang, C., and Wang, X. (2019). Mask Scoring R-CNN. arXiv.","DOI":"10.1109\/CVPR.2019.00657"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Chen, L., Hermans, A., Papandreou, G., Schroff, F., Wang, P., and Adam, H. (2017). MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features. 
arXiv.","DOI":"10.1109\/CVPR.2018.00422"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1312","DOI":"10.1109\/TPAMI.2011.231","article-title":"CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts","volume":"34","author":"Carreira","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","unstructured":"Pinheiro, P.H.O., Collobert, R., and Doll\u00e1r, P. (2015). Learning to Segment Object Candidates. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Chu, J., Leng, L., and Miao, J. (2020). Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation. Sensors, 20.","DOI":"10.3390\/s20041010"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lin, T., Goyal, P., Girshick, R.B., He, K., and Doll\u00e1r, P. (2017). Focal Loss for Dense Object Detection. arXiv.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1109\/TIP.2019.2923571","article-title":"Exemplar-Based Recursive Instance Segmentation with Application to Plant Image Analysis","volume":"29","author":"Yu","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"24135","DOI":"10.1109\/ACCESS.2020.2969480","article-title":"Weakly Supervised Instance Segmentation Based on Two-Stage Transfer Learning","volume":"8","author":"Sun","year":"2020","journal-title":"IEEE Access"},{"key":"ref_22","unstructured":"Liu, Y., Wu, Y.H., Wen, P.S., Shi, Y.J., Qiu, Y., and Cheng, M.M. (2020). Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","unstructured":"Liu, R., Lehman, J., Molino, P., Such, F.P., Frank, E., Sergeev, A., and Yosinski, J. (2018). An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution. 
arXiv."},{"key":"ref_24","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (2019). FCOS: Fully Convolutional One-Stage Object Detection. arXiv.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lee, Y., and Park, J. (2020). CenterMask: Real-Time Anchor-Free Instance Segmentation. arXiv.","DOI":"10.1109\/CVPR42600.2020.01392"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kong, T., Sun, F., Liu, H., Jiang, Y., and Shi, J. (2019). FoveaBox: Beyond Anchor-based Object Detector. arXiv.","DOI":"10.1109\/TIP.2020.3002345"},{"key":"ref_28","unstructured":"Xiang, C., Tian, S., Zou, W., and Xu, C. (2019). SAIS: Single-stage Anchor-free Instance Segmentation. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Yang, H., Deng, R., Lu, Y., Zhu, Z., Chen, Y., Roland, J.T., Lu, L., Landman, B.A., Fogo, A.B., and Huo, Y. (2020). CircleNet: Anchor-free Detection with Circle Representation. arXiv.","DOI":"10.1007\/978-3-030-59719-1_4"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Milletari, F., Navab, N., and Ahmadi, S. (2016, January 25\u201328). V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.79"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft COCO: Common Objects in Context. arXiv.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_32","unstructured":"Cui, Y., Lin, T.Y., Kirillov, A., Ronchi, M.R., Girshick, R., and Doll\u00e1r, P. (2020, September 30). 
COCO Detection Challenge (Segmentation Mask). Available online: https:\/\/competitions.codalab.org\/competitions\/20796#learn_the_details."},{"key":"ref_33","unstructured":"Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., and Girshick, R. (2020, January 13). Detectron2. Available online: https:\/\/github.com\/facebookresearch\/detectron2."},{"key":"ref_34","unstructured":"(2020, July 23). PyTorch. Available online: https:\/\/pytorch.org\/."},{"key":"ref_35","unstructured":"NVIDIA (2020, November 06). TITAN X Specifications. Available online: www.nvidia.com\/en-us\/geforce\/products\/10series\/titan-x-pascal\/."},{"key":"ref_36","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_37","unstructured":"(2020, August 15). ONNX Homepage. Available online: https:\/\/onnx.ai\/."},{"key":"ref_38","unstructured":"(2020, July 23). NVIDIA TensorRT. Available online: https:\/\/developer.nvidia.com\/tensorrt."},{"key":"ref_39","unstructured":"Macq S.A.\/N.V. (2020, July 15). Smart Mobility Solutions. Available online: https:\/\/www.macq.eu."},{"key":"ref_40","unstructured":"Wang, R.J., Li, X., Ao, S., and Ling, C.X. (2018). Pelee: A Real-Time Object Detection System on Mobile Devices. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., and Chen, L. (2018). Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. 
arXiv.","DOI":"10.1109\/CVPR.2018.00474"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/1\/275\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:06:34Z","timestamp":1760159194000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/1\/275"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,3]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["s21010275"],"URL":"https:\/\/doi.org\/10.3390\/s21010275","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,3]]}}}