{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T17:16:34Z","timestamp":1775322994599,"version":"3.50.1"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,8,10]],"date-time":"2021-08-10T00:00:00Z","timestamp":1628553600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"BITS Additional Competitive Research","award":["PLN\/AD\/2018-19\/5"],"award-info":[{"award-number":["PLN\/AD\/2018-19\/5"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Internet Technol."],"published-print":{"date-parts":[[2021,8,31]]},"abstract":"<jats:p>Aerial scenes captured by UAVs have immense potential in IoT applications related to urban surveillance, road and building segmentation, land cover classification, and so on, which are necessary for the evolution of smart cities. Advances in deep learning have greatly enhanced visual understanding, but the domain of aerial vision remains largely unexplored. Aerial images pose many unique challenges for proper scene parsing, such as high-resolution data, small-scale objects, a large number of objects in the camera view, dense clustering of objects, background clutter, and so on, which greatly hinder the performance of existing deep learning methods. In this work, we propose ISDNet (Instance Segmentation and Detection Network), a novel network to perform instance segmentation and object detection on visual data captured by UAVs. This work enables aerial image analytics for various needs in a smart city. In particular, we use dilated convolutions to generate improved spatial context, leading to better discrimination between foreground and background features.
The proposed network efficiently reuses the segment-mask features by propagating them from early stages using residual connections. Furthermore, ISDNet makes use of effective anchors to accommodate varying object scales and sizes. The proposed method obtains state-of-the-art results in the aerial context.<\/jats:p>","DOI":"10.1145\/3418205","type":"journal-article","created":{"date-parts":[[2021,8,10]],"date-time":"2021-08-10T20:36:58Z","timestamp":1628627818000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["ISDNet: AI-enabled Instance Segmentation of Aerial Scenes for Smart Cities"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6654-7004","authenticated-orcid":false,"given":"Prateek","family":"Garg","sequence":"first","affiliation":[{"name":"Dept. of EEE, BITS Pilani, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8400-6762","authenticated-orcid":false,"given":"Anirudh Srinivasan","family":"Chakravarthy","sequence":"additional","affiliation":[{"name":"Dept. of CSIS, BITS Pilani, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0157-0967","authenticated-orcid":false,"given":"Murari","family":"Mandal","sequence":"additional","affiliation":[{"name":"Dept. of CSE, MNIT Jaipur, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1865-3512","authenticated-orcid":false,"given":"Pratik","family":"Narang","sequence":"additional","affiliation":[{"name":"Dept. of CSIS, BITS Pilani, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6730-3060","authenticated-orcid":false,"given":"Vinay","family":"Chamola","sequence":"additional","affiliation":[{"name":"Dept. 
of EEE, BITS Pilani, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohsen","family":"Guizani","sequence":"additional","affiliation":[{"name":"Dept. of CSE, Qatar University, Qatar"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,8,10]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICEIC49074.2020.9051269"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.comcom.2020.05.025"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.vehcom.2020.100249"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00132"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00644"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3390\/s19184021"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2992341"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00511"},{"key":"e_1_2_1_9_1","volume-title":"MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155","author":"Chen Kai","year":"2019","unstructured":"Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. 2019. MMDetection: Open MMLab detection toolbox and benchmark.
arXiv preprint arXiv:1906.07155 (2019)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2699184"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the International Conference on Unmanned Aircraft Systems (ICUAS\u201914)","author":"Cui J. Q.","unstructured":"J. Q. Cui, S. Lai, X. Dong, P. Liu, B. M. Chen, and T. H. Lee. 2014. Autonomous navigation of UAV in forest. In Proceedings of the International Conference on Unmanned Aircraft Systems (ICUAS\u201914). 726\u2013733."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the International Conference on Information and Communication Technology Convergence (ICTC\u201918)","author":"Datta S. K.","unstructured":"S. K. Datta, J. Dugelay, and C. Bonnet. 2018. IoT based UAV platform for emergency services. In Proceedings of the International Conference on Information and Communication Technology Convergence (ICTC\u201918). 144\u2013147."},{"key":"e_1_2_1_13_1","volume-title":"Learning ROI transformer for detecting oriented objects in aerial images. arXiv preprint arXiv:1812.00155","author":"Ding Jian","year":"2018","unstructured":"Jian Ding, Nan Xue, Yang Long, Gui-Song Xia, and Qikai Lu. 2018. Learning ROI transformer for detecting oriented objects in aerial images.
arXiv preprint arXiv:1812.00155 (2018)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-4018-9_32"},{"key":"e_1_2_1_15_1","volume-title":"Attend refine repeat: Active box proposal generation via in-out localization. arXiv preprint arXiv:1606.04446","author":"Gidaris Spyros","year":"2016","unstructured":"Spyros Gidaris and Nikos Komodakis. 2016. Attend refine repeat: Active box proposal generation via in-out localization. arXiv preprint arXiv:1606.04446 (2016)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298641"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00162"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2020.2977036"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.comcom.2019.09.021"},{"key":"e_1_2_1_22_1","volume-title":"Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. 2961\u20132969","author":"He Kaiming","year":"2017","unstructured":"Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross Girshick. 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. 2961\u20132969."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"He K.","unstructured":"K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 770\u2013778."},
{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2017.2683528"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.3141\/2519-03"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2016.90"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2907789"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2858826"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the European Conference on Computer Vision. Springer, 740\u2013755","author":"Lin Tsung-Yi","unstructured":"Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision. Springer, 740\u2013755."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00913"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the European Conference on Computer Vision. Springer, 21\u201337","author":"Liu Wei","year":"2016","unstructured":"Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision. Springer, 21\u201337."},
{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1080\/10095020.2017.1420509"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cvi.2018.5206"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2019.2952253"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093324"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2018.8545504"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/LGRS.2019.2923564"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8803262"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2018.2841808"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.3390\/app10031057"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.178"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969442.2969462"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_5"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA\u201916)","author":"Polishetty R.","unstructured":"R. Polishetty, M. Roopaei, and P. Rad. 2016.
A next-generation secure cloud-based deep learning license plate recognition for smart cities. In Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA\u201916). 286\u2013293."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.3390\/s20010112"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.690"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969250"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.3390\/s19214779"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/WI-IAT.2009.132"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2966580"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the IEEE Aerospace Conference. 1\u20136.","author":"Vanegas F.","unstructured":"F. Vanegas, K. J. Gaston, J. Roberts, and F. Gonzalez. 2019. A framework for UAV navigation and exploration in GPS-Denied environments. In Proceedings of the IEEE Aerospace Conference. 1\u20136."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 28\u201337","author":"Zamir Syed Waqas","year":"2019","unstructured":"Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, and Xiang Bai. 2019.
iSAID: A large-scale dataset for instance segmentation in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 28\u201337."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00418"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2974745"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.650"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00840"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00832"},{"key":"e_1_2_1_62_1","volume-title":"Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122","author":"Yu Fisher","year":"2015","unstructured":"Fisher Yu and Vladlen Koltun. 2015. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2873617"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.3390\/s19224855"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2919937"},
{"key":"e_1_2_1_66_1","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Zhu Pengfei","year":"2018","unstructured":"Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Haibin Ling, Qinghua Hu, Haotian Wu, Qinqin Nie, Hao Cheng, Chenfeng Liu, et\u00a0al. 2018. VisDrone-VDT2018: The vision meets drone video detection and tracking challenge results. In Proceedings of the European Conference on Computer Vision (ECCV\u201918)."}],"container-title":["ACM Transactions on Internet Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3418205","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3418205","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:36Z","timestamp":1750195896000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3418205"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,10]]},"references-count":66,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,8,31]]}},"alternative-id":["10.1145\/3418205"],"URL":"https:\/\/doi.org\/10.1145\/3418205","relation":{},"ISSN":["1533-5399","1557-6051"],"issn-type":[{"value":"1533-5399","type":"print"},{"value":"1557-6051","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,10]]},"assertion":[{"value":"2020-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-08-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}