{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T16:08:36Z","timestamp":1775837316046,"version":"3.50.1"},"reference-count":55,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2019,3,22]],"date-time":"2019-03-22T00:00:00Z","timestamp":1553212800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Automatic extraction of ground objects is fundamental for many applications of remote sensing. It is valuable to extract different kinds of ground objects effectively by using a general method. We propose such a method, JointNet, which is a novel neural network to meet extraction requirements for both roads and buildings. The proposed method makes three contributions to road and building extraction: (1) in addition to the accurate extraction of small objects, it can extract large objects with a wide receptive field. By switching the loss function, the network can effectively extract multi-type ground objects, from road centerlines to large-scale buildings. (2) This network module combines the dense connectivity with the atrous convolution layers, maintaining the efficiency of the dense connection connectivity pattern and reaching a large receptive field. (3) The proposed method utilizes the focal loss function to improve road extraction. The proposed method is designed to be effective on both road and building extraction tasks. 
Experimental results on three datasets verified the effectiveness of JointNet in information extraction of road and building objects.<\/jats:p>","DOI":"10.3390\/rs11060696","type":"journal-article","created":{"date-parts":[[2019,3,25]],"date-time":"2019-03-25T06:56:52Z","timestamp":1553497012000},"page":"696","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":81,"title":["JointNet: A Common Neural Network for Road and Building Extraction"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4576-7275","authenticated-orcid":false,"given":"Zhengxin","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Computer Science, Beihang University, Beijing 100083, China"}]},{"given":"Yunhong","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Beihang University, Beijing 100083, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,22]]},"reference":[{"key":"ref_1","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_2","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 7\u20139). Very Deep Convolutional Networks for Large-Scale Image Recognition. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. 
Proceedings of the Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_5","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (arXiv, 2017). Rethinking atrous convolution for semantic image segmentation, arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (arXiv, 2018). Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_7","unstructured":"Mnih, V. (2013). Machine Learning for Aerial Image Labeling. [Ph.D. Thesis, University of Toronto]."},{"key":"ref_8","unstructured":"Marcu, A., and Leordeanu, M. (arXiv, 2016). Dual local-global contextual pathways for recognition in aerial imagery, arXiv."},{"key":"ref_9","first-page":"2999","article-title":"Focal Loss for Dense Object Detection","volume":"PP","author":"Lin","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_10","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_11","unstructured":"Wu, Y., and He, K. (arXiv, 2018). Group Normalization, arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Salakhutdinov, R., Mnih, A., and Hinton, G. (2007, January 20\u201324). Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA.","DOI":"10.1145\/1273496.1273596"},{"key":"ref_13","unstructured":"Nair, V., and Hinton, G.E. (2010, January 21\u201324). Rectified linear units improve restricted boltzmann machines. 
Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel."},{"key":"ref_14","unstructured":"Mnih, V., Larochelle, H., and Hinton, G.E. (arXiv, 2012). Conditional restricted boltzmann machines for structured output prediction, arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Mnih, V., and Hinton, G.E. (2010). Learning to detect roads in high-resolution aerial images. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-15567-3_16"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Saito, S., and Aoki, Y. (2015, January 8\u201312). Building and road detection from large aerial imagery. Proceedings of the Image Processing: Machine Vision Applications VIII, San Francisco, CA, USA.","DOI":"10.1117\/12.2083273"},{"key":"ref_17","first-page":"10402","article-title":"Multiple Object Extraction from Aerial Imagery with Convolutional Neural Networks","volume":"60","author":"Saito","year":"2016","journal-title":"Electron. Imaging"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1109\/LGRS.2018.2802944","article-title":"Road extraction by deep residual u-net","volume":"15","author":"Zhang","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the International Conference on Medical Image Computing & Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1016\/j.isprsjprs.2009.06.004","article-title":"Object based image analysis for remote sensing","volume":"65","author":"Blaschke","year":"2010","journal-title":"ISPRS J. Photogramm. 
Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1016\/j.geomorph.2006.04.013","article-title":"Automated classification of landform elements using object-based image analysis","volume":"81","author":"Blaschke","year":"2006","journal-title":"Geomorphology"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1016\/j.rse.2010.12.017","article-title":"Per-pixel vs. object-based classification of urban land cover extraction using high spatial resolution imagery","volume":"115","author":"Myint","year":"2011","journal-title":"Remote Sens. Environ."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Maggiori, E., Tarabalka, Y., Charpiat, G., and Alliez, P. (2016, January 10\u201315). Fully convolutional neural networks for remote sensing image classification. Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China.","DOI":"10.1109\/IGARSS.2016.7730322"},{"key":"ref_24","unstructured":"Marcu, A., Costea, D., Slusanschi, E., and Leordeanu, M. (arXiv, 2018). A Multi-Stage Multi-Task Neural Network for Aerial Scene Interpretation and Geolocalization, arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/34.87344","article-title":"Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations","volume":"13","author":"Vincent","year":"1991","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2274","DOI":"10.1109\/TPAMI.2012.120","article-title":"SLIC Superpixels Compared to State-of-the-Art Superpixel Methods","volume":"34","author":"Achanta","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 8\u201312). Fully convolutional networks for semantic segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Eigen, D., and Fergus, R. (2015, January 7\u201313). Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.304"},{"key":"ref_29","unstructured":"Pinheiro, P.H., and Collobert, R. (2014, January 21\u201326). Recurrent convolutional neural networks for scene labeling. Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lin, G., Shen, C., Van Den Hengel, A., and Reid, I. (2016, January 27\u201330). Efficient piecewise training of deep structured models for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.348"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Yang, Y., Wang, J., Xu, W., and Yuille, A.L. (2016, January 27\u201330). Attention to scale: Scale-aware semantic image segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.396"},{"key":"ref_32","unstructured":"Badrinarayanan, V., Kendall, A., and Cipolla, R. (arXiv, 2015). Segnet: A deep convolutional encoder-decoder architecture for image segmentation, arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. Proceedings of the Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016). Identity mappings in deep residual networks. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 8\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A.A. (2017, January 4\u20139). Inception-v4, inception-resnet and the impact of residual connections on learning. Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., and Cottrell, G. (2018, January 12\u201315). Understanding convolution for semantic segmentation. 
Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00163"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1016\/j.image.2003.10.003","article-title":"Downsampling dependent upsampling of images","volume":"19","author":"Frajka","year":"2004","journal-title":"Signal Process. Image Commun."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017, January 21\u201326). Xception: Deep Learning with Depthwise Separable Convolutions. Proceedings of the Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.195"},{"key":"ref_42","unstructured":"Ba, J.L., Kiros, J.R., and Hinton, G.E. (arXiv, 2016). Layer Normalization, arXiv."},{"key":"ref_43","unstructured":"Ulyanov, D., Vedaldi, A., and Lempitsky, V. (arXiv, 2016). Instance Normalization: The Missing Ingredient for Fast Stylization, arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"3322","DOI":"10.1109\/TGRS.2017.2669341","article-title":"Automatic Road Detection and Centerline Extraction via Cascaded End-to-End Convolutional Neural Network","volume":"55","author":"Cheng","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_45","unstructured":"OpenStreetMap Contributors (2019, March 20). OpenStreetMap. Available online: https:\/\/www.openstreetmap.org."},{"key":"ref_46","unstructured":"Simard, P., Steinkraus, D., and Platt, J.C. (2003, January 3\u20136). Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis. Proceedings of the International Conference on Document Analysis Recognition, Edinburgh, UK."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Zhou, L., Zhang, C., and Wu, M. (2018, January 8\u201322). D-linknet: Linknet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction. 
Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00034"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., and Raska, R. (2018, January 8\u201322). Deepglobe 2018: A challenge to parse the earth through satellite images. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00031"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Iglovikov, V.I., Seferbekov, S., Buslaev, A.V., and Shvets, A. (2018, January 8\u201322). TernausNetV2: Fully Convolutional Network for Instance Segmentation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00042"},{"key":"ref_50","unstructured":"Ehrig, M., and Euzenat, J. (2005, January 2). Relaxed precision and recall for ontology matching. Proceedings of the K-Cap 2005 Workshop on Integrating Ontology, Banff, AB, Canada."},{"key":"ref_51","unstructured":"Chollet, F. (2019, March 20). Keras. Available online: https:\/\/keras.io."},{"key":"ref_52","unstructured":"Kingma, D.P., and Ba, J. (arXiv, 2014). Adam: A method for stochastic optimization, arXiv."},{"key":"ref_53","unstructured":"Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017, January 4\u20139). Automatic differentiation in PyTorch. NIPS-W. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Hamaguchi, R., Fujita, A., Nemoto, K., Imaizumi, T., and Hikosaka, S. (arXiv, 2017). 
Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery, arXiv.","DOI":"10.1109\/WACV.2018.00162"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.isprsjprs.2017.05.002","article-title":"Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks","volume":"130","author":"Alshehhi","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/6\/696\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:40:07Z","timestamp":1760186407000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/6\/696"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,22]]},"references-count":55,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["rs11060696"],"URL":"https:\/\/doi.org\/10.3390\/rs11060696","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,3,22]]}}}