{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:34:09Z","timestamp":1760060049591,"version":"build-2065373602"},"reference-count":49,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T00:00:00Z","timestamp":1753833600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFC3005705"],"award-info":[{"award-number":["2022YFC3005705"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>Building height information plays an important role in many urban-related applications, such as urban planning, disaster management, and environmental studies. With the rapid development of real scene maps, street view images are becoming a new data source for building height estimation, considering their easy collection and low cost. However, existing studies on building height estimation primarily utilize remote sensing images, with little exploration of height estimation from street-view images. In this study, we proposed a deep learning-based method for estimating the height of a single building in Baidu panoramic street view imagery. Firstly, the Segment Anything Model was used to extract the region of interest image and location features of individual buildings from the panorama. Subsequently, a cross-view matching algorithm was proposed by combining Baidu panorama and building footprint data with height information to generate building height samples. Finally, a Two-Branch feature fusion model (TBFF) was constructed to combine building location features and visual features, enabling accurate height estimation for individual buildings. The experimental results showed that the TBFF model had the best performance, with an RMSE of 5.69 m, MAE of 3.97 m, and MAPE of 0.11. Compared with two state-of-the-art methods, the TBFF model exhibited robustness and higher accuracy. The Random Forest model had an RMSE of 11.83 m, MAE of 4.76 m, and MAPE of 0.32, and the Pano2Geo model had an RMSE of 10.51 m, MAE of 6.52 m, and MAPE of 0.22. The ablation analysis demonstrated that fusing building location and visual features can improve the accuracy of height estimation by 14.98% to 69.99%. Moreover, the accuracy of the proposed method meets the LOD1 level 3D modeling requirements defined by the OGC (height error \u2264 5 m), which can provide data support for urban research.<\/jats:p>","DOI":"10.3390\/ijgi14080297","type":"journal-article","created":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T10:55:37Z","timestamp":1753872937000},"page":"297","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Novel Method for Estimating Building Height from Baidu Panoramic Street View Images"],"prefix":"10.3390","volume":"14","author":[{"given":"Shibo","family":"Ge","sequence":"first","affiliation":[{"name":"Research Center of Geospatial Big Data Application, Chinese Academy of Surveying and Mapping, Beijing 100036, China"}]},{"given":"Jiping","family":"Liu","sequence":"additional","affiliation":[{"name":"Research Center of Geospatial Big Data Application, Chinese Academy of Surveying and Mapping, Beijing 100036, China"}]},{"given":"Xianghong","family":"Che","sequence":"additional","affiliation":[{"name":"Research Center of Geospatial Big Data Application, Chinese Academy of Surveying and Mapping, Beijing 100036, China"}]},{"given":"Yong","family":"Wang","sequence":"additional","affiliation":[{"name":"Research Center of Geospatial Big Data Application, Chinese Academy of Surveying and Mapping, Beijing 100036, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8399-3607","authenticated-orcid":false,"given":"Haosheng","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Geography, Ghent University, 9000 Ghent, Belgium"}]}],"member":"1968","published-online":{"date-parts":[[2025,7,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.compenvurbsys.2017.01.001","article-title":"Generating 3D city models without elevation data","volume":"64","author":"Biljecki","year":"2017","journal-title":"Comput. Environ. Urban. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1080\/00420980601131902","article-title":"Determining optimal building height","volume":"44","author":"Chau","year":"2007","journal-title":"Urban. Stud."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1007\/s10940-021-09500-1","article-title":"Explaining crime diversity with google street view","volume":"37","author":"Khorshidi","year":"2021","journal-title":"J. Quant. Criminol."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Schug, F., Frantz, D., van der Linden, S., and Hostert, P. (2021). Gridded population mapping for Germany based on building density, height and type from Earth Observation data using census disaggregation and bottom-up estimates. PLoS ONE, 16.","DOI":"10.1371\/journal.pone.0249044"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"5834","DOI":"10.1080\/01431161.2013.796434","article-title":"Automatic building height extraction by volumetric shadow analysis of monoscopic imagery","volume":"34","author":"Lee","year":"2013","journal-title":"Int. J. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"6514","DOI":"10.1109\/JSTARS.2024.3372113","article-title":"Building Height Extraction From High-Resolution Single-View Remote Sensing Images Using Shadow and Side Information","volume":"17","author":"Xu","year":"2024","journal-title":"J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, C., Cui, Y., Zhu, Z., Jiang, S., and Jiang, W. (2022). Building height extraction from GF-7 satellite images based on roof contour constrained stereo matching. Remote Sens., 14.","DOI":"10.3390\/rs14071566"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2967","DOI":"10.1109\/TGRS.2010.2041460","article-title":"Height retrieval of isolated buildings from single high-resolution SAR images","volume":"48","author":"Guida","year":"2010","journal-title":"Trans. Geosci. Remote Sens."},{"key":"ref_9","first-page":"102596","article-title":"Retrieving building height in urban areas using ICESat-2 photon-counting LiDAR data","volume":"104","author":"Lao","year":"2021","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"6929","DOI":"10.1080\/01431161.2010.517226","article-title":"Shadow detection and building-height estimation using IKONOS data","volume":"32","author":"Shao","year":"2011","journal-title":"Int. J. Remote Sens."},{"key":"ref_11","unstructured":"Sun, Y., Shahzad, M., and Zhu, X.X. (2017, January 6\u20138). Building height estimation in single SAR image using OSM building footprints. Proceedings of the 2017 Joint Urban Remote Sensing Event (JURSE), Dubai, United Arab Emirates."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"27387","DOI":"10.1007\/s11042-018-5926-4","article-title":"Interactive 3D building modeling method using panoramic image sequences and digital map","volume":"77","author":"Kim","year":"2018","journal-title":"Multimed. Tools Appl."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"104217","DOI":"10.1016\/j.landurbplan.2021.104217","article-title":"Street view imagery in urban analytics and GIS: A review","volume":"215","author":"Biljecki","year":"2021","journal-title":"Landsc. Urban. Plan."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Long, Y., and Liu, L. (2017). How green are the streets? An analysis for central areas of Chinese cities using Tencent Street View. PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0171110"},{"key":"ref_15","first-page":"1998","article-title":"Sky View Factor Calculation based on Baidu Street View Images and Its Application in Urban Heat Island Study","volume":"23","author":"Feng","year":"2021","journal-title":"J. Geo-Inf. Sci."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Qi, J., and Zhang, R. (2019, January 13\u201317). Cbhe: Corner-based building height estimation for complex street scene images. Proceedings of the World Wide Web Conference, San Francisco, CA, USA.","DOI":"10.1145\/3308558.3313394"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Al-Habashna, A.a. (2021, January 26\u201328). Building height estimation using street-view images, deep-learning, contour processing, and geospatial data. Proceedings of the 2021 18th Conference on Robots and Vision (CRV), Burnaby, BC, Canada.","DOI":"10.1109\/CRV52889.2021.00022"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"D\u00edaz, E., and Arguello, H. (2016, January 16\u201317). An algorithm to estimate building heights from Google street-view imagery using single view metrology across a representational state transfer system. Proceedings of the Dimensional Optical Metrology and Inspection for Practical Applications V, Baltimore, MD, USA.","DOI":"10.1117\/12.2224312"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ure\u00f1a-Pliego, M., Mart\u00ednez-Mar\u00edn, R., Gonz\u00e1lez-Rodrigo, B., and Marchamalo-Sacrist\u00e1n, M. (2023). Automatic building height estimation: Machine learning models for urban image analysis. Appl. Sci., 13.","DOI":"10.3390\/app13085037"},{"key":"ref_20","unstructured":"Li, H., Yuan, Z., Dax, G., Kong, G., Fan, H., Zipf, A., and Werner, M. (2023). Semi-Supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1016\/j.isprsjprs.2024.07.005","article-title":"Pano2Geo: An efficient and robust building height estimation model using street-view panoramas","volume":"215","author":"Fan","year":"2024","journal-title":"ISPRS-J. Photogramm. Remote Sens."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1317","DOI":"10.1080\/13658816.2021.1981334","article-title":"Exploring the vertical dimension of street view image based on deep learning: A case study on lowest floor elevation estimation","volume":"36","author":"Ning","year":"2022","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3661832","article-title":"Elev-vision: Automated lowest floor elevation estimation from segmenting street view images","volume":"2","author":"Ho","year":"2024","journal-title":"ACM J. Comput. Sustain. Soc."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ho, Y.-H., Li, L., and Mostafavi, A. (2024). ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation. arXiv.","DOI":"10.1111\/mice.13310"},{"key":"ref_25","unstructured":"(2024, October 22). Available online: https:\/\/map.baidu.com\/."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yue, H., Xie, H., Liu, L., and Chen, J. (2022). Detecting people on the street and the streetscape physical environment from Baidu street view images and their effects on community-level street crime in a Chinese city. ISPRS Int. J. Geo-Inf., 11.","DOI":"10.3390\/ijgi11030151"},{"key":"ref_28","unstructured":"(2024, October 22). Available online: https:\/\/en.wikipedia.org\/wiki\/List_of_street_view_services."},{"key":"ref_29","unstructured":"Zhang, C., Liu, L., Cui, Y., Huang, G., Lin, W., Yang, Y., and Hu, Y. (2023). A comprehensive survey on segment anything model for vision and beyond. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., and Lo, W.-Y. (2023, January 1\u20136). Segment anything. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"ref_31","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Marcilio, W.E., and Eler, D.M. (2020, January 7\u201310). From explanations to feature selection: Assessing SHAP values as feature selection mechanism. Proceedings of the 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Porto de Galinhas, Brazil.","DOI":"10.1109\/SIBGRAPI51738.2020.00053"},{"key":"ref_33","unstructured":"Lundberg, S. (2017). A unified approach to interpreting model predictions. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"117357","DOI":"10.1016\/j.jenvman.2023.117357","article-title":"Insights into geospatial heterogeneity of landslide susceptibility based on the SHAP-XGBoost model","volume":"332","author":"Zhang","year":"2023","journal-title":"J. Environ. Manag."},{"key":"ref_35","first-page":"e01059","article-title":"A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP)","volume":"16","author":"Ekanayake","year":"2022","journal-title":"Case Stud. Constr. Mater."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Nohara, Y., Matsumoto, K., Soejima, H., and Nakashima, N. (2022). Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput. Meth. Programs Biomed., 214.","DOI":"10.1016\/j.cmpb.2021.106584"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Ogawa, Y., Oki, T., Chen, S., and Sekimoto, Y. (2021, January 2). Joining street-view images and building footprint gis data. Proceedings of the 1st ACM SIGSPATIAL International Workshop on Searching and Mining Large Collections of Geospatial Data, Beijing, China.","DOI":"10.1145\/3486640.3491395"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Kothuri, R.K.V., Ravada, S., and Abugov, D. (2002, January 3\u20136). Quadtree and R-tree indexes in oracle spatial: A comparison using GIS data. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, WI, USA.","DOI":"10.1145\/564752.564755"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018, January 18\u201323). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., and Vasudevan, V. (2019, January 27\u201328). Searching for mobilenetv3. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00140"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1007\/s41095-022-0271-y","article-title":"Attention mechanisms in computer vision: A survey","volume":"8","author":"Guo","year":"2022","journal-title":"Comput. Vis. Media."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_44","unstructured":"Xiao, X., Yan, M., Basodi, S., Ji, C., and Pan, Y. (2020). Efficient hyperparameter optimization in deep learning using a variable length genetic algorithm. arXiv."},{"key":"ref_45","unstructured":"Kingma, D.P. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_46","unstructured":"Yang, B. (2021). A mathematical investigation on the distance-preserving property of an equidistant cylindrical projection. arXiv."},{"key":"ref_47","unstructured":"(2025, May 07). Available online: https:\/\/stateofvr.com\/1_the-basics.html."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"5357","DOI":"10.5194\/essd-16-5357-2024","article-title":"3D-GloBFP: The first global three-dimensional building footprint dataset","volume":"16","author":"Che","year":"2024","journal-title":"Earth Syst. Sci. Data Discuss."},{"key":"ref_49","unstructured":"(2025, July 11). Available online: https:\/\/portal.ogc.org\/files\/?artifact_id=33758."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/14\/8\/297\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:18:46Z","timestamp":1760033926000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/14\/8\/297"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,30]]},"references-count":49,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["ijgi14080297"],"URL":"https:\/\/doi.org\/10.3390\/ijgi14080297","relation":{},"ISSN":["2220-9964"],"issn-type":[{"type":"electronic","value":"2220-9964"}],"subject":[],"published":{"date-parts":[[2025,7,30]]}}}