{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T04:04:32Z","timestamp":1773547472682,"version":"3.50.1"},"reference-count":58,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2018,12,29]],"date-time":"2018-12-29T00:00:00Z","timestamp":1546041600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>With the widespread application of location-based services, the appropriate representation of indoor spaces and efficient indoor 3D reconstruction have become essential tasks. Due to the complexity and closeness of indoor spaces, it is difficult to develop a versatile solution for large-scale indoor 3D scene reconstruction. In this paper, an annotated hierarchical Structure-from-Motion (SfM) method is proposed for low-cost and efficient indoor 3D reconstruction using unordered images collected with widely available smartphone or consumer-level cameras. Although the reconstruction of indoor models is often compromised by the indoor complexity, we make use of the availability of complex semantic objects to classify the scenes and construct a hierarchical scene tree to recover the indoor space. Starting with the semantic annotation of the images, images that share the same object were detected and classified utilizing visual words and the support vector machine (SVM) algorithm. The SfM method was then applied to hierarchically recover the atomic 3D point cloud model of each object, with the semantic information from the images attached. Finally, an improved random sample consensus (RANSAC) generalized Procrustes analysis (RGPA) method was employed to register and optimize the partial models into a complete indoor scene. 
The proposed approach incorporates image classification in the hierarchical SfM-based indoor reconstruction task, which explores the semantic propagation from images to points. It also reduces the computational complexity of the traditional SfM by avoiding exhaustive pairwise image matching. The applicability and accuracy of the proposed method were verified on two different image datasets collected with smartphone and consumer cameras. The results demonstrate that the proposed method is able to efficiently and robustly produce semantically and geometrically correct indoor 3D point models.<\/jats:p>","DOI":"10.3390\/rs11010058","type":"journal-article","created":{"date-parts":[[2018,12,31]],"date-time":"2018-12-31T07:22:30Z","timestamp":1546240950000},"page":"58","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Low-Cost and Efficient Indoor 3D Reconstruction through Annotated Hierarchical Structure-from-Motion"],"prefix":"10.3390","volume":"11","author":[{"given":"Youli","family":"Ding","sequence":"first","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China"}]},{"given":"Xianwei","family":"Zheng","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2226-5005","authenticated-orcid":false,"given":"Yan","family":"Zhou","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China"}]},{"given":"Hanjiang","family":"Xiong","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, 
Wuhan 430079, China"}]},{"given":"Jianya","family":"Gong","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China"},{"name":"School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China"}]}],"member":"1968","published-online":{"date-parts":[[2018,12,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, L., Su, F., Yang, F., Zhu, H., Li, D., Zuo, X., Li, F., Liu, Y., and Ying, S. (2018). Reconstruction of Three-Dimensional (3D) Indoor Interiors with Multiple Stories via Comprehensive Segmentation. Remote Sens., 10.","DOI":"10.3390\/rs10081281"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Zheng, X., Chen, R., Xiong, H., and Guo, S. (2018). Image-Based Localization Aided Indoor Pedestrian Trajectory Estimation Using Smartphones. Sensors, 18.","DOI":"10.3390\/s18010258"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Hermans, A., Floros, G., and Leibe, B. (June, January 31). Dense 3d semantic mapping of indoor scenes from rgb-d images. Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China.","DOI":"10.1109\/ICRA.2014.6907236"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Jamali, A., Abdul Rahman, A., and Boguslawski, P. (2016, January 16\u201317). A hybrid 3D indoor space model. 
Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Istanbul, Turkey.","DOI":"10.5194\/isprs-archives-XLII-2-W1-75-2016"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3284","DOI":"10.3390\/rs6043284","article-title":"Segmentation of sloped roofs from airborne LiDAR point clouds using ridge-based hierarchical decomposition","volume":"6","author":"Fan","year":"2014","journal-title":"Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/j.isprsjprs.2012.11.004","article-title":"Model driven reconstruction of roofs from sparse LIDAR point clouds","volume":"76","author":"Henn","year":"2013","journal-title":"Isprs J. Photogramm. Remote Sens."},{"key":"ref_7","unstructured":"Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A.J., Kohi, P., Shotton, J., Hodges, S., and Fitzgibbon, A. (2017, January 9\u201313). KinectFusion: Real-time dense surface mapping and tracking. Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, Nantes, France."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1318","DOI":"10.1109\/TCYB.2013.2265378","article-title":"Enhanced computer vision with microsoft kinect sensor: A review","volume":"43","author":"Han","year":"2013","journal-title":"IEEE Trans. Cybern."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Afanasyev, I., Sagitov, A., and Magid, E. (2015). ROS-Based SLAM for a Gazebo-Simulated Mobile Robot in Image-Based 3D Model of Indoor Environment. 
Advanced Concepts for Intelligent Vision Systems, Springer.","DOI":"10.1007\/978-3-319-25903-1_24"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1016\/j.geomorph.2012.08.021","article-title":"\u2018Structure-from-Motion\u2019 photogrammetry: A low-cost, effective tool for geoscience applications","volume":"179","author":"Westoby","year":"2012","journal-title":"Geomorphology"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wu, C., Agarwal, S., Curless, B., and Seitz, S.M. (2011, January 20\u201325). Multicore bundle adjustment. Proceedings of the Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2011.5995552"},{"key":"ref_12","unstructured":"Agarwal, S., Snavely, N., Simon, I., and Seitz, S.M. (October, January 29). Building Rome in a day. Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gherardi, R., Farenzena, M., and Fusiello, A. (2010, January 13\u201318). Improving the efficiency of hierarchical structure-and-motion. Proceedings of the Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539782"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1145\/1141911.1141964","article-title":"Photo Tourism: Exploring Photo Collections In 3D","volume":"25","author":"Snavely","year":"2006","journal-title":"ACM Trans. Graph."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Snavely, N., Seitz, S.M., and Szeliski, R. (2008, January 23\u201328). Skeletal graphs for efficient structure from motion. Proceedings of the CVPR 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587678"},{"key":"ref_16","unstructured":"Wu, C. (July, January 29). Towards Linear-time Incremental Structure from Motion. 
Proceedings of the International Conference on 3d Vision, Seattle, WA, USA."},{"key":"ref_17","unstructured":"Yin, L., Snavely, N., and Gehrke, J. (2008, January 12\u201318). MatchMiner: Efficient Spanning Structure Mining in Large Image Collections. Proceedings of the European Conference on Computer Vision, Marseille, France."},{"key":"ref_18","unstructured":"Moulon, P., Monasse, P., and Marlet, R. (October, January 29). Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion. Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_19","unstructured":"Arie-Nachimson, M., Kovalsky, S.Z., Kemelmacher-Shlizerman, I., Singer, A., and Basri, R. (2016, January 13\u201314). Global Motion Estimation from Point Matches. Proceedings of the International Conference on 3d Imaging, Liege, Belgium."},{"key":"ref_20","unstructured":"Sinha, S.N., Steedly, D., and Szeliski, R. (2010, January 10\u201311). A multi-stage linear approach to structure from motion. Proceedings of the European Conference on Trends and Topics in Computer Vision, Heraklion, Greece."},{"key":"ref_21","unstructured":"Jiang, N., Cui, Z., and Tan, P. (October, January 29). A Global Linear Method for Camera Pose Registration. Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2841","DOI":"10.1109\/TPAMI.2012.218","article-title":"SfM with MRFs: Discrete-Continuous Optimization for Large-Scale Structure from Motion","volume":"35","author":"Crandall","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1007\/s11263-012-0601-0","article-title":"Rotation Averaging","volume":"103","author":"Hartley","year":"2013","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1007\/s11263-012-0552-5","article-title":"Fully automatic registration of image sets on approximate geometry","volume":"102","author":"Corsini","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Furukawa, Y., Curless, B., Seitz, S.M., and Szeliski, R. (2010, January 13\u201318). Towards Internet-scale multi-view stereo. Proceedings of the Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539802"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1016\/j.cviu.2017.02.005","article-title":"Efficient tree-structured SfM by RANSAC generalized Procrustes analysis","volume":"157","author":"Chen","year":"2017","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ni, K., and Dellaert, F. (2012, January 13\u201315). HyperSfM. Proceedings of the 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Zurich, Switzerland.","DOI":"10.1109\/3DIMPVT.2012.47"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1080\/17538947.2016.1252433","article-title":"Detecting repetitive structures on building footprints for the purposes of 3D modeling and reconstruction","volume":"10","author":"Fan","year":"2017","journal-title":"Int. J. Digit. Earth"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Martin-Brualla, R., He, Y., Russell, B.C., and Seitz, S.M. (2014). The 3D Jigsaw Puzzle: Mapping Large Indoor Spaces, Springer International Publishing.","DOI":"10.1007\/978-3-319-10578-9_1"},{"key":"ref_30","unstructured":"Furukawa, Y., Curless, B., Seitz, S.M., and Szeliski, R. (October, January 29). Reconstructing building interiors from images. 
Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_31","first-page":"1","article-title":"Acquiring 3D indoor environments with variability and repetition","volume":"31","author":"Kim","year":"2012","journal-title":"ACM Trans. Graph."},{"key":"ref_32","unstructured":"Choi, S., Zhou, Q.-Y., and Koltun, V. (2008, January 23\u201328). Robust reconstruction of indoor scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1109\/JSTSP.2014.2381153","article-title":"Fast, Automated, Scalable Generation of Textured 3D Models of Indoor Environments","volume":"9","author":"Turner","year":"2015","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., and Nie\u00dfner, M. (2017, January 21\u201326). ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. Proceedings of the CVPR, Honolulu, Hawaii.","DOI":"10.1109\/CVPR.2017.261"},{"key":"ref_35","unstructured":"Koppula, H.S., Anand, A., Joachims, T., and Saxena, A. (2013, January 5\u201310). Semantic labeling of 3D point clouds for indoor scenes. Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1730","DOI":"10.1109\/TPAMI.2016.2613051","article-title":"Dense Semantic 3D Reconstruction","volume":"39","author":"Haene","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1016\/j.autcon.2012.10.006","article-title":"Automatic creation of semantically rich 3D building models from laser scanner data","volume":"31","author":"Xiong","year":"2013","journal-title":"Autom. 
Constr."},{"key":"ref_38","unstructured":"Ikehata, S., Yang, H., and Furukawa, Y. (October, January 29). Structured Indoor Modeling. Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Perronnin, F., and Dance, C. (2008, January 23\u201328). Fisher Kernels on Visual Vocabularies for Image Categorization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2007.383266"},{"key":"ref_40","first-page":"487","article-title":"Exploiting Generative Models in Discriminative Classifiers","volume":"11","author":"Jaakkola","year":"1998","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1023\/A:1009715923555","article-title":"A Tutorial on Support Vector Machines for Pattern Recognition","volume":"2","author":"Burges","year":"1998","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wright, S., and Nocedal, J. (1999). Numerical Optimization, Springer Science.","DOI":"10.1007\/b98874"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Pizarro, D., and Bartoli, A. (2010, January 13\u201318). Global optimization for optimal generalized procrustes analysis. Proceedings of the Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2011.5995677"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Fischler, M.A., and Bolles, R.C. (1981). 
Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography, ACM.","DOI":"10.1145\/358669.358692"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1016\/j.imavis.2004.05.007","article-title":"Robust Euclidean alignment of 3D point sets: The trimmed iterative closest point algorithm","volume":"23","author":"Chetverikov","year":"2005","journal-title":"Image Vis. Comput."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1109\/TPAMI.2007.1155","article-title":"A Thousand Words in a Scene","volume":"29","author":"Quelhas","year":"2007","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1704","DOI":"10.1109\/TPAMI.2011.235","article-title":"Aggregating Local Image Descriptors into Compact Codes","volume":"34","author":"Jegou","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Shen, S., and Hu, Z. (2018, January 5\u20138). Fine-Level Semantic Labeling of Large-Scale 3D Model by Active Learning. Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00066"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/j.cag.2017.11.010","article-title":"SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks","volume":"71","author":"Boulch","year":"2018","journal-title":"Comput. Graph."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (arXiv, 2018). Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2018, December 26). 
Densely Connected Convolutional Networks. Available online: http:\/\/openaccess.thecvf.com\/content_cvpr_2017\/papers\/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Chen, C., Yang, B., Song, S., Tian, M., Li, J., Dai, W., and Fang, L. (2018). Calibrate Multiple Consumer RGB-D Cameras for Low-Cost and Efficient 3D Indoor Mapping. Remote Sens., 10.","DOI":"10.3390\/rs10020328"},{"key":"ref_53","unstructured":"Cabral, R., and Furukawa, Y. (2008, January 23\u201328). Piecewise Planar and Compact Floorplan Reconstruction from Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1007\/s11263-014-0711-y","article-title":"Reconstructing the world\u2019s museums","volume":"110","author":"Xiao","year":"2014","journal-title":"Int. J. Comput. Vis."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"H\u00e4ne, C., Tulsiani, S., and Malik, J. (2017, January 10\u201312). Hierarchical Surface Prediction for 3D Object Reconstruction. Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China.","DOI":"10.1109\/3DV.2017.00054"},{"key":"ref_56","first-page":"2","article-title":"Stereo matching by training a convolutional neural network to compare image patches","volume":"17","author":"Zbontar","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_57","unstructured":"Chang, J.-R., and Chen, Y.-S. (2008, January 23\u201328). Pyramid Stereo Matching Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Yao, Y., Luo, Z., Li, S., Fang, T., and Quan, L. (arXiv, 2018). 
MVSNet: Depth Inference for Unstructured Multi-view Stereo, arXiv.","DOI":"10.1007\/978-3-030-01237-3_47"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/1\/58\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:36:41Z","timestamp":1760197001000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/1\/58"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,29]]},"references-count":58,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,1]]}},"alternative-id":["rs11010058"],"URL":"https:\/\/doi.org\/10.3390\/rs11010058","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,12,29]]}}}