{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T06:48:53Z","timestamp":1774421333850,"version":"3.50.1"},"reference-count":56,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2022,7,3]],"date-time":"2022-07-03T00:00:00Z","timestamp":1656806400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>3D reconstruction is a beneficial technique to generate 3D geometry of scenes or objects for various applications such as computer graphics, industrial construction, and civil engineering. There are several techniques to obtain the 3D geometry of an object. Close-range photogrammetry is an inexpensive, accessible approach to obtaining high-quality object reconstruction. However, state-of-the-art software systems need a stationary scene or a controlled environment (often a turntable setup with a black background), which can be a limiting factor for object scanning. This work presents a method that reduces the need for a controlled environment and allows the capture of multiple objects with independent motion. We achieve this by creating a preprocessing pipeline that uses deep learning to transform a complex scene from an uncontrolled environment into multiple stationary scenes with a black background that is then fed into existing software systems for reconstruction. Our pipeline achieves this by using deep learning models to detect and track objects through the scene. The detection and tracking pipeline uses semantic-based detection and tracking and supports using available pretrained or custom networks. We develop a correction mechanism to overcome some detection and tracking shortcomings, namely, object-reidentification and multiple detections of the same object. We show detection and tracking are effective techniques to address scenes with multiple motion systems and that objects can be reconstructed with limited or no knowledge of the camera or the environment.<\/jats:p>","DOI":"10.3390\/rs14133199","type":"journal-article","created":{"date-parts":[[2022,7,4]],"date-time":"2022-07-04T20:59:18Z","timestamp":1656968358000},"page":"3199","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Partial Scene Reconstruction for Close Range Photogrammetry Using Deep Learning Pipeline for Region Masking"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2704-1630","authenticated-orcid":false,"given":"Mahmoud","family":"Eldefrawy","sequence":"first","affiliation":[{"name":"Department of Computing Sciences, Texas A&M University\u2014Corpus Christi, 6300 Ocean Dr, Corpus Christi, TX 78412, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4022-0388","authenticated-orcid":false,"given":"Scott A.","family":"King","sequence":"additional","affiliation":[{"name":"Department of Computing Sciences, Texas A&M University\u2014Corpus Christi, 6300 Ocean Dr, Corpus Christi, TX 78412, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7996-0594","authenticated-orcid":false,"given":"Michael","family":"Starek","sequence":"additional","affiliation":[{"name":"Department of Computing Sciences, Texas A&M University\u2014Corpus Christi, 6300 Ocean Dr, Corpus Christi, TX 78412, USA"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1007\/s11831-019-09320-4","article-title":"Computational methods of acquisition and processing of 3D point cloud data for construction applications","volume":"27","author":"Wang","year":"2020","journal-title":"Arch. Comput. Methods Eng."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"101169","DOI":"10.1016\/j.aei.2020.101169","article-title":"Development of an unwanted-feature removal system for Structure from Motion of repetitive infrastructure piers using deep learning","volume":"46","author":"Saovana","year":"2020","journal-title":"Adv. Eng. Inform."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/16864360.2016.1199751","article-title":"An innovative photogrammetry color segmentation based technique as an alternative approach to 3D scanning for reverse engineering design","volume":"14","author":"James","year":"2017","journal-title":"Comput.-Aided Des. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Obradovi\u0107, M., Vasiljevi\u0107, I., \u0110uri\u0107, I., Ki\u0107anovi\u0107, J., Stojakovi\u0107, V., and Obradovi\u0107, R. (2020). Virtual Reality Models Based on Photogrammetric Surveys\u2014A Case Study of the Iconostasis of the Serbian Orthodox Cathedral Church of Saint Nicholas in Sremski Karlovci (Serbia). Appl. Sci., 10.","DOI":"10.3390\/app10082743"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"31039","DOI":"10.1007\/s11042-021-10520-z","article-title":"Exploring gestural input for engineering surveys of real-life structures in virtual reality using photogrammetric 3D models","volume":"80","author":"Tadeja","year":"2021","journal-title":"Multimed. Tools Appl."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"119","DOI":"10.5194\/isprs-annals-IV-2-W1-119-2016","article-title":"Smart point cloud: Definition and remaining challenges","volume":"4","author":"Poux","year":"2016","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/j.inffus.2020.11.002","article-title":"Point-cloud based 3D object detection and classification methods for self-driving applications: A survey and taxonomy","volume":"68","author":"Fernandes","year":"2021","journal-title":"Inf. Fusion"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ahmed, H.O., Belhi, A., Alfaqheri, T., Bouras, A., Sadka, A.H., and Foufou, S. (2021, January 23\u201324). A Cost-Effective 3D Acquisition and Visualization Framework for Cultural Heritage. Proceedings of the Fifth International Congress on Information and Communication Technology, London, UK.","DOI":"10.1007\/978-981-15-5859-7_49"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/j.culher.2017.10.011","article-title":"A high-precision photogrammetric recording system for small artifacts","volume":"31","author":"Sapirstein","year":"2018","journal-title":"J. Cult. Herit."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"301","DOI":"10.20965\/ijat.2021.p0301","article-title":"Introduction of All-Around 3D Modeling Methods for Investigation of Plants","volume":"15","author":"Kochi","year":"2021","journal-title":"Int. J. Autom. Technol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"71","DOI":"10.7183\/2326-3768.4.1.71","article-title":"A simple photogrammetry rig for the reliable creation of 3D artifact models in the field: Lithic examples from the Early Upper Paleolithic sequence of Les Cott\u00e9s (France)","volume":"4","author":"Porter","year":"2016","journal-title":"Adv. Archaeol. Pract."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Schonberger, J.L., and Frahm, J.M. (2016, January 27\u201330). Structure-from-motion revisited. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.445"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Bianco, S., Ciocca, G., and Marelli, D. (2018). Evaluating the performance of structure from motion pipelines. J. Imaging, 4.","DOI":"10.3390\/jimaging4080098"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, P., and Qin, T. (2018, January 8\u201314). Stereo vision-based semantic 3d object and ego-motion tracking for autonomous driving. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_40"},{"key":"ref_15","first-page":"1","article-title":"Visual SLAM and structure from motion in dynamic environments: A survey","volume":"51","author":"Saputra","year":"2018","journal-title":"ACM Comput. Surv. CSUR"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40064-016-3573-7","article-title":"Review of visual odometry: Types, approaches, challenges, and applications","volume":"5","author":"Aqel","year":"2016","journal-title":"SpringerPlus"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tang, C., Wang, O., and Tan, P. (2017, January 10\u201312). Gslam: Initialization-robust monocular visual slam via global structure-from-motion. Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China.","DOI":"10.1109\/3DV.2017.00027"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1145\/358669.358692","article-title":"Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography","volume":"24","author":"Fischler","year":"1981","journal-title":"Commun. ACM"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"4076","DOI":"10.1109\/LRA.2018.2860039","article-title":"DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes","volume":"3","author":"Bescos","year":"2018","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1255","DOI":"10.1109\/TRO.2017.2705103","article-title":"Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras","volume":"33","year":"2017","journal-title":"IEEE Trans. Robot."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yu, C., Liu, Z., Liu, X.J., Xie, F., Yang, Y., Wei, Q., and Fei, Q. (2018, January 1\u20135). DS-SLAM: A semantic visual SLAM towards dynamic environments. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593691"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011, January 6\u201313). ORB: An efficient alternative to SIFT or SURF. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126544"},{"key":"ref_24","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Runz, M., Buffier, M., and Agapito, L. (2018, January 16\u201320). Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects. Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany.","DOI":"10.1109\/ISMAR.2018.00024"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xu, B., Li, W., Tzoumanikas, D., Bloesch, M., Davison, A., and Leutenegger, S. (2019, January 20\u201324). Mid-fusion: Octree-based object-level multi-instance dynamic slam. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794371"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1697","DOI":"10.1177\/0278364916669237","article-title":"ElasticFusion: Real-time dense SLAM and light source estimation","volume":"35","author":"Whelan","year":"2016","journal-title":"Int. J. Robot. Res."},{"key":"ref_29","unstructured":"Hruby, P., and Pajdla, T. (2021). Reconstructing Small 3D Objects in front of a Textured Background. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Kundu, A., Krishna, K.M., and Jawahar, C. (2011, January 6\u201313). Realtime multibody visual SLAM with a smoothly moving monocular camera. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126482"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ranftl, R., Vineet, V., Chen, Q., and Koltun, V. (2016, January 27\u201330). Dense monocular depth estimation in complex dynamic scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.440"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1137\/110856733","article-title":"A convex approach to minimal partitions","volume":"5","author":"Chambolle","year":"2012","journal-title":"SIAM J. Imaging Sci."},{"key":"ref_33","unstructured":"Casser, V., Pirk, S., Mahjourian, R., and Angelova, A. (February, January 27). Depth prediction without the sensors: Leveraging structure for unsupervised learning from monocular videos. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Casser, V., Pirk, S., Mahjourian, R., and Angelova, A. (2019, January 16\u201317). Unsupervised monocular depth and ego-motion learning with structure and semantics. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00051"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"20657","DOI":"10.1109\/JSEN.2021.3099511","article-title":"RS-SLAM: A Robust Semantic SLAM in Dynamic Environments Based on RGB-D Sensor","volume":"21","author":"Ran","year":"2021","journal-title":"IEEE Sensors J."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Huang, J., Yang, S., Zhao, Z., Lai, Y.K., and Hu, S.M. (2019, January 16\u201317). Clusterslam: A slam backend for simultaneous rigid body clustering and motion estimation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00597"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Judd, K.M., Gammell, J.D., and Newman, P. (2018, January 1\u20135). Multimotion visual odometry (mvo): Simultaneous estimation of camera and third-party motions. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594213"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bullinger, S., Bodensteiner, C., Wuttke, S., and Arens, M. (2016, January 4\u20138). Moving object reconstruction in monocular video data using boundary generation. Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico.","DOI":"10.1109\/ICPR.2016.7899640"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1109\/LRA.2020.3045647","article-title":"DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation","volume":"6","author":"Wang","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., and Hospedales, T.M. (2018, January 18\u201322). Learning to compare: Relation network for few-shot learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00131"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.neucom.2019.11.023","article-title":"Deep learning in video multi-object tracking: A survey","volume":"381","author":"Ciaparrone","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Bewley, A., Ge, Z., Ott, L., Ramos, F., and Upcroft, B. (2016, January 25\u201328). Simple online and realtime tracking. Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7533003"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Wojke, N., Bewley, A., and Paulus, D. (2017, January 17\u201320). Simple Online and Realtime Tracking with a Deep Association Metric. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8296962"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Wojke, N., and Bewley, A. (2018, January 12\u201315). Deep Cosine Metric Learning for Person Re-identification. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00087"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1115\/1.3662552","article-title":"A new approach to linear filtering and prediction problems","volume":"82","author":"Kalman","year":"1960","journal-title":"J. Basic Eng."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"1330","DOI":"10.1109\/34.888718","article-title":"A flexible new technique for camera calibration","volume":"22","author":"Zhang","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_53","unstructured":"(2020, October 02). CloudCompare\u2014Open Source Project. Available online: https:\/\/www.cloudcompare.org\/doc\/wiki\/index.php?title=Introduction."},{"key":"ref_54","unstructured":"(2022, January 18). Cloud-to-Cloud Distance\u2014CloudCompareWiki. Available online: https:\/\/www.cloudcompare.org\/doc\/wiki\/index.php?title=Cloud-to-Cloud_Distance."},{"key":"ref_55","unstructured":"Boulch, A., Le Saux, B., and Audebert, N. (2017, January 23\u201324). Unstructured Point Cloud Semantic Labeling Using Deep Segmentation Networks. Proceedings of the 3DOR@ Eurographics\u2014Eurographics Workshop on 3D Object Retrieval, Lyon, France."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Guerry, J., Boulch, A., Le Saux, B., Moras, J., Plyer, A., and Filliat, D. (2017, January 22\u201329). Snapnet-r: Consistent 3d multi-view semantic labeling for robotics. Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.85"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/13\/3199\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:42:21Z","timestamp":1760139741000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/13\/3199"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,3]]},"references-count":56,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["rs14133199"],"URL":"https:\/\/doi.org\/10.3390\/rs14133199","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,3]]}}}