{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T15:59:37Z","timestamp":1778947177664,"version":"3.51.4"},"reference-count":43,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2022,5,27]],"date-time":"2022-05-27T00:00:00Z","timestamp":1653609600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Vermont Agency of Transportation"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object geo-localization from images is crucial to many applications such as land surveying, self-driving, and asset management. Current visual object geo-localization algorithms suffer from hardware limitations and impractical assumptions limiting their usability in real-world applications. Most of the current methods assume object sparsity, the presence of objects in at least two frames, and most importantly they only support a single class of objects. In this paper, we present a novel two-stage technique that detects and geo-localizes dense, multi-class objects such as traffic signs from street videos. Our algorithm is able to handle low frame rate inputs in which objects might be missing in one or more frames. We propose a detector that is not only able to detect objects in images, but also predicts a positional offset for each object relative to the camera GPS location. We also propose a novel tracker algorithm that is able to track a large number of multi-class objects. Many current geo-localization datasets require specialized hardware, suffer from idealized assumptions not representative of reality, and are often not publicly available. In this paper, we propose a public dataset called ARTSv2, which is an extension of ARTS dataset that covers a diverse set of roads in widely varying environments to ensure it is representative of real-world scenarios. Our dataset will both support future research and provide a crucial benchmark for the field.<\/jats:p>","DOI":"10.3390\/rs14112575","type":"journal-article","created":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T02:30:06Z","timestamp":1653964206000},"page":"2575","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["Object Tracking and Geo-Localization from Street Images"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8527-0231","authenticated-orcid":false,"given":"Daniel","family":"Wilson","sequence":"first","affiliation":[{"name":"Complex Systems Center, University of Vermont, 194 South Prospect Street Burlington, Burlington, VT 05405, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thayer","family":"Alshaabi","sequence":"additional","affiliation":[{"name":"Complex Systems Center, University of Vermont, 194 South Prospect Street Burlington, Burlington, VT 05405, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Colin","family":"Van Oort","sequence":"additional","affiliation":[{"name":"Complex Systems Center, University of Vermont, 194 South Prospect Street Burlington, Burlington, VT 05405, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6344-9604","authenticated-orcid":false,"given":"Xiaohan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Complex Systems Center, University of Vermont, 194 South Prospect Street Burlington, Burlington, VT 05405, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan","family":"Nelson","sequence":"additional","affiliation":[{"name":"Penn State Department of Geography, 302 N Burrowes Street, University Park, PA 16802, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Safwan","family":"Wshah","sequence":"additional","affiliation":[{"name":"Complex Systems Center, University of Vermont, 194 South Prospect Street Burlington, Burlington, VT 05405, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Chaabane, M., Gueguen, L., Trabelsi, A., Beveridge, R., and O\u2019Hara, S. (2021, January 5\u20139). End-to-End Learning Improves Static Object Geo-Localization From Video. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Virtual.","DOI":"10.1109\/WACV48630.2021.00211"},{"key":"ref_2","unstructured":"Nassar, A.S., Lef\u00e8vre, S., and Wegner, J.D. (November, January 27). Simultaneous multi-view instance detection with learned geometric soft-constraints. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Nassar, A.S., D\u2019Aronco, S., Lef\u00e8vre, S., and Wegner, J.D. (2020, January 23\u201328). GeoGraph: Graph-Based Multi-view Object Detection with Geometric Cues End-to-End. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58571-6_29"},{"key":"ref_4","unstructured":"McManus, C., Churchill, W., Maddern, W., Stewart, A.D., and Newman, P. (June, January 31). Shady dealings: Robust, long-term visual localisation using illumination invariance. Proceedings of the Institute of Electrical and Electronics Engineers (IEEE) International Conference on Robotics and Automation (ICRA), Hong Kong, China."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Suenderhauf, N., Shirazi, S., Jacobson, A., Dayoub, F., Pepperell, E., Upcroft, B., and Milford, M. (2015, January 13\u201317). Place recognition with ConvNet landmarks: Viewpoint-robust, condition-robust, training-free. Proceedings of the Robotics: Science and Systems XI, Rome, Italy.","DOI":"10.15607\/RSS.2015.XI.022"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Krylov, V.A., Kenny, E., and Dahyot, R. (2018). Automatic Discovery and Geotagging of Objects from Street View Imagery. Remote Sens., 10.","DOI":"10.3390\/rs10050661"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Krylov, V.A., and Dahyot, R. (2018, January 7\u201310). Object geolocation using mrf based multi-sensor fusion. Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece.","DOI":"10.1109\/ICIP.2018.8451458"},{"key":"ref_8","unstructured":"Wilson, D., Zhang, X., Sultani, W., and Wshah, S. (2021). Visual and Object Geo-localization: A Comprehensive Survey. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1109\/TITS.2019.2958486","article-title":"ARTS: Automotive Repository of Traffic Signs for the United States","volume":"22","author":"Almutairy","year":"2019","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1109\/MRA.2006.1678144","article-title":"Simultaneous localization and mapping (SLAM): Part II","volume":"13","author":"Bailey","year":"2006","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Szeliski, R. (2010). Computer Vision: Algorithms and Applications, Springer Science & Business Media.","DOI":"10.1007\/978-1-84882-935-0"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Fairfield, N., and Urmson, C. (2011, January 9\u201313). Traffic light mapping and detection. Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China.","DOI":"10.1109\/ICRA.2011.5980164"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.isprsjprs.2012.11.009","article-title":"Detection and 3D reconstruction of traffic signs from multiple view color images","volume":"77","author":"Soheilian","year":"2013","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hebbalaguppe, R., Garg, G., Hassan, E., Ghosh, H., and Verma, A. (2017, January 24\u201331). Telecom Inventory management via object recognition and localisation on Google Street View Images. Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA.","DOI":"10.1109\/WACV.2017.86"},{"key":"ref_15","unstructured":"Dalal, N., and Triggs, B. (2005, January 21\u201323). Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_16","unstructured":"Liu, C.J., Ulicny, M., Manzke, M., and Dahyot, R. (2021). Context Aware Object Geotagging. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN Object detection with Caffe. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_20","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster R-CNN: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lin, T., Goyal, P., Girshick, R.B., He, K., and Doll\u00e1r, P. (2018). Focal Loss for Dense Object Detection. arXiv.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhu, J., Yang, H., Liu, N., Kim, M., Zhang, W., and Yang, M.H. (2018, January 8\u201314). Online multi-object tracking with dual matching attention networks. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_23"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B.B.G., Geiger, A., and Leibe, B. (2019, January 15\u201320). Mots: Multi-object tracking and segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00813"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Son, J., Baek, M., Cho, M., and Han, B. (2017, January 21\u201326). Multi-object tracking with quadruplet convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.403"},{"key":"ref_26","unstructured":"Xu, J., Cao, Y., Zhang, Z., and Hu, H. (November, January 27). Spatial-temporal relation networks for multi-object tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_27","unstructured":"Hua, G., and J\u00e9gou, H. (2016, January 11\u201314). Fully-Convolutional Siamese Networks for Object Tracking. Proceedings of the Computer Vision\u2014ECCV 2016 Workshops, Amsterdam, The Netherlands."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Xiang, Y., Alahi, A., and Savarese, S. (2015, January 7\u201313). Learning to Track: Online Multi-object Tracking by Decision Making. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.534"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. (2020, January 14\u201319). nuScenes: A multimodal dataset for autonomous driving. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_31","unstructured":"Tzutalin (2022, April 05). Tzutalin. LabelImg. Git Code. Available online: https:\/\/github.com\/tzutalin\/labelImg."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1002\/nav.3800020109","article-title":"The Hungarian Method For The Assignment Problem","volume":"2","author":"Kuhn","year":"1955","journal-title":"Nav. Res. Logist. Q."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_34","unstructured":"Kingma, D.P., and Ba, J. (2015). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft COCO: Common Objects in Context. arXiv.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Grabner, H., Grabner, M., and Bischof, H. (2006, January 4\u20137). Real-Time Tracking via On-line Boosting. Proceedings of the British Machine Vision Conference 2006, Edinburgh, UK.","DOI":"10.5244\/C.20.6"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Babenko, B., Yang, M.H., and Belongie, S. (2009, January 20\u201325). Visual tracking with online Multiple Instance Learning. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206737"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/TPAMI.2014.2345390","article-title":"High-Speed Tracking with Kernelized Correlation Filters","volume":"37","author":"Henriques","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1409","DOI":"10.1109\/TPAMI.2011.239","article-title":"Tracking-Learning-Detection","volume":"34","author":"Kalal","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Kalal, Z., Mikolajczyk, K., and Matas, J. (2010, January 23\u201326). Forward-Backward Error: Automatic Detection of Tracking Failures. Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey.","DOI":"10.1109\/ICPR.2010.675"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Held, D., Thrun, S., and Savarese, S. (2016). Learning to Track at 100 FPS with Deep Regression Networks. arXiv.","DOI":"10.1007\/978-3-319-46448-0_45"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Bolme, D., Beveridge, J., Draper, B., and Lui, Y. (2010, January 13\u201318). Visual object tracking using adaptive correlation filters. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539960"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1007\/s11263-017-1061-3","article-title":"Discriminative Correlation Filter with Channel and Spatial Reliability","volume":"126","author":"Matas","year":"2018","journal-title":"Int. J. Comput. Vis."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/11\/2575\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:19:55Z","timestamp":1760138395000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/11\/2575"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,27]]},"references-count":43,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["rs14112575"],"URL":"https:\/\/doi.org\/10.3390\/rs14112575","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,27]]}}}