{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:48:14Z","timestamp":1740185294974,"version":"3.37.3"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,10,11]],"date-time":"2023-10-11T00:00:00Z","timestamp":1696982400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,11]],"date-time":"2023-10-11T00:00:00Z","timestamp":1696982400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["12272404"],"award-info":[{"award-number":["12272404"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Postgraduate Research Innovation Project of Hunan Province of China","award":["CX20210041"],"award-info":[{"award-number":["CX20210041"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Vis. Intell."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Monocular object pose tracking has been a key technology in autonomous rendezvous of two moving platforms. However, rapid relative motion between platforms causes large interframe pose shifts, which leads to pose tracking failure. Based on the derivation of the region-based pose tracking method and the theory of rigid body kinematics, we put forward that the stability of the color segmentation model and linearization in pose optimization are the key to region-based monocular object pose tracking. 
A reliable metric named VoI is designed to measure interframe pose shifts, based on which we argue that motion continuity recovery is a promising way to tackle the translation-dominant large pose shift issue. Then, a 2D tracking method is adopted to bridge the interframe motion continuity gap. For texture-rich objects, the motion continuity can be recovered through localized region-based pose transferring, which is performed by solving a PnP (Perspective-n-Point) problem within the tracked 2D bounding boxes of two adjacent frames. Moreover, for texture-less objects, a direct translation approach is introduced to estimate an intermediate pose of the frame. Finally, a region-based pose refinement is exploited to obtain the final tracked pose. Experimental results on synthetic and real image sequences indicate that the proposed method achieves superior performance to state-of-the-art methods in tracking objects with large pose shifts.<\/jats:p>","DOI":"10.1007\/s44267-023-00023-w","type":"journal-article","created":{"date-parts":[[2023,10,11]],"date-time":"2023-10-11T11:03:27Z","timestamp":1697022207000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Robust monocular object pose tracking for large pose shift using 2D 
tracking"],"prefix":"10.1007","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0266-7426","authenticated-orcid":false,"given":"Qiufu","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7106-404X","authenticated-orcid":false,"given":"Jiexin","family":"Zhou","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhang","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3018-8043","authenticated-orcid":false,"given":"Xiaoliang","family":"Sun","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qifeng","family":"Yu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,10,11]]},"reference":[{"key":"23_CR1","first-page":"666","volume-title":"Proceedings of the 15th Asian conference on computer vision","author":"M. Stoiber","year":"2020","unstructured":"Stoiber, M., Pfanne, M., Strobl, K., Triebel, R., & Albu-Sch\u00e4ffer, A. (2020). A sparse Gaussian approach to region-based 6DoF object tracking. In H. Ishikawa, C.-L. Liu, T. Pajdla, et al. (Eds.), Proceedings of the 15th Asian conference on computer vision (pp. 666\u2013682). Cham: Springer."},{"issue":"3","key":"23_CR2","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1007\/s11390-021-1272-5","volume":"36","author":"J.-C. Li","year":"2021","unstructured":"Li, J.-C., Zhong, F., Xu, S.-H., & Qin, X.-Y. (2021). 3D object tracking with adaptively weighted local bundles. 
Journal of Computer Science and Technology, 36(3), 555\u2013571.","journal-title":"Journal of Computer Science and Technology"},{"issue":"8","key":"23_CR3","doi-asserted-by":"publisher","first-page":"1797","DOI":"10.1109\/TPAMI.2018.2884990","volume":"41","author":"H. Tjaden","year":"2019","unstructured":"Tjaden, H., Schwanecke, U., Sch\u00f6mer, E., & Cremers, D. (2019). A region-based Gauss-Newton approach to real-time monocular multiple object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 1797\u20131812.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"4","key":"23_CR4","doi-asserted-by":"publisher","first-page":"1008","DOI":"10.1007\/s11263-022-01579-8","volume":"130","author":"M. Stoiber","year":"2022","unstructured":"Stoiber, M., Pfanne, M., Strobl, K. H., Triebel, R., & Albu-Sch\u00e4ffer, A. (2022). SRT3D: a sparse region-based 3D object tracking approach for the real world. International Journal of Computer Vision, 130(4), 1008\u20131030.","journal-title":"International Journal of Computer Vision"},{"issue":"11","key":"23_CR5","doi-asserted-by":"publisher","first-page":"4409","DOI":"10.1109\/TCSVT.2021.3053696","volume":"31","author":"X. Sun","year":"2021","unstructured":"Sun, X., Zhou, J., Zhang, W., Wang, Z., & Yu, Q. (2021). Robust monocular pose tracking of less-distinct objects based on contour-part model. IEEE Transactions on Circuits and Systems for Video Technology, 31(11), 4409\u20134421.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"issue":"1","key":"23_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/0600000001","volume":"1","author":"V. Lepetit","year":"2005","unstructured":"Lepetit, V. & Fua, P. (2005). Monocular model-based 3D tracking of rigid objects: a survey. 
Foundations and Trends in Computer Graphics and Vision, 1(1), 1\u201389.","journal-title":"Foundations and Trends in Computer Graphics and Vision"},{"issue":"1","key":"23_CR7","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1109\/TVCG.2013.94","volume":"20","author":"B.-K. Seo","year":"2013","unstructured":"Seo, B.-K., Park, H., Park, J.-I., Hinterstoisser, S., & Ilic, S. (2013). Optimal local searching for fast and robust textureless 3D object tracking in highly cluttered backgrounds. IEEE Transactions on Visualization and Computer Graphics, 20(1), 99\u2013110.","journal-title":"IEEE Transactions on Visualization and Computer Graphics"},{"key":"23_CR8","doi-asserted-by":"publisher","first-page":"979","DOI":"10.1007\/s00371-015-1098-7","volume":"31","author":"G. Wang","year":"2015","unstructured":"Wang, G., Wang, B., Zhong, F., Qin, X., & Chen, B. (2015). Global optimal searching for textureless 3D object tracking. The Visual Computer, 31, 979\u2013988.","journal-title":"The Visual Computer"},{"key":"23_CR9","doi-asserted-by":"publisher","first-page":"973","DOI":"10.1007\/s11263-018-1119-x","volume":"127","author":"L. Zhong","year":"2019","unstructured":"Zhong, L., & Zhang, L. (2019). A robust monocular 3D object tracking method combining statistical and photometric constraints. International Journal of Computer Vision, 127, 973\u2013992.","journal-title":"International Journal of Computer Vision"},{"key":"23_CR10","first-page":"674","volume-title":"Proceedings of the 7th international joint conference on artificial intelligence","author":"B. D. Lucas","year":"1981","unstructured":"Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In P.J. Hayes (Ed.), Proceedings of the 7th international joint conference on artificial intelligence (pp. 674\u2013679). 
Los Altos: William Kaufmann."},{"key":"23_CR11","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1023\/B:VISI.0000011205.11775.fd","volume":"56","author":"S. Baker","year":"2004","unstructured":"Baker, S., & Matthews, I. (2004). Lucas-Kanade 20 years on: a unifying framework. International Journal of Computer Vision, 56, 221\u2013255.","journal-title":"International Journal of Computer Vision"},{"key":"23_CR12","first-page":"389","volume-title":"Proceedings of the 4th international conference on 3D vision","author":"H. Alismail","year":"2016","unstructured":"Alismail, H., Browning, B., & Lucey, S. (2016). Robust tracking in low light and sudden illumination changes. In Proceedings of the 4th international conference on 3D vision (pp. 389\u2013398). Los Alamitos: IEEE."},{"key":"23_CR13","first-page":"4429","volume-title":"2017 IEEE international conference on robotics and automation","author":"L. Chen","year":"2017","unstructured":"Chen, L., Zhou, F., Shen, Y., Tian, X., Ling, H., & Chen, Y. (2017). Illumination insensitive efficient second-order minimization for planar object tracking. In 2017 IEEE international conference on robotics and automation (pp. 4429\u20134436). Los Alamitos: IEEE."},{"key":"23_CR14","first-page":"3414","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"A. Crivellaro","year":"2014","unstructured":"Crivellaro, A., & Lepetit, V. (2014). Robust 3D tracking with descriptor fields. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3414\u20133421). Piscataway: IEEE."},{"key":"23_CR15","doi-asserted-by":"publisher","first-page":"551","DOI":"10.1007\/978-3-319-49409-8_48","volume-title":"The European conference on computer vision 2016 workshops","author":"B.-K. Seo","year":"2016","unstructured":"Seo, B.-K., & Wuest, H. (2016). A direct method for robust model-based 3D object tracking from a monocular RGB image. In G. Hua & H. 
J\u00e9gou (Eds.), The European conference on computer vision 2016 workshops (pp. 551\u2013562). Berlin: Springer."},{"key":"23_CR16","doi-asserted-by":"publisher","first-page":"1449","DOI":"10.1109\/ICCV.2013.183","volume-title":"2013 IEEE international conference on computer vision","author":"J. Engel","year":"2013","unstructured":"Engel, J., Sturm, J., & Cremers, D. (2013). Semi-dense visual odometry for a monocular camera. In 2013 IEEE international conference on computer vision (pp. 1449\u20131456). Piscataway: IEEE."},{"issue":"3","key":"23_CR17","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1109\/TPAMI.2017.2658577","volume":"40","author":"J. Engel","year":"2017","unstructured":"Engel, J., Koltun, V., & Cremers, D. (2017). Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3), 611\u2013625.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"9","key":"23_CR18","doi-asserted-by":"publisher","first-page":"2302","DOI":"10.1109\/TCSVT.2017.2731519","volume":"28","author":"L. Zhong","year":"2017","unstructured":"Zhong, L., Lu, M., & Zhang, L. (2017). A direct 3D object tracking method based on dynamic textured model rendering and extended dense feature fields. IEEE Transactions on Circuits and Systems for Video Technology, 28(9), 2302\u20132315.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"issue":"12","key":"23_CR19","doi-asserted-by":"publisher","first-page":"2200","DOI":"10.1109\/TCSVT.2015.2430652","volume":"26","author":"K. Pauwels","year":"2015","unstructured":"Pauwels, K., Rubio, L., & Ros, E. (2015). Real-time pose detection and tracking of hundreds of objects. 
IEEE Transactions on Circuits and Systems for Video Technology, 26(12), 2200\u20132214.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"23_CR20","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1007\/s11263-011-0514-3","volume":"98","author":"V. A. Prisacariu","year":"2012","unstructured":"Prisacariu, V. A., & Reid, I. D. (2012). PWP3D: real-time segmentation and tracking of 3d objects. International Journal of Computer Vision, 98, 335\u2013354.","journal-title":"International Journal of Computer Vision"},{"key":"23_CR21","first-page":"423","volume-title":"Proceedings of the 14th European conference on computer vision","author":"H. Tjaden","year":"2016","unstructured":"Tjaden, H., Schwanecke, U., & Sch\u00f6mer, E. (2016). Real-time monocular segmentation and pose tracking of multiple objects. In B. Leibe, J. Matas, N. Sebe, et al. (Eds.), Proceedings of the 14th European conference on computer vision (pp. 423\u2013438). Berlin: Springer."},{"key":"23_CR22","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1007\/s11263-015-0873-2","volume":"118","author":"J. Hexner","year":"2016","unstructured":"Hexner, J., & Hagege, R. R. (2016). 2D-3D pose estimation of heterogeneous objects using a region based approach. International Journal of Computer Vision, 118, 95\u2013112.","journal-title":"International Journal of Computer Vision"},{"key":"23_CR23","first-page":"124","volume-title":"2017 IEEE international conference on computer vision","author":"H. Tjaden","year":"2017","unstructured":"Tjaden, H., Schwanecke, U., & Sch\u00f6mer, E. (2017). Real-time monocular pose estimation of 3D objects using temporally consistent local color histograms. In 2017 IEEE international conference on computer vision (pp. 124\u2013132). Piscataway: IEEE."},{"key":"23_CR24","doi-asserted-by":"publisher","first-page":"5065","DOI":"10.1109\/TIP.2020.2973512","volume":"29","author":"L. 
Zhong","year":"2020","unstructured":"Zhong, L., Zhao, X., Zhang, Y., Zhang, S., & Zhang, L. (2020). Occlusion-aware region-based 3D pose tracking of objects with temporally consistent polar-based local partitioning. IEEE Transactions on Image Processing, 29, 5065\u20135078.","journal-title":"IEEE Transactions on Image Processing"},{"issue":"12","key":"23_CR25","doi-asserted-by":"publisher","first-page":"4319","DOI":"10.1109\/TVCG.2021.3085197","volume":"28","author":"H. Huang","year":"2021","unstructured":"Huang, H., Zhong, F., & Qin, X. (2021). Pixel-wise weighted region-based 3D object tracking using contour constraints. IEEE Transactions on Visualization and Computer Graphics, 28(12), 4319\u20134331.","journal-title":"IEEE Transactions on Visualization and Computer Graphics"},{"issue":"12","key":"23_CR26","doi-asserted-by":"publisher","first-page":"6727","DOI":"10.1109\/JSEN.2020.2976202","volume":"20","author":"Y. Liu","year":"2020","unstructured":"Liu, Y., Sun, P., & Namiki, A. (2020). Target tracking of moving and rotating object by high-speed monocular active vision. IEEE Sensors Journal, 20(12), 6727\u20136744.","journal-title":"IEEE Sensors Journal"},{"key":"23_CR27","first-page":"745","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"W. Kehl","year":"2017","unstructured":"Kehl, W., Tombari, F., Ilic, S., & Navab, N. (2017). Real-time 3D model tracking in color and depth on a single CPU core. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 745\u2013753). Piscataway: IEEE."},{"key":"23_CR28","first-page":"833","volume-title":"Proceedings of the 15th European conference on computer vision","author":"F. Manhardt","year":"2018","unstructured":"Manhardt, F., Kehl, W., Navab, N., & Tombari, F. (2018). Deep model-based 6d pose refinement in RGB. In F. Manhardt, W. Kehl, N. Navab, et al. (Eds.), Proceedings of the 15th European conference on computer vision (pp. 
833\u2013849). Cham: Springer."},{"key":"23_CR29","first-page":"683","volume-title":"Proceedings of 15th European conference on computer vision","author":"Y. Li","year":"2018","unstructured":"Li, Y., Wang, G., Ji, X., Xiang, Y., & Fox, D. (2018). DeepIM: deep iterative matching for 6d pose estimation. In V. Ferrari, M. Hebert, C. Sminchisescu, et al. (Eds.), Proceedings of 15th European conference on computer vision (pp. 683\u2013698). Cham: Springer."},{"issue":"5","key":"23_CR30","doi-asserted-by":"publisher","first-page":"1328","DOI":"10.1109\/TRO.2021.3056043","volume":"37","author":"X. Deng","year":"2021","unstructured":"Deng, X., Mousavian, A., Xiang, Y., Xia, F., Bretl, T., & Fox, D. (2021). PoseRBPF: a Rao\u2013Blackwellized particle filter for 6-d object pose tracking. IEEE Transactions on Robotics, 37(5), 1328\u20131342.","journal-title":"IEEE Transactions on Robotics"},{"issue":"4","key":"23_CR31","doi-asserted-by":"publisher","first-page":"5159","DOI":"10.1109\/LRA.2020.3003866","volume":"5","author":"L. Zhong","year":"2020","unstructured":"Zhong, L., Zhang, Y., Zhao, H., Chang, A., Xiang, W., Zhang, S., et al. (2020). Seeing through the occluders: robust monocular 6-DOF object pose tracking via model-guided video object segmentation. IEEE Robotics and Automation Letters, 5(4), 5159\u20135166.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"23_CR32","first-page":"5389","volume-title":"2019 IEEE international conference on computer vision","author":"H. N. Hu","year":"2019","unstructured":"Hu, H. N., Cai, Q. Z., Wang, D., Lin, J., Sun, M., Kraehenbuehl, P., et al. (2019). Joint monocular 3D vehicle detection and tracking. In 2019 IEEE international conference on computer vision (pp. 5389\u20135398). Piscataway: IEEE."},{"key":"23_CR33","unstructured":"Ahmadyan, A., Hou, T., Wei, J., Zhang, L., Ablavatski, A., & Grundmann, M. (2020). Instant 3D object tracking with applications in augmented reality. arXiv preprint. 
arXiv:2006.13194."},{"key":"23_CR34","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1007\/s11263-008-0152-6","volume":"81","author":"V. Lepetit","year":"2009","unstructured":"Lepetit, V., Moreno-Noguer, F., & Fua, P. (2009). EPnP: an accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 81, 155\u2013166.","journal-title":"International Journal of Computer Vision"},{"key":"23_CR35","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1007\/s11263-006-8711-1","volume":"72","author":"D. Cremers","year":"2007","unstructured":"Cremers, D., Rousson, M., & Deriche, R. (2007). A review of statistical approaches to level set segmentation: integrating color, texture, motion and shape. International Journal of Computer Vision, 72, 195\u2013215.","journal-title":"International Journal of Computer Vision"},{"key":"23_CR36","volume-title":"A mathematical introduction to robotic manipulation","author":"R. M. Murray","year":"1994","unstructured":"Murray, R. M., Li, Z., & Sastry, S.S. (1994). A mathematical introduction to robotic manipulation. Boca Raton: CRC Press."},{"key":"23_CR37","unstructured":"Denninger, M., Sundermeyer, M., Winkelbauer, D., Zidan, Y., Olefir, D., Elbadrawy, M., Lodhi, A., & Katam, H. (2019). BlenderProc. arXiv preprint arXiv:1911.01911."},{"key":"23_CR38","first-page":"1401","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"L. Bertinetto","year":"2016","unstructured":"Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., & Torr, P. H. S. (2016). Staple: complementary learners for real-time tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1401\u20131409). Piscataway: IEEE."},{"key":"23_CR39","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","volume":"60","author":"D. G. Lowe","year":"2004","unstructured":"Lowe, D. G. (2004). 
Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91\u2013110.","journal-title":"International Journal of Computer Vision"},{"key":"23_CR40","unstructured":"Madsen, K., Bruun Nielsen, H., & Tingleff, O. (2004). Methods for non-linear least squares problems. Retrieved July 15, 2023, from https:\/\/plato.asu.edu\/ftp\/hbn_lectures\/meth_nonlin_lsq.pdf."},{"issue":"6","key":"23_CR41","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1145\/358669.358692","volume":"24","author":"M. A. Fischler","year":"1981","unstructured":"Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381\u2013395.","journal-title":"Communications of the ACM"}],"container-title":["Visual Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44267-023-00023-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44267-023-00023-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44267-023-00023-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T06:04:07Z","timestamp":1700460247000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44267-023-00023-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,11]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["23"],"URL":"https:\/\/doi.org\/10.1007\/s44267-023-00023-w","relation":{},"ISSN":["2731-9008"],"issn-type":[{"type":"electronic","value":"2731-9008"}],
"subject":[],"published":{"date-parts":[[2023,10,11]]},"assertion":[{"value":"29 March 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 August 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 August 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 October 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"22"}}