{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T11:39:24Z","timestamp":1765280364473,"version":"3.41.2"},"reference-count":39,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T00:00:00Z","timestamp":1693180800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>Person-following is a crucial capability for service robots, and the employment of vision technology is a leading trend in building environmental understanding. While most existing methodologies rely on a tracking-by-detection strategy, which necessitates extensive datasets for training and yet remains susceptible to environmental noise, we propose a novel approach: real-time tracking-by-segmentation with a future motion estimation framework. This framework facilitates pixel-level tracking of a target individual and predicts their future motion. Our strategy leverages a single-shot segmentation tracking neural network for precise foreground segmentation to track the target, overcoming the limitations of using a rectangular region of interest (ROI). Here we clarify that, while the ROI provides a broad context, the segmentation within this bounding box offers a detailed and more accurate position of the human subject. To further improve our approach, a classification-lock pre-trained layer is utilized to form a constraint that curbs feature outliers originating from the person being tracked. A discriminative correlation filter estimates the potential target region in the scene to prevent foreground misrecognition, while a motion estimation neural network anticipates the target's future motion for use in the control module. 
We validated our proposed methodology using the VOT, LaSOT, YouTube-VOS, and DAVIS tracking datasets, demonstrating its effectiveness. Notably, our framework supports long-term person-following tasks in indoor environments, showing promise for practical implementation in service robots.<\/jats:p>","DOI":"10.3389\/fnbot.2023.1255085","type":"journal-article","created":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T15:13:33Z","timestamp":1693235613000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Tracking by segmentation with future motion estimation applied to person-following robots"],"prefix":"10.3389","volume":"17","author":[{"given":"Shenlu","family":"Jiang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Runze","family":"Cui","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Runze","family":"Wei","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiyang","family":"Fu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhonghua","family":"Hong","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guofu","family":"Feng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2023,8,28]]},"reference":[{"key":"B1","first-page":"221","article-title":"\u201cOne-shot video object segmentation,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Caelles","year":"2017"},{"key":"B2","first-page":"6668","article-title":"\u201cSiamese box adaptive network for visual tracking,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern 
Recognition","author":"Chen","year":"2020"},{"key":"B3","doi-asserted-by":"crossref","first-page":"2096","DOI":"10.1109\/IROS40897.2019.8967645","article-title":"\u201cPerson-following for telepresence robots using web cameras,\u201d","volume-title":"2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)","author":"Cheng","year":"2019"},{"key":"B4","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1080\/10798587.2016.1159059","article-title":"Particle filter planar target tracking with a monocular camera for mobile robots","volume":"23","author":"Chou","year":"2017","journal-title":"Intell. Autom. Soft Comput"},{"key":"B5","doi-asserted-by":"crossref","first-page":"4335","DOI":"10.1109\/ICRA.2013.6631191","article-title":"\u201cAutonomous person following for telepresence robots,\u201d","volume-title":"2013 IEEE International Conference on Robotics and Automation","author":"Cosgun","year":"2013"},{"key":"B6","first-page":"1","article-title":"\u201cHistograms of oriented gradients for human detection,\u201d","volume-title":"IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)","author":"Dalal","year":"2005"},{"key":"B7","first-page":"7183","article-title":"\u201cProbabilistic regression for visual tracking,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Danelljan","year":"2020"},{"key":"B8","doi-asserted-by":"publisher","first-page":"1532","DOI":"10.1109\/TPAMI.2014.2300479","article-title":"Fast feature pyramids for object detection","volume":"36","author":"Doll\u00e1r","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell"},{"key":"B9","first-page":"5937","article-title":"\u201cIm2Flow: motion hallucination from static images for action recognition,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Gao","year":"2018"},{"key":"B10","doi-asserted-by":"publisher","first-page":"2526","DOI":"10.1109\/TIP.2018.2806280","article-title":"Good features to correlate for visual tracking","volume":"27","author":"Gundogdu","year":"2018","journal-title":"IEEE Trans. Image Process"},{"key":"B11","first-page":"749","article-title":"\u201cLearning to track at 100 fps with deep regression networks,\u201d","volume-title":"European Conference on Computer Vision","author":"Held","year":"2016"},{"key":"B12","first-page":"1314","article-title":"\u201cSearching for mobilenetv3,\u201d","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Howard","year":"2019"},{"key":"B13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TIM.2022.3160534","article-title":"We know where they are looking at from the RGB-D camera: gaze following in 3D","volume":"71","author":"Hu","year":"2022","journal-title":"IEEE Trans. Instrum. 
Meas"},{"key":"B14","doi-asserted-by":"publisher","first-page":"3903","DOI":"10.3390\/s18113903","article-title":"A classification-lock tracking strategy allowing a person-following robot to operate in a complicated indoor environment","volume":"18","author":"Jiang","year":"2018","journal-title":"Sensors"},{"key":"B15","first-page":"1339","article-title":"\u201cObject tracking by reconstruction with view-specific discriminative correlation filters,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Kart","year":"2019"},{"key":"B16","first-page":"273","article-title":"\u201cInstance-level future motion estimation in a single image based on ordinal regression,\u201d","author":"Kim","year":"2019","journal-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (IEEE)"},{"key":"B17","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1016\/j.robot.2016.07.004","article-title":"Identification of a specific person using color, height, and gait features for a person following robot","volume":"84","author":"Koide","year":"2016","journal-title":"Robot. Auton. Syst"},{"key":"B18","doi-asserted-by":"publisher","first-page":"103348","DOI":"10.1016\/j.robot.2019.103348","article-title":"Monocular person tracking and identification with on-line deep feature selection for person following robots","volume":"124","author":"Koide","year":"2020","journal-title":"Robot. Auton. 
Syst"},{"key":"B19","article-title":"\u201cThe sixth visual object tracking vot2018 challenge results,\u201d","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV)","author":"Kristan","year":"2018"},{"key":"B20","first-page":"8971","article-title":"\u201cHigh performance visual tracking with siamese region proposal network,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Li","year":"2018"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1038","DOI":"10.1109\/TIE.2011.2161248","article-title":"A robust real-time embedded vision system on an unmanned rotorcraft for ground target following","volume":"59","author":"Lin","year":"2012","journal-title":"IEEE Trans. Ind. Electron"},{"key":"B22","first-page":"7133","article-title":"\u201cD3S-a discriminative single shot segmentation tracker,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Lukezic","year":"2020"},{"key":"B23","doi-asserted-by":"publisher","first-page":"4276","DOI":"10.1109\/TIM.2018.2890400","article-title":"OPTICS-based template matching for vision sensor-based shoe detection in human-robot coexisting environments","volume":"68","author":"Paral","year":"2019","journal-title":"IEEE Trans. Instrum. Meas"},{"key":"B24","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2016.85","article-title":"\u201cA benchmark dataset and evaluation methodology for video object segmentation,\u201d","volume-title":"Comp. Vis. Patt. 
Recognition","author":"Perazzi","year":"2016"},{"key":"B25","doi-asserted-by":"publisher","first-page":"108339","DOI":"10.1016\/j.nanoen.2023.108339","article-title":"Self-powered difunctional sensors based on sliding contact-electrification and tribovoltaic effects for pneumatic monitoring and controlling","volume":"110","author":"Shi","year":"","journal-title":"Nano Energy"},{"key":"B26","doi-asserted-by":"publisher","first-page":"110001","DOI":"10.1016\/j.ymssp.2022.110001","article-title":"Center-based transfer feature learning with classifier adaptation for surface defect recognition","volume":"188","author":"Shi","year":"","journal-title":"Mech. Syst. Signal Process"},{"key":"B27","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v31i1.11231","article-title":"\u201cInception-v4, inception-resnet and the impact of residual connections on learning,\u201d","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31","author":"Szegedy","year":"2017"},{"key":"B28","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2019.00971","article-title":"\u201cFEELVOS: fast end-to-end embedding learning for video object segmentation,\u201d","volume-title":"Comp. Vis. Patt. Recognition","author":"Voigtlaender","year":"2019"},{"key":"B29","doi-asserted-by":"crossref","DOI":"10.5244\/C.31.116","article-title":"\u201cOnline adaptation of convolutional neural networks for video object segmentation,\u201d","volume-title":"Proc. British Machine Vision Conference","author":"Voigtlaender","year":"2017"},{"key":"B30","doi-asserted-by":"publisher","first-page":"997","DOI":"10.1109\/TMECH.2018.2820172","article-title":"Accurate and real-time 3-D tracking for the following robots by fusing vision and ultrasonar information","volume":"23","author":"Wang","year":"2018","journal-title":"IEEE\/ASME Trans. 
Mechatron"},{"key":"B31","first-page":"1328","article-title":"\u201cFast online object tracking and segmentation: a unifying approach,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang","year":"2019"},{"key":"B32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TIM.2021.3073712","article-title":"A UHF RFID-based dynamic object following method for a mobile robot using phase difference information","volume":"70","author":"Wu","year":"2021","journal-title":"IEEE Trans. Instrum. Meas"},{"key":"B33","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1809.03327","article-title":"YouTube-VOS: a large-scale video object segmentation benchmark","author":"Xu","year":"2018","journal-title":"arXiv"},{"key":"B34","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2018.00680","article-title":"\u201cEfficient video object segmentation via network modulation,\u201d","volume-title":"Comp. Vis. Patt. Recognition","author":"Yang","year":"2018"},{"key":"B35","first-page":"9","article-title":"\u201cDevelopment of a person following robot with vision based target detection,\u201d","volume-title":"Proceedings of IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2006)","author":"Yoshimi","year":"2006"},{"key":"B36","doi-asserted-by":"crossref","first-page":"4203","DOI":"10.1109\/ICRA40945.2020.9197374","article-title":"\u201cVisual odometry revisited: what should be learnt,\u201d","volume-title":"2020 IEEE International Conference on Robotics and Automation (ICRA)","author":"Zhan","year":"2020"},{"key":"B37","doi-asserted-by":"publisher","first-page":"9360","DOI":"10.1109\/TIE.2019.2893829","article-title":"Vision-based target-following guider for mobile robot","volume":"66","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Ind. 
Electron"},{"key":"B38","doi-asserted-by":"publisher","first-page":"1997","DOI":"10.1109\/TMECH.2021.3083594","article-title":"Efficient motion planning based on kinodynamic model for quadruped robots following persons in confined spaces","volume":"26","author":"Zhang","year":"2021","journal-title":"IEEE\/ASME Trans. Mechatronics"},{"key":"B39","doi-asserted-by":"publisher","first-page":"4270","DOI":"10.1109\/TIM.2019.2942533","article-title":"An end-to-end calibration method for welding robot laser vision systems with deep reinforcement learning","volume":"69","author":"Zou","year":"2019","journal-title":"IEEE Trans. Instrum. Meas"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1255085\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T15:13:46Z","timestamp":1693235626000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1255085\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,28]]},"references-count":39,"alternative-id":["10.3389\/fnbot.2023.1255085"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2023.1255085","relation":{},"ISSN":["1662-5218"],"issn-type":[{"type":"electronic","value":"1662-5218"}],"subject":[],"published":{"date-parts":[[2023,8,28]]},"article-number":"1255085"}}