{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T19:10:25Z","timestamp":1777317025662,"version":"3.51.4"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T00:00:00Z","timestamp":1712707200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T00:00:00Z","timestamp":1712707200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["4"],"award-info":[{"award-number":["4"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Intell Robot Syst"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Vision-based deep learning perception fulfills a paramount role in robotics, facilitating solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery. Control-oriented end-to-end perception approaches, which directly output control variables for the robot, commonly take advantage of the robot\u2019s state estimation as an auxiliary input. When intermediate outputs are estimated and fed to a lower-level controller, i.e., mediated approaches, the robot\u2019s state is commonly used as an input only for egocentric tasks, which estimate physical properties of the robot itself. 
In this work, we propose to apply a similar approach for the first time \u2013 to the best of our knowledge \u2013 to non-egocentric mediated tasks, where the estimated outputs refer to an external subject. We prove how our general methodology improves the regression performance of deep convolutional neural networks (CNNs) on a broad class of non-egocentric 3D pose estimation problems, with minimal computational cost. By analyzing three highly-different use cases, spanning from grasping with a robotic arm to following a human subject with a pocket-sized UAV, our results consistently improve the R<jats:inline-formula><jats:alternatives><jats:tex-math>$$^{2}$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:msup>\n                    <mml:mrow\/>\n                    <mml:mn>2<\/mml:mn>\n                  <\/mml:msup>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> regression metric, up to +0.51, compared to their stateless baselines. Finally, we validate the in-field performance of a closed-loop autonomous cm-scale UAV on the human pose estimation task. 
Our results show a significant reduction, i.e., 24% on average, on the mean absolute error of our stateful CNN, compared to a State-of-the-Art stateless counterpart.<\/jats:p>","DOI":"10.1007\/s10846-024-02091-6","type":"journal-article","created":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T09:02:20Z","timestamp":1712739740000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Vision-state Fusion: Improving Deep Neural Networks for Autonomous Robotics"],"prefix":"10.1007","volume":"110","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8888-3538","authenticated-orcid":false,"given":"Elia","family":"Cereda","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefano","family":"Bonato","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3736-0419","authenticated-orcid":false,"given":"Mirko","family":"Nava","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1240-0768","authenticated-orcid":false,"given":"Alessandro","family":"Giusti","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4487-0836","authenticated-orcid":false,"given":"Daniele","family":"Palossi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,4,10]]},"reference":[{"key":"2091_CR1","doi-asserted-by":"crossref","unstructured":"Pinto, L., Gupta, A.: Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In: IEEE international conference on robotics and automation (ICRA). 
IEEE 2016, 3406\u20133413 (2016)","DOI":"10.1109\/ICRA.2016.7487517"},{"key":"2091_CR2","doi-asserted-by":"publisher","unstructured":"Palossi, D., Zimmerman, N., Burrello, A., Conti, F., M\u00fcller, H., Gambardella, L.M., Benini, L., Giusti, A., Guzzi, J.: Fully onboard AI-powered human-drone pose estimation on ultra-low power autonomous flying nano-UAVs, IEEE Internet Things J. (2021) pp. 1\u20131. https:\/\/doi.org\/10.1109\/JIOT.2021.3091643","DOI":"10.1109\/JIOT.2021.3091643"},{"key":"2091_CR3","doi-asserted-by":"publisher","unstructured":"Loquercio, A., Kaufmann, E., Ranftl, R., M\u00fcller, M., Koltun, V., Scaramuzza, D.: Learning high-speed flight in the wild. Sci. Robot. 6(59), (2021) eabg5810. https:\/\/doi.org\/10.1126\/scirobotics.abg5810","DOI":"10.1126\/scirobotics.abg5810"},{"key":"2091_CR4","doi-asserted-by":"crossref","unstructured":"Kaufmann, E., Loquercio, A., Ranftl, R., Mueller, M., Koltun, V., Scaramuzza, D.: Deep drone acrobatics. In: Robotics science and systems XVI, pp. 4780\u20134783 (2020)","DOI":"10.15607\/RSS.2020.XVI.040"},{"key":"2091_CR5","doi-asserted-by":"publisher","unstructured":"Clark, R., Wang, S., Wen, H., Markham, A., Trigoni, N.: VINet: Visual-inertial odometry as a sequence-to-sequence learning problem. Proceedings of the AAAI conference on artificial intelligence 31(1) (2017). https:\/\/doi.org\/10.1609\/aaai.v31i1.11215","DOI":"10.1609\/aaai.v31i1.11215"},{"key":"2091_CR6","doi-asserted-by":"publisher","first-page":"6906","DOI":"10.1109\/IROS40897.2019.8968467","volume":"2019","author":"L Han","year":"2019","unstructured":"Han, L., Lin, Y., Du, G., Lian, S.: DeepVIO: Self-supervised deep learning of monocular visual inertial odometry using 3d geometric constraints, in. IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019, 6906\u20136913 (2019). 
https:\/\/doi.org\/10.1109\/IROS40897.2019.8968467","journal-title":"IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)"},{"key":"2091_CR7","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1016\/j.cortex.2018.04.003","volume":"104","author":"N Abekawa","year":"2018","unstructured":"Abekawa, N., Ferr\u00e8, E.R., Gallagher, M., Gomi, H., Haggard, P.: Disentangling the visual, motor and representational effects of vestibular input. Cortex 104, 46\u201357 (2018)","journal-title":"Cortex"},{"issue":"7","key":"2091_CR8","doi-asserted-by":"publisher","first-page":"2295","DOI":"10.1007\/s00221-021-06119-3","volume":"239","author":"ER Ferr\u00e8","year":"2021","unstructured":"Ferr\u00e8, E.R., Alsmith, A.J., Haggard, P., Longo, M.R.: The vestibular system modulates the contributions of head and torso to egocentric spatial judgements. Exp. Brain Res. 239(7), 2295\u20132302 (2021)","journal-title":"Exp. Brain Res."},{"issue":"5","key":"2091_CR9","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1097\/WNR.0b013e328326f815","volume":"20","author":"G Clement","year":"2009","unstructured":"Clement, G., Fraysse, M.-J., Deguine, O.: Mental representation of space in vestibular patients with otolithic or rotatory vertigo. NeuroReport 20(5), 457\u2013461 (2009)","journal-title":"NeuroReport"},{"issue":"15","key":"2091_CR10","doi-asserted-by":"publisher","first-page":"894","DOI":"10.1097\/WNR.0b013e3283594705","volume":"23","author":"G Cl\u00e9ment","year":"2012","unstructured":"Cl\u00e9ment, G., Skinner, A., Richard, G., Lathan, C.: Geometric illusions in astronauts during long-duration spaceflight. NeuroReport 23(15), 894\u2013899 (2012)","journal-title":"NeuroReport"},{"issue":"1","key":"2091_CR11","first-page":"1334","volume":"17","author":"S Levine","year":"2016","unstructured":"Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. 
The Journal of Machine Learning Research 17(1), 1334\u20131373 (2016)","journal-title":"The Journal of Machine Learning Research"},{"key":"2091_CR12","unstructured":"Kalashnikov, D., Irpan, A., Pastor, P., Ibarz, J., Herzog, A., Jang, E., Quillen, D., Holly, E., Kalakrishnan, M., Vanhoucke V et\u00a0al.: Scalable deep reinforcement learning for vision-based robotic manipulation. In: Conference on robot learning, PMLR, pp. 651\u2013673 (2018)"},{"key":"2091_CR13","doi-asserted-by":"publisher","first-page":"5533","DOI":"10.1109\/IROS.2017.8206441","volume":"2017","author":"S Pillai","year":"2017","unstructured":"Pillai, S., Leonard, J.J.: Towards visual ego-motion learning in robots, in. IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017, 5533\u20135540 (2017). https:\/\/doi.org\/10.1109\/IROS.2017.8206441","journal-title":"IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)"},{"key":"2091_CR14","doi-asserted-by":"publisher","unstructured":"Cereda, E., Ferri, M., Mantegazza, D., Zimmerman, N., Gambardella, L.M., Guzzi, J., Giusti, A., Palossi, D.: Improving\u00a0the generalization capability of DNNs for ultra-low power autonomous nano-UAVs. In: 2021 17th International conference on distributed computing in sensor systems (DCOSS), pp. 327\u2013334 (2021) https:\/\/doi.org\/10.1109\/DCOSS52077.2021.00060","DOI":"10.1109\/DCOSS52077.2021.00060"},{"key":"2091_CR15","doi-asserted-by":"publisher","first-page":"9689","DOI":"10.1109\/ICRA46639.2022.9812150","volume":"2022","author":"S Li","year":"2022","unstructured":"Li, S., De Wagter, C., De Croon, G.C.H.E.: Self-supervised monocular multi-robot relative localization with efficient deep neural networks, in. International Conference on Robotics and Automation (ICRA) 2022, 9689\u20139695 (2022). 
https:\/\/doi.org\/10.1109\/ICRA46639.2022.9812150","journal-title":"International Conference on Robotics and Automation (ICRA)"},{"key":"2091_CR16","doi-asserted-by":"crossref","unstructured":"Kaufmann, E., Gehrig, M., Foehn, P., Ranftl, R., Dosovitskiy, A., Koltun, V., Scaramuzza, D.: Beauty and the beast: Optimal methods meet learning for drone racing. In: 2019 International conference on robotics and automation (ICRA), IEEE, pp. 690\u2013696 (2019)","DOI":"10.1109\/ICRA.2019.8793631"},{"issue":"3","key":"2091_CR17","doi-asserted-by":"publisher","first-page":"2539","DOI":"10.1109\/LRA.2018.2808368","volume":"3","author":"S Jung","year":"2018","unstructured":"Jung, S., Hwang, S., Shin, H., Shim, D.H.: Perception, guidance, and navigation for indoor autonomous drone racing using deep learning. IEEE Robotics and Automation Letters 3(3), 2539\u20132544 (2018)","journal-title":"IEEE Robotics and Automation Letters"},{"key":"2091_CR18","doi-asserted-by":"crossref","unstructured":"Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: IEEE\/RSJ international conference on intelligent robots and systems (IROS). IEEE 2017, 23\u201330 (2017)","DOI":"10.1109\/IROS.2017.8202133"},{"key":"2091_CR19","doi-asserted-by":"crossref","unstructured":"Zeng, A., Yu, K.-T., Song, S., Suo, D., Walker, E., Rodriguez, A., Xiao, J.: Multi-view self-supervised deep learning for 6D pose estimation in the Amazon picking challenge. In: IEEE international conference on robotics and automation (ICRA). IEEE 2017, 1383\u20131386 (2017)","DOI":"10.1109\/ICRA.2017.7989165"},{"issue":"4","key":"2091_CR20","doi-asserted-by":"publisher","first-page":"6693","DOI":"10.1109\/LRA.2021.3095269","volume":"6","author":"M Nava","year":"2021","unstructured":"Nava, M., Paolillo, A., Guzzi, J., Gambardella, L.M., Giusti, A.: Uncertainty-aware self-supervised learning of spatial perception tasks. 
IEEE Robotics and Automation Letters 6(4), 6693\u20136700 (2021)","journal-title":"IEEE Robotics and Automation Letters"},{"key":"2091_CR21","doi-asserted-by":"publisher","unstructured":"Shorten, C., Khoshgoftaar, T.: A survey on image data augmentation for deep learning. J. Big Data 6 (2019). https:\/\/doi.org\/10.1186\/s40537-019-0197-0","DOI":"10.1186\/s40537-019-0197-0"},{"key":"2091_CR22","unstructured":"Xie, Q., Dai, Z., Hovy, E., Luong, T., Le, Q.: Unsupervised data augmentation for consistency training. In: Advances in neural information processing systems, vol\u00a033, Curran Associates, Inc., pp 6256\u20136268 (2020)"},{"issue":"13","key":"2091_CR23","doi-asserted-by":"publisher","first-page":"7723","DOI":"10.1007\/s00521-020-05514-1","volume":"33","author":"Q Zheng","year":"2021","unstructured":"Zheng, Q., Zhao, P., Li, Y., Wang, H., Yang, Y.: Spectrum interference-based two-level data augmentation method in deep learning for automatic modulation classification. Neural Comput. Appl. 33(13), 7723\u20137745 (2021)","journal-title":"Neural Comput. Appl."},{"key":"2091_CR24","doi-asserted-by":"publisher","first-page":"688","DOI":"10.1109\/ICIP40778.2020.9190809","volume":"2020","author":"Y Wan","year":"2020","unstructured":"Wan, Y., Gao, W., Han, S., Wu, Y.: Boosting image-based localization via randomly geometric data augmentation, in. IEEE International Conference on Image Processing (ICIP) 2020, 688\u2013692 (2020). https:\/\/doi.org\/10.1109\/ICIP40778.2020.9190809","journal-title":"IEEE International Conference on Image Processing (ICIP)"},{"key":"2091_CR25","doi-asserted-by":"crossref","unstructured":"Guerry, J., Boulch, A., Le\u00a0Saux, B., Moras, J., Plyer, A., Filliat, D.: SnapNet-R: Consistent 3D multi-view semantic labeling for robotics. In: Proceedings of the IEEE international conference on computer vision (ICCV) Workshops, pp. 
669\u2013678 (2017)","DOI":"10.1109\/ICCVW.2017.85"},{"key":"2091_CR26","doi-asserted-by":"crossref","unstructured":"Zoph, B., Cubuk, E.D., Ghiasi, G., Lin, T.-Y., Shlens, J., Le, Q.V.: Learning data augmentation strategies for object detection. In: European conference on computer vision, Springer, pp. 566\u2013583 (2020)","DOI":"10.1007\/978-3-030-58583-9_34"},{"key":"2091_CR27","unstructured":"Coleman, D., Sucan, I.\u00a0A., Chitta, S., Correll, N.: Reducing the barrier to entry of complex robotic software: a MoveIt! case study. J. Softw. Eng. Robot. (2014)"},{"key":"2091_CR28","doi-asserted-by":"publisher","unstructured":"Palossi, D., Conti, F., Benini, L.: An open source and open hardware deep learning-powered visual navigation engine for autonomous nano-uavs. In: 2019 15th International conference on distributed computing in sensor systems (DCOSS), pp. 604\u2013611 (2019). https:\/\/doi.org\/10.1109\/DCOSS.2019.00111","DOI":"10.1109\/DCOSS.2019.00111"},{"key":"2091_CR29","doi-asserted-by":"publisher","unstructured":"Gautschi, M., Schiavone, P.D., Traber, A., Loi, I., Pullini, A., Rossi, D., Flamand, E., G\u00fcrkaynak, F.K., Benini, L.: Near-threshold RISC-V core with DSP extensions for scalable IoT endpoint devices. IEEE Trans. Very Large Scale Integr. (VLSI) Systems 25(10) (2017). https:\/\/doi.org\/10.1109\/TVLSI.2017.2654506","DOI":"10.1109\/TVLSI.2017.2654506"},{"issue":"91","key":"2091_CR30","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1111\/0031-868X.00113","volume":"16","author":"TA Clarke","year":"1998","unstructured":"Clarke, T.A., Fryer, J.G.: The development of camera calibration methods and models. Photogram. Rec. 16(91), 51\u201366 (1998)","journal-title":"Photogram. Rec."},{"key":"2091_CR31","doi-asserted-by":"crossref","unstructured":"Mahendran, S., Ali, H., Vidal, R.: 3D pose regression using convolutional neural networks. In: Proceedings of the IEEE international conference on computer vision workshops, pp. 
2174\u20132182 (2017)","DOI":"10.1109\/ICCVW.2017.254"},{"key":"2091_CR32","unstructured":"Redmon, J., Farhadi, A.: YOLOv3: An incremental improvement (2018). https:\/\/doi.org\/10.48550\/ARXIV.1804.02767. arXiv:1804.02767"}],"container-title":["Journal of Intelligent & Robotic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10846-024-02091-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10846-024-02091-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10846-024-02091-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T10:11:38Z","timestamp":1721297498000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10846-024-02091-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,10]]},"references-count":32,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["2091"],"URL":"https:\/\/doi.org\/10.1007\/s10846-024-02091-6","relation":{},"ISSN":["1573-0409"],"issn-type":[{"value":"1573-0409","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,10]]},"assertion":[{"value":"5 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 March 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 April 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This is an observational study. No ethical approval is required for this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}},{"value":"Informed consent was obtained from all individual participants included in the study.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"The authors affirm that human research participants provided informed consent for publication of the image in Fig. 1.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to publish"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"58"}}