{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T10:01:48Z","timestamp":1777716108369,"version":"3.51.4"},"reference-count":52,"publisher":"SAGE Publications","issue":"10-11","license":[{"start":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T00:00:00Z","timestamp":1721865600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"University of Delaware (UD) College of Engineering, the NSF","award":["IIS-1924897, MRI-2018905, SCH-2014264"],"award-info":[{"award-number":["IIS-1924897, MRI-2018905, SCH-2014264"]}]},{"name":"Google ARCore"},{"name":"NASA DE Space Grant Graduate Fellowship"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:p>\n                    In monocular visual-inertial navigation, it is desirable to initialize the system as quickly and robustly as possible. A state-of-the-art initialization method typically constructs a linear system to find a closed-form solution using the image features and inertial measurements and then refines the states with a nonlinear optimization. These methods generally require a few seconds of data, which however can be expedited (less than a second) by adding constraints from a robust but only up-to-scale monocular depth network in the nonlinear optimization. To further accelerate this process, in this work, we leverage the scale-less depth measurements instead in the linear initialization step that is performed prior to the nonlinear one, which only requires a single depth image for the first frame. 
Importantly, we show that the typical estimation of\n                    <jats:italic toggle=\"yes\">all<\/jats:italic>\n                    feature states independently in the closed-form solution can be modeled as estimating\n                    <jats:italic toggle=\"yes\">only<\/jats:italic>\n                    the scale and bias parameters of the learned depth map. As such, our formulation enables building a smaller minimal problem than the state of the art, which can be seamlessly integrated into RANSAC for robust estimation. Experiments show that our method has state-of-the-art initialization performance in simulation as well as on popular real-world datasets (TUM-VI, and EuRoC MAV). For the TUM-VI dataset in simulation as well as real-world, we demonstrate the superior initialization performance with only a 0.3\u00a0s window of data, which is the smallest ever reported, and validate that our method can initialize more often, robustly, and accurately in different challenging scenarios.\n                  <\/jats:p>","DOI":"10.1177\/02783649241262452","type":"journal-article","created":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T06:11:06Z","timestamp":1721887866000},"page":"1619-1647","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":5,"title":["Fast and robust learned single-view depth-aided monocular visual-inertial initialization"],"prefix":"10.1177","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0579-7835","authenticated-orcid":false,"given":"Nathaniel","family":"Merrill","sequence":"first","affiliation":[{"name":"University of Delaware"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2179-3447","authenticated-orcid":false,"given":"Patrick","family":"Geneva","sequence":"additional","affiliation":[{"name":"University of 
Delaware"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Saimouli","family":"Katragadda","sequence":"additional","affiliation":[{"name":"University of Delaware"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6903-6405","authenticated-orcid":false,"given":"Chuchu","family":"Chen","sequence":"additional","affiliation":[{"name":"University of Delaware"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guoquan","family":"Huang","sequence":"additional","affiliation":[{"name":"University of Delaware"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2024,7,25]]},"reference":[{"key":"e_1_3_5_2_1","unstructured":"Agarwal S Mierle K Team TCS (2023) Ceres Solver. URL https:\/\/github.com\/ceres-solver\/ceres-solver"},{"key":"e_1_3_5_3_1","first-page":"1411","article-title":"Vision-based navigation for the nasa mars helicopter","author":"Bayard DS","year":"2019","unstructured":"Bayard DS, Conway DT, Brockers R, et al. (2019) Vision-based navigation for the nasa mars helicopter. 
AIAA Scitech 2019 Forum: 1411.","journal-title":"AIAA Scitech 2019 Forum"},{"key":"e_1_3_5_4_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364917728574"},{"key":"e_1_3_5_5_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364915620033"},{"key":"e_1_3_5_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793718"},{"key":"e_1_3_5_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9197334"},{"key":"e_1_3_5_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2021.3075644"},{"key":"e_1_3_5_9_1","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2020.00068"},{"key":"e_1_3_5_10_1","doi-asserted-by":"publisher","DOI":"10.2514\/4.866463"},{"key":"e_1_3_5_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS47612.2022.9982263"},{"key":"e_1_3_5_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2021.3106505"},{"key":"e_1_3_5_13_1","volume-title":"Technical report, Dept. of Electrical Engineering","author":"Dong-Si TC","year":"2011","unstructured":"Dong-Si TC, Mourikis AI (2011) Closed-form solutions for vision-aided inertial navigation. In: Technical report, Dept. of Electrical Engineering. Riverside: University of California. URL: http:\/\/tdongsi.github.io\/download\/pubs\/2011_VIO_Init_TR.pdf"},{"key":"e_1_3_5_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386235"},{"key":"e_1_3_5_15_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364919835021"},{"key":"e_1_3_5_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2019.2918677"},{"key":"e_1_3_5_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3057564"},{"key":"e_1_3_5_18_1","doi-asserted-by":"crossref","unstructured":"Forster C Carlone L Dellaert F et al. 
(2015) Imu preintegration on manifold for efficient visual-inertial maximum-a-posteriori estimation Robotics: Science and Systems XI Daegu Republic of Korea July 10 - July 14 2023.","DOI":"10.15607\/RSS.2015.XI.006"},{"key":"e_1_3_5_19_1","volume-title":"Openvins State Initialization: Details and Derivations","author":"Geneva P","year":"2022","unstructured":"Geneva P, Huang G (2022) Openvins State Initialization: Details and Derivations. Newark: University of Delaware. Available: https:\/\/pgeneva.com\/downloads\/reports\/tr_init.pdf"},{"key":"e_1_3_5_20_1","doi-asserted-by":"crossref","unstructured":"Geneva P Eckenhoff K Huang G (2019) A linear-complexity EKF for visual-inertial navigation with loop closures Proc. International Conference on Robotics and Automation Montreal Canada 25-25 April 1997.","DOI":"10.1109\/ICRA.2019.8793836"},{"key":"e_1_3_5_21_1","doi-asserted-by":"crossref","unstructured":"Geneva P Eckenhoff K Lee W et al. (2020) OpenVINS: a research platform for visual-inertial estimation. Proc. Of the IEEE International Conference on Robotics and Automation Paris France 25-25 April 1997. 
https:\/\/github.com\/rpng\/open_vins.","DOI":"10.1109\/ICRA40945.2020.9196524"},{"key":"e_1_3_5_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2013.2277549"},{"key":"e_1_3_5_23_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913509675"},{"key":"e_1_3_5_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00545"},{"key":"e_1_3_5_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793604"},{"key":"e_1_3_5_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2016.2521413"},{"key":"e_1_3_5_27_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364914554813"},{"key":"e_1_3_5_28_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913481251"},{"key":"e_1_3_5_29_1","unstructured":"Li M Mourikis AI (2014) A convex formulation for motion estimation using visual and inertial sensors. In: Proceedings of the Workshop on Multi-View Geometry Held in Conjunction with RSS Berkeley CA July 2014."},{"key":"e_1_3_5_30_1","doi-asserted-by":"crossref","unstructured":"Liu S Nie X Hamid R (2022) Depth-guided sparse structure-from-motion for movies and tv shows. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Seattle WA USA Jun 21st 2024 15980\u201315989.","DOI":"10.1109\/CVPR52688.2022.01551"},{"key":"e_1_3_5_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2011.2170332"},{"key":"e_1_3_5_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2011.2160468"},{"key":"e_1_3_5_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0647-7"},{"key":"e_1_3_5_34_1","doi-asserted-by":"crossref","unstructured":"Merrill N Geneva P Katragadda S et al. (2023) Fast monocular visual-inertial initialization leveraging learned single-view depth. In: Proc. 
Robotics: Science and Systems (RSS) Delft Netherlands Jul 15 \u2013 Jul 19 2024.","DOI":"10.15607\/RSS.2023.XIX.072"},{"key":"e_1_3_5_35_1","doi-asserted-by":"crossref","unstructured":"Mourikis AI Roumeliotis SI (2007) A multi-state constraint Kalman filter for vision-aided inertial navigation. In: Proceedings of the IEEE International Conference on Robotics and Automation. Rome Italy 13 May - 17 May 2024 3565\u20133572.","DOI":"10.1109\/ROBOT.2007.364024"},{"key":"e_1_3_5_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2017.2705103"},{"key":"e_1_3_5_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2017.2653359"},{"key":"e_1_3_5_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2004.17"},{"key":"e_1_3_5_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2017.2699790"},{"key":"e_1_3_5_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8206284"},{"key":"e_1_3_5_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2018.2853729"},{"key":"e_1_3_5_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3019967"},{"key":"e_1_3_5_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2018.8593419"},{"key":"e_1_3_5_44_1","volume-title":"Technical Report","author":"Trawny N","year":"2005","unstructured":"Trawny N, Roumeliotis SI (2005) Indirect Kalman filter for 3D attitude estimation. In: Technical Report. Minnesota, USA: University of Minnesota, Dept. of Comp. Sci. & Eng."},{"key":"e_1_3_5_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2019.2961227"},{"key":"e_1_3_5_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989603"},{"key":"e_1_3_5_47_1","doi-asserted-by":"crossref","unstructured":"Yang L Kang B Huang Z et al. (2024) Depth anything: unleashing the power of large-scale unlabeled data. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Long Beach CA USA 13-19 June 2020.","DOI":"10.1109\/CVPR52733.2024.00987"},{"key":"e_1_3_5_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2018.8593941"},{"key":"e_1_3_5_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2833152"},{"key":"e_1_3_5_50_1","doi-asserted-by":"publisher","DOI":"10.3390\/s22093389"},{"key":"e_1_3_5_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20047-2_32"},{"key":"e_1_3_5_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3091407"},{"key":"e_1_3_5_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9560792"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649241262452","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/02783649241262452","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649241262452","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:17:22Z","timestamp":1777457842000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/02783649241262452"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,25]]},"references-count":52,"journal-issue":{"issue":"10-11","published-print":{"date-parts":[[2025,9]]}},"alternative-id":["10.1177\/02783649241262452"],"URL":"https:\/\/doi.org\/10.1177\/02783649241262452","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7
,25]]}}}