{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,9]],"date-time":"2026-02-09T05:11:34Z","timestamp":1770613894979,"version":"3.49.0"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2021,4,21]],"date-time":"2021-04-21T00:00:00Z","timestamp":1618963200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,4,21]],"date-time":"2021-04-21T00:00:00Z","timestamp":1618963200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003006","name":"ETH Zurich","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003006","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J CARS"],"published-print":{"date-parts":[[2021,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Purpose:\u00a0<\/jats:title>\n                <jats:p>Tracking of tools and surgical activity is becoming more and more important in the context of computer assisted surgery. In this work, we present a data generation framework, dataset and baseline methods to facilitate further research in the direction of markerless hand and instrument pose estimation in realistic surgical scenarios.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Methods:\u00a0<\/jats:title>\n                <jats:p>We developed a rendering pipeline to create inexpensive and realistic synthetic data for model pretraining.  Subsequently, we propose a pipeline to capture and label real data with hand and object pose ground truth in an experimental setup to gather high-quality real data. 
We furthermore present three state-of-the-art RGB-based pose estimation baselines.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results:\u00a0<\/jats:title>\n                <jats:p>We evaluate three baseline models on the proposed datasets. The best-performing baseline achieves an average tool 3D vertex error of 16.7 mm on synthetic data and 13.8 mm on real data, which is comparable to the state of the art in RGB-based hand\/object pose estimation.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion:\u00a0<\/jats:title>\n                <jats:p>To the best of our knowledge, we propose the first synthetic and real data generation pipelines to generate hand and object pose labels for open surgery. We present three baseline models for object and hand\/object pose estimation from RGB frames. Our realistic synthetic data generation pipeline may help to overcome the data bottleneck in the surgical domain and can easily be transferred to other medical applications.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1007\/s11548-021-02369-2","type":"journal-article","created":{"date-parts":[[2021,4,21]],"date-time":"2021-04-21T05:03:23Z","timestamp":1618981403000},"page":"799-808","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":38,"title":["Towards markerless surgical tool and hand pose 
estimation"],"prefix":"10.1007","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0670-0828","authenticated-orcid":false,"given":"Jonas","family":"Hein","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9960-3244","authenticated-orcid":false,"given":"Matthias","family":"Seibold","sequence":"additional","affiliation":[]},{"given":"Federica","family":"Bogo","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7190-1127","authenticated-orcid":false,"given":"Mazda","family":"Farshad","sequence":"additional","affiliation":[]},{"given":"Marc","family":"Pollefeys","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6484-6206","authenticated-orcid":false,"given":"Philipp","family":"F\u00fcrnstahl","sequence":"additional","affiliation":[]},{"given":"Nassir","family":"Navab","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,4,21]]},"reference":[{"key":"2369_CR1","doi-asserted-by":"crossref","unstructured":"Allan M, Chang PL, Ourselin S, Hawkes DJ, Sridhar A, Kelly J, Stoyanov D (2015) Image based surgical instrument pose estimation with multi-class labelling and optical flow. In: International conference on medical image computing and computer\u2014assisted intervention, pp 331\u2013338","DOI":"10.1007\/978-3-319-24553-9_41"},{"key":"2369_CR2","doi-asserted-by":"crossref","unstructured":"Allotta B, Giacalone G, Rinaldi L (1997) A hand-held drilling tool for orthopedic surgery. In: IEEE\/ASME transactions on mechatronics 2","DOI":"10.1109\/3516.653046"},{"key":"2369_CR3","doi-asserted-by":"crossref","unstructured":"Amparore D, Checcucci E, Gribaudo M, Piazzolla P, Porpiglia F, Vezzetti E (2020) Non-linear-optimization using sqp for 3d deformable prostate model pose estimation in minimally invasive surgery. Advances in Computer Vision. CVC 2019. 
Adv Intell Syst Comput 943","DOI":"10.1007\/978-3-030-17795-9_35"},{"key":"2369_CR4","doi-asserted-by":"crossref","unstructured":"Brachmann E, Michel F, Krull A, Yang M.Y, Gumhold S, Rother C (2016) Uncertainty-driven 6d pose estimation of objects and scenes from a single rgb image. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3364\u20133372","DOI":"10.1109\/CVPR.2016.366"},{"key":"2369_CR5","doi-asserted-by":"crossref","unstructured":"Chetverikov D, Svirko D, Stepanov D, Krsek P (2002) The trimmed iterative closest point algorithm. In: Object recognition supported by user interaction for service robots, Vol.\u00a03. IEEE, pp 545\u2013548","DOI":"10.1109\/ICPR.2002.1047997"},{"key":"2369_CR6","unstructured":"Do TT, Cai M, Pham T, Reid I (2018) Deep-6dpose: recovering 6d object pose from a single rgb image. arXiv preprint arXiv:1802.10367"},{"issue":"1\u20133","key":"2369_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3109\/10929081003647239","volume":"15","author":"R Elfring","year":"2010","unstructured":"Elfring R, de la Fuente M, Radermacher K (2010) Assessment of optical localizer accuracy for computer aided surgery systems. Comput Aid Surg 15(1\u20133):1\u201312","journal-title":"Comput Aid Surg"},{"key":"2369_CR8","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1038\/s41591-018-0316-z","volume":"25","author":"A Esteva","year":"2019","unstructured":"Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J (2019) A guide to deep learning in healthcare. Nat Med 25:24\u201329","journal-title":"Nat Med"},{"key":"2369_CR9","doi-asserted-by":"publisher","first-page":"730","DOI":"10.1016\/j.spinee.2019.12.013","volume":"20","author":"M Farshad","year":"2020","unstructured":"Farshad M, Aichmair A, Gerber C, Bauer DE (2020) Classification of perioperative complications in spine surgery. 
Spine J 20:730\u2013736","journal-title":"Spine J"},{"key":"2369_CR10","doi-asserted-by":"publisher","first-page":"1625","DOI":"10.1016\/j.spinee.2018.02.003","volume":"18","author":"M Farshad","year":"2018","unstructured":"Farshad M, Bauer DE, Wechsler C, Gerber C, Aichmair A (2018) Risk factors for perioperative morbidity in spine surgeries of different complexities: a multivariate analysis of 1009 consecutive patients. Spine J 18:1625\u20131631","journal-title":"Spine J"},{"issue":"10","key":"2369_CR11","doi-asserted-by":"publisher","first-page":"872","DOI":"10.1177\/000313481608201002","volume":"82","author":"B Genovese","year":"2016","unstructured":"Genovese B, Yin S, Sareh S, DeVirgilio M, Mukdad L, Davis J, Santos VJ, Benharash P (2016) Surgical hand tracking in open surgery using a versatile motion sensing system: Are we there yet? Am Surg 82(10):872\u2013875","journal-title":"Am Surg"},{"key":"2369_CR12","doi-asserted-by":"publisher","first-page":"2327","DOI":"10.1007\/s00701-016-2981-3","volume":"158","author":"J Halliday","year":"2016","unstructured":"Halliday J, Kamaly I (2016) Use of the brainlab disposable stylet for endoscope and peel-away navigation. Acta Neurochirurgica 158:2327\u20132331","journal-title":"Acta Neurochirurgica"},{"key":"2369_CR13","doi-asserted-by":"crossref","unstructured":"Hampali S, Rad M, Oberweger M, Lepetit V (2019) Honnotate: a method for 3d annotation of hand and object poses","DOI":"10.1109\/CVPR42600.2020.00326"},{"key":"2369_CR14","doi-asserted-by":"crossref","unstructured":"Hasson Y, Tekin B, Bogo F, Laptev I, Pollefeys M, Schmid C (2020) Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR42600.2020.00065"},{"key":"2369_CR15","doi-asserted-by":"crossref","unstructured":"Hasson Y, Varol G, Tzionas D, Kalevatykh I, Black MJ, Laptev I, Schmid C (2019) Learning joint reconstruction of hands and manipulated objects. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 11807\u201311816","DOI":"10.1109\/CVPR.2019.01208"},{"key":"2369_CR16","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"2369_CR17","doi-asserted-by":"crossref","unstructured":"Hinterstoisser S, Lepetit V, Ilic S, Holzer S, Bradski G, Konolige K, Navab N (2012) Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In: Asian conference on computer vision. Springer, pp 548\u2013562","DOI":"10.1007\/978-3-642-37331-2_42"},{"key":"2369_CR18","doi-asserted-by":"crossref","unstructured":"Kehl W, Manhardt F, Tombari F, Ilic S, Navab N (2017) Ssd-6d: Making rgb-based 3d detection and 6d pose estimation great again. In: Proceedings of the IEEE international conference on computer vision, pp 1521\u20131529","DOI":"10.1109\/ICCV.2017.169"},{"issue":"2","key":"2369_CR19","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1007\/s11263-008-0152-6","volume":"81","author":"V Lepetit","year":"2009","unstructured":"Lepetit V, Moreno-Noguer F, Fua P (2009) Epnp: an accurate o (n) solution to the pnp problem. 
Int J Comput Vis 81(2):155","journal-title":"Int J Comput Vis"},{"key":"2369_CR20","doi-asserted-by":"publisher","first-page":"1157","DOI":"10.1007\/s11548-019-01973-7","volume":"14","author":"F Liebmann","year":"2019","unstructured":"Liebmann F, Roner S, von Atzigen M, Scaramuzza D, Sutter R, Snedeker J, Farshad M, F\u00fcrnstahl P (2019) Pedicle screw navigation using surface digitization on the microsoft hololens. Int J Comput Assist Radiol Surg 14:1157\u20131165","journal-title":"Int J Comput Assist Radiol Surg"},{"issue":"6","key":"2369_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2816795.2818013","volume":"34","author":"M Loper","year":"2015","unstructured":"Loper M, Mahmood N, Romero J, Pons-Moll G, Black MJ (2015) Smpl: a skinned multi-person linear model. ACM Trans Graph (TOG) 34(6):1\u201316","journal-title":"ACM Trans Graph (TOG)"},{"issue":"7","key":"2369_CR22","doi-asserted-by":"publisher","first-page":"813","DOI":"10.1243\/09544119JEIM268","volume":"221","author":"P Merloz","year":"2007","unstructured":"Merloz P, Troccaz J, Vouaillat H, Vasile C, Tonetti J, Eid A, Plaweski S (2007) Fluoroscopy-based navigation system in spine surgery. Proc Inst Mech Eng Part H J Eng Med 221(7):813\u2013820","journal-title":"Proc Inst Mech Eng Part H J Eng Med"},{"issue":"4","key":"2369_CR23","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1109\/MRA.2004.1371616","volume":"11","author":"AT Miller","year":"2004","unstructured":"Miller AT, Allen PK (2004) Graspit! a versatile simulator for robotic grasping. IEEE Robot Autom Mag 11(4):110\u2013122","journal-title":"IEEE Robot Autom Mag"},{"issue":"7","key":"2369_CR24","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1109\/MC.2012.75","volume":"45","author":"N Navab","year":"2012","unstructured":"Navab N, Blum T, Wang L, Okur A, Wendler T (2012) First deployments of augmented reality in operating rooms. 
Computer 45(7):48\u201355","journal-title":"Computer"},{"issue":"2","key":"2369_CR25","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1080\/13645706.2019.1584116","volume":"28","author":"N Padoy","year":"2018","unstructured":"Padoy N (2018) Machine and deep learning for workflow recognition during surgery. Minim Invasive Ther Allied Technol 28(2):82\u201390","journal-title":"Minim Invasive Ther Allied Technol"},{"key":"2369_CR26","doi-asserted-by":"crossref","unstructured":"Peng S, Liu Y, Huang Q, Zhou X, Bao H (2019) Pvnet: pixel-wise voting network for 6dof pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4561\u20134570","DOI":"10.1109\/CVPR.2019.00469"},{"issue":"5","key":"2369_CR27","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1049\/htl.2018.5065","volume":"5","author":"L Qian","year":"2018","unstructured":"Qian L, Deguet A, Kazanzides P (2018) Arssist: augmented reality on a head-mounted display for the first assistant in robotic surgery. Healthc Technol Lett 5(5):194\u2013200","journal-title":"Healthc Technol Lett"},{"key":"2369_CR28","doi-asserted-by":"crossref","unstructured":"Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263\u20137271","DOI":"10.1109\/CVPR.2017.690"},{"issue":"6","key":"2369_CR29","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1145\/3130800.3130883","volume":"36","author":"J Romero","year":"2017","unstructured":"Romero J, Tzionas D, Black MJ (2017) Embodied hands: modeling and capturing hands and bodies together. ACM Trans Graph (ToG) 36(6):245","journal-title":"ACM Trans Graph (ToG)"},{"key":"2369_CR30","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. 
Springer, pp 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"2369_CR31","doi-asserted-by":"crossref","unstructured":"Sahiner B, Pezeshk A, Hadjiiski LM, Wang X, Drukker K, Cha KH, Summers RM, Giger ML (2019) Deep learning in medical imaging and radiation therapy. Med Phys 46","DOI":"10.1002\/mp.13264"},{"issue":"5","key":"2369_CR32","doi-asserted-by":"publisher","first-page":"599","DOI":"10.1177\/1553350619853099","volume":"26","author":"TJ Saun","year":"2019","unstructured":"Saun TJ, Zuo KJ, Grantcharov TP (2019) Video technologies for recording open surgery: a systematic review. Surg Innov 26(5):599\u2013612","journal-title":"Surg Innov"},{"key":"2369_CR33","doi-asserted-by":"crossref","unstructured":"Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2011.5995316"},{"key":"2369_CR34","doi-asserted-by":"crossref","unstructured":"Simon T, Joo H, Matthews I, Sheikh Y (2017) Hand keypoint detection in single images using multiview bootstrapping. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2017.494"},{"key":"2369_CR35","doi-asserted-by":"crossref","unstructured":"Tekin B, Bogo F, Pollefeys M (2019) H+o: unified egocentric recognition of 3d hand-object poses and interactions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4511\u20134520","DOI":"10.1109\/CVPR.2019.00464"},{"key":"2369_CR36","doi-asserted-by":"crossref","unstructured":"Tremblay J, Prakash A, Acuna D, Brophy M, Jampani V, Anil C, To T, Cameracci E, Boochoon S, Birchfield S (2018) Training deep networks with synthetic data: bridging the reality gap by domain randomization. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) workshops","DOI":"10.1109\/CVPRW.2018.00143"},{"key":"2369_CR37","doi-asserted-by":"crossref","unstructured":"Varol G, Romero J, Martin X, Mahmood N, Black MJ, Laptev I, Schmid C (2017) Learning from synthetic humans. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 109\u2013117","DOI":"10.1109\/CVPR.2017.492"},{"key":"2369_CR38","doi-asserted-by":"crossref","unstructured":"Xiang Y, Schmidt T, Narayanan V, Fox D (2017) Posecnn: a convolutional neural network for 6d object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199","DOI":"10.15607\/RSS.2018.XIV.019"},{"key":"2369_CR39","doi-asserted-by":"crossref","unstructured":"Zwingmann J, Konrad G, Kotter E, S\u00fcdkamp NP, Oberst M (2009) Computer-navigated iliosacral screw insertion reduces malposition rate and radiation exposure. Clin Orthop Relat Res 467(7):1833","DOI":"10.1007\/s11999-008-0632-6"}],"container-title":["International Journal of Computer Assisted Radiology and 
Surgery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-021-02369-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11548-021-02369-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-021-02369-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,5,20]],"date-time":"2021-05-20T09:01:17Z","timestamp":1621501277000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11548-021-02369-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,21]]},"references-count":39,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,5]]}},"alternative-id":["2369"],"URL":"https:\/\/doi.org\/10.1007\/s11548-021-02369-2","relation":{},"ISSN":["1861-6410","1861-6429"],"issn-type":[{"value":"1861-6410","type":"print"},{"value":"1861-6429","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,21]]},"assertion":[{"value":"10 March 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 April 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 April 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Mazda Farshad is shareholder and member of the board of directors of Incremed AG, a company developing mixed-reality applications. 
All other authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}},{"value":"The real surgical background images used in this work were obtained from fully anonymized data not subject to ethical approval.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"Patients gave informed consent that health-related data, such as that mentioned above, can be used for research purposes.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed consent"}}]}}