{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T19:05:43Z","timestamp":1769108743662,"version":"3.49.0"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,10,3]],"date-time":"2022-10-03T00:00:00Z","timestamp":1664755200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,10,3]],"date-time":"2022-10-03T00:00:00Z","timestamp":1664755200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000050","name":"National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["R01HL146619"],"award-info":[{"award-number":["R01HL146619"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J CARS"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Purpose<\/jats:title>\n                <jats:p>Articulated hand pose tracking is an under-explored problem that carries the potential for use in an extensive number of applications, especially in the medical domain. With a robust and accurate tracking system on surgical videos, the motion dynamics and movement patterns of the hands can be captured and analyzed for many rich tasks.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Methods<\/jats:title>\n                <jats:p>In this work, we propose a novel hand pose estimation model, <jats:bold>CondPose<\/jats:bold>, which improves detection and tracking accuracy by incorporating a pose prior into its prediction. We show improvements over state-of-the-art methods which provide frame-wise independent predictions, by following a temporally guided approach that effectively leverages past predictions.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We collect <jats:italic>Surgical Hands<\/jats:italic>, the first dataset that provides multi-instance articulated hand pose annotations for videos. Our dataset provides over 8.1k annotated hand poses from publicly available surgical videos and bounding boxes, pose annotations, and tracking IDs to enable multi-instance tracking. When evaluated on <jats:italic>Surgical Hands<\/jats:italic>, we show our method outperforms the state-of-the-art approach using mean Average Precision, to measure pose estimation accuracy, and Multiple Object Tracking Accuracy, to assess pose tracking performance.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>In comparison to a frame-wise independent strategy, we show greater performance in detecting and tracking hand poses and more substantial impact on localization accuracy. This has positive implications in generating more accurate representations of hands in the scene to be used for targeted downstream tasks.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1007\/s11548-022-02761-6","type":"journal-article","created":{"date-parts":[[2022,10,3]],"date-time":"2022-10-03T15:02:55Z","timestamp":1664809375000},"page":"117-125","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Temporally guided articulated hand pose tracking in surgical videos"],"prefix":"10.1007","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4502-6012","authenticated-orcid":false,"given":"Nathan","family":"Louis","sequence":"first","affiliation":[]},{"given":"Luowei","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Steven J.","family":"Yule","sequence":"additional","affiliation":[]},{"given":"Roger D.","family":"Dias","sequence":"additional","affiliation":[]},{"given":"Milisa","family":"Manojlovich","sequence":"additional","affiliation":[]},{"given":"Francis D.","family":"Pagani","sequence":"additional","affiliation":[]},{"given":"Donald S.","family":"Likosky","sequence":"additional","affiliation":[]},{"given":"Jason J.","family":"Corso","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,10,3]]},"reference":[{"issue":"7","key":"2761_CR1","doi-asserted-by":"publisher","first-page":"2095","DOI":"10.31557\/APJCP.2019.20.7.2095","volume":"20","author":"M Malathi","year":"2019","unstructured":"Malathi M, Sinthia P (2019) Brain tumour segmentation using convolutional neural network with tensor flow. Asian Pac J Cancer Prev: APJCP 20(7):2095","journal-title":"Asian Pac J Cancer Prev: APJCP"},{"issue":"3","key":"2761_CR2","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1097\/ACM.0000000000002414","volume":"94","author":"RD Dias","year":"2019","unstructured":"Dias RD, Gupta A, Yule SJ (2019) Using machine learning to assess physician competence: a systematic review. Acad Med 94(3):427\u2013439","journal-title":"Acad Med"},{"key":"2761_CR3","doi-asserted-by":"crossref","unstructured":"Tao L, Elhamifar E, Khudanpur S, Hager GD, Vidal R (2012) Sparse hidden markov models for surgical gesture classification and skill evaluation. In: international conference on information processing in computer-assisted interventions. Springer, pp 167\u2013177","DOI":"10.1007\/978-3-642-30618-1_17"},{"issue":"7","key":"2761_CR4","doi-asserted-by":"publisher","first-page":"732","DOI":"10.1016\/j.media.2013.04.007","volume":"17","author":"L Zappella","year":"2013","unstructured":"Zappella L, B\u00e9jar B, Hager G, Vidal R (2013) Surgical gesture classification from video and kinematic data. Med Image Anal 17(7):732\u2013745","journal-title":"Med Image Anal"},{"key":"2761_CR5","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/j.artmed.2018.08.002","volume":"91","author":"G Forestier","year":"2018","unstructured":"Forestier G, Petitjean F, Senin P, Despinoy F, Huaulm\u00e9 A, Fawaz HI, Weber J, Idoumghar L, Muller P-A, Jannin P (2018) Surgical motion analysis using discriminative interpretable patterns. Artif Intell Med 91:3\u201311","journal-title":"Artif Intell Med"},{"issue":"09","key":"2761_CR6","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1115\/1.2015-Sep-7","volume":"137","author":"S Kumar","year":"2015","unstructured":"Kumar S, Ahmidi N, Hager G, Singhal P, Corso J, Krovi V (2015) Surgical performance assessment. Mech Eng 137(09):7\u201310","journal-title":"Mech Eng"},{"issue":"7","key":"2761_CR7","first-page":"1542","volume":"36","author":"D Sarikaya","year":"2017","unstructured":"Sarikaya D, Corso JJ, Guru KA (2017) Detection and localization of robotic tools in robot-assisted surgery videos using deep neural networks for region proposal and detection. IEEE TMI 36(7):1542\u20131549","journal-title":"IEEE TMI"},{"issue":"3","key":"2761_CR8","doi-asserted-by":"publisher","first-page":"2714","DOI":"10.1109\/LRA.2019.2917163","volume":"4","author":"E Colleoni","year":"2019","unstructured":"Colleoni E, Moccia S, Du X, De Momi E, Stoyanov D (2019) Deep learning based robotic tool detection and articulation estimation with spatio-temporal layers. IEEE Robot Autom Lett 4(3):2714\u20132721","journal-title":"IEEE Robot Autom Lett"},{"key":"2761_CR9","doi-asserted-by":"crossref","unstructured":"Ni Z-L, Bian G-B, Xie X-L, Hou Z-G, Zhou X-H, Zhou Y-J (2019) Rasnet: segmentation for tracking surgical instruments in surgical videos using refined attention segmentation network. In: 2019 41st annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 5735\u20135738","DOI":"10.1109\/EMBC.2019.8856495"},{"issue":"6","key":"2761_CR10","first-page":"1059","volume":"14","author":"CI Nwoye","year":"2019","unstructured":"Nwoye CI, Mutter D, Marescaux J, Padoy N (2019) Weakly supervised convolutional lstm approach for tool tracking in laparoscopic videos. IJCARS 14(6):1059\u20131067","journal-title":"IJCARS"},{"key":"2761_CR11","doi-asserted-by":"crossref","unstructured":"Andriluka M, Iqbal U, Insafutdinov E, Pishchulin L, Milan A, Gall J, Schiele B (2018) Posetrack: a benchmark for human pose estimation and tracking. In: IEEE CVPR, pp 5167\u20135176","DOI":"10.1109\/CVPR.2018.00542"},{"key":"2761_CR12","doi-asserted-by":"crossref","unstructured":"Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: ECCV, pp 466\u2013481","DOI":"10.1007\/978-3-030-01231-1_29"},{"key":"2761_CR13","unstructured":"Bertasius G, Feichtenhofer C, Tran D, Shi J, Torresani L (2019) Learning temporal pose estimation from sparsely-labeled videos. In: NeurIPS, pp 3027\u20133038"},{"key":"2761_CR14","doi-asserted-by":"crossref","unstructured":"Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: IEEE CVPR, pp 5693\u20135703","DOI":"10.1109\/CVPR.2019.00584"},{"key":"2761_CR15","doi-asserted-by":"crossref","unstructured":"Ning G, Pei J, Huang H (2020) Lighttrack: a generic framework for online top-down human pose tracking. In: IEEE CVPR workshops, pp 1034\u20131035","DOI":"10.1109\/CVPRW50498.2020.00525"},{"key":"2761_CR16","doi-asserted-by":"crossref","unstructured":"Wang M, Tighe J, Modolo D (2020) Combining detection and tracking for human pose estimation in videos. In: IEEE CVPR, pp 11088\u201311096","DOI":"10.1109\/CVPR42600.2020.01110"},{"key":"2761_CR17","doi-asserted-by":"crossref","unstructured":"Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2d pose estimation using part affinity fields. In: IEEE CVPR, pp 7291\u20137299","DOI":"10.1109\/CVPR.2017.143"},{"key":"2761_CR18","doi-asserted-by":"crossref","unstructured":"Raaj Y, Idrees H, Hidalgo G, Sheikh Y (2019) Efficient online multi-person 2d pose tracking with recurrent spatio-temporal affinity fields. In: IEEE CVPR, pp 4620\u20134628","DOI":"10.1109\/CVPR.2019.00475"},{"key":"2761_CR19","doi-asserted-by":"crossref","unstructured":"Jin S, Liu W, Ouyang W, Qian C (2019) Multi-person articulated tracking with spatial and temporal embeddings. In: IEEE CVPR, pp 5664\u20135673","DOI":"10.1109\/CVPR.2019.00581"},{"issue":"3","key":"2761_CR20","doi-asserted-by":"publisher","first-page":"201664","DOI":"10.1001\/jamanetworkopen.2020.1664","volume":"3","author":"S Khalid","year":"2020","unstructured":"Khalid S, Goldenberg M, Grantcharov T, Taati B, Rudzicz F (2020) Evaluation of deep learning models for identifying surgical actions and measuring performance. JAMA Netw Open 3(3):201664\u2013201664","journal-title":"JAMA Netw Open"},{"key":"2761_CR21","doi-asserted-by":"crossref","unstructured":"Jin A, Yeung S, Jopling J, Krause J, Azagury D, Milstein A, Fei-Fei L (2018) Tool detection and operative skill assessment in surgical videos using region-based convolutional neural networks. In: 2018 IEEE WACV, IEEE, pp 691\u2013699","DOI":"10.1109\/WACV.2018.00081"},{"key":"2761_CR22","doi-asserted-by":"crossref","unstructured":"Laina I, Rieke N, Rupprecht C, Vizca\u00edno JP, Eslami A, Tombari F, Navab N (2017)Concurrent segmentation and localization for tracking of surgical instruments. In: MICCAI. Springer, pp 664\u2013672","DOI":"10.1007\/978-3-319-66185-8_75"},{"issue":"5","key":"2761_CR23","first-page":"1276","volume":"37","author":"X Du","year":"2018","unstructured":"Du X, Kurmann T, Chang P-L, Allan M, Ourselin S, Sznitman R, Kelly JD, Stoyanov D (2018) Articulated multi-instrument 2-d pose estimation using fully convolutional networks. IEEE TMI 37(5):1276\u20131287","journal-title":"IEEE TMI"},{"key":"2761_CR24","doi-asserted-by":"crossref","unstructured":"Richa R, Balicki M, Meisner E, Sznitman R, Taylor R, Hager G (2011) Visual tracking of surgical tools for proximity detection in retinal surgery. In: international conference on information processing in computer-assisted interventions. Springer, pp 55\u201366","DOI":"10.1007\/978-3-642-21504-9_6"},{"issue":"5","key":"2761_CR25","doi-asserted-by":"publisher","first-page":"1263","DOI":"10.1109\/TPAMI.2012.209","volume":"35","author":"R Sznitman","year":"2012","unstructured":"Sznitman R, Richa R, Taylor RH, Jedynak B, Hager GD (2012) Unified detection and tracking of instruments during retinal microsurgery. IEEE PAMI 35(5):1263\u20131273","journal-title":"IEEE PAMI"},{"key":"2761_CR26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2008\/246309","volume":"2008","author":"K Bernardin","year":"2008","unstructured":"Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the clear mot metrics. EURASIP J Image Video Process 2008:1\u201310","journal-title":"EURASIP J Image Video Process"},{"key":"2761_CR27","unstructured":"Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: international conference on learning representations"},{"key":"2761_CR28","doi-asserted-by":"crossref","unstructured":"Simon T, Joo H, Matthews I, Sheikh Y (2017) Hand keypoint detection in single images using multiview bootstrapping. In: IEEE CVPR, pp 1145\u20131153","DOI":"10.1109\/CVPR.2017.494"},{"issue":"10","key":"2761_CR29","doi-asserted-by":"publisher","first-page":"11488","DOI":"10.1109\/JSEN.2020.3018172","volume":"21","author":"N Santavas","year":"2020","unstructured":"Santavas N, Kansizoglou I, Bampis L, Karakasis E, Gasteratos A (2020) Attention! a lightweight 2d hand pose estimation approach. IEEE Sens J 21(10):11488\u201311496","journal-title":"IEEE Sens J"},{"key":"2761_CR30","doi-asserted-by":"crossref","unstructured":"Zimmermann C, Ceylan D, Yang J, Russell B, Argus M, Brox T (2019) Freihand: a dataset for markerless capture of hand pose and shape from single rgb images. In: IEEE ICCV, pp 813\u2013822","DOI":"10.1109\/ICCV.2019.00090"},{"key":"2761_CR31","doi-asserted-by":"crossref","unstructured":"Zhang J, Jiao J, Chen M, Qu L, Xu X, Yang Q (2017) A hand pose tracking benchmark from stereo matching. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 982\u2013986","DOI":"10.1109\/ICIP.2017.8296428"},{"key":"2761_CR32","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1016\/j.imavis.2018.12.001","volume":"81","author":"F Gomez-Donoso","year":"2019","unstructured":"Gomez-Donoso F, Orts-Escolano S, Cazorla M (2019) Large-scale multiview 3d hand pose dataset. IVC 81:25\u201333","journal-title":"IVC"},{"key":"2761_CR33","unstructured":"Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: IEEE CVPR. IEEE, vol 2, pp 1735\u20131742"},{"key":"2761_CR34","doi-asserted-by":"crossref","unstructured":"Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: thirty-second AAAI conference on artificial intelligence","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"2761_CR35","doi-asserted-by":"crossref","unstructured":"Shan D, Geng J, Shu M, Fouhey DF (2020) Understanding human hands in contact at internet scale. In: IEEE CVPR, pp 9869\u20139878","DOI":"10.1109\/CVPR42600.2020.00989"}],"container-title":["International Journal of Computer Assisted Radiology and Surgery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-022-02761-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11548-022-02761-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-022-02761-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T19:54:01Z","timestamp":1674849241000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11548-022-02761-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,3]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["2761"],"URL":"https:\/\/doi.org\/10.1007\/s11548-022-02761-6","relation":{},"ISSN":["1861-6429"],"issn-type":[{"value":"1861-6429","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,3]]},"assertion":[{"value":"17 March 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 September 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 October 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of interest"}},{"value":"This article does not contain any studies with human participants or animals performed by any of the authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"This article does not contain patient data.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed consent"}}]}}