{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T12:11:48Z","timestamp":1760962308719,"version":"build-2065373602"},"reference-count":26,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T00:00:00Z","timestamp":1760486400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Accurate segmentation of surgical instruments in endoscopic videos is crucial for robot-assisted surgery and intraoperative analysis. This paper presents a Segment-then-Classify framework that decouples mask generation from semantic classification to enhance spatial completeness and temporal stability. First, a Mask2Former-based segmentation backbone generates class-agnostic instance masks and region features. Then, a bounding box-guided instance-level spatiotemporal modeling module fuses geometric priors and temporal consistency through a lightweight transformer encoder. This design improves interpretability and robustness under occlusion and motion blur. 
Experiments on the EndoVis 2017 and 2018 datasets demonstrate that our framework achieves mIoU improvements of 3.06%, 2.99%, and 1.67% and mcIoU gains of 2.36%, 2.85%, and 6.06%, respectively, over previous state-of-the-art methods, while maintaining computational efficiency.<\/jats:p>","DOI":"10.3390\/jimaging11100364","type":"journal-article","created":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T11:57:42Z","timestamp":1760529462000},"page":"364","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Surgical Instrument Segmentation via Segment-Then-Classify Framework with Instance-Level Spatiotemporal Consistency Modeling"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-7013-5094","authenticated-orcid":false,"given":"Tiyao","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Automation and Intelligence, Beijing Jiaotong University, Beijing 100044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xue","family":"Yuan","sequence":"additional","affiliation":[{"name":"School of Automation and Intelligence, Beijing Jiaotong University, Beijing 100044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongze","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Automation and Intelligence, Beijing Jiaotong University, Beijing 100044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1186\/s13017-016-0102-5","article-title":"Laparoscopic versus open appendectomy: A retrospective cohort study assessing outcomes and cost-effectiveness","volume":"11","author":"Biondi","year":"2016","journal-title":"World J. Emerg. 
Surg."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"17626","DOI":"10.3748\/wjg.v20.i46.17626","article-title":"Meta-analysis of laparoscopic vs open cholecystectomy in elderly patients","volume":"20","author":"Antoniou","year":"2014","journal-title":"World J. Gastroenterol. WJG"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2088","DOI":"10.1007\/s11605-018-3883-x","article-title":"A comparison of open and minimally invasive surgery for hepatic and pancreatic resections among the medicare population","volume":"22","author":"Chen","year":"2018","journal-title":"J. Gastrointest. Surg."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1109\/JPROC.2022.3180350","article-title":"Robot-assisted minimally invasive surgery\u2014Surgical robotics in the data age","volume":"110","author":"Haidegger","year":"2022","journal-title":"Proc. IEEE"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"102306","DOI":"10.1016\/j.media.2021.102306","article-title":"Surgical data science\u2013from concepts toward clinical translation","volume":"76","author":"Eisenmann","year":"2022","journal-title":"Med. Image Anal."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1007\/s11548-018-1735-5","article-title":"Automated surgical skill assessment in RMIS training","volume":"13","author":"Zia","year":"2018","journal-title":"Int. J. Comput. Assist. Radiol. Surg."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1109\/TMI.2016.2593957","article-title":"Endonet: A deep architecture for recognition tasks on laparoscopic videos","volume":"36","author":"Twinanda","year":"2016","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Long, Y., Li, Z., Yee, C.H., Ng, C.F., Taylor, R.H., Unberath, M., and Dou, Q. (October, January 27). E-dssr: Efficient dynamic surgical scene reconstruction with transformer-based stereoscopic depth perception. 
Proceedings of the Medical Image Computing and Computer Assisted Intervention\u2014MICCAI 2021: 24th International Conference, Strasbourg, France. Proceedings, Part IV 24.","DOI":"10.1007\/978-3-030-87202-1_40"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"102008","DOI":"10.1016\/j.media.2021.102008","article-title":"A shape-constraint adversarial framework with instance-normalized spatio-temporal features for inter-fetal membrane segmentation","volume":"70","author":"Casella","year":"2021","journal-title":"Med. Image Anal."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1807","DOI":"10.1007\/s11548-020-02242-8","article-title":"Deep learning-based fetoscopic mosaicking for field-of-view expansion","volume":"15","author":"Bano","year":"2020","journal-title":"Int. J. Comput. Assist. Radiol. Surg."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2603","DOI":"10.1109\/TMI.2015.2450831","article-title":"Detecting surgical tools by modelling local appearance and global shape","volume":"34","author":"Bouget","year":"2015","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1016\/j.media.2016.05.003","article-title":"Real-time localization of articulated surgical instruments in retinal microsurgery","volume":"34","author":"Rieke","year":"2016","journal-title":"Med. Image Anal."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Garcia-Peraza-Herrera, L.C., Li, W., Fidon, L., Gruijthuijsen, C., Devreker, A., Attilakos, G., Deprest, J., Poorten, E.V., Stoyanov, D., and Vercauteren, T. (2017, January 24\u201328). Toolnet: Holistically-nested real-time segmentation of robotic surgical tools. 
Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8206462"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Laina, I., Rieke, N., Rupprecht, C., Vizca\u00edno, J.P., Eslami, A., Tombari, F., and Navab, N. (2017, January 11\u201313). Concurrent segmentation and localization for tracking of surgical instruments. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2014MICCAI 2017: 20th International Conference, Quebec City, QC, Canada. Proceedings, Part II 20.","DOI":"10.1007\/978-3-319-66185-8_75"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Milletari, F., Rieke, N., Baust, M., Esposito, M., and Navab, N. (2018, January 16\u201320). CFCM: Segmentation via coarse to fine context memory. Proceedings of the Medical Image Computing and Computer Assisted Intervention\u2014MICCAI 2018: 21st International Conference, Granada, Spain. Proceedings, Part IV 11.","DOI":"10.1007\/978-3-030-00937-3_76"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ayobi, N., P\u00e9rez-Rond\u00f3n, A., Rodr\u00edguez, S., and Arbel\u00e1ez, P. (2023, January 18\u201321). Matis: Masked-attention transformers for surgical instrument segmentation. Proceedings of the 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), Cartagena, Colombia.","DOI":"10.1109\/ISBI53787.2023.10230819"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Fan, H., Xiong, B., Mangalam, K., Li, Y., Yan, Z., Malik, J., and Feichtenhofer, C. (2021, January 10\u201317). Multiscale vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00675"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Shvets, A.A., Rakhlin, A., Kalinin, A.A., and Iglovikov, V.I. (2018, January 17\u201320). 
Automatic Instrument Segmentation in Robot-Assisted Surgery Using Deep Learning. Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications, Orlando, FL, USA.","DOI":"10.1109\/ICMLA.2018.00100"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI, Springer International Publishing.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Gonz\u00e1lez, C., Bravo-S\u00e1nchez, L., and Arbelaez, P. (2020). Isinet: An instance-based approach for surgical instrument segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, Proceedings of the MICCAI 2020: 23rd International Conference, Lima, Peru, 4\u20138 October 2020, Springer International Publishing. Proceedings, Part III.","DOI":"10.1007\/978-3-030-59716-0_57"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Song, M., Zhai, C., Yang, L., Liu, Y., and Bian, G. (2025). An attention-guided multi-scale fusion network for surgical instrument segmentation. Biomed. Signal Process. Control, 102.","DOI":"10.1016\/j.bspc.2024.107296"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Matasyoh, N.M., Mathis-Ullrich, F., and Zeineldin, R.A. (2024). Samsurg: Surgical Instrument Segmentation in Robotic Surgeries Using Vision Foundation Model, IEEE Access.","DOI":"10.1109\/ACCESS.2024.3520386"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Jin, Y., Cheng, K., Dou, Q., and Heng, P.A. (2019, January 13\u201317). Incorporating temporal prior from motion flow for instrument segmentation in minimally invasive surgery video. Proceedings of the Medical Image Computing and Computer Assisted Intervention\u2014MICCAI 2019: 22nd International Conference, Shenzhen, China. 
Proceedings, Part V 22.","DOI":"10.1007\/978-3-030-32254-0_49"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhao, Z., Jin, Y., and Heng, P.-A. (2022, January 23\u201327). Trasetr: Track-to-segment transformer with contrastive query for instance-level instrument segmentation in robotic surgery. Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA.","DOI":"10.1109\/ICRA46639.2022.9811873"},{"key":"ref_25","unstructured":"Allan, M., Kondo, S., Bodenstedt, S., Leger, S., Kadkhodamohammadi, R., Luengo, I., Fuentes, F., Flouty, E., Mohammed, A., and Pedersen, M. (2020). 2018 robotic scene segmentation challenge. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1227","DOI":"10.1007\/s11548-021-02404-2","article-title":"Mask then classify: Multi-instance segmentation for surgical instruments","volume":"16","author":"Kurmann","year":"2021","journal-title":"Int. J. Comput. Assist. Radiol. Surg."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/10\/364\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T11:30:28Z","timestamp":1760959828000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/10\/364"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,15]]},"references-count":26,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["jimaging11100364"],"URL":"https:\/\/doi.org\/10.3390\/jimaging11100364","relation":{},"ISSN":["2313-433X"],"issn-type":[{"type":"electronic","value":"2313-433X"}],"subject":[],"published":{"date-parts":[[2025,10,15]]}}}