{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T15:53:12Z","timestamp":1780501992204,"version":"3.54.1"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,4,4]],"date-time":"2024-04-04T00:00:00Z","timestamp":1712188800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,4]],"date-time":"2024-04-04T00:00:00Z","timestamp":1712188800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003786","name":"Hangzhou Science and Technology Bureau","doi-asserted-by":"publisher","award":["2021WJCY258"],"award-info":[{"award-number":["2021WJCY258"]}],"id":[{"id":"10.13039\/501100003786","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Pattern Anal Applic"],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep learning techniques can be effective in helping doctors diagnose gastrointestinal polyps. Currently, processing video frame sequences containing a large amount of spurious noise in polyp detection suffers from elevated recall and mean average precision. Moreover, the mean average precision is also low when the polyp target in the video frame has large-scale variability. Therefore, we propose a tiny polyp detection from endoscopic video frames using Vision Transformers, named TPolyp. The proposed method uses a cross-stage Swin Transformer as a multi-scale feature extractor to extract deep feature representations of data samples, improves the bidirectional sampling feature pyramid, and integrates the prediction heads of multiple channel self-attention mechanisms. 
This approach focuses more on the feature information of the tiny object detection task than convolutional neural networks and retains relatively deeper semantic information. It additionally improves feature expression and discriminability without increasing the computational complexity. Experimental results show that TPolyp improves detection accuracy by 7%, recall by 7.3%, and average accuracy by 7.5% compared to the YOLOv5 model, and has better tiny object detection in scenarios with blurry artifacts.<\/jats:p>","DOI":"10.1007\/s10044-024-01254-3","type":"journal-article","created":{"date-parts":[[2024,4,4]],"date-time":"2024-04-04T12:01:41Z","timestamp":1712232101000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Tiny polyp detection from endoscopic video frames using vision transformers"],"prefix":"10.1007","volume":"27","author":[{"given":"Entong","family":"Liu","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9868-1083","authenticated-orcid":false,"given":"Bishi","family":"He","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Darong","family":"Zhu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuanjiao","family":"Chen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhe","family":"Xu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,4,4]]},"reference":[{"issue":"1","key":"1254_CR1","doi-asserted-by":"publisher","first-page":"64","DOI":"10.5009\/gnl.2012.6.1.64","volume":"6","author":"SB Ahn","year":"2012","unstructured":"Ahn SB, Han DS, Bae JH, Byun TJ et al (2012) The miss rate for colorectal adenoma determined by quality-adjusted, 
back-to-back colonoscopies. Gut Liver 6(1):64","journal-title":"Gut Liver"},{"issue":"27","key":"1254_CR2","doi-asserted-by":"publisher","first-page":"e7468","DOI":"10.1097\/MD.0000000000007468","volume":"96","author":"J Lee","year":"2017","unstructured":"Lee J, Park SW, Kim YS et al (2017) Risk factors of missed colorectal lesions after colonoscopy. Medicine 96(27):e7468","journal-title":"Medicine"},{"key":"1254_CR3","doi-asserted-by":"publisher","first-page":"891","DOI":"10.1016\/j.gie.2020.02.042","volume":"92","author":"LZCT Pu","year":"2020","unstructured":"Pu LZCT et al (2020) Computer-aided diagnosis for characterisation of colorectal lesions: a comprehensive software including serrated lesions. Gastrointest Endosc 92:891\u2013899","journal-title":"Gastrointest Endosc"},{"issue":"6","key":"1254_CR4","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2017","unstructured":"Ren S et al (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137\u20131149","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1254_CR5","doi-asserted-by":"publisher","unstructured":"Wang R, Zhang W, Nie W, Yu Y (2020) Gastric polyps detection by improved faster R-CNN. In: Proceedings of the 2019 8th international conference on computing and pattern recognition (ICCPR '19). Association for Computing Machinery, New York, NY, USA, pp 128\u2013133. https:\/\/doi.org\/10.1145\/3373509.3373524","DOI":"10.1145\/3373509.3373524"},{"issue":"6","key":"1254_CR6","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2017","unstructured":"Ren S et al (2017) Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137\u20131149. 
https:\/\/doi.org\/10.1109\/TPAMI.2016.2577031","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1254_CR7","doi-asserted-by":"publisher","DOI":"10.14569\/IJACSA.2019.0100947","author":"S Al-Fedaghi","year":"2019","unstructured":"Al-Fedaghi S, Bayoumi M (2019) Authentication modeling with five generic processes. Int J Adv Comput Sci Appl (IJACSA). https:\/\/doi.org\/10.14569\/IJACSA.2019.0100947","journal-title":"Int J Adv Comput Sci Appl (IJACSA)"},{"key":"1254_CR8","unstructured":"Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767"},{"key":"1254_CR9","unstructured":"Bochkovskiy A et al (2020) YOLOv5: improved performance, and on-device training. arXiv preprint arXiv:2006.05597"},{"key":"1254_CR10","first-page":"5998","volume":"30","author":"A Vaswani","year":"2017","unstructured":"Vaswani A et al (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5998\u20136008","journal-title":"Adv Neural Inf Process Syst"},{"key":"1254_CR11","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T et al (2021). An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929"},{"key":"1254_CR12","unstructured":"Su J, Zhou B, Jie Z, Zhu J, Ding C, Zhuang Y, Liu S, Li G, Wang Y, Li Z, Xiao B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 10257\u201310266"},{"key":"1254_CR13","doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z et al. (2021). Swin transformer: hierarchical vision transformer using shifted windows. 
arXiv preprint arXiv:2103.14030","DOI":"10.1109\/ICCV48922.2021.00986"},{"issue":"2","key":"1254_CR14","doi-asserted-by":"publisher","first-page":"104","DOI":"10.3322\/caac.21220","volume":"64","author":"R Siegel","year":"2014","unstructured":"Siegel R, DeSantis C, Jemal A (2014) Colorectal cancer statistics, 2014. CA A Cancer J Clin 64(2):104\u2013117","journal-title":"CA A Cancer J Clin"},{"issue":"4","key":"1254_CR15","first-page":"616","volume":"14","author":"Y Wang","year":"2010","unstructured":"Wang Y, Dorner S, Ecker R (2010) A framework for automatic polyp detection in colonoscopy images. Med Image Anal 14(4):616\u2013629","journal-title":"Med Image Anal"},{"issue":"2","key":"1254_CR16","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1007\/s10916-017-0884-3","volume":"42","author":"Y Zheng","year":"2018","unstructured":"Zheng Y, Wang X, Song Y, Wang H (2018) Computer-aided diagnosis for colonoscopy by using bag-of-visual-words and Fisher vector techniques. J Med Syst 42(2):31","journal-title":"J Med Syst"},{"issue":"6","key":"1254_CR17","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1007\/s10916-016-0487-4","volume":"40","author":"X Zhang","year":"2016","unstructured":"Zhang X, Chen Y, Song Y (2016) A novel approach for automated polyp detection in colonoscopy images via SIFT features. J Med Syst 40(6):136","journal-title":"J Med Syst"},{"issue":"5","key":"1254_CR18","doi-asserted-by":"publisher","first-page":"820","DOI":"10.1109\/JPROC.2021.3054390","volume":"109","author":"SK Zhou","year":"2021","unstructured":"Zhou SK et al (2021) A review of deep learning in medical imaging: imaging traits, technology trends, case studies with progress highlights, and future promises. Proc IEEE 109(5):820\u2013838. 
https:\/\/doi.org\/10.1109\/JPROC.2021.3054390","journal-title":"Proc IEEE"},{"key":"1254_CR19","unstructured":"Zacharaki et al (2009) A comparative study of texture features for the detection of colonic polyps in computed tomography colonography"},{"issue":"5","key":"1254_CR20","doi-asserted-by":"publisher","first-page":"1299","DOI":"10.1109\/tmi.2016.2535302","volume":"35","author":"N Tajbakhsh","year":"2016","unstructured":"Tajbakhsh N et al (2016) Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans Med Imaging 35(5):1299\u20131312. https:\/\/doi.org\/10.1109\/tmi.2016.2535302","journal-title":"IEEE Trans Med Imaging"},{"issue":"5","key":"1254_CR21","doi-asserted-by":"crossref","first-page":"1495","DOI":"10.1109\/JBHI.2017.2770214","volume":"22","author":"P Wang","year":"2018","unstructured":"Wang P, Xiao X, Glissen Brown JR, Berzin TM (2018) Automatic detection of colonic polyps in endoscopic images using region-based convolutional neural networks. IEEE J Biomed Health Inform 22(5):1495\u20131505","journal-title":"IEEE J Biomed Health Inform"},{"key":"1254_CR22","unstructured":"Fang Y, Zhang J, Zhang Y, Gao Y (2016) Polyp detection using convolutional neural networks and region-based fully convolutional networks. In: International conference on medical image computing and computer-assisted intervention, vol 9902, pp 62\u201370"},{"key":"1254_CR23","unstructured":"Wang Y, Li L, Wang H, Gao X, Xia Y (2016) Polyp detection in colonoscopy videos using region-based convolutional neural networks. 
In: International conference on medical image computing and computer-assisted intervention, vol 9901, pp 473\u2013481"},{"issue":"4","key":"1254_CR24","doi-asserted-by":"publisher","first-page":"1069","DOI":"10.1053\/j.gastro.2018.06.037","volume":"155","author":"G Urban","year":"2018","unstructured":"Urban G, Tripathi P, Alkayali T, Mittal M, Jalali F, Karnes W et al (2018) Deep learning localizes and identifies polyps in real time with 96% accuracy in screening colonoscopy. Gastroenterology 155(4):1069\u20131078","journal-title":"Gastroenterology"},{"issue":"1","key":"1254_CR25","first-page":"73","volume":"40","author":"Y Xu","year":"2021","unstructured":"Xu Y, Chen W, Zhang X, Wang J (2021) EfficientDet-based colonic polyp detection in colonoscopy images. IEEE Trans Med Imaging 40(1):73\u201383","journal-title":"IEEE Trans Med Imaging"},{"issue":"2","key":"1254_CR26","first-page":"566","volume":"24","author":"H Li","year":"2020","unstructured":"Li H, Li X, Liang J, Li F (2020) EfficientDet-based automatic polyp detection for colonoscopy images. IEEE J Biomed Health Inform 24(2):566\u2013574","journal-title":"IEEE J Biomed Health Inform"},{"key":"1254_CR27","doi-asserted-by":"crossref","unstructured":"Tan M, Le QV (2020) EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 10781\u201310790","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"1254_CR28","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1016\/j.media.2018.04.002","volume":"49","author":"D Bychkov","year":"2018","unstructured":"Bychkov D, Linder N, Annus P, K\u00f5ks S (2018) Detecting lesions in colorectal cancer with deep learning. Med Image Anal 49:88\u201397. https:\/\/doi.org\/10.1016\/j.media.2018.04.002","journal-title":"Med Image Anal"},{"key":"1254_CR29","doi-asserted-by":"publisher","unstructured":"Wang Z, Dong D, Wu L, Chen S, Liu F (2018) Towards accurate polyp detection with YOLO. 
In: 2018 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 1576\u20131580. https:\/\/doi.org\/10.1109\/BIBM.2018.8621135","DOI":"10.1109\/BIBM.2018.8621135"},{"key":"1254_CR30","doi-asserted-by":"publisher","unstructured":"Bertrand R, Marion R, Boudiaf M, Chambon S (2019) Towards real-time lesion detection in colonoscopy using single shot detectors. In: 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019), pp 1003\u20131007. https:\/\/doi.org\/10.1109\/ISBI.2019.8759374","DOI":"10.1109\/ISBI.2019.8759374"},{"key":"1254_CR31","doi-asserted-by":"publisher","first-page":"8895832","DOI":"10.1155\/2020\/8895832","volume":"2020","author":"S Wang","year":"2020","unstructured":"Wang S, Wang R, Zhang X, Wang L, Zhang J (2020) Polyp detection in colonoscopy using focal loss convolutional neural networks. J Healthcare Eng 2020:8895832. https:\/\/doi.org\/10.1155\/2020\/8895832","journal-title":"J Healthcare Eng"},{"key":"1254_CR32","doi-asserted-by":"publisher","first-page":"891","DOI":"10.1016\/j.gie.2020.02.042","volume":"92","author":"LZCT Pu","year":"2020","unstructured":"Pu LZCT, Maicas G, Tian Y, Yamamura T, Nakamura M, Suzuki H, Singh G, Rana K, Hirooka Y, Burt AD et al (2020) Computer-aided diagnosis for characterisation of colorectal lesions: a comprehensive software including serrated lesions. Gastrointest Endosc 92:891\u2013899","journal-title":"Gastrointest Endosc"},{"key":"1254_CR33","doi-asserted-by":"crossref","unstructured":"Liu Y, Tian Y, Maicas G, Pu LZCT, Singh R, Verjans JW, Carneiro G (2020) Photoshopping colonoscopy video frames. In: 2020 IEEE 17th international symposium on biomedical imaging (ISBI). IEEE, pp 1\u20135","DOI":"10.1109\/ISBI45749.2020.9098406"},{"key":"1254_CR34","doi-asserted-by":"publisher","unstructured":"Tajbakhsh N et al (2015) Automatic polyp detection in colonoscopy videos using an ensemble of convolutional neural networks. 
In: 2015 IEEE 12th international symposium on biomedical imaging (ISBI). https:\/\/doi.org\/10.1109\/isbi.2015.7163821.","DOI":"10.1109\/isbi.2015.7163821"},{"issue":"10","key":"1254_CR35","doi-asserted-by":"publisher","first-page":"2926","DOI":"10.1109\/JBHI.2020.3003653","volume":"24","author":"A Bogusz","year":"2020","unstructured":"Bogusz A, Moscicki J, Skomorowski M et al (2020) Polyp detection in colonoscopy images using panoramic attention network. IEEE J Biomed Health Inform 24(10):2926\u20132935. https:\/\/doi.org\/10.1109\/JBHI.2020.3003653","journal-title":"IEEE J Biomed Health Inform"},{"key":"1254_CR36","doi-asserted-by":"crossref","unstructured":"Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759\u20138768","DOI":"10.1109\/CVPR.2018.00913"},{"issue":"8","key":"1254_CR37","doi-asserted-by":"publisher","first-page":"2560","DOI":"10.1109\/TMI.2020.2975962","volume":"39","author":"J Smith","year":"2020","unstructured":"Smith J (2020) Simplified PANet for polyp detection in colonoscopic images. IEEE Trans Med Imaging 39(8):2560\u20132569. https:\/\/doi.org\/10.1109\/TMI.2020.2975962","journal-title":"IEEE Trans Med Imaging"},{"key":"1254_CR38","doi-asserted-by":"crossref","unstructured":"Ma Y, Chen X, Cheng K, Li Y, Sun B (2021) LDPolypvideo benchmark: a large-scale colonoscopy video dataset of diverse polyps. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, pp 387\u2013396","DOI":"10.1007\/978-3-030-87240-3_37"},{"issue":"1","key":"1254_CR39","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41597-020-00622-y","volume":"7","author":"H Borgli","year":"2020","unstructured":"Borgli H et al (2020) Hyperkvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. 
Scientific Data 7(1):1\u201314","journal-title":"Scientific Data"},{"key":"1254_CR40","volume-title":"Information theory, inference, and learning algorithms","author":"DJC MacKay","year":"2003","unstructured":"MacKay DJC (2003) Information theory, inference, and learning algorithms. Cambridge University Press, Cambridge"},{"key":"1254_CR41","doi-asserted-by":"crossref","unstructured":"Rezatofighi H, Tsoi N, Gwak JY et al (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 658\u2013666","DOI":"10.1109\/CVPR.2019.00075"},{"key":"1254_CR42","doi-asserted-by":"crossref","unstructured":"Zheng Z, Wang P, Liu W et al (2020) Distance-IoU loss: faster and better learning for bounding box regression. In: AAAI, pp 12993\u201313000","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"1254_CR43","doi-asserted-by":"crossref","unstructured":"Zhang H et al (2017) mixup: Beyond empirical risk minimization","DOI":"10.1007\/978-1-4899-7687-1_79"},{"key":"1254_CR44","unstructured":"Zhou X, Wang D, Philipp K (2019) Objects as points"},{"key":"1254_CR45","doi-asserted-by":"crossref","unstructured":"Zhou Q et al (2022) TransVOD: end-to-end video object detection with spatial-temporal transformers","DOI":"10.1109\/PRAI55851.2022.9904115"}],"container-title":["Pattern Analysis and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10044-024-01254-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10044-024-01254-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10044-024-01254-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,17]],"date-time":"2024-06-17T14:09:10Z","timestamp":1718633350000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10044-024-01254-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,4]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["1254"],"URL":"https:\/\/doi.org\/10.1007\/s10044-024-01254-3","relation":{},"ISSN":["1433-7541","1433-755X"],"issn-type":[{"value":"1433-7541","type":"print"},{"value":"1433-755X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,4]]},"assertion":[{"value":"14 April 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 February 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 April 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"The datasets analyzed during this study are obtained from 
public datasets. LDPolypVideo: . Hyper-Kvasir:","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical and informed consent"}}],"article-number":"38"}}