{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T02:02:51Z","timestamp":1767924171816,"version":"3.49.0"},"reference-count":56,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2021,12,31]],"date-time":"2021-12-31T00:00:00Z","timestamp":1640908800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science and Technology Development Fund, Macau SAR","award":["0023\/2018\/AFJ"],"award-info":[{"award-number":["0023\/2018\/AFJ"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The automatic analysis of endoscopic images to assist endoscopists in accurately identifying the types and locations of esophageal lesions remains a challenge. In this paper, we propose a novel multi-task deep learning model for automatic diagnosis, which does not simply replace the role of endoscopists in decision making, because endoscopists are expected to correct the false results predicted by the diagnosis system if more supporting information is provided. In order to help endoscopists improve the diagnosis accuracy in identifying the types of lesions, an image retrieval module is added in the classification task to provide an additional confidence level of the predicted types of esophageal lesions. In addition, a mutual attention module is added in the segmentation task to improve its performance in determining the locations of esophageal lesions. The proposed model is evaluated and compared with other deep learning models using a dataset of 1003 endoscopic images, including 290 esophageal cancer, 473 esophagitis, and 240 normal. The experimental results show the promising performance of our model with a high accuracy of 96.76% for the classification and a Dice coefficient of 82.47% for the segmentation. Consequently, the proposed multi-task deep learning model can be an effective tool to help endoscopists in judging esophageal lesions.<\/jats:p>","DOI":"10.3390\/s22010283","type":"journal-article","created":{"date-parts":[[2022,1,9]],"date-time":"2022-01-09T23:08:26Z","timestamp":1641769706000},"page":"283","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Multi-Task Model for Esophageal Lesion Analysis Using Endoscopic Images: Classification with Image Retrieval and Segmentation with Attention"],"prefix":"10.3390","volume":"22","author":[{"given":"Xiaoyuan","family":"Yu","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau"}]},{"given":"Suigu","family":"Tang","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7912-2913","authenticated-orcid":false,"given":"Chak Fong","family":"Cheang","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau"}]},{"given":"Hon Ho","family":"Yu","sequence":"additional","affiliation":[{"name":"Kiang Wu Hospital, Santo Ant\u00f3nio, Macau"}]},{"given":"I Cheong","family":"Choi","sequence":"additional","affiliation":[{"name":"Kiang Wu Hospital, Santo Ant\u00f3nio, Macau"}]}],"member":"1968","published-online":{"date-parts":[[2021,12,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"209","DOI":"10.3322\/caac.21660","article-title":"Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries","volume":"71","author":"Sung","year":"2021","journal-title":"CA Cancer J. Clin."},
{"key":"ref_2","doi-asserted-by":"crossref","first-page":"897","DOI":"10.1111\/dote.12533","article-title":"Recommendations for pathologic staging (pTNM) of cancer of the esophagus and esophagogastric junction for the 8th edition AJCC\/UICC staging manuals","volume":"29","author":"Rice","year":"2016","journal-title":"Dis. Esophagus"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2017","DOI":"10.1053\/j.gastro.2011.08.007","article-title":"Magnifying narrowband imaging is more accurate than conventional white-light imaging in diagnosis of gastric mucosal cancer","volume":"141","author":"Ezoe","year":"2011","journal-title":"Gastroenterology"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1159\/000487470","article-title":"Narrow-band imaging: Clinical application in gastrointestinal endoscopy","volume":"26","author":"Barbeiro","year":"2018","journal-title":"GE Port. J. Gastroenterol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"867","DOI":"10.1016\/j.dld.2006.09.007","article-title":"Capsule endoscopy: Where are we after 6 years of clinical use?","volume":"38","author":"Pennazio","year":"2006","journal-title":"Dig. Liver Dis."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"720","DOI":"10.1038\/nrgastro.2016.148","article-title":"Role of endoscopy in early oesophageal cancer","volume":"13","author":"Mannath","year":"2016","journal-title":"Nat. Rev. Gastroenterol. Hepatol."},
{"key":"ref_7","doi-asserted-by":"crossref","first-page":"142053","DOI":"10.1109\/ACCESS.2019.2944676","article-title":"Review on the applications of deep learning in the analysis of gastrointestinal endoscopy images","volume":"7","author":"Du","year":"2019","journal-title":"IEEE Access"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"801","DOI":"10.31661\/jbpe.v0i0.2004-1107","article-title":"A deep learning approach to skin cancer detection in dermoscopy images","volume":"10","author":"Ameri","year":"2020","journal-title":"J. Biomed. Phys. Eng."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2369","DOI":"10.1109\/TMI.2016.2546227","article-title":"Segmenting retinal blood vessels with deep neural networks","volume":"35","author":"Liskowski","year":"2016","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"6582","DOI":"10.1007\/s00330-020-07008-z","article-title":"Clinically significant prostate cancer detection and segmentation in low-risk patients using a convolutional neural network on multi-parametric MRI","volume":"30","author":"Arif","year":"2020","journal-title":"Eur. Radiol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1645","DOI":"10.1016\/S1470-2045(19)30637-0","article-title":"Real-time artificial intelligence for detection of upper gastrointestinal cancer by endoscopy: A multicentre, case-control, diagnostic study","volume":"20","author":"Luo","year":"2019","journal-title":"Lancet"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1016\/j.gie.2019.04.245","article-title":"Classification for invasion depth of esophageal squamous cell carcinoma using a deep neural network compared with experienced endoscopists","volume":"90","author":"Nakagawa","year":"2019","journal-title":"Gastrointest. Endosc."},
{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1177\/15330338211034284","article-title":"Clinical target volume auto-segmentation of esophageal cancer for radiotherapy after radical surgery based on deep learning","volume":"20","author":"Cao","year":"2021","journal-title":"Technol. Cancer Res. Treat."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/j.gie.2019.08.018","article-title":"Real-time automated diagnosis of precancerous lesions and early esophageal squamous cell carcinoma using a deep learning model (with videos)","volume":"91","author":"Guo","year":"2020","journal-title":"Gastrointest. Endosc."},{"key":"ref_15","first-page":"95","article-title":"Multitask learning","volume":"27","author":"Caruana","year":"1998","journal-title":"Mach. Learn."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Misra, I., Shrivastava, A., Gupta, A., and Hebert, M. (2016, January 27\u201330). Cross-Stitch Networks for Multi-Task Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.433"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Kokkinos, I. (2017, January 21\u201326). Ubernet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.579"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Caruana, R. (1993, January 27\u201329). Multitask Learning: A Knowledge Based Source of Inductive Bias. Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, USA.","DOI":"10.1016\/B978-1-55860-307-3.50012-5"},
{"key":"ref_19","doi-asserted-by":"crossref","first-page":"324","DOI":"10.3414\/ME9230","article-title":"Computer-assisted diagnosis for precancerous lesions in the esophagus","volume":"48","author":"Kage","year":"2009","journal-title":"Methods Inf. Med."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2893","DOI":"10.1109\/TBME.2012.2212440","article-title":"Invariant gabor texture descriptors for classification of gastroenterology images","volume":"59","author":"Riaz","year":"2012","journal-title":"IEEE Trans. Biomed. Eng."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"422","DOI":"10.4236\/jsea.2014.75039","article-title":"Bleeding and ulcer detection using wireless capsule endoscopy images","volume":"7","author":"Yeh","year":"2014","journal-title":"J. Softw. Eng. Appl."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/j.media.2016.04.007","article-title":"Identification of lesion images from gastrointestinal endoscope based on feature extraction of combinational methods with and without learning process","volume":"32","author":"Liu","year":"2016","journal-title":"Med. Image Anal."},
{"key":"ref_23","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1007\/s10388-018-0651-7","article-title":"Diagnosis using deep-learning artificial intelligence based on the endocytoscopic observation of the esophagus","volume":"16","author":"Kumagai","year":"2019","journal-title":"Esophagus"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/j.neucom.2018.10.100","article-title":"Fine-tuning pre-trained convolutional neural networks for gastric precancerous disease classification on magnification narrow-band imaging images","volume":"392","author":"Liu","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3066","DOI":"10.1364\/BOE.420935","article-title":"Automatic classification of esophageal disease in gastroscopic images using an efficient channel attention deep dense convolutional neural network","volume":"12","author":"Du","year":"2021","journal-title":"Biomed. Opt. Express"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"103950","DOI":"10.1016\/j.compbiomed.2020.103950","article-title":"Anatomical classification of upper gastrointestinal organs under various image capture conditions using AlexNet","volume":"124","author":"Igarashi","year":"2020","journal-title":"Comput. Biol. Med."},{"key":"ref_27","unstructured":"Fieselmann, A., Lautenschl\u00e4ger, S., Deinzer, F., Matthias, J., and Poppe, B. (2008, January 6\u20138). Esophagus segmentation by spatially-constrained shape interpolation. Proceedings of the Bildverarbeitung f\u00fcr die Medizin 2008: Algorithmen\u2013Systeme\u2013Anwendungen, Proceedings des Workshops, Berlin, Germany."},{"key":"ref_28","first-page":"255","article-title":"Fast automatic segmentation of the esophagus from 3D CT data using a probabilistic model","volume":"12","author":"Feulner","year":"2009","journal-title":"Med. Image Comput. Comput. Assist. Interv."},
{"key":"ref_29","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1016\/j.neucom.2014.02.066","article-title":"Supportive automatic annotation of early esophageal cancer using local gabor and color features","volume":"144","author":"Sommen","year":"2014","journal-title":"Neurocomputing"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"9140","DOI":"10.1088\/1361-6560\/aa94ba","article-title":"Atlas ranking and selection for automatic segmentation of the esophagus from CT scans","volume":"62","author":"Yang","year":"2017","journal-title":"Phys. Med. Biol."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mendel, R., Ebigbo, A., Probst, A., Messmann, H., and Palm, C. (2017, January 12\u201314). Barrett\u2019s esophagus analysis using convolutional neural networks. Proceedings of the Bildverarbeitung f\u00fcr die Medizin 2017: Algorithmen\u2013Systeme\u2013Anwendungen, Proceedings des Workshops, Heidelberg, Germany.","DOI":"10.1007\/978-3-662-54345-0_23"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"122798","DOI":"10.1109\/ACCESS.2020.3007719","article-title":"Channel-attention U-Net: Channel attention mechanism for semantic segmentation of esophagus and esophageal cancer","volume":"8","author":"Huang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Tran, M., Kim, S., Yang, H., Lee, G., Oh, I., and Kang, S. (2021). Esophagus segmentation in CT images via spatial attention network and STAPLE algorithm. Sensors, 21.","DOI":"10.3390\/s21134556"},
{"key":"ref_35","doi-asserted-by":"crossref","first-page":"82867","DOI":"10.1109\/ACCESS.2019.2923760","article-title":"U-Net Plus: Deep Semantic Segmentation for Esophagus and Esophageal Cancer in Computed Tomography Images","volume":"7","author":"Chen","year":"2019","journal-title":"IEEE Access"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"105685","DOI":"10.1016\/j.cmpb.2020.105685","article-title":"Esophagus segmentation from planning CT images using an atlas-based deep learning approach","volume":"197","author":"Diniz","year":"2020","journal-title":"Comput. Methods Programs Biomed."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"101838","DOI":"10.1016\/j.media.2020.101838","article-title":"ELNet: Automatic classification and segmentation for esophageal lesions using convolutional neural network","volume":"67","author":"Wu","year":"2021","journal-title":"Med. Image Anal."},{"key":"ref_38","unstructured":"Chakravarty, A., and Sivaswamy, J. (2018). A deep learning based joint segmentation and classification framework for glaucoma assessment in retinal color fundus images. arXiv, Available online: https:\/\/arxiv.org\/abs\/1808.01355."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"2912","DOI":"10.1109\/JBHI.2020.2973614","article-title":"An end-to-end multi-task deep learning framework for skin lesion analysis","volume":"24","author":"Song","year":"2020","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1618","DOI":"10.1109\/TMI.2021.3062902","article-title":"3D multi-attention guided multi-task learning network for automatic gastric tumor segmentation and lymph node classification","volume":"40","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Med. Imaging"},
{"key":"ref_41","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1109\/TPAMI.2018.2858826","article-title":"Focal loss for dense object detection","volume":"42","author":"Lin","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lin, K., Yang, H., Hsiao, J., and Chen, C. (2015, January 11\u201312). Deep Learning of Binary Hash Codes for Fast Image Retrieval. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Boston, MA, USA.","DOI":"10.1109\/CVPRW.2015.7301269"},{"key":"ref_43","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv, Available online: https:\/\/arxiv.org\/abs\/2010.11929."},{"key":"ref_44","unstructured":"Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J., and Luo, P. (2021). SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. arXiv, Available online: http:\/\/arxiv.org\/abs\/2105.15203."},{"key":"ref_45","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv, Available online: http:\/\/arxiv.org\/abs\/1409.1556."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Dollar, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},
{"key":"ref_48","unstructured":"Tan, M., and Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv, Available online: http:\/\/arxiv.org\/abs\/1905.11946v1."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Radosavovic, I., Kosaraju, R., Girshick, R., He, K., and Dollar, P. (2020, January 13\u201319). Designing Network Design Spaces. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01044"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid scene parsing network. arXiv, Available online: http:\/\/arxiv.org\/abs\/1612.01105.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","article-title":"Fully convolutional networks for semantic segmentation","volume":"39","author":"Shelhamer","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Chen, L., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TPAMI.2020.2977911","article-title":"CCNet: Criss-cross attention for semantic segmentation","volume":"14","author":"Huang","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Yuan, Y., Chen, X., and Wang, J. (2020, January 23\u201328). Object-contextual representations for semantic segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK.","DOI":"10.1007\/978-3-030-58539-6_11"},
{"key":"ref_55","doi-asserted-by":"crossref","first-page":"83","DOI":"10.4161\/jig.1.2.15048","article-title":"Quality endoscopists and quality endoscopy units","volume":"1","author":"Cotton","year":"2011","journal-title":"J. Interv. Gastroenterol."},{"key":"ref_56","first-page":"1","article-title":"Neural architecture search: A survey","volume":"20","author":"Elsken","year":"2019","journal-title":"J. Mach. Learn. Res."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/1\/283\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:56:29Z","timestamp":1760169389000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/1\/283"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,31]]},"references-count":56,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,1]]}},"alternative-id":["s22010283"],"URL":"https:\/\/doi.org\/10.3390\/s22010283","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,31]]}}}