{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T09:40:43Z","timestamp":1780738843651,"version":"3.54.1"},"reference-count":55,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,3,9]],"date-time":"2025-03-09T00:00:00Z","timestamp":1741478400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation (NSF)","doi-asserted-by":"publisher","award":["2131307"],"award-info":[{"award-number":["2131307"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation (NSF)","doi-asserted-by":"publisher","award":["1OT2OD032581-01"],"award-info":[{"award-number":["1OT2OD032581-01"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Office of the Director, National Institutes of Health (NIH) Common Fund","award":["2131307"],"award-info":[{"award-number":["2131307"]}]},{"name":"Office of the Director, National Institutes of Health (NIH) Common Fund","award":["1OT2OD032581-01"],"award-info":[{"award-number":["1OT2OD032581-01"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The lack of extensive, varied, and thoroughly annotated datasets impedes the advancement of artificial intelligence (AI) for medical applications, especially colorectal cancer detection. Models trained with limited diversity often display biases, especially when utilized on disadvantaged groups. Generative models (e.g., DALL-E 2, Vector-Quantized Generative Adversarial Network (VQ-GAN)) have been used to generate images but not colonoscopy data for intelligent data augmentation. 
This study developed an effective method for producing synthetic colonoscopy image data, which can be used to train advanced medical diagnostic models for robust colorectal cancer detection and treatment. Text-to-image synthesis was performed using fine-tuned Visual Large Language Models (LLMs). Stable Diffusion and DreamBooth Low-Rank Adaptation produced images that look authentic, with an average Inception score of 2.36 across three datasets. The validation accuracies of the classification models Big Transfer (BiT), Fixed Resolution Residual Next Generation Network (FixResNeXt), and Efficient Neural Network (EfficientNet) were 92%, 91%, and 86%, respectively. Vision Transformer (ViT) and Data-Efficient Image Transformers (DeiT) each had an accuracy of 93%. For polyp segmentation, the ground truth masks were generated using the Segment Anything Model (SAM). Then, five segmentation models (U-Net, Pyramid Scene Parsing Network (PSNet), Feature Pyramid Network (FPN), Link Network (LinkNet), and Multi-scale Attention Network (MANet)) were adopted. FPN produced excellent results, with an Intersection Over Union (IoU) of 0.64, an F1 score of 0.78, a recall of 0.75, and a Dice coefficient of 0.77. These values demonstrate strong performance in both segmentation accuracy and overlap, with balanced detection capability reflected in the F1 score and Dice coefficient. 
This highlights how AI-generated medical images can improve colonoscopy analysis, which is critical for early colorectal cancer detection.<\/jats:p>","DOI":"10.3390\/a18030155","type":"journal-article","created":{"date-parts":[[2025,3,10]],"date-time":"2025-03-10T05:46:52Z","timestamp":1741585612000},"page":"155","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Text-Guided Synthesis in Medical Multimedia Retrieval: A Framework for Enhanced Colonoscopy Image Classification and Segmentation"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-2039-3075","authenticated-orcid":false,"given":"Ojonugwa Oluwafemi","family":"Ejiga Peter","sequence":"first","affiliation":[{"name":"Department of Computer Science, School of Computer, Mathematical and Natural Sciences, Morgan State University, Baltimore, MD 21251, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-5709-0218","authenticated-orcid":false,"given":"Opeyemi Taiwo","family":"Adeniran","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Morgan State University, Baltimore, MD 21251, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3138-4639","authenticated-orcid":false,"given":"Adetokunbo MacGregor","family":"John-Otumu","sequence":"additional","affiliation":[{"name":"Department of Information Technology, Federal University of Technology Owerri, Owerri 460116, Imo State, Nigeria"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3318-2851","authenticated-orcid":false,"given":"Fahmi","family":"Khalifa","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Morgan State University, Baltimore, MD 21251, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0405-9088","authenticated-orcid":false,"given":"Md Mahmudur","family":"Rahman","sequence":"additional","affiliation":[{"name":"Department of Computer Science, School of Computer, Mathematical and Natural Sciences, Morgan State University, Baltimore, MD 21251, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1813","DOI":"10.1136\/gutjnl-2018-317500","article-title":"Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: A prospective randomised controlled study","volume":"68","author":"Wang","year":"2019","journal-title":"Gut"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1109\/TMI.2017.2664042","article-title":"Comparative Validation of Polyp Detection Methods in Video Colonoscopy: Results from the MICCAI 2015 Endoscopic Vision Challenge","volume":"36","author":"Bernal","year":"2017","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Kim, J.J.H., Um, R.S., Lee, J.W.Y., and Ajilore, O. (2024). Generative AI can fabricate advanced scientific visualizations: Ethical implications and strategic mitigation framework. AI Ethics.","DOI":"10.1007\/s43681-024-00439-0"},{"key":"ref_4","unstructured":"Videau, M., Knizev, N., Leite, A., Schoenauer, M., and Teytaud, O. Interactive Latent Diffusion Model. Proceedings of the Genetic and Evolutionary Computation Conference."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"394","DOI":"10.3322\/caac.21492","article-title":"Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries","volume":"68","author":"Bray","year":"2018","journal-title":"CA Cancer J. 
Clin."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"24412","DOI":"10.1109\/ACCESS.2024.3365043","article-title":"Text-to-Image Synthesis with Generative Models: Methods, Datasets, Performance Metrics, Challenges, and Future Direction","volume":"12","author":"Alhabeeb","year":"2024","journal-title":"IEEE Access"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"88099","DOI":"10.1109\/ACCESS.2023.3306422","article-title":"Recent Advances in Text-to-Image Synthesis: Approaches, Datasets and Future Research Prospects","volume":"11","author":"Tan","year":"2023","journal-title":"IEEE Access"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"100553","DOI":"10.1016\/j.cosrev.2023.100553","article-title":"A survey on GANs for computer vision: Recent research, analysis and taxonomy","volume":"48","author":"Iglesias","year":"2023","journal-title":"Comput. Sci. Rev."},{"key":"ref_9","unstructured":"Ejiga Peter, O.O., Rahman, M.M., and Khalifa, F. (2024, December 12). Advancing AI-Powered Medical Image Synthesis: Insights from MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA. Conference and Labs of the Evaluation Forum. Available online: https:\/\/ceur-ws.org\/Vol-3740\/paper-145.pdf."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Najjar, R. (2023). Redefining Radiology: A Review of Artificial Intelligence Integration in Medical Imaging. Diagnostics, 13.","DOI":"10.20944\/preprints202306.1124.v1"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1007\/s10462-024-10814-2","article-title":"Efficient artificial intelligence approaches for medical image processing in healthcare: Comprehensive review, taxonomy, and analysis","volume":"57","author":"Alnaggar","year":"2024","journal-title":"Artif. Intell. 
Rev."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2929","DOI":"10.1038\/s41591-023-02608-w","article-title":"The value of standards for health datasets in artificial intelligence-based applications","volume":"29","author":"Arora","year":"2023","journal-title":"Nat. Med."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Han, P., Ye, C., Zhou, J., Zhang, J., Hong, J., and Li, X. (2024, January 17\u201318). Latent-based Diffusion Model for Long-tailed Recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00270"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Du, Y., Jiang, Y., Tan, S., Wu, X., Dou, Q., Li, Z., Li, G., and Wan, X. (2023, January 8\u201312). ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models. Proceedings of the Medical Image Computing and Computer Assisted Intervention\u2014MICCAI 2023, Vancouver, BC, Canada.","DOI":"10.1007\/978-3-031-43895-0_32"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ku, H., and Lee, M. (2023). TextControlGAN: Text-to-Image Synthesis with Controllable Generative Adversarial Networks. Appl. Sci., 13.","DOI":"10.3390\/app13085098"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Iqbal, M.A., Jadoon, W., and Kim, S.K. (2024). Synthetic Image Generation Using Conditional GAN-Provided Single-Sample Face Image. Appl. Sci., 14.","DOI":"10.3390\/app14125049"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"40950","DOI":"10.1109\/ACCESS.2018.2856402","article-title":"Automatic Colon Polyp Detection using Region based Deep CNN and Post Learning Approaches","volume":"6","author":"Shin","year":"2019","journal-title":"IEEE Access"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Qadir, H.A., Shin, Y., Solhusvik, J., Bergsland, J., Aabakken, L., and Balasingham, I. (2019, January 8\u201310). 
Polyp Detection and Segmentation using Mask R-CNN: Does a Deeper Feature Extractor CNN Always Perform Better?. Proceedings of the International Symposium on Medical Information and Communication Technology (ISMICT), Oslo, Norway.","DOI":"10.1109\/ISMICT.2019.8743694"},{"key":"ref_19","unstructured":"Dong, B., Wang, W., Fan, D.-P., Li, J., Fu, H., and Shao, L. (2021). Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"512","DOI":"10.1053\/j.gastro.2020.04.062","article-title":"Efficacy of Real-Time Computer-Aided Detection of Colorectal Neoplasia in a Randomized Trial","volume":"159","author":"Repici","year":"2020","journal-title":"Gastroenterology"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1874","DOI":"10.1016\/j.cgh.2019.09.009","article-title":"Artificial Intelligence-assisted System Improves Endoscopic Identification of Colorectal Neoplasms","volume":"18","author":"Kudo","year":"2020","journal-title":"Clin. Gastroenterol. Hepatol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1016\/j.gie.2019.11.026","article-title":"A novel artificial intelligence system for the assessment of bowel preparation (with video)","volume":"91","author":"Zhou","year":"2020","journal-title":"Gastrointest Endosc"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2572","DOI":"10.1109\/TMI.2018.2842767","article-title":"Unsupervised Reverse Domain Adaptation for Synthetic Medical Images via Adversarial Training","volume":"37","author":"Mahmood","year":"2018","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"12561","DOI":"10.1007\/s10462-023-10453-z","article-title":"Medical image data augmentation: Techniques, comparisons and interpretations","volume":"56","author":"Goceri","year":"2023","journal-title":"Artif. Intell. 
Rev."},{"key":"ref_25","unstructured":"Yang, Z., Zhan, F., Liu, K., Xu, M., and Lu, S. (2023). AI-Generated Images as Data Source: The Dawn of Synthetic Era. arXiv."},{"key":"ref_26","unstructured":"Cao, Y., Li, S., Liu, Y., Yan, Z., Dai, Y., Yu, P.S., and Sun, L. (2023). A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Bandi, A., Adapa, P.V.S.R., and Kuchi, Y.E.V.P.K. (2023). The Power of Generative AI: A Review of Requirements, Models, Input\u2013Output Formats, Evaluation Metrics, and Challenges. Future Internet, 15.","DOI":"10.3390\/fi15080260"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Bendel, O. (2023). Image synthesis from an ethical perspective. AI Soc.","DOI":"10.1007\/s00146-023-01780-4"},{"key":"ref_29","first-page":"36","article-title":"Comparative analysis of neural networks Midjourney, Stable Diffusion, and DALL-E and ways of their implementation in the educational process of students of design specialities","volume":"9","author":"Derevyanko","year":"2023","journal-title":"Pedagog. Psychol."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"101923","DOI":"10.1016\/j.artmed.2020.101923","article-title":"Deep learning to find colorectal polyps in colonoscopy: A systematic literature review","volume":"108","author":"Pagador","year":"2020","journal-title":"Artif. Intell. Med."},{"key":"ref_31","unstructured":"Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. (2016). Improved Techniques for Training GANs. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_32","unstructured":"Guyon, I., Von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. 
Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1038\/s41551-018-0301-3","article-title":"Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy","volume":"2","author":"Wang","year":"2018","journal-title":"Nat. Biomed. Eng."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"2027","DOI":"10.1053\/j.gastro.2018.04.003","article-title":"Artificial Intelligence-Assisted Polyp Detection for Colonoscopy: Initial Experience","volume":"154","author":"Misawa","year":"2018","journal-title":"Gastroenterology"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Guo, Y., Bernal, J., and Matuszewski, B.J. (2020). Polyp Segmentation with Fully Convolutional Deep Neural Networks\u2014Extended Evaluation Study. J. Imaging, 6.","DOI":"10.3390\/jimaging6070069"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1038\/s41597-020-00622-y","article-title":"HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy","volume":"7","author":"Borgli","year":"2020","journal-title":"Sci. Data"},{"key":"ref_37","unstructured":"Beaumont, R. (2024, December 15). LAION-5B: A New Era of Open Large-Scale Multi-Modal Datasets. Available online: https:\/\/laion.ai\/blog\/laion-5b\/."},{"key":"ref_38","unstructured":"Hicks, S., Stor\u00e5s, A., Halvorsen, P., De Lange, T., Riegler, M., and Thambawita, V. (2024, December 15). Overview of ImageCLEFmedical 2023\u2014Medical Visual Question Answering for Gastrointestinal Tract. Available online: https:\/\/ceur-ws.org\/Vol-3497\/paper-107.pdf."},{"key":"ref_39","unstructured":"Wang, W., and Tian, J. (2024, December 15). CP-CHILD Records the Colonoscopy Data. figshare 2020. Available online: https:\/\/figshare.com\/articles\/dataset\/CP-CHILD_zip\/12554042?file=23383508."},{"key":"ref_40","unstructured":"Rahman, M.S. 
(2024, December 15). Binary Polyps Classification. Available online: https:\/\/www.kaggle.com\/datasets\/mdsahilurrahman71\/binary-polyps-classification?resource=download."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Chicco, D., and Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom., 21.","DOI":"10.1186\/s12864-019-6413-7"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S. (2019, January 15\u201320). Generalized Intersection Over Union: A Metric and A Loss for Bounding Box Regression. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00075"},{"key":"ref_44","unstructured":"Powers, D.M.W. (2020). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Hore, A., and Ziou, D. (2010, January 23\u201326). Image Quality Metrics: PSNR vs. SSIM. Proceedings of the International Conference on Pattern Recognition, Istanbul, Turkey.","DOI":"10.1109\/ICPR.2010.579"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image Quality Assessment: From Error Visibility to Structural Similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Taha, A.A., and Hanbury, A. (2015). Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. 
Imaging, 15.","DOI":"10.1186\/s12880-015-0068-x"},{"key":"ref_48","unstructured":"Ejiga, P.O., and Oluwafemi, O. (2024, December 15). Text-Guided Synthesis for Colon Cancer Screening. GitHub Repository. Available online: https:\/\/github.com\/Ejigsonpeter\/Text-Guided-Synthesis-for-Colon-Cancer-Screening."},{"key":"ref_49","unstructured":"HuggingFace (2024, December 15). Mask Generation. Available online: https:\/\/huggingface.co\/docs\/transformers\/tasks\/mask_generation."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. arXiv.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid Scene Parsing Network. arXiv.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Lin, T., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2016). Feature Pyramid Networks for Object Detection. arXiv.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Chaurasia, A., and Culurciello, E. (2017, January 10\u201313). LinkNet: Exploiting encoder representations for efficient semantic segmentation. Proceedings of the IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA. Available online: https:\/\/arxiv.org\/abs\/1707.03718.","DOI":"10.1109\/VCIP.2017.8305148"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Safari, F., Savi\u0107, I., Kunze, H., Ernst, J., and Gillis, D. (2023, January 21\u201323). A Review of AI-based MANET Routing Protocols. Proceedings of the 2023 19th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Montreal, QC, Canada.","DOI":"10.1109\/WiMob58348.2023.10187830"},{"key":"ref_55","unstructured":"Ejiga Peter, O.O. (2025, January 08). 
Advancing Colonoscopy Analysis Through Text-to-Image Synthesis Using Generative AI for Intelligent Data Augmentation, Image Classification, and Segmentation. Available online: https:\/\/www.proquest.com\/openview\/9a3add722e60af686957df5383de11f5\/1?pq-origsite=gscholar&cbl=18750&diss=y."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/3\/155\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:49:36Z","timestamp":1760028576000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/3\/155"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,9]]},"references-count":55,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["a18030155"],"URL":"https:\/\/doi.org\/10.3390\/a18030155","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,9]]}}}