{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T21:13:12Z","timestamp":1767906792649,"version":"3.49.0"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T00:00:00Z","timestamp":1707177600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T00:00:00Z","timestamp":1707177600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61872143"],"award-info":[{"award-number":["61872143"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The text-to-image (T2I) model based on a single-stage generative adversarial network (GAN) has significantly succeeded in recent years. However, the generation model based on GAN has two disadvantages: the generator does not introduce any image feature manifold structure, which makes it challenging to align the image and text features. Another is the image\u2019s diversity; the text\u2019s abstraction will prevent the model from learning the actual image distribution. This paper proposes a reversed image interaction generative adversarial network (RII-GAN), which consists of four components: text encoder, reversed image interaction network (RIIN), adaptive affine-based generator, and dual-channel feature alignment discriminator (DFAD). 
RIIN indirectly introduces the actual image distribution into the generation network, compensating for the network\u2019s missing knowledge of the actual image-feature manifold structure and enabling it to generate the distribution of text-matching images. Each adaptive affine block (AAB) in the proposed affine-based generator adaptively enhances text information, establishing an updated relation between the originally independent fusion blocks and the image features. Moreover, this study designs a DFAD that captures salient feature information from images and text in two channels. This dual-channel backbone improves semantic consistency through a synchronized bi-modal information extraction structure. Experiments on publicly available datasets demonstrate the effectiveness of our model.<\/jats:p>","DOI":"10.1007\/s11063-024-11503-5","type":"journal-article","created":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T20:02:17Z","timestamp":1707249737000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["RII-GAN: Multi-scaled Aligning-Based Reversed Image Interaction Network for Text-to-Image Synthesis"],"prefix":"10.1007","volume":"56","author":[{"given":"Haofei","family":"Yuan","sequence":"first","affiliation":[]},{"given":"Hongqing","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Suyi","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Ziying","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Nan","family":"Wang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,6]]},"reference":[{"key":"11503_CR1","unstructured":"Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. 
In: Proceedings of the 34th International Conference on Machine Learning, pp 214\u2013223"},{"key":"11503_CR2","unstructured":"Berthelot D, Schumm T, Metz L (2017) BEGAN: boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717"},{"key":"11503_CR3","doi-asserted-by":"crossref","unstructured":"Chen Z, Luo Y (2019) Cycle-consistent diverse image synthesis from natural language. In: 2019 IEEE international conference on multimedia & expo workshops (ICMEW), IEEE, pp 459\u2013464","DOI":"10.1109\/ICMEW.2019.00085"},{"key":"11503_CR4","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, et\u00a0al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929"},{"issue":"11","key":"11503_CR5","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1145\/3422622","volume":"63","author":"I Goodfellow","year":"2020","unstructured":"Goodfellow I, Pouget-Abadie J, Mirza M et al (2020) Generative adversarial networks. Commun ACM 63(11):139\u2013144","journal-title":"Commun ACM"},{"key":"11503_CR6","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, et\u00a0al (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"11503_CR7","doi-asserted-by":"crossref","unstructured":"Hessel J, Holtzman A, Forbes M, et\u00a0al (2021) CLIPScore: A reference-free evaluation metric for image captioning. arXiv preprint arXiv:2104.08718","DOI":"10.18653\/v1\/2021.emnlp-main.595"},{"key":"11503_CR8","unstructured":"Heusel M, Ramsauer H, Unterthiner T, et\u00a0al (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in neural information processing systems, pp 6626\u20136637"},{"key":"11503_CR9","unstructured":"Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. 
arXiv preprint arXiv:1412.6980"},{"key":"11503_CR10","unstructured":"Li B, Qi X, Lukasiewicz T, et\u00a0al (2019) Controllable text-to-image generation. In: Advances in neural information processing systems, pp 2065\u20132075"},{"key":"11503_CR11","unstructured":"Li B, Torr PHS, Lukasiewicz T (2022) Memory-driven text-to-image generation. arXiv preprint arXiv:2208.07022"},{"key":"11503_CR12","doi-asserted-by":"crossref","unstructured":"Liao W, Hu K, Yang MY, et\u00a0al (2022) Text to image generation with semantic-spatial aware GAN. In: 2022 IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 18,166\u201318,175","DOI":"10.1109\/CVPR52688.2022.01765"},{"key":"11503_CR13","unstructured":"Lim JH, Ye JC (2017) Geometric GAN. arXiv preprint arXiv:1705.02894"},{"key":"11503_CR14","doi-asserted-by":"crossref","unstructured":"Lin TY, Maire M, Belongie S, et\u00a0al (2014) Microsoft COCO: common objects in context. In: European conference on computer vision, pp 740\u2013755","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"11503_CR15","unstructured":"Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784"},{"key":"11503_CR16","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/j.neunet.2021.01.023","volume":"138","author":"D Peng","year":"2021","unstructured":"Peng D, Yang W, Liu C et al (2021) SAM-GAN: self-attention supporting multi-stage generative adversarial networks for text-to-image synthesis. Neural Netw 138:57\u201367","journal-title":"Neural Netw"},{"key":"11503_CR17","doi-asserted-by":"publisher","first-page":"4356","DOI":"10.1109\/TMM.2021.3116416","volume":"24","author":"J Peng","year":"2022","unstructured":"Peng J, Zhou Y, Sun X et al (2022) Knowledge-driven generative adversarial network for text-to-image synthesis. 
IEEE Trans Multimed 24:4356\u20134366","journal-title":"IEEE Trans Multimed"},{"key":"11503_CR18","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1016\/j.neucom.2021.03.059","volume":"449","author":"Z Qi","year":"2021","unstructured":"Qi Z, Sun J, Qian J et al (2021) PCCM-GAN: photographic text-to-image generation with pyramid contrastive consistency model. Neurocomputing 449:330\u2013341","journal-title":"Neurocomputing"},{"key":"11503_CR19","doi-asserted-by":"crossref","unstructured":"Qiao T, Zhang J, Xu D, et\u00a0al (2019) MirrorGAN: Learning text-to-image generation by redescription. In: 2019 IEEE\/CVF Conference on computer vision and pattern recognition (CVPR), pp 1505\u20131514","DOI":"10.1109\/CVPR.2019.00160"},{"key":"11503_CR20","unstructured":"Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434"},{"key":"11503_CR21","unstructured":"Radford A, Kim JW, Hallacy C, et\u00a0al (2021) Learning transferable visual models from natural language supervision. In: Proceedings of the 38th international conference on machine learning, pp 8748\u20138763"},{"key":"11503_CR22","unstructured":"Reed S, Akata Z, Yan X, et\u00a0al (2016) Generative adversarial text to image synthesis. In: Proceedings of the 33rd international conference on machine learning, pp 1060\u20131069"},{"key":"11503_CR23","doi-asserted-by":"crossref","unstructured":"Ruan S, Zhang Y, Zhang K, et\u00a0al (2021) DAE-GAN: Dynamic aspect-aware GAN for text-to-image synthesis. In: 2021 IEEE\/CVF international conference on computer vision (ICCV), pp 13,940\u201313,949","DOI":"10.1109\/ICCV48922.2021.01370"},{"key":"11503_CR24","unstructured":"Salimans T, Goodfellow I, Zaremba W, et\u00a0al (2016) Improved techniques for training GANs. 
In: Advances in neural information processing systems, pp 2234\u20132242"},{"issue":"11","key":"11503_CR25","doi-asserted-by":"publisher","first-page":"2673","DOI":"10.1109\/78.650093","volume":"45","author":"M Schuster","year":"1997","unstructured":"Schuster M, Paliwal K (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45(11):2673\u20132681","journal-title":"IEEE Trans Signal Process"},{"key":"11503_CR26","doi-asserted-by":"crossref","unstructured":"Szegedy C, Vanhoucke V, Ioffe S, et\u00a0al (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2818\u20132826","DOI":"10.1109\/CVPR.2016.308"},{"key":"11503_CR27","doi-asserted-by":"crossref","unstructured":"Tan H, Liu X, Li X, et\u00a0al (2019) Semantics-enhanced adversarial nets for text-to-image synthesis. In: Proceedings of the IEEE\/CVF international conference on computer vision (ICCV), pp 10,500\u201310,509","DOI":"10.1109\/ICCV.2019.01060"},{"key":"11503_CR28","first-page":"1","volume":"34","author":"H Tan","year":"2022","unstructured":"Tan H, Liu X, Yin B et al (2022) DR-GAN: distribution regularization for text-to-image generation. IEEE Trans Neural Netw Learn Syst 34:1\u201315","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"11503_CR29","doi-asserted-by":"crossref","unstructured":"Tao M, Tang H, Wu F, et\u00a0al (2022) DF-GAN: A simple and effective baseline for text-to-image synthesis. In: 2022 IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 16,494\u201316,504","DOI":"10.1109\/CVPR52688.2022.01602"},{"key":"11503_CR30","unstructured":"de\u00a0Vries H, Strub F, Mary J, et\u00a0al (2017) Modulating early visual processing by language. 
In: Advances in neural information processing systems, pp 6594\u20136604"},{"key":"11503_CR31","unstructured":"Wah C, Branson S, Welinder P, et\u00a0al (2011) The Caltech-UCSD Birds-200-2011 dataset"},{"key":"11503_CR32","doi-asserted-by":"crossref","unstructured":"Xu T, Zhang P, Huang Q, et\u00a0al (2018) AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In: 2018 IEEE\/CVF conference on computer vision and pattern recognition, pp 1316\u20131324","DOI":"10.1109\/CVPR.2018.00143"},{"key":"11503_CR33","doi-asserted-by":"publisher","first-page":"2798","DOI":"10.1109\/TIP.2021.3055062","volume":"30","author":"Y Yang","year":"2021","unstructured":"Yang Y, Wang L, Xie D et al (2021) Multi-sentence auxiliary adversarial networks for fine-grained text-to-image synthesis. IEEE Trans Image Process 30:2798\u20132809","journal-title":"IEEE Trans Image Process"},{"key":"11503_CR34","unstructured":"Ye S, Liu F, Tan M (2022) Recurrent affine transformation for text-to-image synthesis. arXiv preprint arXiv:2204.10482"},{"key":"11503_CR35","doi-asserted-by":"crossref","unstructured":"Yin G, Liu B, Sheng L, et\u00a0al (2019) Semantics disentangling for text-to-image generation. In: 2019 IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 2322\u20132331","DOI":"10.1109\/CVPR.2019.00243"},{"key":"11503_CR36","doi-asserted-by":"crossref","unstructured":"Zhang H, Xu T, Li H, et\u00a0al (2017) StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: 2017 IEEE international conference on computer vision (ICCV), pp 5908\u20135916","DOI":"10.1109\/ICCV.2017.629"},{"issue":"8","key":"11503_CR37","doi-asserted-by":"publisher","first-page":"1947","DOI":"10.1109\/TPAMI.2018.2856256","volume":"41","author":"H Zhang","year":"2019","unstructured":"Zhang H, Xu T, Li H et al (2019) StackGAN++: realistic image synthesis with stacked generative adversarial networks. 
IEEE Trans Pattern Anal Mach Intell 41(8):1947\u20131962","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11503_CR38","doi-asserted-by":"crossref","unstructured":"Zhang H, Koh JY, Baldridge J, et\u00a0al (2021) Cross-modal contrastive learning for text-to-image generation. In: 2021 IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 833\u2013842","DOI":"10.1109\/CVPR46437.2021.00089"},{"key":"11503_CR39","doi-asserted-by":"crossref","unstructured":"Zhang Z, Schomaker L (2021) DTGAN: Dual attention generative adversarial networks for text-to-image generation. In: 2021 International joint conference on neural networks (IJCNN), pp 1\u20138","DOI":"10.1109\/IJCNN52387.2021.9533527"},{"key":"11503_CR40","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1016\/j.neucom.2021.12.005","volume":"473","author":"Z Zhang","year":"2022","unstructured":"Zhang Z, Schomaker L (2022) DiverGAN: an efficient and effective single-stage framework for diverse text-to-image generation. Neurocomputing 473:182\u2013198","journal-title":"Neurocomputing"},{"key":"11503_CR41","doi-asserted-by":"crossref","unstructured":"Zhu M, Pan P, Chen W, et\u00a0al (2019) DM-GAN: Dynamic memory generative adversarial networks for text-to-image synthesis. 
In: 2019 IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 5795\u20135803","DOI":"10.1109\/CVPR.2019.00595"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11503-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-024-11503-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11503-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,29]],"date-time":"2024-02-29T20:08:30Z","timestamp":1709237310000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-024-11503-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,6]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,2]]}},"alternative-id":["11503"],"URL":"https:\/\/doi.org\/10.1007\/s11063-024-11503-5","relation":{},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,6]]},"assertion":[{"value":"25 November 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 February 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this 
paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"11"}}