{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T22:58:02Z","timestamp":1767913082041,"version":"3.49.0"},"reference-count":53,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T00:00:00Z","timestamp":1741824000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T00:00:00Z","timestamp":1741824000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100018693","name":"HORIZON EUROPE Framework Programme","doi-asserted-by":"publisher","award":["101136006"],"award-info":[{"award-number":["101136006"]}],"id":[{"id":"10.13039\/100018693","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose <jats:bold>ACTIG<\/jats:bold>, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improves model\u2019s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. 
The source code and trained models are publicly available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/yrcong\/ACTIG\" ext-link-type=\"uri\">https:\/\/github.com\/yrcong\/ACTIG<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s11263-025-02371-0","type":"journal-article","created":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T16:04:50Z","timestamp":1741881890000},"page":"4555-4570","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Attribute-Centric Compositional Text-to-Image Generation"],"prefix":"10.1007","volume":"133","author":[{"given":"Yuren","family":"Cong","sequence":"first","affiliation":[]},{"given":"Martin Renqiang","family":"Min","sequence":"additional","affiliation":[]},{"given":"Li Erran","family":"Li","sequence":"additional","affiliation":[]},{"given":"Bodo","family":"Rosenhahn","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0649-9987","authenticated-orcid":false,"given":"Michael Ying","family":"Yang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,13]]},"reference":[{"key":"2371_CR1","unstructured":"Alayrac, J. B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millicah, K., Reynolds, M., Ring, R., Rutherford, E., Cabi, S., Han. T., Gong, Z., Samangooei, S., Monteiro, M., Menick, J., Borgeaud, S., Brock, A., Nematzadeh, A., Sharifzadeh, S., Binkowski, M., Barreira, R., Vinyals, O., Zisserman, A., & Simonyan, K. (2022) Flamingo: a visual language model for few-shot learning. In Proceedings of the 36th international conference on neural information processing systems."},{"key":"2371_CR2","doi-asserted-by":"crossref","unstructured":"Changpinyo, S., Sharma, P., Ding, N., & Soricut, R. (2021). Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts. In Proceedings of the IEEE\/cvf conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR46437.2021.00356"},{"key":"2371_CR3","doi-asserted-by":"crossref","unstructured":"Cheng, J., Wu, F., Tian, Y., Wang, L., & Tao, D. (2020). Rifegan: Rich feature generation for text-to-image synthesis from prior knowledge. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 10911\u201310920.","DOI":"10.1109\/CVPR42600.2020.01092"},{"key":"2371_CR4","doi-asserted-by":"crossref","unstructured":"Crowson, K., Biderman, S., Kornis, D., Stander, D., Hallahan, E., Castricato, L., & Raff, E. (2022). Vqgan-clip: Open domain image generation and editing with natural language guidance. In: European conference on computer vision, Springer, pp 88\u2013105 .","DOI":"10.1007\/978-3-031-19836-6_6"},{"key":"2371_CR5","unstructured":"Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805"},{"key":"2371_CR6","first-page":"19822","volume":"34","author":"M Ding","year":"2021","unstructured":"Ding, M., Yang, Z., Hong, W., Zheng, W., Zhou, C., Yin, D., Lin, J., Zou, X., Shao, Z., Yang, H., et al. (2021). Cogview: Mastering text-to-image generation via transformers. 
Advances in Neural Information Processing Systems, 34, 19822\u201319835.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2371_CR7","doi-asserted-by":"crossref","unstructured":"Gafni, O., Polyak, A., Ashual, O., Sheynin, S., Parikh, D., & Taigman, Y. (2022). Make-a-scene: Scene-based text-to-image generation with human priors. arXiv preprint arXiv:2203.13131","DOI":"10.1007\/978-3-031-19784-0_6"},{"key":"2371_CR8","doi-asserted-by":"crossref","unstructured":"Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., Yuan, L., & Guo, B. (2022). Vector quantized diffusion model for text-to-image synthesis. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 10696\u201310706.","DOI":"10.1109\/CVPR52688.2022.01043"},{"key":"2371_CR9","first-page":"6626","volume":"30","author":"M Heusel","year":"2017","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems, 30, 6626\u20136637.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2371_CR10","doi-asserted-by":"crossref","unstructured":"Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 4401\u20134410.","DOI":"10.1109\/CVPR.2019.00453"},{"key":"2371_CR11","doi-asserted-by":"crossref","unstructured":"Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 8110\u20138119.","DOI":"10.1109\/CVPR42600.2020.00813"},{"key":"2371_CR12","unstructured":"Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980."},{"key":"2371_CR13","doi-asserted-by":"crossref","unstructured":"Lee, C. H., Liu, Z., Wu, L., & Luo, P. (2020). Maskgan: Towards diverse and interactive facial image manipulation. In IEEE conference on computer vision and pattern recognition","DOI":"10.1109\/CVPR42600.2020.00559"},{"key":"2371_CR14","doi-asserted-by":"crossref","unstructured":"Lee, D., Kim, C., Kim, S., Cho, M., & Han, W. S. (2022). Autoregressive image generation using residual quantization. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 11523\u201311532.","DOI":"10.1109\/CVPR52688.2022.01123"},{"key":"2371_CR15","first-page":"2063","volume":"32","author":"B Li","year":"2019","unstructured":"Li, B., Qi, X., Lukasiewicz, T., & Torr, P. (2019). Controllable text-to-image generation. Advances in Neural Information Processing Systems, 32, 2063\u20132073.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2371_CR16","doi-asserted-by":"crossref","unstructured":"Li, W., Zhang, P., Zhang, L., Huang, Q., He, X., Lyu, S., & Gao, J. (2019b). Object-driven text-to-image synthesis via adversarial training. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 12174\u201312182.","DOI":"10.1109\/CVPR.2019.01245"},{"key":"2371_CR17","doi-asserted-by":"crossref","unstructured":"Li, Z., Min, M. R., Li, K., & Xu, C. (2022). Stylet2i: Toward compositional and high-fidelity text-to-image synthesis. 
In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 18197\u201318207.","DOI":"10.1109\/CVPR52688.2022.01766"},{"key":"2371_CR18","doi-asserted-by":"crossref","unstructured":"Liao, W., Hu, K., Yang, M. Y., & Rosenhahn, B. (2022). Text to image generation with semantic-spatial aware gan. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 18187\u201318196.","DOI":"10.1109\/CVPR52688.2022.01765"},{"key":"2371_CR19","doi-asserted-by":"crossref","unstructured":"Liu, B., Song, K., Zhu, Y., de Melo, G., & Elgammal, A. (2021). Time: text and image mutual-translation adversarial networks. Proceedings of the AAAI conference on artificial intelligence, 35, 2082\u20132090.","DOI":"10.1609\/aaai.v35i3.16305"},{"key":"2371_CR20","doi-asserted-by":"crossref","unstructured":"Liu, N., Li, S., Du, Y., Torralba, A., & Tenenbaum, J. B. (2022). Compositional visual generation with composable diffusion models. arXiv preprint arXiv:2206.01714.","DOI":"10.1007\/978-3-031-19790-1_26"},{"key":"2371_CR21","unstructured":"Liu, X., Gong, C., Wu, L., Zhang, S., Su, H., & Liu, Q. (2021b). Fusedream: Training-free text-to-image generation with improved clip+gan space optimization. arXiv preprint arXiv:2112.01573."},{"key":"2371_CR22","unstructured":"Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., & Chen, M. (2021). Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741."},{"key":"2371_CR23","first-page":"13497","volume":"34","author":"W Nie","year":"2021","unstructured":"Nie, W., Vahdat, A., & Anandkumar, A. (2021). Controllable and compositional generation with latent-space energy-based models. Advances in Neural Information Processing Systems, 34, 13497\u201313510.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2371_CR24","unstructured":"Park, D. H., Azadi, S., Liu, X., Darrell, T., & Rohrbach, A. (2021). Benchmark for compositional text-to-image synthesis. In Thirty-fifth conference on neural information processing systems datasets and benchmarks track (Round 1)."},{"key":"2371_CR25","doi-asserted-by":"crossref","unstructured":"Qiao, T., Zhang, J., Xu, D., & Tao, D. (2019). Mirrorgan: Learning text-to-image generation by redescription. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1505\u20131514.","DOI":"10.1109\/CVPR.2019.00160"},{"key":"2371_CR26","unstructured":"Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., & Clark, J. et al. (2021). Learning transferable visual models from natural language supervision. In International conference on machine learning, pp 8748\u20138763."},{"key":"2371_CR27","unstructured":"Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. In International conference on machine learning, pp 8821\u20138831."},{"key":"2371_CR28","unstructured":"Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125."},{"key":"2371_CR29","doi-asserted-by":"crossref","unstructured":"Reed, S., Akata, Z., Lee, H., & Schiele, B. (2016). Learning deep representations of fine-grained visual descriptions. 
In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 49\u201358.","DOI":"10.1109\/CVPR.2016.13"},{"key":"2371_CR30","doi-asserted-by":"crossref","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 10684\u201310695.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"2371_CR31","doi-asserted-by":"crossref","unstructured":"Ruan, S., Zhang, Y., Zhang, K., Fan, Y., Tang, F., Liu, Q., & Chen, E. (2021). Dae-gan: Dynamic aspect-aware gan for text-to-image synthesis. In Proceedings of the IEEE\/CVF international conference on computer vision, pp 13960\u201313969.","DOI":"10.1109\/ICCV48922.2021.01370"},{"key":"2371_CR32","doi-asserted-by":"crossref","unstructured":"Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., Ghasemipour, S. K. S., Ayan, B. K., Mahdavi, S. S., & Lopes, R. G. et\u00a0al. (2022). Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487.","DOI":"10.1145\/3528233.3530757"},{"key":"2371_CR33","unstructured":"Schuhmann, C., Vencu, R., Beaumont, R., Kaczmarczyk, R., Mullis, C., Katta, A., Coombes, T., Jitsev, J., & Komatsuzaki, A. (2021). Laion-400m: Open dataset of clip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114."},{"key":"2371_CR34","doi-asserted-by":"crossref","unstructured":"Tao, M., Tang, H., Wu, F., Jing, X. Y., Bao, B. K., & Xu, C. (2022). Df-gan: A simple and effective baseline for text-to-image synthesis. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 16515\u201316525.","DOI":"10.1109\/CVPR52688.2022.01602"},{"key":"2371_CR35","unstructured":"Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The caltech-ucsd birds-200-2011 dataset."},{"key":"2371_CR36","doi-asserted-by":"crossref","unstructured":"Wang, H., Lin, G., Hoi, S. C., & Miao, C. (2021). Cycle-consistent inverse gan for text-to-image synthesis. In Proceedings of the 29th ACM international conference on multimedia, pp 630\u2013638.","DOI":"10.1145\/3474085.3475226"},{"key":"2371_CR37","unstructured":"Wang, Z., Liu, W., He, Q., Wu, X., & Yi, Z. (2022). Clip-gen: Language-free training of a text-to-image generator with clip. arXiv preprint arXiv:2203.00386."},{"key":"2371_CR38","doi-asserted-by":"crossref","unstructured":"Wu, C., Liang, J., Ji, L., Yang, F., Fang, Y., Jiang, D., & Duan, N. (2022a). N\u00fcwa: Visual synthesis pre-training for neural visual world creation. In European conference on computer vision, pp 720\u2013736.","DOI":"10.1007\/978-3-031-19787-1_41"},{"key":"2371_CR39","doi-asserted-by":"crossref","unstructured":"Wu, F., Liu, L., Hao, F., He, F., & Cheng, J. (2022b). Text-to-image synthesis based on object-guided joint-decoding transformer. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 18113\u201318122.","DOI":"10.1109\/CVPR52688.2022.01758"},{"key":"2371_CR40","doi-asserted-by":"crossref","unstructured":"Wu, X., Zhao, H., Zheng, L., Ding, S., & Li, X. (2022c). Adma-gan: Attribute-driven memory augmented gans for text-to-image generation. In Proceedings of the 30th ACM international conference on multimedia, pp 1593\u20131602.","DOI":"10.1145\/3503161.3547821"},{"key":"2371_CR41","doi-asserted-by":"crossref","unstructured":"Xia, W., Yang, Y., Xue, J. H., & Wu, B. (2021a). 
Tedigan: Text-guided diverse face image generation and manipulation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 2256\u20132265.","DOI":"10.1109\/CVPR46437.2021.00229"},{"key":"2371_CR42","doi-asserted-by":"crossref","unstructured":"Xia, W., Yang, Y., Xue, J. H., & Wu, B. (2021b). Towards open-world text-guided face image generation and manipulation. arXiv preprint arXiv:2104.08910.","DOI":"10.1109\/CVPR46437.2021.00229"},{"key":"2371_CR43","doi-asserted-by":"crossref","unstructured":"Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., & He, X. (2018). Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1316\u20131324.","DOI":"10.1109\/CVPR.2018.00143"},{"key":"2371_CR44","doi-asserted-by":"crossref","unstructured":"Yin, G., Liu, B., Sheng, L., Yu, N., Wang, X., & Shao, J. (2019). Semantics disentangling for text-to-image generation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 2327\u20132336.","DOI":"10.1109\/CVPR.2019.00243"},{"key":"2371_CR45","unstructured":"Yu, J., Xu, Y., Koh, J. Y., Luong, T., Baid, G., Wang, Z., Vasudevan, V., Ku, A., Yang, Y., & Ayan, B. K. et al. (2022). Scaling autoregressive models for content-rich text-to-image generation. arXiv preprint arXiv:2206.10789."},{"key":"2371_CR46","unstructured":"Yuan, L., Chen, D., Chen, Y. L., Codella, N., Dai, X., Gao, J., Hu, H., Huang, X., Li, B., Li, C., Liu, C., Liu, M., Liu, Z., Lu, Y., Shi, Y., Wang, L., Wang, J., Xiao, B., Xiao, Z., Yang, J., Zeng, M., Zhou, L., & Zhang, P. (2021). Florence: A new foundation model for computer vision. arXiv preprint arXiv:2111.11432."},{"key":"2371_CR47","doi-asserted-by":"crossref","unstructured":"Zhang, H., Koh, J. Y., Baldridge, J., Lee, H., & Yang, Y. (2021). Cross-modal contrastive learning for text-to-image generation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 833\u2013842.","DOI":"10.1109\/CVPR46437.2021.00089"},{"key":"2371_CR48","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1016\/j.neucom.2021.12.005","volume":"473","author":"Z Zhang","year":"2022","unstructured":"Zhang, Z., & Schomaker, L. (2022). Divergan: An efficient and effective single-stage framework for diverse text-to-image generation. Neurocomputing, 473, 182\u2013198.","journal-title":"Neurocomputing"},{"key":"2371_CR49","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Xie, Y., & Yang, L. (2018). Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 6199\u20136208.","DOI":"10.1109\/CVPR.2018.00649"},{"key":"2371_CR50","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Zhang, R., Chen, C., Li, C., Tensmeyer, C., Yu, T., Gu, J., Xu, J., & Sun, T. (2021). Lafite: Towards language-free training for text-to-image generation. arXiv preprint arXiv:2111.13792.","DOI":"10.1109\/CVPR52688.2022.01738"},{"key":"2371_CR51","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Zhang, R., Gu, J., Tensmeyer, C., Yu, T., Chen, C., Xu, J., & Sun, T. (2022). Interactive image generation with natural-language feedback. In Proceedings of the 36th AAAI conference on artificial intelligence.","DOI":"10.1609\/aaai.v36i3.20270"},{"key":"2371_CR52","doi-asserted-by":"crossref","unstructured":"Zhu, B., & Ngo, C. W. (2020). 
Cookgan: Causality based text-to-image synthesis. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5519\u20135527.","DOI":"10.1109\/CVPR42600.2020.00556"},{"key":"2371_CR53","doi-asserted-by":"crossref","unstructured":"Zhu, M., Pan, P., Chen, W., & Yang, Y. (2019). Dm-gan: Dynamic memory generative adversarial networks for text-to-image synthesis. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5802\u20135810.","DOI":"10.1109\/CVPR.2019.00595"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02371-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-025-02371-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02371-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,7]],"date-time":"2025-06-07T06:05:34Z","timestamp":1749276334000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-025-02371-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,13]]},"references-count":53,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["2371"],"URL":"https:\/\/doi.org\/10.1007\/s11263-025-02371-0","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,13]]},"assertion":[{"value":"2 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 February 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 March 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"As the use of machine learning in everyday life grows, it is important to consider the potential social impact of our work. Our work has the potential to be misused for deepfakes: since our model can generate high-fidelity images with specific attributes, it could make deepfakes even more flexible. On the other hand, our attribute-centric generative model is less affected by overrepresented attribute compositions in the dataset and can generate images that match the given text. Therefore, our work contributes to reducing bias in generative models.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics statement"}}]}}