{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:27:20Z","timestamp":1760059640758,"version":"build-2065373602"},"reference-count":32,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T00:00:00Z","timestamp":1750982400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["42371476","buctrc202132"],"award-info":[{"award-number":["42371476","buctrc202132"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities of Beijing University of Chemical Technology","doi-asserted-by":"publisher","award":["42371476","buctrc202132"],"award-info":[{"award-number":["42371476","buctrc202132"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>Spatial scenes, as fundamental units of geospatial cognition, encompass rich objects and spatial relationships, and their generation techniques hold significant application value in disaster simulation and emergency drills, delayed spatial reconstruction and analysis, and other fields. However, existing studies still face limitations in modeling complex spatial relationships during scene generation, leading to insufficient semantic consistency and geographical accuracy. The advancement of Geospatial Artificial Intelligence (GeoAI) offers a new technical pathway for the intelligent modeling of spatial scenes. Against this backdrop, we propose SceneDiffusion, a scene generation model embedded with spatial constraints, and construct a geospatial scene dataset incorporating spatial relationship descriptions and geographic semantics, aiming to enhance the understanding and modeling capabilities of GeoAI models for spatial information. Specifically, SceneDiffusion employs a spatial scene representation framework to uniformly characterize objects and their topological, directional, and distance relationships, enhances the interactive modeling of objects and relationships through a Spatial relationship Attention-aware Graph (SAG) module, and finally generates high-quality scene images conforming to geographic semantics using a Layout information-guided Conditional Diffusion (LCD) module. Both qualitative and quantitative experiments demonstrate the superiority of SceneDiffusion, achieving a 56.6% reduction in FID and a 35.3% improvement in SSIM compared to baseline methods. Ablation studies confirm the importance of multi-relational modeling with attention mechanisms. By generating scenes that satisfy spatial distribution constraints, this work provides technical support for applications such as emergency scene simulation and virtual scene construction, while also offering insights for theoretical research and methodological innovation in GeoAI.<\/jats:p>","DOI":"10.3390\/ijgi14070250","type":"journal-article","created":{"date-parts":[[2025,6,30]],"date-time":"2025-06-30T10:03:48Z","timestamp":1751277828000},"page":"250","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["SceneDiffusion: Scene Generation Model Embedded with Spatial Constraints"],"prefix":"10.3390","volume":"14","author":[{"given":"Shanshan","family":"Yu","sequence":"first","affiliation":[{"name":"School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiaxin","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China"},{"name":"Beijing Sunwise Space Technology Ltd., Beijing 100004, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiaqi","family":"Li","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xunchun","family":"Li","sequence":"additional","affiliation":[{"name":"Academy of Broadcasting Science, National Radio and Television Administration of China, Beijing 100866, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kai","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing Institute of Aerospace Long March Vehicles, Beijing 100076, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jian","family":"Tu","sequence":"additional","affiliation":[{"name":"Beijing Institute of Aerospace Long March Vehicles, Beijing 100076, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9131-6363","authenticated-orcid":false,"given":"Danhuai","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,27]]},"reference":[{"key":"ref_1","first-page":"1865","article-title":"A review of recent researches and reflections on geospatial artificial intelligence","volume":"45","author":"Gao","year":"2020","journal-title":"Geomat. Inf. Sci. Wuhan Univ."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1900","DOI":"10.1109\/JPROC.2017.2684460","article-title":"Social media: New perspectives to improve remote sensing for emergency response","volume":"105","author":"Li","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_3","first-page":"400","article-title":"A spatial scene reconstruction framework in emergency response scenario","volume":"5","author":"Zheng","year":"2024","journal-title":"J. Saf. Sci. Resil."},{"key":"ref_4","unstructured":"Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., M\u00fcller, J., Penna, J., and Rombach, R. (2023). Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"29047","DOI":"10.1007\/s11042-023-16704-z","article-title":"Sketch-to-image synthesis via semantic masks","volume":"83","author":"Baraheem","year":"2024","journal-title":"Multimed. Tools Appl."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Maheshwari, P., Chaudhry, R., and Vinay, V. (2021, January 2\u20139). Scene graph embeddings using relative similarity supervision. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual.","DOI":"10.1609\/aaai.v35i3.16333"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Johnson, J., Gupta, A., and Fei-Fei, L. (2018, January 18\u201322). Image generation from scene graphs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00133"},{"key":"ref_8","unstructured":"Ashual, O., and Wolf, L. (November, January 27). Specifying object attributes and relations in interactive scene generation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Vo, D.M., and Sugimoto, A. (2020, January 23\u201328). Visual-relation conscious image generation from structured-text. Proceedings of the European Conference on Computer Vision, Virtual.","DOI":"10.1007\/978-3-030-58604-1_18"},{"key":"ref_10","unstructured":"Yang, L., Huang, Z., Song, Y., Hong, S., Li, G., Zhang, W., Cui, B., Ghanem, B., and Yang, M.H. (2022). Diffusion-based scene graph to image generation with masked contrastive pre-training. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022, January 19\u201324). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"ref_12","unstructured":"Lian, L., Li, B., Yala, A., and Darrell, T. (2023). Llm-grounded diffusion: Enhancing prompt understanding of text-to-image diffusion models with large language models. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, R., Chen, Z., Chen, C., Ma, J., Lu, H., and Lin, X. (2024, January 20\u201327). Compositional text-to-image synthesis with attention map control of diffusion models. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada.","DOI":"10.1609\/aaai.v38i6.28364"},{"key":"ref_14","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201324). Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning. PmLR, Virtual."},{"key":"ref_15","unstructured":"Feng, W., He, X., Fu, T.J., Jampani, V., Akula, A., Narayana, P., Basu, S., Wang, X.E., and Wang, W.Y. (2022). Training-free structured diffusion guidance for compositional text-to-image synthesis. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3592116","article-title":"Attend-and-excite: Attention-based semantic guidance for text-to-image diffusion models","volume":"42","author":"Chefer","year":"2023","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yang, Z., Wang, J., Gan, Z., Li, L., Lin, K., Wu, C., Duan, N., Liu, Z., Liu, C., and Zeng, M. (2023, January 18\u201322). Reco: Region-controlled text-to-image generation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01369"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, L., Rao, A., and Agrawala, M. (2023, January 2\u20136). Adding conditional control to text-to-image diffusion models. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"ref_19","unstructured":"Li, Y., Wang, H., Jin, Q., Hu, J., Chemerys, P., Fu, Y., Wang, Y., Tulyakov, S., and Ren, J. (2024). Snapfusion: Text-to-image diffusion model on mobile devices within two seconds. Adv. Neural Inf. Process. Syst., 36."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhang, G., Wang, K., Xu, X., Wang, Z., and Shi, H. (2024, January 17\u201321). Forget-me-not: Learning to forget in text-to-image diffusion models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00182"},{"key":"ref_21","unstructured":"Li, D., Li, J., and Hoi, S. (2024). Blip-diffusion: Pre-trained subject representation for controllable text-to-image generation and editing. Adv. Neural Inf. Process. Syst., 36."},{"key":"ref_22","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, X., Park, D.H., Azadi, S., Zhang, G., Chopikyan, A., Hu, Y., Shi, H., Rohrbach, A., and Darrell, T. (2023, January 2\u20137). More control for free! image synthesis with semantic diffusion guidance. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00037"},{"key":"ref_24","unstructured":"Kawar, B., Ganz, R., and Elad, M. (2022). Enhancing diffusion-based image synthesis with robust classifier guidance. arXiv."},{"key":"ref_25","unstructured":"Shenoy, R., Pan, Z., Balakrishnan, K., Cheng, Q., Jeon, Y., Yang, H., and Kim, J. (2024). Gradient-Free Classifier Guidance for Diffusion Model Sampling. arXiv."},{"key":"ref_26","unstructured":"Ho, J., and Salimans, T. (2022). Classifier-free diffusion guidance. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yang, B., Luo, Y., Chen, Z., Wang, G., Liang, X., and Lin, L. (2023, January 2\u20136). Law-diffusion: Complex scene generation by diffusion with layouts. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.02072"},{"key":"ref_28","unstructured":"Baykal, G., Karagoz, H.F., Binhuraib, T., and Unal, G. (2024, January 5\u20137). ProtoDiffusion: Classifier-free diffusion guidance with prototype learning. Proceedings of the Asian Conference on Machine Learning, PMLR, Hanoi, Vietnam."},{"key":"ref_29","first-page":"165","article-title":"A spatial logic based on regions and connection","volume":"92","author":"Randell","year":"1992","journal-title":"KR"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1389","DOI":"10.1002\/j.1538-7305.1957.tb01515.x","article-title":"Shortest connection networks and some generalizations","volume":"36","author":"Prim","year":"1957","journal-title":"Bell Syst. Tech. J."},{"key":"ref_31","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst., 30."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"8","DOI":"10.4236\/jcc.2019.73002","article-title":"Image quality assessment through FSIM, SSIM, MSE and PSNR\u2014A comparative study","volume":"7","author":"Sara","year":"2019","journal-title":"J. Comput. Commun."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/14\/7\/250\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:00:30Z","timestamp":1760032830000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/14\/7\/250"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,27]]},"references-count":32,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["ijgi14070250"],"URL":"https:\/\/doi.org\/10.3390\/ijgi14070250","relation":{},"ISSN":["2220-9964"],"issn-type":[{"type":"electronic","value":"2220-9964"}],"subject":[],"published":{"date-parts":[[2025,6,27]]}}}