{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,3]],"date-time":"2025-07-03T09:46:06Z","timestamp":1751535966656,"version":"3.37.3"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,3,29]],"date-time":"2021-03-29T00:00:00Z","timestamp":1616976000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,3,29]],"date-time":"2021-03-29T00:00:00Z","timestamp":1616976000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61303093"],"award-info":[{"award-number":["61303093"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61402278"],"award-info":[{"award-number":["61402278"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"the Shanghai Natural Science Foundation","award":["19ZR1419100"],"award-info":[{"award-number":["19ZR1419100"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Image Video Proc."],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Perfect image compositing can harmonize the appearance between the foreground and background effectively so that the composite result looks seamless and natural. However, the traditional convolutional neural network (CNN)-based methods often fail to yield highly realistic composite results due to overdependence on scene parsing while ignoring the coherence of semantic and structural between foreground and background. In this paper, we propose a framework to solve this problem by training a stacked generative adversarial network with attention guidance, which can efficiently create a high-resolution, realistic-looking composite. To this end, we develop a diverse adversarial loss in addition to perceptual and guidance loss to train the proposed generative network. Moreover, we construct a multi-scenario dataset for high-resolution image compositing, which contains high-quality images with different styles and object masks. Experiments on the synthesized and real images demonstrate the efficiency and effectiveness of our network in producing seamless, natural, and realistic results. Ablation studies show that our proposed network can improve the visual performance of composite results compared with the application of existing methods.<\/jats:p>","DOI":"10.1186\/s13640-021-00550-w","type":"journal-article","created":{"date-parts":[[2021,3,29]],"date-time":"2021-03-29T15:02:57Z","timestamp":1617030177000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Stacked generative adversarial networks for image compositing"],"prefix":"10.1186","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1697-6089","authenticated-orcid":false,"given":"Bing","family":"Yu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Youdong","family":"Ding","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhifeng","family":"Xie","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dongjin","family":"Huang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,3,29]]},"reference":[{"key":"550_CR1","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1145\/1531326.1531373","volume":"28","author":"Z. Farbman","year":"2009","unstructured":"Z. Farbman, G. Hoffer, Y. Lipman, D. Cohen-Or, D. Lischinski, Coordinates for instant image cloning. ACM Trans. Graph.28:, 67 (2009).","journal-title":"ACM Trans. Graph."},{"key":"550_CR2","doi-asserted-by":"publisher","first-page":"3943","DOI":"10.1109\/ICCV.2015.449","volume-title":"2015 IEEE International Conference on Computer Vision (ICCV)","author":"J. -Y. Zhu","year":"2015","unstructured":"J. -Y. Zhu, P. Krahenbuhl, E. Shechtman, A. A. Efros, in 2015 IEEE International Conference on Computer Vision (ICCV). Learning a discriminative model for the perception of realism in composite images (IEEEPiscataway, 2015), pp. 3943\u20133951."},{"key":"550_CR3","doi-asserted-by":"publisher","first-page":"2799","DOI":"10.1109\/CVPR.2017.299","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Y. -H. Tsai","year":"2017","unstructured":"Y. -H. Tsai, X. Shen, Z. Lin, K. Sunkavalli, X. Lu, M. -H. Yang, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Deep image harmonization (IEEEPiscataway, 2017), pp. 2799\u20132807."},{"key":"550_CR4","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1111\/cgf.13478","volume":"37","author":"F. Luan","year":"2018","unstructured":"F. Luan, S. Paris, E. Shechtman, K. Bala, Deep painterly harmonization. Comput. Graph. Forum. 37:, 95\u2013106 (2018).","journal-title":"Comput. Graph. Forum"},{"key":"550_CR5","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1145\/882262.882269","volume":"22","author":"P. Perez","year":"2003","unstructured":"P. Perez, M. Gangnet, A. Blake, Poisson image editing. ACM Trans. Graph.22:, 313\u2013318 (2003).","journal-title":"ACM Trans. Graph."},{"key":"550_CR6","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1145\/1276377.1276389","volume":"26","author":"J. Wang","year":"2007","unstructured":"J. Wang, M. Agrawala, M. F. Cohen, Soft scissors: an interactive tool for realtime high quality matting. ACM Trans. Graph.26:, 9 (2007).","journal-title":"ACM Trans. Graph."},{"key":"550_CR7","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1145\/1778765.1778862","volume":"29","author":"K. Sunkavalli","year":"2010","unstructured":"K. Sunkavalli, M. K. Johnson, W. Matusik, H. Pfister, Multi-scale image harmonization. ACM Trans. Graph.29:, 125 (2010).","journal-title":"ACM Trans. Graph."},{"key":"550_CR8","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1145\/2185520.2185578","volume":"31","author":"S. Darabi","year":"2012","unstructured":"S. Darabi, E. Shechtman, C. Barnes, D. B. Goldman, P. Sen, Image melding: combining inconsistent images using patch-based synthesis. ACM Trans. Graph.31:, 82 (2012).","journal-title":"ACM Trans. Graph."},{"key":"550_CR9","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1145\/2897824.2925942","volume":"35","author":"Y. -H. Tsai","year":"2016","unstructured":"Y. -H. Tsai, X. Shen, Z. Lin, K. Sunkavalli, M. -H. Yang, Sky is not the limit: semantic-aware sky replacement. ACM Trans. Graph.35:, 149 (2016).","journal-title":"ACM Trans. Graph."},{"key":"550_CR10","doi-asserted-by":"publisher","first-page":"2536","DOI":"10.1109\/CVPR.2016.278","volume-title":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"D. Pathak","year":"2016","unstructured":"D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, A. A. Efros, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Context encoders: feature learning by inpainting (IEEEPiscataway, 2016), pp. 2536\u20132544."},{"key":"550_CR11","first-page":"2487","volume-title":"2019 ACM International Conference on Multimedia","author":"H. Wu","year":"2019","unstructured":"H. Wu, S. Zheng, J. Zhang, K. Huang, in 2019 ACM International Conference on Multimedia. Gp-gan: Towards realistic high-resolution image blending (ACMNew York, 2019), pp. 2487\u20132495."},{"key":"550_CR12","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-3-030-01264-9_1","volume-title":"Computer Vision \u2013 ECCV 2018","author":"Z. Yan","year":"2018","unstructured":"Z. Yan, X. Li, M. Li, W. Zuo, S. Shan, in Computer Vision \u2013 ECCV 2018. Shift-net: image inpainting via deep feature rearrangement (SpringerNew York, 2018), pp. 3\u201319."},{"key":"550_CR13","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/2185520.2185580","volume":"31","author":"S. Xue","year":"2012","unstructured":"S. Xue, A. Agarwala, J. Dorsey, H. Rushmeier, Understanding and improving the realism of image composites. ACM Trans. Graph.31:, 84 (2012).","journal-title":"ACM Trans. Graph."},{"key":"550_CR14","doi-asserted-by":"publisher","first-page":"1519","DOI":"10.1109\/WACV.2018.00170","volume-title":"2018 IEEE Winter Conference on Applications of Computer Vision (WACV)","author":"F. Tan","year":"2018","unstructured":"F. Tan, C. Bernier, B. Cohen, V. Ordonez, C. Barnes, in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). Where and who automatic semantic-aware person composition (IEEEPiscataway, 2018), pp. 1519\u20131528."},{"key":"550_CR15","first-page":"119","volume":"36","author":"R. Zhang","year":"2017","unstructured":"R. Zhang, J. -Y. Zhu, P. Isola, X. Geng, A. S. Lin, T. Yu, A. A. Efros, Real-time user-guided image colorization with learned deep priors. ACM Trans. Graph.36:, 119 (2017).","journal-title":"ACM Trans. Graph."},{"key":"550_CR16","doi-asserted-by":"publisher","first-page":"606","DOI":"10.1109\/CVPR.2018.00070","volume-title":"2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"X. Wang","year":"2018","unstructured":"X. Wang, K. Yu, C. Dong, C. C. Loy, in 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Recovering realistic texture in image super-resolution by deep spatial feature transform (IEEEPiscataway, 2018), pp. 606\u2013615."},{"key":"550_CR17","doi-asserted-by":"publisher","first-page":"5928","DOI":"10.1109\/CVPR.2018.00621","volume-title":"2018 IEEE Conference on Computer Vision and Pattern Recognition","author":"J. Park","year":"2018","unstructured":"J. Park, J. -Y. Lee, D. Yoo, I. S. Kweon, in 2018 IEEE Conference on Computer Vision and Pattern Recognition. Distort-and-recover: color enhancement using deep reinforcement learning (IEEEPiscataway, 2018), pp. 5928\u20135936."},{"key":"550_CR18","first-page":"97","volume-title":"2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"V. Bychkovsky","year":"2011","unstructured":"V. Bychkovsky, S. Paris, E. Chan, F. Durand, in 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Learning photographic global tonal adjustment with a database of input\/output image pairs (IEEEPiscataway, 2011), pp. 97\u2013104."},{"key":"550_CR19","first-page":"234","volume-title":"Lecture Notes in Computer Science","author":"O. Ronneberger","year":"2015","unstructured":"O. Ronneberger, P. Fischer, T. Brox, in Lecture Notes in Computer Science. U-net: convolutional networks for biomedical image segmentation (SpringerNew York, 2015), pp. 234\u2013241."},{"key":"550_CR20","first-page":"2672","volume-title":"Advances in Neural Information Processing Systems","author":"I. J. Goodfellow","year":"2014","unstructured":"I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, in Advances in Neural Information Processing Systems. Generative adversarial nets (MIT PressCambridge, 2014), pp. 2672\u20132680."},{"key":"550_CR21","unstructured":"M. Mirza, S. Osindero, Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014). https:\/\/arxiv.org\/abs\/1411.1784."},{"key":"550_CR22","first-page":"465","volume-title":"Advances in Neural Information Processing Systems","author":"J. -Y. Zhu","year":"2017","unstructured":"J. -Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, E. Shechtman, in Advances in Neural Information Processing Systems. Toward multimodal image-to-image translation (MIT PressCambridge, 2017), pp. 465\u2013476."},{"key":"550_CR23","doi-asserted-by":"publisher","first-page":"5967","DOI":"10.1109\/CVPR.2017.632","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"P. Isola","year":"2017","unstructured":"P. Isola, J. -Y. Zhu, T. Zhou, A. A. Efros, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Image-to-image translation with conditional adversarial networks (IEEEPiscataway, 2017), pp. 5967\u20135976."},{"key":"550_CR24","doi-asserted-by":"publisher","first-page":"8798","DOI":"10.1109\/CVPR.2018.00917","volume-title":"2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"T. -C. Wang","year":"2018","unstructured":"T. -C. Wang, M. -Y. Liu, J. -Y. Zhu, A. Tao, J. Kautz, B. Catanzaro, in 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). High-resolution image synthesis and semantic manipulation with conditional gans (IEEEPiscataway, 2018), pp. 8798\u20138807."},{"key":"550_CR25","doi-asserted-by":"publisher","first-page":"8456","DOI":"10.1109\/CVPR.2018.00882","volume-title":"2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"W. Xian","year":"2018","unstructured":"W. Xian, P. Sangkloy, V. Agrawal, A. Raj, J. Lu, C. Fang, F. Yu, J. Hays, in 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Texturegan: controlling deep image synthesis with texture patches (IEEEPiscataway, 2018), pp. 8456\u20138465."},{"key":"550_CR26","first-page":"8456","volume-title":"2018 IEEE Conference on Computer Vision and Pattern Recognition","author":"T. Dekel","year":"2018","unstructured":"T. Dekel, C. Gan, D. Krishnan, C. Liu, W. T. Freeman, in 2018 IEEE Conference on Computer Vision and Pattern Recognition. Sparse, smart contours to represent and edit images (IEEEPiscataway, 2018), pp. 8456\u20138465."},{"key":"550_CR27","first-page":"89","volume-title":"Computer Vision \u2013 ECCV 2018","author":"G. Liu","year":"2018","unstructured":"G. Liu, F. A. Reda, K. J. Shih, T. -C. Wang, A. Tao, B. Catanzaro, in Computer Vision \u2013 ECCV 2018. Image inpainting for irregular holes using partial convolutions (SpringerNew York, 2018), pp. 89\u2013105."},{"key":"550_CR28","doi-asserted-by":"publisher","first-page":"5908","DOI":"10.1109\/ICCV.2017.629","volume-title":"2017 IEEE International Conference on Computer Vision (ICCV)","author":"H. Zhang","year":"2017","unstructured":"H. Zhang, T. Xu, H. Li, in 2017 IEEE International Conference on Computer Vision (ICCV). StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks (IEEEPiscataway, 2017), pp. 5908\u20135916."},{"key":"550_CR29","doi-asserted-by":"publisher","first-page":"1520","DOI":"10.1109\/ICCV.2017.168","volume-title":"2017 IEEE International Conference on Computer Vision (ICCV)","author":"Q. Chen","year":"2017","unstructured":"Q. Chen, V. Koltun, in 2017 IEEE International Conference on Computer Vision (ICCV). Photographic image synthesis with cascaded refinement networks (IEEEPiscataway, 2017), pp. 1520\u20131529."},{"key":"550_CR30","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1145\/3072959.3073659","volume":"36","author":"S. Iizuka","year":"2017","unstructured":"S. Iizuka, E. Simo-Serra, H. Ishikawa, Globally and locally consistent image completion. ACM Trans. Graph.36:, 107 (2017).","journal-title":"ACM Trans. Graph."},{"key":"550_CR31","doi-asserted-by":"publisher","first-page":"694","DOI":"10.1007\/978-3-319-46475-6_43","volume-title":"Computer Vision \u2013 ECCV 2016","author":"J. Johnson","year":"2016","unstructured":"J. Johnson, A. Alahi, L. Fei-Fei, in Computer Vision \u2013 ECCV 2016. Perceptual losses for real-time style transfer and super-resolution (SpringerNew York, 2016), pp. 694\u2013711."},{"key":"550_CR32","unstructured":"K. Simonyan, Z. Andrew, Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014). http:\/\/arxiv.org\/abs\/1409.1556."},{"key":"550_CR33","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1109\/CVPR.2009.5206848","volume-title":"2009 IEEE Conference on Computer Vision and Pattern Recognition","author":"J. Deng","year":"2009","unstructured":"J. Deng, W. Dong, R. Socher, L. -J. Li, K. Li, L. Fei-Fei, in 2009 IEEE Conference on Computer Vision and Pattern Recognition. Imagenet: a large-scale hierarchical image database (IEEEPiscataway, 2009), pp. 248\u2013255."},{"key":"550_CR34","unstructured":"D. P. Kingma, J. L. Ba, Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). https:\/\/arxiv.org\/abs\/1412.6980."},{"key":"550_CR35","first-page":"1","volume-title":"2007 IEEE Conference on Computer Vision and Pattern Recognition","author":"N. Jacobs","year":"2007","unstructured":"N. Jacobs, N. Roman, R. Pless, in 2007 IEEE Conference on Computer Vision and Pattern Recognition. Consistent temporal variations in many outdoor scenes (IEEEPiscataway, 2007), pp. 1\u20136."},{"key":"550_CR36","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1007\/s11263-007-0090-8","volume":"77","author":"B. C. Russell","year":"2007","unstructured":"B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman, LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vis.77:, 157\u2013173 (2007).","journal-title":"Int. J. Comput. Vis."},{"key":"550_CR37","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1145\/2601097.2601101","volume":"33","author":"P. -Y. Laffont","year":"2014","unstructured":"P. -Y. Laffont, Z. Ren, X. Tao, C. Qian, J. Hays, Transient attributes for high-level understanding and editing of outdoor scenes. ACM Trans. Graph.33:, 149 (2014).","journal-title":"ACM Trans. Graph."},{"key":"550_CR38","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1145\/2010324.1964965","volume":"30","author":"Y. HaCohen","year":"2011","unstructured":"Y. HaCohen, E. Shechtman, D. B. Goldman, D. Lischinski, Non-rigid dense correspondence with applications for image enhancement. ACM Trans. Graph.30:, 70 (2011).","journal-title":"ACM Trans. Graph."},{"key":"550_CR39","doi-asserted-by":"publisher","first-page":"586","DOI":"10.1109\/CVPR.2018.00068","volume-title":"2018 IEEE Conference on Computer Vision and Pattern Recognition","author":"R. Zhang","year":"2018","unstructured":"R. Zhang, P. Isola, A. A. Efros, E. Shechtman, O. Wang, in 2018 IEEE Conference on Computer Vision and Pattern Recognition. The unreasonable effectiveness of deep features as a perceptual metric (IEEEPiscataway, 2018), pp. 586\u2013595."}],"container-title":["EURASIP Journal on Image and Video Processing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13640-021-00550-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13640-021-00550-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13640-021-00550-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,29]],"date-time":"2021-03-29T15:14:29Z","timestamp":1617030869000},"score":1,"resource":{"primary":{"URL":"https:\/\/jivp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13640-021-00550-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,29]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["550"],"URL":"https:\/\/doi.org\/10.1186\/s13640-021-00550-w","relation":{},"ISSN":["1687-5281"],"issn-type":[{"type":"electronic","value":"1687-5281"}],"subject":[],"published":{"date-parts":[[2021,3,29]]},"assertion":[{"value":"16 May 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 March 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 March 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"10"}}