{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T16:21:18Z","timestamp":1783095678486,"version":"3.54.6"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,7,1]],"date-time":"2022-07-01T00:00:00Z","timestamp":1656633600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1942531"],"award-info":[{"award-number":["1942531"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>We present a novel interactive augmented reality (AR) storytelling approach guided by indoor scene semantics. Our approach automatically populates virtual contents in real-world environments to deliver AR stories, which match both the story plots and scene semantics. During the storytelling process, a player can participate as a character in the story. Meanwhile, the behaviors of the virtual characters and the placement of the virtual items adapt to the player's actions. An input raw story is represented as a sequence of events, which contain high-level descriptions of the characters' states, and is converted into a graph representation with automatically supplemented low-level spatial details. Our hierarchical story sampling approach samples realistic character behaviors that fit the story contexts through optimizations; and an animator, which estimates and prioritizes the player's actions, animates the virtual characters to tell the story in AR. Through experiments and a user study, we validated the effectiveness of our approach for AR storytelling in different environments.<\/jats:p>","DOI":"10.1145\/3528223.3530061","type":"journal-article","created":{"date-parts":[[2022,7,22]],"date-time":"2022-07-22T21:06:27Z","timestamp":1658523987000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":30,"title":["Interactive augmented reality storytelling guided by scene semantics"],"prefix":"10.1145","volume":"41","author":[{"given":"Changyang","family":"Li","sequence":"first","affiliation":[{"name":"George Mason University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wanwan","family":"Li","sequence":"additional","affiliation":[{"name":"George Mason University and University of South Florida"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Haikun","family":"Huang","sequence":"additional","affiliation":[{"name":"George Mason University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lap-Fai","family":"Yu","sequence":"additional","affiliation":[{"name":"George Mason University"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,7,22]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.5898\/JHRI.6.1.Admoni"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925893"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274247.3274511"},{"key":"e_1_2_2_4_1","unstructured":"Michael Argyle and Mark Cook. 1976. Gaze and mutual gaze. (1976)."},{"key":"e_1_2_2_5_1","volume-title":"Computer graphics forum","author":"Aristidou Andreas","unstructured":"Andreas Aristidou, Joan Lasenby, Yiorgos Chrysanthou, and Ariel Shamir. 2018. Inverse kinematics techniques in computer graphics: A survey. In Computer graphics forum, Vol. 37. Wiley Online Library, 35--58."},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366175"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/616070.618818"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_23"},{"key":"e_1_2_2_9_1","volume-title":"Making space for voice: Technologies to support children's fantasy and storytelling. Personal and ubiquitous computing 5, 3","author":"Cassell Justine","year":"2001","unstructured":"Justine Cassell and Kimiko Ryokai. 2001. Making space for voice: Technologies to support children's fantasy and storytelling. Personal and ubiquitous computing 5, 3 (2001), 169--190."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV.2017.00081"},{"key":"e_1_2_2_11_1","volume-title":"Scene Graphs: A Survey of Generations and Applications. arXiv preprint arXiv:2104.01111","author":"Chang Xiaojun","year":"2021","unstructured":"Xiaojun Chang, Pengzhen Ren, Pengfei Xu, Zhihui Li, Xiaojiang Chen, and Alex Hauptmann.2021. Scene Graphs: A Survey of Generations and Applications. arXiv preprint arXiv:2104.01111 (2021)."},{"key":"e_1_2_2_12_1","volume-title":"Tao Ruan Wan, and Jian Jun Zhang","author":"Chen Long","year":"2018","unstructured":"Long Chen, Wen Tang, Nigel John, Tao Ruan Wan, and Jian Jun Zhang. 2018. Context-aware mixed reality: A framework for ubiquitous interaction. arXiv preprint arXiv:1803.05541 (2018)."},{"key":"e_1_2_2_13_1","volume-title":"SceneAR: Scene-based Micro Narratives for Sharing and Remixing in Augmented Reality. In 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 294--303","author":"Chen Mengyu","year":"2021","unstructured":"Mengyu Chen, Andr\u00e9s Monroy-Hern\u00e1ndez, and Misha Sra. 2021. SceneAR: Scene-based Micro Narratives for Sharing and Remixing in Augmented Reality. In 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 294--303."},{"key":"e_1_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Yifei Cheng Yukang Yan Xin Yi Yuanchun Shi and David Lindlbauer. 2021. SemanticAdapt: Optimization-based Adaptation of Mixed Reality Layouts Leveraging Virtual-Physical Semantic Connections. In UIST. 282--297.","DOI":"10.1145\/3472749.3474750"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2021.102258"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470847"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1964921.1964929"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818057"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISMAR.2014.6948429"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376790"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1501750.1501819"},{"key":"e_1_2_2_22_1","volume-title":"Primary care optometry","author":"Grosvenor Theodore P","unstructured":"Theodore P Grosvenor. 2007. Primary care optometry. Elsevier Health Sciences."},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995448"},{"key":"e_1_2_2_24_1","doi-asserted-by":"crossref","unstructured":"Mohamed Hassan Duygu Ceylan Ruben Villegas Jun Saito Jimei Yang Yi Zhou and Michael J Black. 2021a. Stochastic scene-aware motion prediction. In ICCV. 11374--11384.","DOI":"10.1109\/ICCV48922.2021.01118"},{"key":"e_1_2_2_25_1","doi-asserted-by":"crossref","unstructured":"Mohamed Hassan Partha Ghosh Joachim Tesch Dimitrios Tzionas and Michael J Black. 2021b. Populating 3D Scenes by Learning Human-Scene Interaction. In CVPR. 14708--14718.","DOI":"10.1109\/CVPR46437.2021.01447"},{"key":"e_1_2_2_26_1","doi-asserted-by":"crossref","unstructured":"W Keith Hastings. 1970. Monte Carlo sampling methods using Markov chains and their applications. (1970).","DOI":"10.2307\/2334940"},{"key":"e_1_2_2_27_1","volume-title":"Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems.","author":"He Fengming","year":"2022","unstructured":"Fengming He, Xiyun Hu, Tianyi Wang, Ananya Ipsita, and Karthik Ramani. 2022. ScalAR: Authoring Semantically Adaptive Augmented Reality Experiences in Virtual Reality. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-018-1103-5"},{"key":"e_1_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Justin Johnson Ranjay Krishna Michael Stark Li-Jia Li David Shamma Michael Bernstein and Li Fei-Fei. 2015. Image retrieval using scene graphs. In CVPR. 3668--3678.","DOI":"10.1109\/CVPR.2015.7298990"},{"key":"e_1_2_2_30_1","first-page":"1","article-title":"Shape2pose: Human-centric shape analysis","volume":"33","author":"Kim Vladimir G","year":"2014","unstructured":"Vladimir G Kim, Siddhartha Chaudhuri, Leonidas Guibas, and Thomas Funkhouser. 2014. Shape2pose: Human-centric shape analysis. ACM Transactions on Graphics 33, 4 (2014), 1--12.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_2_2_31_1","volume-title":"Optimization by simulated annealing. Science 220, 4598","author":"Kirkpatrick Scott","year":"1983","unstructured":"Scott Kirkpatrick, C Daniel Gelatt, and Mario P Vecchi. 1983. Optimization by simulated annealing. Science 220, 4598 (1983), 671--680."},{"key":"e_1_2_2_32_1","volume-title":"Virtual agent positioning driven by scene semantics in mixed reality. In 2019 IEEE VR","author":"Lang Yining","unstructured":"Yining Lang, Wei Liang, and Lap-Fai Yu. 2019. Virtual agent positioning driven by scene semantics in mixed reality. In 2019 IEEE VR. IEEE, 767--775."},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480478"},{"key":"e_1_2_2_34_1","first-page":"1","article-title":"Grains: Generative recursive autoencoders for indoor scenes","volume":"38","author":"Li Manyi","year":"2019","unstructured":"Manyi Li, Akshay Gadi Patil, Kai Xu, Siddhartha Chaudhuri, Owais Khan, Ariel Shamir, Changhe Tu, Baoquan Chen, Daniel Cohen-Or, and Hao Zhang. 2019. Grains: Generative recursive autoencoders for indoor scenes. ACM Transactions on Graphics 38, 2 (2019), 1--16.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445532"},{"key":"e_1_2_2_36_1","doi-asserted-by":"crossref","unstructured":"Zhihao Liang Zhihao Li Songcen Xu Mingkui Tan and Kui Jia. 2021a. Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks. In ICCV. 2783--2792.","DOI":"10.1109\/ICCV48922.2021.00278"},{"key":"e_1_2_2_37_1","volume-title":"Anna Maria Feit, and Otmar Hilliges","author":"Lindlbauer David","year":"2019","unstructured":"David Lindlbauer, Anna Maria Feit, and Otmar Hilliges. 2019. Context-aware online adaptation of mixed reality interfaces. In UIST. 147--160."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2980179.2980223"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.1699114"},{"key":"e_1_2_2_40_1","unstructured":"Microsoft. 2016. Fragments. www.microsoft.com\/en-us\/p\/fragments\/9nblggh5ggm8"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858036.2858250"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3083725"},{"key":"e_1_2_2_43_1","volume-title":"Virtualhome: Simulating household activities via programs. In CVPR. 8494--8502.","author":"Puig Xavier","year":"2018","unstructured":"Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, and Antonio Torralba. 2018. Virtualhome: Simulating household activities via programs. In CVPR. 8494--8502."},{"key":"e_1_2_2_44_1","unstructured":"Siyuan Qi Siyuan Huang Ping Wei and Song-Chun Zhu. 2017. Predicting human activities using stochastic grammar. In ICCV. 1164--1172."},{"key":"e_1_2_2_45_1","unstructured":"Siyuan Qi Yixin Zhu Siyuan Huang Chenfanfu Jiang and Song-Chun Zhu. 2018. Human-centric indoor scene synthesis using stochastic grammar. In CVPR. 5899--5908."},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9340781"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-41687-3_24"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661229.2661230"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925867"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356505"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISMAR-Adjunct51615.2020.00072"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751556"},{"key":"e_1_2_2_53_1","doi-asserted-by":"crossref","unstructured":"Jiashun Wang Huazhe Xu Jingwei Xu Sifei Liu and Xiaolong Wang. 2021a. Synthesizing long-term 3d human motion and interaction in 3d scenes. In CVPR. 9401--9411.","DOI":"10.1109\/CVPR46437.2021.00928"},{"key":"e_1_2_2_54_1","doi-asserted-by":"crossref","unstructured":"Jingbo Wang Sijie Yan Bo Dai and Dahua Lin. 2021b. Scene-aware generative network for human motion synthesis. In CVPR. 12206--12215.","DOI":"10.1109\/CVPR46437.2021.01203"},{"key":"e_1_2_2_55_1","doi-asserted-by":"crossref","unstructured":"Tianyi Wang Xun Qian Fengming He Xiyun Hu Ke Huo Yuanzhi Cao and Karthik Ramani. 2020. CAPturAR: An augmented reality tool for authoring human-involved context-aware applications. In UIST. 328--341.","DOI":"10.1145\/3379337.3415815"},{"key":"e_1_2_2_56_1","first-page":"1","article-title":"Sketch2Scene: Sketch-based co-retrieval and co-placement of 3D models","volume":"32","author":"Xu Kun","year":"2013","unstructured":"Kun Xu, Kang Chen, Hongbo Fu, Wei-Lun Sun, and Shi-Min Hu. 2013. Sketch2Scene: Sketch-based co-retrieval and co-placement of 3D models. ACM Transactions on Graphics 32, 4 (2013), 1--15.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392404"},{"key":"e_1_2_2_58_1","doi-asserted-by":"crossref","unstructured":"Yibiao Zhao and Song-Chun Zhu. 2013. Scene parsing by integrating function geometry and appearance models. In CVPR. 3119--3126.","DOI":"10.1109\/CVPR.2013.401"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3097061"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/1067343.1067404"},{"key":"e_1_2_2_61_1","volume-title":"A stochastic grammar of images","author":"Zhu Song-Chun","unstructured":"Song-Chun Zhu and David Mumford. 2007. A stochastic grammar of images. Now Publishers Inc."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3528223.3530061","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3528223.3530061","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3528223.3530061","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:25Z","timestamp":1750186945000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3528223.3530061"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":61,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["10.1145\/3528223.3530061"],"URL":"https:\/\/doi.org\/10.1145\/3528223.3530061","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7]]},"assertion":[{"value":"2022-07-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}