{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T05:54:07Z","timestamp":1763358847981,"version":"3.41.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"CHI PLAY","license":[{"start":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T00:00:00Z","timestamp":1633392000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100006831","name":"United States Air Force","doi-asserted-by":"crossref","award":["FA8702-15-D-0002"],"award-info":[{"award-number":["FA8702-15-D-0002"]}],"id":[{"id":"10.13039\/100006831","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2021,10,5]]},"abstract":"<jats:p>AI-enabled decision support systems have repeatedly failed in real world applications despite the underlying model operating as designed. Often this was because the system was used in an unexpected manner. Our goal is to enable better prediction of how systems will be used prior to their implementation as well as to improve existing designs, by taking human behavior into account. There are several challenges to collecting such data. Not having access to an existing prediction engine requires the simulation of such a system's behavior. This simulation must include not just the behavior of the underlying model but also the context in which the decision will be made in the real world. Additionally, collecting statistically valid samples requires that test subjects make repeated choices under slightly varied conditions. Unfortunately, in such repetitious conditions fatigue can quickly set in. Games provide us the ability to address both of these challenges by providing both systems context and narrative context. Systems context can be used to convey some or all of the information the player needs to make a decision in the game environment itself, which can help avoid the onset of fatigue. Narrative context can provide a broader environment within which the simulated system operates, adding a sense of progress, showing the effect of decisions, adding perceived social norms, and setting incentives and stakes. This broader environment can further prevent player fatigue while replicating many of the external factors that might affect choices in the real world. In this paper we describe the design of the Human-AI Decision Evaluation System (HADES), a test harness capable of interfacing with a game environment, simulating the behavior of an AI-enabled decision support system, and collecting the results of human decision making based upon such a system's predictions. Additionally, we present an analysis of data collected by HADES while interfaced with a visual novel game focused on software cyber-risk assessment.<\/jats:p>","DOI":"10.1145\/3474655","type":"journal-article","created":{"date-parts":[[2021,10,6]],"date-time":"2021-10-06T22:59:48Z","timestamp":1633561188000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Play for Real(ism) - Using Games to Predict Human-AI interactions in the Real World"],"prefix":"10.1145","volume":"5","author":[{"given":"Rotem D.","family":"Guttman","sequence":"first","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jessica","family":"Hammer","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Erik","family":"Harpstead","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carol J.","family":"Smith","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"REFERENCES"},{"key":"e_1_2_1_2_1","unstructured":"D. J. Ahler C. E. Roush and G. Sood. 2018. The micro-task market for \"Lemons\": Collecting data on Amazon's Mechanical Turk. Working Paper. Epub ahead of print."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"V. Aleven E. Myers M. Easterday and A. Ogan. 2010 April. Toward a framework for the analysis and design of educational games. In 2010 third IEEE international conference on digital game and intelligent toy enhanced learning (pp. 69--76). IEEE.","DOI":"10.1109\/DIGITEL.2010.55"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1177\/2053168018785483"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1930488.1930506"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"A. J. Berinsky G. A. Huber and G. S. Lenz. 2012. Evaluating online labor markets for experimental research: Amazon.com's Mechanical Turk. Political analysis 20(3) 351--368.","DOI":"10.1093\/pan\/mpr057"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173615"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1177\/2053168015622072"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11218-010--9145--8"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.2308\/bria-18-044"},{"key":"e_1_2_1_11_1","unstructured":"F. Doshi-Velez and B. Kim. 2017. Towards A Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1093\/jleo\/17.1.62"},{"key":"e_1_2_1_13_1","unstructured":"M. Dufwenberg S. G\u00e4chter and H. Henning-Schmidt. 2006. The framing of games and the psychology of strategic choice (No. 19\/2006). Bonn Econ Discussion Papers."},{"volume-title":"Proceedings of the 24th International Conference on Intelligent User Interfaces (pp. 229--239)","author":"Feng S.","key":"e_1_2_1_14_1","unstructured":"S. Feng and J. Boyd-Graber. 2019, March. What can ai do for me? evaluating machine learning interpretations in cooperative play. In Proceedings of the 24th International Conference on Intelligent User Interfaces (pp. 229--239)."},{"volume-title":"Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1--8).","author":"Fulton L. B.","key":"e_1_2_1_15_1","unstructured":"L. B. Fulton, J. Y. Lee, Q. Wang, Z. Yuan, J. Hammer, and A. Perer. 2020, April. Getting playful with explainable ai: Games with a purpose to improve human understanding of ai. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1--8)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"A. Furnham and H. C. Boo. 2011. A literature review of the anchoring effect. The journal of socio-economics 40(1) 35--42.","DOI":"10.1016\/j.socec.2010.10.008"},{"key":"e_1_2_1_17_1","unstructured":"C. Garvie 2019. Garbage In Garbage Out | Face Recognition on Flawed Data. [Online]. Available: https:\/\/www.flawedfacedata.com\/"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5898\/JHRI.5.1.Geiskkovitch"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376316"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1080\/10508406.2010.508029"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1177\/1046878105282276"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1057\/9780230601765_6"},{"key":"e_1_2_1_23_1","unstructured":"J. Juul. 2010. The game the player the world: Looking for a heart of gameness. Plurais Revista Multidisciplinar 1(2)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1017\/psrm.2020.6"},{"volume-title":"Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2), 1--27","author":"Kou Y.","key":"e_1_2_1_25_1","unstructured":"Y. Kou and X. Gui. 2020. Mediating Community-AI Interaction through Situated Explanation: The Case of AI-Led Moderation. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2), 1--27."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1006\/obhd.1998.2781"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"V. Lai and C. Tan. 2019. On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection arXiv preprint arXiv:1811.07901","DOI":"10.1145\/3287560.3287590"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3116595.3116630"},{"key":"e_1_2_1_29_1","unstructured":"A. C. Madrigal. 2019. How a Feel-Good AI Story Went Wrong in Flint. [Online]. Available: https:\/\/www.theatlantic.com\/technology\/archive\/2019\/01\/how-machine-learning-found-flints-lead-pipes\/578692\/"},{"key":"e_1_2_1_30_1","unstructured":"P. Madumal T. Miller L. Sonenberg and F. Vetere. 2019. A grounded interaction protocol for explainable artificial intelligence. arXiv preprint arXiv:1903.02409."},{"key":"e_1_2_1_31_1","volume-title":"Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences. arXiv preprint arXiv:1712.00547.","author":"Miller T.","year":"2017","unstructured":"T. Miller, P. Howe and L. Sonenberg. 2017. Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences. arXiv preprint arXiv:1712.00547."},{"volume-title":"January. Towards Explainable NPCs: A Relational Exploration Learning Agent. In AAAI Workshops (pp. 565--569)","author":"Molineaux M.","key":"e_1_2_1_32_1","unstructured":"M. Molineaux, D. Dannenhauer, and D. W. Aha. 2018, January. Towards Explainable NPCs: A Relational Exploration Learning Agent. In AAAI Workshops (pp. 565--569)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1017\/XPS.2015.19"},{"key":"e_1_2_1_34_1","unstructured":"M. Narayanan E. Chen J. He B. Kim S. Gershman and F. Doshi-Velez. 2018. How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation. arXiv preprint arXiv:1802.00682 (2018)."},{"volume-title":"Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 29","author":"Okita S. Y.","key":"e_1_2_1_35_1","unstructured":"S. Y. Okita, J. Bailenson, and D. L. Schwartz. 2007. The mere belief of social interaction improves learning. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 29, No. 29)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"E. Peer J. Vosgerau and A. Acquisti. 2014. Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior research methods 46(4) 1023--1031.","DOI":"10.3758\/s13428-013-0434-y"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1177\/8756972818808982"},{"key":"e_1_2_1_39_1","first-page":"377","volume-title":"Thirteenth Symposium on Usable Privacy and Security ({SOUPS} 2017)","author":"Samat S.","unstructured":"S. Samat and A. Acquisti. 2017. Format vs. content: the impact of risk and presentation on disclosure decisions. In Thirteenth Symposium on Usable Privacy and Security ({SOUPS} 2017) (pp. 377--384)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","unstructured":"K. Schrier. 2019. Designing Games for Moral Learning and Knowledge Building. Games and Culture. 2019;14(4):306--343. doi:10.1177\/1555412017711514","DOI":"10.1177\/1555412017711514"},{"key":"e_1_2_1_41_1","unstructured":"C. A. Steinkuehler. 2004. Learning in massively multiplayer online games."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10956-008-9120-8"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.chb.2017.08.038"},{"key":"e_1_2_1_44_1","unstructured":"Matt Turek. 2019. Explainable Artificial Intelligence (XAI). [Online]. Available: https:\/\/www.darpa.mil\/program\/explainable-artificial-intelligence"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"A. Tversky and D. Kahneman. 1981. The framing of decisions and the psychology of choice. science 211(4481) 453--458.","DOI":"10.1126\/science.7455683"},{"key":"e_1_2_1_46_1","unstructured":"J. Villareale and J. Zhu. 2021. Understanding Mental Models of AI through Player-AI Interaction. arXiv preprint arXiv:2103.16168"},{"volume-title":"Proceedings of the 2019 CHI conference on human factors in computing systems (pp. 1--15)","author":"Wang D.","key":"e_1_2_1_47_1","unstructured":"D. Wang, Q. Yang, A. Abdul, and B. Y. Lim. 2019, May. Designing theory-driven user-centric explainable AI. In Proceedings of the 2019 CHI conference on human factors in computing systems (pp. 1--15)."},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","unstructured":"J. D. Weinberg J. Freese and D. McElhattan. 2014. Comparing data characteristics and results of an online factorial survey between a population-based and a crowdsource-recruited sample. Sociological Science 1.","DOI":"10.15195\/v1.a19"},{"key":"e_1_2_1_49_1","volume-title":"Understanding the Effect of Accuracy on Trust in Machine Learning Models. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019","author":"Yin M.","year":"2019","unstructured":"M. Yin, J.W. Vaughan, and H. Wallach. 2019. Understanding the Effect of Accuracy on Trust in Machine Learning Models. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4--9, 2019, Glasgow, Scotland."}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474655","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474655","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474655","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:42Z","timestamp":1750191522000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474655"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,5]]},"references-count":49,"journal-issue":{"issue":"CHI PLAY","published-print":{"date-parts":[[2021,10,5]]}},"alternative-id":["10.1145\/3474655"],"URL":"https:\/\/doi.org\/10.1145\/3474655","relation":{},"ISSN":["2573-0142"],"issn-type":[{"type":"electronic","value":"2573-0142"}],"subject":[],"published":{"date-parts":[[2021,10,5]]},"assertion":[{"value":"2021-10-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}