{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T04:04:17Z","timestamp":1780459457375,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":90,"publisher":"ACM","license":[{"start":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T00:00:00Z","timestamp":1776038400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,4,13]]},"DOI":"10.1145\/3800424.3800454","type":"proceedings-article","created":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T03:27:31Z","timestamp":1780457251000},"page":"38-50","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Beyond Accuracy: Evaluating Trust, Bias, and Hallucination in AI-Generated Alternative (Alt) Text for Accessibility"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6700-719X","authenticated-orcid":false,"given":"Wajdi","family":"Aljedaani","sequence":"first","affiliation":[{"name":"Saudi Data and AI Authority, Riyadh, Saudi Arabia and Department of Software Engineering, Rochester Institute of Technology, Rochester, New York, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-4340-3629","authenticated-orcid":false,"given":"Razan","family":"Alsulaymi","sequence":"additional","affiliation":[{"name":"Saudi Data and AI Authority, Riyadh, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8830-0058","authenticated-orcid":false,"given":"Ghada","family":"Alqahtani","sequence":"additional","affiliation":[{"name":"Saudi Data and AI Authority, Riyadh, Saudi 
Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6577-227X","authenticated-orcid":false,"given":"Yunhe","family":"Feng","sequence":"additional","affiliation":[{"name":"University of North Texas, Denton, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,6,2]]},"reference":[{"key":"e_1_3_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00522"},{"key":"e_1_3_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3677846.3677857"},{"key":"e_1_3_3_1_4_2","doi-asserted-by":"crossref","unstructured":"Asmaa\u00a0Mansour Alghamdi Wajdi Aljedaani Hamed Jalali Stephanie Ludi and Marcelo\u00a0M Eler. 2025. Understanding developer challenges and trends in web accessibility: a stack overflow analysis. Universal Access in the Information Society 24 2 (2025) 1701\u20131717.","DOI":"10.1007\/s10209-024-01174-3"},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CDMA61895.2025.00016"},{"key":"e_1_3_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3663548.3675659"},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3744257.3744270"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3641554.3701841"},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3677846.3677854"},{"key":"e_1_3_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642508"},{"key":"e_1_3_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CDMA54072.2022.00027"},{"key":"e_1_3_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3493612.3520471"},{"key":"e_1_3_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Wajdi Aljedaani Rubel\u00a0Hassan Mollik Eysha Saad Marcelo\u00a0M Eler and Asmaa\u00a0Mansour Alghamdi. 2025. Challenges and barriers faced by blind and visually impaired users on social media: a systematic literature review. 
Universal Access in the Information Society (2025) 1\u201330.","DOI":"10.1007\/s10209-025-01241-3"},{"key":"e_1_3_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3744257.3744277"},{"key":"e_1_3_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASEW52652.2021.00053"},{"key":"e_1_3_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445281"},{"key":"e_1_3_3_1_17_2","unstructured":"Anthropic. 2024. Claude 3.5 Sonnet. https:\/\/www.anthropic.com\/news\/claude-3-5-sonnet. Accessed: 2024-06-20."},{"key":"e_1_3_3_1_18_2","unstructured":"Amanda Askell Yuntao Bai Anna Chen Dawn Drain Deep Ganguli Tom Henighan Andy Jones Nicholas Joseph Ben Mann Nova DasSarma et\u00a0al. 2021. A general language assistant as a laboratory for alignment. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2112.00861 (2021)."},{"key":"e_1_3_3_1_19_2","unstructured":"Yuntao Bai Saurav Kadavath Sandipan Kundu Amanda Askell Jackson Kernion Andy Jones Anna Chen Anna Goldie Azalia Mirhoseini Cameron McKinnon et\u00a0al. 2022. Constitutional ai: Harmlessness from ai feedback. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2212.08073 (2022)."},{"key":"e_1_3_3_1_20_2","unstructured":"Catarina\u00a0G Bel\u00e9m Preethi Seshadri Yasaman Razeghi and Sameer Singh. 2024. Are models biased on text without gender-related language? arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2405.00588 (2024)."},{"key":"e_1_3_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300528"},{"key":"e_1_3_3_1_22_2","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared\u00a0D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. 
Advances in neural information processing systems 33 (2020) 1877\u20131901."},{"key":"e_1_3_3_1_23_2","first-page":"77","volume-title":"Conference on fairness, accountability and transparency","author":"Buolamwini Joy","year":"2018","unstructured":"Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency. PMLR, 77\u201391."},{"key":"e_1_3_3_1_24_2","unstructured":"Canyu Chen and Kai Shu. 2023. Can llm-generated misinformation be detected? arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.13788 (2023)."},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Xuweiyi Chen Ziqiao Ma Xuejun Zhang Sihan Xu Shengyi Qian Jianing Yang David Fouhey and Joyce Chai. 2024. Multi-object hallucination in vision language models. Advances in Neural Information Processing Systems 37 (2024) 44393\u201344418.","DOI":"10.52202\/079017-1409"},{"key":"e_1_3_3_1_26_2","unstructured":"Xi Chen Xiao Wang Soravit Changpinyo Anthony\u00a0J Piergiovanni Piotr Padlewski Daniel Salz Sebastian Goodman Adam Grycner Basil Mustafa Lucas Beyer et\u00a0al. 2022. Pali: A jointly-scaled multilingual language-image model. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2209.06794 (2022)."},{"key":"e_1_3_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00370"},{"key":"e_1_3_3_1_28_2","doi-asserted-by":"crossref","unstructured":"Killian Clarke. 2023. Which protests count? Coverage bias in Middle East event datasets. Mediterranean Politics 28 2 (2023) 302\u2013328.","DOI":"10.1080\/13629395.2021.1957577"},{"key":"e_1_3_3_1_29_2","doi-asserted-by":"crossref","unstructured":"Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20 1 (1960) 37\u201346.","DOI":"10.1177\/001316446002000104"},{"key":"e_1_3_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Ludivine Crible and Liesbeth Degand. 
2019. Reliability vs. granularity in discourse annotation: What is the trade-off? Corpus Linguistics and Linguistic Theory 15 1 (2019) 71\u201399.","DOI":"10.1515\/cllt-2016-0046"},{"key":"e_1_3_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCCN.2017.8038465"},{"key":"e_1_3_3_1_32_2","doi-asserted-by":"crossref","unstructured":"Isabel\u00a0O Gallegos Ryan\u00a0A Rossi Joe Barrow Md\u00a0Mehrab Tanjim Sungchul Kim Franck Dernoncourt Tong Yu Ruiyi Zhang and Nesreen\u00a0K Ahmed. 2024. Bias and fairness in large language models: A survey. Computational Linguistics 50 3 (2024) 1097\u20131179.","DOI":"10.1162\/coli_a_00524"},{"key":"e_1_3_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00672"},{"key":"e_1_3_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01331"},{"key":"e_1_3_3_1_35_2","first-page":"65","volume-title":"International Conference on Human-Computer Interaction","author":"Ghoneim Rana","year":"2025","unstructured":"Rana Ghoneim, Wajdi Aljedaani, Asmaa\u00a0Mansour Alghamdi, and Stephanie Ludi. 2025. Educational Accessibility in STEM for Visually Impaired and Blind Students: A Literature Review on Challenges and Support. In International Conference on Human-Computer Interaction. Springer, 65\u201386."},{"key":"e_1_3_3_1_36_2","unstructured":"Th\u00e9o Gigant Camille Guinaudeau and Fr\u00e9d\u00e9ric Dufaux. 2025. Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2504.10049 (2025)."},{"key":"e_1_3_3_1_37_2","unstructured":"Chenhui Gou Abdulwahab Felemban Faizan\u00a0Farooq Khan Deyao Zhu Jianfei Cai Hamid Rezatofighi and Mohamed Elhoseiny. 2024. How well can vision language models see image details? 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2408.03940 (2024)."},{"key":"e_1_3_3_1_38_2","doi-asserted-by":"crossref","unstructured":"Ojasvi Gupta Stefano Marrone Francesco Gargiulo Rajesh Jaiswal and Lidia Marassi. 2025. Understanding Social Biases in Large Language Models. AI 6 5 (2025) 106.","DOI":"10.3390\/ai6050106"},{"key":"e_1_3_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00380"},{"key":"e_1_3_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_25"},{"key":"e_1_3_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3461702.3462620"},{"key":"e_1_3_3_1_42_2","unstructured":"Marius Jahrens and Thomas Martinetz. 2025. Why LLMs Cannot Think and How to Fix It. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.09211 (2025)."},{"key":"e_1_3_3_1_43_2","first-page":"1","volume-title":"Proceedings of the XX Brazilian Symposium on Human Factors in Computing Systems","author":"Jandrey Alessandra\u00a0Helena","year":"2021","unstructured":"Alessandra\u00a0Helena Jandrey, Duncan Dubugras\u00a0Alcoba Ruiz, and Milene\u00a0Selbach Silveira. 2021. Image descriptions\u2019 limitations for people with visual impairments: where are we and where are we going?. In Proceedings of the XX Brazilian Symposium on Human Factors in Computing Systems. 1\u201311."},{"key":"e_1_3_3_1_44_2","doi-asserted-by":"crossref","unstructured":"Ziwei Ji Nayeon Lee Rita Frieske Tiezheng Yu Dan Su Yan Xu Etsuko Ishii Ye\u00a0Jin Bang Andrea Madotto and Pascale Fung. 2023. Survey of hallucination in natural language generation. ACM computing surveys 55 12 (2023) 1\u201338.","DOI":"10.1145\/3571730"},{"key":"e_1_3_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3375627.3375809"},{"key":"e_1_3_3_1_46_2","volume-title":"Web Content Accessibility Guidelines (WCAG) 2.2","author":"Kirkpatrick Andrew","year":"2023","unstructured":"Andrew Kirkpatrick, Joshue O\u00a0Connor, Alastair Campbell, and Michael Cooper. 2023. 
Web Content Accessibility Guidelines (WCAG) 2.2. Technical Report. W3C Recommendation. https:\/\/www.w3.org\/TR\/WCAG22\/"},{"key":"e_1_3_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376591"},{"key":"e_1_3_3_1_48_2","first-page":"19730","volume-title":"International conference on machine learning","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning. PMLR, 19730\u201319742."},{"key":"e_1_3_3_1_49_2","unstructured":"Yifan Li Yifan Du Kun Zhou Jinpeng Wang Wayne\u00a0Xin Zhao and Ji-Rong Wen. 2023. Evaluating object hallucination in large vision-language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.10355 (2023)."},{"key":"e_1_3_3_1_50_2","doi-asserted-by":"crossref","unstructured":"Yuheng Li Lele Sha Lixiang Yan Jionghao Lin Mladen Rakovi\u0107 Kirsten Galbraith Kayley Lyons Dragan Ga\u0161evi\u0107 and Guanliang Chen. 2023. Can large language models write reflectively. Computers and Education: Artificial Intelligence 4 (2023) 100140.","DOI":"10.1016\/j.caeai.2023.100140"},{"key":"e_1_3_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_3_1_52_2","unstructured":"Fuxiao Liu Kevin Lin Linjie Li Jianfeng Wang Yaser Yacoob and Lijuan Wang. 2023. Mitigating hallucination in large multi-modal models via robust instruction tuning. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2306.14565 (2023)."},{"key":"e_1_3_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02484"},{"key":"e_1_3_3_1_54_2","doi-asserted-by":"crossref","unstructured":"Haotian Liu Chunyuan Li Qingyang Wu and Yong\u00a0Jae Lee. 2023. Visual instruction tuning. 
Advances in neural information processing systems 36 (2023) 34892\u201334916.","DOI":"10.52202\/075280-1516"},{"key":"e_1_3_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025814"},{"key":"e_1_3_3_1_56_2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1007\/978-3-031-62846-7_35","volume-title":"International Conference on Computers Helping People with Special Needs","author":"Moured Omar","year":"2024","unstructured":"Omar Moured, Shahid\u00a0Ali Farooqui, Karin M\u00fcller, Sharifeh Fadaeijouybari, Thorsten Schwarz, Mohammed Javed, and Rainer Stiefelhagen. 2024. Alt4Blind: a user interface to simplify charts alt-text creation. In International Conference on Computers Helping People with Special Needs. Springer, 291\u2013298."},{"key":"e_1_3_3_1_57_2","doi-asserted-by":"crossref","unstructured":"Jason Obeid and Enamul Hoque. 2020. Chart-to-text: Generating natural language descriptions for charts by adapting the transformer model. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2010.09142 (2020).","DOI":"10.18653\/v1\/2020.inlg-1.20"},{"key":"e_1_3_3_1_58_2","first-page":"36","volume-title":"IFIP Conference on Human-Computer Interaction","author":"Oliveira Alberto Dumont\u00a0Alves","year":"2025","unstructured":"Alberto Dumont\u00a0Alves Oliveira, Wajdi Aljedaani, and Marcelo\u00a0Medeiros Eler. 2025. Assessing Visual Impairment Feedback in Mobile Applications: An Empirical Analysis Using BBC Accessibility Guidelines. In IFIP Conference on Human-Computer Interaction. Springer, 36\u201359."},{"key":"e_1_3_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581315"},{"key":"e_1_3_3_1_60_2","unstructured":"OpenAI. 2024. Hello GPT-4o. https:\/\/openai.com\/index\/hello-gpt-4o\/. Accessed: 2024-05-13."},{"key":"e_1_3_3_1_61_2","doi-asserted-by":"crossref","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray et\u00a0al. 2022. 
Training language models to follow instructions with human feedback. Advances in neural information processing systems 35 (2022) 27730\u201327744.","DOI":"10.52202\/068431-2011"},{"key":"e_1_3_3_1_62_2","unstructured":"Benji Peng Keyu Chen Ming Li Pohsun Feng Ziqian Bi Junyu Liu Xinyuan Song and Qian Niu. 2024. Securing large language models: Addressing bias misinformation and prompt attacks. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2409.08087 (2024)."},{"key":"e_1_3_3_1_63_2","first-page":"8748","volume-title":"International conference on machine learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_1_64_2","unstructured":"Chahat Raj Anjishnu Mukherjee Aylin Caliskan Antonios Anastasopoulos and Ziwei Zhu. 2024. Biasdora: Exploring hidden biased associations in vision-language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2407.02066 (2024)."},{"key":"e_1_3_3_1_65_2","volume-title":"Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context","author":"Reid Machel","year":"2024","unstructured":"Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, et\u00a0al. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Technical Report. Google DeepMind. https:\/\/arxiv.org\/abs\/2403.05530"},{"key":"e_1_3_3_1_66_2","doi-asserted-by":"crossref","unstructured":"Anna Rohrbach Lisa\u00a0Anne Hendricks Kaylee Burns Trevor Darrell and Kate Saenko. 2018. Object hallucination in image captioning. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1809.02156 (2018).","DOI":"10.18653\/v1\/D18-1437"},{"key":"e_1_3_3_1_67_2","doi-asserted-by":"crossref","unstructured":"Sara Sarto Marcella Cornia and Rita Cucchiara. 2025. Image captioning evaluation in the age of multimodal llms: Challenges and future perspectives. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.14604 (2025).","DOI":"10.24963\/ijcai.2025\/1180"},{"key":"e_1_3_3_1_68_2","volume-title":"Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track","author":"Schuhmann Christoph","year":"2022","unstructured":"Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et\u00a0al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track."},{"key":"e_1_3_3_1_69_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Sclar Melanie","year":"2024","unstructured":"Melanie Sclar, Yejin Choi, Yulia Tsvetkov, and Alane Suhr. 2024. Quantifying Language Models\u2019 Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_3_1_70_2","doi-asserted-by":"crossref","first-page":"6367","DOI":"10.18653\/v1\/2024.naacl-long.353","volume-title":"Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)","author":"Seshadri Preethi","year":"2024","unstructured":"Preethi Seshadri, Sameer Singh, and Yanai Elazar. 2024. The bias amplification paradox in text-to-image generation. 
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 6367\u20136384."},{"key":"e_1_3_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1238"},{"key":"e_1_3_3_1_72_2","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1145\/3730436.3730449","volume-title":"Proceedings of the 2025 International Conference on Artificial Intelligence and Computational Intelligence","author":"Shen Yixian","year":"2025","unstructured":"Yixian Shen, Hang Zhang, Yanxin Shen, Lun Wang, Chuanqi Shi, Shaoshuai Du, and Yiyi Tao. 2025. Altgen: Ai-driven alt text generation for enhancing epub accessibility. In Proceedings of the 2025 International Conference on Artificial Intelligence and Computational Intelligence. 78\u201383."},{"key":"e_1_3_3_1_73_2","unstructured":"Nikita Srivatsan Sofia Samaniego Omar Florez and Taylor Berg-Kirkpatrick. 2023. Alt-text with context: Improving accessibility for images on twitter. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.14779 (2023)."},{"key":"e_1_3_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373625.3417014"},{"key":"e_1_3_3_1_75_2","doi-asserted-by":"crossref","unstructured":"Hossein Talebi and Peyman Milanfar. 2018. NIMA: Neural image assessment. IEEE Transactions on Image Processing 27 8 (2018) 3998\u20134011.","DOI":"10.1109\/TIP.2018.2831899"},{"key":"e_1_3_3_1_76_2","doi-asserted-by":"crossref","unstructured":"Akash Verma Arun\u00a0Kumar Yadav Mohit Kumar and Divakar Yadav. 2024. Automatic image caption generation using deep learning. 
Multimedia Tools and Applications 83 2 (2024) 5309\u20135325.","DOI":"10.1007\/s11042-023-15555-y"},{"key":"e_1_3_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_3_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/2818048.2820013"},{"key":"e_1_3_3_1_79_2","unstructured":"Sibo Wang Xiangkui Cao Jie Zhang Zheng Yuan Shiguang Shan Xilin Chen and Wen Gao. 2024. Vlbiasbench: A comprehensive benchmark for evaluating bias in large vision-language model. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.14194 (2024)."},{"key":"e_1_3_3_1_80_2","unstructured":"Teng Wang Jinrui Zhang Junjie Fei Hao Zheng Yunlong Tang Zhe Li Mingqi Gao and Shanshan Zhao. 2023. Caption anything: Interactive image description with diverse multimodal controls. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.02677 (2023)."},{"key":"e_1_3_3_1_81_2","doi-asserted-by":"crossref","unstructured":"Jason Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Fei Xia Ed Chi Quoc\u00a0V Le Denny Zhou et\u00a0al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022) 24824\u201324837.","DOI":"10.52202\/068431-1800"},{"key":"e_1_3_3_1_82_2","doi-asserted-by":"crossref","unstructured":"Adam Worrall Erin Ballantyne and Jen Kendall. 2019. \u201cYou don\u2019t feel that you\u2019re so far away\u201d: Information sharing technology use and settlement of international student immigrants. Proceedings of the Association for Information Science and Technology 56 1 (2019) 306\u2013315.","DOI":"10.1002\/pra2.25"},{"key":"e_1_3_3_1_83_2","unstructured":"xAI. 2024. Grok-1.5 Vision Preview. https:\/\/x.ai\/blog\/grok-1.5v. Accessed: 2024-04-12."},{"key":"e_1_3_3_1_84_2","unstructured":"Chuqiao Yan Hans-Peter Hutter Felix\u00a0M Schmitt-Koopmann and Alireza Darvishy. 2025. Chart Accessibility: A Review of Current Alt Text Generation. 
IEEE Access (2025)."},{"key":"e_1_3_3_1_85_2","doi-asserted-by":"crossref","unstructured":"Peter Young Alice Lai Micah Hodosh and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the association for computational linguistics 2 (2014) 67\u201378.","DOI":"10.1162\/tacl_a_00166"},{"key":"e_1_3_3_1_86_2","doi-asserted-by":"crossref","unstructured":"Lu Yu Malvina Nikandrou Jiali Jin and Verena Rieser. 2023. Quality-agnostic image captioning to safely assist people with vision impairment. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2304.14623 (2023).","DOI":"10.24963\/ijcai.2023\/697"},{"key":"e_1_3_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581388"},{"key":"e_1_3_3_1_88_2","doi-asserted-by":"crossref","unstructured":"Jingyi Zhang Jiaxing Huang Sheng Jin and Shijian Lu. 2024. Vision-language models for vision tasks: A survey. IEEE transactions on pattern analysis and machine intelligence 46 8 (2024) 5625\u20135644.","DOI":"10.1109\/TPAMI.2024.3369699"},{"key":"e_1_3_3_1_89_2","doi-asserted-by":"crossref","unstructured":"Jieyu Zhao Tianlu Wang Mark Yatskar Vicente Ordonez and Kai-Wei Chang. 2017. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1707.09457 (2017).","DOI":"10.18653\/v1\/D17-1323"},{"key":"e_1_3_3_1_90_2","unstructured":"Yuchen Zhou Jiayu Tang Shuo Yang Xiaoyan Xiao Yuqin Dai Wenhao Yang Chao Gou Xiaobo Xia and Tat-Seng Chua. 2025. Logic unseen: Revealing the logical blindspots of vision-language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2508.11317 (2025)."},{"key":"e_1_3_3_1_91_2","unstructured":"Deyao Zhu Jun Chen Xiaoqian Shen Xiang Li and Mohamed Elhoseiny. 2023. Minigpt-4: Enhancing vision-language understanding with advanced large language models. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2304.10592 (2023)."}],"event":{"name":"W4A '26: The 23rd International Web for All Conference","location":"Dubai United Arab Emirates","acronym":"W4A '26"},"container-title":["Proceedings of the 23rd International Web for All Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3800424.3800454","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T03:31:30Z","timestamp":1780457490000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3800424.3800454"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,13]]},"references-count":90,"alternative-id":["10.1145\/3800424.3800454","10.1145\/3800424"],"URL":"https:\/\/doi.org\/10.1145\/3800424.3800454","relation":{},"subject":[],"published":{"date-parts":[[2026,4,13]]},"assertion":[{"value":"2026-06-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}