{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T21:51:14Z","timestamp":1740174674519,"version":"3.37.3"},"reference-count":147,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2024,6,22]],"date-time":"2024-06-22T00:00:00Z","timestamp":1719014400000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004756","name":"Emil Aaltonen Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004756","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Using language to interpret unstructured data"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Over the last decade, a plethora of training datasets have been compiled for use in language-based machine perception and in human-centered AI, alongside research regarding their compilation methods. From a primarily linguistic perspective, we add to these studies in two ways. First, we provide an overview of sixty-six training datasets used in automatic image, video, and audio captioning, examining their compilation methods with a metadata analysis. Second, we delve into the annotation process of crowdsourced datasets with an interest in understanding the linguistic factors that affect the form and content of the captions, such as contextualization and perspectivation. With a qualitative content analysis, we examine annotator instructions with a selection of eleven datasets. Drawing from various theoretical frameworks that help assess the effectiveness of the instructions, we discuss the visual and textual presentation of the instructions, as well as the perspective-guidance that is an essential part of the language instructions. 
While our analysis indicates that some standards in the formulation of instructions seem to have formed in the field, we also identified various reoccurring issues potentially hindering readability and comprehensibility of the instructions, and therefore, caption quality. To enhance readability, we emphasize the importance of text structure, organization of the information, consistent use of typographical cues, and clarity of language use. Last, engaging with previous research, we assess the compilation of both web-sourced and crowdsourced captioning datasets from various perspectives, discussing factors affecting the diversity of the datasets.<\/jats:p>","DOI":"10.1093\/llc\/fqae029","type":"journal-article","created":{"date-parts":[[2024,6,22]],"date-time":"2024-06-22T22:39:08Z","timestamp":1719095948000},"page":"864-883","source":"Crossref","is-referenced-by-count":1,"title":["Language-based machine perception: linguistic perspectives on the compilation of captioning datasets"],"prefix":"10.1093","volume":"39","author":[{"given":"Laura","family":"Hekanaho","sequence":"first","affiliation":[{"name":"Faculty of Information Technology and Communication Sciences, Tampere University , Tampere, Finland"}]},{"given":"Maija","family":"Hirvonen","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology and Communication Sciences, Tampere University , Tampere, Finland"}]},{"given":"Tuomas","family":"Virtanen","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology and Communication Sciences, Tampere University , Tampere, Finland"}]}],"member":"286","published-online":{"date-parts":[[2024,6,21]]},"reference":[{"key":"2024090307515158300_fqae029-B1","first-page":"603","volume-title":"Advances in Intelligent Systems and Computing","author":"Abreu","year":"2017"},{"first-page":"8947","year":"2019","author":"Agrawal","key":"2024090307515158300_fqae029-B2"},{"first-page":"184","year":"2020","author":"Al 
Kuwatly","key":"2024090307515158300_fqae029-B3"},{"first-page":"58","year":"2019","author":"Alikhani","key":"2024090307515158300_fqae029-B4"},{"key":"2024090307515158300_fqae029-B5","first-page":"107","article-title":"\u2018The Effects of Syntactic and Lexical Complexity on the Comprehension of Elementary Science Texts\u2019","volume":"4","author":"Arya","year":"2017","journal-title":"International Electronic Journal of Elementary Education"},{"key":"2024090307515158300_fqae029-B6","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1002\/9781118335598.ch16","volume-title":"The Handbook of Language Variation and Change","author":"Ash","year":"2013","edition":"2nd ed."},{"year":"2023","author":"ATLAS.ti","key":"2024090307515158300_fqae029-B7"},{"year":"2022","author":"Awad","key":"2024090307515158300_fqae029-B8"},{"first-page":"1708","year":"2021","author":"Bain","key":"2024090307515158300_fqae029-B9"},{"key":"2024090307515158300_fqae029-B10","doi-asserted-by":"crossref","first-page":"562","DOI":"10.4324\/9780367076399-39","volume-title":"The Routledge Handbook of Corpus Linguistics","author":"Baker","year":"2022"},{"key":"2024090307515158300_fqae029-B11","doi-asserted-by":"publisher","first-page":"103","DOI":"10.3389\/fpsyg.2016.00103","article-title":"\u2018Talking about Relations: Factors Influencing the Production of Relational Descriptions\u2019","volume":"7","author":"Baltaretu","year":"2016","journal-title":"Frontiers in Psychology"},{"first-page":"304","year":"2018","author":"Barrault","key":"2024090307515158300_fqae029-B12"},{"key":"2024090307515158300_fqae029-B13","doi-asserted-by":"crossref","first-page":"699","DOI":"10.1162\/COLI_a_00074","article-title":"\u2018What Determines Inter-coder Agreement in Manual Annotations? 
A Meta-analytic Investigation\u2019","volume":"37","author":"Bayerl","year":"2011","journal-title":"Computational Linguistics"},{"year":"2022","author":"Beaumont","key":"2024090307515158300_fqae029-B14"},{"first-page":"5185","year":"2020","author":"Bender","key":"2024090307515158300_fqae029-B15"},{"first-page":"333","year":"2010","author":"Bigham","key":"2024090307515158300_fqae029-B16"},{"first-page":"8718","year":"2020","author":"Bisk","key":"2024090307515158300_fqae029-B17"},{"first-page":"12458","year":"2019","author":"Biten","key":"2024090307515158300_fqae029-B18"},{"first-page":"1452","year":"2022","author":"Bountos","key":"2024090307515158300_fqae029-B19"},{"key":"2024090307515158300_fqae029-B20","doi-asserted-by":"crossref","DOI":"10.4324\/9781003052968","volume-title":"Innovation In Audio Description Research, pp. 159\u2013196","author":"Braun","year":"2020"},{"key":"2024090307515158300_fqae029-B21","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1016\/j.chb.2015.08.031","article-title":"\u2018Work Experiences on MTurk: Job Satisfaction, Turnover, and Information Sharing\u2019","volume":"54","author":"Brawley","year":"2016","journal-title":"Computers in Human Behavior"},{"volume-title":"The Elements of Typographic Style","year":"1992","author":"Bringhurst","key":"2024090307515158300_fqae029-B22"},{"key":"2024090307515158300_fqae029-B23","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1515\/9780748635788-006","volume-title":"Language and Identities","author":"Bucholtz","year":"2010"},{"key":"2024090307515158300_fqae029-B24","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1007\/s10683-012-9318-8","article-title":"Multitasking","volume":"15","author":"Buser","year":"2012","journal-title":"Experimental Economics"},{"key":"2024090307515158300_fqae029-B25","doi-asserted-by":"crossref","first-page":"209","DOI":"10.3366\/cor.2013.0041","article-title":"\u2018A Multi-dimensional Contrastive Study of English Abstracts by Native and 
Non-native Writers\u2019","volume":"8","author":"Cao","year":"2013","journal-title":"Corpora"},{"first-page":"3557","year":"2021","author":"Changpinyo","key":"2024090307515158300_fqae029-B26"},{"year":"2023","author":"Chen","key":"2024090307515158300_fqae029-B27"},{"first-page":"190","year":"2011","author":"Chen","key":"2024090307515158300_fqae029-B28"},{"key":"2024090307515158300_fqae029-B29","first-page":"299","volume-title":"Learning, Keeping and Using Language. Selected papers from the Eighth World Congress of Applied Linguistics, Sydney, Australia","author":"Ciliberti","year":"1987"},{"key":"2024090307515158300_fqae029-B30","first-page":"1106","article-title":"\u2018Excavating AI: The Politics of Images in Machine Learning Training Sets\u2019","volume":"36","author":"Crawford","year":"2020","journal-title":"AI and Society"},{"key":"2024090307515158300_fqae029-B31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1515\/tlir.2010.001","article-title":"\u2018Naive v. Expert Intuitions: An Empirical Study of Acceptability judgments\u2019","volume":"27","author":"Dabrowska","year":"2010","journal-title":"Linguistic Review"},{"key":"2024090307515158300_fqae029-B32","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1016\/j.langsci.2005.11.014","article-title":"\u2018Individual Differences in Language Attainment: Comprehension of Passive Sentences by Native and Non-native English Speakers\u2019","volume":"28","author":"Dabrowska","year":"2006","journal-title":"Language Sciences"},{"first-page":"2634","year":"2013","author":"Das","key":"2024090307515158300_fqae029-B33"},{"key":"2024090307515158300_fqae029-B34","first-page":"346","volume-title":"Computer Vision\u2014ECCV 2022, Conference proceedings, Part 
IV","author":"Delmas","year":"2022"},{"year":"2021","author":"Desai","key":"2024090307515158300_fqae029-B35"},{"first-page":"135","year":"2018","author":"Difallah","key":"2024090307515158300_fqae029-B36"},{"first-page":"736","year":"2020","author":"Drossos","key":"2024090307515158300_fqae029-B37"},{"key":"2024090307515158300_fqae029-B38","first-page":"64","volume-title":"Language and Gender: A Reader","author":"Eckert","year":"2011","edition":"2nd ed."},{"first-page":"70","year":"2016","author":"Elliott","key":"2024090307515158300_fqae029-B39"},{"first-page":"215","year":"2017","author":"Elliott","key":"2024090307515158300_fqae029-B40"},{"first-page":"1292","year":"2013","author":"Elliott","key":"2024090307515158300_fqae029-B41"},{"key":"2024090307515158300_fqae029-B42","doi-asserted-by":"crossref","first-page":"388","DOI":"10.1002\/9781118335598.ch18","volume-title":"The Handbook of Language Variation and Change","author":"Fought","year":"2013","edition":"2nd ed."},{"key":"2024090307515158300_fqae029-B43","first-page":"1","article-title":"\u2018The Influence of Comprehensibility on Interest and Comprehension\u2019","author":"Friedrich","year":"2022","journal-title":"Zeitschrift f\u00fcr P\u00e4dagogische Psychologie"},{"first-page":"955","year":"2017","author":"Gan","key":"2024090307515158300_fqae029-B44"},{"first-page":"1161","year":"2020","author":"Geva","key":"2024090307515158300_fqae029-B45"},{"key":"2024090307515158300_fqae029-B46","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1016\/j.neucom.2020.10.045","article-title":"\u2018Learning from Multiple Inconsistent and Dependent Annotators to Support Classification Tasks\u2019","volume":"423","author":"Gil-Gonzalez","year":"2021","journal-title":"Neurocomputing"},{"key":"2024090307515158300_fqae029-B47","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1075\/hcp.9","volume-title":"Perspective and Perspectivation in 
Discourse","author":"Graumann","year":"2002"},{"key":"2024090307515158300_fqae029-B48","first-page":"189","volume-title":"Aspects of Meaning Construction","author":"G\u00fcnter","year":"2007"},{"key":"2024090307515158300_fqae029-B49","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1007\/978-3-030-58520-4_25","volume-title":"Computer Vision\u2014ECCV 2020","author":"Gurari","year":"2020"},{"year":"2021","author":"Hacheme","key":"2024090307515158300_fqae029-B50"},{"first-page":"8528","year":"2019","author":"He","key":"2024090307515158300_fqae029-B51"},{"key":"2024090307515158300_fqae029-B52","first-page":"3","article-title":"\u2018Bias in Machine Learning\u2014What is it Good For\u2019","volume":"2659","author":"Hellstr\u00f6m","year":"2020","journal-title":"CEUR Workshop Proceedings"},{"first-page":"7","year":"2022","author":"Hiippala","key":"2024090307515158300_fqae029-B53"},{"first-page":"76","year":"2020","author":"Hirvonen","key":"2024090307515158300_fqae029-B54"},{"first-page":"2399","year":"2016","author":"Hitschler","key":"2024090307515158300_fqae029-B55"},{"key":"2024090307515158300_fqae029-B56","doi-asserted-by":"crossref","first-page":"853","DOI":"10.1613\/jair.3994","article-title":"\u2018Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics\u2019","volume":"47","author":"Hodosh","year":"2013","journal-title":"The Journal of Artificial Intelligence Research"},{"year":"2013","author":"Hovy","key":"2024090307515158300_fqae029-B57"},{"first-page":"3258","year":"2021","author":"Hsu","key":"2024090307515158300_fqae029-B58"},{"year":"2021","author":"Huynh","key":"2024090307515158300_fqae029-B59"},{"first-page":"4565","year":"2016","author":"Johnson","key":"2024090307515158300_fqae029-B60"},{"key":"2024090307515158300_fqae029-B61","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1017\/CBO9780511844744.005","volume-title":"Cognitive Load 
Theory","author":"Kalyuga","year":"2010"},{"key":"2024090307515158300_fqae029-B62","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1016\/j.neucom.2014.10.082","article-title":"\u2018Modeling Annotator Behaviors for Crowd Labeling\u2019","volume":"160","author":"Kara","year":"2015","journal-title":"Neurocomputing"},{"first-page":"787","year":"2014","author":"Kazemzadeh","key":"2024090307515158300_fqae029-B63"},{"key":"2024090307515158300_fqae029-B64","first-page":"638","article-title":"\u2018Typography, Color, and Information Structure\u2019","volume":"40","author":"Keyes","year":"1993","journal-title":"Journal of the Society for Technical Communication"},{"first-page":"119","year":"2019","author":"Kim","key":"2024090307515158300_fqae029-B65"},{"key":"2024090307515158300_fqae029-B66","doi-asserted-by":"crossref","first-page":"102643","DOI":"10.1016\/j.ipm.2021.102643","article-title":"\u2018Offensive, Aggressive, and Hate Speech Analysis: From Data-centric to Human-centered Approach\u2019","volume":"58","author":"Koco\u0144","year":"2021","journal-title":"Information Processing and Management"},{"first-page":"3337","year":"2017","author":"Krause","key":"2024090307515158300_fqae029-B67"},{"first-page":"4667","year":"2022","author":"Kreiss","key":"2024090307515158300_fqae029-B68"},{"first-page":"706","year":"2017","author":"Krishna","key":"2024090307515158300_fqae029-B69"},{"key":"2024090307515158300_fqae029-B70","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1007\/s11263-016-0981-7","article-title":"\u2018Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations\u2019","volume":"123","author":"Krishna","year":"2017","journal-title":"International Journal of Computer Vision"},{"volume-title":"Principles of Linguistic Change. 
Volume 2: Social Factors","year":"2001","author":"Labov","key":"2024090307515158300_fqae029-B71"},{"key":"2024090307515158300_fqae029-B72","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1037\/rev0000297","article-title":"\u2018Word Meaning in Minds and Machines\u2019","volume":"130","author":"Lake","year":"2023","journal-title":"Psychological Review"},{"key":"2024090307515158300_fqae029-B73","first-page":"83","article-title":"\u2018Producing Spoken Language: A Blueprint of the Speaker\u2019","volume":"9","author":"Levelt","year":"1999","journal-title":"The Neurocognition of Language"},{"key":"2024090307515158300_fqae029-B74","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1111\/lnc3.12147","article-title":"\u2018Integrating Intersectionality in Language, Gender, and Sexuality research\u2019","volume":"9","author":"Levon","year":"2015","journal-title":"Language and Linguistics Compass"},{"key":"2024090307515158300_fqae029-B75","doi-asserted-by":"crossref","first-page":"628","DOI":"10.1016\/j.promfg.2018.06.092","article-title":"\u2018Effects of Information Content in Work Instructions for Operator Performance\u2019","volume":"25","author":"Li","year":"2018","journal-title":"Procedia Manufacturing"},{"first-page":"2046","year":"2020","author":"Li","key":"2024090307515158300_fqae029-B76"},{"first-page":"271","year":"2016","author":"Li","key":"2024090307515158300_fqae029-B77"},{"first-page":"2347","year":"2019","author":"Li","key":"2024090307515158300_fqae029-B78"},{"first-page":"4641","year":"2016","author":"Li","key":"2024090307515158300_fqae029-B79"},{"key":"2024090307515158300_fqae029-B80","doi-asserted-by":"crossref","DOI":"10.1515\/9780748637492","volume-title":"The Sociolinguistics of Writing","author":"Lillis","year":"2013"},{"key":"2024090307515158300_fqae029-B81","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1007\/978-3-319-10602-1_48","volume-title":"Computer Vision\u2014ECCV 
2014","author":"Lin","year":"2014"},{"key":"2024090307515158300_fqae029-B82","first-page":"47","volume-title":"Perspective and Perspectivation in Discourse","author":"Lindell","year":"2002"},{"key":"2024090307515158300_fqae029-B83","doi-asserted-by":"crossref","DOI":"10.4135\/9781071878804","volume-title":"Conducting Online Research on Amazon Mechanical Turk and Beyond","author":"Litman","year":"2021","edition":"1st ed."},{"first-page":"10897","year":"2020","author":"Liu","key":"2024090307515158300_fqae029-B84"},{"first-page":"2183","year":"2018","author":"Lu","key":"2024090307515158300_fqae029-B85"},{"first-page":"1","year":"2016","author":"Mao","key":"2024090307515158300_fqae029-B86"},{"first-page":"90","year":"2021","author":"Mart\u00edn-Morat\u00f3","key":"2024090307515158300_fqae029-B87"},{"first-page":"3574","year":"2016","author":"Mathews","key":"2024090307515158300_fqae029-B88"},{"key":"2024090307515158300_fqae029-B89","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1207\/s1532690xci1401_1","article-title":"\u2018Are Good Texts Always Better? 
Interactions of Text Coherence, Background Knowledge, and Levels of Understanding in Learning From Text\u2019","volume":"14","author":"McNamara","year":"1996","journal-title":"Cognition and Instruction"},{"key":"2024090307515158300_fqae029-B90","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3457607","article-title":"\u2018A Survey on Bias and Fairness in Machine Learning\u2019","volume":"54","author":"Mehrabi","year":"2021","journal-title":"ACM Computing Surveys"},{"key":"2024090307515158300_fqae029-B91","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13636-022-00259-2","article-title":"\u2018Automated Audio Captioning: An Overview of Recent Progress and New Challenges\u2019","volume":"2022","author":"Mei","year":"2022","journal-title":"EURASIP Journal on Audio, Speech, and Music Processing"},{"key":"2024090307515158300_fqae029-B92","doi-asserted-by":"crossref","DOI":"10.4324\/9780203874196","volume-title":"Introducing Sociolinguistics","author":"Meyerhoff","year":"2015"},{"year":"2023","author":"Mialon","key":"2024090307515158300_fqae029-B93"},{"first-page":"2630","year":"2019","author":"Miech","key":"2024090307515158300_fqae029-B94"},{"key":"2024090307515158300_fqae029-B95","doi-asserted-by":"crossref","DOI":"10.4324\/9780203124666","volume-title":"Authority in Language: Investigating Standard English","author":"Milroy","year":"2012"},{"first-page":"1780","year":"2016","author":"Miyasaki","key":"2024090307515158300_fqae029-B96"},{"volume-title":"Human-in-the-Loop Machine Learning: Active Learning and Annotation for Human-Centered AI","year":"2021","author":"Monarch","key":"2024090307515158300_fqae029-B97"},{"key":"2024090307515158300_fqae029-B98","doi-asserted-by":"crossref","DOI":"10.1075\/pbns.138","volume-title":"Discourse Markers in Native and Non-native English 
Discourse","author":"M\u00fcller","year":"2005"},{"first-page":"4220","year":"2021","author":"Nakamura","key":"2024090307515158300_fqae029-B99"},{"key":"2024090307515158300_fqae029-B100","doi-asserted-by":"crossref","first-page":"730","DOI":"10.1007\/978-3-030-63007-2","volume-title":"Proceedings of Computational Collective Intelligence","author":"Nguyen","year":"2020"},{"key":"2024090307515158300_fqae029-B101","doi-asserted-by":"crossref","DOI":"10.1515\/9783110803389","volume-title":"Folk Linguistics","author":"Niedzielski","year":"2000"},{"year":"2022","author":"Nieto","key":"2024090307515158300_fqae029-B102"},{"key":"2024090307515158300_fqae029-B103","first-page":"3","volume-title":"Language Socialization. Encyclopedia of Language and Education","author":"Ochs","year":"2017"},{"year":"2023","author":"OpenAI","key":"2024090307515158300_fqae029-B104"},{"first-page":"1143","year":"2011","author":"Ordonez","key":"2024090307515158300_fqae029-B105"},{"key":"2024090307515158300_fqae029-B106","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.ijinfomgt.2018.01.008","article-title":"\u2018Trait motivations of crowdsourcing and task choice: a distal-proximal perspective\u2019","volume":"40","author":"Pee","year":"2018","journal-title":"International Journal of Information Management"},{"key":"2024090307515158300_fqae029-B107","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511844744","volume-title":"Cognitive Load Theory","author":"Plass","year":"2010"},{"key":"2024090307515158300_fqae029-B108","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1007\/978-3-030-58558-7_38","volume-title":"Computer Vision\u2014ECCV 2020","author":"Pont-Tuset","year":"2020"},{"key":"2024090307515158300_fqae029-B109","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1080\/00140139.2014.889220","article-title":"\u2018Is Red the Colour of Danger? 
Testing an Implicit Red-Danger Association\u2019","volume":"57","author":"Pravossoudovitch","year":"2014","journal-title":"Ergonomics"},{"first-page":"1074","year":"2017","author":"Rabinovich","key":"2024090307515158300_fqae029-B110"},{"first-page":"171","year":"2016","author":"Rajendran","key":"2024090307515158300_fqae029-B111"},{"year":"2017","author":"Ramisa","key":"2024090307515158300_fqae029-B112"},{"first-page":"139","year":"2010","author":"Rashtchian","key":"2024090307515158300_fqae029-B113"},{"key":"2024090307515158300_fqae029-B114","first-page":"297","article-title":"\u2018Learning from Crowds\u2019","volume":"11","author":"Raykar","year":"2010","journal-title":"Journal of Machine Learning Research"},{"first-page":"25","year":"2013","author":"Regneri","key":"2024090307515158300_fqae029-B115"},{"key":"2024090307515158300_fqae029-B116","doi-asserted-by":"crossref","first-page":"1428","DOI":"10.1016\/j.patrec.2013.05.012","article-title":"\u2018Learning from Multiple Annotators: Distinguishing Good From Random Labelers\u2019","volume":"34","author":"Rodrigues","year":"2013","journal-title":"Pattern Recognition Letters"},{"first-page":"3202","year":"2015","author":"Rohrbach","key":"2024090307515158300_fqae029-B117"},{"key":"2024090307515158300_fqae029-B118","doi-asserted-by":"crossref","DOI":"10.4135\/9781529682571","volume-title":"Qualitative Content Analysis in Practice","author":"Schreier","year":"2012"},{"year":"2022","author":"Schuhmann","key":"2024090307515158300_fqae029-B119"},{"first-page":"184","year":"2014","author":"Senina","key":"2024090307515158300_fqae029-B120"},{"first-page":"2556","year":"2018","author":"Sharma","key":"2024090307515158300_fqae029-B121"},{"first-page":"1","year":"2014","author":"Sharpe","key":"2024090307515158300_fqae029-B122"},{"key":"2024090307515158300_fqae029-B123","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1007\/978-3-030-58536-5_44","volume-title":"Computer Vision\u2014ECCV 
2020","author":"Sidorov","year":"2020"},{"key":"2024090307515158300_fqae029-B124","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1007\/978-3-319-46448-0_31","volume-title":"Computer Vision\u2014ECCV 2016","author":"Sigurdsson","year":"2016"},{"first-page":"1","year":"2020","author":"Simons","key":"2024090307515158300_fqae029-B125"},{"first-page":"5016","year":"2022","author":"Soldan","key":"2024090307515158300_fqae029-B126"},{"first-page":"2443","year":"2021","author":"Srinivasan","key":"2024090307515158300_fqae029-B127"},{"key":"2024090307515158300_fqae029-B128","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1109\/TPAMI.2022.3148210","article-title":"\u2018From Show to Tell: A Survey on Deep Learning-based Image Captioning\u2019","volume":"45","author":"Stefanini","year":"2023","journal-title":"IEEE Transactions on Pattern Analysis and Machine"},{"volume-title":"User-Centered Translation","year":"2015","author":"Suojanen","key":"2024090307515158300_fqae029-B129"},{"key":"2024090307515158300_fqae029-B130","first-page":"576","volume-title":"Conference Presentation at Digital Humanities 2022","author":"Suviranta","year":"2022"},{"volume-title":"Explorations in the Learning Sciences, Instructional Systems and Performance Technologies","year":"2011","author":"Sweller","key":"2024090307515158300_fqae029-B131"},{"first-page":"16","year":"2021","author":"Takatsu","key":"2024090307515158300_fqae029-B132"},{"year":"2022","author":"Thapliyal","key":"2024090307515158300_fqae029-B133"},{"first-page":"5228","year":"2022","author":"Thrush","key":"2024090307515158300_fqae029-B134"},{"year":"2015","author":"Torabi","key":"2024090307515158300_fqae029-B135"},{"year":"2019","author":"Van Miltenburg","key":"2024090307515158300_fqae029-B136"},{"first-page":"21","year":"2017","author":"Van 
Miltenburg","key":"2024090307515158300_fqae029-B137"},{"key":"2024090307515158300_fqae029-B138","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1007\/s11263-012-0564-1","article-title":"\u2018Efficiently Scaling Up Crowdsourced Video Annotation A Set of Best Practices for High Quality, Economical Video Labeling\u2019","volume":"101","author":"Vondrick","year":"2013","journal-title":"International Journal of Computer Vision"},{"first-page":"4580","year":"2019","author":"Wang","key":"2024090307515158300_fqae029-B139"},{"key":"2024090307515158300_fqae029-B140","doi-asserted-by":"crossref","first-page":"709","DOI":"10.1007\/978-3-031-19833-5_41","volume-title":"Computer Vision\u2014ECCV 2022","author":"Wang","year":"2022"},{"key":"2024090307515158300_fqae029-B141","doi-asserted-by":"crossref","DOI":"10.1075\/celcr.20","volume-title":"Sensory Linguistics. Language, Perception and Metaphor","author":"Winter","year":"2019"},{"key":"2024090307515158300_fqae029-B142","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1007\/978-3-031-19836-6_2","volume-title":"Computer Vision\u2014ECCV 2022","author":"Wu","year":"2022"},{"first-page":"418","year":"2021","author":"Wu","key":"2024090307515158300_fqae029-B143"},{"first-page":"5288","year":"2016","author":"Xu","key":"2024090307515158300_fqae029-B144"},{"first-page":"67","year":"2014","author":"Young","key":"2024090307515158300_fqae029-B145"},{"first-page":"6571","year":"2019","author":"Zhou","key":"2024090307515158300_fqae029-B146"},{"first-page":"7590","year":"2018","author":"Zhou","key":"2024090307515158300_fqae029-B147"}],"container-title":["Digital Scholarship in the 
Humanities"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/dsh\/article-pdf\/39\/3\/864\/58997721\/fqae029.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/dsh\/article-pdf\/39\/3\/864\/58997721\/fqae029.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,3]],"date-time":"2024-09-03T12:32:50Z","timestamp":1725366770000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/dsh\/article\/39\/3\/864\/7697546"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,21]]},"references-count":147,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,6,21]]},"published-print":{"date-parts":[[2024,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/llc\/fqae029","relation":{},"ISSN":["2055-7671","2055-768X"],"issn-type":[{"type":"print","value":"2055-7671"},{"type":"electronic","value":"2055-768X"}],"subject":[],"published-other":{"date-parts":[[2024,9]]},"published":{"date-parts":[[2024,6,21]]}}}