{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,2]],"date-time":"2022-04-02T15:07:56Z","timestamp":1648912076203},"reference-count":27,"publisher":"World Scientific Pub Co Pte Lt","issue":"02","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Semantic Computing"],"published-print":{"date-parts":[[2020,6]]},"abstract":"<jats:p> Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and\/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler and annotate a subset of the VQA dataset (VQA<jats:sub>sub<\/jats:sub>). This way, the proposed multi-task CNN-LSTM VQA model can be trained with the VQA<jats:sub>sub<\/jats:sub> as well. The results show a slight improvement over the single-task CNN-LSTM model. <\/jats:p>","DOI":"10.1142\/s1793351x20400085","type":"journal-article","created":{"date-parts":[[2020,9,28]],"date-time":"2020-09-28T02:45:06Z","timestamp":1601261106000},"page":"223-248","source":"Crossref","is-referenced-by-count":0,"title":["Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm"],"prefix":"10.1142","volume":"14","author":[{"given":"Mehrdad","family":"Alizadeh","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA"}]},{"given":"Barbara","family":"Di Eugenio","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA"}]}],"member":"219","published-online":{"date-parts":[[2020,9,23]]},"reference":[{"key":"S1793351X20400085BIB001","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"S1793351X20400085BIB002","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2017.06.005"},{"key":"S1793351X20400085BIB003","volume-title":"Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition","author":"Martin J. H.","year":"2009"},{"key":"S1793351X20400085BIB004","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-007-9048-2"},{"key":"S1793351X20400085BIB005","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1548"},{"key":"S1793351X20400085BIB006","first-page":"12","volume-title":"Proc. Joint Conf. Empirical Methods in Natural Language Processing and Computational Natural Language Learning","author":"Shen D.","year":"2007"},{"key":"S1793351X20400085BIB007","doi-asserted-by":"publisher","DOI":"10.1162\/0891201053630264"},{"key":"S1793351X20400085BIB008","doi-asserted-by":"publisher","DOI":"10.1093\/ijl\/16.3.235"},{"key":"S1793351X20400085BIB009","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.597"},{"key":"S1793351X20400085BIB010","first-page":"1682","volume-title":"Advances in Neural Information Processing Systems","author":"Malinowski M.","year":"2014"},{"key":"S1793351X20400085BIB011","first-page":"2953","volume-title":"Advances in Neural Information Processing Systems","author":"Ren M.","year":"2015"},{"key":"S1793351X20400085BIB012","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.540"},{"key":"S1793351X20400085BIB013","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2754246"},{"key":"S1793351X20400085BIB014","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.10"},{"key":"S1793351X20400085BIB015","first-page":"289","volume-title":"Advances in Neural Information Processing Systems","author":"Lu J.","year":"2016"},{"key":"S1793351X20400085BIB016","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.202"},{"key":"S1793351X20400085BIB017","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219748"},{"key":"S1793351X20400085BIB018","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298880"},{"key":"S1793351X20400085BIB019","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"issue":"2","key":"S1793351X20400085BIB021","first-page":"26","volume":"4","author":"Tieleman T.","year":"2012","journal-title":"COURSERA: Neural Netw. Mach. Learn."},{"key":"S1793351X20400085BIB022","doi-asserted-by":"publisher","DOI":"10.3115\/981732.981751"},{"key":"S1793351X20400085BIB025","first-page":"9","volume-title":"Proc. Generative Lexicon Conf.","author":"Palmer M.","year":"2009"},{"key":"S1793351X20400085BIB026","volume-title":"English Verb Classes and Alternations: A Preliminary Investigation","author":"Levin B.","year":"1993"},{"key":"S1793351X20400085BIB027","first-page":"1253","volume-title":"Proc. 34th Int. Conf. Machine Learning","volume":"70","author":"Gentile C.","year":"2017"},{"key":"S1793351X20400085BIB028","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911548"},{"key":"S1793351X20400085BIB029","first-page":"1301","volume-title":"J. Mach. Learn. Res. Workshop and Conf. Proc.","volume":"48","author":"Korda N.","year":"2016"},{"key":"S1793351X20400085BIB030","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939832"}],"container-title":["International Journal of Semantic Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S1793351X20400085","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,9,28]],"date-time":"2020-09-28T02:45:39Z","timestamp":1601261139000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S1793351X20400085"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6]]},"references-count":27,"journal-issue":{"issue":"02","published-print":{"date-parts":[[2020,6]]}},"alternative-id":["10.1142\/S1793351X20400085"],"URL":"https:\/\/doi.org\/10.1142\/s1793351x20400085","relation":{},"ISSN":["1793-351X","1793-7108"],"issn-type":[{"value":"1793-351X","type":"print"},{"value":"1793-7108","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6]]}}}