{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:26:43Z","timestamp":1760059603148,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,6,28]],"date-time":"2025-06-28T00:00:00Z","timestamp":1751068800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Every minute, vast amounts of video and image data are uploaded worldwide to the internet and social media platforms, creating a rich visual archive of human experiences\u2014from weddings and family gatherings to significant historical events such as war crimes and humanitarian crises. When properly analyzed, this multimodal data holds immense potential for reconstructing important events and verifying information. However, challenges arise when images and videos lack complete annotations, making manual examination inefficient and time-consuming. To address this, we propose a novel event-based focal visual content text attention (EFVCTA) framework for automated past event retrieval using visual question answering (VQA) techniques. Our approach integrates a Long Short-Term Memory (LSTM) model with convolutional non-linearity and an adaptive attention mechanism to efficiently identify and retrieve relevant visual evidence alongside precise answers. The model is designed with robust weight initialization, regularization, and optimization strategies and is evaluated on the Common Objects in Context (COCO) dataset. The results demonstrate that EFVCTA achieves the highest performance across all metrics (88.7% accuracy, 86.5% F1-score, 84.9% mAP), outperforming state-of-the-art baselines. 
The EFVCTA framework demonstrates promising results for retrieving information about past events captured in images and videos and can be effectively applied to scenarios such as documenting training programs, workshops, conferences, and social gatherings in academic institutions.<\/jats:p>","DOI":"10.3390\/computers14070255","type":"journal-article","created":{"date-parts":[[2025,6,30]],"date-time":"2025-06-30T10:03:48Z","timestamp":1751277828000},"page":"255","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Focal Correlation and Event-Based Focal Visual Content Text Attention for Past Event Search"],"prefix":"10.3390","volume":"14","author":[{"given":"Pranita P.","family":"Deshmukh","sequence":"first","affiliation":[{"name":"School of Computing Science & Engineering, VIT Bhopal University, Bhopal-Indore Highway Kothrikalan, Sehore 466114, Madhya Pradesh, India"}]},{"given":"S.","family":"Poonkuntran","sequence":"additional","affiliation":[{"name":"School of Computing Science & Engineering, VIT Bhopal University, Bhopal-Indore Highway Kothrikalan, Sehore 466114, Madhya Pradesh, India"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"855","DOI":"10.1109\/TPAMI.2008.137","article-title":"A Novel Connectionist System for Improved Unconstrained Handwriting Recognition","volume":"31","author":"Graves","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","unstructured":"Sak, H., Senior, A., and Beaufays, F. (2021, September 24). Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. 
Available online: https:\/\/static.googleusercontent.com\/media\/research.google.com\/en\/\/pubs\/archive\/43905.pdf."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Li, X., and Wu, X. (2014). Constructing Long Short-Term Memory Based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition. arXiv.","DOI":"10.1109\/ICASSP.2015.7178826"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Calin, O. (2020). Deep Learning Architectures, Springer Nature.","DOI":"10.1007\/978-3-030-36721-3"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Yang, H., Chaisorn, L., Zhao, Y., Neo, S.-Y., and Chua, T.-S. (2003, January 2\u20138). VideoQA: Question answering on news video. Proceedings of the Eleventh ACM International Conference on Multimedia, Berkeley, CA, USA.","DOI":"10.1145\/957142.957146"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L., and Parikh, D. (2015, January 7\u201313). VQA: Visual question answering. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.279"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Groth, O., Bernstein, M., and Fei-Fei, L. (2016, January 27\u201330). Visual7w: Grounded question answering in images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.540"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Jang, Y., Song, Y., Yu, Y., Kim, Y., and Kim, G. (2017, January 21\u201326). TGIF-QA: Toward spatio-temporal reasoning in visual question answering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.149"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Tapaswi, M., Zhu, Y., Stiefelhagen, R., Torralba, A., Urtasun, R., and Fidler, S. (2016, January 27\u201330). 
MovieQA: Understanding stories in movies through question-answering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.501"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Xu, H., and Saenko, K. (2016, January 11\u201314). Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46478-7_28"},{"key":"ref_12","unstructured":"Gao, H., Mao, J., Zhou, J., Huang, Z., Wang, L., and Xu, W. (2014, January 8\u201313). Are you talking to a machine? Dataset and methods for multilingual image question. Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Andreas, J., Rohrbach, M., Darrell, T., and Klein, D. (2016, January 27\u201330). Neural module networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.12"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Zitnick, C.L., and Girshick, R. (2017, January 21\u201326). Clevr: A diagnostic dataset for compositional language and elementary visual reasoning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.215"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kafle, K., and Kanan, C. (2017, January 22\u201329). An analysis of visual question answering algorithms. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.217"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1007\/s11263-017-1033-7","article-title":"Uncovering temporal context for video question and answering","volume":"124","author":"Zhu","year":"2017","journal-title":"Int. J. Comput. Vis."},{"key":"ref_17","unstructured":"Ren, M., Kiros, R., and Zemel, R. (2014, January 8\u201313). Exploring models and data for image question answering. Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yu, L., Park, E., Berg, A.C., and Berg, T.L. (2015, January 7\u201313). Visual madlibs: Fill in the blank description generation and question answering. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.283"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., and Parikh, D. (2017, January 21\u201326). Making the V in VQA matter: Elevating the role of image understanding in visual question answering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.670"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xu, D., Zhao, Z., Xiao, J., Wu, F., Zhang, H., He, X., and Zhuang, Y. (2017, January 23\u201327). Video question answering via gradually refined attention over appearance and motion. Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA.","DOI":"10.1145\/3123266.3123427"},{"key":"ref_21","unstructured":"Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y. (2014). OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks. 
arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going Deeper with Convolutions. arXiv.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_24","unstructured":"(2025, May 30). COCO Dataset. Available online: https:\/\/cocodataset.org\/#download."},{"key":"ref_25","unstructured":"(2025, May 30). VQA Datasets. Available online: https:\/\/visualqa.org\/vqa_v1_download.html."},{"key":"ref_26","unstructured":"(2025, May 30). VGG 16 Net. Available online: https:\/\/app.box.com\/s\/idt5khauxsamcg3y69jz13w6sc6122ph."},{"key":"ref_27","unstructured":"(2025, May 30). ResNet50 Net. Available online: https:\/\/app.box.com\/s\/17vthb1zl0zeh340m4gaw0luuf2vscne."},{"key":"ref_28","unstructured":"(2025, May 30). Memex QA Dataset. Available online: https:\/\/memexqa.cs.cmu.edu\/memexqa_dataset_v1.1\/."},{"key":"ref_29","unstructured":"Xiong, C., Merity, S., and Socher, R. (2016). Dynamic Memory Networks for Visual and Textual Question Answering. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Fukui, A., Park, D.H., Yang, D., Rohrbach, A., Darrell, T., and Rohrbach, M. (2016). Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. arXiv.","DOI":"10.18653\/v1\/D16-1044"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Malinowski, M., Doersch, C., Santoro, A., and Battaglia, P. (2018). Learning Visual Question Answering by Bootstrapping Hard Attention. arXiv.","DOI":"10.1007\/978-3-030-01231-1_1"},{"key":"ref_32","unstructured":"Seo, M., Kembhavi, A., Farhadi, A., and Hajishirzi, H. (2018). 
Bi-Directional Attention Flow for Machine Comprehension. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1893","DOI":"10.1109\/TPAMI.2018.2890628","article-title":"Focal Visual-Text Attention for Memex Question Answering","volume":"41","author":"Liang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, NV, USA."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/7\/255\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:00:48Z","timestamp":1760032848000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/7\/255"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,28]]},"references-count":34,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["computers14070255"],"URL":"https:\/\/doi.org\/10.3390\/computers14070255","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2025,6,28]]}}}