{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T05:21:03Z","timestamp":1766553663800,"version":"3.48.0"},"reference-count":79,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T00:00:00Z","timestamp":1766102400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Educational videos contain long periods of visual redundancy, where only a few frames convey meaningful instructional information. Conventional video models, which are designed for dynamic scenes, often fail to capture these subtle pedagogical transitions. We introduce LEARNet, an entropy-aware framework that models educational video understanding as the extraction of high-information instructional content from low-entropy visual streams. LEARNet combines a Temporal Information Bottleneck (TIB) for selecting pedagogically significant keyframes with a Spatial\u2013Semantic Decoder (SSD) that produces fine-grained annotations refined through a proposed Relational Consistency Verification Network (RCVN). This architecture enables the construction of EVUD-2M, a large-scale benchmark with multi-level semantic labels for diverse instructional formats. LEARNet achieves substantial redundancy reduction (70.2%) while maintaining high annotation fidelity (F1 = 0.89, mAP@50 = 0.88). 
Grounded in information-theoretic principles, LEARNet provides a scalable foundation for tasks such as lecture indexing, visual content summarization, and multimodal learning analytics.<\/jats:p>","DOI":"10.3390\/e28010003","type":"journal-article","created":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T14:27:16Z","timestamp":1766154436000},"page":"3","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["LEARNet: A Learning Entropy-Aware Representation Network for Educational Video Understanding"],"prefix":"10.3390","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1871-6037","authenticated-orcid":false,"given":"Chitrakala","family":"S","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai 600025, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3975-2931","authenticated-orcid":false,"given":"Nivedha","family":"V V","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai 600025, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7168-5284","authenticated-orcid":false,"given":"Niranjana","family":"S R","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai 600025, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1016\/S0923-5965(00)00011-4","article-title":"Temporal Video Segmentation: A Survey","volume":"16","author":"Koprinska","year":"2001","journal-title":"Signal Process Image 
Commun."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1349","DOI":"10.1109\/34.895972","article-title":"Content-Based Image Retrieval at the End of the Early Years","volume":"22","author":"Smeulders","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Islam, R., and Moushi, O.M. (2024). GPT-4o: The Cutting-Edge Advancement in Multimodal LLM. Intelligent Computing. Proceedings of the 2025 Computing Conference, Springer.","DOI":"10.36227\/techrxiv.171986596.65533294\/v1"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1109\/TMM.2008.2009703","article-title":"A Novel Video Summarization Based on Mining the Story-Structure and Semantic Relations Among Concept Entities","volume":"11","author":"Chen","year":"2009","journal-title":"IEEE Trans. Multimed."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2019\/5217961","article-title":"Key-Frame Extraction Based on HSV Histogram and Adaptive Clustering","volume":"2019","author":"Zhao","year":"2019","journal-title":"Math. Probl. Eng."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhao, B., Xu, S., Lin, S., Wang, R., and Luo, X. (2019, January 8\u201312). A New Visual Interface for Searching and Navigating Slide-Based Lecture Videos. Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China.","DOI":"10.1109\/ICME.2019.00164"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhao, B., Lin, S., Luo, X., Xu, S., and Wang, R. (2017, January 23\u201327). A Novel System for Visual Navigation of Educational Videos Using Multimodal Cues. 
Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA.","DOI":"10.1145\/3123266.3123406"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1016\/j.cviu.2009.03.011","article-title":"Video Shot Boundary Detection: Seven Years of TRECVid Activity","volume":"114","author":"Smeaton","year":"2010","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"4185","DOI":"10.1109\/TIP.2015.2460013","article-title":"Consistent Video Saliency Using Local Gradient Flow Optimization and Global Refinement","volume":"24","author":"Wang","year":"2015","journal-title":"IEEE Trans. Image Process."},{"key":"ref_10","unstructured":"Otani, M., Nakashima, Y., Rahtu, E., Heikkil\u00e4, J., and Yokoya, N. (2016, January 20\u201324). Video Summarization Using Deep Semantic Features. Proceedings of the 13th Asian Conference on Computer Vision, Taipei, Taiwan."},{"key":"ref_11","unstructured":"Repp, S., and Meinel, C. (2006, January 13\u201317). Semantic Indexing for Recorded Educational Lecture Videos. Proceedings of the Fourth Annual IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOMW\u201906), Pisa, Italy."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Shiraiwa, T., and Nobuhara, H. (2021, January 12\u201315). Efficient Video Summarization Based on Semantic Segmentation Model. Proceedings of the 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), Kyoto, Japan.","DOI":"10.1109\/GCCE53005.2021.9622026"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Dhanushika, T., and Weerasinghe, T.A. (2024, January 4). Auto Identifying Key Information in Video Lectures to Generate a Navigation Structure. 
Proceedings of the 2024 International Research Conference on Smart Computing and Systems Engineering (SCSE), Colombo, Sri Lanka.","DOI":"10.1109\/SCSE61872.2024.10550668"},{"key":"ref_14","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of Oriented Gradients for Human Detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Milosevic, N. (2020). Convolutions and Convolutional Neural Networks. Introduction to Convolutional Neural Networks, Apress.","DOI":"10.1007\/978-1-4842-5648-0_12"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"De, P. (2023). Key Frame Extraction from Videos Based on SIFT and Structural Similarity. International Conference on Communication and Intelligent System, Springer Nature.","DOI":"10.1007\/978-981-99-2100-3_29"},{"key":"ref_17","unstructured":"Li, Y. (2004, January 17\u201321). Chitra Dorai SVM-Based Audio Classification for Instructional Video Analysis. Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Nandyal, S., and Kattimani, S.L. (2021, January 2\u20134). An Efficient Umpire Key Frame Segmentation in Cricket Video Using HOG and SVM. Proceedings of the 2021 6th International Conference for Convergence in Technology (I2CT), Pune, India.","DOI":"10.1109\/I2CT51068.2021.9418112"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"9294","DOI":"10.3934\/mbe.2021457","article-title":"Feature Fusion and Clustering for Key Frame Extraction","volume":"18","author":"Sun","year":"2021","journal-title":"Math. Biosci. Eng."},{"key":"ref_20","unstructured":"Hashemi, N.S., Aghdam, R.B., Ghiasi, A.S.B., and Fatemi, P. (2016). Template Matching Advances and Applications in Image Analysis. 
arXiv."},{"key":"ref_21","first-page":"1","article-title":"Involvement of Moocs in The Teachinglearning Process","volume":"24","author":"Cabrero","year":"2021","journal-title":"J. Entrep. Educ."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhang, X., Li, C., Li, S.-W., and Zue, V. (2016, January 25\u201328). Automated Segmentation of MOOC Lectures towards Customized Learning. Proceedings of the 2016 IEEE 16th International Conference on Advanced Learning Technologies (ICALT), Austin, TX, USA.","DOI":"10.1109\/ICALT.2016.25"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Baidya, E., and Goel, S. (2014, January 7\u20139). LectureKhoj: Automatic Tagging and Semantic Segmentation of Online Lecture Videos. Proceedings of the 2014 Seventh International Conference on Contemporary Computing (IC3), Noida, India.","DOI":"10.1109\/IC3.2014.6897144"},{"key":"ref_24","unstructured":"Shah, R.R., Yu, Y., Shaikh, A.D., Tang, S., and Zimmermann, R. (2017, January 3\u20137). ATLAS. Proceedings of the 22nd ACM international conference on Multimedia, Orlando, FL, USA."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Biswas, A., Gandhi, A., and Deshmukh, O. (2015, January 13\u201317). MMToC. Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, QLD, Australia.","DOI":"10.1145\/2733373.2806253"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Mahapatra, D., Mariappan, R., and Rajan, V. (2018, January 23\u201327). Automatic Hierarchical Table of Contents Generation for Educational Videos. Proceedings of the Companion of the Web Conference 2018\u2014WWW \u201918, Lyon, France.","DOI":"10.1145\/3184558.3186336"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Mahapatra, D., Mariappan, R., Rajan, V., Yadav, K.A., and Roy, S. (2018, January 23\u201327). VideoKen. 
Proceedings of the Companion of the Web Conference 2018\u2014WWW \u201918, Lyon, France.","DOI":"10.1145\/3184558.3186988"},{"key":"ref_28","unstructured":"van den Oord, A., Li, Y., and Vinyals, O. (2019). Representation Learning with Contrastive Predictive Coding. arXiv."},{"key":"ref_29","unstructured":"Soares, E.R., and Barr\u00e9re, E. (November, January 29). An Optimization Model for Temporal Video Lecture Segmentation Using Word2vec and Acoustic Features. Proceedings of the 25th Brazilian Symposium on Multimedia and the Web, Rio de Janeiro, Brazil."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Liu, T., and Choudary, C. (2006, January 8\u201311). Content Extraction and Summarization of Instructional Videos. Proceedings of the 2006 International Conference on Image Processing, Atlanta, GA, USA.","DOI":"10.1109\/ICIP.2006.312381"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"110745","DOI":"10.1016\/j.ress.2024.110745","article-title":"A Generalized Fault Diagnosis Framework for Rotating Machinery Based on Phase Entropy","volume":"256","author":"Wang","year":"2025","journal-title":"Reliab. Eng. Syst. Saf."},{"key":"ref_32","unstructured":"Younes, A., Schaub-Meyer, S., and Chalvatzaki, G. (2023, January 23\u201329). Entropy-Driven Unsupervised Keypoint Representation Learning in Videos. Proceedings of the 2023 International Conference on Machine Learning, Honolulu, HI, USA."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Su, R., Huang, W., Ma, H., Song, X., and Hu, J. (2021, January 19\u201322). SGE Net: Video Object Detection with Squeezed GRU and Information Entropy Map. Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA.","DOI":"10.1109\/ICIP42928.2021.9506081"},{"key":"ref_34","unstructured":"Zhang, X., Fu, D., and Liu, N. (2024). Shot Segmentation Based on Von Neumann Entropy for Key Frame Extraction. 
arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Masry, A., Long, D.X., Tan, J.Q., Joty, S., and Hoque, E. (2022). ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning. arXiv.","DOI":"10.18653\/v1\/2022.findings-acl.177"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Kembhavi, A., Salvato, M., Kolve, E., Seo, M., Hajishirzi, H., and Farhadi, A. (2016, January 8\u201316). A Diagram Is Worth a Dozen Images. Proceedings of the 2016 European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_15"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/s11263-020-01363-6","article-title":"Improving Image Description with Auxiliary Modality for Visual Localization in Challenging Conditions","volume":"129","author":"Piasco","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1109\/ICDAR.2007.4376991","article-title":"An Overview of the Tesseract OCR Engine","volume":"Volume 2","author":"Smith","year":"2007","journal-title":"Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zellers, R., Lu, J., Lu, X., Yu, Y., Zhao, Y., Salehi, M., Kusupati, A., Hessel, J., Farhadi, A., and Choi, Y. (2022, January 19\u201324). MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01589"},{"key":"ref_40","unstructured":"Li, Z., Gavrilyuk, K., Gavves, E., Jain, M., and Snoek, C.G. (2020, January 26\u201329). The How2 Dataset and Multimodal Baselines. 
Proceedings of the 2020 International Conference on Multimedia Retrieval, Dublin, Ireland."},{"key":"ref_41","unstructured":"Grauman, K., Westbury, A., Byrne, E., Chavis, Z., Furnari, A., Girdhar, R., Hamburger, J., Jiang, H., Liu, M., and Liu, X. (2022, January 19\u201324). Ego4d: Around the World in 3000 h of Egocentric Video. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA."},{"key":"ref_42","unstructured":"(2025, October 09). NPTEL Online Certification. Indian Institute of Technology & Indian Institute of Science. National Programme on Technology Enhanced Learning (NPTEL). Available online: https:\/\/nptel.ac.in\/."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Haurilet, M., Al-Halah, Z., and Stiefelhagen, R. (2019, January 7\u201311). SPaSe\u2013Multi-Label Page Segmentation for Presentation Slides. Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA.","DOI":"10.1109\/WACV.2019.00082"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Haurilet, M., Roitberg, A., Martinez, M., and Stiefelhagen, R. (2019, January 20\u201325). WiSe\u2014Slide Segmentation in the Wild. Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia.","DOI":"10.1109\/ICDAR.2019.00062"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Dutta, K., Mathew, M., Krishnan, P., and Jawahar, C.V. (2018, January 5\u20138). Localizing and Recognizing Text in Lecture Videos. Proceedings of the 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, USA.","DOI":"10.1109\/ICFHR-2018.2018.00049"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Wang, W., Song, Y., and Jha, S. (2022, January 16\u201319). Autolv: Automatic Lecture Video Generator. 
Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France.","DOI":"10.1109\/ICIP46576.2022.9897436"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Lee, D.W., Ahuja, C., Liang, P.P., Natu, S., and Morency, L.-P. (2022). Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides. arXiv.","DOI":"10.1109\/ICCV51070.2023.01838"},{"key":"ref_48","first-page":"6674","article-title":"What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning","volume":"33","author":"Li","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_49","unstructured":"Bulathwela, S., Perez-Ortiz, M., Yilmaz, E., and Shawe-Taylor, J. (2020). VLEngagement: A Dataset of Scientific Video Lectures for Evaluating Population-Based Engagement. arXiv."},{"key":"ref_50","unstructured":"Araujo, A.Y.M., and G.B. (2023, December 13). ClassX Dataset (Classx, Video Search, Query-by-Image, and Lecture Videos). Available online: https:\/\/Exhibits.Stanford.Edu\/Data\/Catalog\/Sf888mq5505."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Minderer, M., Gritsenko, A., Stone, A., Neumann, M., Weissenborn, D., Dosovitskiy, A., Mahendran, A., Arnab, A., Dehghani, M., and Shen, Z. (2022, January 23\u201327). Simple Open-Vocabulary Object Detection with Vision Transformers. Proceedings of the 2022 European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20080-9_42"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Heigold, G., Keysers, D., Minderer, M., Lu\u010di\u0107, M., Gritsenko, A., Yu, F., Bewley, A., and Kipf, T. (2023, January 1\u20136). Video OWL-ViT: Temporally-Consistent Open-World Localization in Video. 
Proceedings of the 2023 IEEE\/CVF International Conference on Computer Vision (ICCV), Paris, France.","DOI":"10.1109\/ICCV51070.2023.01269"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Li, L.H., Zhang, P., Zhang, H., Yang, J., Li, C., Zhong, Y., Wang, L., Yuan, L., Zhang, L., and Hwang, J.-N. (2022, January 18\u201324). Grounded Language-Image Pre-Training. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01069"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., and Lo, W.-Y. (2023, January 1\u20136). Segment Anything. Proceedings of the 2023 IEEE\/CVF International Conference on Computer Vision (ICCV), Paris, France.","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"ref_55","unstructured":"Ren, T., Jiang, Q., Liu, S., Zeng, Z., Liu, W., Gao, H., Huang, H., Ma, Z., Jiang, X., and Chen, Y. (2024). Grounding DINO 1.5: Advance the \u201cEdge\u201d of Open-Set Object Detection. arXiv."},{"key":"ref_56","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Ma, Z., Luo, G., Gao, J., Li, L., Chen, Y., Wang, S., Zhang, C., and Hu, W. (2022, January 18\u201324). Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01368"},{"key":"ref_58","unstructured":"Calzolari, N., B\u00e9chet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., and Mariani, J. (2020, January 11\u201316). 
TableBank: Table Benchmark for Image-Based Table Detection and Recognition. Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_59","unstructured":"Haloi, M., Shekhar, S., Fande, N., Dash, S.S., and G, S. (2023). Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method. arXiv."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1080\/07317131.2012.705751","article-title":"Duckduckgo http:\/\/www.duckduckgo.com or http:\/\/www.ddg.gg","volume":"29","author":"Hands","year":"2012","journal-title":"Tech. Serv. Q."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Araujo, A., Chaves, J., Lakshman, H., Angst, R., and Girod, B. (2016). Large-Scale Query-by-Image Video Retrieval Using Bloom Filters. arXiv.","DOI":"10.1109\/ICIP.2015.7351054"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Paliwal, S.S., D, V., Rahul, R., Sharma, M., and Vig, L. (2019, January 20\u201325). TableNet: Deep Learning Model for End-to-End Table Detection and Tabular Data Extraction from Scanned Document Images. Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia.","DOI":"10.1109\/ICDAR.2019.00029"},{"key":"ref_63","unstructured":"(2025, October 09). Roboflow: End-to-End Computer Vision Platform. Available online: https:\/\/universe.roboflow.com\/."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Lei, J., Yu, L., Bansal, M., and Berg, T.L. (November, January 31). TVQA: Localized, Compositional Video Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-1167"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Miech, A., Zhukov, D., Alayrac, J.-B., Tapaswi, M., Laptev, I., and Sivic, J. (November, January 29). 
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00272"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Tang, Y., Ding, D., Rao, Y., Zheng, Y., Zhang, D., Zhao, L., Lu, J., and Zhou, J. (2019, January 15\u201320). COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00130"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"104469","DOI":"10.1109\/ACCESS.2021.3099427","article-title":"FCN-LectureNet: Extractive Summarization of Whiteboard and Chalkboard Lecture Videos","volume":"9","author":"Davila","year":"2021","journal-title":"IEEE Access"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Sharma, V., Gupta, M., Kumar, A., and Mishra, D. (2021). EduNet: A New Video Dataset for Understanding Human Activity in the Classroom Environment. Sensors, 21.","DOI":"10.3390\/s21175699"},{"key":"ref_69","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2018, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.patrec.2010.08.004","article-title":"VSUMM: A Mechanism Designed to Produce Static Video Summaries and a Novel Evaluation Method","volume":"32","author":"Lopes","year":"2011","journal-title":"Pattern Recognit. Lett."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. 
Pattern Anal. Mach. Intell."},{"key":"ref_72","unstructured":"Biswas, D., Shah, S., and Subhlok, J. (2025). Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos. arXiv."},{"key":"ref_73","unstructured":"Xue, H., Sun, Y., Liu, B., Fu, J., Song, R., Li, H., and Luo, J. (2023). CLIP-ViP: Adapting Pre-Trained Image-Text Model to Video-Language Representation Alignment. arXiv."},{"key":"ref_74","unstructured":"Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., and Altman, S. (2024). GPT-4 Technical Report. arXiv."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Xu, H., Ghosh, G., Huang, P.-Y., Okhonko, D., Aghajanyan, A., Metze, F., Zettlemoyer, L., and Feichtenhofer, C. (2021). VideoCLIP: Contrastive Pre-Training for Zero-Shot Video-Text Understanding. arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.544"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Adcock, J., Cooper, M., Denoue, L., Pirsiavash, H., and Rowe, L.A. (2010, January 25\u201329). TalkMiner. Proceedings of the 18th ACM International Conference on Multimedia, Florence, Italy.","DOI":"10.1145\/1873951.1873986"},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"1586","DOI":"10.17485\/IJST\/v17i15.456","article-title":"A Framework for Video Summarization Using Visual Attention Technique","volume":"17","author":"Dhanushree","year":"2024","journal-title":"Indian J. Sci. Technol."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Biswas, D., Shah, S., and Subhlok, J. (2025). Visual Content Detection in Educational Videos with Transfer Learning and Dataset Enrichment. arXiv.","DOI":"10.1109\/MIPR67560.2025.00090"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Psallidas, T., and Spyrou, E. (2023). Video Summarization Based on Feature Fusion and Data Augmentation. 
Computers, 12.","DOI":"10.3390\/computers12090186"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/28\/1\/3\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T05:17:44Z","timestamp":1766553464000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/28\/1\/3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,19]]},"references-count":79,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,1]]}},"alternative-id":["e28010003"],"URL":"https:\/\/doi.org\/10.3390\/e28010003","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2025,12,19]]}}}