{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,20]],"date-time":"2026-07-20T12:58:28Z","timestamp":1784552308374,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":76,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,24]],"date-time":"2021-08-24T00:00:00Z","timestamp":1629763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"European Union's Horizon 2020 research and innovation programme under the Marie Sk\u0142odowska-Curie grant No. 765140"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,24]]},"DOI":"10.1145\/3460426.3463662","type":"proceedings-article","created":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:50:29Z","timestamp":1630536629000},"page":"580-589","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":33,"title":["GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization"],"prefix":"10.1145","author":[{"given":"Jia-Hong","family":"Huang","sequence":"first","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Luka","family":"Murn","sequence":"additional","affiliation":[{"name":"BBC Research and Development & Dublin City University, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marta","family":"Mrak","sequence":"additional","affiliation":[{"name":"BBC Research and Development, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marcel","family":"Worring","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,9]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-37731-1_40"},{"key":"e_1_3_2_1_3_1","volume-title":"Video Summarization Using Deep Neural Networks: A Survey. arXiv preprint arXiv:2101.06072","author":"Apostolidis Evlampios","year":"2021","unstructured":"Evlampios Apostolidis, Eleni Adamantidou, Alexandros I Metsai, Vasileios Mezaris, and Ioannis Patras. 2021. Video Summarization Using Deep Neural Networks: A Survey. arXiv preprint arXiv:2101.06072 (2021)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3347449.3357482"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_12"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2007.897466"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3338533.3366583"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298981"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2010.08.004"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_11_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_1_12_1","volume-title":"How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings. arXiv preprint arXiv:1909.00512","author":"Ethayarajh Kawin","year":"2019","unstructured":"Kawin Ethayarajh. 2019. How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings. arXiv preprint arXiv:1909.00512 (2019)."},{"key":"e_1_3_2_1_13_1","unstructured":"Boqing Gong, Wei-Lun Chao, Kristen Grauman, and Fei Sha. 2014. Diverse sequential subset selection for supervised video summarization. In Advances in Neural Information Processing Systems. 2069--2077."},{"key":"e_1_3_2_1_14_1","volume-title":"Creating summaries from user videos","author":"Gygli Michael","unstructured":"Michael Gygli, Helmut Grabner, Hayko Riemenschneider, and Luc Van Gool. 2014. Creating summaries from user videos. In ECCV. Springer, 505--520."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2012.2192917"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_5"},{"key":"e_1_3_2_1_18_1","volume-title":"Long short-term memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1197\/jamia.M1733"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00517"},{"key":"e_1_3_2_1_21_1","volume-title":"Robustness Analysis of Visual Question Answering Models by Basic Questions","author":"Huang Jia-Hong","year":"2017","unstructured":"Jia-Hong Huang. 2017. Robustness Analysis of Visual Question Answering Models by Basic Questions. King Abdullah University of Science and Technology MS thesis (2017)."},{"key":"e_1_3_2_1_22_1","volume-title":"CVPR VQA Challenge Workshop","author":"Huang Jia-Hong","year":"2017","unstructured":"Jia-Hong Huang, Modar Alfadly, and Bernard Ghanem. 2017. Vqabq: Visual question answering by basic questions. CVPR VQA Challenge Workshop (2017)."},{"key":"e_1_3_2_1_23_1","volume-title":"Assessing the robustness of visual question answering. arXiv preprint arXiv:1912.01452","author":"Huang Jia-Hong","year":"2019","unstructured":"Jia-Hong Huang, Modar Alfadly, Bernard Ghanem, and Marcel Worring. 2019a. Assessing the robustness of visual question answering. arXiv preprint arXiv:1912.01452 (2019)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018449"},{"key":"e_1_3_2_1_25_1","volume-title":"CVPR VQA Challenge and Visual Dialog Workshop","author":"Huang Jia-Hong","year":"2018","unstructured":"Jia-Hong Huang, Cuong Duc Dao, Modar Alfadly, C Huck Yang, and Bernard Ghanem. 2018. Robustness analysis of visual qa models by basic questions. CVPR VQA Challenge and Visual Dialog Workshop (2018)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372278.3390695"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460426.3463667"},{"key":"e_1_3_2_1_28_1","volume-title":"Deep Context-Encoding Network for Retinal Image Captioning. IEEE International Conference on Image Processing (ICIP).","author":"Huang Jia-Hong","year":"2021","unstructured":"Jia-Hong Huang, Ting-Wei Wu, Chao-Han Huck Yang, and Marcel Worring. 2021b. Deep Context-Encoding Network for Retinal Image Captioning. IEEE International Conference on Image Processing (ICIP)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Jia-Hong Huang, Ting-Wei Wu, Chao-Han Huck Yang, and Marcel Worring. 2021c. Longer Version for \u201cDeep Context-Encoding Network for Retinal Image Captioning\u201d. arXiv preprint arXiv:2105.14538.","DOI":"10.1109\/ICIP42928.2021.9506803"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00249"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.04.132"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2904996"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018537"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.284"},{"key":"e_1_3_2_1_35_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_36_1","volume-title":"Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114","author":"Kingma Diederik P","year":"2013","unstructured":"Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1561\/9781601986290"},{"key":"e_1_3_2_1_38_1","volume-title":"2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1346--1353","author":"Lee Yong Jae","year":"2012","unstructured":"Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman. 2012. Discovering important people and objects for egocentric video summarization. In 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1346--1353."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2860797"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-1618"},{"key":"e_1_3_2_1_41_1","volume-title":"Neural word embedding as implicit matrix factorization. Advances in neural information processing systems","author":"Levy Omer","year":"2014","unstructured":"Omer Levy and Yoav Goldberg. 2014b. Neural word embedding as implicit matrix factorization. Advances in neural information processing systems, Vol. 27 (2014), 2177--2185."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2017.8297032"},{"key":"e_1_3_2_1_43_1","volume-title":"Linguistic knowledge and transferability of contextual representations. arXiv preprint arXiv:1903.08855","author":"Liu Nelson F","year":"2019","unstructured":"Nelson F Liu, Matt Gardner, Yonatan Belinkov, Matthew E Peters, and Noah A Smith. 2019. Linguistic knowledge and transferability of contextual representations. arXiv preprint arXiv:1903.08855 (2019)."},{"key":"e_1_3_2_1_44_1","volume-title":"Asian Conference on Computer Vision. Springer, 235--250","author":"Liu Yi-Chieh","year":"2018","unstructured":"Yi-Chieh Liu, Hao-Hsiang Yang, C-H Huck Yang, Jia-Hong Huang, Meng Tian, Hiromasa Morikawa, Yi-Chang James Tsai, and Jesper Tegner. 2018. Synthesizing new retinal symptom images by multiple generative models. In Asian Conference on Computer Vision. Springer, 235--250."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.318"},{"key":"e_1_3_2_1_46_1","volume-title":"Distributed representations of words and phrases and their compositionality. arXiv preprint arXiv:1310.4546","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. arXiv preprint arXiv:1310.4546 (2013)."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/946247.946650"},{"key":"e_1_3_2_1_48_1","volume-title":"Asian Conference on Computer Vision. Springer, 361--377","author":"Otani Mayu","year":"2016","unstructured":"Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkil\u00e4, and Naokazu Yokoya. 2016. Video summarization using deep semantic features. In Asian Conference on Computer Vision. Springer, 361--377."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.395"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.455"},{"key":"e_1_3_2_1_51_1","volume-title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","author":"Paszke Adam","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024--8035. http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_1_53_1","volume-title":"Deep contextualized word representations. arXiv preprint arXiv:1802.05365","author":"Peters Matthew E","year":"2018","unstructured":"Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)."},{"key":"e_1_3_2_1_54_1","volume-title":"Language models are unsupervised multitask learners. OpenAI blog","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog, Vol. 1, 8 (2019), 9."},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00809"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3347318.3355524"},{"key":"e_1_3_2_1_57_1","unstructured":"Sam Scott and Stan Matwin. 1998. Text classification using WordNet hypernyms. In Usage of WordNet in Natural Language Processing Systems."},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMEW.2016.7574720"},{"key":"e_1_3_2_1_59_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 5179--5187","author":"Song Yale","year":"2015","unstructured":"Yale Song, Jordi Vallmitjana, Amanda Stent, and Alejandro Jaimes. 2015. Tvsum: Summarizing web videos using titles. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5179--5187."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.9790\/0661-16153438"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123297"},{"key":"e_1_3_2_1_62_1","volume-title":"Attention is all you need. arXiv preprint arXiv:1706.03762","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2007.4284941"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11297"},{"key":"e_1_3_2_1_65_1","volume-title":"ICML Workshop on Computational Biology","author":"Huck Yang C-H","year":"2018","unstructured":"C-H Huck Yang, Jia-Hong Huang, Fangyu Liu, Fang-Yi Chiu, Mengya Gao, Weifeng Lyu, Jesper Tegner, et al. 2018a. A novel hybrid machine learning model for auto-classification of retinal diseases. ICML Workshop on Computational Biology (2018)."},{"key":"e_1_3_2_1_66_1","volume-title":"Asian Conference on Computer Vision. Springer, 323--338","author":"Huck Yang C-H","year":"2018","unstructured":"C-H Huck Yang, Fangyu Liu, Jia-Hong Huang, Meng Tian, MD I-Hung Lin, Yi Chieh Liu, Hiromasa Morikawa, Hao-Hsiang Yang, and Jesper Tegner. 2018b. Auto-classification of retinal diseases in the limit of sparse data using a two-streams machine learning model. In Asian Conference on Computer Vision. Springer, 323--338."},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019143"},{"key":"e_1_3_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2017.2771247"},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.120"},{"key":"e_1_3_2_1_70_1","volume-title":"Video summarization with long short-term memory","author":"Zhang Ke","unstructured":"Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman. 2016b. Video summarization with long short-term memory. In ECCV. Springer, 766--782."},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3321408.3322622"},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123328"},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00773"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.322"},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12255"},{"key":"e_1_3_2_1_76_1","volume-title":"Video summarisation by classification with deep reinforcement learning. arXiv preprint arXiv","author":"Zhou Kaiyang","year":"2018","unstructured":"Kaiyang Zhou, Tao Xiang, and Andrea Cavallaro. 2018b. Video summarisation by classification with deep reinforcement learning. arXiv preprint arXiv (2018)."}],"event":{"name":"ICMR '21: International Conference on Multimedia Retrieval","location":"Taipei Taiwan","acronym":"ICMR '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2021 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463662","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460426.3463662","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:04Z","timestamp":1750191424000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463662"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,24]]},"references-count":76,"alternative-id":["10.1145\/3460426.3463662","10.1145\/3460426"],"URL":"https:\/\/doi.org\/10.1145\/3460426.3463662","relation":{},"subject":[],"published":{"date-parts":[[2021,8,24]]},"assertion":[{"value":"2021-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}