{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T03:38:30Z","timestamp":1775533110456,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,17]]},"DOI":"10.1145\/3474085.3481545","type":"proceedings-article","created":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T08:03:42Z","timestamp":1634544222000},"page":"1185-1191","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Personalized Multi-modal Video Retrieval on Mobile Devices"],"prefix":"10.1145","author":[{"given":"Haotian","family":"Zhang","sequence":"first","affiliation":[{"name":"Samsung AI Center - Toronto, Toronto, ON, Canada"}]},{"given":"Allan D.","family":"Jepson","sequence":"additional","affiliation":[{"name":"Samsung AI Center - Toronto, Toronto, ON, Canada"}]},{"given":"Iqbal","family":"Mohomed","sequence":"additional","affiliation":[{"name":"Samsung AI Center - Toronto, Toronto, ON, Canada"}]},{"given":"Konstantinos G.","family":"Derpanis","sequence":"additional","affiliation":[{"name":"Samsung AI Center - Toronto, Toronto, ON, Canada"}]},{"given":"Ran","family":"Zhang","sequence":"additional","affiliation":[{"name":"Samsung AI Center - Toronto, Toronto, ON, Canada"}]},{"given":"Afsaneh","family":"Fazly","sequence":"additional","affiliation":[{"name":"Samsung AI Center - Toronto, Toronto, ON, 
Canada"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations. San Diego, CA."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2798607"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401192"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01065"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2507880"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics.","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. 
In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics."},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the British Machine Vision Conference.","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE++: Improving visual-semantic embeddings with hard negatives. In Proceedings of the British Machine Vision Conference."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58548-8_13"},{"key":"e_1_3_2_1_11_1","volume-title":"Proceedings of the International Conference on Computer Vision. IEEE, 5267--5275","author":"Gao J.","unstructured":"J. Gao, C. Sun, Z. Yang, and R. Nevatia. 2017. TALL: Temporal activity localization via language query. In Proceedings of the International Conference on Computer Vision. IEEE, 5267--5275."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578726.2578746"},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the International Conference on Computer Vision. 1247--1257","author":"Hendricks L. A.","unstructured":"L. A. Hendricks, O. Wang, E. Shechtman, J. Sivic, T. Darrell, and B. Russell. 2017. Localizing moments in video with natural language. In Proceedings of the International Conference on Computer Vision. 
1247--1257."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.767"},{"key":"e_1_3_2_1_15_1","volume-title":"Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers. arXiv:2004.00849","author":"Huang Zhicheng","year":"2020","unstructured":"Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, and Jianlong Fu. 2020. Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers. arXiv:2004.00849 (2020)."},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings of the International Conference on Computer Vision. IEEE, 706--715","author":"Krishna R.","unstructured":"R. Krishna, K. Hata, F. Ren, L. Fei-Fei, and J. C. Niebles. 2017a. Dense-captioning events in videos. In Proceedings of the International Conference on Computer Vision. IEEE, 706--715."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_13"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58589-1_27"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_21_1","volume-title":"Proceedings of the British Machine Vision Conference.","author":"Liu Yang","year":"2019","unstructured":"Yang Liu, Samuel Albanie, Arsha Nagrani, and Andrew Zisserman. 
2019. Use what you have: Video retrieval using representations from collaborative experts. In Proceedings of the British Machine Vision Conference."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3454289"},{"key":"e_1_3_2_1_23_1","volume-title":"First Workshop on Privacy in Natural Language Processing at the Conference on Empirical Methods in Natural Language Processing.","author":"Melnick Levi","year":"2020","unstructured":"Levi Melnick, Hussein Elmessilhy, Vassilis Polychronopoulos, Gilsinia Lopez, Yuancheng Tu, Omar Zia Khan, Ye-Yi Wang, and Chris Quirk. 2020. Privacy-aware personalized entity representations for improved user understanding. In First Workshop on Privacy in Natural Language Processing at the Conference on Empirical Methods in Natural Language Processing."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58595-2_41"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.232"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1514"},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings of the International Conference on Machine Learning. 6105--6114","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc V Le. 2019. 
EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning. 6105--6114."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01095"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019062"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401446"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2016.2603342"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2723009"}],"event":{"name":"MM '21: ACM Multimedia Conference","location":"Virtual Event China","acronym":"MM '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 29th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3481545","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474085.3481545","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:35Z","timestamp":1750191455000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3481545"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":34,"alternative-id":["10.1145\/3474085.3481545","10.1145\/3474085"],"URL":"https:\/\/doi.org\/10.1145\/3474085.3481545","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}