{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,12]],"date-time":"2025-09-12T17:45:42Z","timestamp":1757699142371,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,8,20]],"date-time":"2020-08-20T00:00:00Z","timestamp":1597881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,8,23]]},"DOI":"10.1145\/3394486.3403297","type":"proceedings-article","created":{"date-parts":[[2020,8,20]],"date-time":"2020-08-20T23:17:27Z","timestamp":1597965447000},"page":"2474-2482","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["Combo-Attention Network for Baidu Video Advertising"],"prefix":"10.1145","author":[{"given":"Tan","family":"Yu","sequence":"first","affiliation":[{"name":"Baidu Research, Seattle, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi","family":"Yang","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi","family":"Li","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaodong","family":"Chen","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mingming","family":"Sun","sequence":"additional","affiliation":[{"name":"Baidu Research, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ping","family":"Li","sequence":"additional","affiliation":[{"name":"Baidu Research, Seattle, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,8,20]]},"reference":[{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR)","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau , Kyunghyun Cho , and Yoshua Bengio . 2015 . Neural Machine Translation by Jointly Learning to Align and Translate . In Proceedings of the 3rd International Conference on Learning Representations (ICLR) . San Diego, CA. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR). San Diego, CA."},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR)","author":"Bradbury James","year":"2017","unstructured":"James Bradbury , Stephen Merity , Caiming Xiong , and Richard Socher . 2017 . QuasiRecurrent Neural Networks . In Proceedings of the 5th International Conference on Learning Representations (ICLR) . Toulon, France. James Bradbury, Stephen Merity, Caiming Xiong, and Richard Socher. 2017. QuasiRecurrent Neural Networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR). Toulon, France."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/792550.792552"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_1_6_1","volume-title":"Courville","author":"de Vries Harm","year":"2017","unstructured":"Harm de Vries , Florian Strub , J\u00e9r\u00e9mie Mary , Hugo Larochelle , Olivier Pietquin , and Aaron C . Courville . 2017 . Modulating early visual processing by language. In Advances in Neural Information Processing Systems (NIPS). Long Beach, CA , 6594--6604. Harm de Vries, Florian Strub, J\u00e9r\u00e9mie Mary, Hugo Larochelle, Olivier Pietquin, and Aaron C. Courville. 2017. Modulating early visual processing by language. In Advances in Neural Information Processing Systems (NIPS). Long Beach, CA, 6594--6604."},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) . Minneapolis, MN, 4171--4186. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). Minneapolis, MN, 4171--4186."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2599174"},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the 2018 British Machine Vision Conference 2018 (BMVC)","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri , David J. Fleet , Jamie Ryan Kiros , and Sanja Fidler . 2018 . VSE++: Improving Visual-Semantic Embeddings with Hard Negatives . In Proceedings of the 2018 British Machine Vision Conference 2018 (BMVC) . Newcastle, UK, 12. Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives. In Proceedings of the 2018 British Machine Vision Conference 2018 (BMVC). Newcastle, UK, 12."},{"volume-title":"Sponsored Search: A Brief History. In SSA Workshop","author":"Daniel","key":"e_1_3_2_1_10_1","unstructured":"Daniel C. Fain and Jan O. Pedersen. 2006 . Sponsored Search: A Brief History. In SSA Workshop . Ann Arbor, Michigan. Daniel C. Fain and Jan O. Pedersen. 2006. Sponsored Search: A Brief History. In SSA Workshop. Ann Arbor, Michigan."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330651"},{"volume-title":"Proceedings of the 11th European Conference on Computer Vision (ECCV)","author":"Farhadi Ali","key":"e_1_3_2_1_12_1","unstructured":"Ali Farhadi , Seyyed Mohammad Mohsen Hejrati , Mohammad Amin Sadeghi , Peter Young , Cyrus Rashtchian , Julia Hockenmaier , and David A. Forsyth . 2010. Every Picture Tells a Story: Generating Sentences from Images. In Computer Vision - ECCV 2010 , Proceedings of the 11th European Conference on Computer Vision (ECCV) . Heraklion, Greece, 15--29. Ali Farhadi, Seyyed Mohammad Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David A. Forsyth. 2010. Every Picture Tells a Story: Generating Sentences from Images. In Computer Vision - ECCV 2010, Proceedings of the 11th European Conference on Computer Vision (ECCV). Heraklion, Greece, 15--29."},{"volume-title":"Advances in Neural Information Processing Systems (NIPS).","author":"Frome Andrea","key":"e_1_3_2_1_13_1","unstructured":"Andrea Frome , Gregory S. Corrado , Jonathon Shlens , Samy Bengio , Jeffrey Dean , Marc'Aurelio Ranzato , and Tomas Mikolov . 2013. DeViSE: A Deep VisualSemantic Embedding Model . In Advances in Neural Information Processing Systems (NIPS). Lake Tahoe, NV , 2121--2129. Andrea Frome, Gregory S. Corrado, Jonathon Shlens, Samy Bengio, Jeffrey Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. 2013. DeViSE: A Deep VisualSemantic Embedding Model. In Advances in Neural Information Processing Systems (NIPS). Lake Tahoe, NV, 2121--2129."},{"volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML)","author":"Gehring Jonas","key":"e_1_3_2_1_14_1","unstructured":"Jonas Gehring , Michael Auli , David Grangier , Denis Yarats , and Yann N. Dauphin . 2017. Convolutional Sequence to Sequence Learning . In Proceedings of the 34th International Conference on Machine Learning (ICML) . Sydney, Australia, 1243-- 1252. Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional Sequence to Sequence Learning. In Proceedings of the 34th International Conference on Machine Learning (ICML). Sydney, Australia, 1243-- 1252."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_13"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401238"},{"volume-title":"Advances in Neural Information Processing Systems (NeurIPS).","author":"Lu Jiasen","key":"e_1_3_2_1_20_1","unstructured":"Jiasen Lu , Dhruv Batra , Devi Parikh , and Stefan Lee . 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks . In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada , 13--23. Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada, 13--23."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.301"},{"volume-title":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (ICMR)","author":"Mithun Niluthpol Chowdhury","key":"e_1_3_2_1_22_1","unstructured":"Niluthpol Chowdhury Mithun , Juncheng Li , Florian Metze , and Amit K . RoyChowdhury. 2018. Learning Joint Embedding with Multimodal Cues for CrossModal Video-Text Retrieval . In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (ICMR) . Yokohama, Japan, 19--27. Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, and Amit K. RoyChowdhury. 2018. Learning Joint Embedding with Multimodal Cues for CrossModal Video-Text Retrieval. In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (ICMR). Yokohama, Japan, 19--27."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46604-0_46"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.497"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_3_2_1_26_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-Stream Convolutional Networks for Action Recognition in Videos. In Advances in Neural Information Processing Systems (NIPS). Montreal Canada 568--576.  Karen Simonyan and Andrew Zisserman. 2014. Two-Stream Convolutional Networks for Action Recognition in Videos. In Advances in Neural Information Processing Systems (NIPS). Montreal Canada 568--576."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00756"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"volume-title":"Advances in Neural Information Processing Systems (NIPS).","author":"Vaswani Ashish","key":"e_1_3_2_1_29_1","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017. Attention is All you Need . In Advances in Neural Information Processing Systems (NIPS). Long Beach, CA , 5998--6008. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems (NIPS). Long Beach, CA, 5998--6008."},{"key":"e_1_3_2_1_30_1","volume-title":"Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Wang Xiaolong","year":"2018","unstructured":"Xiaolong Wang , Ross B. Girshick , Abhinav Gupta , and Kaiming He . 2018 . NonLocal Neural Networks . In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) . Salt Lake City, UT, 7794--7803. Xiaolong Wang, Ross B. Girshick, Abhinav Gupta, and Kaiming He. 2018. NonLocal Neural Networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT, 7794--7803."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01228-1_25"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00468"},{"volume-title":"Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI)","author":"Xu Ran","key":"e_1_3_2_1_33_1","unstructured":"Ran Xu , Caiming Xiong , Wei Chen , and Jason J. Corso . 2015. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework . In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI) . Austin, TX, 2346--2352. Ran Xu, Caiming Xiong, Wei Chen, and Jason J. Corso. 2015. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI). Austin, TX, 2346--2352."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.347"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00644"},{"key":"e_1_3_2_1_36_1","volume-title":"SONG: Approximate Nearest Neighbor Search on GPU. In 35th IEEE International Conference on Data Engineering (ICDE)","author":"Zhao Weijie","year":"2020","unstructured":"Weijie Zhao , Shulong Tan , and Ping Li . 2020 . SONG: Approximate Nearest Neighbor Search on GPU. In 35th IEEE International Conference on Data Engineering (ICDE) . Dallas, TX. Weijie Zhao, Shulong Tan, and Ping Li. 2020. SONG: Approximate Nearest Neighbor Search on GPU. In 35th IEEE International Conference on Data Engineering (ICDE). Dallas, TX."},{"key":"e_1_3_2_1_37_1","volume-title":"Proceedings of the 3rd Conference on Third Conference on Machine Learning and Systems (MLSys)","author":"Zhao Weijie","year":"2020","unstructured":"Weijie Zhao , Deping Xie , Ronglai Jia , Yulei Qian , Ruiquan Ding , Mingming Sun , and Ping Li . 2020 . Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems . In Proceedings of the 3rd Conference on Third Conference on Machine Learning and Systems (MLSys) . Huston, TX. Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, and Ping Li. 2020. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. In Proceedings of the 3rd Conference on Third Conference on Machine Learning and Systems (MLSys). Huston, TX."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3358045"},{"volume-title":"Advances in Neural Information Processing Systems (NeurIPS).","author":"Zhou Zhixin","key":"e_1_3_2_1_39_1","unstructured":"Zhixin Zhou , Shulong Tan , Zhaozhuo Xu , and Ping Li. 2019. M\u00f6bius Transformation for Fast Inner Product Search on Graph . In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada , 8216--8227. Zhixin Zhou, Shulong Tan, Zhaozhuo Xu, and Ping Li. 2019. M\u00f6bius Transformation for Fast Inner Product Search on Graph. In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada, 8216--8227."}],"event":{"name":"KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"],"location":"Virtual Event CA USA","acronym":"KDD '20"},"container-title":["Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394486.3403297","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394486.3403297","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:48Z","timestamp":1750197708000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394486.3403297"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,20]]},"references-count":37,"alternative-id":["10.1145\/3394486.3403297","10.1145\/3394486"],"URL":"https:\/\/doi.org\/10.1145\/3394486.3403297","relation":{},"subject":[],"published":{"date-parts":[[2020,8,20]]},"assertion":[{"value":"2020-08-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}