{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,16]],"date-time":"2026-07-16T05:16:17Z","timestamp":1784178977676,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,3,8]],"date-time":"2021-03-08T00:00:00Z","timestamp":1615161600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,3,8]]},"DOI":"10.1145\/3437963.3441822","type":"proceedings-article","created":{"date-parts":[[2021,3,6]],"date-time":"2021-03-06T04:36:17Z","timestamp":1615005377000},"page":"58-66","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Modeling Across-Context Attention For Long-Tail Query Classification in E-commerce"],"prefix":"10.1145","author":[{"given":"Junhao","family":"Zhang","sequence":"first","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Weidi","family":"Xu","sequence":"additional","affiliation":[{"name":"Ant Financial Services Group, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jianhui","family":"Ji","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xi","family":"Chen","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hongbo","family":"Deng","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Keping","family":"Yang","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,3,8]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1080\/00031305.1992.10475879","article-title":"An introduction to kernel and nearest-neighbor non-parametric regression","volume":"46","author":"Altman Naomi S","year":"1992","unstructured":"Naomi S Altman . 1992 . An introduction to kernel and nearest-neighbor non-parametric regression . The American Statistician 46 , 3 (1992), 175 -- 185 . Naomi S Altman. 1992. An introduction to kernel and nearest-neighbor non-parametric regression. The American Statistician 46, 3 (1992), 175--185.","journal-title":"The American Statistician"},{"key":"e_1_3_2_2_2_1","volume-title":"3rd International Conference on Learning Representations, ICLR","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau , Kyunghyun Cho , and Yoshua Bengio . 2015 . Neural machine translation by jointly learning to align and translate . In 3rd International Conference on Learning Representations, ICLR 2015. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2015.14"},{"key":"e_1_3_2_2_4_1","volume-title":"Learning multi-label scene classification. Pattern recognition 37, 9","author":"Boutell Matthew R","year":"2004","unstructured":"Matthew R Boutell , Jiebo Luo , Xipeng Shen , and Christopher M Brown . 2004. Learning multi-label scene classification. Pattern recognition 37, 9 ( 2004 ), 1757--1771. Matthew R Boutell, Jiebo Luo, Xipeng Shen, and Christopher M Brown. 2004. Learning multi-label scene classification. Pattern recognition 37, 9 (2004), 1757--1771."},{"key":"e_1_3_2_2_5_1","volume-title":"Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil.","author":"Cer Daniel Matthew","year":"2018","unstructured":"Daniel Matthew Cer , Yinfei Yang , Sheng yi Kong , Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018 . Universal Sentence Encoder. ArXivabs\/ 1803.11175 (2018). Daniel Matthew Cer, Yinfei Yang, Sheng yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. Universal Sentence Encoder. ArXivabs\/1803.11175 (2018)."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403368"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2017.7966144"},{"key":"e_1_3_2_2_8_1","volume-title":"Proceedings of the 27th international conference on machine learning (ICML-10)","author":"Cheng Weiwei","year":"2010","unstructured":"Weiwei Cheng , Eyke H\u00fcllermeier , and Krzysztof J Dembczynski . 2010 . Bayes optimal multilabel classification via probabilistic classifier chains . In Proceedings of the 27th international conference on machine learning (ICML-10) . 279--286. Weiwei Cheng, Eyke H\u00fcllermeier, and Krzysztof J Dembczynski. 2010. Bayes optimal multilabel classification via probabilistic classifier chains. In Proceedings of the 27th international conference on machine learning (ICML-10). 279--286."},{"key":"e_1_3_2_2_9_1","volume-title":"Jul","author":"Duchi John","year":"2011","unstructured":"John Duchi , Elad Hazan , and Yoram Singer . 2011. Adaptive subgradient methods for online learning and stochastic optimization.Journal of machine learning research 12 , Jul ( 2011 ), 2121--2159. John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization.Journal of machine learning research 12, Jul (2011), 2121--2159."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"crossref","unstructured":"Andr\u00e9 Elisseeff and Jason Weston. 2002. A kernel method for multi-labelled classification. In Advances in neural information processing systems. 681--687. Andr\u00e9 Elisseeff and Jason Weston. 2002. A kernel method for multi-labelled classification. In Advances in neural information processing systems. 681--687.","DOI":"10.7551\/mitpress\/1120.003.0092"},{"key":"e_1_3_2_2_11_1","volume-title":"Long short-term memory. Neural computation 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber . 1997. Long short-term memory. Neural computation 9, 8 ( 1997 ), 1735--1780. Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_2_12_1","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Iyyer Mohit","unstructured":"Mohit Iyyer , Varun Manjunatha , Jordan Boyd-Graber , and Hal Daum\u00e9 III. 2015. Deep Unordered Composition Rivals Syntactic Methods for Text Classification . In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) . Association for Computational Linguistics , Beijing, China , 1681--1691. https:\/\/doi.org\/10.3115\/v1\/P15--1162 Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daum\u00e9 III. 2015. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, 1681--1691. https:\/\/doi.org\/10.3115\/v1\/P15--1162"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"crossref","unstructured":"Karen Sparck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of documentation(1972). Karen Sparck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of documentation(1972).","DOI":"10.1108\/eb026526"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Armand Joulin Edouard Grave Piotr Bojanowski and Tomas Mikolov. 2017. Bag of Tricks for Efficient Text Classification. In EACL. Armand Joulin Edouard Grave Piotr Bojanowski and Tomas Mikolov. 2017. Bag of Tricks for Efficient Text Classification. In EACL.","DOI":"10.18653\/v1\/E17-2068"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_2_2_16_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980(2014).","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980(2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980(2014)."},{"key":"e_1_3_2_2_17_1","volume-title":"1995. Convolutional networks for images, speech, and time series.The handbook of brain theory and neural networks 3361, 10","author":"LeCun Yann","year":"1995","unstructured":"Yann LeCun , Yoshua Bengio , 1995. Convolutional networks for images, speech, and time series.The handbook of brain theory and neural networks 3361, 10 ( 1995 ), 1995. Yann LeCun, Yoshua Bengio, et al.1995. Convolutional networks for images, speech, and time series.The handbook of brain theory and neural networks 3361, 10 (1995), 1995."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1099"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1485"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2018.8622008"},{"key":"e_1_3_2_2_21_1","volume-title":"Mo Yu, Bing Xiang,Bowen Zhou, and Yoshua Bengio.","author":"Lin Zhouhan","year":"2017","unstructured":"Zhouhan Lin , Minwei Feng , Cicero Nogueira dos Santos , Mo Yu, Bing Xiang,Bowen Zhou, and Yoshua Bengio. 2017 . A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130(2017). Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang,Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130(2017)."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080834"},{"key":"e_1_3_2_2_23_1","volume-title":"Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. 2873--2879","author":"Liu Pengfei","year":"2016","unstructured":"Pengfei Liu , Xipeng Qiu , and Xuanjing Huang . 2016 . Recurrent neural network for text classification with multi-task learning . In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. 2873--2879 . Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2016. Recurrent neural network for text classification with multi-task learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. 2873--2879."},{"key":"e_1_3_2_2_24_1","first-page":"2579","article-title":"Visualizing data using t-SNE","author":"van der Maaten Laurens","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton . 2008 . Visualizing data using t-SNE . Journal of machine learning research 9 , Nov (2008), 2579 -- 2605 . Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, Nov (2008), 2579--2605.","journal-title":"Journal of machine learning research 9"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/3104322.3104425"},{"key":"e_1_3_2_2_26_1","volume-title":"Iryna Gurevych, and Johannes F\u00fcrnkranz.","author":"Nam Jinseok","year":"2014","unstructured":"Jinseok Nam , Jungi Kim , Eneldo Loza Menc\u00eda , Iryna Gurevych, and Johannes F\u00fcrnkranz. 2014 . Large-scale multi-label text classification?revisiting neural net-works. In Joint european conference on machine learning and knowledge discovery in databases. Springer , 437--452. Jinseok Nam, Jungi Kim, Eneldo Loza Menc\u00eda, Iryna Gurevych, and Johannes F\u00fcrnkranz. 2014. Large-scale multi-label text classification?revisiting neural net-works. In Joint european conference on machine learning and knowledge discovery in databases. Springer, 437--452."},{"key":"e_1_3_2_2_27_1","volume-title":"Hyunwoo J Kim, and Johannes F\u00fcrnkranz.","author":"Nam Jinseok","year":"2017","unstructured":"Jinseok Nam , Eneldo Loza Menc\u00eda , Hyunwoo J Kim, and Johannes F\u00fcrnkranz. 2017 . Maximizing subset accuracy with recurrent neural networks in multi-label classification. In Advances in neural information processing systems. 5413--5423. Jinseok Nam, Eneldo Loza Menc\u00eda, Hyunwoo J Kim, and Johannes F\u00fcrnkranz. 2017. Maximizing subset accuracy with recurrent neural networks in multi-label classification. In Advances in neural information processing systems. 5413--5423."},{"key":"e_1_3_2_2_28_1","volume-title":"International conference on machine learning. 1310--1318","author":"Pascanu Razvan","year":"2013","unstructured":"Razvan Pascanu , Tomas Mikolov , and Yoshua Bengio . 2013 . On the difficulty of training recurrent neural networks . In International conference on machine learning. 1310--1318 . Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In International conference on machine learning. 1310--1318."},{"key":"e_1_3_2_2_29_1","volume-title":"Classifier chains for multi-label classification. Machine learning 85, 3","author":"Read Jesse","year":"2011","unstructured":"Jesse Read , Bernhard Pfahringer , Geoff Holmes , and Eibe Frank . 2011. Classifier chains for multi-label classification. Machine learning 85, 3 ( 2011 ), 333. Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. 2011. Classifier chains for multi-label classification. Machine learning 85, 3 (2011), 333."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646047"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1002\/1097-4571(2000)9999:9999<::AID-ASI1591>3.3.CO;2-I"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"crossref","unstructured":"Grigorios Tsoumakas and Ioannis Katakis. 2007. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM)3 3(2007) 1--13. Grigorios Tsoumakas and Ioannis Katakis. 2007. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM)3 3(2007) 1--13.","DOI":"10.4018\/jdwm.2007070101"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00048"},{"key":"e_1_3_2_2_34_1","volume-title":"Convolutional Recurrent Neural Networks for Text Classification. 2019 International Joint Conference on Neural Networks (IJCNN)(2019)","author":"Wang Ruishuang","year":"2019","unstructured":"Ruishuang Wang , Zhenwei Li , Jian Cao , Tong Chen , and Lei Wang . 2019 . Convolutional Recurrent Neural Networks for Text Classification. 2019 International Joint Conference on Neural Networks (IJCNN)(2019) , 1--6. Ruishuang Wang, Zhenwei Li, Jian Cao, Tong Chen, and Lei Wang. 2019. Convolutional Recurrent Neural Networks for Text Classification. 2019 International Joint Conference on Neural Networks (IJCNN)(2019), 1--6."},{"key":"e_1_3_2_2_35_1","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey etal2016. Google's neural machine translation system: Bridging the gap between human and machine translation.arXiv preprint arXiv:1609.08144(2016). Yonghui Wu Mike Schuster Zhifeng Chen Quoc V Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey et al.2016. Google's neural machine translation system: Bridging the gap between human and machine translation.arXiv preprint arXiv:1609.08144(2016)."},{"key":"e_1_3_2_2_36_1","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics. 3915--3926","author":"Yang Pengcheng","year":"2018","unstructured":"Pengcheng Yang , Xu Sun , Wei Li , Shuming Ma , Wei Wu , and Houfeng Wang . 2018 . SGM: Sequence Generation Model for Multi-label Classification . In Proceedings of the 27th International Conference on Computational Linguistics. 3915--3926 . Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, and Houfeng Wang. 2018. SGM: Sequence Generation Model for Multi-label Classification. In Proceedings of the 27th International Conference on Computational Linguistics. 3915--3926."},{"key":"e_1_3_2_2_37_1","volume-title":"Hovy","author":"Yang Zichao","year":"2016","unstructured":"Zichao Yang , Diyi Yang , Chris Dyer , Xiaodong He , Alexander J. Smola , and Eduard H . Hovy . 2016 . Hierarchical Attention Networks for Document Classification. In HLT-NAACL. Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, and Eduard H. Hovy. 2016. Hierarchical Attention Networks for Document Classification. In HLT-NAACL."},{"key":"e_1_3_2_2_38_1","volume-title":"Multilabel neural networks with applications to functional genomics and text categorization","author":"Zhang Min-Ling","year":"2006","unstructured":"Min-Ling Zhang and Zhi-Hua Zhou . 2006. Multilabel neural networks with applications to functional genomics and text categorization .IEEE transactions on Knowledge and Data Engineering 18, 10 ( 2006 ), 1338--1351. Min-Ling Zhang and Zhi-Hua Zhou. 2006. Multilabel neural networks with applications to functional genomics and text categorization.IEEE transactions on Knowledge and Data Engineering 18, 10 (2006), 1338--1351."},{"key":"e_1_3_2_2_39_1","volume-title":"ML-KNN: A lazy learning approach to multi-label learning.Pattern recognition 40, 7","author":"Zhang Min-Ling","year":"2007","unstructured":"Min-Ling Zhang and Zhi-Hua Zhou . 2007. ML-KNN: A lazy learning approach to multi-label learning.Pattern recognition 40, 7 ( 2007 ), 2038--2048. Min-Ling Zhang and Zhi-Hua Zhou. 2007. ML-KNN: A lazy learning approach to multi-label learning.Pattern recognition 40, 7 (2007), 2038--2048."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098134"}],"event":{"name":"WSDM '21: The Fourteenth ACM International Conference on Web Search and Data Mining","location":"Virtual Event Israel","acronym":"WSDM '21","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data","SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 14th ACM International Conference on Web Search and Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3437963.3441822","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3437963.3441822","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:36Z","timestamp":1750193256000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3437963.3441822"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,8]]},"references-count":40,"alternative-id":["10.1145\/3437963.3441822","10.1145\/3437963"],"URL":"https:\/\/doi.org\/10.1145\/3437963.3441822","relation":{},"subject":[],"published":{"date-parts":[[2021,3,8]]},"assertion":[{"value":"2021-03-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}