{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T13:06:33Z","timestamp":1765544793853,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,3,8]],"date-time":"2021-03-08T00:00:00Z","timestamp":1615161600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,3,8]]},"DOI":"10.1145\/3437963.3441807","type":"proceedings-article","created":{"date-parts":[[2021,3,6]],"date-time":"2021-03-06T04:34:28Z","timestamp":1615005268000},"page":"49-57","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["DECAF: Deep Extreme Classification with Label Features"],"prefix":"10.1145","author":[{"given":"Anshul","family":"Mittal","sequence":"first","affiliation":[{"name":"IIT Delhi, Delhi, India"}]},{"given":"Kunal","family":"Dahiya","sequence":"additional","affiliation":[{"name":"IIT Delhi, Delhi, India"}]},{"given":"Sheshansh","family":"Agrawal","sequence":"additional","affiliation":[{"name":"Microsoft Research, Bengaluru, India"}]},{"given":"Deepak","family":"Saini","sequence":"additional","affiliation":[{"name":"Microsoft Research, Bengaluru, India"}]},{"given":"Sumeet","family":"Agarwal","sequence":"additional","affiliation":[{"name":"IIT Delhi, Delhi, India"}]},{"given":"Purushottam","family":"Kar","sequence":"additional","affiliation":[{"name":"IIT Kanpur & Microsoft Research, Kanpur, India"}]},{"given":"Manik","family":"Varma","sequence":"additional","affiliation":[{"name":"Microsoft Research & IIT Delhi, Bengaluru, India"}]}],"member":"320","published-online":{"date-parts":[[2021,3,8]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"R. Agrawal A. Gupta Y. Prabhu and M. Varma. 2013. Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. In WWW.","DOI":"10.1145\/2488388.2488391"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"R. Babbar and B. Sch\u00f6lkopf. 2017. DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification. In WSDM.","DOI":"10.1145\/3018661.3018741"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-019-05791-5"},{"key":"e_1_3_2_1_4_1","unstructured":"K. Bhatia K. Dahiya H. Jain A. Mittal Y. Prabhu and M. Varma. 2016. The extreme classification repository: Multi-label datasets and code. http:\/\/manikvarma.org\/downloads\/XC\/XMLRepository.html"},{"key":"e_1_3_2_1_5_1","unstructured":"K. Bhatia H. Jain P. Kar M. Varma and P. Jain. 2015. Sparse Local Embeddings for Extreme Multi-label Classification. In NIPS."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"P. Bojanowski E. Grave A. Joulin and T. Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics (2017).","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"W.-C. Chang H.-F. Yu K. Zhong Y. Yang and I. Dhillon. 2020. Taming Pretrained Transformers for Extreme Multi-label Text Classification. In KDD.","DOI":"10.1145\/3394486.3403368"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"K. Dahiya D. Saini A. Mittal A. Shaw K. Dave A. Soni H. Jain S. Agarwal and M. Varma. 2021. DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents. In WSDM.","DOI":"10.1145\/3437963.3441810"},{"key":"e_1_3_2_1_9_1","volume-title":"BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.","author":"Devlin J.","year":"2019","unstructured":"J. Devlin, M. W. Chang, K. Lee, and K. Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL."},{"key":"e_1_3_2_1_10_1","unstructured":"X. Glorot and Y. Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In AISTATS."},{"key":"e_1_3_2_1_11_1","unstructured":"C. Guo A. Mousavi X. Wu D. N. Holtmann-Rice S. Kale S. Reddi and S. Kumar. 2019. Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces. In NeurIPS."},{"volume-title":"Proceedings of the IEEE international conference on computer vision. 1026--1034","author":"He K.","key":"e_1_3_2_1_12_1","unstructured":"K. He, X. Zhang, S. Ren, and J. Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision. 1026--1034."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"S. Hochreiter and J. Schmidhuber. 1997. Long short-term memory. Neural Computation Vol. 9 8 (1997) 1735--1780.","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290979"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"H. Jain Y. Prabhu and M. Varma. 2016. Extreme Multi-label Loss Functions for Recommendation Tagging Ranking and Other Missing Label Applications. In KDD.","DOI":"10.1145\/2939672.2939756"},{"key":"e_1_3_2_1_16_1","unstructured":"V. Jain N. Modhe and P. Rai. 2017. Scalable Generative Models for Multi-label Learning with Missing Labels. In ICML."},{"key":"e_1_3_2_1_17_1","unstructured":"K. Jasinska K. Dembczynski R. Busa-Fekete K. 
Pfannschmidt T. Klerx and E. H\u00fcllermeier. 2016. Extreme F-measure Maximization using Sparse Probability Estimates. In ICML."},{"volume-title":"Proceedings of the European Chapter of the Association for Computational Linguistics.","author":"Joulin A.","key":"e_1_3_2_1_18_1","unstructured":"A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. 2017. Bag of Tricks for Efficient Text Classification. In Proceedings of the European Chapter of the Association for Computational Linguistics."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"B. Kanagal A. Ahmed S. Pandey V. Josifovski J. Yuan and L. Garcia-Pueyo. 2012. Supercharging Recommender Systems Using Taxonomies for Learning User Purchase Behavior. VLDB (June 2012).","DOI":"10.14778\/2336664.2336669"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-020-05888-2"},{"key":"e_1_3_2_1_21_1","volume-title":"Adam: A Method for Stochastic Optimization. CoRR","author":"Kingma D. P.","year":"2014","unstructured":"D. P. Kingma and J. Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR (2014)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"crossref","unstructured":"J. Liu W. Chang Y. Wu and Y. Yang. 2017. Deep Learning for Extreme Multi-label Text Classification. In SIGIR.","DOI":"10.1145\/3077136.3080834"},{"key":"e_1_3_2_1_23_1","unstructured":"T. K. R. Medini Q. Huang Y. Wang V. Mohan and A. Shrivastava. 2019. Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products. In NeurIPS."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"A. K. Menon K. P. Chitrapura S. Garg D. Agarwal and N. Kota. 2011. Response Prediction Using Collaborative Filtering with Hierarchies and Side-Information. In KDD.","DOI":"10.1145\/2020408.2020436"},{"key":"e_1_3_2_1_25_1","unstructured":"T. Mikolov I. Sutskever K. Chen G. Corrado and J. Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In NIPS."},{"key":"e_1_3_2_1_26_1","unstructured":"T. Miyato T. Kataoka M. Koyama and Y. Yoshida. 2018. Spectral Normalization for Generative Adversarial Networks. In ICLR."},{"key":"e_1_3_2_1_27_1","unstructured":"A. Niculescu-Mizil and E. Abbasnejad. 2017. Label Filters for Large Scale Multilabel Classification. In AISTATS."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Y. Prabhu A. Kag S. Gopinath K. Dahiya S. Harsola R. Agrawal and M. Varma. 
2018a. Extreme multi-label learning with label features for warm-start tagging ranking and recommendation. In WSDM.","DOI":"10.1145\/3159652.3159660"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3185998"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Y. Prabhu and M. Varma. 2014. FastXML: A Fast Accurate and Stable Tree-classifier for eXtreme Multi-label Learning. In KDD.","DOI":"10.1145\/2623330.2623651"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"crossref","unstructured":"N. Sachdeva K. Gupta and V. Pudi. 2018. Attentive Neural Architecture Incorporating Song Features for Music Recommendation. In RecSys.","DOI":"10.1145\/3240323.3240397"},{"key":"e_1_3_2_1_32_1","unstructured":"W. Siblini P. Kuntz and F. Meyer. 2018. CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning. In ICML."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Y. Tagami. 2017. AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification. In KDD.","DOI":"10.1145\/3097983.3097987"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"crossref","unstructured":"L. Wu A. Fisch S. Chopra K. Adams A. Bordes and J. Weston. 2017. StarSpace: Embed All The Things! CoRR (2017).","DOI":"10.1609\/aaai.v32i1.11996"},{"key":"e_1_3_2_1_35_1","unstructured":"M. Wydmuch K. Jasinska M. Kuznetsov R. Busa-Fekete and K. Dembczynski. 2018. A no-regret generalization of hierarchical softmax to extreme multi-label classification. In NIPS."},{"key":"e_1_3_2_1_36_1","unstructured":"I. E.H. Yen X. Huang W. Dai P. Ravikumar I. Dhillon and E. Xing. 2017. PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification. In KDD."},{"key":"e_1_3_2_1_37_1","unstructured":"I. Yen S. Kale F. Yu D. Holtmann-Rice S. Kumar and P. Ravikumar. 2018. Loss Decomposition for Fast Learning in Large Output Spaces. In ICML."},{"key":"e_1_3_2_1_38_1","volume-title":"AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. In NeurIPS.","author":"You R.","year":"2019","unstructured":"R. You, Z. Zhang, Z. Wang, S. Dai, H. Mamitsuka, and S. Zhu. 2019. AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. In NeurIPS."}],"event":{"name":"WSDM '21: The Fourteenth ACM International Conference on Web Search and Data Mining","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data","SIGIR ACM Special Interest Group on Information Retrieval"],"location":"Virtual Event Israel","acronym":"WSDM '21"},"container-title":["Proceedings of the 14th ACM International Conference on Web Search and Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3437963.3441807","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3437963.3441807","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:36Z","timestamp":1750193256000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3437963.3441807"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,8]]},"references-count":38,"alternative-id":["10.1145\/3437963.3441807","10.1145\/3437963"],"URL":"https:\/\/doi.org\/10.1145\/3437963.3441807","relation":{},"subject":[],"published":{"date-parts":[[2021,3,8]]},"assertion":[{"value":"2021-03-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}