{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,10]],"date-time":"2026-05-10T10:20:38Z","timestamp":1778408438503,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,8,20]],"date-time":"2020-08-20T00:00:00Z","timestamp":1597881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,8,23]]},"DOI":"10.1145\/3394486.3403368","type":"proceedings-article","created":{"date-parts":[[2020,8,20]],"date-time":"2020-08-20T23:03:59Z","timestamp":1597964639000},"page":"3163-3171","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":143,"title":["Taming Pretrained Transformers for Extreme Multi-label Text Classification"],"prefix":"10.1145","author":[{"given":"Wei-Cheng","family":"Chang","sequence":"first","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hsiang-Fu","family":"Yu","sequence":"additional","affiliation":[{"name":"Amazon, Palo Alto, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kai","family":"Zhong","sequence":"additional","affiliation":[{"name":"Amazon, Palo Alto, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yiming","family":"Yang","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Inderjit S.","family":"Dhillon","sequence":"additional","affiliation":[{"name":"Amazon and University of Texas at Austin, Berkeley, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,8,20]]},"reference":[
{"key":"e_1_3_2_2_1_1","doi-asserted-by":"crossref","unstructured":"Rohit Babbar and Bernhard Sch\u00f6lkopf. 2017. DiSMEC: distributed sparse machines for extreme multi-label classification. In WSDM.","DOI":"10.1145\/3018661.3018741"},
{"key":"e_1_3_2_2_2_1","volume-title":"Data scarcity, robustness and extreme multi-label classification. Machine Learning","author":"Babbar Rohit","year":"2019","unstructured":"Rohit Babbar and Bernhard Sch\u00f6lkopf. 2019. Data scarcity, robustness and extreme multi-label classification. Machine Learning (2019), 1--23."},
{"key":"e_1_3_2_2_3_1","unstructured":"Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, and Prateek Jain. 2015. Sparse local embeddings for extreme multi-label classification. In NIPS."},
{"key":"e_1_3_2_2_4_1","volume-title":"Pre-training Tasks for Embedding-based Large-scale Retrieval. In International Conference on Learning Representations.","author":"Chang Wei-Cheng","year":"2020","unstructured":"Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, and Sanjiv Kumar. 2020. Pre-training Tasks for Embedding-based Large-scale Retrieval. In International Conference on Learning Representations."},
{"key":"e_1_3_2_2_5_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)."},
{"key":"e_1_3_2_2_6_1","unstructured":"Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N Holtmann-Rice, Satyen Kale, Sashank Reddi, and Sanjiv Kumar. 2019. Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces. In Advances in Neural Information Processing Systems. 4944--4954."},
{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290979"},
{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Himanshu Jain, Yashoteja Prabhu, and Manik Varma. 2016. Extreme multi-label loss functions for recommendation tagging ranking & other missing label applications. In KDD.","DOI":"10.1145\/2939672.2939756"},
{"key":"e_1_3_2_2_9_1","volume-title":"Bonsai-Diverse and Shallow Trees for Extreme Multi-label Classification. arXiv preprint arXiv:1904.08249","author":"Khandagale Sujay","year":"2019","unstructured":"Sujay Khandagale, Han Xiao, and Rohit Babbar. 2019. Bonsai-Diverse and Shallow Trees for Extreme Multi-label Classification. arXiv preprint arXiv:1904.08249 (2019)."},
{"key":"e_1_3_2_2_10_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Kingma Diederik","year":"2014","unstructured":"Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations."},
{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1612"},
{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080834"},
{"key":"e_1_3_2_2_13_1","volume-title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692 (2019)."},
{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-44415-3_4"},
{"key":"e_1_3_2_2_15_1","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111--3119."},
{"key":"e_1_3_2_2_16_1","volume-title":"Hyunwoo J Kim, and Johannes F\u00fcrnkranz.","author":"Nam Jinseok","year":"2017","unstructured":"Jinseok Nam, Eneldo Loza Menc\u00eda, Hyunwoo J Kim, and Johannes F\u00fcrnkranz. 2017. Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification. In NIPS."},
{"key":"e_1_3_2_2_17_1","volume-title":"LSHTC: A benchmark for large-scale text classification. arXiv preprint arXiv:1503.08581","author":"Partalas Ioannis","year":"2015","unstructured":"Ioannis Partalas, Aris Kosmopoulos, Nicolas Baskiotis, Thierry Artieres, George Paliouras, Eric Gaussier, Ion Androutsopoulos, Massih-Reza Amini, and Patrick Galinari. 2015. LSHTC: A benchmark for large-scale text classification. arXiv preprint arXiv:1503.08581 (2015)."},
{"key":"e_1_3_2_2_18_1","volume-title":"GloVe: Global vectors for word representation. In EMNLP. 1532--1543.","author":"Pennington Jeffrey","year":"2014","unstructured":"Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In EMNLP. 1532--1543."},
{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},
{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3185998"},
{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623651"},
{"key":"e_1_3_2_2_22_1","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. (2018)."},
{"key":"e_1_3_2_2_23_1","unstructured":"Sashank J Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, and Sanjiv Kumar. 2019. Stochastic Negative Mining for Learning with Large Output Spaces. In AISTATS."},
{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3097987"},
{"key":"e_1_3_2_2_25_1","unstructured":"Manik Varma. 2019. The Extreme Classification Repository: Multi-label Datasets & Code. http:\/\/manikvarma.org\/downloads\/XC\/XMLRepository.html."},
{"key":"e_1_3_2_2_26_1","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS."},
{"key":"e_1_3_2_2_27_1","volume-title":"GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461","author":"Wang Alex","year":"2018","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 (2018)."},
{"key":"e_1_3_2_2_28_1","volume-title":"HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv","author":"Wolf Thomas","year":"2019","unstructured":"Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, R\u00e9mi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv, Vol. abs\/1910.03771 (2019)."},
{"key":"e_1_3_2_2_29_1","unstructured":"Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, R\u00f3bert Busa-Fekete, and Krzysztof Dembczynski. 2018. A no-regret generalization of hierarchical softmax to extreme multi-label classification. In NIPS."},
{"key":"e_1_3_2_2_30_1","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In NIPS."},
{"key":"e_1_3_2_2_31_1","unstructured":"Ian EH Yen, Xiangru Huang, Wei Dai, Pradeep Ravikumar, Inderjit Dhillon, and Eric Xing. 2017. PPDsparse: A parallel primal-dual sparse method for extreme classification. In KDD. ACM."},
{"key":"e_1_3_2_2_32_1","unstructured":"Ronghui You, Zihan Zhang, Ziye Wang, Suyang Dai, Hiroshi Mamitsuka, and Shanfeng Zhu. 2019. AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification. In Advances in Neural Information Processing Systems. 5812--5822."}
],"event":{"name":"KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Virtual Event CA USA","acronym":"KDD '20","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394486.3403368","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394486.3403368","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:29Z","timestamp":1750195889000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394486.3403368"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,20]]},"references-count":32,"alternative-id":["10.1145\/3394486.3403368","10.1145\/3394486"],"URL":"https:\/\/doi.org\/10.1145\/3394486.3403368","relation":{},"subject":[],"published":{"date-parts":[[2020,8,20]]},"assertion":[{"value":"2020-08-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}