{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T13:08:25Z","timestamp":1765544905505,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":70,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T00:00:00Z","timestamp":1691107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science Foundation","award":["ATD 2123761","CNS 1822118"],"award-info":[{"award-number":["ATD 2123761","CNS 1822118"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,8,6]]},"DOI":"10.1145\/3580305.3599508","type":"proceedings-article","created":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T18:13:58Z","timestamp":1691172838000},"page":"544-556","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Sparse Binary Transformers for Multivariate Time Series Modeling"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5000-1242","authenticated-orcid":false,"given":"Matt","family":"Gorbett","sequence":"first","affiliation":[{"name":"Colorado State University, Fort Collins, CO, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2721-0628","authenticated-orcid":false,"given":"Hossein","family":"Shirazi","sequence":"additional","affiliation":[{"name":"Colorado State University, San Diego, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0714-7676","authenticated-orcid":false,"given":"Indrakshi","family":"Ray","sequence":"additional","affiliation":[{"name":"Colorado State University, Fort Collins, CO, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2023,8,4]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-016-0483-9"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00223"},{"key":"e_1_3_2_2_3_1","volume-title":"Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150","author":"Beltagy Iz","year":"2020","unstructured":"Iz Beltagy , Matthew E Peters , and Arman Cohan . 2020 . Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020). Iz Beltagy, Matthew E Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020)."},{"key":"e_1_3_2_2_4_1","volume-title":"Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. arXiv:1308.3432 [cs] (Aug","author":"Bengio Yoshua","year":"2013","unstructured":"Yoshua Bengio , Nicholas L\u00e9onard , and Aaron Courville . 2013. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. arXiv:1308.3432 [cs] (Aug . 2013 ). http:\/\/arxiv.org\/abs\/1308.3432 arXiv: 1308.3432. Yoshua Bengio, Nicholas L\u00e9onard, and Aaron Courville. 2013. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. arXiv:1308.3432 [cs] (Aug. 2013). http:\/\/arxiv.org\/abs\/1308.3432 arXiv: 1308.3432."},{"key":"e_1_3_2_2_5_1","volume-title":"Semi-supervised on-device neural network adaptation for remote and portable laser-induced breakdown spectroscopy. arXiv preprint arXiv:2104.03439","author":"Bhardwaj Kshitij","year":"2021","unstructured":"Kshitij Bhardwaj and Maya Gokhale . 2021. Semi-supervised on-device neural network adaptation for remote and portable laser-induced breakdown spectroscopy. arXiv preprint arXiv:2104.03439 ( 2021 ). Kshitij Bhardwaj and Maya Gokhale. 2021. 
Semi-supervised on-device neural network adaptation for remote and portable laser-induced breakdown spectroscopy. arXiv preprint arXiv:2104.03439 (2021)."},{"key":"e_1_3_2_2_6_1","first-page":"129","article-title":"What is the state of neural network pruning","volume":"2","author":"Blalock Davis","year":"2020","unstructured":"Davis Blalock , Jose Javier Gonzalez Ortiz , Jonathan Frankle , and John Guttag . 2020 . What is the state of neural network pruning ? Proceedings of machine learning and systems , Vol. 2 (2020), 129 -- 146 . Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John Guttag. 2020. What is the state of neural network pruning? Proceedings of machine learning and systems, Vol. 2 (2020), 129--146.","journal-title":"Proceedings of machine learning and systems"},{"key":"e_1_3_2_2_7_1","volume-title":"Advances in Neural Information Processing Systems","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown , Benjamin Mann , Nick Ryder , Melanie Subbiah , Jared D Kaplan , Prafulla Dhariwal , Arvind Neelakantan , Pranav Shyam , Girish Sastry , Amanda Askell , Sandhini Agarwal , Ariel Herbert-Voss , Gretchen Krueger , Tom Henighan , Rewon Child , Aditya Ramesh , Daniel Ziegler , Jeffrey Wu , Clemens Winter , Chris Hesse , Mark Chen , Eric Sigler , Mateusz Litwin , Scott Gray , Benjamin Chess , Jack Clark , Christopher Berner , Sam McCandlish , Alec Radford , Ilya Sutskever , and Dario Amodei . 2020 . Language Models are Few-Shot Learners . In Advances in Neural Information Processing Systems , Vol. 33 . Curran Associates, Inc. , 1877--1901. 
https:\/\/papers.nips.cc\/paper\/2020\/hash\/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 1877--1901. https:\/\/papers.nips.cc\/paper\/2020\/hash\/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html"},{"key":"e_1_3_2_2_8_1","volume-title":"Pre-Trained Image Processing Transformer. In 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE","author":"Chen Hanting","year":"2021","unstructured":"Hanting Chen , Yunhe Wang , Tianyu Guo , Chang Xu , Yiping Deng , Zhenhua Liu , Siwei Ma , Chunjing Xu , Chao Xu , and Wen Gao . 2021 . Pre-Trained Image Processing Transformer. In 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE , Nashville, TN, USA, 12294--12305. https:\/\/doi.org\/10.1109\/CVPR46437. 2021.01212 10.1109\/CVPR46437.2021.01212 Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. 2021. Pre-Trained Image Processing Transformer. In 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Nashville, TN, USA, 12294--12305. https:\/\/doi.org\/10.1109\/CVPR46437.2021.01212"},{"key":"e_1_3_2_2_9_1","volume-title":"The lottery ticket hypothesis for pre-trained bert networks. 
Advances in neural information processing systems","author":"Chen Tianlong","year":"2020","unstructured":"Tianlong Chen , Jonathan Frankle , Shiyu Chang , Sijia Liu , Yang Zhang , Zhangyang Wang , and Michael Carbin . 2020. The lottery ticket hypothesis for pre-trained bert networks. Advances in neural information processing systems , Vol. 33 ( 2020 ), 15834--15846. Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, and Michael Carbin. 2020. The lottery ticket hypothesis for pre-trained bert networks. Advances in neural information processing systems, Vol. 33 (2020), 15834--15846."},{"key":"e_1_3_2_2_10_1","volume-title":"Advances in Neural Information Processing Systems","volume":"34","author":"Chijiwa Daiki","year":"2021","unstructured":"Daiki Chijiwa , Shin' ya Yamaguchi , Yasutoshi Ida , Kenji Umakoshi , and Tomohiro INOUE. 2021 . Pruning Randomly Initialized Neural Networks with Iterative Randomization . In Advances in Neural Information Processing Systems , Vol. 34 . Curran Associates, Inc., 4503--4513. https:\/\/papers.nips.cc\/paper\/ 2021\/hash\/23e582ad8087f2c03a5a31c125123f9a-Abstract.html Daiki Chijiwa, Shin' ya Yamaguchi, Yasutoshi Ida, Kenji Umakoshi, and Tomohiro INOUE. 2021. Pruning Randomly Initialized Neural Networks with Iterative Randomization. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 4503--4513. https:\/\/papers.nips.cc\/paper\/2021\/hash\/23e582ad8087f2c03a5a31c125123f9a-Abstract.html"},{"key":"e_1_3_2_2_11_1","volume-title":"Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509","author":"Child Rewon","year":"2019","unstructured":"Rewon Child , Scott Gray , Alec Radford , and Ilya Sutskever . 2019. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509 ( 2019 ). Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. 2019. Generating long sequences with sparse transformers. 
arXiv preprint arXiv:1904.10509 (2019)."},{"key":"e_1_3_2_2_12_1","unstructured":"Krzysztof Choromanski Valerii Likhosherstov David Dohan Xingyou Song Andreea Gane Tamas Sarlos Peter Hawkins Jared Davis Afroz Mohiuddin Lukasz Kaiser etal 2020. Rethinking attention with performers. arXiv preprint arXiv:2009.14794 (2020).  Krzysztof Choromanski Valerii Likhosherstov David Dohan Xingyou Song Andreea Gane Tamas Sarlos Peter Hawkins Jared Davis Afroz Mohiuddin Lukasz Kaiser et al. 2020. Rethinking attention with performers. arXiv preprint arXiv:2009.14794 (2020)."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2019.2958185"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-020-00701-z"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_2_16_1","first-page":"19","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171--4186. https:\/\/doi.org\/10. 18653\/v1\/N 19 - 1423 10.18653\/v1 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 
Association for Computational Linguistics, Minneapolis, Minnesota, 4171--4186. https:\/\/doi.org\/10.18653\/v1\/N19-1423"},{"key":"e_1_3_2_2_17_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=U_mat0b9iv","author":"Diffenderfer James","year":"2021","unstructured":"James Diffenderfer and Bhavya Kailkhura . 2021 . Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network . In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=U_mat0b9iv James Diffenderfer and Bhavya Kailkhura. 2021. Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=U_mat0b9iv"},{"key":"e_1_3_2_2_18_1","first-page":"1","article-title":"Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity","volume":"23","author":"Fedus William","year":"2021","unstructured":"William Fedus , Barret Zoph , and Noam Shazeer . 2021 . Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity . J. Mach. Learn. Res , Vol. 23 (2021), 1 -- 40 . William Fedus, Barret Zoph, and Noam Shazeer. 2021. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res, Vol. 23 (2021), 1--40.","journal-title":"J. Mach. Learn. Res"},{"key":"e_1_3_2_2_19_1","volume-title":"Unsupervised scalable representation learning for multivariate time series. Advances in neural information processing systems","author":"Franceschi Jean-Yves","year":"2019","unstructured":"Jean-Yves Franceschi , Aymeric Dieuleveut , and Martin Jaggi . 2019. Unsupervised scalable representation learning for multivariate time series. Advances in neural information processing systems , Vol. 32 ( 2019 ). 
Jean-Yves Franceschi, Aymeric Dieuleveut, and Martin Jaggi. 2019. Unsupervised scalable representation learning for multivariate time series. Advances in neural information processing systems, Vol. 32 (2019)."},{"key":"e_1_3_2_2_20_1","volume-title":"Trainable Neural Networks. (March","author":"Frankle Jonathan","year":"2019","unstructured":"Jonathan Frankle and Michael Carbin . 2019. The Lottery Ticket Hypothesis: Finding Sparse , Trainable Neural Networks. (March 2019 ). http:\/\/arxiv.org\/abs\/1803.03635 arXiv: 1803.03635. Jonathan Frankle and Michael Carbin. 2019. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. (March 2019). http:\/\/arxiv.org\/abs\/1803.03635 arXiv: 1803.03635."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00413"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-10684-2_9"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3532105.3535038"},{"key":"e_1_3_2_2_24_1","volume-title":"Randomly Initialized Subnetworks with Iterative Weight Recycling. arXiv preprint arXiv:2303.15953","author":"Gorbett Matt","year":"2023","unstructured":"Matt Gorbett and Darrell Whitley . 2023. Randomly Initialized Subnetworks with Iterative Weight Recycling. arXiv preprint arXiv:2303.15953 ( 2023 ). Matt Gorbett and Darrell Whitley. 2023. Randomly Initialized Subnetworks with Iterative Weight Recycling. arXiv preprint arXiv:2303.15953 (2023)."},{"key":"e_1_3_2_2_25_1","volume-title":"Dally","author":"Han Song","year":"2015","unstructured":"Song Han , Jeff Pool , John Tran , and William J . Dally . 2015 . Learning both Weights and Connections for Efficient Neural Networks . (Oct. 2015). http:\/\/arxiv.org\/abs\/1506.02626 arXiv: 1506.02626. Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning both Weights and Connections for Efficient Neural Networks. (Oct. 2015). 
http:\/\/arxiv.org\/abs\/1506.02626 arXiv: 1506.02626."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"e_1_3_2_2_27_1","volume-title":"Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE","author":"He Kaiming","year":"2016","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2016 . Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE , Las Vegas, NV, USA, 770--778. https:\/\/doi.org\/10.1109\/CVPR. 2016.90 10.1109\/CVPR.2016.90 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Las Vegas, NV, USA, 770--778. https:\/\/doi.org\/10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219845"},{"key":"e_1_3_2_2_29_1","first-page":"9895","article-title":"Sparse is enough in scaling transformers","volume":"34","author":"Jaszczur Sebastian","year":"2021","unstructured":"Sebastian Jaszczur , Aakanksha Chowdhery , Afroz Mohiuddin , Lukasz Kaiser , Wojciech Gajewski , Henryk Michalewski , and Jonni Kanerva . 2021 . Sparse is enough in scaling transformers . Advances in Neural Information Processing Systems , Vol. 34 (2021), 9895 -- 9907 . Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Lukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, and Jonni Kanerva. 2021. Sparse is enough in scaling transformers. Advances in Neural Information Processing Systems, Vol. 34 (2021), 9895--9907.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_30_1","first-page":"372","volume-title":"TinyBERT: Distilling BERT for Natural Language Understanding. 
In Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Jiao Xiaoqi","year":"2020","unstructured":"Xiaoqi Jiao , Yichun Yin , Lifeng Shang , Xin Jiang , Xiao Chen , Linlin Li , Fang Wang , and Qun Liu . 2020 . TinyBERT: Distilling BERT for Natural Language Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2020 . Association for Computational Linguistics, Online, 4163--4174. https:\/\/doi.org\/10. 18653\/v1\/2020.findings-emnlp. 372 10.18653\/v1 Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. 2020. TinyBERT: Distilling BERT for Natural Language Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 4163--4174. https:\/\/doi.org\/10.18653\/v1\/2020.findings-emnlp.372"},{"key":"e_1_3_2_2_31_1","volume-title":"Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451","author":"Kitaev Nikita","year":"2020","unstructured":"Nikita Kitaev , \u0141ukasz Kaiser , and Anselm Levskaya . 2020 . Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020). Nikita Kitaev, \u0141ukasz Kaiser, and Anselm Levskaya. 2020. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020)."},{"key":"e_1_3_2_2_32_1","volume-title":"Advances in Neural Information Processing Systems","volume":"25","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E Hinton . 2012 . ImageNet Classification with Deep Convolutional Neural Networks . In Advances in Neural Information Processing Systems , Vol. 25 . Curran Associates, Inc. https:\/\/papers.nips.cc\/paper\/ 2012\/hash\/c399862d3b9d6b76c8436e924a68c45b-Abstract.html Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, Vol. 25. 
Curran Associates, Inc. https:\/\/papers.nips.cc\/paper\/2012\/hash\/c399862d3b9d6b76c8436e924a68c45b-Abstract.html"},{"key":"e_1_3_2_2_33_1","unstructured":"Woosuk Kwon Sehoon Kim Michael W. Mahoney Joseph Hassoun Kurt Keutzer and Amir Gholami. 2022. A Fast Post-Training Pruning Framework for Transformers. In Advances in Neural Information Processing Systems Alice H. Oh Alekh Agarwal Danielle Belgrave and Kyunghyun Cho (Eds.). https:\/\/openreview.net\/forum?id=0GRBKLBjJE  Woosuk Kwon Sehoon Kim Michael W. Mahoney Joseph Hassoun Kurt Keutzer and Amir Gholami. 2022. A Fast Post-Training Pruning Framework for Transformers. In Advances in Neural Information Processing Systems Alice H. Oh Alekh Agarwal Danielle Belgrave and Kyunghyun Cho (Eds.). https:\/\/openreview.net\/forum?id=0GRBKLBjJE"},{"key":"e_1_3_2_2_34_1","volume-title":"Block pruning for faster transformers. arXiv preprint arXiv:2109.04838","author":"Lagunas Francc","year":"2021","unstructured":"Francc ois Lagunas , Ella Charlaix , Victor Sanh , and Alexander M Rush . 2021. Block pruning for faster transformers. arXiv preprint arXiv:2109.04838 ( 2021 ). Francc ois Lagunas, Ella Charlaix, Victor Sanh, and Alexander M Rush. 2021. Block pruning for faster transformers. arXiv preprint arXiv:2109.04838 (2021)."},{"key":"e_1_3_2_2_35_1","volume-title":"Advances in Neural Information Processing Systems","volume":"2","author":"LeCun Yann","year":"1989","unstructured":"Yann LeCun , John Denker , and Sara Solla . 1989 . Optimal Brain Damage . In Advances in Neural Information Processing Systems , Vol. 2 . Morgan-Kaufmann. https:\/\/papers.nips.cc\/paper\/ 1989\/hash\/6c9882bbac1c7093bd25041881277658-Abstract.html Yann LeCun, John Denker, and Sara Solla. 1989. Optimal Brain Damage. In Advances in Neural Information Processing Systems, Vol. 2. Morgan-Kaufmann. 
https:\/\/papers.nips.cc\/paper\/1989\/hash\/6c9882bbac1c7093bd25041881277658-Abstract.html"},{"key":"e_1_3_2_2_36_1","volume-title":"Gshard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668","author":"Lepikhin Dmitry","year":"2020","unstructured":"Dmitry Lepikhin , HyoukJoong Lee , Yuanzhong Xu , Dehao Chen , Orhan Firat , Yanping Huang , Maxim Krikun , Noam Shazeer , and Zhifeng Chen . 2020 . Gshard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668 (2020). Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 2020. Gshard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668 (2020)."},{"key":"e_1_3_2_2_37_1","volume-title":"Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Advances in neural information processing systems","author":"Li Shiyang","year":"2019","unstructured":"Shiyang Li , Xiaoyong Jin , Yao Xuan , Xiyou Zhou , Wenhu Chen , Yu-Xiang Wang , and Xifeng Yan . 2019. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Advances in neural information processing systems , Vol. 32 ( 2019 ). Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. 2019. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Advances in neural information processing systems, Vol. 32 (2019)."},{"key":"e_1_3_2_2_38_1","volume-title":"Gated transformer networks for multivariate time series classification. arXiv preprint arXiv:2103.14438","author":"Liu Minghao","year":"2021","unstructured":"Minghao Liu , Shengqi Ren , Siyuan Ma , Jiahui Jiao , Yizhou Chen , Zhiguang Wang , and Wei Song . 2021a. 
Gated transformer networks for multivariate time series classification. arXiv preprint arXiv:2103.14438 ( 2021 ). Minghao Liu, Shengqi Ren, Siyuan Ma, Jiahui Jiao, Yizhou Chen, Zhiguang Wang, and Wei Song. 2021a. Gated transformer networks for multivariate time series classification. arXiv preprint arXiv:2103.14438 (2021)."},{"key":"e_1_3_2_2_39_1","volume-title":"International Conference on Learning Representations.","author":"Liu Shizhan","year":"2021","unstructured":"Shizhan Liu , Hang Yu , Cong Liao , Jianguo Li , Weiyao Lin , Alex X Liu , and Schahram Dustdar . 2021 b. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting . In International Conference on Learning Representations. Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X Liu, and Schahram Dustdar. 2021b. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In International Conference on Learning Representations."},{"key":"e_1_3_2_2_40_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning. PMLR, 6682--6691","author":"Malach Eran","year":"2020","unstructured":"Eran Malach , Gilad Yehudai , Shai Shalev-Schwartz , and Ohad Shamir . 2020 . Proving the Lottery Ticket Hypothesis: Pruning is All You Need . In Proceedings of the 37th International Conference on Machine Learning. PMLR, 6682--6691 . ISSN: 2640--3498. Eran Malach, Gilad Yehudai, Shai Shalev-Schwartz, and Ohad Shamir. 2020. Proving the Lottery Ticket Hypothesis: Pruning is All You Need. In Proceedings of the 37th International Conference on Machine Learning. PMLR, 6682--6691. ISSN: 2640--3498."},{"key":"e_1_3_2_2_41_1","volume-title":"LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148","author":"Malhotra Pankaj","year":"2016","unstructured":"Pankaj Malhotra , Anusha Ramakrishnan , Gaurangi Anand , Lovekesh Vig , Puneet Agarwal , and Gautam Shroff . 2016. 
LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148 ( 2016 ). Pankaj Malhotra, Anusha Ramakrishnan, Gaurangi Anand, Lovekesh Vig, Puneet Agarwal, and Gautam Shroff. 2016. LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148 (2016)."},{"key":"e_1_3_2_2_42_1","volume-title":"International Conference on Aerospace System Science and Engineering. Springer, 351--362","author":"Meng Hengyu","year":"2019","unstructured":"Hengyu Meng , Yuxuan Zhang , Yuanxiang Li , and Honghua Zhao . 2019 . Spacecraft anomaly detection via transformer reconstruction error . In International Conference on Aerospace System Science and Engineering. Springer, 351--362 . Hengyu Meng, Yuxuan Zhang, Yuanxiang Li, and Honghua Zhao. 2019. Spacecraft anomaly detection via transformer reconstruction error. In International Conference on Aerospace System Science and Engineering. Springer, 351--362."},{"key":"e_1_3_2_2_43_1","volume-title":"Advances in Neural Information Processing Systems","volume":"33","author":"Orseau Laurent","year":"2020","unstructured":"Laurent Orseau , Marcus Hutter , and Omar Rivasplata . 2020 . Logarithmic Pruning is All You Need . In Advances in Neural Information Processing Systems , Vol. 33 . Curran Associates, Inc., 2925--2934. Laurent Orseau, Marcus Hutter, and Omar Rivasplata. 2020. Logarithmic Pruning is All You Need. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 2925--2934."},{"key":"e_1_3_2_2_44_1","volume-title":"Advances in Neural Information Processing Systems","volume":"33","author":"Pensia Ankit","year":"2020","unstructured":"Ankit Pensia , Shashank Rajput , Alliot Nagle , Harit Vishwakarma , and Dimitris Papailiopoulos . 2020 . Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient . In Advances in Neural Information Processing Systems , Vol. 33 . Curran Associates, Inc., 2599--2610. 
Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, and Dimitris Papailiopoulos. 2020. Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 2599--2610."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107281"},{"key":"e_1_3_2_2_46_1","volume-title":"Sinong Wang, and Jie Tang.","author":"Qiu Jiezhong","year":"2019","unstructured":"Jiezhong Qiu , Hao Ma , Omer Levy , Scott Wen-tau Yih , Sinong Wang, and Jie Tang. 2019 . Blockwise self-attention for long document understanding. arXiv preprint arXiv:1911.02972 (2019). Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, and Jie Tang. 2019. Blockwise self-attention for long document understanding. arXiv preprint arXiv:1911.02972 (2019)."},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"crossref","unstructured":"Vivek Ramanujan Mitchell Wortsman Aniruddha Kembhavi Ali Farhadi and Mohammad Rastegari. 2020. What's Hidden in a Randomly Weighted Neural Network?. In Computer Vision and Pattern Recognition (CVPR).  Vivek Ramanujan Mitchell Wortsman Aniruddha Kembhavi Ali Farhadi and Mohammad Rastegari. 2020. What's Hidden in a Randomly Weighted Neural Network?. In Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR42600.2020.01191"},{"key":"e_1_3_2_2_48_1","volume-title":"Amsterdam, The Netherlands","author":"Rastegari Mohammad","year":"2016","unstructured":"Mohammad Rastegari , Vicente Ordonez , Joseph Redmon , and Ali Farhadi . 2016 . Xnor-net: Imagenet classification using binary convolutional neural networks. In Computer Vision-ECCV 2016: 14th European Conference , Amsterdam, The Netherlands , October 11-14, 2016, Proceedings, Part IV. Springer, 525--542. Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. Xnor-net: Imagenet classification using binary convolutional neural networks. 
In Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV. Springer, 525--542."},
{"key":"e_1_3_2_2_49_1","volume-title":"Self-attention for raw optical satellite time series classification. ISPRS journal of photogrammetry and remote sensing","author":"Ru\u00dfwurm Marc","year":"2020","unstructured":"Marc Ru\u00dfwurm and Marco K\u00f6rner. 2020. Self-attention for raw optical satellite time series classification. ISPRS journal of photogrammetry and remote sensing, Vol. 169 (2020), 421--435."},
{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},
{"key":"e_1_3_2_2_51_1","first-page":"13016","article-title":"Timeseries anomaly detection using temporal hierarchical one-class network","volume":"33","author":"Shen Lifeng","year":"2020","unstructured":"Lifeng Shen, Zhuocong Li, and James Kwok. 2020. Timeseries anomaly detection using temporal hierarchical one-class network. Advances in Neural Information Processing Systems, Vol. 33 (2020), 13016--13026.","journal-title":"Advances in Neural Information Processing Systems"},
{"key":"e_1_3_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098144"},
{"key":"e_1_3_2_2_53_1","volume-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition. (Sept","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. (Sept. 2014). https:\/\/doi.org\/10.48550\/arXiv.1409.1556"},
{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330672"},
{"key":"e_1_3_2_2_55_1","volume-title":"International Conference on Machine Learning. PMLR, 10096--10106","author":"Tan Mingxing","year":"2021","unstructured":"Mingxing Tan and Quoc Le. 2021. Efficientnetv2: Smaller models and faster training. In International Conference on Machine Learning. PMLR, 10096--10106."},
{"key":"e_1_3_2_2_56_1","volume-title":"Efficient Transformers: A Survey. ACM Comput. Surv. (apr","author":"Tay Yi","year":"2022","unstructured":"Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. 2022. Efficient Transformers: A Survey. ACM Comput. Surv. (apr 2022). https:\/\/doi.org\/10.1145\/3530811 Just Accepted."},
{"key":"e_1_3_2_2_57_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning. PMLR, 10347--10357","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herve Jegou. 2021. Training data-efficient image transformers & distillation through attention. In Proceedings of the 38th International Conference on Machine Learning. PMLR, 10347--10357. https:\/\/proceedings.mlr.press\/v139\/touvron21a.html ISSN: 2640--3498."},
{"key":"e_1_3_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.14778\/3514061.3514067"},
{"key":"e_1_3_2_2_59_1","volume-title":"Advances in Neural Information Processing Systems","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},
{"key":"e_1_3_2_2_60_1","volume-title":"Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768","author":"Wang Sinong","year":"2020","unstructured":"Sinong Wang, Belinda Z Li, Madian Khabsa, Han Fang, and Hao Ma. 2020. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768 (2020)."},
{"key":"e_1_3_2_2_61_1","volume-title":"Transformers in Time Series: A Survey. (March","author":"Wen Qingsong","year":"2022","unstructured":"Qingsong Wen, Tian Zhou, Chaoli Zhang, Weiqi Chen, Ziqing Ma, Junchi Yan, and Liang Sun. 2022. Transformers in Time Series: A Survey. (March 2022). http:\/\/arxiv.org\/abs\/2202.07125 Number: arXiv:2202.07125 arXiv:2202.07125 [cs, eess, stat]."},
{"key":"e_1_3_2_2_62_1","first-page":"22419","article-title":"Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting","volume":"34","author":"Wu Haixu","year":"2021","unstructured":"Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems, Vol. 34 (2021), 22419--22430.","journal-title":"Advances in Neural Information Processing Systems"},
{"key":"e_1_3_2_2_63_1","volume-title":"Adversarial sparse transformer for time series forecasting. Advances in neural information processing systems","author":"Wu Sifan","year":"2020","unstructured":"Sifan Wu, Xi Xiao, Qianggang Ding, Peilin Zhao, Ying Wei, and Junzhou Huang. 2020. Adversarial sparse transformer for time series forecasting. Advances in neural information processing systems, Vol. 33 (2020), 17105--17115."},
{"key":"e_1_3_2_2_64_1","volume-title":"Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=LzQQ89U1qm_","author":"Xu Jiehui","year":"2022","unstructured":"Jiehui Xu, Haixu Wu, Jianmin Wang, and Mingsheng Long. 2022. Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=LzQQ89U1qm_"},
{"key":"e_1_3_2_2_65_1","volume-title":"Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning. (April","author":"Yang Tien-Ju","year":"2017","unstructured":"Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. 2017. Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning. (April 2017). http:\/\/arxiv.org\/abs\/1611.05128 arXiv: 1611.05128."},
{"key":"e_1_3_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/EMC2-NIPS53020.2019.00016"},
{"key":"e_1_3_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467401"},
{"key":"e_1_3_2_2_68_1","volume-title":"International Conference on Machine Learning. PMLR, 12437--12446","author":"Zhang Hang","year":"2021","unstructured":"Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, and Weizhu Chen. 2021. Poolingformer: Long document modeling with pooling attention. In International Conference on Machine Learning. PMLR, 12437--12446."},
{"key":"e_1_3_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i12.17325"},
{"key":"e_1_3_2_2_70_1","volume-title":"Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research","volume":"27286","author":"Zhou Tian","year":"2022","unstructured":"Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting. In Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (Eds.). PMLR, 27268--27286. https:\/\/proceedings.mlr.press\/v162\/zhou22g.html"}],
"event":{"name":"KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"],"location":"Long Beach CA USA","acronym":"KDD '23"},"container-title":["Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599508","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580305.3599508","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:52Z","timestamp":1750178272000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599508"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,4]]},"references-count":70,"alternative-id":["10.1145\/3580305.3599508","10.1145\/3580305"],"URL":"https:\/\/doi.org\/10.1145\/3580305.3599508","relation":{},"subject":[],"published":{"date-parts":[[2023,8,4]]},"assertion":[{"value":"2023-08-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}