{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T22:32:49Z","timestamp":1763591569387},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2019,10]]},"abstract":"<jats:p>\n            Deep learning models have been used to support analytics beyond simple aggregation, where deeper and wider models have been shown to yield great results. These models consume a huge amount of memory and computational operations. However, most of the large-scale industrial applications are often computational budget constrained. In practice, the peak workload of inference service could be 10x higher than the average cases, with the presence of unpredictable extreme cases. Lots of computational resources could be wasted during off-peak hours and the system may crash when the workload exceeds system capacity. How to support deep learning services with dynamic workload cost-efficiently remains a challenging problem. In this paper, we address the challenge with a general and novel training scheme called\n            <jats:italic>model slicing<\/jats:italic>\n            , which enables deep learning models to provide predictions within the prescribed computational resource budget dynamically.\n            <jats:italic>Model slicing<\/jats:italic>\n            could be viewed as an elastic computation solution without requiring more computational resources. Succinctly, each layer in the model is divided into\n            <jats:italic>groups<\/jats:italic>\n            of contiguous block of basic components (i.e. neurons in dense layers and channels in convolutional layers), and then partially ordered relation is introduced to these groups by enforcing that groups participated in each forward pass always starts from the\n            <jats:italic>first<\/jats:italic>\n            group to the\n            <jats:italic>dynamically-determined rightmost<\/jats:italic>\n            group. Trained by dynamically indexing the rightmost group with a single parameter\n            <jats:italic>slice rate<\/jats:italic>\n            , the network is engendered to build up group-wise and residual representation. 
Then, during inference, a sub-model with fewer groups, whose computation cost is roughly quadratic in the width controlled by the <jats:italic>slice rate<\/jats:italic>, can be readily deployed for efficiency. Extensive experiments show that models trained with <jats:italic>model slicing<\/jats:italic> can effectively support on-demand workloads with elastic inference cost.<\/jats:p>","DOI":"10.14778\/3364324.3364325","type":"journal-article","created":{"date-parts":[[2020,9,11]],"date-time":"2020-09-11T03:16:00Z","timestamp":1599794160000},"page":"86-99","source":"Crossref","is-referenced-by-count":11,"title":["Model slicing for supporting complex analytics with elastic inference cost and resource constraints"],"prefix":"10.14778","volume":"13","author":[{"given":"Shaofeng","family":"Cai","sequence":"first","affiliation":[{"name":"National University of Singapore"}]},{"given":"Gang","family":"Chen","sequence":"additional","affiliation":[{"name":"Zhejiang University"}]},{"given":"Beng Chin","family":"Ooi","sequence":"additional","affiliation":[{"name":"National University of Singapore"}]},{"given":"Jinyang","family":"Gao","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]}],"member":"320","published-online":{"date-parts":[[2019,10]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007263.3007279"},{"key":"e_1_2_1_2_1","volume-title":"Isbnet: Instance-aware selective branching network. arXiv preprint arXiv:1905.04849","author":"Cai S.","year":"2019","unstructured":"S. Cai, Y. Shu, W. Wang, and B. C. Ooi. Isbnet: Instance-aware selective branching network. arXiv preprint arXiv:1905.04849, 2019."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3190659"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056097"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080819"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_2_1_7_1","first-page":"2285","volume-title":"International Conference on Machine Learning","author":"Chen W.","year":"2015","unstructured":"W. Chen, J. Wilson, S. Tyree, K. Weinberger, and Y. Chen. Compressing neural networks with the hashing trick. In International Conference on Machine Learning, pages 2285--2294, 2015."},{"key":"e_1_2_1_8_1","volume-title":"On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259","author":"Cho K.","year":"2014","unstructured":"K. Cho, B. Van Merri\u00ebnboer, D. Bahdanau, and Y. Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.195"},{"key":"e_1_2_1_10_1","volume-title":"Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830","author":"Courbariaux M.","year":"2016","unstructured":"M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830, 2016."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_12_1","first-page":"1269","volume-title":"Advances in neural information processing systems","author":"Denton E. L.","year":"2014","unstructured":"E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in neural information processing systems, pages 1269--1277, 2014."},{"key":"e_1_2_1_13_1","volume-title":"Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677","author":"Goyal P.","year":"2017","unstructured":"P. Goyal, P. Doll\u00e1r, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017."},{"key":"e_1_2_1_14_1","volume-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149","author":"Han S.","year":"2015","unstructured":"S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015."},{"key":"e_1_2_1_15_1","first-page":"1135","volume-title":"Advances in neural information processing systems","author":"Han S.","year":"2015","unstructured":"S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural network. In Advances in neural information processing systems, pages 1135--1143, 2015."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"e_1_2_1_18_1","volume-title":"Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531","author":"Hinton G.","year":"2015","unstructured":"G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015."},
{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_20_1","volume-title":"Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861","author":"Howard A. G.","year":"2017","unstructured":"A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33013812"},{"key":"e_1_2_1_22_1","volume-title":"Multi-scale dense convolutional networks for efficient prediction. arXiv preprint arXiv:1703.09844","author":"Huang G.","year":"2017","unstructured":"G. Huang, D. Chen, T. Li, F. Wu, L. Van Der Maaten, and K. Q. Weinberger. Multi-scale dense convolutional networks for efficient prediction. arXiv preprint arXiv:1703.09844, 2017."},{"key":"e_1_2_1_23_1","first-page":"4700","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Huang G.","year":"2017","unstructured":"G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700--4708, 2017."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_39"},{"key":"e_1_2_1_25_1","volume-title":"Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 mb model size. arXiv preprint arXiv:1602.07360","author":"Iandola F. N.","year":"2016","unstructured":"F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 mb model size. arXiv preprint arXiv:1602.07360, 2016."},{"key":"e_1_2_1_26_1","volume-title":"Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167","author":"Ioffe S.","year":"2015","unstructured":"S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015."},
{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137664"},{"key":"e_1_2_1_28_1","first-page":"3146","volume-title":"Advances in Neural Information Processing Systems","author":"Ke G.","year":"2017","unstructured":"G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu. Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pages 3146--3154, 2017."},{"key":"e_1_2_1_29_1","volume-title":"Citeseer","author":"Krizhevsky A.","year":"2009","unstructured":"A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009."},{"key":"e_1_2_1_30_1","first-page":"1097","volume-title":"Advances in neural information processing systems","author":"Krizhevsky A.","year":"2012","unstructured":"A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097--1105, 2012."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2935694.2935698"},{"key":"e_1_2_1_32_1","volume-title":"Fractalnet: Ultra-deep neural networks without residuals. arXiv preprint arXiv:1605.07648","author":"Larsson G.","year":"2016","unstructured":"G. Larsson, M. Maire, and G. Shakhnarovich. Fractalnet: Ultra-deep neural networks without residuals. arXiv preprint arXiv:1605.07648, 2016."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915235"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098011"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.298"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00216"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2010-343"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2807410"},{"key":"e_1_2_1_39_1","volume-title":"Using the output embedding to improve language models. arXiv preprint arXiv:1608.05859","author":"Press O.","year":"2016","unstructured":"O. Press and L. Wolf. Using the output embedding to improve language models. arXiv preprint arXiv:1608.05859, 2016."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242643"},{"key":"e_1_2_1_41_1","volume-title":"Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538","author":"Shazeer N.","year":"2017","unstructured":"N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017."},
{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610492"},{"key":"e_1_2_1_43_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan K.","year":"2014","unstructured":"K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.61"},{"key":"e_1_2_1_45_1","first-page":"550","volume-title":"Advances in Neural Information Processing Systems","author":"Veit A.","year":"2016","unstructured":"A. Veit, M. J. Wilber, and S. Belongie. Residual networks behave like ensembles of relatively shallow networks. In Advances in Neural Information Processing Systems, pages 550--558, 2016."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2009934"},{"key":"e_1_2_1_47_1","volume-title":"Idk cascades: Fast deep learning by learning not to overthink. arXiv preprint arXiv:1706.00885","author":"Wang X.","year":"2017","unstructured":"X. Wang, Y. Luo, D. Crankshaw, A. Tumanov, F. Yu, and J. E. Gonzalez. Idk cascades: Fast deep learning by learning not to overthink. arXiv preprint arXiv:1706.00885, 2017."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_25"},{"key":"e_1_2_1_49_1","first-page":"2074","volume-title":"Advances in Neural Information Processing Systems","author":"Wen W.","year":"2016","unstructured":"W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems, pages 2074--2082, 2016."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_1"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_2_1_52_1","volume-title":"Slimmable neural networks. arXiv preprint arXiv:1812.08928","author":"Yu J.","year":"2018","unstructured":"J. Yu, L. Yang, N. Xu, J. Yang, and T. Huang. Slimmable neural networks. arXiv preprint arXiv:1812.08928, 2018."},{"key":"e_1_2_1_53_1","volume-title":"Wide residual networks. arXiv preprint arXiv:1605.07146","author":"Zagoruyko S.","year":"2016","unstructured":"S. Zagoruyko and N. Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016."},
{"key":"e_1_2_1_54_1","volume-title":"Recurrent neural network regularization. arXiv preprint arXiv:1409.2329","author":"Zaremba W.","year":"2014","unstructured":"W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329, 2014."},{"key":"e_1_2_1_55_1","volume-title":"Materialization optimizations for feature selection workloads. ACM Transactions on Database Systems (TODS), 41(1):2","author":"Zhang C.","year":"2016","unstructured":"C. Zhang, A. Kumar, and C. R\u00e9. Materialization optimizations for feature selection workloads. ACM Transactions on Database Systems (TODS), 41(1):2, 2016."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00716"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447819"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.5555\/2919332.2919877"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3364324.3364325","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:59:10Z","timestamp":1672225150000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3364324.3364325"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10]]},"references-count":58,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,10]]}},"alternative-id":["10.14778\/3364324.3364325"],"URL":"https:\/\/doi.org\/10.14778\/3364324.3364325","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2019,10]]}}}
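To make the mechanism in the abstract concrete, here is a minimal sketch in PyTorch of how model slicing could work, based only on what the abstract states: each layer's units are split into contiguous groups, every forward pass activates groups from the first up to a rightmost group indexed by a single slice rate, training samples the slice rate per mini-batch, and inference picks the rate that fits the budget. All names here (SlicedLinear, SliceNet, slice_rate, the candidate rates) are illustrative assumptions, not the authors' implementation; residual connections and normalization details that the full paper presumably handles are omitted.

```python
# Illustrative sketch only -- not the paper's code. Assumes PyTorch.
import math
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class SlicedLinear(nn.Module):
    """Dense layer whose input/output units are split into G contiguous groups.

    With slice rate s in (0, 1], only the first ceil(s * G) groups participate
    in the forward pass. Both the active input width and the active output
    width shrink with s, so per-layer compute is roughly quadratic in the
    width controlled by the slice rate, matching the abstract's claim.
    """

    def __init__(self, in_features: int, out_features: int, num_groups: int = 4):
        super().__init__()
        assert in_features % num_groups == 0 and out_features % num_groups == 0
        self.full = nn.Linear(in_features, out_features)
        self.num_groups = num_groups
        self.in_per_group = in_features // num_groups
        self.out_per_group = out_features // num_groups

    def forward(self, x: torch.Tensor, slice_rate: float) -> torch.Tensor:
        # Dynamically determined rightmost group: groups 1..g are active,
        # always starting from the first group (the partial order).
        g = max(1, math.ceil(slice_rate * self.num_groups))
        in_w = g * self.in_per_group
        out_w = g * self.out_per_group
        # Slice the weight matrix to the active input/output groups only.
        return F.linear(x[:, :in_w], self.full.weight[:out_w, :in_w],
                        self.full.bias[:out_w])


class SliceNet(nn.Module):
    """Two sliced layers; the head reads only the first output group,
    which every slice rate keeps active."""

    def __init__(self, width: int = 64, num_classes: int = 10):
        super().__init__()
        self.fc1 = SlicedLinear(width, width)
        self.fc2 = SlicedLinear(width, width)
        self.head = nn.Linear(width // 4, num_classes)

    def forward(self, x: torch.Tensor, slice_rate: float = 1.0) -> torch.Tensor:
        h = F.relu(self.fc1(x, slice_rate))
        h = F.relu(self.fc2(h, slice_rate))
        return self.head(h[:, : self.head.in_features])


# Training: sample a slice rate per mini-batch so every sub-model, from the
# first group up to the sampled rightmost group, gets trained.
model = SliceNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
candidate_rates = [0.25, 0.5, 0.75, 1.0]  # assumed candidates, for illustration
for _ in range(100):
    x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(x, random.choice(candidate_rates)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference: choose the slice rate that fits the current resource budget.
with torch.no_grad():
    cheap = model(torch.randn(1, 64), slice_rate=0.25)  # ~(1/4)^2 of full per-layer FLOPs
    full = model(torch.randn(1, 64), slice_rate=1.0)
```

The key design point this sketch illustrates is that active groups always form a prefix of each layer: a narrower sub-model is literally a contiguous slice of the full model's weights, so serving it under a tight budget requires no extra parameters or retraining, which is what makes the computation elastic.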