{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T04:37:08Z","timestamp":1780547828527,"version":"3.54.1"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2018,6]]},"abstract":"<jats:p>\n            Modern machine learning services and systems are complicated data systems --- the process of designing such systems is an art of compromising between\n            <jats:italic>functionality<\/jats:italic>\n            ,\n            <jats:italic>performance<\/jats:italic>\n            , and\n            <jats:italic>quality<\/jats:italic>\n            . Providing different levels of system supports for different functionalities, such as automatic feature engineering, model selection and ensemble, and hyperparameter tuning, could improve the quality, but also introduce additional cost and system complexity. In this paper, we try to facilitate the process of asking the following type of questions:\n            <jats:italic>How much will the users lose if we remove the support of functionality x from a machine learning service?<\/jats:italic>\n          <\/jats:p>\n          <jats:p>Answering this type of questions using existing datasets, such as the UCI datasets, is challenging. The main contribution of this work is a novel dataset, MLBench, harvested from Kaggle competitions. Unlike existing datasets, MLBench contains not only the raw features for a machine learning task, but also those used by the winning teams of Kaggle competitions. The winning features serve as a baseline of best human effort that enables multiple ways to measure the quality of machine learning services that cannot be supported by existing datasets, such as relative ranking on Kaggle and relative accuracy compared with best-effort systems.<\/jats:p>\n          <jats:p>We then conduct an empirical study using MLBench to understand example machine learning services from Amazon and Microsoft Azure, and showcase how MLBench enables a comparative study revealing the strength and weakness of these existing machine learning services quantitatively and systematically. The full version of this paper can be found at arxiv.org\/abs\/1707.09562<\/jats:p>","DOI":"10.14778\/3231751.3231770","type":"journal-article","created":{"date-parts":[[2018,7,27]],"date-time":"2018-07-27T12:21:07Z","timestamp":1532694067000},"page":"1220-1232","source":"Crossref","is-referenced-by-count":20,"title":["MLbench"],"prefix":"10.14778","volume":"11","author":[{"given":"Yu","family":"Liu","sequence":"first","affiliation":[{"name":"ETH Zurich"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hantian","family":"Zhang","sequence":"additional","affiliation":[{"name":"ETH Zurich"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Luyuan","family":"Zeng","sequence":"additional","affiliation":[{"name":"ETH Zurich"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wentao","family":"Wu","sequence":"additional","affiliation":[{"name":"Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ce","family":"Zhang","sequence":"additional","affiliation":[{"name":"ETH Zurich"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2018,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"https:\/\/studio.azureml.net\/.  https:\/\/studio.azureml.net\/."},{"key":"e_1_2_1_2_1","unstructured":"https:\/\/aws.amazon.com\/aml\/.  https:\/\/aws.amazon.com\/aml\/."},{"key":"e_1_2_1_3_1","unstructured":"https:\/\/www.kaggle.com\/.  https:\/\/www.kaggle.com\/."},{"key":"e_1_2_1_4_1","unstructured":"https:\/\/github.com\/KKPMW\/Kaggle-MLSP-Schizo-3rd.  https:\/\/github.com\/KKPMW\/Kaggle-MLSP-Schizo-3rd."},{"key":"e_1_2_1_5_1","unstructured":"https:\/\/github.com\/saffsd\/kaggle-stumbleupon2013.  https:\/\/github.com\/saffsd\/kaggle-stumbleupon2013."},{"key":"e_1_2_1_6_1","unstructured":"https:\/\/github.com\/Cardal\/Kaggle WestNileVirus.  https:\/\/github.com\/Cardal\/Kaggle WestNileVirus."},{"key":"e_1_2_1_7_1","unstructured":"https:\/\/github.com\/owenzhang\/Kaggle-AmazonChallenge2013.  https:\/\/github.com\/owenzhang\/Kaggle-AmazonChallenge2013."},{"key":"e_1_2_1_8_1","unstructured":"https:\/\/www.datarobot.com\/blog\/datarobot-the-2014-kdd-cup\/.  https:\/\/www.datarobot.com\/blog\/datarobot-the-2014-kdd-cup\/."},{"key":"e_1_2_1_9_1","unstructured":"https:\/\/github.com\/gramolin\/flavours-of-physics.  https:\/\/github.com\/gramolin\/flavours-of-physics."},{"key":"e_1_2_1_10_1","unstructured":"http:\/\/docs.aws.amazon.com\/machine-learning\/latest\/dg\/learning-algorithm.html.  http:\/\/docs.aws.amazon.com\/machine-learning\/latest\/dg\/learning-algorithm.html."},{"key":"e_1_2_1_11_1","unstructured":"https:\/\/aws.amazon.com\/sagemaker\/.  https:\/\/aws.amazon.com\/sagemaker\/."},{"key":"e_1_2_1_12_1","unstructured":"http:\/\/www.tpc.org\/information\/benchmarks.asp.  http:\/\/www.tpc.org\/information\/benchmarks.asp."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007515423169"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(96)00142-2"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143865"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807128.1807152"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022627411411"},{"key":"e_1_2_1_20_1","first-page":"269","volume-title":"The Benchmark Handbook for Database and Transaction Systems","author":"DeWitt D. J.","year":"1993","unstructured":"D. J. DeWitt . The Wisconsin benchmark: Past, present, and future . In The Benchmark Handbook for Database and Transaction Systems , pages 269 -- 316 . 1993 . D. J. DeWitt. The Wisconsin benchmark: Past, present, and future. In The Benchmark Handbook for Database and Transaction Systems, pages 269--316. 1993."},{"key":"e_1_2_1_21_1","volume-title":"UCI machine learning repository","author":"Dheeru D.","year":"2017","unstructured":"D. Dheeru and E. Karra Taniskidou . UCI machine learning repository , 2017 . D. Dheeru and E. Karra Taniskidou. UCI machine learning repository, 2017."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2347736.2347755"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2697065"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1504"},{"key":"e_1_2_1_25_1","volume-title":"glmnet: Lasso and elastic-net regularized generalized linear models. R package version, 1(4)","author":"Friedman J.","year":"2009","unstructured":"J. Friedman , T. Hastie , and R. Tibshirani . glmnet: Lasso and elastic-net regularized generalized linear models. R package version, 1(4) , 2009 . J. Friedman, T. Hastie, and R. Tibshirani. glmnet: Lasso and elastic-net regularized generalized linear models. R package version, 1(4), 2009."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-006-6226-1"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"e_1_2_1_28_1","volume-title":"Neural Networks: A Comprehensive Foundation","author":"Haykin S.","year":"1998","unstructured":"S. Haykin . Neural Networks: A Comprehensive Foundation . Prentice Hall PTR , 2 nd edition, 1998 . S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, 2nd edition, 1998.","edition":"2"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244301753683717"},{"key":"e_1_2_1_30_1","first-page":"278","volume-title":"ICDAR","volume":"1","author":"Ho T. K.","year":"1995","unstructured":"T. K. Ho . Random decision forests . In ICDAR , volume 1 , pages 278 -- 282 , 1995 . T. K. Ho. Random decision forests. In ICDAR, volume 1, pages 278--282, 1995."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3187009.3177737"},{"key":"e_1_2_1_32_1","volume-title":"ArXiv","author":"Liu Y.","year":"2017","unstructured":"Y. Liu , H. Zhang , L. Zeng , W. Wu , and C. Zhang . How good are machine learning clouds for binary classification with good features ? ArXiv , 2017 . Y. Liu, H. Zhang, L. Zeng, W. Wu, and C. Zhang. How good are machine learning clouds for binary classification with good features? ArXiv, 2017."},{"key":"e_1_2_1_33_1","first-page":"134","volume-title":"ALTA","author":"Lui M.","year":"2012","unstructured":"M. Lui . Feature stacking for sentence classification in evidence-based medicine . In ALTA , pages 134 -- 138 , 2012 . M. Lui. Feature stacking for sentence classification in evidence-based medicine. In ALTA, pages 134--138, 2012."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-012-2118-7"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1198\/016214507000001120"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022643204877"},{"key":"e_1_2_1_37_1","volume-title":"gbm: Generalized boosted regression models. R package version, 1(3):55","author":"Ridgeway G.","year":"2006","unstructured":"G. Ridgeway gbm: Generalized boosted regression models. R package version, 1(3):55 , 2006 . G. Ridgeway et al. gbm: Generalized boosted regression models. R package version, 1(3):55, 2006."},{"key":"e_1_2_1_38_1","first-page":"234","volume-title":"NIPS","author":"Shotton J.","year":"2013","unstructured":"J. Shotton , T. Sharp , P. Kohli , S. Nowozin , J. M. Winn , and A. Criminisi . Decision jungles: Compact and rich models for classification . In NIPS , pages 234 -- 242 , 2013 . J. Shotton, T. Sharp, P. Kohli, S. Nowozin, J. M. Winn, and A. Criminisi. Decision jungles: Compact and rich models for classification. In NIPS, pages 234--242, 2013."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3231751.3231770","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:43:35Z","timestamp":1672224215000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3231751.3231770"}},"subtitle":["benchmarking machine learning services against human experts"],"short-title":[],"issued":{"date-parts":[[2018,6]]},"references-count":38,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2018,6]]}},"alternative-id":["10.14778\/3231751.3231770"],"URL":"https:\/\/doi.org\/10.14778\/3231751.3231770","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2018,6]]}}}