{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T03:55:57Z","timestamp":1760586957460},"reference-count":18,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2019,8]]},"abstract":"<jats:p>\n            Developing machine learning (ML) applications is similar to developing traditional software --- it is often an iterative process in which developers navigate within a rich space of\n            <jats:italic>requirements, design decisions, implementations, empirical quality<\/jats:italic>\n            , and\n            <jats:italic>performance<\/jats:italic>\n            . In traditional software development, software engineering is the field of study which provides principled guidelines for this iterative process. However, as of today, the counterpart of \"software engineering for ML\" is largely missing --- developers of ML applications are left with powerful tools (e.g., TensorFlow and PyTorch) but little guidance regarding the development lifecycle itself.\n          <\/jats:p>\n          <jats:p>\n            In this paper, we view the management of ML development\n            <jats:italic>life-cycles<\/jats:italic>\n            from a\n            <jats:italic>data management<\/jats:italic>\n            perspective. We demonstrate two closely related systems, ease.ml\/ci and ease.ml\/meter, that provide\n            <jats:italic>some<\/jats:italic>\n            \"principled guidelines\" for ML application development: ci is a continuous integration engine for ML models and meter is a \"profiler\" for controlling\n            <jats:italic>overfitting<\/jats:italic>\n            of ML models. 
Both systems focus on managing the \"statistical generalization power\" of datasets used for assessing the quality of ML applications, namely, the\n            <jats:italic>validation set<\/jats:italic>\n            and the\n            <jats:italic>test set<\/jats:italic>\n            . By demonstrating these two systems we hope to spawn further discussions within our community on building this new type of data management system for statistical generalization.\n          <\/jats:p>","DOI":"10.14778\/3352063.3352110","type":"journal-article","created":{"date-parts":[[2019,9,18]],"date-time":"2019-09-18T18:36:11Z","timestamp":1568831771000},"page":"1962-1965","source":"Crossref","is-referenced-by-count":6,"title":["Ease.ml\/ci and Ease.ml\/meter in action"],"prefix":"10.14778","volume":"12","author":[{"given":"Cedric","family":"Renggli","sequence":"first","affiliation":[{"name":"ETH Zurich"}]},{"given":"Frances Ann","family":"Hubis","sequence":"additional","affiliation":[{"name":"ETH Zurich"}]},{"given":"Bojan","family":"Karla\u0161","sequence":"additional","affiliation":[{"name":"ETH Zurich"}]},{"given":"Kevin","family":"Schawinski","sequence":"additional","affiliation":[{"name":"Modulos AG"}]},{"given":"Wentao","family":"Wu","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]},{"given":"Ce","family":"Zhang","sequence":"additional","affiliation":[{"name":"ETH Zurich"}]}],"member":"320","published-online":{"date-parts":[[2019,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Amazon SageMaker. aws.amazon.com\/blogs\/aws\/sagemaker-automatic-model-tuning."},{"key":"e_1_2_1_2_1","unstructured":"Google Cloud AutoML. cloud.google.com\/automl."},{"key":"e_1_2_1_3_1","unstructured":"Microsoft Azure Machine Learning. azure.microsoft.com\/en-us\/blog\/announcing-automated-ml-capability-in-azure-machine-learning. 
"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1093\/mnras\/sty1398"},{"key":"e_1_2_1_5_1","volume-title":"ICML","author":"Blum A.","year":"2015","unstructured":"A. Blum and M. Hardt. The Ladder: A reliable leaderboard for machine learning competitions. In ICML, 2015."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2746539.2746580"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5616"},{"key":"e_1_2_1_8_1","volume-title":"Ease.ml\/meter: Quantitative overfitting management for human-in-the-loop ml application development. CoRR, abs\/1906.00299","author":"Hubis F. A.","year":"2019","unstructured":"F. A. Hubis et al. Ease.ml\/meter: Quantitative overfitting management for human-in-the-loop ml application development. CoRR, abs\/1906.00299, 2019."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3240493"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685095"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3187009.3177737"},{"key":"e_1_2_1_12_1","volume-title":"SysML","author":"Renggli C.","year":"2019","unstructured":"C. Renggli et al. Continuous integration of machine learning models with ease.ml\/ci: Towards a rigorous yet practical treatment. SysML, 2019."},{"key":"e_1_2_1_13_1","volume-title":"SemEval@NAACL-HLT","author":"Rotsztejn J.","year":"2018","unstructured":"J. Rotsztejn et al. ETH-DS3Lab at semeval-2018 task 7: Effectively combining recurrent and convolutional neural networks for relation classification and extraction. In SemEval@NAACL-HLT, 2018."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1093\/mnrasl\/slx008"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1051\/0004-6361\/201833800"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1093\/mnras\/sty764"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1101\/256792"},{"key":"e_1_2_1_18_1","volume-title":"ICML","author":"Zhang H.","year":"2017","unstructured":"H. Zhang et al. ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. In ICML, 2017."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3352063.3352110","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:38:46Z","timestamp":1672223926000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3352063.3352110"}},"subtitle":["towards data management for statistical generalization"],"short-title":[],"issued":{"date-parts":[[2019,8]]},"references-count":18,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2019,8]]}},"alternative-id":["10.14778\/3352063.3352110"],"URL":"https:\/\/doi.org\/10.14778\/3352063.3352110","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2019,8]]}}}