{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T12:22:53Z","timestamp":1772713373220,"version":"3.50.1"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"14","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2013,9]]},"abstract":"<jats:p>Machine learning algorithms are widely used today for analytical tasks such as data cleaning, data categorization, or data filtering. At the same time, the rise of social media motivates recent uptake in large scale graph processing. Both categories of algorithms are dominated by iterative subtasks, i.e., processing steps which are executed repetitively until a convergence condition is met. Optimizing cluster resource allocations among multiple workloads of iterative algorithms motivates the need for estimating their runtime, which in turn requires: i) predicting the number of iterations, and ii) predicting the processing time of each iteration. As both parameters depend on the characteristics of the dataset and on the convergence function, estimating their values before execution is difficult.<\/jats:p>\n          <jats:p>This paper proposes PREDIcT, an experimental methodology for predicting the runtime of iterative algorithms. PREDIcT uses sample runs for capturing the algorithm's convergence trend and per-iteration key input features that are well correlated with the actual processing requirements of the complete input dataset. Using this combination of characteristics we predict the runtime of iterative algorithms, including algorithms with very different runtime patterns among subsequent iterations. Our experimental evaluation of multiple algorithms on scale-free graphs shows a relative prediction error of 10%-30% for predicting runtime, including algorithms with up to 100\u00d7 runtime variability among consecutive iterations.<\/jats:p>","DOI":"10.14778\/2556549.2556553","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1678-1689","source":"Crossref","is-referenced-by-count":51,"title":["PREDIcT"],"prefix":"10.14778","volume":"6","author":[{"given":"Adrian Daniel","family":"Popescu","sequence":"first","affiliation":[{"name":"Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"}]},{"given":"Andrey","family":"Balmin","sequence":"additional","affiliation":[{"name":"IBM Almaden Research Center, San Jose, CA and GraphSQL, Mountain View, CA"}]},{"given":"Vuk","family":"Ercegovac","sequence":"additional","affiliation":[{"name":"IBM Almaden Research Center, San Jose, CA and Google Inc."}]},{"given":"Anastasia","family":"Ailamaki","sequence":"additional","affiliation":[{"name":"Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"}]}],"member":"320","published-online":{"date-parts":[[2013,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1951365.1951367"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447834"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1137856.1137880"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/16894.16859"},{"key":"e_1_2_1_5_1","volume-title":"Scaling Datalog for Machine Learning on Big Data. CoRR, abs\/1203.0160","author":"Bu Y.","year":"2012","unstructured":"Y. Bu, V. R. Borkar, M. J. Carey, J. Rosen, N. Polyzotis, T. Condie, M. Weimer, and R. Ramakrishnan. Scaling Datalog for Machine Learning on Big Data. CoRR, abs\/1203.0160, 2012."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920881"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1609\/icwsm.v4i1.14033"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066223"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276343"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687576"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/1331939.1331940"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989359"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350245"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.130"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1833515.1833840"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-004-1127-9"},{"key":"e_1_2_1_17_1","volume-title":"The Elements of Statistical Learning: Data Mining, Inference and Prediction","author":"Hastie T.","year":"2008","unstructured":"T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction, Second Edition. Springer, 2008."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402707.3402746"},{"key":"e_1_2_1_19_1","first-page":"261","volume-title":"CIDR","author":"Herodotou H.","year":"2011","unstructured":"H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu. Starfish: A Self-tuning System for Big Data Analytics. In CIDR, pages 261-272, 2011."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2009.14"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465369"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1080\/15427951.2004.10129091"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150479"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367591"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350269"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/11687238_54"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807223"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2012.66"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/645927.672349"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/79173.79181"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1998582.1998637"},{"key":"e_1_2_1_37_1","volume-title":"CIDR","author":"Wang G.","year":"2013","unstructured":"G. Wang, W. Xie, A. J. Demers, and J. Gehrke. Asynchronous Large-Scale Graph Processing Made Easy. In CIDR, 2013."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/2023718.2023720"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/1863103.1863113"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2556549.2556553","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T22:35:17Z","timestamp":1729722917000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2556549.2556553"}},"subtitle":["towards predicting the runtime of large scale iterative analytics"],"short-title":[],"issued":{"date-parts":[[2013,9]]},"references-count":36,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2013,9]]}},"alternative-id":["10.14778\/2556549.2556553"],"URL":"https:\/\/doi.org\/10.14778\/2556549.2556553","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2013,9]]}}}