{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T10:04:22Z","timestamp":1781172262116,"version":"3.54.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2020,2,1]],"date-time":"2020-02-01T00:00:00Z","timestamp":1580515200000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2020,2,1]],"date-time":"2020-02-01T00:00:00Z","timestamp":1580515200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"funder":[{"DOI":"10.13039\/501100007601","name":"Horizon 2020","doi-asserted-by":"publisher","award":["644869"],"award-info":[{"award-number":["644869"]}],"id":[{"id":"10.13039\/501100007601","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Form. Asp. Comput."],"published-print":{"date-parts":[[2020,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Apache Spark is probably the most widely adopted framework for developing big-data batch applications and for executing them on a\ncluster of (virtual) machines. In general, the more resources\n(machines) one uses, the faster applications execute, but there is\ncurrently no adequate means to determine the proper size of a Spark\ncluster given time constraints, or to foresee execution times given\nthe number of employed machines. One can only run these applications\nand use her\/his experience to size the cluster and predict expected\nexecution times. Wrong estimation of execution times can lead to\ncostly overruns and overly long executions, thus calling for\nanalytic sizing\/prediction techniques that provide precise time\nguarantees. This paper addresses this problem by proposing a\nsolution based on model-checking. The approach exploits a directed\nacyclic graph (DAG) to abstract the structure of the execution flows\nof Spark programs, annotates each node (Spark stage) with\nexecution-related data, and formulates the identification of the\nglobal execution time as a reachability problem. To avoid the\nwell-known state space explosion problem, the paper also proposes a\ntechnique to reduce the size of generated abstract models. This\nresults in a significant decrease in used memory and\/or verification\ntime making our approach feasible for predicting the execution time\nof Spark applications given the resources available. The benefits of\nthe proposed reduction technique are evaluated by using both timed\nautomata and constraint LTL over clocks logic  to formally\nencode and analyze generated models. The approach is also\nsuccessfully validated on some realistic case studies. Since the\noptimization is not Spark-specific, we claim that it can be applied\nto a wide range of applications whose underlying model can be\nabstracted as a DAG.<\/jats:p>","DOI":"10.1007\/s00165-020-00505-4","type":"journal-article","created":{"date-parts":[[2020,3,6]],"date-time":"2020-03-06T13:32:38Z","timestamp":1583501558000},"page":"33-70","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Using formal verification to evaluate theexecution time of Spark applications"],"prefix":"10.1145","volume":"32","author":[{"given":"L.","family":"Baresi","sequence":"first","affiliation":[{"name":"Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Golgi 42, 20133, Milan, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5137-940X","authenticated-orcid":false,"given":"M. M.","family":"Bersani","sequence":"additional","affiliation":[{"name":"Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Golgi 42, 20133, Milan, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"F.","family":"Marconi","sequence":"additional","affiliation":[{"name":"Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Golgi 42, 20133, Milan, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"G.","family":"Quattrocchi","sequence":"additional","affiliation":[{"name":"Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Golgi 42, 20133, Milan, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"M.","family":"Rossi","sequence":"additional","affiliation":[{"name":"Dipartimento di Meccanica, Politecnico di Milano, via La Masa 1, 20156, Milano, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","reference":[{"key":"e_1_2_1_2_1_2","doi-asserted-by":"publisher","DOI":"10.1006\/inco.1993.1024"},{"key":"e_1_2_1_2_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(94)90010-8"},{"key":"e_1_2_1_2_3_2","unstructured":"Brito A Ardagna D Blanquer I Evangelinou A Barbierato E Gribaudo M Almeida J Couto AP Braga T (2017) D3.4 EUBra-BIGSEA QoS infrastructure services intermediate version. Technical report EUBra-BIGSEA consortium"},{"key":"e_1_2_1_2_4_2","first-page":"118","volume":"58","author":"Biere A","year":"2003","journal-title":"Bounded model checking. Adv Comput"},{"key":"e_1_2_1_2_5_2","unstructured":"Behrmann G David A Larsen KG Hakansson J Petterson P Yi W Hendriks M (2006) Uppaal 4.0. In: Proceedings of the 3rd international conference on the quantitative evaluation of systems QEST '06 Washington DC USA. IEEE Computer Society pp 125\u2013126"},{"key":"e_1_2_1_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0055357"},{"key":"e_1_2_1_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2010.39"},{"key":"e_1_2_1_2_8_2","unstructured":"Bradley S Henderson W Kendall D (1999) Using timed automata for response time analysis of distributed real-time systems. In: 24th IFAC\/IFIP workshop on real-time programming pp 143\u2013148"},{"key":"e_1_2_1_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/1059816.1059823"},{"key":"e_1_2_1_2_10_2","doi-asserted-by":"crossref","unstructured":"Bouyer P (2009) Model-checking timed temporal logics. Electron Notes Theor Comput Sci 231:323\u2013341. Proceedings of the 5th workshop on methods for modalities (M4M5 2007)","DOI":"10.1016\/j.entcs.2009.02.044"},{"key":"e_1_2_1_2_11_2","doi-asserted-by":"crossref","unstructured":"Brin S Page L (1998) The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the international world-wide web conference (WWW) pp 107\u2013117","DOI":"10.1016\/S0169-7552(98)00110-X"},{"key":"e_1_2_1_2_12_2","doi-asserted-by":"crossref","unstructured":"Baresi L Pourhashem Kallehbasti MM Rossi M (2016) How bit-vector logic can help improve the verification of LTL specifications over infinite domains. In: Proceedings of the 31st annual ACM symposium on applied computing pp 1666\u20131673","DOI":"10.1145\/2851613.2851833"},{"key":"e_1_2_1_2_13_2","doi-asserted-by":"crossref","unstructured":"Baresi L Quattrocchi G (2018) Towards vertically scalable spark applications. In: Euro-Par 2018: parallel processing workshops. Springer","DOI":"10.1007\/978-3-030-10549-5_9"},{"key":"e_1_2_1_2_14_2","doi-asserted-by":"crossref","unstructured":"Bersani MM Rossi M San Pietro P (2017) A logical characterization of timed regular languages. Theor Comput Sci 658:46--59","DOI":"10.1016\/j.tcs.2016.07.020"},{"key":"e_1_2_1_2_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00236-015-0229-y"},{"key":"e_1_2_1_2_16_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009715923555"},{"key":"e_1_2_1_2_17_2","doi-asserted-by":"crossref","unstructured":"Cimatti A Clarke EM Giunchiglia E Giunchiglia F Pistore M Roveri M Sebastiani R Tacchella A (2002) Nusmv 2: an opensource tool for symbolic model checking. In: Computer aided verification 14th international conference CAV 2002 Copenhagen Denmark 27\u201331 July 2002 Proceedings pp 359\u2013364","DOI":"10.1007\/3-540-45657-0_29"},{"key":"e_1_2_1_2_18_2","doi-asserted-by":"crossref","unstructured":"Clarke EC Emerson EA Jha S Prasad Sistla A (1998) Symmetry reductions in model checking. In: Hu AJ Vardi MY (eds) Computer aided verification 10th international conference CAV '98 Vancouver BC Canada 28 June\u20132 July 1998 Proceedings. Lecture notes in computer science vol 1427. Springer pp 147\u2013158","DOI":"10.1007\/BFb0028741"},{"key":"e_1_2_1_2_19_2","volume-title":"Model checking","author":"Clarke EM","year":"1999"},{"key":"e_1_2_1_2_20_2","doi-asserted-by":"crossref","unstructured":"Corbett JC (Jul 1996) Timing analysis of ada tasking programs. IEEE Trans Softw Eng 22(7):461\u2013483","DOI":"10.1109\/32.538604"},{"key":"e_1_2_1_2_21_2","unstructured":"DAG-ver Project repository. github.com\/deib-polimi\/DAG-ver 2019"},{"key":"e_1_2_1_2_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ic.2006.09.006"},{"key":"e_1_2_1_2_23_2","doi-asserted-by":"publisher","DOI":"10.2307\/1969503"},{"key":"e_1_2_1_2_24_2","doi-asserted-by":"crossref","unstructured":"Donaldson AF Miller A Parker D (2009) Language-level symmetry reduction for probabilistic model checking. In: QEST 2009 sixth international conference on the quantitative evaluation of systems Budapest Hungary 13\u201316 Sept 2009. IEEE Computer Society pp 289\u2013298","DOI":"10.1109\/QEST.2009.21"},{"issue":"1","key":"e_1_2_1_2_25_2","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/S0304-3975(00)00102-X","article-title":"Well-structured transition systems everywhere!","volume":"256","author":"Finkel A","year":"2001","journal-title":"Theor Comput Sci"},{"key":"e_1_2_1_2_26_2","doi-asserted-by":"crossref","unstructured":"Godefroid P (1996) Partial-order methods for the verification of concurrent systems: an approach to the state-explosion problem. Lecture notes in computer science vol 1032. Springer Berlin","DOI":"10.1007\/3-540-60761-7"},{"key":"e_1_2_1_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3092819.3092824"},{"key":"e_1_2_1_2_28_2","doi-asserted-by":"crossref","unstructured":"Hazewinkel M (1987) Encyclopaedia of mathematics (1). Encyclopaedia of mathematics: an updated and annotated translation of the soviet ``Mathematical Encyclopaedia''. Springer Berlin","DOI":"10.1007\/978-94-015-1239-8"},{"key":"e_1_2_1_2_29_2","first-page":"265","volume-title":"The theory of hybrid automata","author":"Henzinger TA","year":"2000"},{"key":"e_1_2_1_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/32.588521"},{"key":"e_1_2_1_2_31_2","unstructured":"Ikiz S Garg VK (2004) Online algorithms for Dilworth's chain partition. Technical report Parallel and Distributed Systems Laboratory Department of Electrical and Computer Engineering University of Texas at Austin"},{"key":"e_1_2_1_2_32_2","doi-asserted-by":"crossref","unstructured":"Jang K Sherry J Ballani H Moncaster T (2015) Silo: predictable message latency in the cloud. In: Proceedings of the 2015 ACM conference on special interest group on data communication SIGCOMM 2015 London UK 17\u201321 Aug 2015 pp 435\u2013448","DOI":"10.1145\/2829988.2787479"},{"key":"e_1_2_1_2_33_2","doi-asserted-by":"crossref","unstructured":"Kc K Anyanwu K (2010) Scheduling hadoop jobs to meet deadlines. In: Proceedings of the IEEE 2nd international conference on cloud computing technology and science. IEEE","DOI":"10.1109\/CloudCom.2010.97"},{"key":"e_1_2_1_2_34_2","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1109\/WFCS.2004.1377759","article-title":"Timed automata approach to real time distributed system verification","volume":"2004","author":"Krakora J","year":"2004","journal-title":"Proceedings of the IEEE international workshop on factory communication systems"},{"key":"e_1_2_1_2_35_2","doi-asserted-by":"crossref","unstructured":"Li S Hu S Wang S Su L Abdelzaher T Gupta I Pace R (2014) WOHA: deadline-aware map-reduce workflow scheduling framework over hadoop clusters. In: Proceedings of the IEEE 34th international conference on distributed computing systems. IEEE","DOI":"10.1109\/ICDCS.2014.18"},{"issue":"4","key":"e_1_2_1_2_36_2","doi-asserted-by":"crossref","first-page":"522","DOI":"10.1016\/j.jal.2014.07.005","article-title":"Constraint LTL satisfiability checking without automata","volume":"12","author":"Bersani MM","year":"2014","journal-title":"J Appl Log"},{"key":"e_1_2_1_2_37_2","unstructured":"MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Le Cam LM Neyman J (eds) Proceedings of the fifth Berkeley symposium on mathematical statistics and probability vol 1. University of California Press pp 281\u2013297"},{"key":"e_1_2_1_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/1132960.1132962"},{"key":"e_1_2_1_2_39_2","doi-asserted-by":"crossref","unstructured":"Marconi F Quattrocchi G Baresi L Bersani MM Rossi M (2018) On the timed analysis of big-data applications. In: Dutle A Mu\u00f1oz CA Narkawicz A (eds) NASA formal methods\u201410th international symposium NFM 2018 Newport News VA USA 17\u201319 Apr 2018 Proceedings. Lecture notes in computer science vol 10811. Springer pp 315\u2013332","DOI":"10.1007\/978-3-319-77935-5_22"},{"key":"e_1_2_1_2_40_2","unstructured":"Ousterhout K Rasti R Ratnasamy S Shenker S Chun B (2015) Making sense of performance in data analytics frameworks. In: Proceedings of the 12th USENIX conference on networked systems design and implementation. USENIX"},{"key":"e_1_2_1_2_41_2","volume-title":"DICE simulation tools: final version","author":"Perez D","year":"2017"},{"key":"e_1_2_1_2_42_2","doi-asserted-by":"crossref","unstructured":"Palencia JC Gonzalez Harbour M (1998) Schedulability analysis for tasks with static and dynamic offsets. In: Proceedings of the IEEE real-time systems symposium pp 26\u201337","DOI":"10.1109\/REAL.1998.739728"},{"key":"e_1_2_1_2_43_2","unstructured":"Politecnico di Milano (2019) The Zot Bounded Model\/Satisfiability Checker. github.com\/fm-polimi\/zot"},{"key":"e_1_2_1_2_44_2","doi-asserted-by":"publisher","DOI":"10.5555\/2944214.2944278"},{"key":"e_1_2_1_2_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2013.05.001"},{"issue":"2","key":"e_1_2_1_2_46_2","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1016\/j.tcs.2005.11.018","article-title":"Scheduling with timed automata","volume":"354","author":"Yasmina Abdedda\u00efm Y","year":"2006","journal-title":"Theor Comput Sci"},{"key":"e_1_2_1_2_47_2","doi-asserted-by":"crossref","unstructured":"Yu J Chen H Hu F (2015) SASM: improving Spark performance with adaptive skew mitigation. In: 2015 IEEE international conference on progress in informatics and computing (PIC)","DOI":"10.1109\/PIC.2015.7489818"}],"container-title":["Formal Aspects of Computing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s00165-020-00505-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s00165-020-00505-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s00165-020-00505-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1007\/s00165-020-00505-4","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,1,7]],"date-time":"2022-01-07T06:57:21Z","timestamp":1641538641000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1007\/s00165-020-00505-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2]]},"references-count":47,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,2]]}},"alternative-id":["10.1007\/s00165-020-00505-4"],"URL":"https:\/\/doi.org\/10.1007\/s00165-020-00505-4","relation":{},"ISSN":["0934-5043","1433-299X"],"issn-type":[{"value":"0934-5043","type":"print"},{"value":"1433-299X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2]]},"assertion":[{"value":"18 February 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 January 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 February 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}