{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T22:52:48Z","timestamp":1757631168473,"version":"3.44.0"},"reference-count":9,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:p>Nowadays, Apache Hive has been widely used for large-scale data analysis applications in many organizations. Various visual analytical tools are developed to help Hive users quickly analyze the query execution process and identify the performance bottleneck of executed queries. However, existing tools mostly focus on showing the time usage of query sub-components (jobs and operators) but fail to provide enough evidence to analyze the root reasons for the slow execution progress. To tackle this problem, we develop a visual analytical system DHive to visualize and analyze the query execution progress via dataflow analysis. DHive shows the dataflow during query execution at multiple levels: query level, job level and task level, which enable users to identify the key jobs\/tasks and explain their time usage by linking them to the auxiliary information such as the system configuration and hardware status. We demonstrate the effectiveness of DHive by two cases in a production cluster. DHive is open-source at https:\/\/github.com\/DBGroup-SUSTech\/DHive.git.<\/jats:p>","DOI":"10.14778\/3611540.3611605","type":"journal-article","created":{"date-parts":[[2023,9,15]],"date-time":"2023-09-15T11:32:37Z","timestamp":1694777557000},"page":"3998-4001","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["DHive: Query Execution Performance Analysis via Dataflow in Apache Hive"],"prefix":"10.14778","volume":"16","author":[{"given":"Chaozu","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Southern University of Science and Technology"}]},{"given":"Qiaomu","family":"Shen","sequence":"additional","affiliation":[{"name":"Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology"}]},{"given":"Bo","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Southern University of Science and Technology"}]}],"member":"320","published-online":{"date-parts":[[2023,8]]},"reference":[{"unstructured":"2023. Tez UI. https:\/\/tez.apache.org\/tez-ui.html","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","first-page":"12","article-title":"Cui and et al. 2011. Textflow: Towards better understanding of evolving topics in text","volume":"17","author":"Weiwei","year":"2011","unstructured":"Weiwei Cui and et al. 2011. Textflow: Towards better understanding of evolving topics in text. TVCG 17, 12 (2011), 2412--2421.","journal-title":"TVCG"},{"key":"e_1_2_1_3_1","volume-title":"Theia: Visual signatures for problem diagnosis in large hadoop clusters. In LISA. 33--42.","author":"Garduno Elmer","year":"2012","unstructured":"Elmer Garduno and et al. 2012. Theia: Visual signatures for problem diagnosis in large hadoop clusters. In LISA. 33--42."},{"volume-title":"Proceedings of the 2022 ACM SIGMOD. 2417--2420","author":"Haotian","unstructured":"Haotian Liu and et al. 2022. GHive: A Demonstration of GPU-Accelerated Query Processing in Apache Hive. In Proceedings of the 2022 ACM SIGMOD. 2417--2420.","key":"e_1_2_1_4_1"},{"key":"e_1_2_1_5_1","first-page":"8","article-title":"Ma and et al. 2020. Diagnosing root causes of intermittent slow queries in cloud databases","volume":"13","author":"Minghua","year":"2020","unstructured":"Minghua Ma and et al. 2020. Diagnosing root causes of intermittent slow queries in cloud databases. Proceedings of the VLDB Endowment 13, 8 (2020), 1176--1189.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_6_1","volume-title":"Perfopticon: Visual query analysis for distributed databases. In Computer Graphics Forum","author":"Moritz Dominik","year":"2015","unstructured":"Dominik Moritz and et al. 2015. Perfopticon: Visual query analysis for distributed databases. In Computer Graphics Forum, Vol. 34. Wiley Online Library, 71--80."},{"key":"e_1_2_1_7_1","first-page":"6","article-title":"Muelder and et al. 2016. Visual analysis of cloud computing performance using behavioral lines","volume":"22","author":"Chris","year":"2016","unstructured":"Chris Muelder and et al. 2016. Visual analysis of cloud computing performance using behavioral lines. TVCG 22, 6 (2016), 1694--1704.","journal-title":"TVCG"},{"key":"e_1_2_1_8_1","volume-title":"QEVIS: Multi-grained Visualization of Distributed Query Execution","author":"Shen Qiaomu","year":"2023","unstructured":"Qiaomu Shen and et al. 2023. QEVIS: Multi-grained Visualization of Distributed Query Execution. IEEE VIS (2023)."},{"volume-title":"Proceedings of the 2016 ACM SIGMOD. 1599--1614","author":"Young Dong","unstructured":"Dong Young Yoon and et.al. 2016. Dbsherlock: A performance diagnostic tool for transactional databases. In Proceedings of the 2016 ACM SIGMOD. 1599--1614.","key":"e_1_2_1_9_1"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3611540.3611605","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T22:34:03Z","timestamp":1757543643000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3611540.3611605"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8]]},"references-count":9,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["10.14778\/3611540.3611605"],"URL":"https:\/\/doi.org\/10.14778\/3611540.3611605","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2023,8]]},"assertion":[{"value":"2023-08-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}