{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T05:49:46Z","timestamp":1719899386322},"reference-count":33,"publisher":"IGI Global","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,10,1]]},"abstract":"<p>An important component of near-real-time data warehouses is the near-real-time integration layer. One important element in near-real-time data integration is the join of a continuous input data stream with a disk-based relation. For high-throughput streams, stream-based algorithms, such as Mesh Join (MESHJOIN), can be used. However, in MESHJOIN the performance of the algorithm is inversely proportional to the size of disk-based relation. The Index Nested Loop Join (INLJ) can be set up so that it processes stream input, and can deal with intermittences in the update stream but it has low throughput. This paper introduces a robust stream-based join algorithm called Hybrid Join (HYBRIDJOIN), which combines the two approaches. A theoretical result shows that HYBRIDJOIN is asymptotically as fast as the fastest of both algorithms. The authors present performance measurements of the implementation. In experiments using synthetic data based on a Zipfian distribution, HYBRIDJOIN performs significantly better for typical parameters of the Zipfian distribution, and in general performs in accordance with the theoretical model while the other two algorithms are unacceptably slow under different settings.<\/p>","DOI":"10.4018\/jdwm.2011100102","type":"journal-article","created":{"date-parts":[[2011,10,19]],"date-time":"2011-10-19T16:11:46Z","timestamp":1319040706000},"page":"21-42","source":"Crossref","is-referenced-by-count":17,"title":["HYBRIDJOIN for Near-Real-Time Data Warehousing"],"prefix":"10.4018","volume":"7","author":[{"given":"M. 
Asif","family":"Naeem","sequence":"first","affiliation":[{"name":"The University of Auckland, New Zealand"}]},{"given":"Gillian","family":"Dobbie","sequence":"additional","affiliation":[{"name":"The University of Auckland, New Zealand"}]},{"given":"Gerald","family":"Weber","sequence":"additional","affiliation":[{"name":"The University of Auckland, New Zealand"}]}],"member":"2432","reference":[{"key":"jdwm.2011100102-0","first-page":"130","author":"C.Anderson","year":"2006","journal-title":"The long tail: Why the future of business is selling less of more"},{"key":"jdwm.2011100102-1","doi-asserted-by":"publisher","DOI":"10.1145\/603867.603884"},{"key":"jdwm.2011100102-2","doi-asserted-by":"crossref","unstructured":"Chakraborty, A., & Singh, A. (2009). A partition-based approach to support streaming updates over persistent data in an active data warehouse. In Proceedings of the IEEE International Symposium on Parallel & Distributed Processing (pp. 1-11).","DOI":"10.1109\/IPDPS.2009.5161064"},{"key":"jdwm.2011100102-3","doi-asserted-by":"crossref","unstructured":"Golab, L., & \u00d6zsu, M. T. (2003). Processing sliding window multi-joins in continuous queries over data streams. In Proceedings of the 29th International Conference on Very Large Data Bases, Berlin, Germany (pp. 500-511).","DOI":"10.1016\/B978-012722442-8\/50051-3"},{"key":"jdwm.2011100102-4","doi-asserted-by":"publisher","DOI":"10.4018\/jdwm.2009010101"},{"key":"jdwm.2011100102-5","doi-asserted-by":"publisher","DOI":"10.4018\/jdwm.2009080702"},{"key":"jdwm.2011100102-6","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/4472.001.0001","volume":"Vol. 
18","author":"A.Gupta","year":"1999","journal-title":"Maintenance of materialized views: Problems, techniques, and applications"},{"key":"jdwm.2011100102-7","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-006-0017-y"},{"key":"jdwm.2011100102-8","doi-asserted-by":"publisher","DOI":"10.1145\/304181.304209"},{"key":"jdwm.2011100102-9","doi-asserted-by":"crossref","unstructured":"Karakasidis, A., Vassiliadis, P., & Pitoura, E. (2005). ETL queues for active data warehousing. In Proceedings of the 2nd International Workshop on Information Quality in Information Systems, Baltimore, MD (pp. 28-39).","DOI":"10.1145\/1077501.1077509"},{"key":"jdwm.2011100102-10","doi-asserted-by":"publisher","DOI":"10.4018\/jdwm.2005010102"},{"key":"jdwm.2011100102-11","first-page":"400","volume":"Vol. 3","author":"D. E.Knuth","year":"1998","journal-title":"The art of computer programming"},{"key":"jdwm.2011100102-12","unstructured":"Labio, W., & Garcia-Molina, H. (1996). Efficient snapshot differential algorithms for data warehousing. In Proceedings of the 22nd International Conference on Very Large Data Bases, San Francisco, CA (pp. 63-74)."},{"key":"jdwm.2011100102-13","doi-asserted-by":"publisher","DOI":"10.1145\/335191.335379"},{"key":"jdwm.2011100102-14","unstructured":"Labio, W., Yang, J., Cui, Y., Garcia-Molina, H., & Widom, J. (2000). Performance issues in incremental warehouse maintenance. In Proceedings of the 26th International Conference on Very Large Data Bases, San Francisco, CA (pp. 461-472)."},{"key":"jdwm.2011100102-15","unstructured":"Lawrence, R. (2005). Early hash join: A configurable algorithm for the efficient and early production of join results. In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway (pp. 841-852)."},{"key":"jdwm.2011100102-16","doi-asserted-by":"crossref","unstructured":"Mokbel, M. F., Lu, M., & Aref, W. G. (2004). Hash-merge join: A non-blocking join algorithm for producing fast and early join results. 
In Proceedings of the 20th International Conference on Data Engineering, Washington, DC (pp. 251-263).","DOI":"10.1109\/ICDE.2004.1320002"},{"key":"jdwm.2011100102-17","doi-asserted-by":"crossref","unstructured":"Naeem, M. A., Dobbie, G., & Weber, G. (2008). An event-based near real-time data integration architecture. In Proceedings of the 12th Enterprise Distributed Object Computing Conference Workshops, Washington, DC (pp. 401-404).","DOI":"10.1109\/EDOCW.2008.14"},{"key":"jdwm.2011100102-18","unstructured":"Nguyen, T. M. (2003). Zero-latency data warehousing for heterogeneous data sources and continuous data streams. In Proceedings of the Fifth International Conference on Information Integration and Web-based Applications Services, Jakarta, Indonesia (pp. 55-64)."},{"key":"jdwm.2011100102-19","doi-asserted-by":"publisher","DOI":"10.4018\/jdwm.2005100102"},{"key":"jdwm.2011100102-20","doi-asserted-by":"publisher","DOI":"10.1007\/s10619-009-7054-7"},{"key":"jdwm.2011100102-21","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.27"},{"key":"jdwm.2011100102-22","doi-asserted-by":"crossref","unstructured":"Polyzotis, N., Skiadopoulos, S., Vassiliadis, P., Simitsis, A., & Frantzell, N. E. (2007). Supporting streaming updates in an active data warehouse. In Proceedings of the 23rd International Conference on Data Engineering, Istanbul, Turkey (pp. 476-485).","DOI":"10.1109\/ICDE.2007.367893"},{"key":"jdwm.2011100102-23","first-page":"337","author":"R.Ramakrishnan","year":"1999","journal-title":"Database management systems"},{"key":"jdwm.2011100102-24","doi-asserted-by":"publisher","DOI":"10.1145\/6314.6315"},{"key":"jdwm.2011100102-25","doi-asserted-by":"crossref","unstructured":"Thiele, M., Fischer, U., & Lehner, W. (2007). Partition-based workload scheduling in living data warehouse environments. In Proceedings of the ACM Tenth International Workshop on Data Warehousing and OLAP, Lisbon, Portugal (pp. 
57-64).","DOI":"10.1145\/1317331.1317342"},{"key":"jdwm.2011100102-26","doi-asserted-by":"publisher","DOI":"10.4018\/jdwm.2009070103"},{"issue":"27","key":"jdwm.2011100102-27","article-title":"XJoin: A reactively-scheduled pipelined join operator.","volume":"23","author":"T.Urhan","year":"2000","journal-title":"A Quarterly Bulletin of the Computer Society of the IEEE Technical Committee on Data Engineering"},{"key":"jdwm.2011100102-28","doi-asserted-by":"publisher","DOI":"10.4018\/jdwm.2009070101"},{"key":"jdwm.2011100102-29","doi-asserted-by":"crossref","unstructured":"Wilschut, A. N., & Apers, P. M. G. (1990). Pipelining in query execution. In Proceedings of the International Conference on Databases, Parallel Architectures and their Applications, Miami Beach, FL (p. 562).","DOI":"10.1109\/PARBSE.1990.77227"},{"key":"jdwm.2011100102-30","doi-asserted-by":"crossref","unstructured":"Wilschut, A. N., & Apers, P. M. G. (1991). Dataflow query execution in a parallel main-memory environment. In Proceedings of the First International Conference on Parallel and Distributed Information Systems, Miami, FL (pp. 68-77).","DOI":"10.1109\/PDIS.1991.183069"},{"key":"jdwm.2011100102-31","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4379(01)00049-7"},{"key":"jdwm.2011100102-32","doi-asserted-by":"crossref","unstructured":"Zhuge, Y., Garc\u00eda-Molina, H., Hammer, J., & Widom, J. (1995). View maintenance in a warehousing environment. In Proceedings of the ACM SIGMOD International Conference on Management of Data, San Jose, CA (pp. 
316-327).","DOI":"10.1145\/568271.223848"}],"container-title":["International Journal of Data Warehousing and Mining"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=58636","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T19:06:46Z","timestamp":1654110406000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/jdwm.2011100102"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2011,10,1]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2011,10]]}},"URL":"https:\/\/doi.org\/10.4018\/jdwm.2011100102","relation":{},"ISSN":["1548-3924","1548-3932"],"issn-type":[{"value":"1548-3924","type":"print"},{"value":"1548-3932","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,10,1]]}}}