HYBRIDJOIN for Near Real-time Data Warehousing

Naeem, M
Dobbie, G
Weber, G
Item type
Commissioned Report
University of Auckland

To make timely and effective decisions, businesses need the latest information from data warehouse repositories. Keeping these repositories up to date with end-user updates requires near real-time data integration. An important phase of near real-time data integration is data transformation, in which the stream of updates is joined with disk-based master data. The stream-based algorithm Hybrid Join (HYBRIDJOIN) performs well in general but has not been optimized for real-world conditions. In real-world markets, a few products sell far more frequently than the rest, so a large number of sales transactions relate to a small portion of the master data. In the transformation phase, to join the input stream of sales transactions with disk-based master data, HYBRIDJOIN loads that frequently accessed part of the master data from disk each time it is needed, which significantly increases the disk access cost and degrades performance. In contrast, X-HYBRIDJOIN keeps that part of the master data permanently in memory, significantly reducing the disk access overhead. To validate these arguments and analyze the performance of X-HYBRIDJOIN, an experimental study is conducted.
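
The effect of keeping the frequently accessed master data in memory can be illustrated with a toy simulation. The sketch below is not the published HYBRIDJOIN or X-HYBRIDJOIN algorithm: it assumes one simulated partition load per stream tuple, and the partition size, the key layout, and all function names are hypothetical choices made only for illustration.

```python
from collections import Counter

PARTITION_SIZE = 4  # hypothetical number of master keys per disk partition


def partition_of(key):
    """Map a master-data key to the disk partition assumed to hold it."""
    return key // PARTITION_SIZE


def count_disk_reads(stream_keys, pinned_partitions=frozenset()):
    """Count simulated partition loads needed to join every stream key.

    Partitions in `pinned_partitions` are assumed to be resident in memory,
    so looking up a key in them costs no disk read.
    """
    reads = 0
    for key in stream_keys:
        if partition_of(key) not in pinned_partitions:
            reads += 1  # simulate loading one partition from disk
    return reads


def hottest_partition(stream_keys):
    """Return the partition referenced most often by the stream."""
    counts = Counter(partition_of(k) for k in stream_keys)
    return counts.most_common(1)[0][0]


# Skewed toy stream: keys 0-3 (partition 0) dominate, like best-selling products.
stream = [0, 1, 0, 2, 9, 0, 3, 1, 14, 0]

baseline = count_disk_reads(stream)
pinned = count_disk_reads(stream, pinned_partitions={hottest_partition(stream)})
print(baseline, pinned)
```

With this skewed input, pinning the single most referenced partition removes most of the simulated disk reads. A real stream join would amortize each partition load over many buffered stream tuples, so the absolute numbers here say nothing about the performance reported in the study.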

Software Engineering, The University of Auckland. (2010, July). Research Report Series (TR Number: UoA-SE-2010-2). Retrieved from (see Publisher's Version).
Rights statement
Material which is clearly indicated as owned by the University of Auckland may only be used for "not-for-profit" educational purposes or private research and study in accordance with the Copyright Act 1994, provided that textual and graphical content are not altered and that the University's ownership of the material is acknowledged. The University reserves the right to withdraw this permission at any time.