
X-HYBRIDJOIN for Near-Real-Time Data Warehousing

Naeem, MA; Dobbie, G; Weber, G
X-HYBRIDJOIN_for_Near-Real-Time_Data_Warehousing.pdf (436.6Kb)
Permanent link
http://hdl.handle.net/10292/4052
Abstract
In order to make timely and effective decisions, businesses need the latest information from data warehouse repositories. To keep these repositories up to date with respect to end user updates, near-real-time data integration is required. An important phase in near-real-time data integration is data transformation, where the stream of updates is joined with disk-based master data. The stream-based algorithm Mesh Join (MESHJOIN) has been proposed to amortize disk access over a fast stream. MESHJOIN makes no assumptions about the data distribution. In real-world applications, however, skewed distributions can be found, e.g., certain products are sold more frequently than the remainder of the products. The question arises of how much performance MESHJOIN loses by not adapting to data skew. In this paper we perform a rigorous experimental study analyzing the possible performance improvements while considering typical data distributions. For this purpose we design an algorithm, Extended Hybrid Join (X-HYBRIDJOIN), that is complementary to MESHJOIN in that it can adapt to data skew and stores parts of the master data in memory permanently, reducing the disk access overhead significantly. We compare the performance of X-HYBRIDJOIN against the performance of MESHJOIN. We take several precautions to make sure the comparison is adequate and focuses on the utilization of data skew. The experiments show that considering data skew offers substantial room for performance gains that cannot be used by non-adaptive approaches such as MESHJOIN.
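
The idea in the abstract, that keeping frequently joined master data in memory reduces disk access when the stream is skewed, can be illustrated with a toy sketch. This is not MESHJOIN or X-HYBRIDJOIN as defined in the paper: it performs a simple per-tuple lookup and only counts simulated disk reads. The function name `join_stream`, the product data, and the skewed stream are all hypothetical, chosen for illustration.

```python
from collections import Counter


def join_stream(stream, master, hot_keys=frozenset()):
    """Join each stream key with its master record.

    Keys in hot_keys are served from an in-memory copy of the master
    data; every other lookup is counted as one simulated disk read.
    """
    memory_part = {k: master[k] for k in hot_keys}  # pinned in memory
    disk_reads = 0
    output = []
    for key in stream:
        if key in memory_part:
            record = memory_part[key]
        else:
            record = master[key]  # stands in for a disk access
            disk_reads += 1
        output.append((key, record))
    return output, disk_reads


# Hypothetical master data: product id -> product name.
master = {i: f"product-{i}" for i in range(1, 101)}

# Hypothetical skewed stream: product i appears 100 // i times,
# so a few products account for most of the updates.
stream = [i for i in range(1, 101) for _ in range(100 // i)]

# Pin the 10 most frequent keys in memory (assumed known here;
# a real system would have to detect them at run time).
hot = frozenset(k for k, _ in Counter(stream).most_common(10))

_, reads_no_cache = join_stream(stream, master)
_, reads_with_cache = join_stream(stream, master, hot)
print(reads_no_cache, reads_with_cache)
```

In this toy setting, pinning only 10 of the 100 master records removes more than half of the simulated disk reads, which is the kind of gain a skew-aware approach can exploit and a non-adaptive one cannot.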
Date
2011
Source
28th British National Conference on Databases (BNCOD 2011), UK
Item Type
Conference Contribution
Publisher
Springer-Verlag
Publisher's Version
http://www.springerlink.com/content/978-3-642-24576-3#section=979035&page=1&locus=0
Rights Statement
An author may self-archive an author-created version of his/her article on his/her own website and/or in his/her institutional repository. He/she may also deposit this version on his/her funder’s or funder’s designated repository at the funder’s request or as a result of a legal obligation, provided it is not made publicly available until 12 months after official publication. He/she may not use the publisher's PDF version, which is posted on www.springerlink.com, for the purpose of self-archiving or deposit.


Hosted by Tuwhera, an initiative of the Auckland University of Technology Library

 

 
