Skewed Distributions in Semi-stream Joins: How Much Can Caching Help?

aut.relation.journalInformation Systemsen_NZ
aut.researcherNaeem, Muhammad
dc.contributor.authorNaeem, MAen_NZ
dc.contributor.authorDobbie, Gen_NZ
dc.contributor.authorLutteroth, Cen_NZ
dc.contributor.authorWeber, Gen_NZ
dc.date.accessioned2017-03-14T01:48:19Z
dc.date.available2017-03-14T01:48:19Z
dc.date.copyright2016en_NZ
dc.date.issued2016en_NZ
dc.description.abstractSemi-stream join algorithms join a fast data stream with a disk-based relation. This is important, for example, in real-time data warehousing where a stream of transactions is joined with master data before loading it into a data warehouse. In many important scenarios, the stream input has a skewed distribution, which makes certain performance optimizations possible. We propose two such optimization techniques: (1) a caching technique for frequently used master data and (2) a technique for selective load shedding of stream tuples. The caching technique is fine-grained, operating at the tuple level. Furthermore, it is generic in the sense that it can be applied to different semi-stream join algorithms to deal with data skew. We analyze it by combining it with various well-known semi-stream joins, and show that it improves the service rate by more than 40% for typical data with skewed distributions. The load shedding technique sheds the fraction of the stream that is most expensive to join. In contrast to existing approaches, the service rate improves under load shedding. We present experimental data showing significant improvements compared with related approaches and perform a sensitivity analysis for various internal parameters.
dc.identifier.citationInformation Systems. Volume 64, March 2017, Pages 63–74
dc.identifier.doi10.1016/j.is.2016.09.007en_NZ
dc.identifier.issn0306-4379en_NZ
dc.identifier.urihttps://hdl.handle.net/10292/10378
dc.publisherElsevier
dc.relation.urihttp://www.sciencedirect.com/science/article/pii/S0306437916304161
dc.rightsCopyright © 2017 Elsevier Ltd. All rights reserved. This is the author’s version of a work that was accepted for publication in (see Citation). Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. The definitive version was published in (see Citation). The original publication is available at (see Publisher's Version).
dc.rights.accessrightsOpenAccessen_NZ
dc.subjectSemi-stream processing; Join; Front-stage cache; Performance optimization
dc.titleSkewed Distributions in Semi-stream Joins: How Much Can Caching Help?en_NZ
dc.typeJournal Article
pubs.elements-id212219
pubs.organisational-data/AUT
pubs.organisational-data/AUT/Design & Creative Technologies
Files
Original bundle
Name: CMESHJOIN.pdf
Size: 663.09 KB
Format: Adobe Portable Document Format
Description: Journal article