
dc.contributor.author: Naeem, MA (en_NZ)
dc.contributor.author: Dobbie, G (en_NZ)
dc.contributor.author: Lutteroth, C (en_NZ)
dc.contributor.author: Weber, G (en_NZ)
dc.date.accessioned: 2017-03-14T01:48:19Z
dc.date.available: 2017-03-14T01:48:19Z
dc.date.copyright: 2016 (en_NZ)
dc.identifier.citation: Information Systems. Volume 64, March 2017, Pages 63–74
dc.identifier.issn: 0306-4379 (en_NZ)
dc.identifier.uri: http://hdl.handle.net/10292/10378
dc.description.abstract: Semi-stream join algorithms join a fast data stream with a disk-based relation. This is important, for example, in real-time data warehousing where a stream of transactions is joined with master data before loading it into a data warehouse. In many important scenarios, the stream input has a skewed distribution, which makes certain performance optimizations possible. We propose two such optimization techniques: (1) a caching technique for frequently used master data and (2) a technique for selective load shedding of stream tuples. The caching technique is fine-grained, operating on a tuple-level. Furthermore, it is generic in the sense that it can be applied to different semi-stream join algorithms to deal with data skew. We analyze it by combining it with various well-known semi-stream joins, and show that it improves the service rate by more than 40% for typical data with skewed distributions. The load shedding technique sheds the fraction of the stream that is most expensive to join. In contrast to existing approaches, the service rate improves under load shedding. We present experimental data showing significant improvements as compared to related approaches and perform a sensitivity analysis for various internal parameters.
dc.publisher: Elsevier
dc.relation.uri: http://www.sciencedirect.com/science/article/pii/S0306437916304161
dc.rights: Copyright © 2017 Elsevier Ltd. All rights reserved. This is the author’s version of a work that was accepted for publication in (see Citation). Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. The definitive version was published in (see Citation). The original publication is available at (see Publisher's Version).
dc.subject: Semi-stream processing; Join; Front-stage cache; Performance optimization
dc.title: Skewed Distributions in Semi-stream Joins: How Much Can Caching Help? (en_NZ)
dc.type: Journal Article
dc.rights.accessrights: OpenAccess (en_NZ)
dc.identifier.doi: 10.1016/j.is.2016.09.007 (en_NZ)
pubs.elements-id: 212219
aut.relation.journal: Information Systems (en_NZ)