The State of Big Data Reference Architectures: A Systematic Literature Review
Big Data (BD) is a nascent term emerged to describe large amount of data that comes in different forms from various channels. In modern world, users are the ceaseless generators of structured, semi-structured, and unstructured data that if gleaned and crunched precisely, will reveal game-changing patterns. While the opportunities exist with BD, the unprecedented amount of data has brought traditional approaches to a bottleneck, and the growth of data is outpacing technological and scientific advances in data analytics. It is estimated that approximately 75% of the BD projects have failed within the last decade according to multiple sources. Among the challenges, system development and data architecture are prominent. This paper aims to facilitate BD system development and architecture by conducting a systematic literature review on BD reference architectures (RA). The primary goal is to highlight the state of BD RAs and how they can be helpful for BD system development. The secondary goal is to find all BD RAs, describe the challenges of creating these RA, discuss the common architectural components of these RA and the limitations of these RA. As a result of this work, firstly major concepts about RA are discussed and their applicability to BD system development is depicted. Secondly, 22 BD reference architecture is assessed from academia and practice and their commonalities, challenges, and limitations are identified. The findings gained emerges the understanding that RAs can be an effective artefact to tackle complex BD system development.