From data mining and knowledge discovery to big data analytics and knowledge extraction for applications in science

Date
2014-12-10
Authors
Shanmuganathan, S
Item type
Journal Article
Publisher
Science Publications
Abstract

“Data mining” for “knowledge discovery in databases” and the associated computational operations, first introduced in the mid-1990s, can no longer cope with the analytical issues posed by so-called “big data”. The recent buzzword “big data” refers to large volumes of diverse, dynamic, complex, longitudinal and/or distributed data, noisy and structured or unstructured, generated from instruments, sensors, Internet transactions, email, video, click streams and all other digital sources available today and in the future, at speeds and on scales never seen before in human history. Big data, also described using the 3 Vs of volume, variety and velocity (with an additional 4th V for “veracity” and, more recently, a 5th V for “value”), requires a set of new technologies, such as high performance computing (i.e., exascale), architectures (distributed or grid), algorithms (for data clustering and generating association rules), programming languages and automated and scalable software tools, to uncover hidden patterns, unknown correlations and other useful information, lately referred to as “actionable knowledge” or “data products”, from massive volumes of complex raw data. In view of the above, the paper gives an introduction to the synergistic challenges in “data-intensive” science and “exascale” computing for resolving “big data analytics” and “data science” issues in four main disciplines, namely computer science, computational science, statistics and mathematics. It outlines the basic problems that need to be addressed adequately in the respective disciplines to realise the vital foundational aspects identified for an effective cyber infrastructure. Finally, the paper looks at five scientific research projects that are urgently in need of high performance computing; this contrasts with earlier situations, in which private business enterprises were the drivers of better, faster and more modern technologies.

Source
Journal of Computer Science, vol. 10(12), pp. 1-8 (8)
Rights statement
© 2014 S. Shanmuganathan. This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license.