Integration of Data Mining and Data Warehousing: a practical methodology

Date
2010
Authors
Usman, M
Pears, R
Item type
Journal Article
Publisher
International Journal of Advancements in Computing Technology (IJACT)
Abstract

The ever-growing repository of data in all fields poses new challenges to modern analytical systems. Real-world datasets, with mixed numeric and nominal variables, are difficult to analyze and require effective visual exploration that conveys the semantic relationships in the data. Traditional data mining techniques such as clustering handle only numeric data. Little research has been carried out on the problem of clustering high-cardinality nominal variables to gain better insight into the underlying dataset. Several works in the literature have demonstrated the feasibility of integrating data mining with data warehousing to discover knowledge from data. For seamless integration, the mined data has to be modeled in the form of a data warehouse schema. Schema generation is a complex manual task that requires familiarity with both the domain and warehousing. Automated techniques are required to generate the warehouse schema and remove these dependencies. To fulfill growing analytical needs and to overcome the existing limitations, we propose a novel methodology in this paper that permits efficient analysis of mixed numeric and nominal data, effective visual data exploration, automatic warehouse schema generation, and integration of data mining and warehousing. The proposed methodology is evaluated through a case study on a real-world dataset. Results show that multidimensional analysis can be performed in an easier and more flexible way to discover meaningful knowledge from large datasets.

Keywords
Automatic Schema, Clustering, Data Warehouse, Multi-dimensional Analysis
Source
International Journal of Advancements in Computing Technology, vol. 2(3), pp. 31-46
Rights statement
Integrated Publishing Association, which supports IJACIT, is a RoMEO green publisher. RoMEO is a database of publishers' copyright and self-archiving policies hosted by the University of Nottingham. We have also signed the Budapest Open Access Initiative to show our commitment to open access publishing.