Materialization strategies for web based search computing applications

Date
2014
Authors
Zagorac, Srdan
Supervisor
Pears, Russel
Item type
Thesis
Degree name
Doctor of Philosophy
Publisher
Auckland University of Technology
Abstract

In this thesis we characterize view materialization in the context of multi-domain heterogeneous search applications. Web data view materialization is presented as a solution to the technical constraints and problems implied by the unknown structure of web data sources. The web data materialization model extends the search computing (SeCo) multi-layered model, in which search services are registered in a service repository that describes the functional information of data end-points (e.g. invocation end-point, input and output attributes). Our first research goal is to solve the problem of finding a sequence of access patterns which, when executed, produces a materialization output. For this goal we make the following novel contributions: 1) Formulation of the building blocks for the materialization feasibility analysis; 2) Definition of the materialization feasibility analysis method and its accompanying algorithms; 3) A detailed empirical study conducted on a set of materialization tasks of varying schema dependency complexity.

Our second research goal is the optimization of the materialization process, so that the optimal sequence, in terms of materialization output efficiency and quality, is executed at all times. For this goal we make the following novel contributions: 1) Formulation of a set of performance dimensions and their metrics for web source materialization; 2) A cost model that uses the optimization metrics to differentiate between web services in terms of materialization time; 3) A query optimization procedure that explores the characteristics of the underlying source data domain in order to prioritize the execution of the most productive queries in terms of their data harvesting power; 4) Materialization process optimization strategies based on the web source performance dimension metrics and the query optimization procedure; 5) A detailed empirical study conducted on several relevant web-based data sources that shows the effectiveness of the proposed solution.

Keywords
Multi-domain search, Materialization feasibility analysis, Materialization optimization, Data surfacing, Deep web mining, Web data materialization, Web data services