Automated Knowledge Enrichment for Semantic Web Data
The Semantic Web is an effort to transform unstructured data on the Web into a structured format that can be processed not only by humans but also by computers. It provides a distributed framework for publishing, querying, and reusing information. The backbone of the Semantic Web consists of ontologies and annotations, which provide semantics for the raw data, known as RDF data. Although many Semantic Web applications exist, sophisticated analytical infrastructures are still lacking, which prevents users from extracting the semantics attached to RDF data. In addition, Semantic Web data faces a wide range of data quality issues owing to the distributed nature of the Semantic Web. This thesis presents three approaches with the following aims: (I) to express the semantics behind discovered patterns, (II) to deal with a Semantic Web data quality issue, and (III) to enrich the knowledge in Semantic Web ontologies. The thesis makes the following contributions. Firstly, it shows the influence of relations and ontological knowledge on the process of mining hidden patterns and proposes Semantic Web Association Rule Mining (SWARM), an automated mining approach that attaches semantics to the discovered patterns. Secondly, it addresses a data quality issue in the Semantic Web, namely incorrect assignments between instances and classes in an ontology. To this end, the Class Assignment Detector (CAD) approach has been designed to tackle this issue. Thirdly, it enhances the process of ontology enrichment by generating new classes through mining instance-level and schema-level knowledge. Since ontologies are often designed before their actual usage, the Class Enricher (CEn) approach has been developed to extract new classes that are not defined in the ontologies. All the proposed approaches have been evaluated on real datasets to validate their effectiveness.