Integrative methods for gene data analysis and knowledge discovery on the case study of KEDRI’s brain gene ontology
In 2003, Pomeroy et al. published a research study that described a gene-expression-based prediction of central nervous system (CNS) embryonal tumour outcome. Over the following half decade, many models and approaches have been developed based on the experimental data, which consist of 99 samples with 7,129 genes. How meaningful knowledge can be extracted from these models, and how this knowledge can be used for further research, is still a hot topic. This thesis addresses this question and develops an information method that includes modelling of interactive patterns, discovery of important genes and visualisation of the obtained knowledge. The major goal of this thesis is to discover important genes responsible for CNS tumours and to import these genes into a well-structured knowledge framework system called Brain-Gene-Ontology (BGO). In this thesis, we take the first step towards finding the most accurate model for analysing CNS tumours by offering a comparative study of global, local and personalised modelling. Five traditional modelling approaches and a new personalised method, WWKNN (weighted distance, weighted variables K-nearest neighbours), are investigated. To increase the classification accuracy, a one-vs.-all based signal-to-noise ratio is also developed for pre-processing the experimental data. For knowledge discovery, a CNS-based ontology system is developed. Through ontology analysis, 21 discriminative genes are found to be relevant for different CNS tumour classes, medulloblastoma tumour subclasses and medulloblastoma treatment outcome. All the findings in this thesis contribute to expanding the information space of the BGO framework.
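As an illustration of the one-vs.-all signal-to-noise pre-processing mentioned above, the following is a minimal sketch only, not the implementation used in the thesis. It assumes the commonly used definition SNR = |mean_in - mean_out| / (std_in + std_out), computed per gene for each class against all remaining classes; the function names `one_vs_all_snr` and `top_genes`, the `eps` guard and the rule of ranking each gene by its best score over classes are assumptions made for this example.

```python
import numpy as np


def one_vs_all_snr(X, y, eps=1e-12):
    """Score every gene for every class against all other classes.

    X: (n_samples, n_genes) expression matrix.
    y: (n_samples,) class labels.
    Returns (classes, scores), where scores has shape (n_classes, n_genes).
    The SNR formula below is an assumed, commonly used definition.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    scores = np.empty((classes.size, X.shape[1]))
    for i, c in enumerate(classes):
        in_c = y == c  # samples of class c versus the rest
        mu_in, mu_out = X[in_c].mean(axis=0), X[~in_c].mean(axis=0)
        sd_in, sd_out = X[in_c].std(axis=0), X[~in_c].std(axis=0)
        # eps avoids division by zero for genes with no variation
        scores[i] = np.abs(mu_in - mu_out) / (sd_in + sd_out + eps)
    return classes, scores


def top_genes(X, y, k):
    """Return indices of the k genes with the highest one-vs-all SNR.

    Each gene is ranked by its best score over all classes
    (an assumption made for this sketch).
    """
    _, scores = one_vs_all_snr(X, y)
    best = scores.max(axis=0)
    return np.argsort(best)[::-1][:k]
```

For a data set such as the one described above, `X` would be a 99 x 7,129 matrix and `top_genes(X, y, k)` would return the column indices of the `k` highest-scoring genes to pass on to a classifier.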