Integrative approaches to modelling and knowledge discovery of molecular interactions in bioinformatics
The core focus of this research is developing and applying intelligent methods to solve biological problems, and integrating the resulting knowledge to understand complex gene regulatory phenomena. We have developed an integrative framework and used it for three purposes. The first is to model molecular interactions in separate case studies on time-series gene expression microarray datasets, molecular sequences and structure data, including the functional role of microRNAs. The second is to extract knowledge, and the third is to build reusable models for the central dogma theme. Knowledge was integrated using an ontology and can be reused to facilitate new discoveries, as demonstrated with one of our systems, the Brain Gene Ontology (BGO). The central dogma states that proteins are produced from DNA (genes) via an intermediate transcript, messenger RNA. These proteins in turn act as enzymes at the checkpoints that control gene expression. In addition, according to a recently emerged paradigm, some genes do not code for proteins but instead give rise to small microRNA molecules, which in turn regulate gene expression. This highly complex molecular biology process (the central dogma) produces a wide variety of data that computer scientists can use for modelling and to enable discoveries. We argue that this full range of data should be taken into account when analysing gene regulation, rather than taking a single data source and applying standard methods to reveal facts in systems biology. The problem is very complex, and computational algorithms have so far had limited success. Either existing methods have specific shortcomings, or proven results were obtained for only one domain of the central dogma of molecular biology. As a result, knowledge integration has been consistently lacking.
Proper maintenance of diverse data sources and structures, and in particular their adaptation to new knowledge, is one of the most challenging problems in this area. A crucial step towards the vision of knowledge integration is the efficient encoding of human knowledge in ontologies. More specifically, this work has contributed to the development of novel computational and information science methods, and we have promoted the vision of knowledge integration by developing the Brain Gene Ontology (BGO) system. Through the integrative use of several bioinformatics methods, this research has modelled knowledge that has not previously been revealed in systems biology. Many discoveries were made during this study; some of the findings are briefly summarized as follows:
(1) In leukaemia, we identified a previously unreported interaction between the gene "TCF-1" and the "telomerase" gene.
(2) In the yeast cell cycle analysis, we hypothesize that the exoglucanase gene "exg1" is linked to "MCB cluster regulation", and that a "mannosidase" is linked to "histone linked mannoses". A new quantitative prediction is that the time delay in the interaction between two genes is approximately 30 minutes, or 0.17 of a cell cycle. We also found that the genes Cdc22, Suc22 and Mrc1 interact with one another and are potential candidates for controlling ribonucleotide reductase (RNR) activity.
(3) In studying long-term potentiation (LTP), we found that the transcription factors responsible for regulating gene expression become elevated as early as 30 minutes after LTP induction and remain elevated for up to 2 hours.
(4) Analysis of human microRNA data successfully identified two miRNA families, let-7 and mir-30.
(5) In the central nervous system (CNS) cancer data, a set of 10 genes (HMG-I(Y), NBL1, UBPY, Dynein, APC, TARBP2, hPGT, LTC4S, NTRK3 and Gps2) predicted drug response with 85% accuracy.
(6) From a study of the AMPA, GABRA and NMDA receptors, we hypothesize that phenylalanine (F at position 269) and leucine (L at position 353) in these receptors act as a binding centre for their interactions with several other genes/proteins, such as c-jun, mGluR3, Jerky, BDNF, FGF-2, IGF-1, GALR1, NOS and S100beta.
All the methods developed to obtain these findings are generic and, subject to some constraints, can be applied to other datasets. We believe this research shows that the integrative use of diverse computational intelligence methods is critical for revealing new aspects of the problem, and that knowledge integration is essential. During this research, I have published the results in reputed international journals, presented them at several conferences and written book chapters.