Incremental nonparametric discriminant analysis based active learning and its applications
Learning is an innate, general cognitive ability that has endowed living beings, and humans in particular, with intelligence. It proceeds through the acquisition of new knowledge and skills that enable them to adapt and survive. As technology advances, ever larger amounts of information are amassed, and the sheer volume makes manual analysis infeasible. Analysing massive data therefore requires machines (such as computers) that can learn and evolve in order to discover new knowledge from the data they analyse. Most traditional machine learning algorithms function optimally on parametric (static) data. However, datasets acquired in practice are often vast, inaccurate, inconsistent, non-parametric and highly volatile. The optimized performance of such algorithms can therefore only be transitory, and what is required is a learning algorithm that constantly evolves and adapts to the data it processes. To meet this need, we look for inspiration in the innate cognitive learning ability of humans. Active learning is one such biologically inspired model, designed to mimic humans’ dynamic, evolving, adaptive and intelligent cognitive learning ability. It is a class of learning algorithms that aim to build an accurate classifier by iteratively selecting, through adaptive querying, the unlabeled data points that are potentially most useful for the targeted learning task, and training the classifier on those points (Tong & Koller, 2002). Traditional active learning techniques are implemented under supervised or semi-supervised learning settings (Pang et al., 2009).
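The generic active learning loop described above (iteratively query the most useful unlabeled point, then retrain) can be illustrated with a minimal sketch. This is not the selection criterion proposed in this thesis: it assumes a plain k-NN base classifier and a simple vote-margin uncertainty measure, and the names `knn_margin`, `active_learn`, and `oracle` are hypothetical, introduced only for illustration.

```python
import math


def knn_margin(x, labeled, k=3):
    """Classify x by k-NN vote and return (label, margin).

    `labeled` is a list of (point, label) pairs. The margin is the gap
    between the top two vote counts, divided by the number of neighbours
    used; a small margin means the classifier is uncertain about x.
    """
    neighbours = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    votes = {}
    for _, y in neighbours:
        votes[y] = votes.get(y, 0) + 1
    ranked = sorted(votes.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    # On a tie, max() returns the label seen first, i.e. the nearest one.
    label = max(votes, key=votes.get)
    return label, (ranked[0] - second) / len(neighbours)


def active_learn(seed, pool, oracle, budget, k=3):
    """Pool-based active learning with margin-based uncertainty sampling.

    seed   : initial list of (point, label) pairs
    pool   : list of unlabeled points
    oracle : callable that returns the true label of a point (assumed)
    budget : maximum number of label queries
    """
    labeled = list(seed)
    remaining = list(pool)  # copy so the caller's pool is left untouched
    for _ in range(min(budget, len(remaining))):
        # Query the pool point the current classifier is least sure about.
        idx = min(
            range(len(remaining)),
            key=lambda i: knn_margin(remaining[i], labeled, k)[1],
        )
        x = remaining.pop(idx)
        labeled.append((x, oracle(x)))
    return labeled
```

With two well-separated clusters as the seed set, the first query goes to the pool point lying between them, since that is where the k-NN vote is most evenly split; points deep inside a cluster are queried last or not at all.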
Our proposed model performs active learning in an unsupervised setting by introducing a discriminative selective sampling criterion, which reduces the computational cost by substantially decreasing the number of irrelevant instances to be learned by the classifier. Methods based on passive learning (which assumes that the entire training dataset is truly informative and is presented in advance) prove inadequate in real-world applications (Pang et al., 2009). To overcome this limitation, we have developed Active Mode Incremental Nonparametric Discriminant Analysis (aIncNDA), which performs adaptive discriminant selection of instances for incremental NDA learning. NDA is a discriminant analysis method that we incorporate into our selective sampling technique in order to reduce the effects of outliers (anomalous observations or data points in a dataset). It works efficiently on datasets containing anomalies, thereby minimizing the computational cost (Raducanu & Vitrià, 2008). This thesis presents research on discrimination-based active learning in which NDA is extended for fast discriminant analysis and data sampling. In addition to NDA, a base classifier (such as a Support Vector Machine (SVM) or k-Nearest Neighbor (k-NN)) is applied to discover and merge the knowledge from the newly acquired data. The performance of the proposed method is evaluated on benchmark University of California, Irvine (UCI) datasets, face image datasets, and object image category datasets. The assessment carried out on the UCI datasets showed that aIncNDA performs on par with, and in many cases better than, incremental NDA while using a lower number of instances. aIncNDA also performs efficiently under different levels of redundancy, and it shows improved discrimination performance more often than passive incremental NDA does.
In an application to face image and object image recognition and retrieval, the proposed multi-example active learning system dynamically and incrementally learns from newly obtained images, gradually reducing its retrieval (classification) error rate through iterative refinement. The results of the empirical investigation show that the proposed active learning model can be used for classification with increased efficiency. Furthermore, given that network data is large, streaming, and constantly changing, we believe that our method can find practical application in the field of Internet security.