Entropy-Based Optimization Strategies for Convolutional Neural Networks
Gowdra, Nidhi Nijagunappa
Deep convolutional neural networks are state-of-the-art for image classification, and significant strides have been made in improving their performance, to the point where they can now even exceed human-level performance. However, these gains have been achieved through increased model depth and rigorous, specialized manual fine-tuning of model hyperparameters (HPs). These strategies cause considerable over-parameterization and elevated complexity in Convolutional Neural Network (CNN) model training. Training over-parameterized CNN models tends to induce afflictions such as overfitting, increased sensitivity to noise and decreased generalization ability, which contribute to the deterioration of model performance. Furthermore, training over-parameterized CNN models requires specialized regimes and vast computing power, further increasing the complexity and difficulty of training. In this thesis, we develop several novel entropy-based techniques to abate the effects of over-parameterization, reduce the number of manually tuned HPs, increase generalization ability and enhance the performance of CNN models. Specifically, we examine information propagation and feature extraction/generation and representation in CNNs. Armed with this knowledge, we develop a heuristic and several optimization strategies that simplify model training and improve model performance by addressing the problem of over-parameterization in CNNs. We ground the techniques in this thesis in quantitative metrics such as Shannon's Entropy (SE), Maximum Entropy (ME) and Signal-to-Noise Ratio (SNR). Our methodology is a multi-faceted approach that incorporates iterative and continuous integration of quantitatively defined feedback loops, which allows us to test numerous research hypotheses efficiently within the design science research framework. We start by exploring and understanding the hierarchical feature extraction and representational capabilities of CNNs.
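The abstract relies on Shannon's Entropy, Maximum Entropy and SNR without stating their formulas. As a minimal sketch, assuming 8-bit grayscale images given as flat lists of integer pixel values and the textbook definitions (SE as the entropy of the normalized intensity histogram in bits, ME as log2 of the number of intensity levels, and SNR as mean intensity over its standard deviation, which is only one of several common conventions), these quantities could be computed as follows. The function names are hypothetical, and the thesis's exact definitions may differ.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the intensity histogram of a flat list of pixels.

    Uses H = sum_i p_i * log2(1 / p_i), which equals -sum_i p_i * log2(p_i)
    but never yields -0.0 for a constant image.
    """
    n = len(pixels)
    counts = Counter(pixels)  # histogram: intensity value -> occurrence count
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def max_entropy(levels=256):
    """Upper bound on Shannon entropy, reached by a uniform histogram over `levels` values."""
    return math.log2(levels)


def snr(pixels):
    """Assumed SNR convention: mean intensity divided by its (population) standard deviation."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    return float("inf") if std == 0 else mean / std
```

A uniform histogram over all 256 levels attains the maximum of 8 bits, while a constant image has zero entropy, so `shannon_entropy(x) / max_entropy()` is one simple way to express how close an image comes to that theoretical limit.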
Through our experiments, we explored the sparsity of feature representations and analyzed the underlying learning mechanisms of CNNs for non-convex optimization problems such as image classification. Equipped with this knowledge, we experimentally demonstrated and validated the notion that, for low- and high-quality input data (as determined through ME and SNR measures), using deeper and shallower networks could lead to the phenomena of information underflow and overflow, respectively, degrading classification performance. To mitigate the negative effects of information underflow and overflow in the context of kernel saturation, we propose and evaluate the novel hypothesis of augmenting the data distribution of the input dataset with negative images. In our experiments, this augmentation increased classification accuracy by 3%-7% on various datasets. One limitation raised against the validity of our augmentation was model training time: in particular, such models require large amounts of computing power and time to train. To address this criticism, we theorize an SE-based heuristic that resolves the problem of over-parameterization by forcing feature abstraction in the convolutional layers up to the theoretical limit defined by their SE measure. The SE-based model trained 45.22% faster than deeper models without compromising classification accuracy. Further concerns were raised regarding training afflictions such as overfitting and poor generalizability in deep CNN models. To mitigate these, we introduce a Maximum Entropy-based Learning raTE enhanceR (MELTER), which dynamically schedules and adapts model learning during training, and a Maximum Categorical Cross-Entropy (MCCE) loss function, derived from the commonly used Categorical Cross-Entropy (CCE) loss function, to reduce model overfitting.
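The negative-image augmentation can be illustrated with the standard photographic negative, p' = 255 - p for 8-bit pixels. This sketch assumes grayscale images stored as nested lists and assumes that each negative is appended to the dataset with the same label as its source image; the helper names are hypothetical, and the thesis's actual pipeline (for example, color handling or the ratio of negatives to originals) may differ.

```python
def negative_image(image, max_value=255):
    """Return the photographic negative of a 2-D image stored as nested lists.

    Each pixel p is mapped to max_value - p (255 - p for 8-bit images).
    """
    return [[max_value - p for p in row] for row in image]


def augment_with_negatives(images, labels, max_value=255):
    """Assumed augmentation scheme: append one negative copy of every image.

    Each negative keeps the label of its source image, so the returned
    dataset is twice the size of the input dataset.
    """
    negatives = [negative_image(img, max_value) for img in images]
    return images + negatives, labels + labels


# Usage: a single 2x2 image with class label 3.
img = [[0, 128], [255, 10]]
aug_images, aug_labels = augment_with_negatives([img], [3])
```

Applying `negative_image` twice returns the original image, and because the transform only permutes intensity levels, it leaves the Shannon entropy of the intensity histogram unchanged.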
MELTER and MCCE use a priori knowledge of the input data to curtail several risks encountered during model training that affect performance, such as sensitivity to random noise, overfitting to the training data and lack of generalizability to new, unseen data. MELTER outperforms manually tuned models by 2%-6% on various benchmarking datasets by exploring a larger solution space. MCCE-trained models showed a reduction in overfitting of up to 5.4% and outperformed CCE-trained models in classification accuracy by up to 6.17% on two facial (ethnicity) recognition datasets, colorFERET and UTKFace, as well as on standard benchmarking datasets such as CIFAR-10 and MNIST. Through this series of experiments, we conclude that entropy-based optimization strategies for tuning the HPs of deep learning models are viable and either maintain or exceed the baseline classification accuracies achieved by networks trained using traditional methods. Furthermore, the entropy-based optimization methods outlined in this thesis also mitigate several well-known training afflictions, such as overfitting, lack of generalizability and slow convergence, while eliminating manual fine-tuning of HPs.