A Deep Learning Approach to Multispectral Detection of UAVs
Due to the significant risks that commercial off-the-shelf drones pose both domestically and in military operations, this thesis investigates the advancement of the counter-drone domain, specifically drone detection through multispectral computer vision and machine learning.
The objectives of this thesis were to implement and demonstrate a stand-alone, computer vision-based drone detection solution capable of at least near-real-time detection speeds on field-deployable edge devices, robust to diverse environmental conditions, and able to accurately discriminate drones from similar objects such as birds, airplanes, and helicopters. A secondary aim was to deliver an end-to-end methodology for developing such a vision-based solution, including the careful curation of the diverse datasets required for well-generalized deep learning models.
The proposed methodology consists of four phases: dataset design, deep learning model development, ensemble development, and evaluation of the individual models and the complete system. The dataset design phase comprised data collection and examination, careful cleaning and preprocessing, and dataset partitioning, refined through iterative evaluation. Model development consisted of iteratively fine-tuning, optimizing, and evaluating multiple object detection models and secondary classifiers to identify the best combination of models and datasets for an ensemble aggregation framework. Ensemble development then involved investigating ensemble methods and customizing and fine-tuning an ensemble aggregation that fuses multimodal models. Finally, the individual models and the complete system were evaluated against standardized accuracy metrics and across diverse environments and lighting conditions, demonstrating the robustness of the proposed system while exposing issues specific to drone detection in each environment.
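The dataset partitioning step described above could be implemented along the following lines. This is a minimal illustrative sketch, not the exact procedure used in the thesis: a stratified train/validation/test split that keeps each class (drone, bird, airplane, helicopter) proportionally represented in every partition. The file names, split ratios, and seed are assumptions for demonstration.

```python
# Hypothetical sketch of stratified dataset partitioning: each class is
# shuffled and split independently so all partitions share the same
# class balance. Ratios and file names are illustrative assumptions.
import random
from collections import defaultdict

def stratified_split(samples, ratios=(0.7, 0.2, 0.1), seed=42):
    """samples: list of (path, label) pairs; returns (train, val, test)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    train, val, test = [], [], []
    for label, items in by_class.items():
        rng.shuffle(items)                 # shuffle within each class
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]    # remainder goes to test
    return train, val, test

# Toy example: 10 images per class across the four classes.
samples = [(f"img_{c}_{i}.jpg", c)
           for c in ("drone", "bird", "airplane", "helicopter")
           for i in range(10)]
train, val, test = stratified_split(samples)
```

Because the split is performed per class, even a rare class such as helicopters cannot end up absent from the validation or test partitions, which matters when evaluating discrimination against visually similar objects.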
The proposed system employs the YOLOv7 architecture as the base detector, supported by up to three custom VGG-inspired models as secondary classifiers. Each model is trained on a distinct dataset covering the drone, bird, airplane, and helicopter classes in both the visible and long-wave infrared spectra. A four-step ensemble aggregation fuses the predictions of all models and stabilizes the results through a process of weighted and non-weighted voting and averaging, customized to the strengths of the individual models. The models primarily operate on a combined visible and thermal picture-in-picture feed, comparing object predictions across the corresponding regions of interest in both spectra. A reduced ensemble aggregation mode is also shown to be effective on single-spectrum video feeds, including the thermal infrared feed in complete darkness, without requiring model adjustments.
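The weighted-averaging step of such an ensemble aggregation could be sketched as follows. This is an assumed, simplified fusion rule for illustration only: the model weights, class confidences, and acceptance threshold are hypothetical values, not the scheme fitted in the thesis.

```python
# Illustrative sketch of fusing a base detector's per-class confidences
# with those of secondary classifiers via weighted averaging.
# Weights and the acceptance threshold are assumed values.
def fuse_predictions(detector_conf, classifier_confs, weights, threshold=0.5):
    """
    detector_conf:    dict mapping class -> confidence from the base detector
    classifier_confs: list of such dicts, one per secondary classifier
    weights:          per-model weights, ordered [detector, clf1, clf2, ...]
    Returns (best_class, fused_conf), or (None, fused_conf) below threshold.
    """
    all_preds = [detector_conf] + classifier_confs
    classes = set().union(*all_preds)
    fused = {}
    for cls in classes:
        # Weighted average; a model that did not score a class counts as 0.
        total = sum(w * p.get(cls, 0.0) for w, p in zip(weights, all_preds))
        fused[cls] = total / sum(weights)
    best = max(fused, key=fused.get)
    return (best, fused[best]) if fused[best] >= threshold else (None, fused[best])

# Toy example: detector plus two secondary classifiers agree on "drone".
det = {"drone": 0.80, "bird": 0.15}
clfs = [{"drone": 0.70, "bird": 0.25}, {"drone": 0.60, "bird": 0.30}]
best, conf = fuse_predictions(det, clfs, weights=[0.5, 0.3, 0.2])
# best == "drone", conf == 0.73
```

Weighting the base detector more heavily than the secondary classifiers reflects the idea, stated above, of customizing the aggregation to the strengths of the individual models; a full implementation would also have to reconcile bounding boxes across the visible and thermal regions of interest before fusing class scores.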
This work has demonstrated the requirements for, and subsequent benefits of, producing well-curated datasets; the gains in object detectability from fusing data across multiple spectra; and the improvement in prediction quality from employing an ensemble aggregation over multiple models. Finally, this thesis has demonstrated that while single-sensor systems are less capable than multi-sensor systems, an accurate, field-deployable solution can be developed at relatively low financial cost.