Crime Prediction from Digital Videos Using Deep Learning
With the rise of intelligent surveillance, the demand for automated surveillance alarms is greater than ever, reflecting growing public concern for everyday security. Traditional surveillance systems are relatively simple: they cannot detect crimes automatically in real time, and therefore waste social resources. Intelligent monitoring approaches use pattern recognition and machine learning to analyze video footage and trigger an alarm promptly when abnormal behavior is detected. In this thesis, we train models based on Spatial and Temporal Graph Convolutional Networks (ST-GCN) and Temporal Relational Networks (TRN) to detect and recognize human behaviors in digital videos. For ST-GCN, skeleton sequences of human behaviors are extracted from surveillance videos, and the risk level is determined by setting corresponding thresholds. TRN, which we compare with ST-GCN, is based on optical flow combined with the temporal relationships between video frames; its main idea is to sample a set of frames from a given video at random. The results show that human behavior recognition based on skeleton and optical flow features outperforms other deep learning algorithms. To identify dangerous human behaviors, we train one model based on a spatial temporal graph convolutional neural network and another based on a time-series relational network, each for the detection and identification of criminal behaviors. The key to the recognition method based on the spatial temporal graph convolutional neural network is the extraction of the human skeleton. In the skeleton sequence of a human behavior, the skeleton in each frame consists of 18 joint points together with the estimated confidence value of that skeleton.
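As a rough illustration of the skeleton representation described above, the following Python sketch stores each frame as 18 joints and rearranges a sequence into a channel-first layout. It assumes one (x, y, confidence) triple per joint, as pose estimators such as OpenPose commonly output; the names `SkeletonFrame` and `to_channels` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_JOINTS = 18  # joints per skeleton, as stated in the text


@dataclass
class SkeletonFrame:
    # Assumption: one (x, y, confidence) triple per joint.
    joints: List[Tuple[float, float, float]]

    def __post_init__(self) -> None:
        # Reject frames that do not carry exactly 18 joints.
        if len(self.joints) != NUM_JOINTS:
            raise ValueError(
                f"expected {NUM_JOINTS} joints, got {len(self.joints)}"
            )

    def mean_confidence(self) -> float:
        # Average the per-joint confidence into one value for the frame.
        return sum(c for _, _, c in self.joints) / NUM_JOINTS


def to_channels(sequence: List[SkeletonFrame]) -> List[List[List[float]]]:
    """Rearrange T frames into a nested list indexed [channel][frame][joint].

    Channels are x, y, and confidence, so the result has shape [3][T][18].
    """
    return [
        [[joint[ch] for joint in frame.joints] for frame in sequence]
        for ch in range(3)
    ]
```

A sequence of T frames then yields a 3 x T x 18 structure, which is one plausible input layout for a graph convolutional model over skeletons.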
From the extracted skeleton features, combined with the time dimension of the skeleton sequence, a spatial temporal graph is constructed. The network model classifies the behavior and decides whether it is criminal by applying the corresponding threshold. Compared with the spatial temporal graph convolutional network, the time-series relational network uses different feature information: it builds its model from human optical flow information and relational reasoning across video frames. The key to the identification method based on the temporal relation network is to extract several frames from the video, either in order or at random, as input to the network. Experimental results on the collected dataset show that the proposed method achieves better recognition results than single-feature algorithms, with higher accuracy and better robustness. Equipping the network with the time-series relation module effectively improves recognition accuracy in the detection and recognition of criminal behavior. We conduct comparative experiments with a variety of methods, and the results show that recognition based on skeleton and optical flow features is significantly better than algorithms based on manual feature extraction.
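The frame extraction step for the temporal relation network, in order or at random, could be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name `sample_frame_indices`, the evenly spaced reading of "in order", and the choice to sort random picks so that temporal order is preserved are all hypothetical.

```python
import random
from typing import List, Optional


def sample_frame_indices(
    num_frames: int,
    k: int,
    randomly: bool = False,
    seed: Optional[int] = None,
) -> List[int]:
    """Pick k frame indices from a video with num_frames frames."""
    if not 0 < k <= num_frames:
        raise ValueError("k must be between 1 and num_frames")
    if randomly:
        # Assumption: draw k distinct frames, then sort them so the
        # network still sees the frames in temporal order.
        rng = random.Random(seed)
        return sorted(rng.sample(range(num_frames), k))
    # Ordered mode: take the centre frame of each of k equal segments.
    return [int((i + 0.5) * num_frames / k) for i in range(k)]
```

For example, `sample_frame_indices(100, 4)` selects one frame from the middle of each quarter of a 100-frame clip, while passing `randomly=True` draws four distinct frames anywhere in the clip.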