Human Action Recognition from Digital Videos Based on Deep Learning Methods
Video surveillance is widely used today for public safety and security. In addition, the proliferation of smartphones has led to a large amount of video data being generated every day. Recognizing human actions in digital videos not only contributes to public safety but also transforms this large volume of raw video into usable information. Therefore, in this thesis, we propose YOLOv7-based models that incorporate attention mechanisms for human action recognition. Among the available attention mechanisms, we choose CBAM and SimAM as the basis of our framework.
Based on these two attention mechanisms, we propose three models in this thesis: YOLOv7+CBAM, YOLOv7+SimAM, and YOLOv7+CBAM+SimAM. The three models recognize five human actions (i.e., clapping, punching, walking, waving, and running). In addition, the dataset used in this thesis is assembled by selecting suitable data samples from six public datasets; these samples are then prepared for YOLOv7 training and testing.
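To illustrate the kind of module involved, the following is a minimal sketch of parameter-free SimAM attention, which weights each feature-map element by an energy term derived from its deviation from the channel mean. This is an illustrative reimplementation based on the published SimAM formulation, not the exact code used in this thesis; the function name and the default `e_lambda` regularizer are assumptions.

```python
import numpy as np

def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention over a (C, H, W) feature map.

    Each element is gated by sigmoid(e_inv), where e_inv grows with the
    element's squared deviation from its channel mean, normalized by the
    channel variance plus a small regularizer e_lambda.
    """
    c, h, w = x.shape
    n = h * w - 1  # number of "other" elements per channel
    mu = x.mean(axis=(1, 2), keepdims=True)        # per-channel mean
    d = (x - mu) ** 2                              # squared deviation
    v = d.sum(axis=(1, 2), keepdims=True) / n      # per-channel variance
    e_inv = d / (4 * (v + e_lambda)) + 0.5         # inverse energy
    gate = 1.0 / (1.0 + np.exp(-e_inv))            # sigmoid attention weights
    return x * gate
```

Because SimAM adds no learnable parameters, inserting it into a YOLOv7 backbone changes only the forward pass, which is one reason it pairs naturally with a channel/spatial module such as CBAM.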
Finally, on this dataset, YOLOv7 with the attention mechanisms improves accuracy by 7% over the base model. In our experiments, the YOLOv7+CBAM+SimAM model achieves the highest accuracy, up to 99.6%. Computing speed is also improved, taking 295 ms on average to process one video frame.