Deep Learning Methods for Human Action Recognition

Yu, Zeqi
Yan, Wei Qi
Degree name
Master of Computer and Information Sciences
Auckland University of Technology

Human action recognition from digital videos is an active research topic in computer vision. It has a wide range of applications, including video surveillance, human-computer interaction, visual information retrieval, and unmanned driving. With the exponential growth of surveillance data on the Internet in recent years, effective and efficient analysis of video data has become crucial. Traditional machine learning methods that only extract computable features have limitations and do not suit massive visual data, whereas deep learning methods, especially convolutional neural networks, have achieved strong results in this field. The goal of human action recognition is to classify patterns so as to understand human actions from visual data and output the corresponding labels. In addition to the spatial correlation that exists in 2D images, human actions in a video also depend on correlation in the temporal domain. The complexity of human actions, changes of perspective, background noise, and lighting conditions all affect recognition. To address these problems, three algorithms are designed and implemented in this thesis. Based on convolutional neural networks (CNN), Two-Stream CNN, CNN+LSTM, and 3D-CNN are used to identify human actions. Each algorithm is explained and analyzed in detail. The HMDB-51 dataset is employed to test these algorithms and obtain the best results. Our experimental results demonstrate that all three methods effectively identify human actions in the given videos, and the best-performing algorithm is thereby verified.

Human action recognition, Convolutional neural network, Deep learning, LSTM, 3D-CNN, Two-Stream CNN