Video Understanding with Attention Encoder and Multimodal Large Language Model

aut.embargo: No
aut.thirdpc.contains: No
dc.contributor.advisor: Yan, Wei Qi
dc.contributor.author: Zheng, Anni
dc.date.accessioned: 2025-09-15T19:40:09Z
dc.date.available: 2025-09-15T19:40:09Z
dc.date.issued: 2025
dc.description.abstract: Robust video understanding has become an increasingly important challenge with the emergence of Multimodal Large Language Models (MLLMs). While MLLMs have shown considerable promise, effectively capturing and reasoning about complex temporal dynamics and object-level interactions in videos remains an active area of research. This project introduces a novel framework designed to enhance video understanding capabilities. We propose a new model architecture that combines a Temporal Context Gated Attention (TCGA) encoder layer with a fine-tuned MLLM and demonstrates improved performance in video event retrieval and understanding tasks. Furthermore, we present the design and implementation of a real-time system application built upon the proposed model. This work contributes a specialized video processing module and system design insights, offering a step towards more sophisticated and applicable video understanding within MLLMs. We hope our findings provide a foundation for future research in temporal-aware multimodal learning.
dc.identifier.uri: http://hdl.handle.net/10292/19800
dc.language.iso: en
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.title: Video Understanding with Attention Encoder and Multimodal Large Language Model
dc.type: Thesis
thesis.degree.grantor: Auckland University of Technology
thesis.degree.name: Master of Computer and Information Sciences

Files

Original bundle

Name: ZhengA.pdf
Size: 8.15 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 890 B
Format: Item-specific license agreed upon to submission
Description:

Collections