MLGA - A Modality-Level Graph Attention Architecture for Multimodal Depression Detection

aut.embargo: No
aut.thirdpc.contains: No
dc.contributor.advisor: Yongchareon, Sira
dc.contributor.author: Malika, Malika
dc.date.accessioned: 2026-01-30T01:46:57Z
dc.date.available: 2026-01-30T01:46:57Z
dc.date.issued: 2025
dc.description.abstract: Depression is one of the most pressing global health challenges, affecting millions of individuals and placing significant strain on healthcare systems. Early and accurate detection is critical for timely intervention and improving patient outcomes. Traditional diagnostic methods, which rely heavily on clinical interviews and self-reports, are often resource-intensive, subjective, and limited in scalability. To address these limitations, this study presents an architectural investigation of Modality-Level Graph Attention (MLGA), a deep learning framework for multimodal fusion in depression detection. The proposed architecture integrates textual embeddings from ClinicalBERT, visual representations from VGG-PCA, and facial behavioral descriptors from OpenFace. These modalities are fused through a modality-level Graph Attention Network (GAT) that explicitly models inter-modality relationships, while a temporal module captures dynamic behavioral patterns over time. To enhance robustness, the framework incorporates Gaussian noise injection, L2 normalization, and modality dropout, thereby encouraging resilience to noise and missing inputs. A comprehensive evaluation was conducted on three benchmark datasets: E-DAIC-WOZ, EATD-Corpus, and D-Vlog. Across these datasets, MLGA achieved competitive performance and, on the E-DAIC benchmark in particular, surpassed several unimodal and late-fusion baselines in terms of precision, recall, F1-score, and ROC-AUC, demonstrating the effectiveness of graph-based multimodal integration under the studied conditions. The results highlight the importance of modeling both intra-modality features and cross-modality dependencies within a unified fusion architecture. Rather than proposing a fully deployable clinical tool, this study advances the field of affective computing by systematically analysing a modality-level graph attention design that is computationally moderate and interpretable, and by quantifying its behavior across heterogeneous datasets. Future directions include expanding modality coverage, applying domain adaptation for cross-cultural and cross-setting generalization, and enhancing interpretability using advanced explainable AI techniques.
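
As a minimal sketch of the fusion idea described in the abstract (assuming PyTorch; the class name, attention formulation, dimensions, and hyperparameters below are illustrative assumptions and are not taken from the thesis itself), the following treats per-modality embeddings as nodes of a fully connected modality graph with GAT-style additive attention, and includes the Gaussian noise injection, L2 normalization, and modality dropout mentioned above:

```python
# Hypothetical sketch of modality-level graph attention fusion.
# All names and values here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityGATFusion(nn.Module):
    """Fuses per-modality embeddings (e.g. text, visual, facial) as nodes
    of a fully connected modality graph using GAT-style attention."""

    def __init__(self, dim: int, p_modality_drop: float = 0.1,
                 noise_std: float = 0.01):
        super().__init__()
        self.proj = nn.Linear(dim, dim)     # shared node projection W
        self.attn = nn.Linear(2 * dim, 1)   # additive attention a^T[Wh_i || Wh_j]
        self.p_drop = p_modality_drop
        self.noise_std = noise_std

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, n_modalities, dim), one row per modality embedding
        if self.training:
            # Gaussian noise injection for robustness
            nodes = nodes + self.noise_std * torch.randn_like(nodes)
            # Modality dropout: zero out whole modalities at random
            keep = (torch.rand(nodes.size(0), nodes.size(1), 1,
                               device=nodes.device) > self.p_drop).float()
            nodes = nodes * keep
        nodes = F.normalize(nodes, p=2, dim=-1)   # L2 normalization
        h = self.proj(nodes)                      # (B, M, D)
        B, M, D = h.shape
        # Pairwise attention logits over the fully connected modality graph
        hi = h.unsqueeze(2).expand(B, M, M, D)    # source nodes
        hj = h.unsqueeze(1).expand(B, M, M, D)    # neighbour nodes
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)          # (B, M, M) attention weights
        fused = torch.bmm(alpha, h)               # attention-weighted mixing
        return fused.mean(dim=1)                  # pooled joint representation

# Usage: three modality embeddings projected to a common dimension beforehand
text_emb = torch.randn(4, 256)    # e.g. ClinicalBERT features, projected
visual_emb = torch.randn(4, 256)  # e.g. VGG-PCA features
facial_emb = torch.randn(4, 256)  # e.g. OpenFace descriptors
fusion = ModalityGATFusion(dim=256)
joint = fusion(torch.stack([text_emb, visual_emb, facial_emb], dim=1))
print(joint.shape)  # torch.Size([4, 256])
```

A temporal module (e.g. a recurrent or attention layer over per-frame fused vectors) would sit alongside this, per the abstract, but its exact form is not specified in the record.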
dc.identifier.uri: http://hdl.handle.net/10292/20562
dc.language.iso: en
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.subject: Multimodal Depression Detection
dc.subject: ClinicalBERT
dc.subject: VGG
dc.subject: OpenFace
dc.subject: Graph Attention Networks
dc.subject: Temporal Modeling
dc.subject: Explainable AI
dc.title: MLGA - A Modality-Level Graph Attention Architecture for Multimodal Depression Detection
dc.type: Thesis
thesis.degree.grantor: Auckland University of Technology
thesis.degree.name: Master of Philosophy

Files

Original bundle

Name: Malika.pdf
Size: 11.82 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 890 B
Format: Item-specific license agreed to upon submission