MLGA - A Modality-Level Graph Attention Architecture for Multimodal Depression Detection

aut.embargo: No
aut.thirdpc.contains: No
dc.contributor.advisor: Yongchareon, Sira
dc.contributor.author: Malika, Malika
dc.date.accessioned: 2026-01-30T01:46:57Z
dc.date.available: 2026-01-30T01:46:57Z
dc.date.issued: 2025
dc.description.abstract: Depression is one of the most pressing global health challenges, affecting millions of individuals and placing significant strain on healthcare systems. Early and accurate detection is critical for timely intervention and improving patient outcomes. Traditional diagnostic methods, which rely heavily on clinical interviews and self-reports, are often resource-intensive, subjective, and limited in scalability. To address these limitations, this study presents an architectural investigation of Modality-Level Graph Attention (MLGA), a deep learning framework for multimodal fusion in depression detection. The proposed architecture integrates textual embeddings from ClinicalBERT, visual representations from VGG-PCA, and facial behavioral descriptors from OpenFace. These modalities are fused through a modality-level Graph Attention Network (GAT) that explicitly models inter-modality relationships, while a temporal module captures dynamic behavioral patterns over time. To enhance robustness, the framework incorporates Gaussian noise injection, L2 normalization, and modality dropout, thereby encouraging resilience to noise and missing inputs. A comprehensive evaluation was conducted on three benchmark datasets: E-DAIC-WOZ, EATD-Corpus, and D-Vlog. Across these datasets, MLGA achieved competitive performance and, on the E-DAIC benchmark in particular, surpassed several unimodal and late-fusion baselines in terms of precision, recall, F1-score, and ROC-AUC, demonstrating the effectiveness of graph-based multimodal integration under the studied conditions. The results highlight the importance of modeling both intra-modality features and cross-modality dependencies within a unified fusion architecture. Rather than proposing a fully deployable clinical tool, this study advances the field of affective computing by systematically analysing a modality-level graph attention design that is computationally moderate and interpretable, and by quantifying its behavior across heterogeneous datasets. Future directions include expanding modality coverage, applying domain adaptation for cross-cultural and cross-setting generalization, and enhancing interpretability using advanced explainable AI techniques.
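
As a minimal sketch of the fusion idea described in the abstract (assuming PyTorch; the class name, attention formulation, dimensions, and hyperparameters below are illustrative assumptions and are not taken from the thesis itself), the following treats per-modality embeddings as nodes of a fully connected modality graph with GAT-style additive attention, and includes the Gaussian noise injection, L2 normalization, and modality dropout mentioned above:

```python
# Hypothetical sketch of modality-level graph attention fusion.
# All names and values here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityGATFusion(nn.Module):
    """Fuses per-modality embeddings (e.g. text, visual, facial) as nodes
    of a fully connected modality graph using GAT-style attention."""

    def __init__(self, dim: int, p_modality_drop: float = 0.1,
                 noise_std: float = 0.01):
        super().__init__()
        self.proj = nn.Linear(dim, dim)     # shared node projection W
        self.attn = nn.Linear(2 * dim, 1)   # additive attention a^T[Wh_i || Wh_j]
        self.p_drop = p_modality_drop
        self.noise_std = noise_std

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, n_modalities, dim), one row per modality embedding
        if self.training:
            # Gaussian noise injection for robustness
            nodes = nodes + self.noise_std * torch.randn_like(nodes)
            # Modality dropout: zero out whole modalities at random
            keep = (torch.rand(nodes.size(0), nodes.size(1), 1,
                               device=nodes.device) > self.p_drop).float()
            nodes = nodes * keep
        nodes = F.normalize(nodes, p=2, dim=-1)   # L2 normalization
        h = self.proj(nodes)                      # (B, M, D)
        B, M, D = h.shape
        # Pairwise attention logits over the fully connected modality graph
        hi = h.unsqueeze(2).expand(B, M, M, D)    # source nodes
        hj = h.unsqueeze(1).expand(B, M, M, D)    # neighbour nodes
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)          # (B, M, M) attention weights
        fused = torch.bmm(alpha, h)               # attention-weighted mixing
        return fused.mean(dim=1)                  # pooled joint representation

# Usage: three modality embeddings projected to a common dimension beforehand
text_emb = torch.randn(4, 256)    # e.g. ClinicalBERT features, projected
visual_emb = torch.randn(4, 256)  # e.g. VGG-PCA features
facial_emb = torch.randn(4, 256)  # e.g. OpenFace descriptors
fusion = ModalityGATFusion(dim=256)
joint = fusion(torch.stack([text_emb, visual_emb, facial_emb], dim=1))
print(joint.shape)  # torch.Size([4, 256])
```

A temporal module (e.g. a recurrent or attention layer over per-frame fused vectors) would sit alongside this, per the abstract, but its exact form is not specified in the record.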
dc.identifier.uri: http://hdl.handle.net/10292/20562
dc.language.iso: en
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.subject: Multimodal Depression Detection
dc.subject: ClinicalBERT
dc.subject: VGG
dc.subject: OpenFace
dc.subject: Graph Attention Networks
dc.subject: Temporal Modeling
dc.subject: Explainable AI
dc.title: MLGA - A Modality-Level Graph Attention Architecture for Multimodal Depression Detection
dc.type: Thesis
thesis.degree.grantor: Auckland University of Technology
thesis.degree.name: Master of Philosophy

Files

Original bundle

Name: Malika.pdf
Size: 11.82 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 890 B
Format: Item-specific license agreed to upon submission