Supervisor: Yongchareon, Sira
Author: Malika, Malika
Date accessioned: 2026-01-30
Date available: 2026-01-30
Date issued: 2025
URI: http://hdl.handle.net/10292/20562

Abstract: Depression is one of the most pressing global health challenges, affecting millions of individuals and placing significant strain on healthcare systems. Early and accurate detection is critical for timely intervention and improved patient outcomes. Traditional diagnostic methods, which rely heavily on clinical interviews and self-reports, are often resource-intensive, subjective, and limited in scalability. To address these limitations, this study presents an architectural investigation of Modality-Level Graph Attention (MLGA), a deep learning framework for multimodal fusion in depression detection. The proposed architecture integrates textual embeddings from ClinicalBERT, visual representations from VGG-PCA, and facial behavioral descriptors from OpenFace. These modalities are fused through a modality-level Graph Attention Network (GAT) that explicitly models inter-modality relationships, while a temporal module captures dynamic behavioral patterns over time. To enhance robustness, the framework incorporates Gaussian noise injection, L2 normalization, and modality dropout, thereby encouraging resilience to noise and missing inputs. A comprehensive evaluation was conducted on three benchmark datasets: E-DAIC-WOZ, EATD-Corpus, and D-Vlog. Across these datasets, MLGA achieved competitive performance and, on the E-DAIC benchmark in particular, surpassed several unimodal and late-fusion baselines in precision, recall, F1-score, and ROC-AUC, demonstrating the effectiveness of graph-based multimodal integration under the studied conditions. The results highlight the importance of modeling both intra-modality features and cross-modality dependencies within a unified fusion architecture. Rather than proposing a fully deployable clinical tool, this study advances the field of affective computing by systematically analysing a modality-level graph attention design that is computationally moderate and interpretable, and by quantifying its behavior across heterogeneous datasets. Future directions include expanding modality coverage, applying domain adaptation for cross-cultural and cross-setting generalization, and enhancing interpretability using advanced explainable AI techniques.

Language: en
Keywords: Multimodal Depression Detection; ClinicalBERT; VGG; OpenFace; Graph Attention Networks; Temporal Modeling; Explainable AI
Title: MLGA - A Modality-Level Graph Attention Architecture for Multimodal Depression Detection
Type: Thesis
Access: OpenAccess
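Although the abstract only names the building blocks, the fusion step it describes can be sketched concretely. Below is a minimal, illustrative PyTorch sketch of modality-level graph attention combined with the robustness measures mentioned above (Gaussian noise injection, L2 normalization, modality dropout). It is not the thesis implementation: the class name, feature dimensions, single-head additive attention formulation, and all hyperparameters are assumptions made for illustration.

```python
# Hypothetical sketch of modality-level graph attention fusion (PyTorch).
# Not the thesis code: names, dimensions, and hyperparameters are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityLevelGAT(nn.Module):
    """Single-head GAT layer over modality nodes (illustrative only)."""

    def __init__(self, in_dims, hidden_dim=128, noise_std=0.1, modality_drop_p=0.2):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.projections = nn.ModuleList([nn.Linear(d, hidden_dim) for d in in_dims])
        # Additive attention scoring over pairs of modality nodes.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.noise_std = noise_std
        self.modality_drop_p = modality_drop_p

    def forward(self, features):
        # features: list of per-modality tensors, each of shape (batch, in_dims[i]).
        nodes = []
        for proj, x in zip(self.projections, features):
            h = proj(x)
            if self.training:
                # Gaussian noise injection for robustness.
                h = h + torch.randn_like(h) * self.noise_std
                # Modality dropout: occasionally silence an entire modality.
                if torch.rand(1).item() < self.modality_drop_p:
                    h = torch.zeros_like(h)
            # L2 normalization of each modality embedding.
            nodes.append(F.normalize(h, p=2, dim=-1))
        h = torch.stack(nodes, dim=1)  # (batch, n_modalities, hidden_dim)

        n = h.size(1)
        # Score every (source, neighbour) modality pair with additive attention,
        # treating the modalities as a fully connected graph.
        hi = h.unsqueeze(2).expand(-1, n, n, -1)
        hj = h.unsqueeze(1).expand(-1, n, n, -1)
        scores = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)            # attention over neighbours
        fused = torch.einsum("bij,bjd->bid", alpha, h)   # aggregate neighbour nodes
        return fused.mean(dim=1)                         # pooled multimodal embedding


# Example with made-up feature sizes for the three modalities:
if __name__ == "__main__":
    text = torch.randn(4, 768)    # e.g. a ClinicalBERT sentence embedding
    visual = torch.randn(4, 256)  # e.g. VGG features reduced by PCA
    facial = torch.randn(4, 49)   # e.g. OpenFace action-unit descriptors
    fusion = ModalityLevelGAT(in_dims=[768, 256, 49])
    print(fusion([text, visual, facial]).shape)  # torch.Size([4, 128])
```

In the full architecture described in the abstract, a temporal module would consume such fused embeddings across timesteps to capture dynamic behavioral patterns; that stage, and the classification head, are omitted from this sketch.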