Authors: Pinto, Andrea; Herrera, Luis-Carlos; Donoso, Yezid; Gutierrez, Jairo A
Date: 2026-03-30
Citation: Discover Computing, ISSN: 2948-2992 (Online), Springer Science and Business Media LLC, 29(1). doi: 10.1007/s10791-026-10064-6
ISSN: 2948-2992
URI: http://hdl.handle.net/10292/20831
Abstract: In critical infrastructure, the convergence of physical systems with digital networks forms complex Cyber-Physical Systems (CPS) that are vulnerable to threats compromising both data and physical operations. Traditional security systems, often focused solely on network traffic, leave a significant security gap by neglecting the rich contextual data provided by physical sensors. To address this issue, the paper introduces a novel unsupervised multimodal framework that combines data from both sources for holistic anomaly detection. The proposed architecture uses pre-trained Variational Autoencoder-Long Short-Term Memory (VAE-LSTM) networks to model temporal dependencies, together with a dual cross-attention mechanism for deep fusion of the latent representations. To enhance the detection of subtle, low-observability threats, the model is further regularized through adversarial training with a discriminator that distinguishes between original and reconstructed data. Evaluated on the SWaT dataset, the model identifies 24 out of 26 relevant attack scenarios using 10-second time sequences and achieves an Area Under the Curve (AUC) of 0.87, outperforming unimodal benchmarks. This work validates the importance of deep data fusion and presents a more resilient, context-aware defense mechanism for modern CPS.
Rights: Open Access.
This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Keywords: Unsupervised learning; Cybersecurity; Critical infrastructure; Anomaly detection; Multimodal
Title: Cyber-Physical Anomaly Detection a Deep Adversarial Fusion of Sensor and Network Data
Type: Journal Article
Access: OpenAccess
DOI: 10.1007/s10791-026-10064-6
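
The dual cross-attention fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the projections, dimensions, or how the two attended streams are combined, so the code below assumes plain scaled dot-product attention with no learned weights, equal latent dimension and sequence length for both modalities, and concatenation as the fusion step. The names `z_sensor`, `z_network`, `cross_attention`, and `dual_cross_attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where each query row attends over keys_values.

    Assumption: no learned query/key/value projections, so keys and values
    are the same latent vectors. A real model would learn these projections.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output vector per query.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return attended


def dual_cross_attention(z_sensor, z_network):
    """Attend in both directions, then concatenate per time step.

    Concatenation is an assumed fusion choice, used here only for illustration.
    """
    sensor_ctx = cross_attention(z_sensor, z_network)   # sensor queries network
    network_ctx = cross_attention(z_network, z_sensor)  # network queries sensor
    return [s + n for s, n in zip(sensor_ctx, network_ctx)]


# Toy latent sequences: 3 time steps, 2 latent dimensions per modality.
z_sensor = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z_network = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
fused = dual_cross_attention(z_sensor, z_network)
```

Each fused vector here has twice the latent dimension, carrying the sensor stream's view of the network latents alongside the network stream's view of the sensor latents. In the framework described above, such latents would come from the pre-trained VAE-LSTM encoders rather than from hand-written lists.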