Online Transfer Learning (OTL) for Accelerating Deep Reinforcement Learning (DRL) for Building Energy Management

Authors: Quang, Tran Van; Doan, Dat
Publisher: Taylor and Francis Group
Year: 2025
Subjects: 1201 Architecture; 1202 Building; 3301 Architecture; 3302 Building
Institution: My University
Dates: 2025-06-25; 2025-06-25; 2025-06-02
Type: Journal Article
Source: Journal of Building Performance Simulation, ISSN: 1940-1493 (Print); 1940-1507 (Online), Taylor and Francis Group.
Handle: http://hdl.handle.net/10292/19363
Rights: © 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Access: Open Access

Abstract: Buildings account for over one-third of global energy consumption and emissions, primarily from heating and cooling operations. Intelligent optimisation through predictive controls, such as deep reinforcement learning (DRL), offers significant potential for energy efficiency. However, DRL faces challenges in generalisation and impractical retraining when applied to different buildings, limiting its scalability. Prior online transfer learning (OTL) approaches relied on simulation or rule-based methods but lacked live learning and real-time optimisation. This study proposes an OTL strategy combining autonomous simulation-based DRL policy pretraining with real-time fine-tuning for rapid adaptation to new buildings. Using the Soft Actor-Critic (SAC) algorithm, it was tested on commercial building energy management simulations.
Results showed reductions of 18% or more in HVAC energy consumption and improvements of 8% or more in thermal comfort compared to rule-based and non-transfer DRL baselines. Empirical validation highlights OTL's potential in overcoming DRL's cold-start and training burdens, paving the way for broader deployment in sustainable energy management.
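
The two-stage idea described in the abstract (pretrain in simulation, then fine-tune on the new building) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not use SAC: a simple hill-climbing search over a single controller gain stands in for the DRL update, and the one-zone thermal model, the cost function, and all names and coefficients (`episode_cost`, `hill_climb`, `SOURCE_LOSS`, `TARGET_LOSS`) are hypothetical, chosen only to show the pretrain-then-fine-tune structure.

```python
import random

def episode_cost(gain, thermal_loss, setpoint=22.0, outdoor=5.0, steps=48):
    """Simulate a toy one-zone model; return energy use plus discomfort."""
    temp = 18.0
    cost = 0.0
    for _ in range(steps):
        # Proportional heating action, clipped to [0, 1] (fraction of heater power).
        action = min(1.0, max(0.0, gain * (setpoint - temp)))
        # Toy dynamics: the heater adds heat, the envelope loses heat outdoors.
        temp += 3.0 * action - thermal_loss * (temp - outdoor)
        cost += action + abs(setpoint - temp)
    return cost

def hill_climb(gain, thermal_loss, iterations, rng, step=0.1):
    """Keep a random perturbation of the gain only if it lowers the cost."""
    best = episode_cost(gain, thermal_loss)
    for _ in range(iterations):
        candidate = gain + rng.uniform(-step, step)
        cost = episode_cost(candidate, thermal_loss)
        if cost < best:
            gain, best = candidate, cost
    return gain, best

rng = random.Random(0)
SOURCE_LOSS, TARGET_LOSS = 0.10, 0.14  # hypothetical envelope loss coefficients

# Stage 1: long pretraining on the simulated source building.
pretrained_gain, source_cost = hill_climb(0.0, SOURCE_LOSS, 300, rng)

# Stage 2: short fine-tuning on the target building, warm-started from stage 1.
zero_shot_cost = episode_cost(pretrained_gain, TARGET_LOSS)
tuned_gain, tuned_cost = hill_climb(pretrained_gain, TARGET_LOSS, 20, rng)

# Cold start for comparison: the same short budget without pretraining.
cold_gain, cold_cost = hill_climb(0.0, TARGET_LOSS, 20, rng)
```

Comparing `tuned_cost` with `cold_cost` indicates how much the warm start helps in this toy setting; the outcome depends entirely on the assumed coefficients and says nothing about the figures reported in the article.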