ISBN: 978-981-18-5852-9 DOI: 10.18178/wcse.2022.04.140
Safe Reinforcement Learning through Hierarchical Shielding with Self-Adaptive Techniques
Abstract— This paper introduces a method for safe reinforcement learning (RL) that combines our
hierarchical shielding approach with self-adaptive techniques. The method can be applied to industrial AI
systems and machines that use RL while ensuring safety. A shield is a small monitor, constructed from a
safety specification, that is placed after the RL agent. It monitors the environment and the agent's actions,
and if the agent attempts an action that is dangerous to the environment, the shield overrides that action
with a safer one. Hierarchical shielding extends this idea with several levels of safety requirements and a
shield for each level; the active shield is selected and swapped at run time. We realize hierarchical
shielding through graceful degradation and progressive enhancement, using self-adaptive approaches such
as the MAPE-K loop. We demonstrate the method on a hot water storage tank example.
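The shielding idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the action names, the temperature bounds, the three shield levels, and the boolean `assumptions_hold` trigger are all hypothetical values chosen only to show how a shield overrides an unsafe action and how the active level could be degraded or enhanced at run time.

```python
from dataclasses import dataclass

# Hypothetical discrete actions for a hot water tank heater.
HEAT_ON, HEAT_OFF = "heat_on", "heat_off"


@dataclass(frozen=True)
class Shield:
    """A monitor that enforces a temperature band (assumed safety spec)."""
    name: str
    min_temp: float
    max_temp: float

    def filter(self, temp: float, action: str) -> str:
        # Override the agent's action only when it would violate the band.
        if action == HEAT_ON and temp >= self.max_temp:
            return HEAT_OFF
        if action == HEAT_OFF and temp <= self.min_temp:
            return HEAT_ON
        return action


# Shields ordered from the strictest to the weakest requirement level.
# All bounds are illustrative assumptions, not values from the paper.
LEVELS = [
    Shield("strict", 40.0, 60.0),
    Shield("relaxed", 30.0, 70.0),
    Shield("minimal", 20.0, 80.0),
]


def select_level(current: int, assumptions_hold: bool) -> int:
    """Pick the next shield level (a simplified stand-in for a MAPE-K plan step).

    Graceful degradation: step down to a weaker shield when the
    assumptions behind the current level no longer hold.
    Progressive enhancement: step back up to a stricter shield when they do.
    """
    if not assumptions_hold:
        return min(current + 1, len(LEVELS) - 1)
    return max(current - 1, 0)
```

For example, at 65 degrees the strict shield replaces `heat_on` with `heat_off`, while the relaxed shield lets the same action through; calling `select_level` in each control cycle moves the active index between the two.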
Index Terms— safe reinforcement learning, shield, MAPE-K loop, graceful degradation, progressive
enhancement.
Prasanth Senthilvelan
Department of Computer Science and Communication Engineering, Waseda University, Tokyo, Japan
Jialong Li
Department of Computer Science and Communication Engineering, Waseda University, Tokyo, Japan
Kenji Tei
Department of Computer Science and Communication Engineering, Waseda University / National Institute of Informatics, Tokyo, Japan
Cite: Prasanth Senthilvelan, Jialong Li, Kenji Tei, "Safe Reinforcement Learning through Hierarchical Shielding with Self-Adaptive Techniques," WCSE 2022 Spring Event: 2022 9th International Conference on Industrial Engineering and Applications, pp. 1213-1224, Sanya, China, April 15-18, 2022.