WCSE 2022 Spring
ISBN: 978-981-18-5852-9 DOI: 10.18178/wcse.2022.04.140

Safe Reinforcement Learning through Hierarchical Shielding with Self-Adaptive Techniques

Prasanth Senthilvelan, Jialong Li, Kenji Tei

Abstract— This paper introduces a method for safe reinforcement learning that combines our hierarchical shielding approach with self-adaptive techniques; it can be applied to AI systems and industrial machines that use Reinforcement Learning (RL) while ensuring safety. A shield is a small monitor, constructed from a safety specification, that is placed after the RL agent. It observes the environment and the agent's actions, and if the agent attempts an action that is dangerous to the environment, the shield overwrites it with a safer one. Hierarchical shielding, in turn, defines several levels of safety requirements, with a corresponding shield for each level, and the active shield is selected and swapped at run time. We realize hierarchical shielding with self-adaptive techniques, such as the MAPE-K loop, using graceful degradation and progressive enhancement. We demonstrate the approach on a hot water storage tank example.
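To make the shield mechanism and the run-time level switching concrete, the following minimal Python sketch shows one possible reading of the abstract. All names (Shield, HierarchicalShield, mape_k_step), the scalar risk signal, and the thresholds are illustrative assumptions rather than the paper's implementation, and the exact mapping of graceful degradation and progressive enhancement to level changes may differ from the authors' design.

from typing import Callable, List

class Shield:
    """One shield level: checks a proposed action against a safety
    specification and substitutes a safe fallback if it would violate it."""

    def __init__(self,
                 is_safe: Callable[[object, object], bool],
                 safe_fallback: Callable[[object], object]):
        self.is_safe = is_safe              # (state, action) -> bool
        self.safe_fallback = safe_fallback  # state -> safe action

    def filter(self, state, proposed_action):
        if self.is_safe(state, proposed_action):
            return proposed_action
        return self.safe_fallback(state)    # overwrite the dangerous action

class HierarchicalShield:
    """Several shields, one per safety-requirement level, ordered from least
    to most restrictive. A MAPE-K-style loop selects the active level at
    run time."""

    def __init__(self, shields: List[Shield]):
        self.shields = shields  # knowledge: index 0 = least restrictive
        self.level = 0

    def mape_k_step(self, risk: float, low: float = 0.3, high: float = 0.7):
        # Monitor/Analyze: 'risk' is an assumed scalar summary of sensed danger.
        # Plan: move to a stricter level when risk is high and relax when it is
        # low (one plausible mapping of graceful degradation and progressive
        # enhancement; the paper's switching policy may differ).
        if risk > high and self.level < len(self.shields) - 1:
            self.level += 1
        elif risk < low and self.level > 0:
            self.level -= 1
        # Execute: the newly selected shield filters subsequent actions.

    def filter(self, state, proposed_action):
        return self.shields[self.level].filter(state, proposed_action)

In the hot water storage tank example, the risk signal could, for instance, be derived from how close the tank temperature is to its safety limit, so that a stricter shield is activated as that limit is approached.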

Index Terms— safe reinforcement learning, shield, MAPE-K loop, graceful degradation, progressive enhancement.

Prasanth Senthilvelan
Department of Computer Science and Communication Engineering Waseda University Tokyo, Japan
Jialong Li
Department of Computer Science and Communication Engineering Waseda University Tokyo, Japan
Kenji Tei
Department of Computer Science and Communication Engineering Waseda University / National Institute of Informatics Tokyo, Japan



Cite: Prasanth Senthilvelan, Jialong Li, Kenji Tei, "Safe Reinforcement Learning through Hierarchical Shielding with Self-Adaptive Techniques," WCSE 2022 Spring Event: 2022 9th International Conference on Industrial Engineering and Applications, pp. 1213-1224, Sanya, China, April 15-18, 2022.