
Non-crossing Quantile Regression for Deep Reinforcement Learning

Posted: April 28, 2022, 14:12


Title: Non-crossing Quantile Regression for Deep Reinforcement Learning

Time: 2022-04-08, 14:30 - 15:10

Speaker: Prof. Xingdong Feng (冯兴东), Shanghai University of Finance and Economics

Tencent Meeting ID: 957-794-776

Meeting link: https://meeting.tencent.com/dm/5BKN9b4u1B3v

Abstract: Distributional reinforcement learning (DRL) estimates the distribution over future returns instead of the mean, so as to more efficiently capture the intrinsic uncertainty of MDPs. However, batch-based DRL algorithms cannot guarantee the non-decreasing property of the learned quantile curves, especially at the early training stage, leading to abnormal distribution estimates and reduced model interpretability. To address these issues, we introduce a general DRL framework that uses non-crossing quantile regression to enforce the monotonicity constraint within each sampled batch, and that can be incorporated into well-known DRL algorithms. We demonstrate the validity of our method from both theoretical and implementation perspectives. Experiments on Atari 2600 games show that state-of-the-art DRL algorithms with the non-crossing modification can significantly outperform their baselines, with faster convergence and better test performance. In particular, our method can effectively recover the distribution information and thus dramatically increase exploration efficiency when the reward space is extremely sparse.
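To illustrate the monotonicity idea in the abstract, below is a minimal sketch of one standard way to make predicted quantiles non-crossing by construction: predict an unconstrained lowest quantile plus non-negative increments (via softplus) and take the cumulative sum. This is a generic illustration, not the speaker's exact architecture; the function name and shapes are assumptions for the example.

```python
import numpy as np

def non_crossing_quantiles(raw_outputs):
    """Map unconstrained network outputs to non-decreasing quantile values.

    The first output is the (unconstrained) lowest quantile; the remaining
    outputs are passed through softplus to make them non-negative increments,
    so the cumulative sum is monotone and the quantile curves cannot cross.
    """
    raw = np.asarray(raw_outputs, dtype=float)
    base = raw[..., :1]                           # lowest quantile, unconstrained
    increments = np.log1p(np.exp(raw[..., 1:]))   # softplus(x) >= 0
    return np.concatenate([base, base + np.cumsum(increments, axis=-1)], axis=-1)

# Raw outputs that would "cross" (decrease) if used directly as quantiles:
raw = np.array([0.5, -1.0, 2.0, -3.0])
q = non_crossing_quantiles(raw)
assert np.all(np.diff(q) >= 0)  # non-decreasing, i.e. non-crossing
```

In a batch-based DRL setting, applying such a transform to each sampled state's quantile head enforces the within-batch monotonicity constraint the abstract refers to, regardless of how early training is.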

