Non-stationary RL

Usually we consider optimizing an objective under a stationary MDP with fixed transition and reward functions. In that setting we can learn the optimal policy by alternating policy evaluation and policy improvement steps. However, in a constantly changing environment where the transition kernel and the reward function may be unknown, it’s crucial for the learner to adapt to the environment through interaction and sampling.
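To make the contrast concrete, here is a minimal sketch of tabular policy iteration (policy evaluation followed by greedy policy improvement) on a hypothetical two-state MDP. The MDP, the reward tables `R_before` and `R_after`, and all function names are assumptions made up for illustration; they do not come from any of the papers listed below. The sketch also assumes the model is known, which is exactly what we cannot rely on in the non-stationary setting.

```python
# Tabular policy iteration on a toy MDP (illustrative assumption, not a
# method from the literature below).
# P[s][a] is a list of (probability, next_state); R[s][a] is the reward.

def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-10):
    """Iteratively compute V for a fixed deterministic policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = q_value(P, R, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_improvement(P, R, V, gamma=0.9):
    """Return the policy that is greedy with respect to V."""
    policy = []
    for s in range(len(P)):
        q = [q_value(P, R, V, s, a, gamma) for a in range(len(P[s]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy

def policy_iteration(P, R, gamma=0.9):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = [0] * len(P)
    while True:
        V = policy_evaluation(P, R, policy, gamma)
        new_policy = policy_improvement(P, R, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy

# Two states; action 0 stays in place, action 1 switches state.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R_before = [[1.0, 0.0], [0.0, 0.0]]  # staying in state 0 pays off
R_after = [[0.0, 0.0], [1.0, 0.0]]   # later, staying in state 1 pays off

pi_before, V_before = policy_iteration(P, R_before)
pi_after, V_after = policy_iteration(P, R_after)

# Value of the old policy once the reward function has changed.
V_stale = policy_evaluation(P, R_after, pi_before)
```

In this toy case the policy that is optimal under `R_before` earns nothing once the rewards switch to `R_after`, so a learner that stops updating after the first solve is stuck with a stale policy. Re-solving from scratch only works here because the new reward table is handed to us; in a genuinely non-stationary problem the change has to be detected or tracked from samples.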

Existing Literature On Continual Learning

  1. Towards Continual Reinforcement Learning: A Review and Perspectives
  2. A Comprehensive Survey of Continual Learning: Theory, Method and Application
  3. Reinforcement learning algorithm for non-stationary environments

I’m currently trying to get a big-picture view of non-stationary RL. Here is the collection of papers I’m planning to read.