Non-stationary RL
Usually we consider optimizing an objective under a stationary MDP with a fixed transition and reward function. In that setting, we can learn the optimal policy by alternating policy evaluation and policy improvement steps. However, in a constantly changing environment, where the transition kernel and the reward function may be unknown, it’s crucial for our learners to adapt to the environment through interaction and sampling.
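For reference, here is a minimal policy iteration sketch on a toy stationary MDP; the transition tensor, reward matrix, and discount factor are made-up values for illustration only.

```python
# Minimal policy iteration on a small, fully known stationary MDP (toy values).
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # R[s, a]

policy = np.zeros(n_states, dtype=int)
for _ in range(100):
    # Policy evaluation: solve V = R_pi + gamma * P_pi V for the current policy.
    P_pi = P[np.arange(n_states), policy]          # (n_states, n_states)
    R_pi = R[np.arange(n_states), policy]          # (n_states,)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to the evaluated values.
    Q = R + gamma * P @ V                          # Q[s, a]
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("greedy policy:", policy, "state values:", V)
```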
Existing Literature On Continual Learning
- Towards Continual Reinforcement Learning: A Review and Perspectives
- A Comprehensive Survey of Continual Learning: Theory, Method and Application
- Reinforcement learning algorithm for non-stationary environments
Now I’m trying to understand the big picture of the non-stationary RL field. Here is the collection of papers I’m planning to read.
- Black Box Multi-Agent System
- Memory-Based Meta-Learning
- Debiased Offline Representation Learning
- Adaptive Deep RL for Piecewise Context
- Factored Adaptation
- Goal Oriented Shortest Path
- Counterfactual Off-Policy
- Inverse Online Learning
- RestartQ-UCB
- Sliding Window Upper-Confidence
- Optimizing for the Future
- Safe Policy Improvement
- Dynamic Regret
Usually, the numerical experiments are run on Grid World and MuJoCo environments to demonstrate the efficiency of the algorithms.
In general, I think there are four ways to deal with non-stationary environments.
Learn task representations through encoder-decoder architectures. By feeding a latent variable as an extra input to the reward function and the transition kernel, we can account for the non-stationarity, and then optimize the policy and the probabilistic model jointly.
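Below is a minimal sketch of this idea, assuming a PyTorch setup: a GRU encoder maps a window of recent transitions to a latent z, and a decoder predicts the next state and reward from (s, a, z). Module sizes and names are illustrative, not taken from any specific paper; in practice the policy would also condition on z.

```python
import torch
import torch.nn as nn

state_dim, action_dim, latent_dim, window = 4, 2, 8, 10
transition_dim = state_dim + action_dim + state_dim + 1   # (s, a, s', r)

encoder = nn.GRU(input_size=transition_dim, hidden_size=latent_dim, batch_first=True)
decoder = nn.Sequential(
    nn.Linear(state_dim + action_dim + latent_dim, 64), nn.ReLU(),
    nn.Linear(64, state_dim + 1),               # predicts (s', r)
)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=3e-4)

# Fake batch of transition windows: (batch, window, transition_dim).
batch = torch.randn(32, window, transition_dim)
context = batch[:, :-1]                          # past transitions used as context
s = batch[:, -1, :state_dim]                     # current state
a = batch[:, -1, state_dim:state_dim + action_dim]
target = batch[:, -1, state_dim + action_dim:]   # true (s', r) for the last step

_, h = encoder(context)                          # h: (1, batch, latent_dim)
z = h.squeeze(0)                                 # latent task variable
pred = decoder(torch.cat([s, a, z], dim=-1))
loss = nn.functional.mse_loss(pred, target)      # reconstruction objective
opt.zero_grad(); loss.backward(); opt.step()
```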
Learn a good initialization for a new episode, which is essentially the idea behind meta-learning, where we learn a meta-policy across different tasks.
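Here is a minimal first-order (Reptile-style) sketch of learning such an initialization, where each "task" is reduced to a toy quadratic loss whose minimum shifts; the task family and step sizes are illustrative assumptions, not a specific published method.

```python
import numpy as np

rng = np.random.default_rng(0)
meta_theta = np.zeros(2)          # meta-initialization shared across tasks
inner_lr, outer_lr, inner_steps = 0.1, 0.05, 5

def task_grad(theta, task_center):
    # Gradient of the toy per-task loss ||theta - task_center||^2.
    return 2.0 * (theta - task_center)

for meta_iter in range(1000):
    task_center = rng.normal(loc=1.0, scale=0.5, size=2)    # sample a task
    theta = meta_theta.copy()
    for _ in range(inner_steps):                             # fast adaptation
        theta -= inner_lr * task_grad(theta, task_center)
    # Move the meta-initialization toward the adapted parameters.
    meta_theta += outer_lr * (theta - meta_theta)

print("meta-initialization:", meta_theta)  # ends up near the mean task center (~1.0)
```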
Context detection. Since the context length may be stochastic, we need to detect context switches and collect only the data or trajectories related to the current context.
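One simple way to sketch this is a CUSUM-style detector over the model's one-step prediction errors; the synthetic error stream, baseline, and threshold below are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Prediction errors: small while the context holds, larger after a switch at t=200.
errors = np.concatenate([rng.normal(0.1, 0.05, 200), rng.normal(0.6, 0.05, 100)])

baseline, threshold, cusum = 0.2, 3.0, 0.0
buffer = []                                      # data for the current context only
for t, err in enumerate(errors):
    cusum = max(0.0, cusum + (err - baseline))   # accumulate excess error
    buffer.append(err)
    if cusum > threshold:
        print(f"context change detected at step {t}; resetting buffer and model")
        buffer, cusum = [], 0.0                  # keep only data from the new context
```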
Knowledge distillation or policy consolidation. To prevent catastrophic forgetting, some works freeze the past policy and use it as an input or distillation target while learning in the new environment.
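A minimal consolidation sketch, assuming PyTorch: the past policy is frozen and the new policy pays a KL penalty for drifting away from it on states from past experience. The network sizes, the dummy new-task loss, and the penalty weight are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, n_actions, beta = 4, 3, 0.5
old_policy = nn.Linear(state_dim, n_actions)     # stand-in for the frozen past policy
new_policy = nn.Linear(state_dim, n_actions)
for p in old_policy.parameters():
    p.requires_grad_(False)                      # freeze the consolidated policy
opt = torch.optim.Adam(new_policy.parameters(), lr=1e-3)

states = torch.randn(64, state_dim)              # states drawn from past experience
dummy_actions = torch.randint(0, n_actions, (64,))                  # stand-in targets
new_task_loss = F.cross_entropy(new_policy(states), dummy_actions)  # placeholder RL loss

# Distillation term: KL(old || new) keeps the new policy close to the frozen one.
old_log_probs = F.log_softmax(old_policy(states), dim=-1)
new_log_probs = F.log_softmax(new_policy(states), dim=-1)
kl = F.kl_div(new_log_probs, old_log_probs, log_target=True, reduction="batchmean")

loss = new_task_loss + beta * kl
opt.zero_grad(); loss.backward(); opt.step()
```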