Beaver escaping from prison

Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning

Published:

I first saw it on Twitter and was intrigued by the simple, cartoonish environment. Its representation is kept simple so that the main focus stays on switching objectives. Some time later I was implementing Reinforcement Learning algorithms from scratch in Rust, and eventually it was the turn of the snake game and DQN. While I was trying to find a way to train the network to get more than one apple, I remembered the beaver environment, where the agent had more than one objective and supposedly achieved them successfully.

So let's review the main article. It will be about a badger, though (in my defense, the drawings in the article do not make it clear which animal it is).
