Link: arxiv (probably)
We could build a model that formulates its actions as readable rules, each with a premise and a conclusion, possibly even written in natural language.
The gameplay would then be understandable (taking the snake environment as an example). By comparison, any DL approach looks like a sequence of steps sampled from probabilities, with no human-readable reason attached to each step.
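To make the premise/conclusion idea concrete, here is a minimal sketch of hand-written rules for a snake-like grid. Everything in it is an assumption for illustration: the state layout (`head`, `food`, `body`, `size`), the `Rule` class, and the `decide` function are hypothetical names, and the rules are written by hand rather than learned, so this only shows what a readable policy could look like, not how the model would find it.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical state layout: "head" and "food" are (x, y) cells, "body" is a
# set of occupied cells, "size" is the side of a square grid; y grows downward.
State = dict

MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def is_safe(state: State, action: str) -> bool:
    """True if moving the head by `action` stays on the grid and off the body."""
    dx, dy = MOVES[action]
    x, y = state["head"][0] + dx, state["head"][1] + dy
    inside = 0 <= x < state["size"] and 0 <= y < state["size"]
    return inside and (x, y) not in state["body"]


@dataclass
class Rule:
    text: str                         # human-readable "IF ... THEN ..." form
    premise: Callable[[State], bool]  # condition on the current state
    conclusion: str                   # action taken when the premise holds


def food_rule(action: str, toward_food: Callable[[State], bool]) -> Rule:
    """Rule: move in `action` direction if the food lies that way and it is safe."""
    text = f"IF food is {action} of the head AND {action} is free THEN move {action}"
    return Rule(text, lambda s: toward_food(s) and is_safe(s, action), action)


# Rules are tried in order; the first one whose premise holds fires.
RULES = [
    food_rule("right", lambda s: s["food"][0] > s["head"][0]),
    food_rule("left", lambda s: s["food"][0] < s["head"][0]),
    food_rule("up", lambda s: s["food"][1] < s["head"][1]),
    food_rule("down", lambda s: s["food"][1] > s["head"][1]),
]
# Fallback rules: take any free direction when no food rule applies.
RULES += [
    Rule(f"IF {a} is free THEN move {a}", lambda s, a=a: is_safe(s, a), a)
    for a in MOVES
]


def decide(state: State, rules=RULES):
    """Return (action, explanation) from the first rule that matches."""
    for rule in rules:
        if rule.premise(state):
            return rule.conclusion, rule.text
    return None, "no rule applies"


state = {"head": (2, 2), "food": (4, 2), "body": {(2, 2), (1, 2)}, "size": 5}
action, reason = decide(state)
print(action, "because:", reason)
```

Each decision comes with the text of the rule that produced it, which is the kind of explanation a probability vector over actions does not provide.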
It should also be able to construct abstract concepts from the gameplay.
The learning process should be much shorter as well, and a replay buffer is probably not required.