I follow RL from the sides (I have dabbled with it myself), and have seen some of the cool videos the article also lists. I think one of the key points (and a bit of a personal nitpick) the article makes is this:
> Thus far, every attempt at training a Trackmania-playing program has trained the program on one map at a time. As a result, no matter how well the network did on one track, it would have to be retrained - probably significantly retrained
This is a crucial aspect when talking about RL. Most of the Trackmania AI attempts focuses on a track at a time, which is not really a problem since they want to, given an individual track, outperform the best human racers.
However, it is this nuance that a lot of more business oriented users don't get when being sold on some fancy new RL project. In the real world (think self-driving cars), we typically want agents to be way more able to generalize.
Most of the RL techniques we have do rather well in these kinds of constrained environments (in a sense they eventually start overfitting on the given environment), but making them behave well in more varied environments is way harder. A lot of beginner RL tutorials also fail to make this very explicit, and will e.g. show how to train an agent to find the exit in a maze without ever trying it on a newly generated maze :).
> Thus far, every attempt at training a Trackmania-playing program has trained the program on one map at a time. As a result, no matter how well the network did on one track, it would have to be retrained - probably significantly retrained
This is a crucial aspect when talking about RL. Most of the Trackmania AI attempts focuses on a track at a time, which is not really a problem since they want to, given an individual track, outperform the best human racers.
However, it is this nuance that a lot of more business oriented users don't get when being sold on some fancy new RL project. In the real world (think self-driving cars), we typically want agents to be way more able to generalize.
Most of the RL techniques we have do rather well in these kinds of constrained environments (in a sense they eventually start overfitting on the given environment), but making them behave well in more varied environments is way harder. A lot of beginner RL tutorials also fail to make this very explicit, and will e.g. show how to train an agent to find the exit in a maze without ever trying it on a newly generated maze :).