
Isn't the general method of prevention just to train a bigger model for longer, so that all these niche edge-case exploits get found and addressed during policy exploration?

If there's an exploit that's sufficiently rare and unpredictable, then that seems like the only way (and indeed it should be a sufficient way, if done right) to address it.



That is the obvious answer, but I have no idea if it's true in practice.


It worked in adversarial board games (Go, Chess) and poker. We now have unexploitable bots for these games.

It hasn't worked yet in StarCraft because both the strategy space and the action space are so much larger. The networks are too small relative to this space, and humans can still put the bot into a situation it can't handle.

I'm going to guess that StarCraft will end up like these other games once hardware etc. advances for another 5-10 years, and we'll have an unexploitable bot. The main reason I think this is that we have unlimited training data, unlike with self-driving. We can make the models arbitrarily good.

The bot still won't have an ounce of common sense beyond what it's trained to do. It's just that it will have been so exhaustively exposed to every nook of the search space that a human won't be able to find any exploits.



