How do you propose that punishment make the AIs "get better"? From their perspec...

maleldil · on Feb 4, 2025

Reinforcement learning trains a model against a reward function. The suggestion is that real-world accountability could be translated into such a reward function.

Also, OP explicitly mentioned "online learning", which is a continuous training process after standard pre-training.

For what it's worth, I don't think this would work. Rewards would come in too sporadically to provide a useful training signal.
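
To make the mechanism concrete, here is a minimal sketch of what "accountability as a reward function" might look like, assuming a toy two-action policy trained with a REINFORCE-style update. Everything in it is hypothetical and only for illustration: the outcome labels, the reward values, the "careful"/"careless" actions, and the `feedback_rate` knob (how often the real world reports back at all) are my assumptions, not something OP proposed.

```python
import math
import random

# Hypothetical mapping from real-world outcomes to scalar rewards (assumed values).
OUTCOME_REWARD = {"praised": 1.0, "penalized": -1.0}


def reward(outcome):
    """Translate an accountability outcome into a scalar reward."""
    return OUTCOME_REWARD[outcome]


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(prefs, action, r, lr=0.1):
    """One REINFORCE update for a softmax policy over discrete actions."""
    probs = softmax(prefs)
    return [
        p + lr * r * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]


def train(steps, feedback_rate, seed=0):
    """Online loop: act, and update only when feedback actually arrives.

    Action 0 is "careful", action 1 is "careless" (toy assumption).
    Returns the final probability of choosing the careful action.
    """
    rng = random.Random(seed)
    prefs = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        if rng.random() < feedback_rate:  # most steps may get no feedback
            outcome = "praised" if action == 0 else "penalized"
            prefs = reinforce_step(prefs, action, reward(outcome))
    return softmax(prefs)[0]


if __name__ == "__main__":
    for rate in (1.0, 0.1, 0.01):
        print(rate, train(steps=1000, feedback_rate=rate))
```

The policy only changes on steps where feedback shows up, so lowering `feedback_rate` means fewer updates for the same amount of deployment time. This toy ignores the harder problems (delayed feedback, noisy attribution of blame to a specific action), so treat it as an illustration of the setup rather than evidence either way.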