Laurent Laborde

Yet another Reward Hack ...

My RL model just found another annoying reward hack.

It's a combat game (Toribash style). When it is winning on score, it beheads itself to end the match. Because that's an edge case I didn't anticipate, it loses the match (which it doesn't care about) but still gets the reward (which is all the model cares about).
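One common way to close this kind of loophole is to make the terminal reward depend on the official match result instead of on score alone. The snippet below is only a minimal sketch of that idea, not the actual code of this project: `StepInfo`, its fields, `naive_reward`, `gated_reward`, and the bonus and penalty values are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step summary; none of these fields come from the real game."""
    score_delta: float    # points gained on this step
    match_over: bool      # True on the final step of the match
    agent_won: bool       # result as decided by the game rules
    self_inflicted: bool  # True if the agent's own action ended the match


def naive_reward(info: StepInfo) -> float:
    # Pays for score only, so ending the match by self-destruction costs nothing.
    return info.score_delta


def gated_reward(info: StepInfo, win_bonus: float = 10.0,
                 loss_penalty: float = 10.0) -> float:
    # Score still counts, but the final step is tied to the official result.
    reward = info.score_delta
    if info.match_over:
        if info.agent_won and not info.self_inflicted:
            reward += win_bonus
        else:
            reward -= loss_penalty
    return reward


# The exploit: ahead on score, the agent beheads itself and officially loses.
behead = StepInfo(score_delta=5.0, match_over=True,
                  agent_won=False, self_inflicted=True)
# A legitimate win for comparison.
clean_win = StepInfo(score_delta=5.0, match_over=True,
                     agent_won=True, self_inflicted=False)

print(naive_reward(behead), gated_reward(behead), gated_reward(clean_win))
```

With the naive version the self-beheading step is still rewarded. With the gated version it is penalized and only a clean win gets the bonus. The right penalty size depends on the score scale, so it would need tuning.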

And the code sucks because I tried to write it with MiniMax 2.7, and the nerfed Claude Opus has trouble fixing it.
