My RL model just found another annoying reward hack.
It's a combat game (Toribash style). When it is winning on score, it beheads itself to end the match. Because that's an edge case I didn't predict, it loses the match (which it doesn't care about) but still gets the reward (which is all the model cares about).
And the code sucks because I tried to write it with MiniMax 2.7, and the nerfed Claude Opus has trouble fixing it.
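
For anyone hitting the same kind of hack, one common way to close it is to gate the end-of-match reward on the outcome the game itself reports, so a score lead is worth nothing if the match is lost. This is only a rough sketch, not the actual code from this project: `MatchResult`, `terminal_reward`, and the weights are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Hypothetical end-of-match summary; field names are assumptions."""
    agent_score: float
    opponent_score: float
    agent_won: bool  # final outcome as decided by the game rules


def terminal_reward(
    result: MatchResult,
    win_bonus: float = 1.0,
    loss_penalty: float = 1.0,
    margin_weight: float = 0.1,
) -> float:
    """Reward paid once, at the end of the match.

    The score margin only counts if the game says the agent won, so
    ending the match early with a self-inflicted loss pays nothing.
    """
    if not result.agent_won:
        # Any loss, including beheading itself while ahead, is penalized.
        return -loss_penalty
    margin = max(0.0, result.agent_score - result.opponent_score)
    return win_bonus + margin_weight * margin


# The hack from this post: ahead on score, but the match is lost.
hacked = MatchResult(agent_score=100.0, opponent_score=10.0, agent_won=False)
honest = MatchResult(agent_score=100.0, opponent_score=10.0, agent_won=True)
print(terminal_reward(hacked), terminal_reward(honest))
```

The point of the design is that the reward and the match result can no longer disagree: whatever the score says, losing is never better than winning.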