Yet another Reward Hack ...

#ai #machinelearning

My RL model just found another annoying reward hack.

It's a combat game (Toribash style). When it's winning on score, it beheads itself to end the match. Because that's an edge case I didn't anticipate, it loses the match (which it doesn't care about) but still gets the reward (which is all the model cares about).
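To make the bug concrete, here's a minimal sketch of how this kind of hack can happen. It assumes the reward is paid from the score lead at episode end while the win/loss result is decided by separate logic that counts decapitation as a loss. My actual code isn't shown here; `MatchEnd`, `buggy_reward`, `match_outcome` and `fixed_reward` are all hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class MatchEnd:
    # Hypothetical summary of how a match ended (not the real game code).
    agent_score: float
    opponent_score: float
    agent_decapitated: bool  # True if the agent lost its head, self-inflicted or not


def buggy_reward(end: MatchEnd) -> float:
    # Pays out on the score lead alone, so the agent can take the lead,
    # behead itself to end the episode early, and still collect.
    return 1.0 if end.agent_score > end.opponent_score else -1.0


def match_outcome(end: MatchEnd) -> str:
    # Assumed game rule: decapitation is a loss no matter the score.
    if end.agent_decapitated:
        return "loss"
    if end.agent_score > end.opponent_score:
        return "win"
    if end.agent_score < end.opponent_score:
        return "loss"
    return "draw"


def fixed_reward(end: MatchEnd) -> float:
    # Derive the reward from the same outcome the game reports,
    # so the reward and the win/loss result can never disagree.
    return {"win": 1.0, "draw": 0.0, "loss": -1.0}[match_outcome(end)]


if __name__ == "__main__":
    hack = MatchEnd(agent_score=10.0, opponent_score=3.0, agent_decapitated=True)
    print(buggy_reward(hack))  # 1.0: the hack pays off
    print(fixed_reward(hack))  # -1.0: beheading is now punished
```

The design choice in the fix is to have one source of truth for the match result and compute the reward from it, instead of two code paths that can drift apart in edge cases like this one.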

And the code sucks because I tried to write it with MiniMax 2.7, and the nerfed Claude Opus has trouble fixing it.

DEV Community