Why Your Generic CartPole Project Won't Land You an Interview
Every RL portfolio on GitHub looks the same: CartPole, LunarLander, maybe Atari if the author is ambitious. I reviewed 30+ candidate portfolios last year, and none of the ones built on those stock environments stood out. The ones that did? They built custom environments that solved actual problems.
Here's the thing: interviewers don't care that you can run stable_baselines3.PPO on a pre-built environment. They want to see that you understand the MDP formulation deeply enough to model a new problem from scratch. That's the signal that separates "followed a tutorial" from "can actually do RL work."
I'm going to walk through building two different custom environments for the same problem domain, inventory management, and show you exactly where each approach falls apart. One uses a naive reward structure, the other uses shaped rewards. The difference in training behavior is dramatic, and understanding why is what makes this portfolio-worthy.
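To make the naive-versus-shaped contrast concrete before getting into the details, here is a minimal sketch of what such an environment could look like. It is not the environment from the full article: the class name `InventoryEnv`, the cost and demand numbers, and the specific choice of "naive = cash flow only" versus "shaped = cash flow minus holding and stockout penalties" are all assumptions made for illustration. It mimics a Gym-style `reset`/`step` interface in plain Python so it runs without any RL library installed.

```python
import random
from dataclasses import dataclass


@dataclass
class InventoryEnv:
    """Toy single-product inventory MDP (hypothetical, for illustration only).

    Every default below is an assumed value, not a number from the article.
    """
    capacity: int = 20          # max units the warehouse can hold
    max_demand: int = 8         # daily demand is uniform on 0..max_demand
    price: float = 5.0          # revenue per unit sold
    unit_cost: float = 2.0      # cost per unit ordered
    holding_cost: float = 0.5   # per unit left in stock at end of day
    stockout_cost: float = 3.0  # per unit of unmet demand
    horizon: int = 30           # episode length in days
    shaped: bool = False        # False: naive reward, True: shaped reward
    seed: int = 0

    def reset(self):
        self.rng = random.Random(self.seed)
        self.stock = self.capacity // 2
        self.day = 0
        return self.stock  # state: current stock level

    def step(self, order):
        # Action: units to order, clipped to what fits in the warehouse.
        order = max(0, min(order, self.capacity - self.stock))
        self.stock += order
        demand = self.rng.randint(0, self.max_demand)
        sold = min(self.stock, demand)
        unmet = demand - sold
        self.stock -= sold
        self.day += 1

        # Naive reward: cash flow only (sales revenue minus ordering cost).
        reward = self.price * sold - self.unit_cost * order
        if self.shaped:
            # Shaped reward: also penalize leftover stock and unmet demand.
            reward -= self.holding_cost * self.stock
            reward -= self.stockout_cost * unmet

        done = self.day >= self.horizon
        info = {"demand": demand, "sold": sold, "unmet": unmet}
        return self.stock, reward, done, info


def rollout(env, policy):
    """Run one episode with a fixed policy and return the total reward."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done, _ = env.step(policy(state))
        total += reward
    return total


def order_up_to_10(stock):
    """Simple baseline policy: top the stock back up to 10 units."""
    return max(0, 10 - stock)


naive_return = rollout(InventoryEnv(shaped=False), order_up_to_10)
shaped_return = rollout(InventoryEnv(shaped=True), order_up_to_10)
print(f"naive: {naive_return:.1f}  shaped: {shaped_return:.1f}")
```

Both rollouts use the same seed, so they see the same demand sequence and the same policy decisions; only the reward signal differs. That is the point of the comparison: the state space, action space, and transition dynamics are identical, and the only thing an agent would experience differently is what the environment tells it was good or bad.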
The Problem: Inventory Management as an MDP
Continue reading the full article on TildAlice
