
TildAlice

Posted on • Originally published at tildalice.io

Custom Gymnasium Environment: Portfolio Project Guide

Why Your Generic CartPole Project Won't Land You an Interview

Every RL portfolio on GitHub looks the same: CartPole, LunarLander, maybe Atari if they're ambitious. I reviewed 30+ candidate portfolios last year, and almost none stood out. The few that did had built custom environments that solved actual problems.

Here's the thing: interviewers don't care that you can run stable_baselines3.PPO on a pre-built environment. They want to see that you understand the MDP formulation deeply enough to model a new problem from scratch. That's the signal that separates "followed a tutorial" from "can actually do RL work."

I'm going to walk through building two different custom environments for the same problem domain—inventory management—and show you exactly where each approach falls apart. One uses a naive reward structure, the other uses shaped rewards. The difference in training behavior is dramatic, and understanding why is what makes this portfolio-worthy.
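To make the comparison concrete, here is a minimal sketch of what such a pair of environments might look like. It is not the code from the full article: the class name `InventoryEnvSketch`, the prices, the demand distribution, and the two reward definitions are all assumptions chosen for illustration. To stay dependency-free it only mirrors the Gymnasium `reset`/`step` contract; a real portfolio version would subclass `gymnasium.Env` and declare `observation_space` and `action_space`.

```python
import random


class InventoryEnvSketch:
    """Toy single-product inventory MDP (hypothetical, for illustration only).

    State:  current stock level (int, 0..capacity)
    Action: units to order this step (int, 0..max_order)
    Mirrors the Gymnasium API shape: reset() -> (obs, info),
    step() -> (obs, reward, terminated, truncated, info).
    """

    def __init__(self, capacity=20, max_order=10, horizon=30, shaped=False):
        self.capacity = capacity
        self.max_order = max_order
        self.horizon = horizon
        self.shaped = shaped  # False = naive reward, True = shaped reward
        self.rng = random.Random()
        self.stock = 0
        self.t = 0

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng.seed(seed)
        self.stock = self.capacity // 2
        self.t = 0
        return self.stock, {}

    def step(self, action):
        if not 0 <= action <= self.max_order:
            raise ValueError(f"action must be in [0, {self.max_order}]")

        # The order arrives immediately; anything over capacity is discarded.
        self.stock = min(self.capacity, self.stock + action)

        # Assumed demand model: uniform integer demand between 0 and 8.
        demand = self.rng.randint(0, 8)
        sold = min(self.stock, demand)
        unmet = demand - sold
        self.stock -= sold

        # Naive reward (assumed): sales revenue minus purchase cost.
        reward = 5.0 * sold - 2.0 * action
        if self.shaped:
            # Shaped reward (assumed): also penalize leftover stock and stockouts.
            reward -= 0.5 * self.stock + 3.0 * unmet

        self.t += 1
        terminated = False                  # no natural terminal state here
        truncated = self.t >= self.horizon  # episode ends on a time limit
        info = {"demand": demand, "sold": sold, "unmet": unmet}
        return self.stock, reward, terminated, truncated, info


def run_episode(env, policy, seed=0):
    """Roll out one episode with policy(obs) -> action; return total reward."""
    obs, _ = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total
```

Calling `run_episode(InventoryEnvSketch(shaped=False), lambda obs: 3)` and then the same with `shaped=True` and the same seed replays an identical demand sequence, so any difference in return comes from the reward definition alone. That makes it easier to reason about what each reward signal is actually telling the agent.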


The Problem: Inventory Management as an MDP


Continue reading the full article on TildAlice
