Hitting Merge: Mentally Preparing for Your First Push to Production

#ai #machinelearning #software

I pride myself on being decently good at what I do and, more importantly, on loving it. Through my journey of trying to make a career out of it, I learned that toy projects are not enough: I needed to deploy something into production.

Thankfully, this is an experience I was exposed to decently early in my career. Now, after working across multiple teams to get my first live service out the door, I’m here to share the exact mental and practical guardrails I learned along the way.

Before You Dive, You Must Prepare

Your first dive is dangerous. You cannot just dive in head first: you need to prepare your oxygen tank(s), have your diving instructor or some professional with you, and, most importantly, avoid going into a dangerous place for your first time.

These rules apply to your first time deploying to prod as well:

You cannot dive in head first, so prepare your oxygen tank(s): this means ensuring your container configurations, environment variables, dependencies, and artifacts are completely decoupled from your local machine and fully portable.
Have your diving instructor or some professional with you: always have a senior engineer, tech lead, or manager watching over your shoulder for your first roll-out. You want an experienced teammate there who can calmly take the wheel if a catastrophic edge case appears.
And most importantly, avoid going into a dangerous place for your first time: depending on your scope and the company, make sure your first deployment isn't a highly sensitive, mission-critical legacy system. Start with isolated, well-bounded services where you can learn the pipeline safely.
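To make the "oxygen tank" point concrete, here is a minimal sketch of configuration that lives entirely in environment variables instead of on your laptop. The variable names (MODEL_PATH, PORT, LOG_LEVEL) and the ServiceConfig class are hypothetical and only there for illustration; your service will have its own.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical settings for a small model-serving service."""
    model_path: str
    port: int
    log_level: str


def load_config(env=None):
    """Build the config from environment variables only, failing fast on gaps."""
    env = os.environ if env is None else env

    # MODEL_PATH is treated as required here; that is an assumption of this sketch.
    missing = [name for name in ("MODEL_PATH",) if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")

    return ServiceConfig(
        model_path=env["MODEL_PATH"],
        port=int(env.get("PORT", "8080")),       # illustrative default
        log_level=env.get("LOG_LEVEL", "INFO"),  # illustrative default
    )
```

Failing at startup when a required variable is missing is a deliberate choice: a crash during roll-out is easier to spot and roll back than a service that quietly runs with a path that only exists on your machine.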

After the Dive

After the dive, you find yourself taking in the beauty of the sight you just witnessed. It feels different, new, exciting, scary, but most importantly, alive.

Seeing a feature or model that you worked on running in production and processing live traffic for the first time will feel different: a weird mix of pride and fear. When you see users use that feature for the first time and it works, you'll find yourself proudly saying "I built that btw".

Reality check: Localhost vs Host

I started with a "before you dive", and an "after you dive", but what about the diving itself?

Diving in this case means accepting a harsh reality: the calculus changes completely once you leave staging.

Development/Staging vs Production

The development environment, and to a large extent even the staging environment, are meant for checking that the code works and passes tests. In production, everything changes: suddenly you have to take into consideration query optimization, latency reduction, scaling, and more. In ML specifically, this paradigm shift hits even harder. I like to call this the Cost of Intelligence.

The Cost of Intelligence

What this refers to is simply the fact that ML models are different from traditional deterministic software: they introduce different attack vectors, somewhat more complex deployment strategies, non-trivial explainability requirements, and a ton of guardrails around model outputs. That's not even taking into consideration model observability, re-training, training policies, etc.

The cost of intelligence in this case is simply the fact that you need to make sure pre- and post-processing are ported and scaled correctly, the model's outputs are correct and safe, and telemetry tracks data drift in real time. These are fundamental pillars of MLOps, but they also represent a constant architectural tax on modern intelligent systems.
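As one illustration of what "tracking data drift" can look like, here is a rough sketch that compares a live feature sample against a reference sample using the Population Stability Index. This is one common drift score among many, not necessarily the one your team should use. It assumes a single numeric feature and non-empty samples, and the bin count is an arbitrary choice.

```python
import math


def population_stability_index(reference, live, bins=10, eps=1e-6):
    """Rough drift score for one numeric feature: 0.0 means identical histograms."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 when every reference value is the same.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clipped into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))
```

In practice you would compute something like this per feature on a sliding window of live traffic and alert when it crosses a threshold. What counts as "too much drift" is application-specific, so treat any threshold as something to tune rather than a constant.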

Real World ML

In my journey moving from experimental code into production, I found my manager always asking me two things:

What is our latency, and how do we reduce it?
How is the model performing?

It feels like those are two easy questions, and at first, I thought so too! But with time, I found out that generic metrics do not cover real-life requirements.

For instance, my first production project involved orchestrating multiple interconnected models, splitting single inputs into sub-tasks, processing them asynchronously, and aggregating the final results. Standard accuracy metrics weren't enough: I had to learn how to implement strict batch bookkeeping and to design domain-specific metrics that reflected true business constraints and the downstream consequences of a model's prediction.
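To give a feel for the bookkeeping part, here is a simplified sketch of the split, process asynchronously, and aggregate pattern. It is not the system I actually worked on: the "|" separator, the BatchRecord class, and run_subtask (a stand-in for a real model call) are all made up for illustration.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class BatchRecord:
    """Tracks every sub-task of one input so nothing is silently dropped."""
    batch_id: str
    expected: int
    results: dict = field(default_factory=dict)
    errors: dict = field(default_factory=dict)

    @property
    def complete(self):
        # Every sub-task must be accounted for, as a result or as an error.
        return len(self.results) + len(self.errors) == self.expected


async def run_subtask(index, chunk):
    """Hypothetical stand-in for an asynchronous model call."""
    await asyncio.sleep(0)
    if not chunk:
        raise ValueError("empty chunk")
    return chunk.upper()


async def process(batch_id, text):
    chunks = text.split("|")  # hypothetical way of splitting one input
    record = BatchRecord(batch_id, expected=len(chunks))
    # return_exceptions=True keeps one failing sub-task from hiding the others.
    outcomes = await asyncio.gather(
        *(run_subtask(i, c) for i, c in enumerate(chunks)),
        return_exceptions=True,
    )
    for i, outcome in enumerate(outcomes):
        if isinstance(outcome, Exception):
            record.errors[i] = repr(outcome)
        else:
            record.results[i] = outcome
    return record
```

Calling asyncio.run(process("b1", "a||c")) returns a record with results for sub-tasks 0 and 2 and an error entry for sub-task 1. The point is that the aggregate knows exactly which piece failed instead of just reporting a lower score.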

So, my most important advice here is simple: know the application of your service before deploying, and understand your true KPIs.
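Here is a toy example of why a generic metric can hide the KPI that actually matters. It assumes a binary classifier where a missed positive costs the business ten times more than a false alarm. Both the cost values and the expected_cost function are invented for illustration, so plug in whatever your domain actually pays for each mistake.

```python
def expected_cost(y_true, y_pred, fn_cost=10.0, fp_cost=1.0):
    """Average business cost per prediction for binary labels (0 or 1)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:
            total += fn_cost  # missed positive
        elif truth == 0 and pred == 1:
            total += fp_cost  # false alarm
    return total / len(y_true)


y_true = [1, 1, 0, 0]
model_a = [1, 0, 0, 0]  # one missed positive, accuracy 0.75
model_b = [1, 1, 1, 0]  # one false alarm, accuracy 0.75
```

Both models have the same accuracy, yet under these assumed costs model_a averages 2.5 per prediction and model_b only 0.25. That gap is the kind of thing a domain-specific metric is meant to surface.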

After Drying Up

Experimenting in notebooks is fun, building a service around a model you created is exciting, deploying it is a chore, and seeing the results is like holding your firstborn. That is the cycle I felt, at least. But trust me, once you have your first service deployed and you clash with real-world constraints, you'll learn what tens of courses and hundreds of articles won't teach you: acceptance, rigorous preparation, and knowing when to fight the engineer's insatiable urge to over-optimize.

Final Notes

This is just the beginning. In upcoming posts, I’ll break down the specific technical steps, pipeline tools, and architectural patterns needed to make this jump safely. But before you put all of your model guardrails in place, you have to get your mental guardrails in place.