David Bean

Why a C++ Systems Engineer is Learning Machine Learning

A senior systems programmer's journey into AI/ML - Week 1 reflections

The Decision

After spending over a decade building high-performance C++ systems in defense and aerospace, I've made a decision: I'm learning machine learning. Not casually browsing tutorials on weekends, but committing to a structured 12-month roadmap with one hour of focused work every single day.

Why? Because the intersection of systems engineering and ML represents one of the most valuable skill combinations in tech right now. MLOps engineers see 9.8× demand growth with salaries averaging $122k-$167k. More importantly, most ML practitioners lack deep systems knowledge, while most systems engineers don't understand ML. I'm betting that bridging this gap is worth the investment.

Why This Feels Different

I've looked at ML courses before. They all seem to follow the same pattern: install Anaconda, run some scikit-learn examples, train a model on the Iris dataset, celebrate. That's fine for getting started, but it doesn't prepare you for production systems where models fail silently, data pipelines break, and performance matters.

So I chose a different approach: a discovery-based roadmap that prioritizes production skills from day one. Instead of copying tutorial code, I solve problems by reading documentation, debugging issues independently, and building understanding through experimentation.

The mindset shift: I'm not learning to run ML models. I'm learning to build ML systems.

Week 1: Building Something Real

Most "Week 1 ML" tutorials have you print "Hello World" and maybe plot a graph. My Week 1 looked different.

I built a data quality checker. Sounds boring, right? But here's the thing - I have no idea what makes good ML data. I'm literally learning this from an AI assistant (Claude) in real-time, using a roadmap designed to make me figure things out rather than copy-paste solutions.

The framework analyzes numeric, categorical, and temporal data. It detects outliers, finds missing values, identifies data quality issues. It has 44 tests because I spent two full days just writing tests.

But honestly? I don't know if these are the right checks for ML. I'm a C++ guy who knows about memory management and thread safety. Data quality for machine learning? That's completely new territory.

Day 1: Just Make It Not Crash

First day, I wrote a function to check data quality. Coming from C++, my instinct was to write something that handles edge cases without dying.

from typing import Any, Dict, List

import pandas as pd

def check_data_quality(data: List[float]) -> Dict[str, Any]:
    # Drop None and NaN up front instead of letting bad values blow up later
    clean_data = [x for x in data if x is not None and not pd.isna(x)]

    if not clean_data:
        return {'error': 'no valid data points'}

    # Continue with analysis...

The AI teaching me asked: "Why not just let it crash with an error?"

Because in my world, if your distributed system crashes because someone passed it bad data, you've failed. You handle errors gracefully, you log what happened, you return something useful.

Apparently that's also important for ML pipelines. Who knew? (Everyone who does ML, probably. But I didn't.)

Day 2-3: Making It Installable

While I was setting up proper Python packaging with pyproject.toml, I kept thinking "this seems like overkill for a learning project."

But the roadmap insisted: documentation, logging, proper module structure from day one. Not because the code is complex, but because production habits need to become second nature.

Fine. I wrote docstrings. I set up logging. I made it pip installable.
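
For reference, the logging setup is nothing fancy. Here's a minimal sketch of what I mean - the format string and the example message are illustrative, not the framework's actual code:

import logging

# One module-level logger per file, so failures leave a readable trail
logger = logging.getLogger(__name__)
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

logger.info("analyzing %d rows of numeric data", 1_000)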

Two days later when I had to debug why my tests were failing, those logs saved me 30 minutes of confusion. The docstrings reminded me what I was trying to do. Point taken.

Day 4-5: Testing Like My Career Depends On It

I spent two days writing tests. Not "does it run" tests. Real tests. Unit tests, integration tests, property-based tests using a library called Hypothesis that generates random inputs to find bugs.

Hypothesis found actual bugs I never would have caught:

  • Floating-point precision issues with large numbers
  • Numerical overflow with extreme values
  • CSV type conversion errors where pandas read numbers as strings
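
To give a flavor of what those tests look like: below is a minimal sketch of a Hypothesis property test against the check_data_quality function from Day 1. The property asserted here (never crash on messy input) is illustrative; the framework's real tests check a lot more than this.

# Minimal Hypothesis sketch: generate random messy lists (None, NaN, inf,
# huge floats) and assert the Day 1 function never raises on any of them.
from hypothesis import given, strategies as st

@given(st.lists(st.one_of(st.none(), st.floats(allow_nan=True, allow_infinity=True))))
def test_handles_messy_input_without_crashing(data):
    check_data_quality(data)  # property: no exception, whatever the input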

This is where my C++ background actually helped. I know what edge cases look like. I know that "works on my machine" isn't good enough. I know that systems fail in weird ways when you least expect it.

Turns out that's useful for ML too. Data is messy. Edge cases are everywhere. Tests catch problems before they break production.

Day 6-7: The "I Have No Idea" Moment

Weekend project: add temporal data analysis. Dates, timestamps, time series stuff.

I built gap detection - finding missing dates in time series data. The algorithm calculates time deltas between dates, finds the most common one, flags anything bigger as a gap.
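
For the curious, here's a minimal sketch of that idea in pandas. The function name and return format are illustrative, not the framework's actual code:

import pandas as pd

def find_gaps(dates):
    # Sort the dates, compute deltas between neighbours, and flag any delta
    # larger than the most common spacing as a gap
    s = pd.to_datetime(pd.Series(dates)).sort_values().reset_index(drop=True)
    if len(s) < 2:
        return []
    deltas = s.diff()
    expected = deltas.mode()[0]  # most common spacing, e.g. 1 day
    return [(s[i - 1], s[i]) for i in range(1, len(s)) if deltas[i] > expected]

print(find_gaps(["2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06"]))
# reports one gap, between Jan 2 and Jan 5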

Then Claude (the AI helping me learn, and write these blog posts) asked: "What temporal quality checks matter most for ML?"

My answer: "I really have no idea. I'm doing this whole course to find that out."

And you know what? That was the right answer.

Claude's response: "Start simple, document your assumptions, make it observable, iterate later. This is how real ML engineering works. Even senior engineers build V1 without knowing all requirements."

That was valuable. Not because I learned some ML best practice, but because I learned it's okay to not know. You build something reasonable, you see how it's used, you improve it.

This actually feels familiar. People think defense/aerospace work is all upfront specs and formal requirements. Reality? You get dropped into a mess of legacy systems, vague requirements, and contradictory stakeholder demands, then you hack your way through until something works. ML engineering sounds similar, just with different tools.

The Integer Problem

Here's a fun debugging story. I wrote a dispatcher that automatically figures out if your data is numeric, categorical, or temporal (dates/times).

Initial version routed [1, 2, 3, 4, 5] to the temporal analyzer. Why? Because pandas will happily interpret small integers as offsets from the Unix epoch, which starts on January 1, 1970. So [1, 2, 3, 4, 5] looked like a valid date sequence to pandas.

That's... not what anyone would expect.

Solution: Only test large integers (>946684800, roughly year 2000) as potential timestamps. Small integers default to numeric.

if isinstance(item, int) and item > 946684800:
    # Large integers: might be Unix timestamps (seconds since epoch)
    try:
        pd.to_datetime([item], unit='s', errors='raise')
        temp_count += 1
    except (ValueError, OverflowError):
        pass
# Small ints: skip the temporal test, treat as numeric

I have no idea if this is how production ML systems handle this. But it makes sense, tests pass, and it solves the immediate problem. V2 can be smarter if needed.

What Surprised Me

Pandas is kind of amazing

Coming from C++ where you manually manage everything, pandas feels like cheating:

# Frequency distribution in one line
series.value_counts().to_dict()

# Date parsing with error handling
pd.to_datetime(series, errors='coerce')

What would be 20-30 lines of careful C++ becomes a method call. I can see why everyone uses this.

Not knowing is fine

That moment when Claude asked what ML engineers need for temporal data and I said "I have no idea" - that felt vulnerable. Like admitting I don't know what I'm doing.

But it led to the best insight of the week: nobody knows everything upfront. You build something reasonable, document your assumptions, ship it, learn from how it's used, improve it later.

That's actually freeing. I can stop trying to make perfect decisions with incomplete information and just... build something that works.

Systems thinking transfers

My C++ experience helped with:

  • Architecture decisions (I used the Strategy pattern without even thinking about it)
  • Understanding when to optimize vs. when good enough is fine
  • Knowing that defensive programming matters
  • Writing code that won't confuse me in six months

But I'm learning entirely new patterns: how pandas works, why statistical validation matters, what makes data "good" for ML (still figuring this one out).

It's weirdly complementary. Systems knowledge gives me structure. ML is teaching me to think about data differently.

What I Built (In Plain English)

The data quality framework has three analyzers:

Numeric: Checks numbers - calculates mean, standard deviation, finds outliers using a 2-sigma rule (there's a quick sketch of this below). I don't know if 2-sigma is the right threshold for ML, but it's what I learned in college and it seems reasonable.

Categorical: Checks text/category data - counts unique values, finds frequency distribution, identifies the most and least common items. Warns you if you accidentally passed it numbers.

Temporal: Checks dates/times - finds the date range, detects gaps in time series (like missing days of sensor data), tries to figure out if data is regular (daily readings) or irregular (random events).

Plus a dispatcher that looks at your data, figures out which type it probably is, and routes it to the right analyzer. Uses something called Yamane's formula for sampling so it doesn't have to look at every single item in huge datasets.
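
To make those two ideas concrete, here's a minimal sketch of the 2-sigma outlier rule and Yamane's sample-size formula, n = N / (1 + N * e^2). The function names are mine, and this isn't the framework's actual code:

import statistics

def find_outliers_2sigma(values):
    # Flag anything more than 2 standard deviations away from the mean
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [x for x in values if abs(x - mean) > 2 * std]

def yamane_sample_size(population, margin_of_error=0.05):
    # Yamane's formula: n = N / (1 + N * e^2)
    return round(population / (1 + population * margin_of_error ** 2))

print(find_outliers_2sigma([10, 12, 11, 9, 10, 250]))  # -> [250]
print(yamane_sample_size(1_000_000))                   # -> 400 rows at a 5% margin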

Is this what professional ML engineers use? I have literally no idea. But it works, it has tests, and it solves problems I can understand: don't let bad data silently break your stuff.

Reality Check

Here's what I don't know yet:

  • What actually makes data "good" for ML models
  • When my outlier detection would help vs. hurt
  • Whether these are the right data quality checks
  • How real ML pipelines handle this stuff
  • Literally anything about neural networks, transformers, or the AI stuff people talk about

Here's what I do know:

  • How to write code that handles errors gracefully
  • How to test thoroughly
  • How to structure projects so they don't become unmaintainable messes
  • How to read documentation and figure stuff out
  • That pandas is really handy

Week 1 taught me that systems engineering skills transfer to ML tooling, even when I don't know the ML part yet. The fundamentals are the same: handle errors, test thoroughly, document clearly, build things that won't break six months from now.

Next Week: NumPy

Week 2 is about NumPy - arrays, vectorization, memory layout, all that stuff. Coming from C++, this actually sounds interesting. Arrays and memory? That's my comfort zone.

The roadmap says I'll be doing image transformations using only NumPy (no OpenCV). Not sure why yet, but I'm guessing it's about understanding how the low-level stuff works before using the high-level libraries.

After that: actual machine learning. Linear models, decision trees, ensemble methods. The stuff that makes predictions.

But first: arrays.

Why Document This?

A few reasons:

Accountability - Harder to skip days when you've committed publicly.

Perspective - I'm learning this as a complete ML beginner but an experienced systems engineer. Maybe that viewpoint helps someone else in the same boat.

Reality - Most learning blogs are polished success stories. I'm sharing the actual process: bugs, confusion, "I have no idea" moments, and figuring it out anyway.

Connection - If you're also transitioning into ML from systems/C++/infrastructure work, or if you're interested in the production/systems side of ML, let's talk.

The Commitment

One hour per day. Seven days a week. For twelve months.

That's what the roadmap promised, anyway. Reality? More like 1.5-2 hours most days. Turns out AIs are optimistic about how long things take. They're great at designing curricula but bad at estimating "figure out why your import statement doesn't work" time.

Day 7 was supposed to include writing a report generator in 10 minutes. I know string formatting - I didn't need a lesson on that. So I just had the AI write that function. It was 120 lines long. I don't know why it thought that was a 10-minute task, but that's the way it is, I guess.

Other things take longer because you hit a real problem. Type detection ambiguity. CSV parsing weirdness. Tests that fail for mysterious reasons. That's where the actual learning happens.

Week 1: Probably 10-12 hours total, one complete portfolio project.

If I keep this pace: more like 500-700 hours over the year instead of 365, but still very achievable. The consistency matters more than the exact hours.


The roadmap at a glance:

Week 1: Python foundations and the data quality project (done)

Week 2-8: Traditional ML

Months 3-6: Deep Learning

Months 7-12: Specialization (probably ML Systems Engineering - combining C++ performance work with ML)

One hour at a time. Or two. We'll see how optimistic Claude gets.

Find me: GitHub
