Mahesh Cheemalapati

Building a Compute Node and Debugging My Assumptions

A few months ago, if someone had told me I would spend three days trying to install an operating system on a computer the size of my hand, I would have assumed they were exaggerating.

After all, how difficult could it be?

The machine arrived, the SSD was ready, and I already knew where it would fit into my growing homelab. In my head, the process was simple. Install the drive. Flash the operating system. Connect it to the network. Start experimenting.

[Image: the Jetson Orin Nano]

A single evening seemed more than enough.

Three days later, I had booted multiple operating systems, read more forum posts than I care to admit, learned far more about recovery modes than I ever intended to, and found myself sitting on the floor holding a piece of copper wire while trying to convince a tiny computer to reveal itself.

The strange thing is that none of that was supposed to be the story.

The story was supposed to be about adding another compute node to my homelab.

Instead, it became a story about assumptions.

Because every major problem I encountered over those three days began the same way: with something I was convinced I understood.

And almost every breakthrough arrived immediately after discovering that I didn't.

Curiosity Escalates

A year ago, I wasn't particularly interested in infrastructure.

That probably sounds strange coming from someone who now spends weekends setting up servers for fun, but most of my experience lived much higher in the stack. Applications, APIs, user interfaces, cloud platforms. If something needed to run somewhere, there was usually a service willing to host it and a monthly bill willing to remind me about it.

Then AI arrived.

Like many engineers, I started by experimenting with the models themselves. I generated code, asked questions, explored prompts, and spent time understanding what these systems could do.

What surprised me wasn't the technology.

It was my reaction to it.

The more I used these tools, the less interested I became in the outputs and the more interested I became in the machinery producing them.

Where was the model actually running?

How much hardware did it really need?

Why could one model respond instantly while another struggled?

What happened between a prompt and a response?

Those questions led to more questions.

Before long, I wasn't reading about models anymore. I was reading about GPUs, inference engines, networking, storage, virtualization, self-hosting, and hardware.

Somewhere along the way, curiosity quietly escalated into a homelab.

One machine became two. One project became several. Every answer seemed to create another rabbit hole.

The deeper I went, the more I realized that many of the things I had previously treated as abstractions were only abstractions because someone else had already solved the hard parts.

I wanted to see the hard parts.

Not because I planned to replace cloud services.

Not because I was trying to build a miniature data center in my office.

I simply wanted to understand the systems I was building on top of.

The Jetson entered the picture because it represented an interesting constraint.

I wasn't interested in buying the biggest GPU I could find. That would have answered a different question. What interested me was understanding how much useful work could be extracted from relatively inexpensive hardware.

Could a small, power-efficient device contribute meaningfully to local AI workloads?

Could it become part of a distributed setup?

Could it teach me something about the relationship between software and hardware that I wouldn't learn from a specification sheet?

Those were the questions I thought I was buying the Jetson to answer.

What I didn't realize was that before I could learn anything about AI workloads, the machine was about to teach me something much more fundamental.

Assumption #1: The Hardware Is Broken

The first evening started exactly as expected.

The SSD installation was straightforward. The board powered on immediately. The fan spun to life. A green LED appeared almost instantly.

Everything looked healthy.

There was only one problem.

Nothing could see it.

Windows couldn't detect it. The flashing software couldn't detect it. No matter what I tried, the board remained invisible.

At first, I wasn't particularly worried. Hardware projects always involve a little troubleshooting. I swapped cables, tried different ports, restarted machines, checked drivers, and worked through the usual list of suspects.

But as the hours passed, a theory began to form.

Maybe the board was defective.

It felt like a reasonable conclusion. If a machine refuses to appear, eventually you start questioning the machine itself.

The frustrating part was that the evidence never fully supported the theory. The board looked healthy. It behaved like healthy hardware. It powered on consistently. Yet every attempt to communicate with it ended in failure.

By the end of the night, I hadn't proven that the hardware was broken.

I had simply failed to prove that it wasn't.

At the time, that felt like the same thing.

Assumption #2: The Board Isn't Entering Recovery Mode

The second day began with a clue.

The Jetson needed to be placed into recovery mode before it could be flashed.

That sounded promising.

It also sounded like the answer.

I quickly learned those are not always the same thing.

What followed was several hours spent moving between documentation, diagrams, forum threads, videos, and search results. Every source seemed to assume I already understood the previous step.

Eventually, I discovered that the recovery pins were hidden on the underside of the board.

This led to one of the more absurd moments of the entire project.

At some point that afternoon, I found myself sitting on the floor with a small piece of copper wire, carefully connecting two tiny pins together while applying power to the board.

If anyone had walked into the room at that moment, they probably would have assumed I had abandoned software engineering entirely and started repairing electronics.

To my surprise, it worked.

[Image: shorting the Jetson's recovery pins to enter recovery mode]

For the first time, the board appeared.

A small entry in Device Manager confirmed what I had been trying to prove since the previous evening.

The machine was alive.

That moment felt surprisingly significant.

Not because the problem was solved.

Because the problem had changed.

For nearly two days, I had been operating under the assumption that the hardware might be faulty.

Now I had evidence suggesting the opposite.

The board wasn't broken.

The challenge was that I still didn't know what was.
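The Device Manager check has a rough Linux counterpart for anyone flashing from Ubuntu instead. What follows is a minimal sketch rather than part of the original workflow: it assumes a Linux host with `lsusb` installed and relies on `0955` being NVIDIA's USB vendor ID. The function names are hypothetical.

```python
import subprocess

NVIDIA_VENDOR_ID = "0955"  # NVIDIA's USB vendor ID


def find_nvidia_devices(lsusb_output: str) -> list[str]:
    """Return the lsusb lines whose ID field carries NVIDIA's vendor ID."""
    matches = []
    for line in lsusb_output.splitlines():
        # Typical line shape: "Bus 001 Device 005: ID 0955:<product> NVIDIA Corp."
        parts = line.split()
        if "ID" not in parts:
            continue
        idx = parts.index("ID")
        if idx + 1 >= len(parts):
            continue
        vendor = parts[idx + 1].split(":")[0]
        if vendor.lower() == NVIDIA_VENDOR_ID:
            matches.append(line.strip())
    return matches


def board_is_visible() -> bool:
    """Run lsusb (assumption: Linux host with usbutils) and look for the board."""
    result = subprocess.run(["lsusb"], capture_output=True, text=True, check=True)
    return bool(find_nvidia_devices(result.stdout))
```

Calling `board_is_visible()` before and after bridging the recovery pins gives a quick yes/no answer, which separates "the board is not in recovery mode" from "the host cannot see it for some other reason".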

Assumption #3: The Flashing Process Is Broken

Once the board was finally detected, I assumed the difficult part was over.

The next stage of the journey introduced me to an entirely different category of problems.

The hardware had finally started cooperating.

The software had not.

The flashing process became a maze of operating systems, compatibility issues, drivers, tooling requirements, and storage constraints.

Windows behaved one way.

Ubuntu behaved another.

One version of Ubuntu wasn't supported. Another appeared to work until the tooling decided otherwise. A live USB environment introduced storage limitations that I didn't even know existed.
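Storage surprises like the live USB one can at least be caught before a long flash starts. This is a minimal sketch, not the check used in this project: it assumes Python is available on the host, and both the function name and any threshold passed to it are illustrative, not figures from NVIDIA's documentation.

```python
import shutil


def has_enough_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3
```

Running something like `has_enough_space("/home", 40)` against the directory the flashing tools will write to shows whether the working area is large enough. The `40` here is a placeholder to replace with whatever the tooling actually requires.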

Every apparent breakthrough seemed to reveal another problem hiding behind it.

The experience reminded me of preparing for exams back in school.

You spend hours learning one topic, convinced you're finally making progress, only to discover three more chapters you didn't know existed.

By the end of the second night and well into the third day, I felt like I was debugging multiple systems simultaneously.

The Jetson.

The operating system.

The flashing tools.

The storage environment.

The host machine.

Every layer had its own rules, assumptions, and failure modes.

And yet, something interesting was beginning to emerge.

Each failed attempt wasn't simply a failure.

It was information.

Assumption #4: The Jetson Is The Problem

By the third day, frustration had started to give way to curiosity.

Not because things were working.

Because I had run out of assumptions.

The breakthrough came when I noticed a pattern.

Every successful test pointed in the same direction.

The Jetson entered recovery mode exactly as expected.

The device appeared when connected correctly.

The SSD was visible.

The board responded consistently.

Every time I managed to isolate a variable, the hardware behaved predictably.

The failures were happening somewhere else.

That realization forced me to step back and look at the situation differently.

For almost three days, I had treated the Jetson as the primary suspect.

Yet every successful test kept proving its innocence.

The machine wasn't failing.

The environment around it was.

That subtle shift changed everything.

Instead of asking why the board wasn't working, I started asking which assumptions I was making about the systems surrounding it.

Suddenly, the path forward became much clearer.

The Realization

The final breakthrough wasn't dramatic.

There was no hidden command. No magical setting. No single moment where everything suddenly made sense.

Instead, the solution emerged from a growing pile of lessons.

The correct operating system.

The correct flashing workflow.

The correct storage configuration.

The correct understanding of how the process actually worked.

Once those pieces finally aligned, the installation process became almost disappointingly simple.

After three days of troubleshooting, I expected a triumphant ending.

Instead, the flash completed quietly.

The board rebooted.

Ubuntu loaded.

A login prompt appeared.

That was it.

There was no dramatic conclusion. Just a blinking cursor waiting for input.

Oddly enough, that simple cursor felt more satisfying than it should have.

Not because the installation was complete.

Because understanding had finally caught up with effort.

What I Actually Learned

When I started this project, I thought I was learning how to set up a Jetson.

Looking back, the Jetson was only the setting. The real lesson had very little to do with the hardware itself.

The machine was never the problem.

In fact, the machine spent most of those three days doing exactly what it was supposed to do.

The board wasn't broken.

Recovery mode wasn't broken.

The SSD wasn't broken.

The hardware wasn't conspiring against me.

What slowed me down were the assumptions I carried into the process.

Every troubleshooting session starts with a theory.

Sometimes the theory is right.

Most of the time, it isn't.

Progress comes from testing those theories and being willing to let them go when the evidence points elsewhere.

For nearly three days, reality kept telling me the same thing.

The Jetson was working.

I just wasn't ready to believe it.

When I think back on those three days, I don't remember the commands I ran or the documentation I read.

What I remember is the process.

The constant cycle of forming theories, testing them, discarding them, and starting again.

I remember how often certainty turned into doubt and how often doubt eventually turned into understanding.

The experience reminded me that engineering isn't really about always having the right answers.

It's about learning how to ask better questions.

What changed wasn't the machine.

What changed was my understanding of the system.

By the time the Jetson finally booted successfully, I knew far more about operating systems, flashing workflows, recovery modes, and infrastructure than I had when I started.

Ironically, if everything had worked on the first attempt, I probably would have learned far less.

And in hindsight, that's the reason I started building a homelab in the first place.

Not to avoid problems.

To understand how things work.

What's Next

Today, the Jetson sits quietly alongside the other machines in my homelab.

The excitement of getting it online has already faded, replaced by a different kind of curiosity.

The original question that led me to the machine still hasn't been answered.

How much useful work can a small, inexpensive device actually do?

Over the next few weeks, I'll be putting it to work as a dedicated compute node for smaller AI workloads. I want to see how it handles local models, embeddings, retrieval pipelines, and supporting tasks within a larger distributed setup.

Those are the questions that interested me when I bought the hardware in the first place.

The difference is that now I can finally start answering them.

And after three days of getting the machine online, that part feels surprisingly easy.
