Void
Nvidia GTC Starts Monday. Here's What Actually Matters If You Write Code for a Living

Nvidia's GTC conference starts Monday. Jensen Huang takes the stage at 11 AM Pacific, does his leather jacket thing for two hours, and by Tuesday morning your Twitter/X and LinkedIn feeds will be drowning in hot takes from people who weren't there and didn't watch it.

I figured I'd do something a little different. Instead of waiting for the recap, I wanted to write down what I'm personally watching for as someone who doesn't trade $NVDA stock (I couldn't even if I wanted to) but does actually use GPUs and AI tools to get work done.

Because here's the thing. GTC has become the Super Bowl of AI infrastructure. 30,000 attendees from 190 countries. It runs March 16-19 across ten venues in downtown San Jose. There are over 1,000 sessions. Jensen will talk about chips, software, models, robots, and probably world peace by the end of it (the way things are going, it might come up).


But if you're a regular developer, someone building apps, running inference, maybe self-hosting models, or just trying to understand where all this is going, 90% of that is noise. Here's what I consider my priorities.

The inference chip thing

This is probably the biggest deal for anyone actually deploying AI in production, at home, or anywhere in between.

Training a model is expensive. Running a trained model (inference) is what you do millions of times a day. Right now Nvidia dominates training with something like 80% market share. But inference is a different game. Google, Amazon, and others are building custom chips specifically for inference, trying to eat into Nvidia's lead.

Nvidia reportedly bought Groq last year for $20 billion. If you've used Groq's API, you know the speed is insane. Hundreds of tokens per second. They use a completely different chip architecture that's built specifically for inference rather than the general purpose GPU approach.

What I want to know: how does Groq's tech integrate with Nvidia's ecosystem? If Nvidia can offer both training AND inference at stupid fast speeds, that changes the math for anyone choosing between self-hosting and API calls. The cost of running your own models could drop significantly. Or Nvidia could just jack up prices because they own both sides. We'll see.

NemoClaw - an open-source AI agent framework

There's a rumored announcement of something called NemoClaw. It's apparently an open-source platform for building enterprise AI agents.

Now, "AI agent framework" is one of those terms that makes me immediately suspicious because everyone and their dog has one now and the list is long and growing.

But Nvidia releasing one is different for a specific reason: hardware integration. Most agent frameworks are model-agnostic, which is great for flexibility but means they can't really optimize for the hardware they're running on. An Nvidia-built framework could be tightly coupled with their GPUs, CUDA ecosystem, and TensorRT optimizations in ways that third-party tools can't easily match.

If you're building anything where AI agents need to run fast and locally (think on-device assistants, enterprise tools that can't ship data to the cloud, or anything latency-sensitive), this is worth paying attention to.

The Rubin architecture

Nvidia's next-gen GPU architecture is called Rubin, reportedly packing up to 288GB of HBM4 memory and a massive performance leap over the current Blackwell generation. Numbers like "five times the dense floating-point performance" are being thrown around.

I'm not going to pretend I fully understand the differences between HBM3e and HBM4 at the physics level. I don't. What I do understand is what more memory means in practice: bigger models can fit on fewer GPUs. If you're self-hosting a 70B parameter model right now, you probably need multiple GPUs. With Rubin's memory capacity, that might change. And that directly affects whether it makes sense to self-host or keep paying per-token API fees.
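To put rough numbers on that, here's a back-of-the-envelope sketch. The fp16 assumption and the 20% overhead factor for KV cache and activations are my own ballpark figures, not vendor numbers:

```python
def vram_gb(params_billion, bytes_per_param=2, overhead=1.2):
    """Rough VRAM estimate: model weights (fp16 = 2 bytes/param)
    times a fudge factor for KV cache and activations."""
    return params_billion * bytes_per_param * overhead

need = vram_gb(70)
print(round(need))        # ~168 GB for a 70B model in fp16
print(need > 80)          # True: doesn't fit one 80 GB card today
print(need <= 288)        # True: would fit a single 288 GB card
```

Quantizing to 8-bit or 4-bit cuts those numbers down, which is why 70B models are already borderline runnable on high-end consumer setups; a 288 GB card would make it comfortable even at full precision.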

The practical question: when do these actually ship, and at what price? If Rubin is a 2027 product, it's interesting but academic. If it starts showing up in cloud instances by late 2026, that changes planning for anyone running AI workloads.

The open models panel

On Wednesday, Jensen is personally moderating a panel about open frontier models. The guest list includes Harrison Chase (LangChain), plus leaders from A16Z, AI2, Cursor, and Thinking Machines Lab.

This is interesting timing. The Meta Avocado situation just happened (I'll probably write about that after getting some of my facts straight, stay tuned!): their new model got delayed, and there are real questions about whether Meta will keep releasing competitive open-weight models or shift toward closed source. If there was ever a moment to have a serious conversation about who's going to carry the open model torch, it's right now.

I don't expect Jensen to badmouth Meta directly. But I would not be surprised if Nvidia positions itself as the open ecosystem's best friend. They sell more GPUs when more people are training and running open models. Their incentives are aligned with keeping models free and accessible.

The ARM CPU (rumor)

This one's more speculative, but there are rumors that Nvidia might show ARM-based processors for PCs. They've been doing ARM chips for data centers (Grace), and the question has always been when that comes to laptops and desktops.

Apple proved with M-series chips that ARM can absolutely compete with x86 for developer workloads. If Nvidia enters that space with integrated GPU capabilities, it could be a big deal for developers who want to run local AI models on a laptop without carrying an external GPU.

Or it could be a data-center-only announcement. Only time will tell. Such is life.


What I'm NOT watching for

Gaming GPUs. GTC has historically been the enterprise/AI event, not the consumer one. If you're hoping for RTX 5090 Ti pricing, this probably isn't the place.

Stock predictions. I genuinely don't care and I'm not qualified. If you want that, go read any of the many finance-bro blogs.

"AI will change everything" platitudes. Jensen will say inspiring things about AI being essential infrastructure. He says this every year. It's always partly right and partly marketing. I'm filtering for the specific product announcements, not the philosophy.

How to actually watch

The keynote streams free at nvidia.com on Monday, March 16 at 11 AM Pacific / 2 PM Eastern / 11:30 PM IST. No registration needed for the keynote stream. The full conference runs through the 19th, and there's a pre-show starting at 8 AM with analysts and founders.

The pre-show guests include CEOs from Perplexity, LangChain, Mistral AI, and a bunch of AI infrastructure companies. Honestly, the pre-show panel might be more interesting for developers than the keynote itself, since keynotes tend to lean heavy on enterprise partnerships and CEO-to-CEO handshakes.

I'll probably watch the keynote, skim the pre-show, and then cherry-pick sessions over the rest of the week based on what actually gets announced. If anything wild drops, I might write a follow-up.

Why a software engineer cares about a hardware conference

Because the hardware dictates the economics. And the economics dictate what we can build.

Two years ago, running a 7B model locally was a novelty. Now it's normal. That happened because hardware got cheaper and more accessible. The announcements at GTC will determine whether self-hosting a 70B+ model becomes normal too, and how fast.

If inference gets 5x cheaper because of Groq integration, that changes which projects are viable. If Rubin chips make local inference on bigger models practical, that shifts the build-vs-buy calculation. If NemoClaw gives us an agent framework that actually runs well on commodity hardware, that unlocks use cases that are currently too expensive or too slow.
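To make "shifts the build-vs-buy calculation" concrete, here's a toy break-even sketch. Every dollar figure below is a placeholder assumption I made up for illustration, not a real price:

```python
def breakeven_months(hardware_cost, tokens_per_month,
                     api_price_per_mtok, self_host_monthly):
    """Months until owning the GPU beats paying per token.
    Returns None if the API never costs more than self-hosting."""
    api_monthly = tokens_per_month / 1e6 * api_price_per_mtok
    savings = api_monthly - self_host_monthly
    if savings <= 0:
        return None
    return hardware_cost / savings

# Hypothetical numbers: a $30k GPU, 500M tokens/month,
# $2 per million tokens via API, $400/month in power and colo.
print(breakeven_months(30_000, 500e6, 2.0, 400))  # 50.0 months
```

Run the same numbers with the API 5x cheaper ($0.40 per million tokens) and the API bill drops below the hosting cost, so self-hosting never breaks even. That's exactly the kind of flip a single GTC announcement can cause.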

None of that is abstract to me. It's the difference between a side project being a toy and a side project being a product. And that's why I'm watching.

See you Monday.
