If you had asked me about AI a year ago, I would’ve said it’s mostly hype. Interesting, but not something you can rely on for real engineering work. That wasn’t entirely wrong at the time — but it was definitely incomplete.
I had tried LLMs back then. Used them for code, small utilities, even some writing. The results were inconsistent at best. Code would break in non-obvious ways, outputs looked fine on the surface but didn’t hold up, and overall it felt like more effort than value.
That perspective probably also came from the kind of work I’ve been doing for most of my career.
I’ve spent the last 10+ years working on trading systems and backend infrastructure — a big part of that at Bloomberg, and more recently working on production systems handling real financial workflows. This is the kind of environment where correctness matters a lot. You’re dealing with order routing, trade execution, reconciliation — things that can’t “mostly work.” They either work correctly, or they cause real problems.
So when early LLM outputs looked fragile, it was hard to take them seriously.
Things have changed since then.
Part of it is obvious — models and tooling have improved significantly. But the bigger shift, at least for me, was learning how to actually use them properly. That part doesn’t get enough attention. Using LLMs effectively is a skill, and in 2026, it’s becoming a pretty fundamental one.
One thing upfront — I don’t put my name on AI-generated content without being clear about it. This is written by me. I use LLMs as tools, not replacements.
How I Got Here
I generally like understanding how systems work — not just using them, but figuring out where they break and how to make them better. That’s been consistent across my work as well.
At Bloomberg, I worked on exchange connectivity and order routing systems, including handling protocols like FIX and building systems that operate across regions. Later, I moved into decision support and portfolio systems, working with both C++ and backend services to generate trade ideas and improve execution efficiency.
More recently, I’ve been working on Python-based distributed systems, building and maintaining services, optimizing databases, and dealing with batch processes that interact with exchanges and internal systems. A lot of this involves Kafka, RabbitMQ, CI/CD pipelines, and Kubernetes-based deployments.
Across all of this, one thing is consistent — systems need to be reliable, observable, and maintainable.
So naturally, I approached LLMs the same way.
Initial experience wasn’t great. Tried multiple tools — different models, coding assistants — none of it really worked for me. In hindsight, the issue wasn’t the tools. It was how I was using them.
Later, when I started working in an environment where LLM usage was encouraged, I gave it another serious shot. Picked up a subscription and started building something non-trivial — not just scripts, but a proper application.
Day one felt promising. Things were moving fast.
Day two was reality.
The system “worked,” but only superficially. No proper validation, no tests, no guarantees. That’s when my usual engineering instincts kicked in — if this were a trading system, I wouldn’t ship it like this.
I had to restart multiple times before I got into a workflow that was actually sustainable.
Over time, things improved. I was able to iterate faster, refactor with less friction, and even work in areas outside my usual stack more comfortably. That’s when it clicked — the value isn’t just in generation, it’s in how you structure the interaction.
The Core Lesson — Context Matters More Than Prompts
Earlier, there was a lot of focus on prompt engineering. It made sense — small wording changes used to have a big impact.
That’s less true now.
Modern models are much better at interpreting intent. What matters more is context — what information you give, how much of it, and how structured it is.
This became very obvious when I started using LLMs for systems that actually resembled real-world complexity.
For example, while working on services that involved event-driven architectures using Kafka and RabbitMQ, I tried feeding the model full architecture docs, feature plans, and code structure all at once.
The result? It lost focus.
It would try to do too much, or pick up the wrong signals from the context. In one case, I asked it to implement a small feature, and it started building parts of the system that weren’t even in scope yet.
That’s when I started limiting context intentionally — only what’s required for that specific task.
The difference was immediate.
It’s similar to how we design distributed systems — you don’t want every component to know everything. You want clear boundaries and well-defined responsibilities.
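To make that concrete, here is a rough sketch of what task-scoped context can look like in Python. The file paths and the call_llm helper are placeholders rather than a real API; the point is that the prompt carries only the task, one design note, and the files the change actually touches, instead of the whole architecture doc.

```python
# A minimal sketch of task-scoped context assembly.
# File paths and call_llm() are hypothetical placeholders, not a real API;
# what matters is what goes into the prompt, not which client you use.
from pathlib import Path

def build_task_context(task: str, relevant_files: list[str], design_note: str) -> str:
    """Assemble only the material this specific task needs."""
    parts = [f"Task:\n{task}", f"Relevant design note:\n{design_note}"]
    for path in relevant_files:
        parts.append(f"--- {path} ---\n{Path(path).read_text()}")
    return "\n\n".join(parts)

# Instead of pasting full architecture docs and feature plans, pass one
# design note and the two or three files the change actually touches.
context = build_task_context(
    task="Add a retry policy to the order-event consumer.",
    relevant_files=["services/order_consumer.py", "common/retry.py"],  # hypothetical paths
    design_note="Consumers must ack only after successful processing.",
)
# response = call_llm(context)  # call_llm is a stand-in for whatever client you use
```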
What Works Better in Practice
A few things that made a noticeable difference for me:
Treat context as a limited resource
Keep documentation structured and minimal
Separate high-level design from implementation details
Point to actual code instead of duplicating logic
Break work into smaller, well-defined steps
This is very similar to how we already think about system design — modularity, separation of concerns, clear interfaces.
Also, if you have specific expectations — testing practices, deployment patterns, coding standards — they need to be explicitly defined. Otherwise, the model will fill in the gaps, and not always in the way you want.
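One way to do that is to write the standards down once and attach them to every task. The conventions below are purely illustrative and the helper is a sketch; the point is that nothing important is left implicit for the model to guess.

```python
# A sketch of making expectations explicit instead of implicit.
# The specific conventions are illustrative examples, not my actual standards;
# what matters is that they are written down and attached to every task.
PROJECT_CONVENTIONS = """
Testing: every new module needs pytest unit tests; no network calls in unit tests.
Messaging: consumers must be idempotent; use manual acks on RabbitMQ/Kafka.
Style: type hints on public functions; no bare except clauses.
Deployment: changes must run through the existing CI pipeline, no ad-hoc scripts.
"""

def with_conventions(task_prompt: str) -> str:
    """Prepend the written-down standards so the model does not fill gaps on its own."""
    return f"{PROJECT_CONVENTIONS}\n\n{task_prompt}"
```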
When Things Stop Working
One pattern I’ve seen repeatedly — people assume the model suddenly got worse.
In most cases, something changed:
Model behavior
Tooling
Context
Your own inputs
In distributed systems, we’re used to debugging issues by looking at changes in the system — deployments, configs, dependencies. The same thinking applies here.
These models are not deterministic; the same input can produce different outputs.
Instead of reacting to bad output, it’s more useful to debug the process:
Was the context too large?
Were instructions ambiguous?
Was there conflicting information?
One useful shift — instead of asking “what went wrong,” ask “why did this happen and how do we fix the process.” That framing tends to produce much better results.
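A small habit that supports this: record what each run actually looked like, the same way we keep deployment and config history for a service. The sketch below assumes nothing about your client; it just notes the model, the context size, and a hash of the instruction block, so that "the model got worse" turns into something you can diff against the checklist above.

```python
# A minimal sketch of recording each run so a drop in quality becomes a diff,
# not a feeling. Field names and the log path are assumptions; adapt as needed.
import hashlib
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class RunRecord:
    model: str             # which model or version answered
    context_chars: int     # rough size of what you sent
    instructions_sha: str  # hash of the instruction block, so silent edits show up
    timestamp: float

def record_run(model: str, context: str, instructions: str,
               log_path: str = "llm_runs.jsonl") -> None:
    rec = RunRecord(
        model=model,
        context_chars=len(context),
        instructions_sha=hashlib.sha256(instructions.encode()).hexdigest()[:12],
        timestamp=time.time(),
    )
    with open(log_path, "a") as f:
        f.write(json.dumps(asdict(rec)) + "\n")
```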
Practical Takeaways
A few things that consistently helped:
Context Management
Keep core docs small and focused
Avoid duplication
Regularly clean up outdated information
Execution
Use LLMs to generate reusable solutions, not repetitive work
Iterate in cycles — generate → review → refine
Reset quickly when something isn’t working instead of forcing it
Working with Larger Systems
Break problems into smaller units (see the sketch after this list)
Use separate contexts where possible
Don’t try to solve everything in one session
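Here is a rough sketch of what that looks like in practice: each sub-task gets its own small context and goes through a generate, review, refine cycle, with nothing depending on one long-running session. The tasks, file paths, and the call_llm and review functions are hypothetical stand-ins for whatever client and review process you actually use.

```python
# A sketch of breaking one change into sub-tasks, each with its own small context.
# Tasks, paths, call_llm() and review() are hypothetical stand-ins.
from pathlib import Path

def call_llm(prompt: str) -> str:
    """Stand-in for your model client (an SDK call, an internal gateway, a CLI)."""
    raise NotImplementedError

def review(draft: str) -> str:
    """Your review step: run tests, linters, read the diff. Return notes, or '' if it passes."""
    raise NotImplementedError

subtasks = [
    ("Add schema validation to the inbound message handler", ["handlers/inbound.py"]),
    ("Write unit tests covering the new validation failures",
     ["handlers/inbound.py", "tests/test_inbound.py"]),
]

for description, files in subtasks:
    # Each sub-task sees only its own files, not the whole codebase or architecture doc.
    context = "\n\n".join(
        [f"Task: {description}"] + [f"--- {p} ---\n{Path(p).read_text()}" for p in files]
    )
    draft = call_llm(context)                                       # generate
    notes = review(draft)                                           # review
    if notes:
        draft = call_llm(f"{context}\n\nReviewer notes:\n{notes}")  # refine, or reset if it drifts
```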
Final Thought
The biggest shift for me was realizing this isn’t about “better prompts.” It’s about designing better workflows.
Once you start treating LLMs like part of a system — with constraints, inputs, and feedback loops — they become far more useful.
And honestly, this fits quite naturally with how we already think about building reliable systems.