<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rakesh Tanwar</title>
    <description>The latest articles on DEV Community by Rakesh Tanwar (@rakesh_tanwar_8a7d83bc8f0).</description>
    <link>https://dev.to/rakesh_tanwar_8a7d83bc8f0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3673541%2F062ef0b7-eb41-4064-a5bf-a09343f6c200.png</url>
      <title>DEV Community: Rakesh Tanwar</title>
      <link>https://dev.to/rakesh_tanwar_8a7d83bc8f0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rakesh_tanwar_8a7d83bc8f0"/>
    <language>en</language>
    <item>
      <title>Common Mistakes Enterprises Make with Cloud Storage and How to Avoid Them</title>
      <dc:creator>Rakesh Tanwar</dc:creator>
      <pubDate>Wed, 24 Dec 2025 10:46:38 +0000</pubDate>
      <link>https://dev.to/rakesh_tanwar_8a7d83bc8f0/common-mistakes-enterprises-make-with-cloud-storage-and-how-to-avoid-them-421p</link>
      <guid>https://dev.to/rakesh_tanwar_8a7d83bc8f0/common-mistakes-enterprises-make-with-cloud-storage-and-how-to-avoid-them-421p</guid>
      <description>&lt;p&gt;Over and over, I see big enterprises burn money, tank performance, or create compliance nightmares because they treat &lt;strong&gt;&lt;a href="https://acecloud.ai/cloud/storage/" rel="noopener noreferrer"&gt;cloud storage&lt;/a&gt;&lt;/strong&gt; like a magic infinite disk. It isn’t. It’s a toolbox. And if you use a hammer for everything, eventually you’re going to hit your thumb. Here are the most common mistakes I see, and how I’d avoid them if I were rebuilding from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Treating cloud storage like an on-prem SAN&lt;/strong&gt;&lt;br&gt;
The classic one: “We moved to the cloud, so we provisioned giant network volumes and mounted them everywhere. Done.”&lt;/p&gt;

&lt;p&gt;That’s not “cloud,” that’s your old data center with extra steps.&lt;/p&gt;

&lt;p&gt;Block storage has its place (databases, certain legacy apps), but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It doesn’t scale like object storage&lt;/li&gt;
&lt;li&gt;It’s usually more expensive at large capacity&lt;/li&gt;
&lt;li&gt;It ties data to specific instances and zones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What I do instead&lt;/strong&gt;&lt;br&gt;
I start with &lt;strong&gt;&lt;a href="https://acecloud.ai/cloud/storage/object/" rel="noopener noreferrer"&gt;object storage&lt;/a&gt;&lt;/strong&gt; as the default for anything that is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shared across teams&lt;/li&gt;
&lt;li&gt;Read-heavy&lt;/li&gt;
&lt;li&gt;Long-lived&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Block storage is reserved for latency-sensitive, tightly coupled workloads. If I catch myself putting “everything” on block storage, that’s my red flag that I’m just re-implementing the old world in the cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Keeping everything in the hottest (most expensive) tier&lt;/strong&gt;&lt;br&gt;
I once reviewed a storage bill for an enterprise where 90%+ of the data hadn’t been touched in over a year—all sitting in premium “hot” storage. Their monthly bill was basically a museum ticket for data nobody visited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This happens because:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nobody owns lifecycle policies.&lt;/li&gt;
&lt;li&gt;“We’ll clean it up later” quietly becomes “never.”&lt;/li&gt;
&lt;li&gt;Teams are afraid of archive tiers because they don’t trust they can get data back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How to avoid it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classify data into hot / warm / cold / archive.&lt;/li&gt;
&lt;li&gt;Put automated lifecycle policies on every bucket by default:

&lt;ul&gt;
&lt;li&gt;After X days → cool tier&lt;/li&gt;
&lt;li&gt;After Y days → archive or delete&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Only exempt datasets where you actively justify why they must stay hot.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;My rule: if no one can name a reason a dataset must be hot within 5 seconds, it probably shouldn’t be.&lt;/p&gt;
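&lt;p&gt;As a sketch, the default-tiering rule above can be captured in an S3-style lifecycle configuration. The day thresholds and storage-class names here are illustrative assumptions, not recommendations:&lt;/p&gt;

```python
# Sketch of a default lifecycle policy: hot -> cool after X days,
# archive after Y days. Thresholds and class names are examples only.

def default_lifecycle(cool_after_days=30, archive_after_days=180):
    """Build an S3-style lifecycle configuration dict."""
    return {
        "Rules": [
            {
                "ID": "tier-down-by-default",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Transitions": [
                    {"Days": cool_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }

policy = default_lifecycle()
# With boto3, this shape is what put_bucket_lifecycle_configuration
# expects; Terraform has an equivalent lifecycle_rule block.
```

&lt;p&gt;Baking this into the bucket-creation template means the exemption, not the policy, is the thing a team has to argue for.&lt;/p&gt;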

&lt;p&gt;&lt;strong&gt;3. Ignoring egress and API costs&lt;/strong&gt;&lt;br&gt;
Everyone obsesses over “$ per GB per month” and then gets ambushed by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-region egress&lt;/li&gt;
&lt;li&gt;“Chatty” apps making millions of small GET/PUTs&lt;/li&gt;
&lt;li&gt;Constant re-downloading of the same objects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ve seen GPU training jobs where the storage API bill rivaled the compute bill because the data loader was pulling tiny objects one by one across regions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I avoid this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Co-locate compute and storage in the same region by default.&lt;/li&gt;
&lt;li&gt;For high-I/O workloads, shard small files into larger objects (webdataset, tar, parquet, etc.).&lt;/li&gt;
&lt;li&gt;Use caching:

&lt;ul&gt;
&lt;li&gt;Local NVMe or node-local SSDs as a read-through cache for frequently accessed datasets.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Set up cost dashboards that actually surface:

&lt;ul&gt;
&lt;li&gt;Top egress sources&lt;/li&gt;
&lt;li&gt;Top buckets by API requests&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;If you don’t measure egress and API calls, you’ll be surprised. And cloud surprise is always expensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. No data locality strategy for performance-critical workloads&lt;/strong&gt;&lt;br&gt;
From the GPU side, this one hurts the most.&lt;/p&gt;

&lt;p&gt;I’ve seen enterprises deploy multi-million-dollar GPU clusters, then point them at data sitting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In another region&lt;/li&gt;
&lt;li&gt;In another cloud&lt;/li&gt;
&lt;li&gt;On a sad NFS box hidden behind a VPN&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then they wonder why GPU utilization is 40%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My rule&lt;/strong&gt;&lt;br&gt;
For performance-sensitive jobs (training, large-scale analytics, latency-sensitive inference):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data and compute must live as close as physically possible.&lt;/li&gt;
&lt;li&gt;For big training workloads:

&lt;ul&gt;
&lt;li&gt;Keep canonical data in object storage in the same region.&lt;/li&gt;
&lt;li&gt;Stage active shards onto local NVMe before the job starts.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;For critical real-time inference:

&lt;ul&gt;
&lt;li&gt;Keep models and key features on local SSD / high-performance block.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;If you’re paying for high-end GPUs, it’s almost always cheaper to over-provision fast storage than to let those GPUs idle waiting for bytes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Over-sharing and under-governing buckets&lt;/strong&gt;&lt;br&gt;
Another common pattern: one giant “data” bucket with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broad access&lt;/li&gt;
&lt;li&gt;Flat structure&lt;/li&gt;
&lt;li&gt;Ad hoc naming&lt;/li&gt;
&lt;li&gt;No clear ownership&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works fine until:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Someone deletes a folder they shouldn’t.&lt;/li&gt;
&lt;li&gt;An internal tool exposes data it shouldn’t.&lt;/li&gt;
&lt;li&gt;Nobody knows who can approve access because “everyone uses that bucket.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How I handle it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design for data domains, not “one bucket to rule them all”:
analytics-*, ml-*, raw-*, archive-*, etc.

&lt;ul&gt;
&lt;li&gt;Assign clear ownership per bucket/domain:

&lt;ul&gt;
&lt;li&gt;Data owner&lt;/li&gt;
&lt;li&gt;Access policy owner&lt;/li&gt;
&lt;li&gt;Lifecycle policy owner&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;Use least-privilege IAM:

&lt;ul&gt;
&lt;li&gt;Read-only where possible&lt;/li&gt;
&lt;li&gt;Narrow write permissions&lt;/li&gt;
&lt;li&gt;Strong separation between production and experiment buckets&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Security teams love this. So do auditors. But more importantly, it reduces accidents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. No versioning, no backups, no restore tests&lt;/strong&gt;&lt;br&gt;
This is the quiet killer.&lt;/p&gt;

&lt;p&gt;I still see critical buckets with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Versioning turned off&lt;/li&gt;
&lt;li&gt;No backup or replication strategy&lt;/li&gt;
&lt;li&gt;No tested restore process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then one day, a bad script runs rm -rf in the wrong prefix, and suddenly everyone discovers that “11 nines of durability” doesn’t mean “undo button.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My practical approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Turn on versioning for:

&lt;ul&gt;
&lt;li&gt;Any bucket storing production models, configs, or critical reference data.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Have a clear replication / backup story:

&lt;ul&gt;
&lt;li&gt;Cross-region replication for “if this region dies, we’re in trouble” datasets.&lt;/li&gt;
&lt;li&gt;Separate “backup projects/accounts” to isolate from accidental deletion.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Actually test restores:

&lt;ul&gt;
&lt;li&gt;Pull a random dataset from backup.&lt;/li&gt;
&lt;li&gt;Time how long it takes and what breaks.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;If you’ve never practiced a restore, assume it doesn’t work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Letting everyone do “whatever they want” forever&lt;/strong&gt;&lt;br&gt;
Some chaos is healthy. But I’ve worked with enterprises where every team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Invents their own folder structure&lt;/li&gt;
&lt;li&gt;Chooses random storage classes&lt;/li&gt;
&lt;li&gt;Builds slightly different ingestion pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On day one, this feels like “autonomy.” By year two, it’s data hell.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I recommend:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a small set of storage patterns:

&lt;ul&gt;
&lt;li&gt;“Analytics dataset pattern”&lt;/li&gt;
&lt;li&gt;“ML training dataset pattern”&lt;/li&gt;
&lt;li&gt;“Archive pattern”&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Provide templates and tooling:

&lt;ul&gt;
&lt;li&gt;Terraform modules, bucket naming conventions, lifecycle defaults.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Allow deviations—but make them explicit decisions, not accidents.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The goal isn’t central control for its own sake. It’s to avoid having 20 ways to do the same thing, all slightly broken in different ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bringing it together&lt;/strong&gt;&lt;br&gt;
When I walk into an enterprise as a cloud GPU person, I’ve learned not to start by asking “what GPUs are you using?” I start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where does your data live?&lt;/li&gt;
&lt;li&gt;Who owns which buckets?&lt;/li&gt;
&lt;li&gt;What are your lifecycle policies?&lt;/li&gt;
&lt;li&gt;How often do you move or restore data?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most “GPU performance issues” I see are really storage design issues in disguise.&lt;/p&gt;

&lt;p&gt;If you treat cloud storage as a strategic system (classify data, control access, manage lifecycle, test restores, and care about locality), you’ll get better security, lower bills, and much happier GPUs.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>cloud</category>
      <category>performance</category>
    </item>
    <item>
      <title>Best Practices for Connecting LLMs to SQL Databases</title>
      <dc:creator>Rakesh Tanwar</dc:creator>
      <pubDate>Wed, 24 Dec 2025 10:23:18 +0000</pubDate>
      <link>https://dev.to/rakesh_tanwar_8a7d83bc8f0/best-practices-for-connecting-llms-to-sql-databases-47pn</link>
      <guid>https://dev.to/rakesh_tanwar_8a7d83bc8f0/best-practices-for-connecting-llms-to-sql-databases-47pn</guid>
      <description>&lt;p&gt;Hooking an LLM straight up to your production SQL database is one of those ideas that sounds cool in a demo and terrifying in a real company.&lt;/p&gt;

&lt;p&gt;Done well, you get “&lt;a href="https://acecloud.ai/blog/how-to-use-large-language-models-to-interact-with-sql-databases/" rel="noopener noreferrer"&gt;ask in English, get SQL + results&lt;/a&gt;” and a lot less back-and-forth between data folks and everyone else. Done badly, you get slow queries, wrong numbers in executive decks, or worse, accidental data leaks and write operations you never meant to allow. Enterprise NL2SQL papers and blog posts keep repeating the same warning: accuracy and safety are the main problems, not “can the model write SQL”.&lt;/p&gt;

&lt;p&gt;Let’s walk through practical best practices for connecting LLMs to SQL in a way that’s useful, predictable, and not terrifying for your DBAs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Treat the LLM as an untrusted client&lt;/strong&gt;&lt;br&gt;
First principle: the LLM is not special. It’s just another client that can send weird queries.&lt;/p&gt;

&lt;p&gt;So architecture-wise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Put the LLM behind an API layer, not directly on the database connection string.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Let your backend service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Call the LLM.&lt;/li&gt;
&lt;li&gt;Inspect the generated SQL.&lt;/li&gt;
&lt;li&gt;Decide whether to run it, rewrite it, or reject it.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;For safety and performance, hit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A read replica or analytics database, not the OLTP primary.&lt;/li&gt;
&lt;li&gt;A separate schema or database user with limited permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Safety checklists for LLM agents all hammer on the same idea: limit tools, limit permissions, and assume the model will eventually do something dumb if you let it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Make the LLM schema-aware (but don’t dump the whole catalog)&lt;/strong&gt;&lt;br&gt;
Most NL2SQL accuracy issues come from the model not really “knowing” your schema: table names are strange, joins are non-obvious, and column naming is inconsistent. Recent surveys put execution accuracy in the ~60–70% range even for strong models on realistic datasets.&lt;/p&gt;

&lt;p&gt;You boost accuracy by feeding the model the right schema context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table names and short descriptions&lt;/li&gt;
&lt;li&gt;Column names + types&lt;/li&gt;
&lt;li&gt;Key relationships (PK/FK, common join paths)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Scope the schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only include tables relevant to the current product area or user.&lt;/li&gt;
&lt;li&gt;For big warehouses, predefine “domains” (sales, support, billing) and only send one at a time.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Tools and guides across vendors (Azure, Oracle, LlamaIndex, etc.) all follow this pattern: index the schema, then give the model a filtered view based on the question.&lt;/p&gt;

&lt;p&gt;Too much schema = confusion. Too little schema = wrong joins. Spend time getting this balance right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Use a two-step reasoning pattern, not “prompt → SQL → run”&lt;/strong&gt;&lt;br&gt;
Direct “question in, SQL out, execute immediately” is fragile. Better to split it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Interpret the request&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have the model restate the question in structured form:

&lt;ul&gt;
&lt;li&gt;Intent (report vs lookup vs debug)&lt;/li&gt;
&lt;li&gt;Entities (customer, product, region, time range)&lt;/li&gt;
&lt;li&gt;Output shape (single value, table, time series)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Generate SQL from that plan&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask the model to produce SQL and a short explanation of the join logic and filters.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Chain-of-thought style prompting (even if you don’t show the thoughts to the user) consistently improves SQL generation quality in studies and production write-ups.&lt;/p&gt;

&lt;p&gt;Implementation tip:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse only the SQL part (e.g., fenced in a code block).&lt;/li&gt;
&lt;li&gt;Ignore anything else when executing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you something to log and debug when a query misbehaves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Lock down what SQL the model is allowed to run&lt;/strong&gt;&lt;br&gt;
Don’t rely on “please don’t write DELETE statements” in the prompt. Enforce it.&lt;/p&gt;

&lt;p&gt;Concrete rules that work well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Read-only DB user&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only SELECT allowed.&lt;/li&gt;
&lt;li&gt;No INSERT/UPDATE/DELETE/MERGE, no DDL.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Single-statement rule&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reject queries with multiple statements or suspicious delimiters.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Row and time limits&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always add LIMIT and sane timeouts.&lt;/li&gt;
&lt;li&gt;For dashboards, page the results.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Column allow-listing&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exclude PII or sensitive columns at the schema layer, or expose only safe views.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Some teams go one step further and allow the LLM to call only stored procedures instead of emitting free-form SQL. That trades flexibility for strong control: the model picks a stored proc and fills in parameters, but can’t touch arbitrary tables.&lt;/p&gt;

&lt;p&gt;Whatever you choose, implement checks in code before execution, not just in the prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Validate and sandbox queries before hitting real data&lt;/strong&gt;&lt;br&gt;
Even with a read-only user, ugly queries can still hurt performance or return nonsense.&lt;/p&gt;

&lt;p&gt;Good guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Static checks&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse the SQL (e.g., with your language’s SQL parser) and inspect the AST.&lt;/li&gt;
&lt;li&gt;Reject:

&lt;ul&gt;
&lt;li&gt;Cross-database references&lt;/li&gt;
&lt;li&gt;Dangerous functions&lt;/li&gt;
&lt;li&gt;Huge cartesian joins&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Dry run or EXPLAIN&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run EXPLAIN first and reject queries with insane cost estimates or full table scans on huge tables.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Result sanity checks &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enforce row count caps.&lt;/li&gt;
&lt;li&gt;If the result is empty or obviously off, you can ask the model to debug/adjust the SQL instead of returning junk.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Research on constrained NL2SQL and runtime enforcement basically boils down to this: let the model propose queries, but use hard-coded constraints to keep execution safe.&lt;/p&gt;

&lt;p&gt;For sensitive environments, consider running first against masked or synthetic data to test prompts and behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Put a human in the loop where the blast radius is high&lt;/strong&gt;&lt;br&gt;
Not every query needs approval. But some really should.&lt;/p&gt;

&lt;p&gt;Patterns that work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For ad-hoc analytics or internal reporting, you can usually auto-run reads with good guardrails.&lt;/li&gt;
&lt;li&gt;For actions that:

&lt;ul&gt;
&lt;li&gt;Affect pricing, payouts, or compliance, or&lt;/li&gt;
&lt;li&gt;Touch very sensitive tables&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;…show the SQL and a plain-English summary to a human for approval first.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Safety guides for LLM agents explicitly recommend human review for any high-impact actions like editing databases; querying sensitive data can fit the same pattern depending on your risk profile.&lt;/p&gt;

&lt;p&gt;Make it easy for the reviewer: include the original question, the generated SQL, and a quick explanation of what the query does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Log everything and measure accuracy over time&lt;/strong&gt;&lt;br&gt;
NL2SQL is not “solved”, especially once you move beyond academic benchmarks into messy enterprise schemas.&lt;/p&gt;

&lt;p&gt;Treat your LLM–SQL layer as a product:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Log:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User question&lt;/li&gt;
&lt;li&gt;Schema context you passed in&lt;/li&gt;
&lt;li&gt;Generated SQL&lt;/li&gt;
&lt;li&gt;Execution plan and runtime&lt;/li&gt;
&lt;li&gt;Result shape (row count, columns)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Sample and label:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regularly review a subset of interactions.&lt;/li&gt;
&lt;li&gt;Mark which SQL queries:

&lt;ul&gt;
&lt;li&gt;Ran successfully&lt;/li&gt;
&lt;li&gt;Returned correct answers&lt;/li&gt;
&lt;li&gt;Needed manual fixes&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Execution accuracy (did the SQL run).&lt;/li&gt;
&lt;li&gt;Answer accuracy (was it the right question / result).&lt;/li&gt;
&lt;li&gt;Latency and cost.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This gives you a feedback loop when you change models, prompts, or schema, and lets you catch regressions early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. Start narrow, then widen the blast radius&lt;/strong&gt;&lt;br&gt;
The safest path is to begin with a tight use case and expand.&lt;/p&gt;

&lt;p&gt;A nice rollout order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;One domain, one schema&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;e.g., just analytics on a reporting replica of your billing DB.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Internal users only&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data/BI teams who can spot nonsense quickly.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Gradual schema expansion&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add more tables and domains once you trust the behavior.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Broader audiences and more powerful queries&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only after logs and metrics show stable, predictable behavior.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You’ll learn a lot about your own data quality, naming, and join structure along the way, which often leads to better views and marts even outside the LLM use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrapping up&lt;/strong&gt;&lt;br&gt;
Connecting LLMs to SQL databases isn’t just about getting fancy demos where someone types “&lt;a href="https://acecloud.ai/blog/how-to-use-large-language-models-to-interact-with-sql-databases/" rel="noopener noreferrer"&gt;show me revenue by region&lt;/a&gt;” and a pretty chart appears. The hard part is everything around that moment: scoping schema, locking down permissions, validating what runs, and tracking whether answers are actually right.&lt;/p&gt;

&lt;p&gt;If you treat the LLM as an untrusted client, keep it schema-aware but constrained, add hard checks around the SQL it emits, and watch behavior with real metrics, you can give people a natural-language window into your data without giving your DBA a heart attack.&lt;/p&gt;

</description>
      <category>security</category>
      <category>llm</category>
      <category>sql</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why GPUs Are Critical for Medical Image Processing</title>
      <dc:creator>Rakesh Tanwar</dc:creator>
      <pubDate>Mon, 22 Dec 2025 10:45:06 +0000</pubDate>
      <link>https://dev.to/rakesh_tanwar_8a7d83bc8f0/why-gpus-are-critical-for-medical-image-processing-21c6</link>
      <guid>https://dev.to/rakesh_tanwar_8a7d83bc8f0/why-gpus-are-critical-for-medical-image-processing-21c6</guid>
      <description>&lt;p&gt;If you’ve ever worked with medical imaging data, you know it doesn’t behave like “normal images.” A CT study isn’t one picture. It’s a stack of slices, sometimes hundreds of them. MRI can add multiple sequences. Ultrasound can be a stream. Then you layer on reconstruction, denoising, segmentation, registration, and sometimes deep learning inference on top. That’s why GPUs are critical for medical image processing. Not because GPUs are trendy, but because the math and the data volume line up almost perfectly with what GPUs do well.&lt;/p&gt;

&lt;p&gt;This isn’t medical advice or a claim about clinical outcomes. It’s just the compute reality: if you want faster turnaround and fewer pipeline bottlenecks, GPUs usually end up in the middle of the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medical imaging is a data-heavy problem, not just an “image” problem&lt;/strong&gt;&lt;br&gt;
Once you treat it like 3D data plus workflow pressure, the GPU case makes more sense.&lt;/p&gt;

&lt;p&gt;A typical computer vision workflow might deal with 224×224 images. Medical imaging often deals with full volumes, and sometimes time series on top. Every step, whether filtering, resampling, or masking, is repeated across millions of voxels.&lt;br&gt;
That size has a knock-on effect: it increases memory traffic, increases compute, and makes “do it on the CPU later” feel like a slow leak that turns into a backlog.&lt;/p&gt;
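&lt;p&gt;A back-of-envelope comparison makes that scale concrete. The dimensions are typical-but-illustrative, not from any specific scanner:&lt;/p&gt;

```python
# Why the "just images" intuition breaks down: one CT study vs one
# classifier input. Dimensions are illustrative, not from any scanner.

vision_pixels = 224 * 224        # one classifier input
ct_voxels = 512 * 512 * 400      # one CT study, ~400 slices
ratio = ct_voxels // vision_pixels

# At 2 bytes per voxel (16-bit), one study held in memory:
ct_bytes = ct_voxels * 2
ct_megabytes = ct_bytes / (1024 * 1024)
```

&lt;p&gt;That is roughly 2,000 classifier inputs’ worth of data per study, around 200 MB at 16 bits per voxel, and every filtering or resampling pass touches all of it.&lt;/p&gt;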

&lt;p&gt;&lt;strong&gt;The core reason GPUs win: parallel math and high memory bandwidth&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://acecloud.ai/blog/gpu-use-in-medical-imaging-applications/" rel="noopener noreferrer"&gt;&lt;strong&gt;Medical image processing&lt;/strong&gt;&lt;/a&gt; is full of repeated operations, and GPUs are built for that kind of repetition.&lt;/p&gt;

&lt;p&gt;A lot of medical imaging workloads boil down to “apply the same operation across a large grid,” whether that’s a convolution, interpolation, thresholding, or a more complex kernel. GPUs can run thousands of threads in parallel, which maps nicely to voxel-wise and pixel-wise work.&lt;/p&gt;

&lt;p&gt;The other side of it is memory. Moving and touching large volumes costs time. GPUs are designed to push a lot of data through math units quickly, and many imaging steps are limited by memory bandwidth as much as raw &lt;strong&gt;&lt;a href="https://acecloud.ai/cloud/compute/" rel="noopener noreferrer"&gt;compute&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reconstruction is where GPUs earn their keep&lt;/strong&gt;&lt;br&gt;
In several modalities, you’re not loading an image, you’re building it from raw measurements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MRI and ultrasound reconstruction leans hard on FFT math&lt;/strong&gt;&lt;br&gt;
MRI reconstruction commonly uses the Fast Fourier Transform as part of turning acquired signal data into an image. NVIDIA’s GPU Gems includes a chapter explicitly showing GPU-based FFT work for MRI and ultrasonic imaging reconstruction.&lt;/p&gt;

&lt;p&gt;That matters because FFT work is highly parallel and can be a big chunk of total reconstruction time. Research literature also calls out FFT acceleration as a key theme for speeding advanced MRI reconstruction algorithms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iterative CT reconstruction is compute-hungry&lt;/strong&gt;&lt;br&gt;
Iterative reconstruction methods can improve image quality, but they’re heavier than simpler analytic methods. There are papers focused on accelerating iterative CT reconstruction on GPUs, including work exploring GPU features like Tensor Cores for speeding iterative CT reconstruction.&lt;/p&gt;

&lt;p&gt;The takeaway isn’t “every CT pipeline uses this.” It’s that reconstruction can easily become the dominant compute cost, and it’s a very GPU-friendly cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI in medical imaging is GPU-first by default&lt;/strong&gt;&lt;br&gt;
Once you start training or running 3D models, CPUs stop being the default option.&lt;/p&gt;

&lt;p&gt;If you’re doing segmentation, detection, triage, or classification, you’re usually pushing big tensor ops over 2D stacks or full 3D volumes. That’s why most practical medical imaging AI stacks assume GPUs, especially when you move from 2D to 3D segmentation.&lt;/p&gt;

&lt;p&gt;A good example is MONAI, a PyTorch-based, open-source toolkit built for healthcare imaging AI. It’s part of the PyTorch ecosystem and is designed around deep learning workflows for medical imaging.&lt;/p&gt;

&lt;p&gt;One practical detail people miss: GPUs help twice here. First, for training. Second, for inference throughput when you need to run models over many studies, many slices, or a live queue. Even if a single inference is “fast enough,” queues are where latency becomes a real workflow problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don’t ignore the boring bottlenecks: decode and data movement&lt;/strong&gt;&lt;br&gt;
A fast GPU model won’t help if your pipeline can’t feed it.&lt;br&gt;
DICOM workflows often involve compression and decoding. JPEG 2000 shows up in medical imaging and digital pathology, and decode can become a real bottleneck when you scale. NVIDIA’s nvJPEG2000 library is specifically aimed at accelerating JPEG 2000 decoding and encoding on &lt;strong&gt;&lt;a href="https://acecloud.ai/cloud/gpu/" rel="noopener noreferrer"&gt;NVIDIA GPUs&lt;/a&gt;&lt;/strong&gt;, with parts of the decode offloaded to the GPU.&lt;/p&gt;

&lt;p&gt;NVIDIA has also written about GPU-accelerated medical image decoding using nvJPEG2000 in the context of DICOM images.&lt;br&gt;
This is where a lot of teams get surprised. They upgrade the model, see no speedup, and the reason is simple: decode and transfers are stalling everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to look for in a GPU setup for medical imaging&lt;/strong&gt;&lt;br&gt;
You don’t need the “biggest GPU,” but you do need the right shape for your workloads.&lt;/p&gt;

&lt;p&gt;First, VRAM. 3D volumes and 3D models eat memory fast. If you’re doing full-volume inference or training, VRAM is often the first constraint you hit.&lt;/p&gt;

&lt;p&gt;Second, predictable throughput. For imaging pipelines, it’s rarely one job. It’s many studies, batching, retries, and a queue that’s always there. Stable performance is more useful than peak benchmarks.&lt;/p&gt;

&lt;p&gt;Third, plan for where the data lives. If your GPU is fast but your storage or network is slow, you’ll see stutters. Medical imaging workloads punish slow I/O.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
GPUs matter in medical image processing because the workload is a perfect storm: large 3D data, repeated math, heavy reconstruction steps, and deep learning that lives on tensor ops. Reconstruction benefits from GPU-friendly computation like FFTs in MRI and ultrasound, and iterative approaches in CT can be heavy enough that GPUs become the only practical way to keep turnaround reasonable.&lt;/p&gt;

&lt;p&gt;And the less glamorous part is just as real: decode and data movement can bottleneck the whole system, and GPU-accelerated decoding libraries exist for a reason.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
