<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: sanjay yadav</title>
    <description>The latest articles on DEV Community by sanjay yadav (@sanjay_yadav_df9aa9af10ef).</description>
    <link>https://dev.to/sanjay_yadav_df9aa9af10ef</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3902053%2F5eec0c1c-b2bd-4bbf-bdd7-3ecb9bda5f10.png</url>
      <title>DEV Community: sanjay yadav</title>
      <link>https://dev.to/sanjay_yadav_df9aa9af10ef</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sanjay_yadav_df9aa9af10ef"/>
    <language>en</language>
    <item>
      <title>Why Your GKE Load Balancer Fails After 30 Seconds (Real Fix)</title>
      <dc:creator>sanjay yadav</dc:creator>
      <pubDate>Tue, 28 Apr 2026 10:12:08 +0000</pubDate>
      <link>https://dev.to/sanjay_yadav_df9aa9af10ef/why-your-gke-load-balancer-fails-after-30-seconds-real-fix-1j5g</link>
      <guid>https://dev.to/sanjay_yadav_df9aa9af10ef/why-your-gke-load-balancer-fails-after-30-seconds-real-fix-1j5g</guid>
      <description>&lt;p&gt;Still getting 504 errors in GKE even when everything looks fine?&lt;/p&gt;

&lt;p&gt;We faced the same issue. Pods were healthy, APIs were responding, and there were no errors in logs.&lt;/p&gt;

&lt;p&gt;The real problem wasn’t the application. It was a 30-second timeout.&lt;/p&gt;




&lt;p&gt;We spent hours debugging this.&lt;/p&gt;

&lt;p&gt;Everything looked normal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pods were running fine&lt;/li&gt;
&lt;li&gt;APIs were responding&lt;/li&gt;
&lt;li&gt;No crashes or error logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But still, requests kept failing.&lt;/p&gt;

&lt;p&gt;We were seeing random 503 and 504 errors.&lt;/p&gt;




&lt;h2&gt;The Confusing Part&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Short requests worked perfectly&lt;/li&gt;
&lt;li&gt;Logs showed no issues&lt;/li&gt;
&lt;li&gt;System appeared healthy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But any request that took longer than about 30 seconds failed every time.&lt;/p&gt;




&lt;h2&gt;What We Thought Initially&lt;/h2&gt;

&lt;p&gt;At first, we assumed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application bug&lt;/li&gt;
&lt;li&gt;Database latency&lt;/li&gt;
&lt;li&gt;Resource limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We checked everything. Nothing was wrong.&lt;/p&gt;




&lt;h2&gt;The Real Cause&lt;/h2&gt;

&lt;p&gt;After digging deeper, we found the actual issue:&lt;/p&gt;

&lt;p&gt;The GKE HTTP(S) Load Balancer’s backend service has a default timeout of 30 seconds.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your backend may still be processing&lt;/li&gt;
&lt;li&gt;But the load balancer stops waiting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: 504 Gateway Timeout.&lt;/p&gt;




&lt;h2&gt;The Fix: BackendConfig&lt;/h2&gt;

&lt;p&gt;We fixed it using BackendConfig.&lt;/p&gt;

&lt;p&gt;It allows you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increase timeout&lt;/li&gt;
&lt;li&gt;Customize health checks&lt;/li&gt;
&lt;li&gt;Control load balancer behavior&lt;/li&gt;
&lt;/ul&gt;
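
&lt;p&gt;For instance, a single BackendConfig can raise the timeout and tune health checks together. A sketch (the health check values here are illustrative, not what we ran in production):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backend-config
spec:
  timeoutSec: 60
  healthCheck:
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    type: HTTP
    requestPath: /healthz
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;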




&lt;h2&gt;What We Changed&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cloud.google.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;BackendConfig&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-backend-config&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;timeoutSec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This change increased the timeout from 30 seconds to 60 seconds.&lt;/p&gt;

&lt;p&gt;After applying it, the issue was resolved.&lt;/p&gt;




&lt;h2&gt;Attach It to Your Service&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cloud.google.com/backend-config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{"ports":&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{"8003":"my-backend-config"}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply the configuration and redeploy your service.&lt;/p&gt;
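
&lt;p&gt;For context, the annotation sits under the Service’s metadata. A minimal sketch (the Service name, selector, and app label are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    cloud.google.com/backend-config: '{"ports": {"8003": "my-backend-config"}}'
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - port: 8003
      targetPort: 8003
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;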




&lt;h2&gt;Important Note&lt;/h2&gt;

&lt;p&gt;If you don’t configure BackendConfig:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default timeout remains 30 seconds&lt;/li&gt;
&lt;li&gt;Long-running requests will fail&lt;/li&gt;
&lt;li&gt;You have limited control over behavior&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Bonus Tip&lt;/h2&gt;

&lt;p&gt;If your application regularly takes longer to respond:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avoid increasing timeout too much&lt;/li&gt;
&lt;li&gt;Consider using asynchronous processing (queues or background jobs)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Final Thought&lt;/h2&gt;

&lt;p&gt;This was not a complex bug.&lt;/p&gt;

&lt;p&gt;It was a default setting that we were not aware of.&lt;/p&gt;

&lt;p&gt;Sometimes the real issue is not in your code, but in infrastructure defaults.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://www.kubeblogs.com/fixing-504-errors-in-gke-load-balancer-how-backendconfig-solved-our-30-second-timeout-problem/" rel="noopener noreferrer"&gt;Full step-by-step guide&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you work with GKE, this can save you a lot of time.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>gcp</category>
      <category>devops</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Stop Using T2 Instances — They Cost More Than You Think (T2 vs T3)</title>
      <dc:creator>sanjay yadav</dc:creator>
      <pubDate>Tue, 28 Apr 2026 09:43:27 +0000</pubDate>
      <link>https://dev.to/sanjay_yadav_df9aa9af10ef/stop-using-t2-instances-they-cost-more-than-you-think-t2-vs-t3-4kcp</link>
      <guid>https://dev.to/sanjay_yadav_df9aa9af10ef/stop-using-t2-instances-they-cost-more-than-you-think-t2-vs-t3-4kcp</guid>
<description>&lt;h1&gt;Stop Using T2 Instances — They’re Quietly Increasing Your AWS Bill&lt;/h1&gt;

&lt;p&gt;I used to think T2 instances were the cheapest option on AWS.&lt;/p&gt;

&lt;p&gt;They look affordable, they’re everywhere in tutorials, and honestly — most of us just start with them.&lt;/p&gt;

&lt;p&gt;But after checking a few AWS bills closely, I realized something wasn’t adding up.&lt;/p&gt;




&lt;h2&gt;🤔 What’s Actually Going On with T2?&lt;/h2&gt;

&lt;p&gt;T2 instances run on a CPU credit system.&lt;/p&gt;

&lt;p&gt;In simple terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When your app is idle → you earn credits
&lt;/li&gt;
&lt;li&gt;When CPU usage increases → you spend credits
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first, this seems like a smart system.&lt;/p&gt;

&lt;p&gt;But here’s the part most people miss 👇&lt;/p&gt;




&lt;h2&gt;Where the Extra Cost Comes From&lt;/h2&gt;

&lt;p&gt;When your instance runs out of CPU credits, one of two things happens.&lt;/p&gt;

&lt;p&gt;In standard mode, the instance is throttled down to its baseline performance. But if &lt;strong&gt;unlimited mode&lt;/strong&gt; is enabled (T2 instances launch in standard mode by default, but many teams switch it on to avoid throttling), you start getting charged for the extra CPU usage instead.&lt;/p&gt;

&lt;p&gt;No clear warning. No obvious alert.&lt;/p&gt;

&lt;p&gt;Just a slightly higher bill at the end of the month.&lt;/p&gt;

&lt;p&gt;That’s exactly what happened to me.&lt;/p&gt;
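
&lt;p&gt;If you’d rather be throttled to baseline than silently billed, you can pin the instance to standard mode explicitly. A sketch in CloudFormation (the resource name and AMI ID are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      ImageId: ami-0abcdef1234567890   # illustrative
      CreditSpecification:
        CPUCredits: standard   # throttle at baseline instead of charging for bursts
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;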




&lt;h2&gt;Why I Switched to T3&lt;/h2&gt;

&lt;p&gt;T3 instances are basically the improved version of T2.&lt;/p&gt;

&lt;p&gt;After switching, I noticed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More consistent performance
&lt;/li&gt;
&lt;li&gt;Fewer surprises in billing
&lt;/li&gt;
&lt;li&gt;Better overall value
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They run on AWS’s newer Nitro-based hardware and handle CPU bursts more efficiently.&lt;/p&gt;




&lt;h2&gt;T2 vs T3 (Simple View)&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;T2&lt;/th&gt;
&lt;th&gt;T3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU Handling&lt;/td&gt;
&lt;td&gt;Credit-based&lt;/td&gt;
&lt;td&gt;Smarter credits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Can drop&lt;/td&gt;
&lt;td&gt;More stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing&lt;/td&gt;
&lt;td&gt;Can spike&lt;/td&gt;
&lt;td&gt;More predictable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation&lt;/td&gt;
&lt;td&gt;Older&lt;/td&gt;
&lt;td&gt;Newer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;When You Should Avoid T2&lt;/h2&gt;

&lt;p&gt;From experience, avoid T2 if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your app has traffic spikes
&lt;/li&gt;
&lt;li&gt;You're running anything close to production
&lt;/li&gt;
&lt;li&gt;You don’t want unpredictable costs
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Quick Tip&lt;/h2&gt;

&lt;p&gt;If you're currently using T2, check this:&lt;/p&gt;

&lt;p&gt;Go to your billing dashboard → look for &lt;strong&gt;CPU credit charges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You might already be paying more than expected.&lt;/p&gt;

&lt;h2&gt;Final Thought&lt;/h2&gt;

&lt;p&gt;T2 isn’t “bad” — it’s just outdated for most real use cases.&lt;/p&gt;

&lt;p&gt;T3 is usually the safer and smarter choice now.&lt;/p&gt;

&lt;p&gt;🔗 Full article: &lt;a href="https://www.kubeblogs.com/why-t3-is-better-than-t2-for-most-aws-ec2-workloads/" rel="noopener noreferrer"&gt;https://www.kubeblogs.com/why-t3-is-better-than-t2-for-most-aws-ec2-workloads/&lt;/a&gt;&lt;br&gt;
If you're into simple, practical DevOps tips, follow along 👍&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
