<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vector</title>
    <description>The latest articles on DEV Community by Vector (@vctrcloudsec).</description>
    <link>https://dev.to/vctrcloudsec</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3824191%2F5ab2e93a-8ac4-4615-89e6-de0d8c4ab080.png</url>
      <title>DEV Community: Vector</title>
      <link>https://dev.to/vctrcloudsec</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vctrcloudsec"/>
    <language>en</language>
    <item>
      <title>Network egress is the cloud cost that people notice too late</title>
      <dc:creator>Vector</dc:creator>
      <pubDate>Sat, 14 Mar 2026 16:30:58 +0000</pubDate>
      <link>https://dev.to/vctrcloudsec/network-egress-is-the-cloud-cost-that-people-notice-too-late-klc</link>
      <guid>https://dev.to/vctrcloudsec/network-egress-is-the-cloud-cost-that-people-notice-too-late-klc</guid>
      <description>




&lt;h1&gt;
  
  
  Network egress is the cloud cost people notice too late
&lt;/h1&gt;

&lt;p&gt;Most cloud cost estimates start with compute.&lt;/p&gt;

&lt;p&gt;That makes sense. Compute is visible, familiar, and easy to discuss in architecture meetings.&lt;/p&gt;

&lt;p&gt;But plenty of painful cloud bills are not caused by the VM, the container, or the database instance. They are caused by the data moving between them, or out to the internet.&lt;/p&gt;

&lt;p&gt;Network egress is one of those costs that stays invisible until a service gets busy enough for the bill to become uncomfortable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The easy rule to remember
&lt;/h2&gt;

&lt;p&gt;In GCP, &lt;strong&gt;ingress is generally free&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The trouble starts when data leaves a boundary that Google charges for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;out to the internet&lt;/li&gt;
&lt;li&gt;across regions&lt;/li&gt;
&lt;li&gt;and, in some cases, across zones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why two architectures with the same compute footprint can end up with very different monthly costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is usually free
&lt;/h2&gt;

&lt;p&gt;The source guide gives a useful practical summary of traffic that is often free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;traffic coming into GCP from the internet&lt;/li&gt;
&lt;li&gt;same-zone traffic on internal IPs&lt;/li&gt;
&lt;li&gt;same-region traffic on internal IPs between most GCP services&lt;/li&gt;
&lt;li&gt;traffic to Google APIs such as &lt;code&gt;googleapis.com&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means a lot of same-region, privately routed communication can be kept cheap if you design for it deliberately.&lt;/p&gt;

&lt;h2&gt;
  
  
  What tends to cost money
&lt;/h2&gt;

&lt;p&gt;Again using the source guide as the reference point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;some cross-zone traffic in the same region can cost about &lt;code&gt;$0.01/GB&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;traffic between GCP regions can cost roughly &lt;code&gt;$0.01-0.08/GB&lt;/code&gt;, depending on the region pair&lt;/li&gt;
&lt;li&gt;internet egress from the Americas and Europe is around &lt;code&gt;$0.085/GB&lt;/code&gt; for the first TB per month&lt;/li&gt;
&lt;li&gt;internet egress from Asia-Pacific is higher, around &lt;code&gt;$0.12/GB&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those figures are approximate and the guide is clear that exact prices vary by region pair and can change over time. The point is not to memorise each number. The point is to remember that placement decisions have a direct cost.&lt;/p&gt;
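&lt;p&gt;As a sketch of how those tiers combine, here is a rough internet egress estimator. The over-one-TB rate is an illustrative assumption, not a published price: check current GCP pricing before using the output for anything real.&lt;/p&gt;

```python
# Rough internet egress estimate using the approximate rates quoted above.
# The tier boundary and both rates are illustrative assumptions, not a
# price sheet.

def internet_egress_cost(gb_per_month, first_tb_rate=0.085, over_tb_rate=0.065):
    """Estimate monthly internet egress cost in USD."""
    first_tier_gb = min(gb_per_month, 1024)        # first TB each month
    remaining_gb = max(gb_per_month - 1024, 0)     # everything after that
    return first_tier_gb * first_tb_rate + remaining_gb * over_tb_rate

# A service pushing 500 GB/month to the internet:
print(round(internet_egress_cost(500), 2))
```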

&lt;h2&gt;
  
  
  The most common expensive mistake
&lt;/h2&gt;

&lt;p&gt;The easiest way to create avoidable egress is to put related services in different regions.&lt;/p&gt;

&lt;p&gt;The example in the source guide is simple and realistic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application servers in &lt;code&gt;europe-west1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;primary Cloud SQL database in &lt;code&gt;us-central1&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every database query crossing that boundary creates inter-region egress. If the application talks to the database constantly, the cost piles up quickly.&lt;/p&gt;

&lt;p&gt;This is why "closer to users" is not the only placement question. You also need to ask what the service talks to all day.&lt;/p&gt;

&lt;p&gt;If two components communicate heavily, they usually belong in the same region unless you have a very good reason to split them.&lt;/p&gt;
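&lt;p&gt;A back-of-envelope version of that example makes the "cost piles up" claim concrete. Every number here is an illustrative assumption: the query rate, the payload size, and the per-GB rate.&lt;/p&gt;

```python
# Sketch of the cost of app servers in europe-west1 querying a Cloud SQL
# primary in us-central1. All inputs below are illustrative assumptions.

QUERIES_PER_SECOND = 200
AVG_RESPONSE_KB = 8            # bytes crossing the region boundary per query
INTER_REGION_RATE = 0.05       # $/GB, mid-range of the rough figures above

seconds_per_month = 30 * 24 * 3600
gb_per_month = QUERIES_PER_SECOND * seconds_per_month * AVG_RESPONSE_KB / (1024 ** 2)
monthly_cost = gb_per_month * INTER_REGION_RATE

print(round(gb_per_month), "GB/month, roughly $%.2f" % monthly_cost)
```

&lt;p&gt;Even with small payloads, constant chatter across a region boundary turns into thousands of GB per month.&lt;/p&gt;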

&lt;h2&gt;
  
  
  Storage and compute can quietly double the problem
&lt;/h2&gt;

&lt;p&gt;The same pattern shows up with data processing workloads.&lt;/p&gt;

&lt;p&gt;If you store a large dataset in Cloud Storage and process it from compute in a different region, you can pay on the way in and on the way out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data transferred from storage to compute&lt;/li&gt;
&lt;li&gt;results transferred back again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why the source guide recommends co-locating compute and storage for data-heavy workloads.&lt;/p&gt;

&lt;p&gt;It sounds obvious when written down. In practice, it is easy to miss because teams often choose storage location and compute location in separate conversations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Internal IPs are not a small detail
&lt;/h2&gt;

&lt;p&gt;One of the most useful low-effort recommendations in the source guide is to keep same-region service-to-service traffic on internal IPs whenever possible.&lt;/p&gt;

&lt;p&gt;If two services in the same region communicate over public addresses instead, traffic can leave and re-enter GCP, which is exactly the kind of path that can introduce charges you did not need to create.&lt;/p&gt;

&lt;p&gt;This is not just a networking cleanliness issue. It is a cost-control habit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud CDN is not just about speed
&lt;/h2&gt;

&lt;p&gt;People usually think of Cloud CDN as a performance tool first.&lt;/p&gt;

&lt;p&gt;It is also a cost tool.&lt;/p&gt;

&lt;p&gt;The guide points out that cached responses served from edge locations use CDN egress pricing, which is lower than regular internet egress pricing. If you are serving cacheable assets or responses with a good cache hit ratio, CDN can reduce both origin load and outbound transfer cost.&lt;/p&gt;

&lt;p&gt;That is especially relevant for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;static assets&lt;/li&gt;
&lt;li&gt;large downloadable files&lt;/li&gt;
&lt;li&gt;API responses that can actually be cached&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is not a fit for personalised or frequently changing responses, but when it fits, it changes the cost profile meaningfully.&lt;/p&gt;
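&lt;p&gt;The effect of the cache hit ratio on the blended per-GB cost is easy to sketch. Both rates below are placeholders: the only load-bearing idea is that hits are billed at the lower CDN rate and misses at the regular egress rate.&lt;/p&gt;

```python
# Blended per-GB egress cost as a function of cache hit ratio.
# cdn_rate and origin_rate are placeholder assumptions, not real prices.

def blended_egress_rate(hit_ratio, cdn_rate=0.04, origin_rate=0.085):
    return hit_ratio * cdn_rate + (1 - hit_ratio) * origin_rate

for hit_ratio in (0.0, 0.5, 0.9):
    print(hit_ratio, round(blended_egress_rate(hit_ratio), 4))
```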

&lt;h2&gt;
  
  
  A better way to think about egress in reviews
&lt;/h2&gt;

&lt;p&gt;Instead of asking "what does this service cost?", ask:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;where is the data coming from?&lt;/li&gt;
&lt;li&gt;where is it going?&lt;/li&gt;
&lt;li&gt;how often does that happen?&lt;/li&gt;
&lt;li&gt;is it crossing a region or the public internet?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That framing catches egress problems much earlier than staring at a pricing calculator after the design is already fixed.&lt;/p&gt;

&lt;h2&gt;
  
  
  One practical checklist
&lt;/h2&gt;

&lt;p&gt;When I want to sanity-check egress risk quickly, I use this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep compute and storage in the same region if they exchange a lot of data&lt;/li&gt;
&lt;li&gt;keep application services and databases in the same region if latency and cost both matter&lt;/li&gt;
&lt;li&gt;prefer internal IPs for same-region communication&lt;/li&gt;
&lt;li&gt;consider Cloud CDN for cacheable high-traffic content&lt;/li&gt;
&lt;li&gt;include egress explicitly in every cost estimate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters most. The source guide is blunt about it: many teams estimate compute and storage, then barely think about network transfer. That is how egress becomes the line item that surprises everyone later.&lt;/p&gt;

&lt;h2&gt;
  
  
  You still need visibility after launch
&lt;/h2&gt;

&lt;p&gt;Architecture choices are only part of the story. You also need a way to see what is actually happening.&lt;/p&gt;

&lt;p&gt;The source guide recommends two useful routes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;billing export analysis in BigQuery to find the SKUs driving transfer cost&lt;/li&gt;
&lt;li&gt;VPC Flow Logs to understand where traffic is actually going&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between "we think networking is expensive" and "we know which path is expensive".&lt;/p&gt;
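&lt;p&gt;For the billing export route, something like the following query shape surfaces the transfer-related SKUs. The dataset and table names are placeholders: real billing export tables include your billing account ID.&lt;/p&gt;

```python
# Builds the kind of SQL you might run against a Cloud Billing export
# table in BigQuery to find egress-related SKUs. The default table name
# is a placeholder assumption.

def top_network_skus_query(table="my_project.billing.gcp_billing_export_v1_XXXXXX"):
    return (
        "SELECT sku.description, ROUND(SUM(cost), 2) AS total_cost "
        "FROM `{table}` "
        "WHERE LOWER(sku.description) LIKE '%egress%' "
        "GROUP BY sku.description "
        "ORDER BY total_cost DESC "
        "LIMIT 20"
    ).format(table=table)

print(top_network_skus_query())
```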

&lt;h2&gt;
  
  
  The main takeaway
&lt;/h2&gt;

&lt;p&gt;Network egress is not an edge case. It is part of the architecture.&lt;/p&gt;

&lt;p&gt;If data leaves the region, leaves the platform, or takes the wrong network path, you pay for it. Good architecture reduces that spend long before finance asks where the bill came from.&lt;/p&gt;

&lt;p&gt;If you want the full breakdown, read the original &lt;strong&gt;&lt;a href="https://cloudwebschool.com/docs/gcp/cost-management/network-egress-costs/" rel="noopener noreferrer"&gt;Network Egress Costs Explained in GCP&lt;/a&gt;&lt;/strong&gt; guide.&lt;/p&gt;

&lt;p&gt;If you are estimating Cloud Run workloads as part of that architecture, the &lt;strong&gt;&lt;a href="https://cloudwebschool.com/tools/cloud-run-cost-calculator/" rel="noopener noreferrer"&gt;Cloud Run Cost Calculator&lt;/a&gt;&lt;/strong&gt; is useful because it includes egress in the estimate rather than treating compute as the whole bill.&lt;/p&gt;

</description>
      <category>network</category>
      <category>cloud</category>
      <category>devops</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>How I would estimate GCP costs before building anything</title>
      <dc:creator>Vector</dc:creator>
      <pubDate>Sat, 14 Mar 2026 16:22:15 +0000</pubDate>
      <link>https://dev.to/vctrcloudsec/how-i-would-estimate-gcp-costs-before-building-anything-578l</link>
      <guid>https://dev.to/vctrcloudsec/how-i-would-estimate-gcp-costs-before-building-anything-578l</guid>
      <description>




&lt;h1&gt;
  
  
  How I would estimate GCP costs before building anything
&lt;/h1&gt;

&lt;p&gt;Most bad cloud cost surprises do not come from price changes.&lt;/p&gt;

&lt;p&gt;They come from weak estimates.&lt;/p&gt;

&lt;p&gt;Someone prices a VM, ignores storage and networking, assumes the free tier will carry more than it really will, and only discovers the gaps after the system is already live.&lt;/p&gt;

&lt;p&gt;If I had to estimate a new GCP workload before any code was in production, I would keep it much simpler than most people do.&lt;/p&gt;

&lt;h2&gt;
  
  
  First, list the services before you touch a calculator
&lt;/h2&gt;

&lt;p&gt;The GCP Pricing Calculator is useful, but it only works well if you already know what you are trying to price.&lt;/p&gt;

&lt;p&gt;The source guide makes the right point here: identify every service in the architecture first, then estimate the usage dimensions for each one.&lt;/p&gt;

&lt;p&gt;Typical examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compute Engine&lt;/strong&gt;: machine type, region, hours per month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Run&lt;/strong&gt;: requests, average duration, CPU, memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Storage&lt;/strong&gt;: stored data, operations, egress&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BigQuery&lt;/strong&gt;: bytes processed and storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud SQL&lt;/strong&gt;: instance size, storage, HA setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This step sounds boring, but it is where most underestimates begin. If a service exists in the architecture but not in the model, it is not really an estimate yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Then separate fixed-ish costs from usage-driven costs
&lt;/h2&gt;

&lt;p&gt;This is the fastest way to make the estimate understandable.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a VM running all month looks relatively fixed&lt;/li&gt;
&lt;li&gt;Cloud Run is usage-driven&lt;/li&gt;
&lt;li&gt;storage can be partly fixed and partly growth-driven&lt;/li&gt;
&lt;li&gt;egress can change dramatically with traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you split costs that way, it becomes much easier to see what deserves the most attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical Compute Engine estimate
&lt;/h2&gt;

&lt;p&gt;The source guide gives a straightforward example for Compute Engine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;n2-standard-4&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;us-central1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;about &lt;code&gt;$0.19/hour&lt;/code&gt; on demand&lt;/li&gt;
&lt;li&gt;roughly &lt;code&gt;$138/month&lt;/code&gt; if it runs all day, every day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also notes that a one-year committed use discount can reduce that to around &lt;code&gt;$85/month&lt;/code&gt;.&lt;/p&gt;
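&lt;p&gt;The arithmetic behind those figures is worth writing down once, using 730 as the average number of hours in a month. The committed use discount percentage is an assumption fitted to the guide's rounded numbers:&lt;/p&gt;

```python
# The Compute Engine figures above, reproduced from the quoted rates.
# The ~37% one-year CUD is an assumption fitted to the guide's rounding.

ON_DEMAND_RATE = 0.19       # $/hour for n2-standard-4 in us-central1 (approx.)
HOURS_PER_MONTH = 730

on_demand_monthly = ON_DEMAND_RATE * HOURS_PER_MONTH
cud_monthly = on_demand_monthly * (1 - 0.37)

print(round(on_demand_monthly, 2), round(cud_monthly, 2))
```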

&lt;p&gt;That is already enough to ask the next useful question:&lt;/p&gt;

&lt;p&gt;"Does this service actually need a continuously running VM?"&lt;/p&gt;

&lt;p&gt;If the answer is no, that is not just a cost detail. It may point to a better compute model entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical Cloud Run estimate
&lt;/h2&gt;

&lt;p&gt;Cloud Run estimates are easy to get wrong if you only think in requests.&lt;/p&gt;

&lt;p&gt;The source guide uses this manual example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Monthly requests:     10,000,000
Average duration:     200ms
Memory allocated:     512 MB
CPU allocated:        1 vCPU

Request cost:         10M × $0.40/M = $4.00
CPU cost:             10M × 0.2s × 1 vCPU × $0.000024/vCPU-s = $48.00
Memory cost:          10M × 0.2s × 0.5 GB × $0.0000025/GB-s = $2.50

Estimated total:      ~$54.50/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you subtract the free tier where it applies.&lt;/p&gt;

&lt;p&gt;The important lesson is not the exact total. It is that CPU time can dominate the bill. If the average request duration comes down, cost often follows.&lt;/p&gt;
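&lt;p&gt;The manual calculation above translates into a small reusable function. The per-unit rates are the ones quoted in the example and may change over time; the free-tier deduction is left out deliberately so each term stays visible:&lt;/p&gt;

```python
# The worked Cloud Run example as a function, using the example's rates.
# Rates change; treat these defaults as a snapshot, not a price list.

def cloud_run_monthly_cost(requests, avg_seconds, vcpu, memory_gb,
                           request_rate=0.40e-6,    # $ per request
                           cpu_rate=0.000024,       # $ per vCPU-second
                           mem_rate=0.0000025):     # $ per GB-second
    request_cost = requests * request_rate
    cpu_cost = requests * avg_seconds * vcpu * cpu_rate
    mem_cost = requests * avg_seconds * memory_gb * mem_rate
    return request_cost + cpu_cost + mem_cost

# The example: 10M requests, 200 ms average, 1 vCPU, 512 MB
total = cloud_run_monthly_cost(10_000_000, 0.2, 1, 0.5)
print(round(total, 2))   # before any free-tier deduction
```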

&lt;p&gt;For this kind of workload, I would not do the maths manually more than once. I would use the &lt;strong&gt;&lt;a href="https://cloudwebschool.com/tools/cloud-run-cost-calculator/" rel="noopener noreferrer"&gt;Cloud Run Cost Calculator&lt;/a&gt;&lt;/strong&gt; to test a few traffic and configuration scenarios quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do not estimate storage as "basically cheap"
&lt;/h2&gt;

&lt;p&gt;That shortcut causes trouble all the time.&lt;/p&gt;

&lt;p&gt;The source guide breaks Cloud Storage into three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data stored&lt;/li&gt;
&lt;li&gt;operations&lt;/li&gt;
&lt;li&gt;egress&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the right model. Stored data might dominate for big datasets, but network transfer can still become a large part of the bill if users or downstream systems pull a lot of data out.&lt;/p&gt;

&lt;p&gt;The guide also gives a blunt reminder on egress: a service delivering &lt;code&gt;100 TB/month&lt;/code&gt; to internet users could see around &lt;code&gt;$8,000/month&lt;/code&gt; in egress alone.&lt;/p&gt;

&lt;p&gt;That one line is enough to justify putting networking into the estimate properly rather than treating it as an afterthought.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build one simple spreadsheet, not ten perfect ones
&lt;/h2&gt;

&lt;p&gt;The source guide recommends complementing the Pricing Calculator with a cost model spreadsheet, and I think that is the right move for anything non-trivial.&lt;/p&gt;

&lt;p&gt;The point of the spreadsheet is not to replace the calculator. It is to answer questions the calculator does not answer very well on its own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what happens at &lt;code&gt;1x&lt;/code&gt;, &lt;code&gt;5x&lt;/code&gt;, and &lt;code&gt;10x&lt;/code&gt; traffic?&lt;/li&gt;
&lt;li&gt;what is the cost per request, user, or GB processed?&lt;/li&gt;
&lt;li&gt;which three line items matter most?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That kind of model is where the estimate becomes useful for actual decisions.&lt;/p&gt;

&lt;p&gt;A minimal structure is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Service          | Unit        | Volume/Month | Unit Cost    | Monthly Cost
Compute Engine   | hours       | 720          | $0.19/hr     | $136.80
Cloud SQL        | hours       | 720          | $0.12/hr     | $86.40
Cloud Storage    | GB-month    | 1,000        | $0.020/GB    | $20.00
BigQuery queries | TB scanned  | 10           | $5.00/TB     | $50.00
Network egress   | GB          | 500          | $0.08/GB     | $40.00
                                               TOTAL:         $333.20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not fancy, but it gives you something much more valuable than a pretty screenshot: a model you can update when assumptions change.&lt;/p&gt;
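&lt;p&gt;The same table as code makes the 1x/5x/10x question a one-line loop. Which line items actually scale with traffic is itself an assumption worth revisiting: here compute and the database are treated as fixed and the rest as usage-driven.&lt;/p&gt;

```python
# The spreadsheet above as data, with a per-item flag for whether the
# line scales with traffic. The fixed/usage split is an assumption.

line_items = [
    # (name, monthly cost at 1x, scales with traffic?)
    ("Compute Engine", 136.80, False),
    ("Cloud SQL", 86.40, False),
    ("Cloud Storage", 20.00, True),
    ("BigQuery queries", 50.00, True),
    ("Network egress", 40.00, True),
]

for multiplier in (1, 5, 10):
    total = sum(cost * (multiplier if scales else 1)
                for _, cost, scales in line_items)
    print("%3dx traffic: $%.2f/month" % (multiplier, total))
```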

&lt;h2&gt;
  
  
  The mistakes worth avoiding
&lt;/h2&gt;

&lt;p&gt;The source guide calls out four beginner errors that are worth repeating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;estimating only compute and forgetting storage and networking&lt;/li&gt;
&lt;li&gt;not including a growth factor&lt;/li&gt;
&lt;li&gt;assuming free tier coverage will still matter once the service grows&lt;/li&gt;
&lt;li&gt;never comparing estimate versus actual spend after launch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I had to pick the biggest one, it would be the first. Teams love pricing the obvious compute layer and then acting surprised by everything around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  My rule for pre-launch estimates
&lt;/h2&gt;

&lt;p&gt;Before launch, I would want three numbers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;a realistic starting estimate&lt;/li&gt;
&lt;li&gt;a &lt;code&gt;3x&lt;/code&gt; growth scenario&lt;/li&gt;
&lt;li&gt;a &lt;code&gt;10x&lt;/code&gt; growth scenario&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the architecture only works financially at the smallest version of the traffic model, the estimate has already done its job by exposing that weakness early.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Good cloud cost estimation is not about pretending you know the future perfectly.&lt;/p&gt;

&lt;p&gt;It is about understanding the structure of the bill well enough that growth does not surprise you for obvious reasons.&lt;/p&gt;

&lt;p&gt;If you want the longer version, read the original &lt;strong&gt;&lt;a href="https://cloudwebschool.com/docs/gcp/cost-management/estimating-cloud-costs/" rel="noopener noreferrer"&gt;How to Estimate Cloud Costs in GCP&lt;/a&gt;&lt;/strong&gt; guide.&lt;/p&gt;

&lt;p&gt;If the workload is Cloud Run based, use the &lt;strong&gt;&lt;a href="https://cloudwebschool.com/tools/cloud-run-cost-calculator/" rel="noopener noreferrer"&gt;Cloud Run Cost Calculator&lt;/a&gt;&lt;/strong&gt; to model the request, CPU, memory, and free-tier side of the estimate before you commit to an architecture.&lt;/p&gt;

</description>
      <category>gcp</category>
      <category>cloud</category>
      <category>beginners</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Cloud Run scaling is simple until it isn't: the settings that actually matter</title>
      <dc:creator>Vector</dc:creator>
      <pubDate>Sat, 14 Mar 2026 16:20:35 +0000</pubDate>
      <link>https://dev.to/vctrcloudsec/cloud-run-scaling-is-simple-until-it-isnt-the-settings-that-actually-matter-3970</link>
      <guid>https://dev.to/vctrcloudsec/cloud-run-scaling-is-simple-until-it-isnt-the-settings-that-actually-matter-3970</guid>
      <description>




&lt;h1&gt;
  
  
  Cloud Run scaling is simple until it isn't: the settings that actually matter
&lt;/h1&gt;

&lt;p&gt;Cloud Run scaling looks wonderfully hands-off right up until a real workload lands on it.&lt;/p&gt;

&lt;p&gt;Then the questions start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why did the first request feel slow?&lt;/li&gt;
&lt;li&gt;why did the service spin up so many instances?&lt;/li&gt;
&lt;li&gt;why is the database suddenly unhappy?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The good news is that Cloud Run scaling is not difficult once you focus on the few settings that actually shape behaviour: minimum instances, maximum instances, and concurrency.&lt;/p&gt;

&lt;p&gt;If you understand those three, you can avoid most of the beginner mistakes without turning a simple service into a tuning project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the mental model
&lt;/h2&gt;

&lt;p&gt;Cloud Run scales based on concurrent requests against the instances it already has available.&lt;/p&gt;

&lt;p&gt;When the number of in-flight requests per instance approaches the configured concurrency limit, Cloud Run starts more instances. When traffic drops, idle instances are stopped after a cooldown period. If minimum instances is set to &lt;code&gt;0&lt;/code&gt;, the service eventually scales to zero.&lt;/p&gt;

&lt;p&gt;That is the whole model in plain English:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more concurrent requests than current capacity means more instances&lt;/li&gt;
&lt;li&gt;less traffic means fewer instances&lt;/li&gt;
&lt;li&gt;no traffic long enough means zero instances if you allow scale-to-zero&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The default behaviour is usually fine. The problems come from not matching the defaults to the service you are actually running.&lt;/p&gt;
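&lt;p&gt;That model reduces to one line of arithmetic: instances needed is concurrent requests divided by per-instance concurrency, rounded up. A sketch:&lt;/p&gt;

```python
# The core of the Cloud Run scaling model: instances follow in-flight
# requests divided by the configured concurrency, rounded up.

import math

def instances_needed(concurrent_requests, concurrency=80):
    if concurrent_requests == 0:
        return 0                    # scale-to-zero, when min-instances is 0
    return math.ceil(concurrent_requests / concurrency)

for load in (0, 50, 80, 400, 2000):
    print(load, instances_needed(load))
```

&lt;p&gt;This is also why lowering concurrency multiplies instance count: the same load at a concurrency of 1 needs eighty times as many instances as the default.&lt;/p&gt;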

&lt;h2&gt;
  
  
  Cold starts are real, but they are not always a problem
&lt;/h2&gt;

&lt;p&gt;A cold start happens when Cloud Run needs to start a new container and there is no warm instance ready to take the request.&lt;/p&gt;

&lt;p&gt;In the source guide, the typical added latency is around &lt;code&gt;200 ms&lt;/code&gt; to &lt;code&gt;2 seconds&lt;/code&gt;, depending on image size and startup time.&lt;/p&gt;

&lt;p&gt;That sounds bad until you ask the right question: who notices?&lt;/p&gt;

&lt;p&gt;For internal automation, webhook receivers, and background triggers, an occasional cold start is often acceptable. For user-facing APIs and web services, it can be very noticeable.&lt;/p&gt;

&lt;p&gt;That is why the first real scaling decision is not "how do I eliminate cold starts everywhere?" but "does this service need a warm instance all the time?"&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use minimum instances
&lt;/h2&gt;

&lt;p&gt;If the service is user-facing, setting &lt;code&gt;--min-instances=1&lt;/code&gt; is often the cleanest fix.&lt;/p&gt;

&lt;p&gt;That keeps one instance warm and ready, which makes response times more consistent after quiet periods. The source guide also notes that keeping one warm instance is usually affordable for most services, typically only a few dollars per month at standard memory allocations.&lt;/p&gt;

&lt;p&gt;If the service is not user-facing, scale-to-zero is usually the better trade-off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;zero idle cost&lt;/li&gt;
&lt;li&gt;simpler defaults&lt;/li&gt;
&lt;li&gt;no warm capacity you are paying for unnecessarily&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is also a middle ground people forget about: if you need stronger rollout resilience, two or more minimum instances can make sense so one instance is not carrying everything during a deployment transition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why maximum instances matters more than people think
&lt;/h2&gt;

&lt;p&gt;Beginners often spend time worrying about cold starts and ignore the setting that protects everything behind the service.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--max-instances&lt;/code&gt; is not just a scaling knob. It is a safety limit.&lt;/p&gt;

&lt;p&gt;If Cloud Run is free to create lots of instances under load, every one of those instances may try to talk to the same database, queue, or downstream API. That is where trouble starts.&lt;/p&gt;

&lt;p&gt;The source guide makes this point clearly: set the maximum based on downstream capacity, especially database connection limits, not just your hoped-for traffic peak.&lt;/p&gt;

&lt;p&gt;If you hit the maximum and all instances are full, new requests are queued or can return HTTP &lt;code&gt;429&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That is not ideal, but it is still often better than letting the service overwhelm a dependency it cannot safely scale with.&lt;/p&gt;
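&lt;p&gt;Working backwards from downstream capacity can be done on paper. The connection numbers below are illustrative assumptions: substitute your database's real connection limit and the pool size each container opens.&lt;/p&gt;

```python
# Deriving a --max-instances ceiling from database capacity rather than
# from hoped-for traffic. All three inputs are illustrative assumptions.

DB_MAX_CONNECTIONS = 400        # e.g. the Cloud SQL tier's limit
RESERVED_FOR_ADMIN = 20         # headroom for migrations and monitoring
CONNECTIONS_PER_INSTANCE = 4    # pool size inside each container

usable = DB_MAX_CONNECTIONS - RESERVED_FOR_ADMIN
max_instances = usable // CONNECTIONS_PER_INSTANCE

print(max_instances)            # a defensible --max-instances value
```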

&lt;h2&gt;
  
  
  Most services should not set concurrency to 1
&lt;/h2&gt;

&lt;p&gt;This is probably the easiest Cloud Run mistake to make.&lt;/p&gt;

&lt;p&gt;People see concurrency and think, "one request per instance sounds safer". Sometimes it is. Often it is just more expensive and less efficient.&lt;/p&gt;

&lt;p&gt;Cloud Run defaults to a concurrency of &lt;code&gt;80&lt;/code&gt;. That means one instance can handle up to eighty simultaneous requests.&lt;/p&gt;

&lt;p&gt;Lowering concurrency can make sense for CPU-heavy workloads where each request needs a lot of processor time. But for many I/O-bound services, reducing concurrency to &lt;code&gt;1&lt;/code&gt; just creates more instances, more cold starts, and more pressure on downstream systems.&lt;/p&gt;

&lt;p&gt;If you do not have a clear reason to lower it, the default is usually the right place to stay.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical starting point
&lt;/h2&gt;

&lt;p&gt;For a normal user-facing API, this is a sensible first pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy my-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;IMAGE &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--min-instances&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-instances&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--concurrency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not the right configuration for every service, but it is a good example of reasonable defaults:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one warm instance for consistent latency&lt;/li&gt;
&lt;li&gt;a maximum instance cap so the service does not grow without bounds&lt;/li&gt;
&lt;li&gt;default concurrency unless you have measured evidence to change it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For an internal endpoint or background trigger, I would be much more willing to leave minimum instances at &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The simplest ways to reduce cold start pain
&lt;/h2&gt;

&lt;p&gt;If you do care about cold start time, there are four levers in the source guide worth paying attention to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use a smaller base image&lt;/li&gt;
&lt;li&gt;minimise startup logic&lt;/li&gt;
&lt;li&gt;keep a minimum instance warm&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;--cpu-boost&lt;/code&gt; to speed up startup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is useful because the service gets extra CPU during startup, which helps it become ready more quickly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services update my-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cpu-boost&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main point is to fix startup properly before you try to compensate for a slow application with lots of always-warm capacity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do not pay for always-allocated CPU unless you need it
&lt;/h2&gt;

&lt;p&gt;Cloud Run has two CPU allocation modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU during requests only&lt;/li&gt;
&lt;li&gt;CPU always allocated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The default request-only mode is what most HTTP services want. CPU is billed while requests are being handled, and idle instances with minimum instances configured only incur a reduced memory cost.&lt;/p&gt;

&lt;p&gt;Always-allocated CPU is for cases where the container needs CPU even between requests. If you do not have that kind of workload, it is an easy way to spend more than necessary.&lt;/p&gt;

&lt;p&gt;That is one reason scaling and cost are tied together more closely than people expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real rule: tune for workload, not for ideology
&lt;/h2&gt;

&lt;p&gt;The strongest advice in the original guide is also the least glamorous: choose scaling settings based on who is calling the service and what is behind it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user-facing service: keep one warm instance&lt;/li&gt;
&lt;li&gt;internal trigger endpoint: scale to zero&lt;/li&gt;
&lt;li&gt;fragile downstream database: cap the maximum instance count&lt;/li&gt;
&lt;li&gt;CPU-bound workload: test lower concurrency carefully&lt;/li&gt;
&lt;li&gt;normal web service: do not rush to override the default concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the kind of tuning that actually helps.&lt;/p&gt;

&lt;p&gt;If you want the fuller walkthrough, read the original &lt;strong&gt;&lt;a href="https://cloudwebschool.com/docs/gcp/compute/cloud-run-scaling-behaviour/" rel="noopener noreferrer"&gt;Cloud Run scaling behaviour guide&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you also want to see how scaling choices affect spend, the &lt;strong&gt;&lt;a href="https://cloudwebschool.com/tools/cloud-run-cost-calculator/" rel="noopener noreferrer"&gt;Cloud Run Cost Calculator&lt;/a&gt;&lt;/strong&gt; is the easiest way to model the difference between scaling to zero and keeping one or more warm instances.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>cloudnative</category>
      <category>devops</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Cloud Run vs GKE vs VMs: how to choose the right GCP compute option</title>
      <dc:creator>Vector</dc:creator>
      <pubDate>Sat, 14 Mar 2026 16:14:27 +0000</pubDate>
      <link>https://dev.to/vctrcloudsec/cloud-run-vs-gke-vs-vms-how-to-choose-the-right-gcp-compute-option-3ld0</link>
      <guid>https://dev.to/vctrcloudsec/cloud-run-vs-gke-vs-vms-how-to-choose-the-right-gcp-compute-option-3ld0</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://cloudwebschool.com/docs/gcp/compute/choosing-between-cloud-run-gke-and-vms/" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;cloudwebschool.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;







&lt;h1&gt;
  
  
  Cloud Run vs GKE vs VMs: how to choose the right GCP compute option
&lt;/h1&gt;

&lt;p&gt;Most teams do not need more compute options. They need a sane default.&lt;/p&gt;

&lt;p&gt;On Google Cloud, the trap is usually the same: a workload gets containerised, somebody says "we should use Kubernetes", and the team quietly signs up for more operational complexity than the service actually needs.&lt;/p&gt;

&lt;p&gt;Here is the simpler way to think about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;start with &lt;strong&gt;Cloud Run&lt;/strong&gt; for new stateless HTTP or gRPC services&lt;/li&gt;
&lt;li&gt;move to &lt;strong&gt;GKE&lt;/strong&gt; when you need Kubernetes features Cloud Run does not give you&lt;/li&gt;
&lt;li&gt;use &lt;strong&gt;Compute Engine VMs&lt;/strong&gt; when the workload cannot sensibly live in either of those models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That framing will save you time, money, and a fair amount of unnecessary platform work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Ops overhead&lt;/th&gt;
&lt;th&gt;Scales to zero&lt;/th&gt;
&lt;th&gt;Stateful workloads&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Run&lt;/td&gt;
&lt;td&gt;Stateless HTTP/gRPC services&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GKE&lt;/td&gt;
&lt;td&gt;Kubernetes workloads needing more control&lt;/td&gt;
&lt;td&gt;Medium to high&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compute Engine VMs&lt;/td&gt;
&lt;td&gt;Legacy apps, custom OS needs, hardware-specific workloads&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you are building a normal API or internal service and it is stateless, Cloud Run is usually the right place to begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Cloud Run is the right answer
&lt;/h2&gt;

&lt;p&gt;Cloud Run is a strong default for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;public or internal APIs with variable traffic&lt;/li&gt;
&lt;li&gt;microservices with bursts of usage and long idle periods&lt;/li&gt;
&lt;li&gt;containerised services that only need HTTP or gRPC&lt;/li&gt;
&lt;li&gt;teams that want container deployment without running Kubernetes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main reason is not just convenience. It is fit.&lt;/p&gt;

&lt;p&gt;Cloud Run gives you container-based deployment with very little platform overhead, and it scales to zero when nothing is hitting the service. That makes it a good match for new services where you want to ship fast and avoid paying for idle compute.&lt;/p&gt;

&lt;p&gt;If the workload is stateless and request-driven, start here unless you have a specific reason not to.&lt;/p&gt;

&lt;h2&gt;
  
  
  When GKE starts to make sense
&lt;/h2&gt;

&lt;p&gt;GKE becomes the better fit when the workload genuinely needs Kubernetes behaviour rather than just containers.&lt;/p&gt;

&lt;p&gt;That usually means things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-container pods&lt;/li&gt;
&lt;li&gt;sidecar patterns&lt;/li&gt;
&lt;li&gt;PersistentVolumes for stateful services&lt;/li&gt;
&lt;li&gt;service mesh requirements&lt;/li&gt;
&lt;li&gt;existing Kubernetes manifests and operating knowledge&lt;/li&gt;
&lt;li&gt;GPU node pools or more advanced node-level control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important thing is to be honest about why you are choosing it.&lt;/p&gt;

&lt;p&gt;GKE is powerful, but it comes with more to run: cluster upgrades, node pool sizing, Kubernetes networking, and Kubernetes security. That is a fair trade if the workload needs those capabilities. It is not a fair trade for a simple stateless web service that would run happily on Cloud Run.&lt;/p&gt;

&lt;h2&gt;
  
  
  When a VM is still the right tool
&lt;/h2&gt;

&lt;p&gt;Compute Engine still matters.&lt;/p&gt;

&lt;p&gt;A VM is the right choice when the workload does not fit neatly into the Cloud Run or GKE model, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;legacy applications that cannot be containerised cleanly&lt;/li&gt;
&lt;li&gt;software with specific OS or kernel requirements&lt;/li&gt;
&lt;li&gt;Windows Server workloads&lt;/li&gt;
&lt;li&gt;applications that need direct hardware access&lt;/li&gt;
&lt;li&gt;lift-and-shift migrations that are not ready to be redesigned yet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sometimes the right answer is not "modernise everything first". Sometimes it is "run it on a VM because that is the practical option right now".&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple decision flow
&lt;/h2&gt;

&lt;p&gt;When I need to make this call quickly, I use four questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Can the workload be containerised?&lt;/li&gt;
&lt;li&gt;Is it stateless and driven by HTTP or gRPC?&lt;/li&gt;
&lt;li&gt;Does it need Kubernetes features such as sidecars, PersistentVolumes, or service mesh?&lt;/li&gt;
&lt;li&gt;Does it need specific OS, kernel, or hardware control?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That usually leads to a clean result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;if it cannot be containerised, use a VM&lt;/li&gt;
&lt;li&gt;if it is stateless and HTTP/gRPC, start with Cloud Run&lt;/li&gt;
&lt;li&gt;if it needs Kubernetes features, use GKE&lt;/li&gt;
&lt;li&gt;if it needs low-level machine control, use a VM&lt;/li&gt;
&lt;/ul&gt;
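&lt;p&gt;The flow above is simple enough to write down. The ordering of the checks is my reading of the list, not something the original guide fixes:&lt;/p&gt;

```python
# The four-question decision flow, sketched as a function.
def choose_compute(containerisable, stateless_http, needs_k8s_features,
                   needs_machine_control):
    """Map the four questions to a GCP compute option."""
    if not containerisable or needs_machine_control:
        return "Compute Engine VM"
    if needs_k8s_features:
        return "GKE"
    # Stateless HTTP/gRPC, or still unsure: Cloud Run is the safest default.
    return "Cloud Run"

print(choose_compute(True, True, False, False))    # Cloud Run
print(choose_compute(True, False, True, False))    # GKE
print(choose_compute(False, False, False, False))  # Compute Engine VM
```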

&lt;p&gt;If you are still unsure, Cloud Run is the safest default for a new stateless service. You can always move up to GKE or sideways to VMs later when you hit a real limitation.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical example
&lt;/h2&gt;

&lt;p&gt;Take a simple internal API handling about &lt;code&gt;100,000&lt;/code&gt; requests per day, with each request taking roughly &lt;code&gt;100 ms&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;From the source guide, the trade-off looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Run&lt;/strong&gt; is billed only during request handling and can end up costing only a few dollars per month for a workload at this level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GKE Autopilot&lt;/strong&gt; is billed per pod resource request, and a two-replica deployment running all day to avoid cold starts will usually cost more than Cloud Run at low traffic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute Engine&lt;/strong&gt; on an &lt;code&gt;e2-medium&lt;/code&gt; is billed continuously at roughly &lt;code&gt;$25-35/month&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
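&lt;p&gt;You can sanity-check the Cloud Run side of that comparison with Cloud Run's published request-based rates at the time of writing (verify them against the current pricing page before relying on them). The 512 MB memory figure is my assumption; the example does not fix one:&lt;/p&gt;

```python
# Rough Cloud Run bill for the example: about 100,000 requests per day
# at roughly 100 ms each. Rates are Cloud Run's request-based prices at
# the time of writing; 512 MB memory is an assumption.

requests = 100_000 * 30            # about 3M requests per month
duration_s = 0.1
vcpus, memory_gb = 1, 0.5

vcpu_seconds = requests * duration_s * vcpus
gb_seconds = requests * duration_s * memory_gb

# Free tier: 2M requests, 360k vCPU-seconds, 180k GB-seconds per month.
cpu_cost = max(0, vcpu_seconds - 360_000) * 0.000024
mem_cost = max(0, gb_seconds - 180_000) * 0.0000025
req_cost = max(0, requests - 2_000_000) / 1_000_000 * 0.40

print(round(cpu_cost + mem_cost + req_cost, 2))   # 0.4
```

&lt;p&gt;With these assumptions the free tier absorbs all of the CPU and memory time, leaving only about forty cents of request charges, comfortably inside the "few dollars per month" the guide describes and well under an always-on &lt;code&gt;e2-medium&lt;/code&gt;.&lt;/p&gt;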

&lt;p&gt;That does not mean Cloud Run is always cheapest forever. The original guide makes the opposite point as well: at very high sustained traffic, the per-request model of Cloud Run can become less attractive than a well-sized VM.&lt;/p&gt;

&lt;p&gt;But for low-traffic and variable-traffic APIs, Cloud Run is usually hard to beat on both cost and operational simplicity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mistakes I see most often
&lt;/h2&gt;

&lt;p&gt;The biggest mistake is defaulting to GKE for every containerised workload.&lt;/p&gt;

&lt;p&gt;Containers are not the same thing as Kubernetes requirements. A lot of teams pick GKE because it feels like the "serious" platform choice, when what they actually need is a straightforward way to run a stateless service.&lt;/p&gt;

&lt;p&gt;The second mistake is keeping simple APIs on VMs out of habit. If the service is stateless and containerisable, a VM often means more patching, more idle cost, and more infrastructure work than necessary.&lt;/p&gt;

&lt;p&gt;The third mistake is treating the first decision as permanent. Workloads change. A service that starts well on Cloud Run may eventually need GKE features. A VM-hosted workload may later become container-friendly. Revisit the choice when the requirements change.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would recommend
&lt;/h2&gt;

&lt;p&gt;If you are choosing for a new service today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pick &lt;strong&gt;Cloud Run&lt;/strong&gt; for stateless request-driven workloads&lt;/li&gt;
&lt;li&gt;pick &lt;strong&gt;GKE&lt;/strong&gt; only when you can point to a Kubernetes-specific need&lt;/li&gt;
&lt;li&gt;pick &lt;strong&gt;Compute Engine&lt;/strong&gt; when the application needs full machine-level control or cannot be containerised sensibly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most real architectures end up using all three somewhere. The goal is not to choose one platform for everything. The goal is to use the simplest option that still fits the workload properly.&lt;/p&gt;

&lt;p&gt;If you want the fuller breakdown, read the original &lt;strong&gt;&lt;a href="https://cloudwebschool.com/docs/gcp/compute/choosing-between-cloud-run-gke-and-vms/" rel="noopener noreferrer"&gt;Cloud Run vs GKE vs VMs guide&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If cost is part of the decision, it is worth checking the &lt;strong&gt;&lt;a href="https://cloudwebschool.com/tools/cloud-run-cost-calculator/" rel="noopener noreferrer"&gt;Cloud Run Cost Calculator&lt;/a&gt;&lt;/strong&gt; before you compare Cloud Run against a fixed VM or GKE setup.&lt;/p&gt;

</description>
      <category>gke</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to estimate your Cloud Run bill without guessing</title>
      <dc:creator>Vector</dc:creator>
      <pubDate>Sat, 14 Mar 2026 16:06:35 +0000</pubDate>
      <link>https://dev.to/vctrcloudsec/how-to-estimate-your-cloud-run-bill-without-guessing-ba0</link>
      <guid>https://dev.to/vctrcloudsec/how-to-estimate-your-cloud-run-bill-without-guessing-ba0</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://cloudwebschool.com/tools/cloud-run-cost-calculator/" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;cloudwebschool.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;Cloud Run pricing looks simple until someone asks a very normal question:&lt;/p&gt;

&lt;p&gt;"How much is this service actually going to cost me each month?"&lt;/p&gt;

&lt;p&gt;A lot of people jump straight to request count. In practice, that is often not the part that matters most. For many workloads, CPU time is the real cost driver.&lt;/p&gt;

&lt;p&gt;The good news is that you do not need a perfect spreadsheet to get a useful estimate. If you know a few inputs, you can get close enough to make better decisions before you deploy.&lt;/p&gt;


&lt;h2&gt;
  
  
  The four numbers that matter
&lt;/h2&gt;

&lt;p&gt;For a basic Cloud Run estimate, you mostly care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;monthly requests
&lt;/li&gt;
&lt;li&gt;average request duration
&lt;/li&gt;
&lt;li&gt;CPU allocated per instance
&lt;/li&gt;
&lt;li&gt;memory allocated per instance
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you expect outbound traffic, add &lt;strong&gt;egress&lt;/strong&gt; as well.&lt;/p&gt;

&lt;p&gt;That is the core of it. Cloud Run pricing is granular, but it is not random.&lt;/p&gt;


&lt;h2&gt;
  
  
  The basic model
&lt;/h2&gt;

&lt;p&gt;A simple estimate comes down to two usage calculations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vCPU-seconds = monthly requests * (duration_ms / 1000) * vCPUs
GB-seconds   = monthly requests * (duration_ms / 1000) * (memory_MB / 1024)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;From there, Cloud Run adds request charges and networking egress.&lt;/p&gt;

&lt;p&gt;The current calculator on CloudWebSchool uses these published pricing constants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU:&lt;/strong&gt; $0.00002400 per vCPU-second
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory:&lt;/strong&gt; $0.00000250 per GB-second
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Requests:&lt;/strong&gt; $0.40 per million
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Egress:&lt;/strong&gt; $0.12 per GB
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also applies the free tier if you want a more realistic estimate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2,000,000 free requests per month
&lt;/li&gt;
&lt;li&gt;360,000 free vCPU-seconds per month
&lt;/li&gt;
&lt;li&gt;180,000 free GB-seconds per month
&lt;/li&gt;
&lt;li&gt;1 free GB of egress per month
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One detail people miss: the free tier is &lt;strong&gt;per billing account per month&lt;/strong&gt;, not per service.&lt;/p&gt;
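&lt;p&gt;Put together, the whole model fits in a few lines. This is a sketch of the same arithmetic the calculator performs, using the constants and free-tier figures above; always verify them against the current pricing page:&lt;/p&gt;

```python
# Cloud Run cost model: usage times rate, with the free tier subtracted
# first. Rates and free-tier figures are the published constants quoted
# in this post; check current pricing before relying on them.
CPU_RATE = 0.000024       # USD per vCPU-second
MEM_RATE = 0.0000025      # USD per GB-second
REQ_RATE = 0.40           # USD per million requests
EGRESS_RATE = 0.12        # USD per GB

FREE = {"requests": 2_000_000, "vcpu_s": 360_000, "gb_s": 180_000, "egress_gb": 1}

def estimate_monthly_cost(requests, duration_ms, vcpus, memory_mb, egress_gb=0.0):
    """Return the monthly cost components in USD with the free tier applied."""
    seconds = requests * (duration_ms / 1000)
    costs = {
        "cpu": max(0, seconds * vcpus - FREE["vcpu_s"]) * CPU_RATE,
        "memory": max(0, seconds * (memory_mb / 1024) - FREE["gb_s"]) * MEM_RATE,
        "requests": max(0, requests - FREE["requests"]) / 1_000_000 * REQ_RATE,
        "egress": max(0.0, egress_gb - FREE["egress_gb"]) * EGRESS_RATE,
    }
    costs["total"] = sum(costs.values())
    return costs
```

&lt;p&gt;Feeding in the worked example from the next section reproduces its breakdown to within a cent.&lt;/p&gt;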


&lt;h2&gt;
  
  
  A worked example
&lt;/h2&gt;

&lt;p&gt;Let us use one of the calculator's example workloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 million requests per month
&lt;/li&gt;
&lt;li&gt;200 ms average duration
&lt;/li&gt;
&lt;li&gt;1 vCPU
&lt;/li&gt;
&lt;li&gt;512 MB memory
&lt;/li&gt;
&lt;li&gt;5 GB egress
&lt;/li&gt;
&lt;li&gt;free tier applied
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That lands at roughly &lt;strong&gt;$45/month&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The breakdown is the useful part:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU:&lt;/strong&gt; $39.36
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory:&lt;/strong&gt; $2.05
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Requests:&lt;/strong&gt; $3.20
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Egress:&lt;/strong&gt; $0.48
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tells you something important straight away: &lt;strong&gt;the bill is mostly CPU time, not request count.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you cut average duration from &lt;strong&gt;200 ms to 100 ms&lt;/strong&gt;, the total cost drops sharply. The calculator's scenario notes that halving duration roughly halves the bill.&lt;/p&gt;

&lt;p&gt;That is the kind of optimisation insight you want &lt;strong&gt;before tweaking settings blindly&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  What developers usually get wrong
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Treating request count as the whole story
&lt;/h3&gt;

&lt;p&gt;A service can handle a lot of requests and still stay cheap if each request is short and the free tier absorbs part of the traffic.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100,000 monthly requests
&lt;/li&gt;
&lt;li&gt;300 ms duration
&lt;/li&gt;
&lt;li&gt;256 MB memory
&lt;/li&gt;
&lt;li&gt;1 vCPU
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This comes out at &lt;strong&gt;about $0/month&lt;/strong&gt;, because it stays inside the free tier.&lt;/p&gt;


&lt;h3&gt;
  
  
  2. Ignoring infrastructure outside request handling
&lt;/h3&gt;

&lt;p&gt;A simple Cloud Run estimate is useful, but it is not the whole bill if you also use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;minimum instances
&lt;/li&gt;
&lt;li&gt;always-allocated CPU
&lt;/li&gt;
&lt;li&gt;Cloud SQL
&lt;/li&gt;
&lt;li&gt;Secret Manager API calls
&lt;/li&gt;
&lt;li&gt;load balancing
&lt;/li&gt;
&lt;li&gt;VPC connectors
&lt;/li&gt;
&lt;li&gt;Artifact Registry storage
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why it is best to treat the estimate as a &lt;strong&gt;baseline&lt;/strong&gt;, not a promise.&lt;/p&gt;


&lt;h2&gt;
  
  
  How to use this in practice
&lt;/h2&gt;

&lt;p&gt;If you are sizing a new service, start with a rough model &lt;strong&gt;before touching production settings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many requests do I expect each month?
&lt;/li&gt;
&lt;li&gt;What is the average request duration?
&lt;/li&gt;
&lt;li&gt;Do I really need this much CPU and memory?
&lt;/li&gt;
&lt;li&gt;Is my service actually CPU-bound, or just slow?
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last question matters more than most people think.&lt;/p&gt;

&lt;p&gt;If CPU time dominates your bill, then improving latency is not just a performance win. It is also a &lt;strong&gt;cost optimisation&lt;/strong&gt;.&lt;/p&gt;



&lt;p&gt;If you want to test your own numbers, the &lt;strong&gt;Cloud Run Cost Calculator&lt;/strong&gt; lets you plug in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request volume
&lt;/li&gt;
&lt;li&gt;request duration
&lt;/li&gt;
&lt;li&gt;CPU allocation
&lt;/li&gt;
&lt;li&gt;memory allocation
&lt;/li&gt;
&lt;li&gt;network egress
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;directly in the browser.&lt;/p&gt;

&lt;p&gt;If you are tuning &lt;strong&gt;minimum instances, concurrency, or memory&lt;/strong&gt;, the longer Cloud Run cost optimisation guide goes deeper into those trade-offs.&lt;/p&gt;


&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Cloud Run pricing becomes much easier to reason about once you stop guessing.&lt;/p&gt;

&lt;p&gt;For many services, the biggest lever is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How many requests do I have?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;but rather:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"How much CPU time does each request burn?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Get that right and your estimates become much more reliable.&lt;/p&gt;


&lt;h2&gt;
  
  
  Try the calculator
&lt;/h2&gt;

&lt;p&gt;If you want to run the numbers on your own workload, try the full &lt;strong&gt;Cloud Run Cost Calculator&lt;/strong&gt;:&lt;br&gt;


&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://cloudwebschool.com/tools/cloud-run-cost-calculator/" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;cloudwebschool.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;It is free, browser-based, and useful for quick planning before changing production settings.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>cloudnative</category>
      <category>devops</category>
      <category>gcp</category>
    </item>
  </channel>
</rss>
