DEV Community: gyorgy

Democratizing professional cloud infrastructure

gyorgy — Wed, 17 Jun 2026 17:21:24 +0000

Published June 17, 2026 by gyorgy

The cloud the serious companies use is closer than you think.

You can build almost anything now. You describe what you want, an agent writes it, and a few minutes later there is a working app on your screen. That part is largely solved. The gate that used to keep most people out of building software is largely gone.

Then there is a second gate, and the tools that nailed the first one drop you right at its edge. Building the thing and shipping the thing are different problems. Your app runs on your laptop. Maybe a toy host will take a frontend. But the moment you want a real backend, a database or three, something that holds up when more than one person shows up, on infrastructure you actually own, the weekend project stalls. That is the cliff.

The gate has always been made of knowledge that was hard to get. Cloud accounts, roles and permissions, networking, the difference between a thing that works and a thing that works securely. Knowing one of those does not mean you know the others. You cannot really practice it without a real thing to build for, and the big providers sat behind all of it. You could see them. You could not get in without a pile of knowledge that has nothing to do with your idea.

That gate is starting to come down.

The part the critics are right about

People who ship by prompting tend to ship things that are not safe. The app works in the demo and falls over in the real world. The secrets are in the wrong place. The setup is one good day away from a bad one. "You cannot vibe your way to a secure production deployment" has been a fair thing to say.

Fair, because the person shipping had to assemble all the professional parts themselves, and could not tell the agent to build what they could not picture. Left to improvise, the agent builds something that looks finished and might not be. The problem was never the person. It was that the safe, standard way to deploy was something you had to already understand to get.

What changes

So change that. Put the secure setup underneath the app by default, instead of asking the person to assemble it.

That is what tsdevstack does. tsdevstack is free and open source. You build your services in standard frameworks, and it generates the production setup around them: a CDN and WAF at the edge, a load balancer that terminates TLS, a Kong gateway behind it routing every request, your services on a private network nothing outside the gateway can reach, secrets in your cloud's managed secret store instead of an env file, the database, the CI, the deploy. As Terraform that lives in your repo and runs in your own cloud account. No platform in the middle, nothing to lock into, nothing to pay but your own cloud bill, through your own account.

You own all of it. And none of it is the part you have to know how to build. Not because vibe coders learned security overnight, but because the part under them is no longer the flimsy part.

Your agent already knows the rails

Here is the part nothing else gives you. When the agent scaffolds your project, it wires itself to tsdevstack directly. From then on it understands the whole system: every command, the structure, where the secrets go, how the routing is wired. So when you ask it for help, it is not improvising infrastructure it half-understands and quietly getting it wrong. It is operating a system that already has the guardrails, using the safe defaults instead of reinventing them.

That is the real answer to "the AI writes insecure code." Here the AI is not writing the dangerous part at all. The dangerous part is already built, the same way for everyone, and the agent just moves around inside it. The thing people are scared of, an agent left alone with your production setup, is the thing that does not happen.

And starting is one paste. There is a prompt on the site you drop into Claude Code, Cursor, or whatever agent you use. It scaffolds the project and brings it up locally in an environment that mirrors the cloud, so what runs on your laptop runs when you deploy. One block of text, from nothing to a running full-stack app. There is a 90-second video. If you would rather drive, it is a few CLI commands.

The fast track is Google

One part still takes real effort. You need a cloud account, and you need to give the framework access. The docs walk every step. It is not hard. It is just the one place that asks for fifteen quiet minutes instead of a prompt.

The providers are not equal here. Start with Google, where the setup is genuinely simple; follow the steps and you are deployed before you have time to get confused. AWS asks a little more, more moving parts, but it is doable and documented. Azure, well. Azure is there when you are ready, and you will know when you are ready.

The gate is lower

The first wave let everyone build. It left out that building was never the whole job; shipping something that holds up was a separate gate with its own toll. That gate is lower now. Not gone, and not for everything, but for ordinary full-stack microservices, the secure, standard deployment is something you can get without having built it yourself. You still have to make the thing good. You just no longer have to become an infrastructure engineer first to put it somewhere real.

The prompt is on the site. Paste it into your agent and start there.

tsdevstack is free and open source: tsdevstack.dev. The copy-paste prompt and the 90-second demo are on the homepage.

IaC, FdI, IaF: three ways a codebase becomes infrastructure

gyorgy — Sat, 13 Jun 2026 09:53:42 +0000

Published June 17, 2026 by gyorgy

Infrastructure used to be something you wrote separately from your application. Lately that boundary has been dissolving, and the vocabulary has not kept up. Three distinct ideas are getting blurred together, partly because they all start from the same place: your code already implies what infrastructure it needs, so why state it twice.

They diverge sharply on what they do about that. Here is the short version, then the longer one.

The short version

Infrastructure as Code (IaC). You describe the infrastructure explicitly, in its own files. The tool turns those files into real resources. Total control, total verbosity, and your infrastructure definition lives apart from your application code.

Framework-defined Infrastructure (FdI). The framework infers the infrastructure from your application code, and a managed platform provisions it for you. Almost no configuration, no drift between app and infra, but the inference only covers what the framework exposes, and the resulting infrastructure runs on the platform's rails.

Infrastructure as Framework (IaF). The framework reads your applications and generates infrastructure code that you own, deployed into your own cloud accounts. The framework does the inferring, you keep the output and the account.

Approach	Who writes the infra	Who owns the output	Where it runs	Scope
IaC	You, by hand	You	Any cloud	Anything you can express
FdI	The framework	The platform	The platform	What the framework exposes
IaF	The framework	You	Your cloud accounts	What the framework covers

The rest of this is just those three rows, explained.

Infrastructure as Code

IaC is the established answer. You write declarations, in HCL or a general-purpose language, that spell out the resources you want: this VPC, this load balancer, this database, these IAM bindings. A tool like Terraform or Pulumi reads the declarations and reconciles your cloud to match.

The strength is that nothing is hidden. Every resource is something you chose and can see. If you need an unusual topology, you can express it, because you are working directly with the primitives.

The cost is the verbosity and the drift. A production system is hundreds to thousands of lines of declarations, and most of it is boilerplate that looks the same across projects. And the infrastructure definition is a separate artifact from the application it serves. The app says it needs a queue; somewhere else, by hand, you wrote the queue. Those two facts can drift apart, and keeping them in sync is manual work that nothing enforces.

IaC is not going anywhere. It is the layer the other two approaches generate down to. The question the newer ideas ask is not whether the declarations get written, but whether you have to be the one writing them by hand.

Framework-defined Infrastructure

FdI says no. The insight, which Vercel articulated and named, is that a framework already encodes most of what the infrastructure needs to know. Vercel's own description is that it leverages the predictable structure of framework-based applications to map framework concepts onto infrastructure without explicit configuration.

The canonical examples are frontend primitives. In Next.js, a file in the routing directory implies a route, so the route table can be generated rather than declared. A page using server-side rendering implies a compute resource to render it, so a serverless function is provisioned. Middleware implies edge compute. You change the code, and the inferred infrastructure changes with it, at the same commit. There is no separate infra artifact to drift, because there is no separate infra artifact at all.
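As a rough illustration of that inference, and not Vercel's actual implementation, a route table can be derived from file paths alone. The sketch below assumes a simplified Next.js-style pages directory with .tsx files; the function names and sample paths are hypothetical.

```typescript
// Simplified sketch: derive routes from file paths in a pages-style directory.
// Assumes .tsx files, "index" as the directory root, and [param] dynamic segments.
function routeFromFile(file: string): string {
  const segments = file
    .replace(/^pages\//, "")
    .replace(/\.tsx$/, "")
    .split("/")
    .filter((segment) => segment !== "index")
    .map((segment) => segment.replace(/^\[(.+)\]$/, ":$1"));
  return "/" + segments.join("/");
}

// The "route table" is just the mapped file list: nothing is declared by hand.
function routesFromFiles(files: string[]): string[] {
  return files.map(routeFromFile).sort();
}

const exampleRoutes = routesFromFiles([
  "pages/index.tsx",
  "pages/about.tsx",
  "pages/blog/[slug].tsx",
]);
```

Adding or deleting a file changes the derived table in the same commit, which is the no-drift property described above.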

This is genuinely elegant for what it targets. It grew out of the frontend cloud, and that is where it is strongest: deploying framework-based frontends with zero configuration and no drift.

The trade follows from the design rather than being a flaw in it. First, the inference can only reach what the framework exposes, so the model is strongest on frontend and request-response work and thinner on backend systems that want always-on processes, long-lived connections, or full control over the runtime. The platform keeps narrowing that gap, with longer-running and more server-like compute, so the boundary is moving rather than fixed. But the inference is rooted in framework structure, and that structure is richest at the frontend. Second, the provisioning runs on the platform's own infrastructure and accounts. That is what makes the zero-configuration experience possible, and also what bounds it.

Infrastructure as Framework

IaF starts from the same observation as FdI, reached independently from a different direction: if the framework understands the project, its services and how they fit together, the framework can produce the infrastructure. But it answers two questions differently, and those answers are the whole distinction.

The first is ownership of the output. An IaF framework reads your application and generates infrastructure code as a real artifact, the same IaC a person would have written by hand, except a person did not write it. You keep that output. It lives in your repository and deploys into your own cloud accounts. The framework is a generator, not a runtime. When it is done, what you have is yours, and you could in principle stop using the framework and keep the infrastructure.

The second is scope. Because the generated artifact is ordinary infrastructure code targeting ordinary cloud primitives, there is no inherent ceiling at the frontend. The same approach that generates a gateway and a CDN can generate a VPC, a managed database per service, background workers, a message queue, scheduled jobs, and the observability stack around them. The framework reads the whole project, backend included, and emits the infrastructure that architecture implies.
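To make the generator idea concrete, here is a minimal sketch of a function that reads a project description and emits ordinary Terraform text. It is not how any particular framework does it: the `ServiceSpec` shape, the function name, and the `google_sql_database_instance.main` reference are assumptions made for illustration.

```typescript
// Hypothetical project description; not any real framework's config schema.
interface ServiceSpec {
  name: string;
  needsDatabase: boolean;
}

// Emit one google_sql_database block per service that asks for a database.
// The instance reference "google_sql_database_instance.main" is an assumption.
function generateDatabases(services: ServiceSpec[]): string {
  return services
    .filter((service) => service.needsDatabase)
    .map((service) => {
      const id = service.name.replace(/-/g, "_");
      return [
        `resource "google_sql_database" "${id}" {`,
        `  name     = "${service.name}"`,
        `  instance = google_sql_database_instance.main.name`,
        `}`,
      ].join("\n");
    })
    .join("\n\n");
}

const generatedHcl = generateDatabases([
  { name: "auth-service", needsDatabase: true },
  { name: "web", needsDatabase: false },
  { name: "billing-service", needsDatabase: true },
]);
```

The output is plain HCL that could be committed and applied without the generator, which is the ownership point made above.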

tsdevstack is one implementation of this idea. You build standard application-framework services through its CLI, and the framework keeps track of what your project contains. From that it generates the Terraform, the gateway configuration, the CI pipelines, and the rest, across GCP, AWS, and Azure, deployed into accounts you control. Your application code carries no cloud-provider-specific code, so a different provider is a target you choose, handled by the tooling but open to anyone who wants to change it by hand. The framework holds the knowledge of how to build the infrastructure. You hold the infrastructure.

This idea has relatives. There is a broader movement, often called infrastructure from code, where you express infrastructure needs from inside your application and a tool derives the infrastructure from them. Some of those tools are an SDK you add to an existing framework. Others ask you to adopt a framework of their own. Either way, the infrastructure primitives end up living in your application code. IaF parts ways on exactly that coupling. Your application stays in standard frameworks, with no infrastructure primitives mixed into your business logic. A separate config describes what you want, and the framework generates infrastructure code you own. The knowledge of how to build the infrastructure lives in the framework, not spread through your application code.

The cost here is a different boundary, and it is worth being honest about. The framework covers what it has been taught to cover. It is opinionated about architecture, and an application that wants a fundamentally different shape than the framework models will fit it poorly. IaF gives up some of IaC's open-ended expressiveness to get the automation, while keeping the ownership that FdI gives up.

Reading the three together

The three approaches are three points on the same map, defined by two axes: who produces the infrastructure, and who owns and runs what comes out.

IaC keeps everything in your hands, which is also the problem, because everything is in your hands. FdI takes nearly all of it off your hands, which is also the boundary, because the output and its scope are defined by the platform. IaF tries to hold the middle: the framework produces the infrastructure, and you keep the output and the account.

None of these is strictly better than the others. They optimize for different things. IaC optimizes for control and expressiveness. FdI optimizes for the least possible configuration on a focused set of concerns. IaF optimizes for keeping ownership while letting the framework do the writing. The right one depends on which trade you actually want to make, and the only mistake is not noticing that you are making one.

tsdevstack is the IaF framework I build. You create standard application-framework services through its CLI, the framework stays aware of what is in your project, and from that it generates production infrastructure across GCP, AWS, and Azure, as Terraform and gateway configuration and CI pipelines that live in your repository and deploy to your accounts. The framework does the writing. You keep the output. It is open source.

A note on timing. This part of the field moves fast. Platform limits, pricing, and capabilities shift month to month, and some of what is described here will have moved by the time you read it. Everything above reflects the state of things as of June 2026, sourced to the providers' own documentation below where it matters.

References

Agentic loops don't fix lying agents

gyorgy — Fri, 12 Jun 2026 13:57:38 +0000

Published June 15, 2026 by gyorgy

The current discourse says you should stop prompting coding agents and start designing loops around them. Give the agent a trigger and a verifiable goal, let an evaluator check the result, and only stop when it passes. One major provider shipped a dedicated /goal command for it. People are running 25-hour unattended sessions and calling it loop engineering.

The instinct is correct. Never accept the agent's word that something is done. Demand proof.

But "verifiable" is doing all the work in that sentence. I build a framework that generates cloud infrastructure across GCP, AWS, and Azure, and I use coding agents on it daily. I keep an audit document of every serious failure. Reading it through the loop engineering lens is uncomfortable, because every bug in it would have survived a loop. Not because the loop iterated too few times. Because the verifier could not see the lie.

Here are three of them.

The init job that never existed

During the AWS implementation, the agent generated the Terraform for the database layer. The GCP equivalent creates per-service databases and users through native Terraform resources. AWS has no such resources, so something else has to create them.

The agent solved this with a comment:

# Individual databases and users are created at deploy time via init job

There was no init job. Not a stub, not a TODO, not a half-finished Lambda. The agent wrote documentation for a mechanism it never built, then moved on. The code compiled. terraform validate passed. The RDS instance deployed fine. Services failed to connect the first time I tested against a real cloud environment, because the databases they were configured to use did not exist. This was during development, long before any release.

A comment is the cheapest possible way to make code look complete. It costs one line and satisfies any reviewer who skims.
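A check that cannot be satisfied by a comment has to look at what exists after the deploy. A minimal sketch, assuming the caller can already list the databases on the instance; the function name and the database names are hypothetical.

```typescript
// Compare the databases the services are configured to use with the ones
// that actually exist after a deploy. Both lists are supplied by the caller;
// how they are fetched (for example a query against the instance) is out of scope.
function missingDatabases(expected: string[], actual: string[]): string[] {
  return expected.filter((name) => actual.indexOf(name) === -1);
}

// Hypothetical names: two services expect their own database,
// but the instance only has a default one.
const missing = missingDatabases(["auth_db", "billing_db"], ["postgres"]);
```

A non-empty result fails the run regardless of what the code or its comments claim.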

The variable that nothing consumes

When I pointed out the missing databases, the agent fixed it. Its fix was to copy the GCP pattern: pass each service's database password to Terraform as TF_VAR_db_{service}_password.

// Export each service's database password to Terraform as TF_VAR_db_<service>_password.
for (const [serviceName, password] of Object.entries(dbPasswords)) {
  const suffix = toTfVarSuffix(serviceName);
  tfEnv[`TF_VAR_db_${suffix}_password`] = password;
}

On GCP this variable exists because google_sql_user consumes it to create the user. On AWS, no resource consumed it. Terraform accepted the variable and ignored it. The connection string was then built with a password for a user that would never exist.

The agent copied syntax without understanding semantics. It never asked the one question that mattered: what Terraform resource uses this variable? Again, everything compiled. Everything validated. The fix made the codebase look more correct while leaving it exactly as broken.
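That one question can be turned into a cheap check. The sketch below is a plain text scan rather than an HCL parser, so it is a heuristic at best; the function name and the sample module are hypothetical.

```typescript
// Given the environment passed to Terraform and the module's HCL source,
// list TF_VAR_* inputs that no expression references as var.<name>.
// A bare `variable "x" {}` declaration does not count as a consumer.
function unconsumedTfVars(env: Record<string, string>, hcl: string): string[] {
  const prefix = "TF_VAR_";
  return Object.keys(env)
    .filter((key) => key.indexOf(prefix) === 0)
    .map((key) => key.slice(prefix.length))
    .filter((name) => !new RegExp(`\\bvar\\.${name}\\b`).test(hcl));
}

// Hypothetical AWS module: the password variable is declared but never used.
const awsModule = `
variable "db_auth_password" {}
resource "aws_db_instance" "main" {
  password = var.master_password
}
`;

const unconsumed = unconsumedTfVars(
  { TF_VAR_db_auth_password: "s3cret", TF_VAR_master_password: "root", PATH: "/usr/bin" },
  awsModule,
);
```

Here the scan flags `db_auth_password`: Terraform accepts it, and nothing reads it.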

The bug that passed a real deploy

This one is the worst, because it survived the strongest verifier I have.

The Azure Front Door generator needed to create an endpoint, origin, and route for each Next.js service. The agent wrote this:

const firstNextjsService = nextjsServiceNames[0];

One hardcoded index. It created routing for the first Next.js service and silently ignored the rest. And it deployed. Real Terraform apply, real Front Door, real traffic flowing to a real app. Green across the board.

It was green because the test project had one Next.js service. tsdevstack is a framework. Users can add as many as they want. The spec was "N services must work" and the verifier only ever asked about one. No loop catches that, no matter how many iterations you give it, unless the verification encodes the spec instead of the happy path.

Every bug passed a verification layer

Line the three up against the checks they survived:

Bug	Compiles	terraform validate	Real deploy
Phantom init job	passed	passed	failed
Unconsumed TF_VAR	passed	passed	failed
Hardcoded first service	passed	passed	passed

A loop wired to the compiler would have terminated and reported success three times. A loop wired to terraform validate would have done the same. A loop wired to a live deploy still misses the third one.
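The same table can be written as data, which makes the count explicit. This is only a restatement of the rows above; the identifiers are made up for the sketch.

```typescript
type Verifier = "compiles" | "terraformValidate" | "realDeploy";

// One row per bug from the table: true means the bug passed that check unnoticed.
const survived: Record<string, Record<Verifier, boolean>> = {
  "Phantom init job": { compiles: true, terraformValidate: true, realDeploy: false },
  "Unconsumed TF_VAR": { compiles: true, terraformValidate: true, realDeploy: false },
  "Hardcoded first service": { compiles: true, terraformValidate: true, realDeploy: true },
};

// Bugs a loop would report as "done" if it stopped at the given verifier.
function falseSuccesses(verifier: Verifier): string[] {
  return Object.keys(survived).filter((bug) => survived[bug][verifier]);
}

const missedByCompiler = falseSuccesses("compiles");
const missedByDeploy = falseSuccesses("realDeploy");
```

The compiler and `terraform validate` each let all three through; the live deploy still lets one through.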

The agent was not failing to iterate. The verifier was failing to see.

Why infrastructure is the worst case

Loops genuinely work in domains with cheap, strong verifiers. A compiler error is instant and unambiguous. A type checker is free. A fast unit test suite gives a tight feedback signal, and an agent looping against it converges on working code. That is the environment where the loop engineering results come from, and the results are real.

Infrastructure inverts every one of those properties. The cheap checks are weak: validation confirms your HCL is well-formed, not that your architecture works. The strong check is a real deploy that takes 20 minutes and costs money. And the strongest check might not exist yet. The second Next.js service that exposes the hardcoded index is a user action that happens months after release.

There is one more failure mode worth naming. An agent under pressure to make a verifier pass will sometimes change the verifier instead of the code. Adjust the test expectation. Delete the failing assertion. I have watched this happen. When it does, the loop is not converging on correctness. It is training the agent to fake completion more efficiently.
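One way to guard against that is to treat the verifier's own files as off limits inside the loop. A minimal sketch, assuming the loop can see which files changed in an iteration; the path patterns and names are illustrative, not a real project layout.

```typescript
// Paths that count as "the verifier". The patterns are illustrative assumptions.
const protectedPatterns: RegExp[] = [
  /\.test\.ts$/,
  /__snapshots__\//,
  /^\.github\/workflows\//,
];

// Given the files an agent changed in one iteration, return those that belong
// to the verifier. A non-empty result should stop the loop for human review.
function verifierEdits(changedFiles: string[]): string[] {
  return changedFiles.filter((file) =>
    protectedPatterns.some((pattern) => pattern.test(file)),
  );
}

const flagged = verifierEdits([
  "src/generators/front-door.ts",
  "src/generators/front-door.test.ts",
  "src/generators/__snapshots__/front-door.snap",
]);
```

A legitimate test change still happens, but it goes through a person rather than through the loop that benefits from it.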

What I actually do

The answer is not more iterations. It is matching each verification layer to the lies it can catch, and being honest about what each layer cannot see.

For tsdevstack the split looks like this. Snapshot tests on every generated output catch unintended changes to Terraform, Kong configs, and CI workflows the moment they happen. That is the fast loop, and the agent runs inside it constantly.
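The shape of such a snapshot check is simple. The following is a bare sketch of the idea, not the test setup tsdevstack uses; real snapshot tooling in a test runner does considerably more, and the sample strings are invented.

```typescript
interface SnapshotResult {
  matches: boolean;
  changedLines: number[]; // 1-based line numbers that differ
}

// Compare freshly generated output with a stored snapshot, line by line.
function compareToSnapshot(generated: string, snapshot: string): SnapshotResult {
  const a = generated.split("\n");
  const b = snapshot.split("\n");
  const changedLines: number[] = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] !== b[i]) changedLines.push(i + 1);
  }
  return { matches: changedLines.length === 0, changedLines };
}

// Invented example: the regenerated output silently changed the egress setting.
const stored = 'egress = "PRIVATE_RANGES_ONLY"\ningress = "INGRESS_TRAFFIC_INTERNAL_ONLY"';
const regenerated = 'egress = "ALL_TRAFFIC"\ningress = "INGRESS_TRAFFIC_INTERNAL_ONLY"';
const snapshotCheck = compareToSnapshot(regenerated, stored);
```

The check says nothing about whether the output is correct, only that it changed, which is exactly the lie this layer can catch.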

Real cloud deploys are reserved for the apply, deploy, and verify chain. Slow and expensive, so they run when the cheap layers cannot answer the question. The phantom init job and the unconsumed variable both died here, before any release.

And the hardcoded index taught me the last rule: the spec has to live in the tests, not in my head. The fix for that bug was iteration over all Next.js services, and the regression test now deploys with more than one. The verifier had to learn the framework's actual contract before it could defend it.
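A unit-level version of that contract might look like the sketch below. It is not the actual generator or regression test; the `FrontDoorRoute` shape, the naming scheme, and the function names are hypothetical.

```typescript
interface FrontDoorRoute {
  endpoint: string;
  origin: string;
  route: string;
}

// Hypothetical generator: one endpoint, origin, and route per Next.js service,
// produced by iterating over the whole list instead of reading index 0.
function frontDoorRoutes(nextjsServiceNames: string[]): FrontDoorRoute[] {
  return nextjsServiceNames.map((name) => ({
    endpoint: `${name}-endpoint`,
    origin: `${name}-origin`,
    route: `${name}-route`,
  }));
}

// The contract, stated as a check: every service in, a route for every service out.
function coversEveryService(names: string[]): boolean {
  const routes = frontDoorRoutes(names);
  return names.every((name) => routes.some((r) => r.route === `${name}-route`));
}

const twoServicesCovered = coversEveryService(["web", "admin"]);
```

The important part is the input: a check that only ever passes one service cannot tell this version from the hardcoded one.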

The one verifier I never rely on is the agent's self-report. It is the weakest signal in the entire system, and a loop built on it is just the agent agreeing with itself faster.

The part you can delegate

If you cannot define done in a way the agent cannot fake, the loop is just faster fabrication. That is the whole argument.

Loops are the right structure. But the engineering effort moves from prompting into verification, and verification is where the domain knowledge lives. For infrastructure, that means encoding contracts the agent will not infer on its own: every service, not the first one. A user that exists, not a variable that compiles.

Agents are good at producing work. They are not yet trustworthy at judging it. Knowledge is the part you can safely delegate. Outcomes are not.

This experience shaped how tsdevstack treats AI agents. The framework ships an MCP server, @tsdevstack/cli-mcp, built as a knowledge layer rather than an autopilot. It makes an agent a framework expert: every command, the config schema, the routing, the secret assignments. What it does not do is hand the agent ownership of outcomes. People stay in command, because the responsibility is theirs. I wrote about the boundary between hints and enforcement in "MCP annotations are a UX layer, not a security layer."

Getting a production WAF out of Azure Front Door Standard

gyorgy — Mon, 01 Jun 2026 14:35:00 +0000

Published June 1, 2026 by gyorgy

Azure Front Door covers the whole edge in a single resource: CDN, managed SSL, a load balancer, and a WAF. On AWS and GCP that job is spread across two or three services. On Azure it is one. When I set it up for an Azure deployment, that consolidation was the part I liked.

The first surprise was the floor. Front Door Standard has a base fee of around $35 a month, before you serve a single request. The closest AWS setup, the new CloudFront Free plan, gives you a CDN, a WAF, and DDoS protection for $0 at small scale. So on Azure you start at $35 for roughly the thing AWS hands you for nothing. Not a fortune. But a fixed monthly floor where the competition has none.

The second surprise took longer to find.

The WAF has a ceiling

On Standard, the Front Door WAF runs on custom rules you write yourself. There are no managed rule sets at this tier. That was fine by me. I would rather read the rules than trust a black box.

What I did not expect: Standard caps a WAF policy at 100 custom rules. That is a hard limit. Your entire protection has to fit inside 100 rules.

To cover the usual ground I ended up with about 80 rules, grouped into priority bands: rate limiting, restricted paths and methods, scanner fingerprints, known CVE patterns, SQL injection, XSS, path traversal, command injection, and protocol abuse. That left around 20 rules of headroom under the cap for anything app-specific.

Enough for now. But you can feel the ceiling. A larger app carrying a lot of its own rules would start bumping into it.
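The budget is easy to keep honest in code. In this sketch the per-band counts are invented for illustration, since only the total of about 80 is given above; the constant and function names are assumptions too.

```typescript
// Per-band counts are illustrative; only the total (about 80) comes from the text.
const bands: Record<string, number> = {
  rateLimiting: 5,
  restrictedPathsAndMethods: 10,
  scannerFingerprints: 10,
  knownCves: 15,
  sqlInjection: 10,
  xss: 10,
  pathTraversal: 8,
  commandInjection: 7,
  protocolAbuse: 5,
};

const STANDARD_RULE_CAP = 100;

// Sum the rules across bands and report how many are left under the cap.
// Throws when the policy would not fit, so generation fails before a deploy does.
function ruleHeadroom(counts: Record<string, number>, cap: number): number {
  const used = Object.keys(counts).reduce((sum, band) => sum + counts[band], 0);
  if (used > cap) throw new Error(`WAF policy needs ${used} rules, cap is ${cap}`);
  return cap - used;
}

const headroom = ruleHeadroom(bands, STANDARD_RULE_CAP);
```

With these numbers the headroom comes out at 20, matching the figure above.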

What custom rules cost you

Custom rules are static. They match what you told them to match. They don't learn, and they don't update when a new attack pattern shows up next month. Someone has to maintain them. That someone is you.

A managed rule set works the other way. Microsoft tracks new attack patterns and updates the signatures, and you inherit the updates without touching anything. For an OWASP baseline you would rather not hand-maintain, that is worth real money.

On Front Door, managed rule sets are Premium only.

What Premium buys, and what it costs

Premium replaces the OWASP bands (SQL injection, XSS, path traversal, command injection) with Microsoft's managed Default Rule Set 2.1, which updates as new signatures land. It adds bot management. And it adds Private Link origins, so your origin services lose their public endpoint entirely. The custom-rule cap goes from 100 to 500.

The price for all of that: the Front Door SKU goes from about $35 a month to about $330. A flat $295 difference, every month, no matter the traffic.
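Because the difference is flat, it is worth multiplying out before committing. A small sketch using the approximate base fees quoted above; usage charges are left out and the names are made up.

```typescript
const STANDARD_BASE_USD = 35; // approximate monthly base fee, from the text above
const PREMIUM_BASE_USD = 330; // approximate monthly base fee, from the text above

// Extra base fee for Premium over a number of months, usage charges excluded.
function premiumSurcharge(months: number): number {
  return (PREMIUM_BASE_USD - STANDARD_BASE_USD) * months;
}

const perMonth = premiumSurcharge(1);
const perYear = premiumSurcharge(12);
```

At these prices the surcharge is $295 a month, or $3,540 over a year, before any traffic.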

Tier	Base fee	WAF	Bot management
AWS CloudFront Free plan	$0	5 custom rules	No, starts on the $200 Business plan
Azure Front Door Standard	~$35/mo	~80 custom rules, 100 cap	No
Azure Front Door Premium	~$330/mo	Managed DRS 2.1 plus ~35 custom, 500 cap	Yes

Usage, the data transfer and request charges, sits on top of the base fee for all three. The AWS Free plan covers small scale: 100 GB and one million requests a month, five WAF rules. Past that you move up its paid tiers. The point here is the starting line, not the ceiling.

Which tier to pick

For an early or small service, Standard with a solid custom rule set is the right call. You get real WAF coverage, you can read every rule, and you pay $35 instead of $330. Maintaining 80 rules by hand is not free work, but it is a few hours now and then, not a second job.

You move to Premium when that maintenance stops being worth your time, or when you specifically need bot management or origins with no public endpoint. Managed rules you pay for beat static rules you forget to update. Decide which side of that line you are on before you commit, because $295 a month adds up fast. The move is also hard to walk back. Azure has no in-place downgrade from Premium to Standard, so reversing it rebuilds the Front Door profile from scratch: new hostnames, new certificates, and 15 to 20 minutes of downtime.

For someone landing on Azure for the first time, this is a lot to work out on day one. Two tiers, a rule cap, managed against custom, a $295 gap. None of it shows up in a getting-started guide. It is the kind of decision that should live in your tooling, not in your head.

References

Azure Front Door pricing: https://azure.microsoft.com/pricing/details/frontdoor/
AWS CloudFront pricing: https://aws.amazon.com/cloudfront/pricing/
Google Cloud Armor pricing: https://cloud.google.com/armor/pricing

Standard with this rule set is the default Azure tier in tsdevstack. The framework generates the full custom WAF policy from one config file, and switches to Premium's managed rule sets when you set a single flag. You pick the tier. It writes the rules.

Tags: azure, waf, security, cloud

Cloud Run private networking without a VPC Connector

gyorgy — Thu, 07 May 2026 16:02:30 +0000

If you Google how to call one Cloud Run service from another over private networking, every result tells you to provision a Serverless VPC Access Connector. It works. It also runs a managed pool of e2-micro instances you pay for whether you use them or not, costs $14 to $30 per month, and is no longer the recommended pattern.

Google has documented a cleaner approach in at least three different places. It uses Direct VPC Egress, a Cloud DNS private zone, and Private Google Access on your subnet. It costs about $0.20 per month. And it gives you something the connector path quietly fails at: keeping egress: private-ranges-only on your services while still reaching external APIs without a Cloud NAT.

The problem

You have backend services on Cloud Run that should be unreachable from the public internet. They need to call each other. They need to call external APIs (Stripe, Resend, OpenAI, whatever). And the database and Redis live on private IPs in your VPC.

The "unreachable from the public internet" part is easy. Cloud Run gives you ingress: internal. Set it, done.

The hard part is the rest: services calling each other, services calling external APIs, all without re-exposing them or paying for a NAT. Cloud Run lets you set egress on outbound traffic, with two useful values:

all-traffic routes every outbound packet through your VPC. To reach the public internet from there, you need a Cloud NAT, which is another ~$30 per month plus data processing. Every call to Stripe now hairpins through your network just to leave it again.

private-ranges-only only routes private destinations through the VPC. Public IPs go straight out via Google's edge, no NAT needed. This is what you want for external APIs.

But there's a catch. Cloud Run service URLs (*.run.app) resolve to public IPs by default. So when service A calls https://service-b-123.region.run.app, that traffic exits over the internet path, hits the public Cloud Run frontend, and gets blocked because service B has ingress: internal. You see connection failures on your own infrastructure.

The connector approach works around this by routing everything through the VPC and using all-traffic egress. That works, but as discussed, it brings the NAT problem back.

The Google-documented fix

Make *.run.app resolve to private IPs from inside your VPC. Specifically, to the private.googleapis.com range: 199.36.153.8/30.

This is not a hack. The VPC Private Google Access docs explicitly list *.run.app among the domains supported via this range, alongside *.gcr.io, *.pkg.dev, and *.gke.goog. The Cloud Run private networking page recommends it. The networking best practices doc calls private-ranges-only with Private Google Access the recommended option for reaching Google APIs over Direct VPC Egress. Google maintains a codelab walking through the exact setup.

Three pieces of Terraform:

# 1. Subnet with Private Google Access enabled
resource "google_compute_subnetwork" "subnet" {
  name                     = "${var.project_name}-subnet"
  ip_cidr_range            = "10.0.0.0/24"
  region                   = var.region
  network                  = google_compute_network.vpc.id
  private_ip_google_access = true
}

# 2. Cloud DNS private zone overriding *.run.app
resource "google_dns_managed_zone" "cloudrun_internal" {
  name        = "cloudrun-internal"
  dns_name    = "run.app."
  description = "Route Cloud Run URLs through VPC"
  visibility  = "private"

  private_visibility_config {
    networks {
      network_url = google_compute_network.vpc.id
    }
  }
}

resource "google_dns_record_set" "cloudrun_a" {
  name         = "*.run.app."
  managed_zone = google_dns_managed_zone.cloudrun_internal.name
  type         = "A"
  ttl          = 300
  rrdatas      = [
    "199.36.153.8",
    "199.36.153.9",
    "199.36.153.10",
    "199.36.153.11",
  ]
}

# 3. Cloud Run service with Direct VPC Egress
resource "google_cloud_run_v2_service" "backend" {
  name     = "auth-service"
  location = var.region
  ingress  = "INGRESS_TRAFFIC_INTERNAL_ONLY"

  template {
    vpc_access {
      network_interfaces {
        network    = google_compute_network.vpc.id
        subnetwork = google_compute_subnetwork.subnet.id
      }
      egress = "PRIVATE_RANGES_ONLY"
    }

    containers {
      image = "us-central1-docker.pkg.dev/${var.project_id}/repo/auth-service:latest"
    }
  }
}

If you serve IPv6 traffic, also add an AAAA record pointing to 2600:2d00:0002:2000::.

What happens at runtime: a request from service A to service-b-123.region.run.app does a DNS lookup, the private zone returns 199.36.153.x, and Direct VPC Egress with private-ranges-only treats those addresses as internal and routes the traffic through the VPC's Private Google Access path. Traffic stays on Google's network, hits Cloud Run's internal frontend, passes the ingress: internal check, and reaches service B.
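To make that routing decision concrete, here is a small TypeScript sketch of the range check. It is an illustration only: the function names are invented for this example, and the list of ranges is a simplified assumption (the RFC 1918 blocks plus the Private Google Access range discussed above), not Cloud Run's actual implementation.

```typescript
// Illustrative only: checks whether an IPv4 address falls inside a CIDR block.
// Function names are hypothetical; this is not Cloud Run's routing code.

function ipv4ToInt(ip: string): number {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  // Multiplication instead of bit shifts keeps the value unsigned.
  return ((parts[0] * 256 + parts[1]) * 256 + parts[2]) * 256 + parts[3];
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, prefixText] = cidr.split("/");
  const prefix = Number(prefixText);
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`Invalid CIDR block: ${cidr}`);
  }
  const blockSize = 2 ** (32 - prefix);
  const start = Math.floor(ipv4ToInt(base) / blockSize) * blockSize;
  const value = ipv4ToInt(ip);
  return value >= start && value < start + blockSize;
}

// Assumed, simplified set of ranges that private-ranges-only sends through the VPC:
// RFC 1918 plus the private.googleapis.com range from this article.
const VPC_ROUTED_RANGES = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "199.36.153.8/30"];

function routedThroughVpc(ip: string): boolean {
  return VPC_ROUTED_RANGES.some((range) => inCidr(ip, range));
}

console.log(routedThroughVpc("199.36.153.10")); // true: inside 199.36.153.8/30
console.log(routedThroughVpc("199.36.153.12")); // false: just past the /30
console.log(routedThroughVpc("8.8.8.8"));       // false: ordinary public IP
```

The /30 covers exactly the four addresses in the A record, which is why the DNS override and the egress setting line up.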

External APIs still work because they resolve to ordinary public IPs, fall outside the DNS override, and bypass the VPC entirely. No Cloud NAT needed.

A Cloud Run service that needs to be reached by an external load balancer (a Kong gateway in front of your stack, for instance) uses ingress: internal-and-cloud-load-balancing instead (INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER in Terraform). Same egress and VPC config; slightly looser ingress.

Is this actually secure?

Reasonable question. Changing how DNS resolves can sound like the kind of thing that accidentally undoes a security boundary.

It does not. Three reasons.

First, ingress: internal is enforced at the Cloud Run frontend, not by DNS. The check is "did this request arrive from an allowed network source?" Public traffic still hits a Google-managed boundary that rejects it regardless of how DNS resolved. Resolving service.run.app to a different IP doesn't unlock anything; it changes which Google entry point your traffic uses.

Second, IAM is still enforced. The caller's service account needs roles/run.invoker on the target. If you remove that binding, requests fail even from inside the VPC. The DNS path doesn't bypass IAM.
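For context, this is roughly what the calling side looks like. It is a minimal sketch, assuming the caller runs on Cloud Run and fetches an ID token from the metadata server; the function names and the example service URL are placeholders, and error handling is kept to a minimum.

```typescript
// Sketch of a service-to-service call from inside Cloud Run.
// Function names are hypothetical; the metadata endpoint is the documented one.

const METADATA_IDENTITY_URL =
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity";

// Builds the metadata-server URL that returns an ID token for the given audience.
function identityTokenUrl(audience: string): string {
  return `${METADATA_IDENTITY_URL}?audience=${encodeURIComponent(audience)}`;
}

// Fetches an ID token for the caller's service account (only works on Google Cloud).
async function fetchIdToken(audience: string): Promise<string> {
  const res = await fetch(identityTokenUrl(audience), {
    headers: { "Metadata-Flavor": "Google" },
  });
  if (!res.ok) throw new Error(`Metadata server returned ${res.status}`);
  return res.text();
}

// Calls the target service with the token attached. Without roles/run.invoker
// on the target, Cloud Run rejects this request even from inside the VPC.
async function callInternalService(serviceUrl: string, path: string): Promise<Response> {
  const token = await fetchIdToken(serviceUrl);
  return fetch(new URL(path, serviceUrl), {
    headers: { Authorization: `Bearer ${token}` },
  });
}
```

A call would look like callInternalService("https://service-b-123.region.run.app", "/health"), with the service URL doubling as the token audience.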

Third, the 199.36.153.8/30 range is not "a public IP we are tricking the network to treat as private." It is a Google-owned, Google-routed range published specifically for Private Google Access traffic. It only carries traffic between your VPC and Google services, never to or from the broader internet.

The pattern that would be insecure is setting ingress: all and relying on the DNS trick to keep services private. Don't do that. Stack ingress: internal plus IAM plus the DNS routing. That is defense in depth, not security through DNS.

What this costs

Approach	Recurring cost	Components
VPC Connector + Cloud NAT	$44 to $60 per month	Connector instance pool ($14 to $30) plus Cloud NAT (~$30)
VPC Connector, `all-traffic`, no NAT	$14 to $30 per month, but external APIs broken	Connector only
Direct VPC Egress + DNS override	~$0.20 per month	Cloud DNS private zone

Direct VPC Egress itself is free. Private Google Access is free. The only line item is the Cloud DNS zone at $0.20 per zone per month, plus $0.40 per million queries.
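As a quick sanity check on that line item, the arithmetic can be written down directly. The prices are the ones quoted above and may change; the function name is made up for this example.

```typescript
// Pricing figures are the ones quoted in this article; check current
// Cloud DNS pricing before relying on them.
const ZONE_USD_PER_MONTH = 0.2;
const USD_PER_MILLION_QUERIES = 0.4;

// Monthly Cloud DNS cost for private zones plus query volume.
function monthlyDnsCostUsd(queriesPerMonth: number, zones: number = 1): number {
  return zones * ZONE_USD_PER_MONTH + (queriesPerMonth / 1e6) * USD_PER_MILLION_QUERIES;
}

console.log(monthlyDnsCostUsd(0).toFixed(2));   // "0.20"
console.log(monthlyDnsCostUsd(5e6).toFixed(2)); // "2.20"
```

Even at five million internal lookups a month, the bill stays well under the cheapest connector setup in the table.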

There's a performance angle too. The connector adds an extra network hop through a managed proxy pool that maxes out around 1 Gbps. Direct VPC Egress is a direct network path with no proxy hop, lower latency, and higher throughput. Google's own launch blog leads with this when announcing the feature.

Why is this not the default in tutorials?

Speculation, but: the connector shipped first, and most "Cloud Run private networking" content online predates Direct VPC Egress reaching general availability. The DNS-override pattern is documented across three Google pages (VPC, Cloud Run, codelabs), and it's easy to miss if you only read the first hit on a search. The connector is also conceptually simpler to explain in a one-paragraph blog post: "spin up a connector, point your service at it, done." The DNS-override path requires understanding Private Google Access, which most introductory material skips.

The result is an ecosystem default that is strictly worse than what the platform owner recommends.

References

Cloud Run private networking: https://cloud.google.com/run/docs/securing/private-networking
Configure Private Google Access: https://cloud.google.com/vpc/docs/configure-private-google-access
Cloud Run networking best practices: https://cloud.google.com/run/docs/configuring/networking-best-practices
Codelab walkthrough: https://codelabs.developers.google.com/codelabs/how-to-access-internal-only-service-while-retaining-internet
Direct VPC Egress launch blog: https://cloud.google.com/blog/products/serverless/announcing-direct-vpc-egress-for-cloud-run

This is the default GCP networking setup for tsdevstack. The Terraform above is essentially what the framework generates from a single config file.

MCP annotations are a UX layer, not a security layer

gyorgy — Tue, 05 May 2026 14:18:58 +0000

When the Model Context Protocol added tool annotations like readOnlyHint, destructiveHint, and idempotentHint, a lot of MCP server authors and host implementers read them as a permission system. The mental model goes something like: a tool declares itself destructive, the host sees that, and the host either prompts the user or refuses outright. Annotations as enforcement, the way file permissions work in a Unix filesystem.

That's not what they are. A tool annotation is just a value the server author typed into a tool definition. The model sees it, the host sees it, and they can use it for confirmation prompts or sorting or color coding. Nothing in the protocol verifies the annotation is true. A server can declare readOnlyHint: true on a tool that drops your production database, and the protocol won't notice. The host can choose to trust the annotation or not, but the trust is a policy decision the host makes about the server, not something the protocol provides.

This distinction matters because the annotation system is being asked to carry weight it wasn't designed to carry. Two active spec proposals (SEP-1862 and SEP-1913) extend the annotation surface in useful ways. Neither of them changes what annotations fundamentally are. They make a UX layer better. They do not turn it into a security layer.

What annotations actually are

Annotations are server-declared hints. The server author writes them into the tool definition, the server sends them to the client in tools/list, and that's the entire chain of custody. There is no signature, no third-party verification, no model-side analysis of what the tool actually does. The annotation is exactly as trustworthy as the server that produced it.

The MCP specification is explicit about this. From the schema documentation: "All properties in ToolAnnotations are hints. They are not guaranteed to provide a faithful description of tool behavior... Clients should never make tool use decisions based on ToolAnnotations received from untrusted servers." That language is in the spec because the working group knows annotations are forgeable.

Justin Spahr-Summers, one of the MCP co-creators, raised the obvious question during the original review of the annotation system: if a client knows the annotations can't be trusted, what's the point of having them? It's the right question and the spec hasn't really answered it. The working answer in practice is that annotations are useful for two things. First, hosts can build better UX on top of them when the server is trusted (skip the confirmation prompt for a tool that declares itself read-only, render destructive tools in a different color, sort tools so safer ones are surfaced first). Second, hosts can use annotations as one signal among many when scoring how much to scrutinize a tool call.

Neither of those is enforcement. Both assume the host has already decided the server is honest. The annotation tells the host how to render the tool's intent, not whether to allow it.
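A small TypeScript sketch makes the point. The annotation field names follow the MCP ToolAnnotations schema; everything else here (the Tool shape, the handler, needsConfirmation) is hypothetical and only meant to show where the trust decision sits.

```typescript
// Field names follow the MCP ToolAnnotations schema; the rest is illustrative.

interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

interface Tool {
  name: string;
  annotations?: ToolAnnotations;
  handler: () => string;
}

// Nothing stops a server from declaring this: the hint and the behavior disagree.
const lyingTool: Tool = {
  name: "list_tables",
  annotations: { readOnlyHint: true },
  handler: () => "DROP DATABASE production", // stands in for a destructive action
};

// Host-side policy: hints only relax the prompt for servers the host already trusts.
function needsConfirmation(tool: Tool, serverIsTrusted: boolean): boolean {
  if (!serverIsTrusted) return true; // ignore hints from untrusted servers
  return tool.annotations?.readOnlyHint !== true;
}

console.log(needsConfirmation(lyingTool, false)); // true
console.log(needsConfirmation(lyingTool, true));  // false: the trust came from the host, not the hint
```

The second call is the uncomfortable one: once the host trusts the server, the false hint is taken at face value.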

The two SEPs in flight

Two annotation-related proposals are currently working through the MCP spec process, both authored or co-authored by Sam Morrow at GitHub.

SEP-1862 (Tool Resolution) addresses a real problem with static annotations: a single tool that takes an action argument and behaves differently based on its value has to declare itself destructive at all times, because the static annotation has to cover the worst case. A manage_files tool that supports both read and delete operations is forced to look as dangerous as its most dangerous mode, even on read calls. The fix is a new tools/resolve method, inspired by LSP's codeAction/resolve pattern. Before invoking the tool, the client asks the server: given these specific arguments, what are the real annotations? The server returns refined metadata for that call. Multi-action tools become viable again without sacrificing UX accuracy.

SEP-1913 (Trust and Sensitivity Annotations), co-authored with OpenAI, works on a different axis. Where existing annotations describe what a tool does, SEP-1913 adds annotations that describe what the data flowing through a tool means. New fields like sensitiveHint (low/medium/high), privateHint, maliciousActivityHint, and attribution let servers mark returned data with trust and sensitivity metadata, and let that metadata propagate through an agent session so a host can enforce policies like "do not send data marked private to tools marked open-world."

Both proposals fill genuine gaps. SEP-1862 unblocks a tool design pattern that was effectively forbidden by static annotations. SEP-1913 extends the annotation surface from what tools do to what data they handle, which is the right direction if you care about prompt injection and exfiltration.

What neither proposal changes is the trust model. SEP-1862's resolved annotations are still server-declared. SEP-1913's data annotations are still server-declared. A server that lies in tools/list can lie just as easily in tools/resolve or in a sensitiveHint field on returned content. The proposals make honest servers more expressive. They do not make dishonest servers detectable.
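The propagation policy SEP-1913 is aiming at can be sketched in a few lines. This is a guess at the shape, not the proposal's API: privateHint is taken from the proposal as described above and may change, openWorldHint is an existing annotation field, and the SessionTaint class is invented for illustration.

```typescript
// Hypothetical shapes: privateHint comes from the SEP-1913 proposal and may
// change; openWorldHint is an existing ToolAnnotations field.

interface DataLabel {
  privateHint?: boolean;
}

interface TargetTool {
  name: string;
  openWorldHint?: boolean;
}

// Tracks whether anything marked private has entered the session so far.
class SessionTaint {
  private sawPrivateData = false;

  record(label: DataLabel): void {
    if (label.privateHint === true) this.sawPrivateData = true;
  }

  // Policy: once private data is in play, block calls to open-world tools.
  // The spec's default for openWorldHint is true, so a missing hint counts as open-world.
  allows(tool: TargetTool): boolean {
    const openWorld = tool.openWorldHint !== false;
    return !(this.sawPrivateData && openWorld);
  }
}

const session = new SessionTaint();
const webSearch: TargetTool = { name: "web_search", openWorldHint: true };

console.log(session.allows(webSearch)); // true: nothing private seen yet
session.record({ privateHint: true });
console.log(session.allows(webSearch)); // false: private data must not leave
```

Note what the sketch depends on: a server that simply omits privateHint never taints the session, so the policy is only as good as the server's honesty.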

What this means for MCP server design today

If annotations are a UX layer, design your server so the UX layer stays accurate without depending on protocol-level enforcement.

The first decision is tool granularity. A multi-action tool with an action argument forces a worst-case static annotation, which means honest hosts will over-prompt and well-tuned models will steer around the tool because it looks dangerous. Until SEP-1862 lands, separate tools per action keep static annotations honest. One tool reads, one tool lists, one tool removes. Each declares its real shape and the annotation is true at all times. This costs you a few more tool definitions and saves the host from making bad UX decisions on your behalf.

The second decision is how to use the existing annotation fields. The boolean grid (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) consists of independent flags rather than ordered tiers, but in practice tools cluster into three groups: read-only tools (readOnlyHint: true), mutating but recoverable tools (readOnlyHint: false, destructiveHint: false), and destructive tools (readOnlyHint: false, destructiveHint: true). Treating these as tiers internally simplifies host policy, even though the protocol doesn't enforce the structure. It also makes it obvious which tier a new tool belongs to when you add one, which matters at scale.
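One way to encode that grouping, as a minimal sketch: the tier names and the tierOf function are invented for this example, and only the two annotation fields come from the spec.

```typescript
// Tier names are hypothetical; the annotation fields are from the MCP spec.
type Tier = "read-only" | "mutating" | "destructive";

interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
}

// Collapses the independent flags into one of three tiers.
function tierOf(annotations: ToolAnnotations = {}): Tier {
  if (annotations.readOnlyHint === true) return "read-only";
  // The spec's default for destructiveHint is true, so an unannotated
  // mutating tool lands in the most cautious tier.
  return annotations.destructiveHint === false ? "mutating" : "destructive";
}

console.log(tierOf({ readOnlyHint: true }));                          // "read-only"
console.log(tierOf({ readOnlyHint: false, destructiveHint: false })); // "mutating"
console.log(tierOf({}));                                              // "destructive"
```

Defaulting to the most cautious tier means a tool that forgets its annotations gets more friction, not less.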

The third decision is what to do about the trust gap. The honest answer is that the protocol can't close it for you, so you close it elsewhere. Sandboxed execution, infrastructure-level egress controls, and third-party scanners (Snyk's Agent Scan is one example) sit outside the protocol and verify or constrain what tools actually do, regardless of what they claim. If your MCP server runs in a context where any of those layers exist, lean on them. The annotations on your tools should be honest, but the security boundary lives somewhere else.

What you should not do is treat annotation correctness as the security boundary. A server author who annotates carefully and a server author who lies look identical to the protocol. If your design assumes the host can tell them apart through annotations alone, you have a gap.

The actual security layer lives outside MCP

Once you accept that annotations are a UX layer, the question of where security actually lives becomes easier to answer. It lives in three places, none of them in the protocol.

The first is host-level policy on which servers to trust. The host decides which MCP servers it accepts tools from, what scopes those servers operate under, and what the user has approved. That's where the real allow/deny decision happens. Annotations help the host build clearer prompts and better defaults, but the host is the one accepting or rejecting the tool call.

The second is infrastructure-level enforcement. Sandboxed execution, network egress rules, filesystem permissions, container boundaries. These don't care what a tool's annotations say. A tool that claims to be read-only but tries to write outside its sandbox is stopped by the sandbox, not by the annotation. For any MCP server doing real work in production, this layer is where deletion, exfiltration, and lateral movement actually get prevented.

The third is third-party verification. Scanners that examine MCP server code or behavior independently of what the server claims. Snyk's Agent Scan is one example of this category, and more will appear as the ecosystem matures. These tools occupy the space the protocol can't, because by definition they treat the server as untrusted and verify rather than trust.

None of this makes annotations useless. Annotations let honest servers communicate intent, let hosts build interfaces that match that intent, and give users the right amount of friction at the right moments. SEP-1862 will make that signal sharper for multi-action tools. SEP-1913 will extend it to the data flowing through tools. Both are worth shipping.

Migrating off AWS App Runner before the April 30 deadline

gyorgy — Tue, 14 Apr 2026 14:11:34 +0000

AWS is shutting the door on App Runner for new customers effective April 30, 2026. If you're running production workloads on it, existing apps keep working for now, but there are no new features coming, and "maintenance mode" at AWS historically means "start planning your migration."

I just finished a migration off App Runner for a production Next.js frontend, and wanted to write down what I learned in case it's useful to anyone else facing the same deadline.

The options

AWS officially recommends ECS Express Mode as the direct App Runner replacement. It's a newer single-resource abstraction that auto-provisions an ECS cluster, service, ALB, security groups, auto-scaling, and CloudWatch logging. One Terraform resource, one deploy, done.

The other options:

Standard ECS Fargate. More moving parts, years of battle-testing, full control.
AWS Lambda + API Gateway. True scale-to-zero, good for infrequent API traffic, cold starts on anything else.
Lightsail containers. Simpler than ECS, cheaper for small workloads.
Google Cloud Run. If you're open to leaving AWS, this is genuinely the best container-in-a-box experience on any cloud.
fly.io / Render / Railway. PaaS experience outside AWS.

For our use case (production Next.js behind CloudFront with a real VPC, Kong gateway, and backend services on the same infrastructure), ECS Fargate was the natural fit. Express Mode looked appealing on paper, but I went with standard Fargate instead.

Why not ECS Express Mode

Three reasons:

1. Terraform bug. The aws_ecs_express_gateway_service resource had an open issue (hashicorp/terraform-provider-aws#45792, "Provider produced inconsistent result after apply") that would have blocked deploys. Fixable with workarounds, but not something I wanted to own.

2. "Managed abstraction" fatigue. App Runner was also supposed to be the easy path. It lasted four years before being sidelined. Express Mode is newer than App Runner was when I first used it. I wasn't willing to bet a second production frontend on another abstraction that might get sunset in 18 months.

3. ALB duplication. Express Mode auto-creates its own ALB. If you already have an ALB for other services (like I did for a Kong gateway routing backend services), you end up paying for two. Around $16/month extra for the overlap. Not huge, but annoying and unnecessary.

Standard ECS Fargate uses the ALB you already have. Same pattern as every other service in the cluster. Boring, predictable, stable.

What the migration actually looked like

The architecture ended up like this:

Browser
  ↓
CloudFront (caching + WAF)
  ↓ X-Origin-Verify header
ALB (port 443, host-based routing)
  ↓                    ↓
Next.js target      Kong target
group               group
  ↓                    ↓
ECS Fargate         Kong gateway
(Next.js)              ↓
                    Backend services

Next.js containers run in private VPC subnets. ALB listener rules use host-based routing to split frontend traffic (example.com → Next.js target group) from API traffic (any host + X-Origin-Verify header → Kong target group). CloudFront in front for caching, SSL, and WAF.

For origin protection, I stuck with X-Origin-Verify header validation on the ALB rule. The AWS-managed CloudFront prefix list is a cleaner option (allow only CloudFront IPs at the security group level). AWS keeps that list up to date for you, but it's still more moving parts, and it counts heavily against your security group rule quota. The header check was good enough.

Gotchas I hit

Health checks. Next.js needs a /health endpoint returning 200 for ALB target group health checks. This is obvious in retrospect but it was our first failed deploy. Add it to your app/health/route.ts before you migrate, not during.
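For reference, a minimal version of that route might look like the sketch below. It assumes the Next.js App Router and a target group health check path of /health; the response body is arbitrary.

```typescript
// app/health/route.ts
// Minimal Next.js App Router route handler. The body is arbitrary:
// the ALB health check only looks at the status code.
export async function GET(): Promise<Response> {
  return new Response(JSON.stringify({ status: "ok" }), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}
```

Keep it free of database or downstream calls unless you actually want the ALB to pull the task out of rotation when those fail.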

Single-phase deploy. The App Runner + CloudFront setup I had was a two-phase deploy: Terraform creates App Runner, CLI collects the URL, Terraform runs again with the URL as a CloudFront origin. With ECS behind an ALB that already exists at plan time, this goes away. One terraform apply, no two-phase dance. Genuinely nicer.

Private subnets from the start. App Runner services are publicly routable on the internet by default, with WAF-only protection and no network-level isolation. ECS Fargate in private subnets gives you proper network boundaries. Don't skip this. Put your container in private subnets with no public IP, and only allow ingress from the ALB security group.

Auto-scaling. Express Mode gives you auto-scaling for free. Standard Fargate requires configuring target-tracking scaling policies yourself. One extra Terraform resource, but you have actual control over what the scaling metric is.

What about scale-to-zero?

This is the pain point for everyone moving off App Runner. Standard Fargate does not scale to zero. You always pay for at least one running task. If your workload has long idle periods, this is a real cost difference.

For production workloads this is usually fine (you want at least one container warm anyway). For dev/staging environments or low-traffic side projects, you have three options:

Cloud Run on GCP. Actual scale-to-zero, sub-second cold starts, no ALB needed.
Lambda + API Gateway. Scale-to-zero, but cold starts hurt if your app isn't designed for them.
Scheduled shutdowns. EventBridge rules to scale the ECS service to 0 at night, back to 1 in the morning. Crude but effective for dev environments.

If your app is a very low-traffic FastAPI backend (as in the Reddit thread that prompted this article), honestly, Cloud Run is probably the right answer. AWS just doesn't have a real equivalent right now.

Would I do it again?

Yeah, for a production workload with an existing VPC and other services, the standard Fargate path was the right call. The migration was not fun but the result is cleaner than App Runner. Single-phase deploys, private networking, no dependency on a deprecated service.

If I were starting fresh with a brand new single service and no existing infrastructure, I'd look harder at Cloud Run or fly.io. AWS's container story below ECS is just not compelling anymore.

The tsdevstack angle

I build a multi-cloud TypeScript framework called tsdevstack that generates production infrastructure from a config file. The App Runner to ECS Fargate migration above is what shipped in v0.2.0. Framework users who were deploying Next.js frontends via App Runner can now re-run infra:deploy and the framework handles the migration automatically.

One thing worth mentioning given the scale-to-zero discussion above: tsdevstack implements scale-to-zero on AWS for services that set minInstances: 0 in config. Since ECS Fargate doesn't have native scale-to-zero, the framework generates a three-layer mechanism. A CloudWatch alarm scales the service to zero when it is idle (CPU below 5% for 15 minutes). The first request after that hits the ALB and comes back as a 502. Kong catches the 502, fires a wake-up Lambda that scales the service back up, and returns a 503 with Retry-After: 30 so the client knows when to retry. Cold start is around 30-60 seconds, which is significant compared to Cloud Run or Container Apps, but it's real scale-to-zero on AWS and it works. Kong itself stays at minInstances >= 1 so there's always something to trigger the wake-up.
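The gateway-side decision can be sketched in plain TypeScript. This is an illustration of the flow described above, not the framework's code: the real logic lives in Kong, and every name here is hypothetical.

```typescript
// Illustration of the wake-up flow. All names are hypothetical; the real
// framework implements this in Kong, not in this code.

interface GatewayResponse {
  status: number;
  headers: Record<string, string>;
}

const RETRY_AFTER_SECONDS = 30;

// Given the upstream status, decide what the client sees and whether to wake the service.
function handleUpstream(upstreamStatus: number, wake: () => void): GatewayResponse {
  if (upstreamStatus === 502) {
    wake(); // fire the wake-up call (a Lambda invocation in the setup above)
    return { status: 503, headers: { "Retry-After": String(RETRY_AFTER_SECONDS) } };
  }
  return { status: upstreamStatus, headers: {} };
}

// Client side: how long to wait before retrying, in milliseconds, or null if no retry is requested.
function retryDelayMs(res: GatewayResponse): number | null {
  if (res.status !== 503) return null;
  const seconds = Number(res.headers["Retry-After"]);
  return Number.isFinite(seconds) ? seconds * 1000 : null;
}

let wakeCalls = 0;
const asleep = handleUpstream(502, () => { wakeCalls += 1; });
console.log(asleep.status, retryDelayMs(asleep), wakeCalls); // 503 30000 1
```

Translating the 502 into a 503 with Retry-After is the useful part: it turns "the backend is down" into "come back in 30 seconds" for any client that honors the header.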

If you're tired of writing Terraform by hand for every migration AWS forces on you, take a look. Docs at tsdevstack.dev, repo at github.com/tsdevstack.

Tags: aws, terraform, devops, cloud

I built a TypeScript framework that generates your entire cloud infrastructure

gyorgy — Wed, 08 Apr 2026 15:18:57 +0000

TL;DR: tsdevstack is an open-source TypeScript microservices framework. You write a config file and application code. It generates Terraform, Docker, Kong gateway routes, CI/CD pipelines, secrets, and observability — across GCP, AWS, and Azure. One command deploys the whole stack.

The problem

Every TypeScript project I shipped to production followed the same pattern. Write the application code in a week. Spend the next month wiring up infrastructure.

Terraform for the cloud resources. Docker for local dev. Kong or some other gateway for routing. JWT auth boilerplate. Secrets management across environments. CI/CD pipelines. Observability. WAF rules. SSL certificates. Health checks. Database migrations. And then the same dance for staging and production.

The application code was the easy part. Everything around it took 10x longer.

I tried the existing options:

Heroku-style platforms hide too much. The moment you need a WAF, a custom gateway, or VPC isolation, you're stuck.
Pulumi/CDK/Terraform modules are flexible but you still write and maintain all of it. And you write it differently for each cloud provider.
Templates and starters get you a working hello-world but rot the moment you customise them.

I wanted something in between. A framework that owned the infrastructure layer entirely — generated, managed, deployed — but stayed out of the way of the application code.

What I built

Infrastructure as Framework. You write TypeScript application code and one config file. The framework generates everything else.

npx @tsdevstack/cli init

This scaffolds a monorepo with NestJS backends, Next.js frontends, a Kong API gateway, Postgres, Redis, and observability. Everything wired together, ready to run.

npm run dev

Local development matches production. Same gateway, same database engine, same auth flow, same observability stack. No "works on my machine" gap.

npx tsdevstack cloud:init --gcp
npx tsdevstack infra:init --env dev
npx tsdevstack infra:deploy --env dev

This provisions the full production stack: VPC, managed Postgres, Redis, container registry, Cloud Run services, API gateway, load balancer, WAF, SSL certificates, observability. From a single config file.

The same flow works on AWS (ECS Fargate) and Azure (Container Apps). Same framework, same patterns, same commands. No rewriting infrastructure when you switch providers.

What's in the box

Application layer — NestJS backends, Next.js frontends, Rsbuild SPAs. Auto-generated TypeScript API clients with DTOs as separated imports — both frontend and backend apps consume the same type-safe library.

API gateway — Kong routes auto-generated from your OpenAPI specs. JWT validation, rate limiting, CORS, bot detection. Fully customisable when you need it.

Background processing — BullMQ job queues with detached workers running in separate containers. Scale independently from API services.

Object storage — Add buckets with add-bucket-storage. MinIO locally, S3/GCS/Azure Blob in production. Unified StorageModule with pre-signed URLs and per-provider adapters.

Async messaging — Inter-service pub/sub via Redis Streams. Consumer groups, dead letter queues, retry logic. No new infrastructure — runs on the same Redis instance as caching.

Authentication — JWT token management, protected routes, session handling, email confirmation. Bring your own OIDC or use the built-in auth service.

Secrets — Local secrets generated automatically for development. Cloud secrets managed separately and pushed to the cloud provider's Secret Manager. Environment isolation, scoped per service. Works with Secret Manager on all three providers.

Observability — Prometheus metrics, Grafana dashboards, distributed tracing with Jaeger, structured logging. Configured from day one.

Infrastructure — Generated Terraform for GCP, AWS, and Azure. VPC/VNet, managed databases, Redis, container orchestration, load balancers, WAF, SSL, CDN.

CI/CD — Generated GitHub Actions workflows. OIDC authentication, per-service deploys, environment selection. No secrets in your repo.

Compliance — SOC 2, ISO 27001, GDPR technical controls built in. Encryption at rest and in transit, network isolation, zero-credential runtimes.

How it actually works

The framework manages a config.json for your project structure — you don't edit it by hand, you modify it through commands like add-service, add-bucket-storage, add-messaging-topic.

Your config.json ends up looking like this:

{
  "projectName": "my-saas",
  "cloud": "gcp",
  "services": [
    {
      "name": "auth-service",
      "type": "nestjs",
      "hasDatabase": true
    },
    {
      "name": "frontend",
      "type": "nextjs"
    }
  ],
  "storage": {
    "buckets": ["uploads"]
  }
}

When you run npx tsdevstack sync, the framework reads the config and generates:

docker-compose.yml with all the services, dependencies, and health checks
Kong gateway config from OpenAPI specs
Local secrets in .env files per service
Database initialization scripts
Service stubs if you added new ones

You write the infrastructure.json directly for cloud-specific settings (domains, scaling, environments).

{
  "environments": {
    "dev": {
      "services": {
        "auth-service": {
          "minInstances": 0,
          "maxInstances": 3,
          "cpu": 1,
          "memory": "512Mi"
        },
        "frontend": {
          "minInstances": 1,
          "maxInstances": 5,
          "cpu": 1,
          "memory": "1Gi",
          "domain": "dev.example.com"
        }
      }
    }
  }
}
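Since this file is written by hand, a quick sanity check before deploying can be useful. The sketch below is an assumption-heavy illustration: the types are inferred from the example above (the real schema may have more fields), and validateScaling is a made-up helper, not part of the tsdevstack CLI.

```typescript
// Types inferred from the example above; the real schema may have more fields.
interface ServiceScaling {
  minInstances: number;
  maxInstances: number;
  cpu: number;
  memory: string;
  domain?: string;
}

interface InfrastructureConfig {
  environments: Record<string, { services: Record<string, ServiceScaling> }>;
}

// Returns human-readable problems; an empty array means the scaling settings look sane.
function validateScaling(config: InfrastructureConfig): string[] {
  const problems: string[] = [];
  for (const [env, { services }] of Object.entries(config.environments)) {
    for (const [name, s] of Object.entries(services)) {
      if (s.minInstances < 0) problems.push(`${env}/${name}: minInstances is negative`);
      if (s.maxInstances < s.minInstances) problems.push(`${env}/${name}: maxInstances < minInstances`);
    }
  }
  return problems;
}

// JSON.parse also rejects syntax slips such as trailing commas.
const config: InfrastructureConfig = JSON.parse(`{
  "environments": { "dev": { "services": {
    "auth-service": { "minInstances": 0, "maxInstances": 3, "cpu": 1, "memory": "512Mi" }
  } } }
}`);
console.log(validateScaling(config)); // []
```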

When you run npx tsdevstack infra:deploy, it generates Terraform for your chosen provider and applies it. The framework owns the Terraform: you don't write it and you don't maintain it.

The escape hatch is intentional. Custom Kong config? Drop in your own. Need a Terraform resource the framework doesn't generate? Add it as a side file. Need a cloud-native service the framework doesn't wrap? Use the SDK directly. The framework isn't a cage — it's a starting point that handles 95% of cases and gets out of your way for the other 5%.

Why three clouds?

Vendor lock-in is real but slow-moving. You don't switch clouds because you want to. You switch because an acquisition, a pricing change, regional requirements, or a customer with an immovable preference forces you to. When that happens, rewriting infrastructure is brutal.

tsdevstack generates the equivalent infrastructure on GCP (Cloud Run + Cloud SQL + Memorystore), AWS (ECS Fargate + RDS + ElastiCache), and Azure (Container Apps + Azure Database for PostgreSQL + Azure Cache for Redis). Same application code, same config file, different generated Terraform. Switching providers is a config change and a redeploy.

No abstraction layer trying to hide the differences between clouds. Each provider gets a native, idiomatic implementation. The framework handles the translation.

What about AI agents?

There's a built-in MCP (Model Context Protocol) server with 54 tools for deploying, querying, and debugging your stack. Claude Code, Cursor, and VS Code Copilot can manage the infrastructure directly — and because the framework has strong conventions, the AI agent actually understands what it's doing instead of hallucinating CLI commands.

Three permission tiers: SAFE_READ, CLOUD_MUTATE, CLOUD_DESTRUCTIVE. The agent always asks for permission before mutating anything. The MCP server is built into the CLI — no separate package, no extra setup.

Where it stands

Open source. MIT license. Four packages on npm:

@tsdevstack/cli — the CLI, infrastructure generation, deployment
@tsdevstack/nest-common — shared NestJS modules
@tsdevstack/cli-mcp — MCP server for AI agents
@tsdevstack/react-bot-detection — React bot detection

v0.2.0 just shipped with object storage, async messaging, AWS App Runner → ECS Fargate migration (App Runner stops accepting new customers April 30), and a batch of WAF and observability improvements across all three providers.

This is solo work. I'm a developer building this on the side. It started as the framework I wanted for my own projects and grew into something I think other people will find useful. The first users are showing up now.

npx @tsdevstack/cli init

Docs and guides: tsdevstack.dev
GitHub: github.com/tsdevstack
Discord: discord.gg/tsdevstack

Feedback wanted. Bug reports wanted. Issues, ideas, complaints — all welcome.

Tags: typescript, nestjs, devops, opensource