DEV Community: Boris Teplitsky

Compiled AI: Engineering Deterministic LLM Systems

Boris Teplitsky — Mon, 15 Jun 2026 08:29:24 +0000

Moving the LLM from runtime to compile time - and what to build around the corpus it produces.

1. Why compiled AI

Today millions of people use LLMs for work and leisure, and AI has become part of our lives. But systematic use of LLMs in computer systems runs into difficulties: it turns out that LLMs know "everything" except what you really need. They have no knowledge of your business processes, your rules, or your data. MCP and agents address this problem by binding the LLM to information about your particular domain.
Another issue is the inconsistency of LLM results. Yes, today you can ask AI to compose a poem or to suggest a medical diagnosis, but if you ask again you will likely receive a different poem and a different diagnosis. Compiled AI addresses this issue. It proposes the following paradigm: the LLM processes source information and transforms it into schemas (the corpus); runtime systems use the corpus to produce fully deterministic, traceable results. It's clear that you can't "compile" mass knowledge from different sources the way a pure LLM does - but for a wide set of specific tasks, this approach can really help.

2. A Two-Phase Architecture

In this article we focus on a class of systems which cannot tolerate non-deterministic results. The architecture consists of two phases: Phase A - building the corpus (compile time); Phase B - using the corpus for deterministic execution. By corpus we mean the structured output of the compile phase - schemas, templates, generated code, validation rules - everything the deterministic runtime needs to execute the workflow. This is not a training corpus or a retrieval index; it is the compiled program.

The main actor of the compile phase is the LLM (in practice, several different LLMs). The LLMs draw on standards, documents, schemas, user data, and other structured and unstructured sources to build the structured artifacts the execution phase needs. These artifacts are schemas, templates, and code fragments - validators, composers, generators. Artifacts produced by LLMs can contain mistakes, be incomplete, or simply be wrong, so before using them we have to verify and test them. Once we trust the corpus, we treat its publication as a release - the same release process traditional software systems use. From then on, the deterministic system uses the corpus - often with a human operator involved - for ongoing operation.
Running a Compiled AI system depends largely on the particular system, but there are two issues common to the entire class.
1) Contradictions between parts of the corpus.
Because a system receives information from different sources, very often the same corpus item ends up with different values. On its own this doesn't cause a problem - these values can live in the corpus "in peace" - but at runtime a situation can arise where two or more rules with contradictory requirements apply to the same element. For example, our system has to choose an encryption algorithm for some data: the general schema template says it should be method A, best practices recommend method B, regulatory documents require C, and a user wants D. Which one do we apply at runtime? We resolve this with a specific set of rules - for example, Python code, also part of the corpus - that sets the priorities of the different sources and decides which value is correct in this specific situation. We call it an overlapping mechanism.
2) Runtime issues.
Dealing with runtime bugs and issues rests on a basic Compiled AI feature: backtrace capability. For every element of the runtime result, we can say which corpus element (or elements) is responsible for that value. What happens next depends on the kind of issue. In many cases you'll meet an ordinary bug, like in any other system: a human mistake, a typo, wrong syntax, and so on. But there may also be cases where the wrong result was caused by the LLM's output, and you'll have to investigate using LLM tools.
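The overlapping mechanism and the backtrace capability described above can be sketched together in a few lines of Python. This is a minimal illustration, not the implementation of any real product: the source names, the priority order, and the corpus identifiers (`tmpl-017`, `reg-042`, and so on) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical source priorities: the higher number wins.
# The ordering is an assumption made for this illustration only.
SOURCE_PRIORITY = {
    "schema_template": 1,
    "best_practice": 2,
    "user": 3,
    "regulation": 4,
}


@dataclass(frozen=True)
class Candidate:
    field: str      # corpus item the value applies to
    value: str      # proposed value
    source: str     # key into SOURCE_PRIORITY
    corpus_id: str  # corpus entry that produced the value (for backtrace)


def resolve(candidates: list[Candidate]) -> dict[str, Candidate]:
    """Pick one winning candidate per field, by source priority."""
    winners: dict[str, Candidate] = {}
    for cand in candidates:
        current = winners.get(cand.field)
        if (current is None
                or SOURCE_PRIORITY[cand.source] > SOURCE_PRIORITY[current.source]):
            winners[cand.field] = cand
    return winners


# The encryption example from the text: four sources, four different values.
candidates = [
    Candidate("encryption.algorithm", "A", "schema_template", "tmpl-017"),
    Candidate("encryption.algorithm", "B", "best_practice", "bp-203"),
    Candidate("encryption.algorithm", "C", "regulation", "reg-042"),
    Candidate("encryption.algorithm", "D", "user", "user-001"),
]

winner = resolve(candidates)["encryption.algorithm"]
# The winner carries its corpus_id, so the runtime value can be traced back.
print(winner.value, winner.source, winner.corpus_id)
```

Because every resolved value keeps a reference to the corpus entry that produced it, the same structure answers both questions: which value applies, and which corpus element is responsible for it.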

3. Testing and Verification

Verifying and testing Compiled AI systems is fundamentally different from testing traditional software. Traditional testing by QA teams, based on test cases prepared in advance, probably won't work in most cases. For example, in our system we have to use legal documents - an LLM transforms the relevant parts into JSON schemas. For an LLM this is a trivial job, but how do we verify the results? Hire a team of lawyers to read through the results (and explain to them what JSON is)? Not really.
A practical approach to verification and testing is to use another LLM. I propose the following methods:
generate results from the same prompt by a few LLMs and compare the results;
ask another LLM to verify the results against the sources;
test the whole corpus against an end-to-end test suite of the system's operation.
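The first method, comparing the output of several LLMs for the same prompt, can be sketched as follows. The sketch assumes the models return JSON and that their raw responses have already been collected; the model names and the sample payloads are hypothetical. Agreement between models is only a signal and does not prove that the result is correct.

```python
import json
from collections import Counter
from typing import Optional


def canonical(raw: str) -> str:
    """Parse a JSON document and re-serialize it with sorted keys."""
    return json.dumps(json.loads(raw), sort_keys=True, separators=(",", ":"))


def compare_outputs(outputs: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Return (majority canonical form or None, models that disagree with it)."""
    forms: dict[str, Optional[str]] = {}
    for model, raw in outputs.items():
        try:
            forms[model] = canonical(raw)
        except json.JSONDecodeError:
            forms[model] = None  # invalid JSON never counts as agreement
    counts = Counter(form for form in forms.values() if form is not None)
    if not counts:
        return None, sorted(forms)
    best, votes = counts.most_common(1)[0]
    if votes * 2 <= len(forms):
        return None, sorted(forms)  # no strict majority: everything needs review
    return best, sorted(m for m, f in forms.items() if f != best)


# Hypothetical responses from three models for the same prompt.
outputs = {
    "model_a": '{"min_days": 2190, "field": "retention"}',
    "model_b": '{"field": "retention", "min_days": 2190}',
    "model_c": '{"field": "retention", "min_days": 2555}',
}
majority, dissenters = compare_outputs(outputs)
print(majority)
print(dissenters)
```

Key order and whitespace are normalized before comparison, so two responses that differ only in formatting count as the same answer. Anything without a strict majority goes to a human.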

Part of verification - for example, the syntax of schemas and templates - can be done manually or by simple scripts. It's worth doing if it fits the testing flow; otherwise this work, too, can be handed to an LLM.
Testing the whole corpus against an end-to-end suite is a more effective method but has obvious gaps. A substantial number of test cases have to cover all aspects of production operation. Again - what do we do with the results? How can we be sure the system took into account all requirements of some legal document and interpreted it correctly? This is impossible without invoking LLMs. So testing and verification of Compiled AI systems rests mostly on LLMs: one LLM verifying another.
Despite the fact that most of the grunt work in testing is performed by LLMs, the development team has a central role in verifying and testing a Compiled AI system. It starts with planning: sporadically running LLM queries and comparing results, with no defined goals or flow, never leads to a trusted system. Creating the test prompts and evaluating the responses is human work: only a person can decide whether an LLM's answer actually addresses the question, and determine the next step.

4. Maintenance

Like all software, Compiled AI systems need to be maintained: fixing bugs and issues, adding features, shipping new minor and major releases. But Compiled AI has an additional source of change and an additional thing to maintain: the corpus. If we used certain information sources during the compile phase, then during the system's operation we have to track all of them and promptly update the corpus. Note that while a product's releases can be planned years in advance, changes to the corpus have to be more or less synchronized with changes in the sources the corpus is based on. This is a hard constraint. You can't build a Compiled AI system on sources that change daily - such a system is impossible to maintain. Fortunately, many systems can be built on rarely-changing sources: company documentation, APIs, regulatory and other legal documents, and so on. The human part of this work is to track which of the used sources has a new version, evaluate the changes - especially their impact on the corpus - invoke LLMs as needed, and verify and test the new version of the corpus. So the corpus version won't always match the product version.
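Tracking which sources have changed can be partly automated. One possible approach, sketched below, records a content fingerprint for each source at compile time and later lists the corpus entries whose source no longer matches. The manifest layout, entry ids, and source names are assumptions for the sake of the example.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash of a source document as it looked at compile time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_entries(manifest: dict, current_sources: dict) -> list:
    """List corpus entries whose source changed or disappeared.

    manifest maps a corpus entry id to (source name, recorded fingerprint).
    """
    stale = []
    for entry_id in sorted(manifest):
        source, recorded = manifest[entry_id]
        text = current_sources.get(source)
        if text is None or fingerprint(text) != recorded:
            stale.append(entry_id)
    return stale


# Hypothetical manifest written when the corpus was compiled.
manifest = {
    "log-retention-rule": ("regulation-doc", fingerprint("source text v1")),
    "naming-convention": ("internal-guide", fingerprint("guide text v1")),
}
# Current state of the sources: the regulation document has a new version.
current_sources = {
    "regulation-doc": "source text v2",
    "internal-guide": "guide text v1",
}
print(stale_entries(manifest, current_sources))
```

The script only says where to look. Evaluating the impact of a change on the corpus remains human work, as described above.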
For testing the corpus's new version, we recommend a "golden test suite" method. The team selects one or more cases that already pass all required tests and uses them as a baseline for future product versions. If, after changes to the product, the golden suite produces different results, either the new version is incorrect - or the change was significant enough that the golden cases themselves need to be re-verified.
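A golden suite check can be as simple as comparing a digest of the generated output with a stored baseline. The sketch below assumes the generator returns a mapping of file names to file contents; `generate_v1` and `generate_v2` are toy stand-ins for two versions of a real generator, and the case name is hypothetical.

```python
import hashlib


def bundle_digest(files: dict[str, str]) -> str:
    """Hash file names and contents in sorted order, so dict order is irrelevant."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(files[name].encode("utf-8") + b"\0")
    return digest.hexdigest()


def failed_golden_cases(generate, golden: dict) -> list[str]:
    """Return the names of golden cases whose output no longer matches."""
    return sorted(
        name
        for name, (spec, expected) in golden.items()
        if bundle_digest(generate(spec)) != expected
    )


def generate_v1(spec: dict) -> dict[str, str]:
    return {"logging.tfvars": f"default_retention_days = {spec['retention_days']}\n"}


def generate_v2(spec: dict) -> dict[str, str]:
    # Simulated regression: the new version ignores the spec value.
    return {"logging.tfvars": "default_retention_days = 30\n"}


spec = {"retention_days": 2190}
golden = {"hipaa-basic": (spec, bundle_digest(generate_v1(spec)))}

print(failed_golden_cases(generate_v1, golden))
print(failed_golden_cases(generate_v2, golden))
```

A non-empty result does not say which side is wrong. It only tells the team that either the new version or the golden case needs another look.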

5. Examples

General note

A short note about the Compiled AI examples. In these examples I want to show how Compiled AI may be implemented. Real systems are far more complicated - I'm deliberately keeping things simple in order to explain the approach, not to ship a production-ready product. Please look at the examples from this perspective. That said, I'd be happy to discuss any ideas about real Compiled AI implementations - leave a comment.

Code Security Scan with an extended rule set

Problem: Many companies use Semgrep to scan their codebase for security issues. Standard rule sets don't cover every source of code vulnerabilities and do not include company-specific experience and requirements.
Sources: Incident reports (free text), CVE descriptions, internal post-mortem documents, OWASP guidance, language-specific best practices.
Runtime: running Semgrep with the extended rule set (the corpus) against the company codebase.

Building corpus:
Example of a prompt:

SYSTEM
You are a Semgrep rule generator. Your job is to convert prose security
guidance into valid Semgrep YAML rules that statically detect violations
of the guidance.
Output contract:
- Return ONLY valid Semgrep YAML. No prose, no explanations, no markdown fences.
- The output MUST validate against the Semgrep rule schema (rules: array
with id, pattern or patterns, message, severity, languages, metadata).
- Use kebab-case for rule IDs, prefixed with the source: e.g.
"owasp-sqli-string-concat-python".
- Every rule MUST include a metadata block with:
source: "OWASP SQL Injection Prevention Cheat Sheet"
source_url: <the URL you fetched>
source_fetched_at: <ISO 8601 timestamp of fetch>
cwe: <relevant CWE id>
owasp_top_10: <relevant entry, e.g. "A03:2021 Injection">
- Generate one rule per (language, anti-pattern) combination. Cover
Python, Java, and JavaScript / TypeScript at minimum. Add other
languages only if the guidance clearly applies.
- For each language, target the common database libraries:
Python: sqlite3, psycopg2, mysql.connector, pymysql
Java: java.sql.Statement, JDBC
JS/TS: pg, mysql, mysql2, sequelize raw, knex.raw
- Severity: ERROR for direct user-input-to-query flows; WARNING for
string concatenation patterns where taint cannot be statically proven.
- Prefer Semgrep's `patterns:` block with `pattern` + `pattern-not` over
a single `pattern:` when needed to reduce false positives.
- Do NOT invent CWE numbers or OWASP entries; if uncertain, omit the
field rather than guess.
Output structure:
- Return a single YAML document with a top-level `rules:` array.
- The document MUST start with `rules:` as the first key (no surrounding
wrapper object, no extra top-level keys).
- All rules generated from this source go into the same document.
- The output is intended to be saved as a single file named
`owasp-sqli.yml` and consumed by Semgrep via:
semgrep --config owasp-sqli.yml <code-path>
USER
Fetch the OWASP SQL Injection Prevention Cheat Sheet from:
https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html
Generate Semgrep rules that detect violations of the guidance in that
document. Focus on the cheat sheet's primary defense recommendations
(parameterized queries, ORMs, stored procedures) and the anti-patterns
it warns against (string concatenation, dynamic SQL with user input,
escaping as a primary defense).
If you cannot fetch the URL, stop and report the failure rather than
producing rules from training-data recollection of the document.

Prompt results: the picture shows a fragment of an LLM response.

This is a working prompt; you can try it in different LLMs. Keep in mind that LLMs make mistakes, and invoking this prompt in different LLMs will likely give you different results. Building a Compiled AI system means putting LLM outputs through a validation gauntlet - cross-checking across models, filtering invalid output, running the results against test cases - before anything enters the corpus. In short, Compiled AI is about building a trusted corpus from untrusted LLM outputs.
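One early stage of such a gauntlet is a structural filter that rejects output violating the prompt's own contract. The sketch below works on rules that have already been parsed from YAML into Python dictionaries; the required keys come from the prompt above, while the function name and the sample rules are hypothetical. It is not a substitute for validating the file with Semgrep itself.

```python
import re

KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
REQUIRED = ("id", "message", "severity", "languages", "metadata")
# cwe and owasp_top_10 may be omitted: the prompt tells the LLM not to guess.
REQUIRED_METADATA = ("source", "source_url", "source_fetched_at")


def check_rule(rule: dict) -> list[str]:
    """Return a list of contract violations for one parsed rule."""
    problems = [f"missing key: {key}" for key in REQUIRED if key not in rule]
    if "pattern" not in rule and "patterns" not in rule:
        problems.append("missing pattern or patterns")
    rule_id = rule.get("id", "")
    if not isinstance(rule_id, str) or not KEBAB.match(rule_id):
        problems.append("id is not kebab-case")
    if rule.get("severity") not in ("ERROR", "WARNING"):
        problems.append("severity must be ERROR or WARNING")
    metadata = rule.get("metadata") or {}
    problems += [
        f"missing metadata: {key}" for key in REQUIRED_METADATA if key not in metadata
    ]
    return problems


# Hypothetical parsed rules, as an LLM might return them.
good_rule = {
    "id": "owasp-sqli-string-concat-python",
    "pattern": 'cursor.execute("..." + $X)',
    "message": "SQL query built by string concatenation",
    "severity": "WARNING",
    "languages": ["python"],
    "metadata": {
        "source": "OWASP SQL Injection Prevention Cheat Sheet",
        "source_url": "https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html",
        "source_fetched_at": "2026-06-01T00:00:00Z",
    },
}
bad_rule = {"id": "OWASP_SQLi", "pattern": "...", "message": "x",
            "severity": "HIGH", "languages": ["python"]}

print(check_rule(good_rule))
print(check_rule(bad_rule))
```

Rules that fail this filter never reach the more expensive stages, such as cross-model comparison or test runs against known vulnerable code.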

OPA Policy Agent

Many companies using Kubernetes defend their environment with OPA Gatekeeper, often starting from the open-source Gatekeeper Library. The library covers common admission rules. It doesn't cover your organization's own policies - and it shouldn't have to. That's where Compiled AI comes in: it takes your security team's plain-English policies and compiles them into Rego.

To illustrate how a corpus can be built, let's take two short controls from the CIS Kubernetes Benchmark:

Policy 1 - No privileged containers (CIS Kubernetes Benchmark v1.8, control 5.2.1):
Prose source (paraphrased): "Minimize the admission of privileged containers. Privileged containers can perform almost every action that can be performed directly on the host, severely undermining cluster security boundaries."
Rego output:

package kubernetes.admission

deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    container.securityContext.privileged == true
    msg := sprintf(
        "Privileged container '%v' not allowed (CIS-5.2.1)",
        [container.name]
    )
}

Policy 2 - Block :latest image tags (operational hygiene, often cited alongside CIS):
Prose source: "Container images must be deployed using immutable tags. Images using the ':latest' tag or no tag at all must be rejected, since they prevent reproducible deployments and complicate rollback."
Rego output:

package kubernetes.admission

deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    endswith(container.image, ":latest")
    msg := sprintf(
        "Container '%v' uses ':latest' tag (not allowed)",
        [container.name]
    )
}

deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    not contains(container.image, ":")
    msg := sprintf(
        "Container '%v' has no tag (implicit :latest, not allowed)",
        [container.name]
    )
}

The compiled Rego is packaged into Gatekeeper ConstraintTemplate resources and applied to the cluster like any other Kubernetes object. Every kubectl apply then flows through OPA, which evaluates the policies and returns allow or deny - deterministic, fast, no LLM involved. The runtime is the OPA Gatekeeper your cluster already runs.
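Compiled policies are LLM output too, so they deserve the same scrutiny as any other corpus entry. One way to test them is to compare their logic with an independent reference check. The sketch below is plain Python written for this article and is not how OPA evaluates Rego: `naive_has_tag` mirrors the `contains(container.image, ":")` test from Policy 2, while `image_tag` is a hypothetical helper that looks for the tag only in the last path component. The registry host name is made up.

```python
from typing import Optional


def image_tag(image: str) -> Optional[str]:
    """Return the tag of an image reference, or None if it has no tag."""
    name = image.partition("@")[0]    # drop a digest such as @sha256:...
    last = name.rsplit("/", 1)[-1]    # a tag can only follow the last component
    if ":" in last:
        return last.rsplit(":", 1)[1]
    return None


def naive_has_tag(image: str) -> bool:
    """Mirror of the Rego test: any colon counts as a tag."""
    return ":" in image


for image in (
    "nginx",
    "nginx:1.25",
    "nginx:latest",
    "registry.example.com:5000/app",
):
    print(f"{image!r}: naive={naive_has_tag(image)}, tag={image_tag(image)!r}")
```

The last reference has a registry port but no tag. The colon check treats it as tagged, so the second `deny` rule would let it through. This is the kind of gap that test cases should catch before a policy enters the corpus.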

Multi-Framework Compliance

An American clinic stores its patients' medical data and also collects money from them using credit and debit cards. By law, the clinic's IT system must comply with two regulations: HIPAA (the Health Insurance Portability and Accountability Act) and PCI-DSS (the Payment Card Industry Data Security Standard). Additionally, the clinic has its own set of rules that constrain the content of its internal and external applications. For example:
Card-entry fields and clinical fields (diagnosis, medication, insurance ID) must never appear in the same component. Receipts may include patient name and service date, but no diagnosis or procedure codes. Any screen displaying full medical history must not be reachable from the public payment flow without re-authentication.
Building the corpus consists of: 1) compliance framework documents transformed into JSON; 2) internal rules transformed from free-text instructions into JSON; 3) application content (screens, data, flow) extracted by an LLM from the codebase and design documents. Compiled AI runs as part of CI/CD, which for each version verifies that there are no contradictions between the three corpus components.
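Once the rules and the application content are in structured form, the check itself is ordinary deterministic code. A minimal sketch of the first internal rule (card-entry and clinical fields must never share a component) might look like this. The clinical field names follow the rule text; the card field names and the component names are assumptions.

```python
# Clinical fields come from the rule text; card fields are assumed names.
CARD_FIELDS = {"card_number", "cvv", "expiry"}
CLINICAL_FIELDS = {"diagnosis", "medication", "insurance_id"}


def mixed_components(components: dict[str, set[str]]) -> list[str]:
    """Return components that contain both card-entry and clinical fields."""
    return sorted(
        name
        for name, fields in components.items()
        if fields & CARD_FIELDS and fields & CLINICAL_FIELDS
    )


# Hypothetical application content, as extracted into the third corpus part.
components = {
    "payment_form": {"card_number", "cvv", "patient_name"},
    "visit_summary": {"diagnosis", "medication"},
    "checkout_with_history": {"card_number", "diagnosis"},
}
print(mixed_components(components))
```

A CI/CD step can fail the build whenever the returned list is not empty.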
It looks like a working model, but with one caveat: the third component (codebase analysis) doesn't belong in the corpus at all - it's actually part of the runtime, so no one verifies the result of the LLM's codebase analysis. The arXiv paper on Compiled AI and INXM both discuss the concept and do allow LLM invocation at runtime. To avoid a theoretical discussion, I'll leave this example as "not pure Compiled AI".

6. Conclusion

The Compiled AI approach is very new, and there are really only a few sources describing it. I haven't found any source explaining the engineering aspects of Compiled AI.
This article is based on my experience of developing a real Compiled AI product. It doesn't cover every aspect of the approach, and some of my claims may not be relevant or may not fit every product or situation. But I believe Compiled AI can extend the use of LLMs into new areas - and with this article I've tried to make its implementation more practical.

Why I stopped letting LLMs write my Terraform

Boris Teplitsky — Mon, 18 May 2026 09:32:50 +0000

I am an IT architect and have been doing system automation for years. Working as a cloud architect lately, I see that landing zone setup really needs some automation. Hundreds (not millions) of parameters come from business and technical requirements and effectively predefine how the LZ should look.
The problem is that you can't get most of these parameters from the customer. They are scattered across dozens of standards, guides, manuals, and so on.
Great, an LLM is a perfect tool for this. It can gather information from different unstructured sources and build a set of parametrised templates (let's say JSON). Parameters come from the user, and I get an unambiguous spec of what we have to build.
Next step: create a Terraform script from the spec, execute it, and the LZ is ready. Give that to the LLM too.
But no... every change in parameters and conditions gives me another LZ. Now I have to go to my big boss and explain why, when he asked for one small change, I am bringing him something radically different from the first version.
Not on my shift.
So I asked the LLM to build Jinja2 templates and some code (generators) that create Terraform from the spec and the templates.
Turns out there's a name for what I ended up with: Compiled AI. LLMs build the templates, deterministic code runs them.
Technical details of how I did it: https://medium.com/google-cloud/compile-time-ai-for-gcp-landing-zones-2555560fbd2f

Compiled AI for GCP Landing Zones

Boris Teplitsky — Mon, 18 May 2026 06:48:01 +0000

How LLM-authored templates and deterministic generators replace runtime guesswork in complicated cloud foundations.

LLMs are spreading into more and more areas of work, but there are several where they cannot produce content directly: bank regulatory filings, executed legal contracts, medical prescriptions, audit attestations, aerospace maintenance procedures, and so on. Such outputs must be reproducible from the same inputs and auditable down to every value. The same requirements have to produce the same output, and any change in the output has to be explainable back to a change in the inputs.
A public cloud landing zone belongs to this group. The same business requirements, regulatory obligations and architectural decisions have to produce the same configuration. A configuration that drifts because of sampling temperature is not acceptable, and a landing zone is not an exception. It is the foundation a regulated business runs on.

Setting up a landing zone today typically consists of five steps:

1) Gathering business and technical requirements and constraints, including applicable regulations.
2) Selecting a reference architecture. It may be a similar project the team did recently, a public repository on GitHub, or a vendor blueprint. For GCP many teams use FAST; for AWS there are more options.
3) Making the parameter decisions: regions, VPC topology, key policy, IAM model, VPC-SC perimeters, DR pairings, and so on.
4) Writing or adapting the Terraform and YAML.
5) Validating and delivering. An iterative process, from syntax validation with terraform plan to deep analysis of how the LZ aligns with the requirements.

Steps 1 and 3 are the architect's job. They require a deep understanding of the company's goals and constraints, knowledge of public cloud foundations, and architectural thinking. Usually these steps go through a series of discussions with colleagues and stakeholders. Steps 2, 4, and 5 are less creative. They are mechanical, time-consuming, and frequently a source of mistakes that are hard to spot. Compile-time AI is aimed at this translation work. The judgment work stays with the architect.

One more thing worth saying before going further. Google’s own landing zone design documentation states that “this series does not specifically address compliance requirements from regulated industries such as financial services or healthcare.” The official guidance stops where regulated cloud foundations actually begin.

This article describes how to close this gap.
Phase A — building the corpus. This is the job of Merlin’s product team, not of the architect. The LLM reads framework documents, FAST modules, GCP best-practice guides, and practitioner literature, and drafts structured corpus entries: schemas, compliance rules, Jinja templates, validators. Each entity is reviewed by a human and passes tests before becoming a part of the corpus. The corpus is versioned. Nothing in it is the LLM’s unreviewed output.
Phase B — generating a landing zone. This is pure architect work performed using the Merlin application at https://app.merlin-studio.cloud. The process consists of three steps: discovery — defining business and technical requirements; configuration — supplying technical parameters; and finally generation — creating Terraform, YAML, diagrams and the scorecard based on corpus entities. There is no LLM in the runtime. The worker container does not even have an API key to a model provider. The same spec and the same corpus version produce identical output.

The corpus is the boundary between the two phases. Phase A produces it. Phase B uses it.

The corpus contains several kinds of entities. The list below is the short version.

Section schemas. JSON files that define what the architect can configure for one part of the landing zone — the fields, their types, and the defaults that kick in if the architect leaves them alone.
Output templates. Jinja2 templates that produce the files the engine emits: Terraform variables, Mermaid architecture diagrams, operator-facing documentation.
Compliance mappings. One JSON file per framework. Each file translates a regulatory regime into concrete spec requirements — allowed regions, restriction levels, rotation cadences, retention floors, and so on.
Validators. Python code that checks the generated bundle for structural correctness, cross-section consistency, and operational readiness, and produces the weighted score the architect sees in the scorecard.

Phase A produces all of these. Phase B reads them. The LLM never runs at Phase B.

Let’s trace one rule from the source document to the rendered Terraform.
The source is HIPAA. Specifically, 45 CFR §164.530(j).
TL;DR in my own words: a covered entity has to keep its HIPAA-related documentation — written policies, procedures, operational records — for six years from the date of creation or the date the document was last in effect, whichever is later. Audit logs are part of that documentation, so the same six-year floor applies to them.
Phase A. The LLM is given the list of corpus topics Merlin tracks and asked to build the HIPAA mapping. It mines its training data, finds the text relevant to log retention, recognises that the matching Merlin field is log_retention.default_retention_days, converts six years to days (2190), and emits a JSON entry that fits the schema for compliance rules. A human reviewer reads the entry against the source paragraph: is the field path right, is 2190 the right number, did the LLM hallucinate any clause that isn’t in the regulation? Verification of the framework as a whole is performed end to end by Merlin’s team before the framework is released for use.
The entry that lands in configuration/compliance_mappings/hipaa.json:

{
  "field": "log_retention.default_retention_days",
  "field_label": "Default Log Retention Period (Days)",
  "operator": "minimum",
  "minimum_value": 2190,
  "severity": "required",
  "rationale": "HIPAA requires 6-year retention of audit logs",
  "reference": "45 CFR 164.530(j)"
}

Phase B. An architect opens the wizard at app.merlin-studio.cloud and ticks HIPAA in the compliance section. Three things then happen.
Compliance preprocessor. For each active framework, the preprocessor walks the rule entries and applies them to the spec. For this rule, operator: minimum with minimum_value: 2190 means: write 2190 into the spec at log_retention.default_retention_days, unless the architect already set that field explicitly. After this pass the spec contains default_retention_days = 2190.
Section parser. Reshapes the spec into a flat dictionary that the template consumes. The retention value passes through unchanged.
Template render. The Jinja2 template for the logging section reads:

{% if values.log_retention is defined %}
log_retention = {
  default_retention_days = {{ values.log_retention.default_retention_days | default(30) }}
  custom_buckets = {
{% for bucket in values.log_retention.custom_retention_buckets | default([]) %}
    "{{ _lz_prefix }}-{{ bucket.name }}" = {
      retention_days = {{ bucket.retention_days }}
      locked = {{ bucket.locked | default(false) | lower }}
    }{{ "," if not loop.last }}
{% endfor %}
  }
}
{% endif %}

The renderer evaluates the template against the parsed spec and emits this block into the generated 08_logging_monitoring.tfvars:

log_retention = {
  default_retention_days = 2190
  custom_buckets = {}
}

Three corpus artifacts contributed to that single block: the compliance mapping (which set 2190), the section schema (which defined the field), and the Jinja2 template (which laid out the HCL). Every link in the chain is a pure function of its inputs. Same wizard spec, same corpus version, same generated value, every time.
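The compliance preprocessor step can be sketched in a few lines. This is an illustration of the behaviour described above and not Merlin's actual code: the helper names are hypothetical, only the `minimum` operator is handled, and a value set explicitly by the architect is left untouched, as the text describes. What a real system does with an explicit value below the minimum is not covered here.

```python
def get_path(spec: dict, path: str):
    """Read a dotted path from a nested dict; None if any part is missing."""
    node = spec
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(spec: dict, path: str, value) -> None:
    """Write a dotted path into a nested dict, creating parents as needed."""
    *parents, leaf = path.split(".")
    node = spec
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def apply_rules(spec: dict, rules: list) -> list:
    """Apply 'minimum' rules in listed order; return (field, reference) pairs."""
    trace = []
    for rule in rules:
        if rule["operator"] != "minimum":
            continue  # other operators are out of scope for this sketch
        if get_path(spec, rule["field"]) is None:
            set_path(spec, rule["field"], rule["minimum_value"])
            trace.append((rule["field"], rule["reference"]))
    return trace


hipaa_rules = [{
    "field": "log_retention.default_retention_days",
    "operator": "minimum",
    "minimum_value": 6 * 365,  # six years expressed in days: 2190
    "reference": "45 CFR 164.530(j)",
}]

spec = {}  # the architect left the field alone
trace = apply_rules(spec, hipaa_rules)
print(spec)
print(trace)

explicit = {"log_retention": {"default_retention_days": 3650}}
print(apply_rules(explicit, hipaa_rules), explicit)
```

The returned trace links each written value to the regulation reference, which is what makes the generated `default_retention_days = 2190` explainable after the fact.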
Official documents — compliance frameworks, Google’s best-practice guides, FAST blueprints — do not cover every parameter of a landing zone. There are decisions an architect has to make that no document prescribes: budget alarm thresholds, naming conventions, DR strategy tiers, and many smaller choices.
For these gaps, the LLM is the right tool: it mines its training data for the prevailing practice across vendor blogs, conference talks, FinOps and SRE literature, and the operational experience encoded in books and forums. It produces an entry against the same schema the compliance mappings use, with a default value that fits the conventional shape. A human reviewer judges whether the default is sensible for Merlin’s audience.
A small example. In 09_cost_management.json the default starter budget is $1000 with alert thresholds at 50%, 80%, and 100% of current spend, plus 100% of forecasted spend:

{
  "budget_amount": 1000,
  "alert_thresholds_percent": [
    {"percent": 50, "basis": "current"},
    {"percent": 80, "basis": "current"},
    {"percent": 100, "basis": "current"},
    {"percent": 100, "basis": "forecasted"}
  ]
}

Nothing in FedRAMP, HIPAA, or CIS prescribes these numbers. They are the conventional FinOps starter shape.
The entry that lands in the corpus looks no different from one derived from a regulation. The compliance preprocessor reads both the same way. Once the corpus is built, the question of where each value came from is a Phase A concern, settled before the architect opens the wizard.
Phase B is the part of Merlin a reader can verify themselves. The architect’s wizard session produces a single spec.json file: every choice from the discovery and configuration steps is captured there. Merlin’s worker takes that file plus the corpus — Jinja2 templates and Python code (generators, composers) — and produces the bundle. The claim is: same spec.json, same corpus version, identical output zip.
The mechanism is straightforward. The compliance preprocessor walks frameworks in their listed order and rule entries in their JSON-listed order. Section parsers are pure transformations of dicts. Generators and composers are pure Python: same input, same output. The Jinja2 environment is configured without any non-deterministic filters. The bundle assembler writes files into the zip in a fixed sorted order. There is no random sampling anywhere — there is no model in the loop. Sampling is what makes LLMs nondeterministic; Phase B has no LLM.
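The last link, a bundle assembler that writes files in a fixed sorted order, is easy to get wrong, because zip archives also record timestamps and platform metadata. A minimal sketch with the Python standard library is shown below. It is not Merlin's assembler: the fixed timestamp, the file permissions, and the file names are assumptions chosen for the example.

```python
import hashlib
import io
import zipfile

# Fixed timestamp so the archive does not depend on when it was built.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_bundle(files: dict[str, bytes]) -> bytes:
    """Write files into a zip in sorted order with fixed metadata."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.create_system = 3             # same value on every platform
            info.external_attr = 0o644 << 16   # fixed file permissions
            archive.writestr(info, files[name])  # stored uncompressed
    return buffer.getvalue()


# Same content, different insertion order.
first = build_bundle({"b.tfvars": b"region = 1\n", "a.tfvars": b"days = 2190\n"})
second = build_bundle({"a.tfvars": b"days = 2190\n", "b.tfvars": b"region = 1\n"})

print(first == second)
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

With the order and the metadata pinned, two runs over the same inputs produce byte-identical archives, so a single hash comparison is enough to confirm that nothing changed.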
Anyone can verify this directly at app.merlin-studio.cloud, guest mode included. Every project keeps its configurations versioned. An architect can regenerate the artifacts as many times as they want and, as long as the configuration has not changed, will get the same bundle back. Step into the configuration, change one parameter, regenerate — the diff in the artifacts is exactly the consequence of that one change. Revert the parameter, regenerate again, and the original artifacts are back. The chain from input to output is fully traceable, in both directions.
The corpus is only as good as the team maintaining it. Phase A is real, ongoing work — new compliance frameworks, new GCP services, new best practices. What compile-time AI does is move that work to a place where humans can scrutinise it, and keep the architect’s session deterministic and reproducible.
The two-phase approach Merlin uses has been in the air for some time. Several companies have implemented variants of it. For example, Stainless uses this kind of architecture in their official SDKs for OpenAI, Anthropic, and others: an LLM helps build the generator’s configuration, and the generator itself runs without an LLM invocation.
Recently, the approach got its theoretical grounding. A paper published April 2026 under the title Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation (arxiv 2604.05150) studies the architecture in the context of high-stakes enterprise workflows. The paper’s name for the pattern, Compiled AI, fits Merlin exactly, which is why I use it.
The interesting question for cloud foundations is not whether LLMs can help. They can. The question is where to put them. Merlin’s answer is: use LLMs to build templates and rules; use deterministic pipelines to build the LZ artefacts.
Merlin is free at app.merlin-studio.cloud. I would be glad to discuss anything in the article that doesn’t sit right with you.

How to Start Your Google Cloud from the Right Foot

Boris Teplitsky — Wed, 29 Apr 2026 05:51:58 +0000

Setting up a GCP landing zone from scratch — a step-by-step approach for DevOps engineers new to GCP.

Let's consider a familiar situation: a company has decided to move part of its IT to Google Cloud. They assigned the job to a DevOps engineer - not a GCP expert, but someone with enough knowledge and experience to set up and deploy services on GCP. Sound familiar? Thousands of companies have been in exactly this position, and thousands more will be.

Here we describe an approach to setting up Google Cloud for a small company — a startup, for example — or for a single system within a large company, using Merlin Studio (https://site.merlin-studio.cloud). We assume the company has no strict regulatory requirements (such as HIPAA or GDPR), but the company does care about following best practices and leaving room for seamless extension in the future.

The setup process with Merlin Studio consists of three stages:

Discovery — defining business requirements and conditions
Configuration — setting parameters for each GCP section
Generation — producing a package of Terraform tfvars files, schemas, documentation, and guides

Discovery

At this stage you tell Merlin what you want it to build: what your company does, how big it is, how experienced your cloud team is, whether you have any regulatory requirements, whether you need connectivity to an on-prem datacenter or another cloud, and so on.

Merlin has no access to your environment and does not validate the accuracy of your answers — but it stores all your information encrypted, separately for each customer. So if you provide accurate data about your company, it will save you the effort of manual edits before deployment.

As shown in the screenshots, our example covers a small company — a startup — with no specific requirements.

Among the technical requirements, pay attention to Terraform Output Format — either "Generic Terraform tfvars" or "FAST (Cloud Foundation Fabric)." FAST is a solid Terraform framework, but it requires effort to set up and maintain. For this reason, we chose tfvars — simpler and more suitable for small companies or projects.

Merlin can produce scripts for landing zones that meet the requirements of a set of EU and US compliance frameworks. In our example we assume the company has no specific regulatory requirements, but we still recommend aligning the GCP setup with recognized security best practices, specifically the CIS Benchmarks. The CIS (Center for Internet Security) Benchmarks are a set of globally recognized configuration guidelines designed to reduce the attack surface of IT systems, including cloud environments. They are published by an independent, vendor-neutral organization, widely adopted, and free to use. The CIS recommendations are labeled on the configuration screens, but you are not required to accept all of them.

Based on the information provided during Discovery, Merlin sets the default configuration parameters, determines the profile complexity, identifies which configuration sections are required, and recommends a configuration mode: Express (accept best-practice defaults), Guided (review recommendations, customize as needed), or Expert (full control over all options). You can change the configuration mode at any time, but to change the profile you must return to the Discovery stage.
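
How Merlin derives these recommendations internally is not described in this article. Purely as an illustration, rule-based defaults of this kind could be sketched as follows. Every name here (DiscoveryAnswers, recommend, the rules themselves) is a hypothetical assumption, not Merlin's actual schema or logic; only the profile and mode names come from the text.

```python
from dataclasses import dataclass


@dataclass
class DiscoveryAnswers:
    """Hypothetical subset of Discovery answers; not Merlin's real schema."""
    team_experience: str = "intermediate"  # "beginner" | "intermediate" | "expert"
    frameworks: tuple = ()                 # e.g. ("CIS",) or ("HIPAA", "SOC 2", "CIS")
    hybrid: bool = False                   # connectivity to an on-prem datacenter
    multi_cloud: bool = False              # connectivity to another cloud


# Assumed mapping from team experience to the recommended configuration mode.
MODE_BY_EXPERIENCE = {
    "beginner": "Express",
    "intermediate": "Guided",
    "expert": "Expert",
}


def recommend(answers: DiscoveryAnswers) -> dict:
    """Return an illustrative profile and mode recommendation."""
    # Assumption: frameworks other than the CIS baseline count as regulatory drivers.
    regulated = [f for f in answers.frameworks if f != "CIS"]
    if answers.multi_cloud:
        profile = "Advanced"
    elif regulated or answers.hybrid:
        profile = "Standard"
    else:
        profile = "Simple"
    mode = MODE_BY_EXPERIENCE.get(answers.team_experience, "Guided")
    return {"profile": profile, "mode": mode}


startup = DiscoveryAnswers(frameworks=("CIS",))
print(recommend(startup))
```

Because the rules are plain data and plain conditionals, the same answers always yield the same recommendation, which is the property the Discovery stage relies on.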

In our example, Merlin recommends the Simple profile and activates 12 configuration sections. To illustrate the key architectural decisions, we selected Guided mode.

Configuration

Configuration is organized into sections, each covering a specific GCP domain — IAM, Networking, Security, and others. For our startup example, Merlin activated 12 sections. A sidebar lets you navigate between sections in any order — completed sections are marked, so you always know where you stand. You can focus on the sections relevant to your setup and leave the rest at their default values.

Setting up a GCP environment requires tens, sometimes hundreds of parameters. Merlin makes this as straightforward as possible:

Most fields have default values, set based on data collected during Discovery.
Almost every field has a help panel with a short explanation, a link to the relevant Google documentation, and an optional LLM prompt.
Fields required by compliance frameworks (CIS Benchmark in our case) are marked with a badge — red for mandatory, orange for recommended.
Merlin validates field values in real time and warns about errors and invalid inputs.
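
To make the list above concrete, here is a minimal sketch of a field definition that carries a default, a compliance badge, and a validator. The FieldSpec structure, the field names, and the messages are assumptions made for illustration; the article does not show how Merlin models its fields.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldSpec:
    """Hypothetical field definition: default, validator, optional badge."""
    name: str
    default: object
    check: Callable[[object], Optional[str]]  # returns an error message or None
    badge: Optional[str] = None               # "mandatory" | "recommended" | None


def check_cidr(value) -> Optional[str]:
    """Validate a CIDR block with the standard library."""
    try:
        ipaddress.ip_network(value)  # strict by default: rejects host bits set
        return None
    except (ValueError, TypeError):
        return f"{value!r} is not a valid CIDR block"


def check_bool(value) -> Optional[str]:
    return None if isinstance(value, bool) else "must be true or false"


def validate(spec: FieldSpec, value) -> list:
    """Return (severity, message) pairs for a single field value."""
    error = spec.check(value)
    if error:
        return [("error", f"{spec.name}: {error}")]
    if spec.badge and value != spec.default:
        # A badged field that departs from its default is flagged, not blocked.
        return [(spec.badge, f"{spec.name}: differs from the compliance default")]
    return []


subnet = FieldSpec("subnet_cidr", "10.0.0.0/24", check_cidr)
os_login = FieldSpec("enable_os_login", True, check_bool, badge="recommended")

print(validate(subnet, "10.0.0.1/24"))
print(validate(os_login, False))
```

Keeping the badge separate from the validator reflects the text: a badged recommendation can be declined, while an invalid value is always an error.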

Once you finish all configuration steps, click Generate Spec to produce a JSON document summarizing all configuration parameters. This step also performs cross-section validation, surfacing any errors and unmet requirements. If you are satisfied with the configuration, proceed to the next stage.
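
Cross-section validation means checking that values in one section agree with values in another. The sketch below illustrates the idea with a single invented rule (every service must reference a subnet defined in the networking section) and serializes the result as JSON. The section names, keys, and the rule are hypothetical; Merlin's real spec format is not shown in this article.

```python
import json


def generate_spec(sections: dict):
    """Return (spec_json, errors) for a dict of configuration sections."""
    errors = []

    # Illustrative cross-section rule: workloads may only use declared subnets.
    declared = {s["name"] for s in sections.get("networking", {}).get("subnets", [])}
    for svc in sections.get("workloads", {}).get("services", []):
        if svc["subnet"] not in declared:
            errors.append(
                f"workloads: service {svc['name']!r} references "
                f"unknown subnet {svc['subnet']!r}"
            )

    # sort_keys makes the document byte-for-byte reproducible for the same input.
    spec_json = json.dumps(sections, indent=2, sort_keys=True)
    return spec_json, errors


sections = {
    "networking": {"subnets": [{"name": "app", "cidr": "10.0.0.0/24"}]},
    "workloads": {
        "services": [
            {"name": "api", "subnet": "app"},
            {"name": "batch", "subnet": "data"},  # not declared above
        ]
    },
}

spec_json, errors = generate_spec(sections)
for message in errors:
    print(message)
```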

Generation

In the final stage, Merlin produces the artifacts for setting up your GCP environment. Clicking the Generate Artifacts button starts the process. In our case, the output includes documentation, security scorecards, architecture diagrams, and 14 Terraform-related files (12 .tfvars and 2 JSON metadata files) used to provision the GCP environment.
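
For readers who have not worked with .tfvars files: they are plain assignments of values to Terraform input variables. The following sketch renders a Python dict in that syntax. It is not Merlin's generator, the variable names are invented, and it assumes map keys are valid identifiers and strings contain no template sequences.

```python
import json


def to_hcl(value, indent: int = 0) -> str:
    """Render a Python value as an HCL literal (simplified)."""
    pad = "  " * indent
    if isinstance(value, bool):  # must come before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return json.dumps(value)  # double-quoted, with escapes
    if isinstance(value, list):
        return "[" + ", ".join(to_hcl(v, indent) for v in value) + "]"
    if isinstance(value, dict):
        inner = "".join(
            f"{pad}  {key} = {to_hcl(val, indent + 1)}\n" for key, val in value.items()
        )
        return "{\n" + inner + pad + "}"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def render_tfvars(variables: dict) -> str:
    """Render top-level variables, one assignment per entry."""
    return "".join(f"{name} = {to_hcl(val)}\n" for name, val in variables.items())


# Hypothetical variables, for illustration only.
print(render_tfvars({
    "region": "us-east1",
    "enable_flow_logs": True,
    "labels": {"env": "dev"},
}))
```

A file produced this way is passed to Terraform with the -var-file option, or loaded automatically when its name ends in .auto.tfvars.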

In our example, we showed how a DevOps engineer without deep GCP expertise can set up a landing zone from scratch in a single interactive session. Starting from business questions and simple configuration choices, you end up with 14 Terraform-related files (12 .tfvars and 2 JSON metadata files), architecture and security scorecards, Mermaid diagrams, and a step-by-step DEPLOYMENT_GUIDE.md aligned with CIS Benchmarks.

Merlin does not replace learning GCP. You still need to understand what you deploy, review the generated code, and adapt it to your environment. But instead of starting from an empty folder, you start with a working foundation that follows best practices. Your time goes into understanding the decisions, not rediscovering them.

A complete set of files — including Terraform configurations, documentation, scorecards, and architecture diagrams — can be found at github.com/Merlin-Studio/Startup-Example.

Merlin is now open and free to try. No signup, no email — guest mode lets you start designing instantly: https://app.merlin-studio.cloud/

This is the second article in our GCP Landing Zone series. The first article — Setting Up a GCP Landing Zone for Organizations with Strict Regulatory Requirements — covers the same approach for healthcare and other regulated industries.

Setting Up a GCP Landing Zone for Organizations with Strict Regulatory Requirements

Boris Teplitsky — Mon, 20 Apr 2026 07:46:06 +0000

Setting up a GCP Landing Zone for organizations with strict compliance requirements is not a trivial task. Cloud Foundation Fabric with a suitable template can significantly simplify the work — but what if no appropriate template exists, or your specific requirements go beyond what the templates cover? In this article, we explain how a tool we built, Merlin Studio, can help set up a landing zone under complex compliance requirements. We use a US healthcare provider as an example, walking through a landing zone aligned with the HIPAA compliance framework. The same approach applies to other regulations in the US and EU.

The setup process with Merlin Studio consists of three stages:
Discovery — defining business requirements and conditions
Configuration — setting parameters for all landing zone sections
Generation — producing a package of Cloud Foundation Fabric YAML files, scorecards, documentation, and guides.

If you want to try Merlin on your own landing zone, drop us an email at intentarcha@gmail.com and we’ll set up your access — it’s free.

Discovery

The goal of this stage is to establish who the landing zone is for and what it must support. During Discovery, the user fills out 7 forms describing the company and the project's business environment. The forms cover general information about the organization and specific GCP implementation conditions: deployment strategy (GCP-only, hybrid with on-premises, or multi-cloud), workload types, company size, timeline, and budget expectations. A critical section is compliance: which regulatory frameworks must be implemented.
In our example, we use a US healthcare provider that needs to connect GCP to an on-premises data center via Partner Interconnect. The required compliance frameworks are HIPAA, SOC 2, and CIS Benchmarks. Infrastructure requirements include multi-region deployment (us-east1 as primary, us-west1 as secondary) with warm standby disaster recovery.

Based on the information provided during Discovery, Merlin sets the default landing zone configuration parameters, determines the profile complexity, identifies which configuration sections are required, and recommends a configuration mode: Express (accept best-practice defaults), Guided (review recommendations, customize as needed), or Expert (full control over all options). The user can change the configuration mode at any time, but to change the profile, they must return to the Discovery stage.
In our example, Merlin recommends the Standard profile and activates 17 configuration sections. The user selects Guided mode.

Configuration

Configuration is organized into sections, each covering a specific domain such as IAM, Networking, or Security. In our example, Merlin activated 17 sections. A sidebar allows free navigation between sections in any order, and completed sections are marked, so the user always knows where they are. This lets the user focus on specific sections and leave the others at their default values.

Setting up a landing zone requires defining hundreds of parameters. Merlin makes this task as straightforward as possible:

Most fields have default values, set based on data collected during Discovery and the selected compliance framework requirements.
Almost every field has a help panel with a short explanation, a link to the relevant Google documentation, and an optional LLM prompt.
Fields required by compliance frameworks are marked with a badge — red for mandatory, orange for recommended.
Merlin validates field values in real time and warns about errors and invalid inputs.

The final step of the Configuration stage is generating a specification. Clicking the "Generate Spec" button triggers cross-section validation and produces a structured JSON document summarizing all configuration parameters. The results screen shows two things: any unmet compliance requirements with direct links to the relevant configuration sections, and the full specification in a readable format.
The compliance posture summary is particularly useful — it shows exactly how many requirements are met per framework (in our example: SOC 2 12/13, HIPAA 28/29, CIS Benchmarks 16/17), lists each unmet requirement with the specific control reference, and provides a direct link to the configuration section where it can be fixed. No cross-referencing external documentation — everything needed to reach full compliance is on one screen.
If the user is satisfied with the configuration, they proceed to the next stage.
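
The compliance posture summary described above is essentially a tally per framework plus a list of unmet requirements. A minimal sketch, assuming each requirement is a record with a framework, a control reference, the configuration section that satisfies it, and a met flag. The record layout and the placeholder control IDs are hypothetical and do not correspond to real HIPAA, SOC 2, or CIS controls.

```python
def posture(requirements: list):
    """Return ({framework: "met/total"}, [unmet requirement records])."""
    totals = {}
    unmet = []
    for req in requirements:
        met, total = totals.get(req["framework"], (0, 0))
        totals[req["framework"]] = (met + int(req["met"]), total + 1)
        if not req["met"]:
            # Keep what the user needs to fix it: the control and where to go.
            unmet.append({
                "framework": req["framework"],
                "control": req["control"],
                "section": req["section"],
            })
    summary = {fw: f"{met}/{total}" for fw, (met, total) in totals.items()}
    return summary, unmet


# Placeholder data: control IDs are invented for the example.
requirements = [
    {"framework": "HIPAA", "control": "CTRL-A", "section": "logging", "met": True},
    {"framework": "HIPAA", "control": "CTRL-B", "section": "security", "met": False},
    {"framework": "SOC 2", "control": "CTRL-C", "section": "iam", "met": True},
]

summary, unmet = posture(requirements)
print(summary)
print(unmet)
```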

Generation

In the final stage, Merlin produces artifacts for setting up the landing zone with minimal effort compared to starting from scratch. All generated artifacts are divided into four categories:

Scorecards — Merlin evaluates the configuration from architecture and security perspectives and provides a score with an explanation of any issues found. In our example, the security scan scored 100/100 (Checkov) and the architecture scorecard 98/100 — Overall Grade A. This is shift-left in practice: issues are caught at design time, before deployment, without waiting for findings from Security Command Center or Wiz.
Terraform — Merlin generated 61 YAML files ready to use with Cloud Foundation Fabric. The files cover all five FAST stages: bootstrap (org setup, IAM, org policies), networking (VPC, subnets, firewall, DNS), security (KMS, SCC), project factory (workload projects), and VPC Service Controls (service perimeters). Dependencies between stages are handled automatically via FAST's $-interpolation tokens — no manual ID copying between stages.
Documentation — A landing zone description and a step-by-step deployment guide explaining how to use the generated YAML files with Cloud Foundation Fabric.
Diagrams - A set of diagrams describing the landing zone structure. Merlin produces Mermaid (.mmd) files rather than static images. Diagrams can be rendered at https://mermaid.live or converted to any graphics format.

The complete set of generated files and other Merlin examples are available at https://github.com/Merlin-Studio. Merlin Studio is currently free - registration only at https://site.merlin-studio.cloud.
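
For readers unfamiliar with the Mermaid (.mmd) format mentioned under Diagrams: it is plain text that describes a graph, which is why it can be generated and versioned like code. The sketch below turns a nested folder hierarchy into a Mermaid flowchart. The function and the folder names are hypothetical and are not taken from Merlin's output; node names are assumed to be simple alphanumeric identifiers.

```python
def to_mermaid(tree: dict, root: str = "org") -> str:
    """Render a nested dict of folders as a top-down Mermaid flowchart."""
    lines = ["graph TD"]

    def walk(parent: str, children: dict) -> None:
        for name, sub in children.items():
            lines.append(f"    {parent} --> {name}")
            walk(name, sub)

    walk(root, tree)
    return "\n".join(lines) + "\n"


# Hypothetical folder hierarchy, for illustration only.
hierarchy = {"prod": {"app": {}}, "dev": {}}
print(to_mermaid(hierarchy))
```

The printed text can be pasted into https://mermaid.live to render it.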

In our example we showed how weeks of work can be reduced to a single interactive session. Starting from business requirements and technical conditions, and with guidance from the tool throughout, the user ends up with 61 ready-to-use Cloud Foundation Fabric files, architecture and security scorecards, a deployment guide, and Mermaid diagrams — all aligned with HIPAA, SOC 2, and CIS Benchmarks.

Despite providing a rich set of deployment-ready files, Merlin does not replace the cloud architect. Design review, stakeholder discussions, and alignment with networking and security teams remain an essential part of any landing zone project. What Merlin does is take the tedious part off the table.

Interested in trying it? Email intentarcha@gmail.com — we’ll get you set up within 24 hours.

GCP Landing Zone Setup Automation

Boris Teplitsky — Mon, 16 Mar 2026 10:30:04 +0000

The Problem

Every GCP engagement starts the same way. Discovery call, spreadsheet
of requirements, weeks of manual Terraform, IAM wiring, VPC design,
org policies, budget alerts. Then a review cycle to catch what was
missed. Then another.

For a process that happens at the start of every cloud project,
it's remarkably unautomated.

What a Landing Zone Actually Requires

A production-ready GCP landing zone typically includes:

Organization hierarchy and folder structure
VPC and shared networking
IAM roles and service accounts
Org policies and constraints
Budget alerts and billing controls
Security baselines
FAST-compatible configuration

Getting all of this right manually takes 2-3 weeks minimum.

A New Approach: Merlin

Merlin is a GCP landing zone generator. Answer an architecture
questionnaire — org structure, environments, compliance, networking
— and it outputs a complete production-ready landing zone.

What comes out:

FAST-compatible Terraform files
Architecture and security scorecards
Mermaid diagrams
Validation warnings

See the Real Output

Published openly on GitHub — no signup required:

👉 github.com/Merlin-Studio

Includes Simple, Standard, and Advanced profile examples.

Worth Knowing About

👉 site.merlin-studio.cloud