Paulo Victor Leite Lima Gomes

Posted on Jun 12

prompts are becoming CI/CD configuration

#githubactions #aiagents #cicd #platformengineering

GitHub Agentic Workflows is now in public preview, and the headline version is easy to understand.

You write a natural-language Markdown file that describes a reasoning-based automation. GitHub compiles it into Actions YAML. The agent runs inside the Actions world, using the runners, permissions, policies, and review machinery teams already have.

That sounds like "AI for GitHub Actions."

I think the more interesting version is slightly more uncomfortable:

Prompts are becoming CI/CD configuration.

Not prompts in the casual chat-window sense. Not "please summarize this issue" typed by one developer on a Tuesday afternoon. I mean prompts as durable, reviewed, repeatable inputs to the delivery system.

They live in the repository. They describe work. They decide which tools an agent can use. They affect code, issues, documentation, triage, security checks, and pull requests. They can spend organization money. They can be triggered by workflows.

At that point, a prompt is not a suggestion.

It is infrastructure.

markdown was the easy part

There is a very nice developer-experience trick here.

Markdown feels harmless. Everyone knows how to read it. A workflow written in Markdown looks less intimidating than a page of YAML, shell scripts, permissions, and matrix jobs.

That is useful. A lot of CI/CD configuration became painful because it asked every team to think like a build-platform engineer. If a product engineer can describe a narrow piece of maintenance work in plain language and have it compile into a normal Actions workflow, that is a real improvement.

But the plain-language surface should not fool us.

The system still needs all the boring parts underneath:

which events can start the workflow
which repository contents the agent can read
which tools it can call
which runner group it uses
which secrets it cannot reach
which output is considered safe
which lockfile represents the compiled behavior
which human is responsible when it gets weird
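
One way to keep that list honest is to write it down as data. Here is a minimal sketch in Python, assuming a hypothetical `WorkflowShape` record that a team maintains next to the workflow. The field names are mine for illustration, not part of GitHub's format.

```python
from dataclasses import dataclass, field, fields


@dataclass
class WorkflowShape:
    """Hypothetical record of one agentic workflow's operational shape."""
    trigger_events: list[str] = field(default_factory=list)
    readable_paths: list[str] = field(default_factory=list)
    allowed_tools: list[str] = field(default_factory=list)
    runner_group: str = ""
    denied_secrets: list[str] = field(default_factory=list)
    safe_outputs: list[str] = field(default_factory=list)
    lockfile: str = ""
    owner: str = ""


def missing_answers(shape: WorkflowShape) -> list[str]:
    """Return the names of the fields nobody has filled in yet."""
    return [f.name for f in fields(shape) if not getattr(shape, f.name)]


# Example values are invented for the sketch.
shape = WorkflowShape(
    trigger_events=["issues.opened"],
    allowed_tools=["read_issue", "add_comment"],
    owner="docs-team",
)
print(missing_answers(shape))
```

Anything the function returns is a question the team has not answered yet, which is usually more useful to know than how good the prompt reads.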

The prompt is friendly. The operational shape is not casual.

This is the part that engineering organizations need to internalize. The prompt is now one layer of a control plane. It is closer to Terraform, GitHub Actions YAML, CodeQL configuration, or policy-as-code than it is to a chat message.

Readable does not mean low risk.

compiled prompts need code review

The phrase "compiled into Actions YAML" matters.

Compilation creates a useful boundary. It means the natural-language file is not the only artifact that should be understood. There is a generated workflow shape too, and that workflow has permissions, jobs, runners, and execution behavior.

That should sound familiar.

We do not review Kubernetes manifests only by asking whether the app developer had good intentions. We look at the resources, probes, ports, environment variables, service accounts, and network exposure. We do not review Terraform by saying the description felt reasonable. We inspect the plan.

Agentic workflows need the same discipline.

If someone changes the prompt from "triage stale issues" to "fix stale issues," that may be a huge behavior change. If someone adds a tool, broadens a path pattern, changes a permission, or swaps the model used for the task, the diff can look small while the blast radius gets much larger.

Natural language makes this trickier because tiny wording changes can matter.

"Update documentation when APIs change" is different from "update documentation and examples when APIs change." One writes prose. The other may touch executable code. "Open a draft pull request" is different from "open a pull request." "Suggest labels" is different from "apply labels."

This is not a reason to avoid the feature. It is a reason to stop treating prompt review as vibes.
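
A crude way to make that review less vibes-based is to flag the words that tend to change behavior. The sketch below assumes two hypothetical word lists, `MUTATING_VERBS` and `GUARD_WORDS`, that a team would tune for its own prompts. It is a heuristic to point a reviewer at the diff, not a safety check.

```python
import re

# Hypothetical word lists; a real team would tune these to its own prompts.
MUTATING_VERBS = {"fix", "apply", "merge", "close", "delete", "push"}
GUARD_WORDS = {"draft", "suggest", "propose", "only"}


def words(prompt: str) -> set[str]:
    """Lowercase the prompt and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", prompt.lower()))


def review_flags(old_prompt: str, new_prompt: str) -> dict[str, set[str]]:
    """Report mutating verbs that appeared and guard words that disappeared."""
    old, new = words(old_prompt), words(new_prompt)
    return {
        "added_mutations": (new - old) & MUTATING_VERBS,
        "removed_guards": (old - new) & GUARD_WORDS,
    }


print(review_flags("Triage stale issues.", "Fix stale issues."))
print(review_flags("Open a draft pull request.", "Open a pull request."))
print(review_flags("Suggest labels.", "Apply labels."))
```

It will miss paraphrases and flag harmless edits. The point is only that a one-word prompt change can be surfaced the same way a permission change would be.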

The reviewer should ask boring questions:

What is this workflow allowed to change?
What evidence does it need before changing it?
Does it produce a draft or a final artifact?
Are generated changes clearly labeled?
What happens when the agent is uncertain?
Can the team reproduce the behavior from the committed files?

That is code review. It just happens to include English.
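
The last question on that list, reproducibility, is the easiest one to automate. A minimal sketch, assuming the team records a SHA-256 digest of the prompt at compile time. That recording step is my assumption for illustration, not a description of how GitHub's lockfile works.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 hex digest of the prompt text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_stale(prompt_text: str, recorded_digest: str) -> bool:
    """True when the committed prompt no longer matches the recorded digest."""
    return digest(prompt_text) != recorded_digest


# Hypothetical prompt text and the digest recorded when it was compiled.
compiled_from = "Triage stale issues. Suggest labels."
recorded = digest(compiled_from)

print(is_stale(compiled_from, recorded))
print(is_stale("Triage stale issues. Apply labels.", recorded))
```

A check like this only says the prompt and the compiled artifact have drifted apart. Deciding whether the drift is acceptable is still the reviewer's job.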

the personal token going away is a big deal

The related GitHub change may be even more important: agentic workflows can now use the built-in GITHUB_TOKEN instead of a long-lived personal access token.

That sounds like plumbing because it is plumbing.

It is also exactly the kind of plumbing that separates hobby automation from company infrastructure.

Long-lived personal access tokens are a bad foundation for shared automation. They blur ownership. They outlive people. They hide inside secrets. They make it too easy for "Paulo's token" to become the thing that keeps a business process running.

Moving agentic workflows to GITHUB_TOKEN puts them into the normal Actions identity model. The repository and organization can own the automation. Permissions can be scoped. Billing can attach to the organization instead of a person. Policies can decide whether Copilot CLI usage is allowed.

This is less flashy than an agent writing code.

It is also the maturity moment.

Agents stop being toys when they stop using your personal token.

That does not make them safe by default. It makes them governable in a way that enterprises can understand.
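
Governable can be as plain as checking which scopes a workflow asks for. The sketch below assumes the workflow's `permissions` block has already been parsed into a dictionary. The scope names (`contents`, `issues`, `pull-requests`) and the `read`/`write` levels follow the usual Actions convention; the function and the allowlist are hypothetical.

```python
def unexpected_writes(permissions: dict[str, str], allowed_write: set[str]) -> list[str]:
    """Return scopes granted write access that the owner did not sign off on."""
    return sorted(
        scope
        for scope, level in permissions.items()
        if level == "write" and scope not in allowed_write
    )


# Example values are invented for the sketch.
permissions = {"contents": "read", "issues": "write", "pull-requests": "write"}
print(unexpected_writes(permissions, allowed_write={"issues"}))
```

An empty list means the token asks for nothing beyond what was agreed. Anything else is a conversation to have before merging.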

budgets are part of the workflow file now

Organization billing changes the conversation too.

If an agentic workflow runs as part of Actions and consumes AI credits, the cost is no longer that of an individual developer experimenting with a tool. It is a property of the delivery pipeline.

That means it needs the same treatment as other metered CI resources.

Some workflows are worth running on every issue. Some should run nightly. Some should run only when a maintainer asks. Some should run on a small model. Some should spend more because the work is security-sensitive or touches a critical path.

None of that should be discovered from the invoice.
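
Finding out before the invoice is mostly multiplication. A minimal sketch, assuming a hypothetical `Schedule` record per workflow and a made-up `credits_per_run` figure. Substitute whatever unit and numbers your billing actually reports.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    name: str
    runs_per_month: int
    credits_per_run: float  # hypothetical unit, not a real price


def monthly_credits(schedule: Schedule) -> float:
    """Expected monthly spend for one workflow."""
    return schedule.runs_per_month * schedule.credits_per_run


def over_budget(schedules: list[Schedule], budgets: dict[str, float]) -> list[str]:
    """Names of workflows whose expected spend exceeds their budget."""
    return [s.name for s in schedules if monthly_credits(s) > budgets[s.name]]


# All numbers below are invented for illustration.
schedules = [
    Schedule("issue-triage", runs_per_month=400, credits_per_run=0.5),
    Schedule("nightly-docs-drift", runs_per_month=30, credits_per_run=2.0),
]
budgets = {"issue-triage": 150.0, "nightly-docs-drift": 100.0}
print(over_budget(schedules, budgets))
```

Here the run-on-every-issue workflow is the one that blows its budget, even though each run is the cheaper of the two. Trigger frequency matters as much as model choice.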

The uncomfortable part is that AI cost will often be mixed with human attention cost. A workflow that opens five low-quality pull requests a week is not cheap just because the model bill is small. It spends reviewer time. It creates notification noise. It teaches the team to ignore agent output.

That is why the owner matters.

Every agentic workflow should have someone who can answer three questions:

Is this still useful?
Is this still worth what it costs?
Is this still operating inside the intended boundary?

If nobody owns those answers, the workflow is just another piece of automation drifting toward background noise.

safe outputs are the new build artifacts

GitHub's announcement spends time on safeguards: read-only defaults, sandboxed containers, a firewall, safe outputs, and threat detection scanning proposed changes before they are applied.

That is the right direction.

It also hints at the real problem. Once an agent can reason over repository content and generate changes, the output itself becomes something that needs validation before the rest of the pipeline trusts it.

This is very similar to build artifacts.

A compiled binary is not trusted because the source code looked nice. It is trusted because it came from a known process, in a known environment, with checks, signatures, provenance, and review rules around it.

Agent output needs that mindset.

The question is not only "did the agent produce a useful diff?"

The better questions are:

What input did it see?
Which tools did it use?
Which files did it touch?
Which checks ran after it produced the result?
Was the output constrained before it reached a privileged workflow?
Can a reviewer understand why the change exists?

This is why putting agentic workflows inside Actions is smart. It gives the ecosystem a familiar place to put these controls.

But teams still have to use them.
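
The questions above can be captured as a small provenance record that travels with the agent's output. This is a sketch with a hypothetical `AgentRunRecord` type and invented check names. It is not a format GitHub defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRunRecord:
    """Hypothetical provenance record for one agent run."""
    input_digest: str               # hash of the content the agent was shown
    tools_used: tuple[str, ...]
    files_touched: tuple[str, ...]
    checks_passed: tuple[str, ...]  # checks that ran after the output existed
    rationale: str                  # why the change exists, in reviewer terms


def blocking_gaps(record: AgentRunRecord, required_checks: set[str]) -> list[str]:
    """List what is missing before a privileged step should trust the output."""
    gaps = [
        f"missing check: {name}"
        for name in sorted(required_checks - set(record.checks_passed))
    ]
    if not record.rationale.strip():
        gaps.append("missing rationale")
    return gaps


# Example values are invented for the sketch.
record = AgentRunRecord(
    input_digest="abc123",
    tools_used=("read_file",),
    files_touched=("docs/api.md",),
    checks_passed=("lint",),
    rationale="",
)
print(blocking_gaps(record, required_checks={"lint", "threat-scan"}))
```

If the list is not empty, the output stays a proposal.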

what i would do first

I would not start with a workflow that rewrites production code.

Start with something useful and boring.

Issue triage is a decent first candidate. Documentation drift is another. Weekly dependency-report generation is probably fine. Release-note preparation can work if the output is explicitly a draft.

The important part is to keep the first workflow narrow enough that a reviewer can tell when it misbehaves.

For example:

It can only comment, not edit code.
It can only touch files under docs/.
It can only open draft pull requests.
It cannot run on untrusted external input.
It has a named owner.
It has a budget.
It has a clear delete condition.

That last one is underrated. Automation should have a delete condition. If the workflow creates more review burden than value for a month, turn it off. If the team ignores every output, delete it. If it needs a human to rewrite everything, tighten the task or stop pretending it is automation.
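
Two of those constraints are easy to check mechanically: the docs/ boundary and the delete condition. A minimal sketch, assuming the team tracks a weekly ratio of agent outputs it actually kept. The ratio, the 0.5 threshold, and the four-week window are all hypothetical choices.

```python
import posixpath


def outside_allowed(paths: list[str], allowed_prefix: str = "docs/") -> list[str]:
    """Return touched paths that fall outside the allowed directory.

    Paths are normalized first so 'docs/../src/app.py' does not slip through.
    """
    return [p for p in paths if not posixpath.normpath(p).startswith(allowed_prefix)]


def should_retire(weekly_useful_ratio: list[float],
                  threshold: float = 0.5, weeks: int = 4) -> bool:
    """True when every one of the last `weeks` weeks fell below the threshold."""
    recent = weekly_useful_ratio[-weeks:]
    return len(recent) == weeks and all(ratio < threshold for ratio in recent)


# Example values are invented for the sketch.
print(outside_allowed(["docs/guide.md", "docs/../src/app.py", "src/main.py"]))
print(should_retire([0.8, 0.3, 0.2, 0.4, 0.1]))
print(should_retire([0.3, 0.2]))
```

The second function refuses to retire a workflow on less than a full window of data, so one bad week does not kill something that is still finding its footing.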

Engineering maturity is not keeping every clever workflow alive forever.

the punchline

Agentic Workflows are interesting because they make prompts durable.

A prompt can now sit in a repository, compile into a workflow, run on organization infrastructure, use organization identity, spend organization money, and produce changes that enter the same review process as human work.

That is a real shift.

It is also a warning label.

If prompts are becoming CI/CD configuration, they need the habits we already learned from CI/CD: review, ownership, least privilege, budgets, lockfiles, sandboxing, rollback, and deletion when the value is gone.

The pleasant fiction is that natural language makes automation simple.

The more useful truth is that natural language makes automation easier to author, which means we will have more of it.

More automation is good only when the operating model keeps up.


Top comments (2)

Luis Cruz • Jun 12

This is an excellent and thought-provoking post on agentic workflows as first-class CI/CD infrastructure. I really appreciate how you highlight that prompts are no longer casual suggestions—they are durable, reviewed, and executable inputs that must be treated with the same discipline as YAML workflows, Terraform, or Kubernetes manifests. The emphasis on review, ownership, budgets, sandboxing, and deletion policies is critical to avoid automation drift and maintain trust.

I’d love to collaborate and explore best practices for safe agentic workflows, including automated verification, lifecycle management, and secure tool integration. Sharing strategies for reviewable workflow outputs, explicit delete conditions, and scoped execution could help teams adopt agentic workflows responsibly at scale.

Would you be open to discussing collaboration or prototyping guidelines for enterprise-grade agentic workflow governance?
