
Flo Comuzzi

PART 2 Starry: An Internal Developer Platform (IDP) for Ephemeral Environments

In my last post, I introduced Starry, a custom internal developer platform (IDP) that creates ephemeral environments for merge requests. The platform uses well-known, interoperable tools (Kubernetes, Helm, Argo CD, Terraform), which makes the solution practical and adaptable to a range of needs. In this article, I will dive deeper into the architecture, a simplified CI/CD flow, and the relationship between Starry and Argo CD.


Let's simplify the system further by supposing that a user creates environments manually through the IDP. In a follow-up post I will introduce what is required to fully automate the path from merge request to environment creation.

From Push To Ephemeral Environment

I am running CI/CD in GitHub for the repos sample-be, sample-fe-1, and sample-fe-2. When a merge request is open and I push to its feature branch, an image is built and pushed to Artifact Registry. This flow is deliberately minimal: the pipeline, triggered by a push to a branch with an open pull request, only needs to build a container image, tag it properly, and push it to the image registry. The developer can then create an environment from any tag of the sample-be, sample-fe-1, and sample-fe-2 images.
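
Here's a minimal sketch of what such a pipeline could look like for sample-be as a GitHub Actions workflow; the registry path, project name, and credential secret are assumptions, not taken from the actual setup:

```yaml
# .github/workflows/build.yml (sketch)
name: build-and-push
on:
  pull_request:   # fires on pushes to any branch with an open PR
jobs:
  image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: us-docker.pkg.dev
          username: _json_key
          # Hypothetical secret holding a GCP service-account key.
          password: ${{ secrets.GCP_SA_KEY }}
      - uses: docker/build-push-action@v6
        with:
          push: true
          # Tag with the commit SHA so the developer can point Starry at any build.
          tags: us-docker.pkg.dev/my-project/sample/sample-be:${{ github.sha }}
```

The same shape of workflow applies to sample-fe-1 and sample-fe-2.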

Ephemeral Environment Management Workflow

A user can go to the Starry Internal Developer Platform (IDP) and create an ephemeral environment by passing in an image tag for the sample-be backend. Optionally, a user may also include sample-fe-1, sample-fe-2, or both by passing in image tags for those. The backend and frontends become available at ephemeral URLs like sample-env-backend.ephemeral.mycompany.com and sample-env-fe-1.ephemeral.mycompany.com. The user can then test manually by clicking around and checking endpoints, or the IDP can offer additional features like automated end-to-end tests that span the backend and frontends. Once the time-to-live (TTL) of an environment is reached (say, the environment is 30 minutes old), the environment is destroyed; the user can also trigger deletion through Starry before then.

CI/CD and Environment Flow

How is the Starry application itself managed in Kubernetes? How does Starry manage an environment? Understanding the Argo CD setup is key to both questions. Let's go into that next.

starry-helm

To understand Starry, it is important to understand how it lives in Kubernetes. To begin with, Starry is a Python FastAPI app that uses some frontend technologies and can be bundled into an image and deployed just like any other app. We have a Dockerfile, and the CI/CD pipeline builds an image and pushes it to the image registry on every push to the main branch.

Here is a diagram of the Kubernetes resources the Starry Helm chart templates out:

The Starry Helm chart provisions a simple ingress–service–deployment architecture suited for ephemeral use:

  • A stateless web app runs as a Deployment scaled by a HorizontalPodAutoscaler (2–10 replicas) and configured via a ConfigMap, using a ServiceAccount annotated for (GCP) Workload Identity.
  • The app is exposed internally by a ClusterIP Service (with NEG enabled) and externally by a GCE Ingress on host starry.mycompany.com; TLS is handled by a ManagedCertificate, so the Google L7 HTTP(S) Load Balancer terminates TLS and forwards HTTP to the Service.
  • Caching/coordination is provided by a single-replica Redis StatefulSet using emptyDir storage (no persistence), reached via the starry-redis-master ClusterIP and a headless service for stable DNS.
  • A CronJob runs cleanup tasks and talks to the app through the in-cluster Service.
  • RBAC (ClusterRole, ClusterRoleBinding, Role, RoleBinding) grants the app and Argo CD the required permissions.

Traffic flows Ingress → Service (NEG) → pods by label selector; pods reach Redis via service DNS, and outbound internet access uses the cluster's standard egress/NAT. This favors fast spin-up/tear-down with minimal state and operational overhead.
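
To make the ingress path concrete, here is a trimmed sketch of how the Service, ManagedCertificate, and Ingress could fit together on GKE; the annotations and port numbers are assumptions about a typical setup rather than the chart's exact contents:

```yaml
# Service exposed through a GKE Network Endpoint Group (NEG).
apiVersion: v1
kind: Service
metadata:
  name: starry
  annotations:
    cloud.google.com/neg: '{"ingress": true}'   # container-native load balancing
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: starry
  ports:
    - port: 80
      targetPort: 8000
---
# Google-managed TLS certificate for the public hostname.
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: starry-cert
spec:
  domains:
    - starry.mycompany.com
---
# GCE Ingress: the L7 HTTP(S) load balancer terminates TLS here.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: starry
  annotations:
    networking.gke.io/managed-certificates: starry-cert
    kubernetes.io/ingress.class: gce
spec:
  rules:
    - host: starry.mycompany.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: starry
                port:
                  number: 80
```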

Note that the CronJob's cleanup task works by deleting the resources of any environment that has exceeded its time-to-live.
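
A minimal sketch of that CronJob, assuming a hypothetical /cleanup endpoint on the Starry service (the schedule and image are assumptions as well):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: starry-cleanup
spec:
  schedule: "*/5 * * * *"   # look for expired environments every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleanup
              image: curlimages/curl:8.8.0
              # POST to the app through the in-cluster Service; the endpoint
              # path is a placeholder, not confirmed by the chart.
              args: ["-fsS", "-X", "POST", "http://starry/cleanup"]
```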

Argo CD and the starry-helm Chart

To let Argo CD manage Starry and its environments safely, the Helm chart is designed to be fully declarative and follow GitOps best practices. It gives just enough permissions for Argo CD and the Starry app to do their jobs—nothing more.

The chart includes:

  • RBAC setup:

    • A Role and RoleBinding in the argocd namespace let the Starry app create and update Argo CD Application resources, which represent the ephemeral environments.
    • A read-only ClusterRoleBinding lets the app watch core Kubernetes resources so it can monitor environment status.
  • Argo CD sync hints:

    • Sync annotations are added to guide Argo CD on how to apply changes safely (see the sketch after this list):
      • Replace=true on the Redis StatefulSet to handle immutable field changes.
      • PruneLast=true on the HorizontalPodAutoscaler so it’s cleaned up last.
      • ServerSideApply=true and related annotations on the cleanup CronJob to preserve managed fields.
  • Stable naming and labeling:

    • Kubernetes labels and names follow predictable patterns (like app.kubernetes.io/name) so Argo CD can track changes cleanly and avoid drift.
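
These sync hints are ordinary Argo CD annotations on the rendered manifests. A sketch of how they could appear (specs omitted, resource names assumed):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: starry-redis-master
  annotations:
    # Delete-and-recreate instead of patching, since some fields are immutable.
    argocd.argoproj.io/sync-options: Replace=true
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: starry
  annotations:
    # Prune the HPA after everything else during a sync that removes it.
    argocd.argoproj.io/sync-options: PruneLast=true
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: starry-cleanup
  annotations:
    # Server-side apply preserves fields managed by other controllers.
    argocd.argoproj.io/sync-options: ServerSideApply=true
```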

argocd-apps

In the argocd-apps repository, we declare the Argo CD setup.

How the repo is structured

This repo uses Argo CD’s App of Apps pattern to keep environments simple and consistent. There’s one root Application per environment (dev and prod). Each root points at a Kustomize overlay directory that lists the child Applications to deploy. This keeps environment differences limited to overlays, while the core app definitions live in a shared base.
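
A root Application for the dev environment could look roughly like this; the repo URL and overlay path are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-dev
  namespace: argocd
spec:
  project: starry
  source:
    repoURL: https://gitlab.com/mycompany/argocd-apps.git   # placeholder
    targetRevision: main
    path: overlays/dev   # Kustomize overlay listing the child Applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd    # child Applications are themselves argocd resources
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```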

At the project level, an AppProject named starry defines what Git repos are allowed, where resources can be deployed, and who can operate them. Think of the AppProject as the guardrails: it scopes access and keeps all Applications operating within an approved perimeter.
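
Expressed declaratively, the guardrails might look like this (the repo pattern and namespace prefix are assumptions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: starry
  namespace: argocd
spec:
  sourceRepos:
    - https://gitlab.com/mycompany/*   # which Git repos are allowed
  destinations:
    - server: https://kubernetes.default.svc
      namespace: argocd
    - server: https://kubernetes.default.svc
      namespace: "starry-*"            # where resources may be deployed
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace                  # allow creating preview namespaces
```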

What gets deployed first (and why)

Sync waves ensure the right order. Operators that everything else depends on come first: External Secrets and cert-manager are deployed early so secrets and certificates exist before workloads need them. External Secrets pulls a GitLab token from Google Secret Manager and materializes an Argo CD “repository” Secret; this gives Argo CD access to your Git repos without hardcoding credentials.
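
Both pieces are declarative: the ordering is a sync-wave annotation on the operator's Application, and the repository credential is just a labeled Secret that External Secrets materializes. A sketch, with store and secret names as assumptions:

```yaml
# Operator Application deployed early via a negative sync wave.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: external-secrets
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-2"   # operators land before workloads
spec:
  project: starry
  source:
    repoURL: https://charts.external-secrets.io
    chart: external-secrets
    targetRevision: 0.9.9
  destination:
    server: https://kubernetes.default.svc
    namespace: external-secrets
---
# ExternalSecret that turns a Google Secret Manager entry into an
# Argo CD repository Secret (store and key names are placeholders).
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: gitlab-repo-creds
  namespace: argocd
spec:
  secretStoreRef:
    name: gcp-secret-manager
    kind: ClusterSecretStore
  target:
    name: gitlab-repo
    template:
      metadata:
        labels:
          argocd.argoproj.io/secret-type: repository   # Argo CD picks this up
      data:
        type: git
        url: https://gitlab.com/mycompany/argocd-apps.git
        username: oauth2
        password: "{{ .token }}"
  data:
    - secretKey: token
      remoteRef:
        key: gitlab-token
```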

Workload and automation

The main workload, starry, is deployed via its Helm chart. Image updates are automated by Argo CD Image Updater, which watches your container registry and, when a new allowed tag appears, writes the tag back to Git. That Git change drives Argo CD to reconcile the updated chart, keeping deployments declarative and auditable.
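
Image Updater is configured through annotations on the starry Application; a sketch with a placeholder image path:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: starry
  namespace: argocd
  annotations:
    # Watch this image in the registry; the path is a placeholder.
    argocd-image-updater.argoproj.io/image-list: starry=us-docker.pkg.dev/my-project/starry/starry
    argocd-image-updater.argoproj.io/starry.update-strategy: semver
    # Commit the new tag back to Git rather than patching the live Application.
    argocd-image-updater.argoproj.io/write-back-method: git
spec:
  project: starry
  source:
    repoURL: https://gitlab.com/mycompany/argocd-apps.git   # placeholder
    targetRevision: main
    path: base/starry
  destination:
    server: https://kubernetes.default.svc
    namespace: starry
```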

Handling environment drift and platform quirks

Some Kubernetes fields are mutated by the platform (e.g., GKE Autopilot) or are immutable in StatefulSets. ignoreDifferences rules are applied where needed so Argo CD focuses on meaningful drift and doesn’t fight expected, safe mutations. Overall, the combination of App of Apps, overlays, operators-first ordering, External Secrets, and Image Updater results in a clean, environment-aware, and fully GitOps-driven deployment flow.
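
For example, inside an Application spec, such rules could look like this (the exact paths are illustrative):

```yaml
spec:
  ignoreDifferences:
    # GKE Autopilot adjusts container resources; don't flag that as drift.
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/template/spec/containers/0/resources
    # StatefulSet fields such as volumeClaimTemplates are immutable
    # after creation, so differences there are expected.
    - group: apps
      kind: StatefulSet
      jsonPointers:
        - /spec/volumeClaimTemplates
```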

Starry-Managed Environments via Argo CD

When a new ephemeral environment is requested (for example, from a pull/merge request), the starry service programmatically creates an Argo CD Application custom resource by talking directly to the Kubernetes API. Running in‑cluster with a dedicated ServiceAccount, it authenticates using the standard Kubernetes client flow and has RBAC to manage applications.argoproj.io resources in the argocd namespace. The service calculates a unique preview name (e.g., starry-pr-123), target namespace, and value overrides (hostnames, image tag, replica counts), then submits a new Application object pointing to the environment Helm chart and Git revision for that preview. The spec.destination.namespace is set to the preview namespace, and the Helm/Kustomize parameters embed any per‑environment differences.
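
The Application object Starry submits could look roughly like the following; the repo URL, chart path, and parameter names are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: starry-pr-123
  namespace: argocd
  labels:
    starry.mycompany.com/preview: pr-123        # encodes the preview context
  finalizers:
    - resources-finalizer.argocd.argoproj.io    # cascade-delete on removal
spec:
  project: starry
  source:
    repoURL: https://gitlab.com/mycompany/environment-chart.git   # placeholder
    targetRevision: main
    path: chart
    helm:
      parameters:
        - name: backend.image.tag
          value: abc1234
        - name: ingress.host
          value: starry-pr-123.ephemeral.mycompany.com
  destination:
    server: https://kubernetes.default.svc
    namespace: starry-pr-123
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true   # Argo CD creates the preview namespace
```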

Once the Application object is created, the Argo CD controller does the heavy lifting. It watches the new Application, pulls the referenced Git content, renders the manifests (via Helm/Kustomize), and applies them to the preview namespace. Sync waves ensure platform prerequisites (like secrets or certs, if included) are present before the workload rolls out. If needed, the starry service can pre‑create the target namespace or supporting resources, but generally it relies on Argo CD with CreateNamespace=true to keep the flow simple and declarative.

The lifecycle is symmetrical. To tear down a preview, the starry service deletes the corresponding Application object through the Kubernetes API. Because Applications are created with Argo CD finalizers and pruning enabled, Argo CD prunes everything it deployed for that preview and the environment disappears cleanly. To keep operations safe and repeatable, the service uses idempotent “apply”-style calls (or server‑side apply), sets labels/annotations that encode the preview context, and ensures the ServiceAccount only holds the minimal RBAC needed to create, update, and delete Application resources and the preview namespace.

Conclusion

Starry keeps ephemeral environments simple, fast, and consistent by leaning on proven building blocks: Kubernetes for runtime, Helm for packaging, and Argo CD for declarative GitOps. CI's job is minimal: build and push images when a branch with an open PR changes. Users then create environments in the IDP by choosing which images to run (backend and optional frontends), and Starry provisions a short-lived, isolated stack at predictable URLs. Because the chart is stateless by default and scoped to a unique namespace, environments spin up quickly and tear down cleanly when the TTL is reached or the user deletes them.

Under the hood, Argo CD provides the control loop while the chart and RBAC give just enough permission for Starry to create/update Application CRs safely. Sync waves ensure platform dependencies (External Secrets, cert‑manager) land first, and optional image automation (Argo CD Image Updater) keeps updates auditable by writing tags back to Git. Drift rules suppress harmless platform mutations so reconciles stay focused on meaningful changes.

This approach is practical and adaptable: each piece is interchangeable, the flow is fully declarative, and environments are reproducible across dev and prod. Today, creation is explicit and user‑driven for clarity; in a follow‑up, we’ll layer on automation to go from “merge request opened” to “environment ready” without manual steps. We will also go over what an ephemeral environment Helm chart could look like.
