DEV Community: SEON

Hands-on DevOps #1 — GitLab CI/CD Components & Catalog: Build, Publish, and Consume by Version

SEON — Sun, 14 Jun 2026 14:26:49 +0000

TL;DR

Build — Put a component under templates/ and declare its inputs with spec:inputs — types, defaults, options, even regex. Invalid values are rejected before the pipeline is even created.
Publish & consume — Push a semantic-version tag and the release job publishes the component to the CI/CD Catalog; other projects pull it in with include: component@version. Version ranges like @1 and @~latest let you control breaking changes.
How it's verified — Every output in this article was captured by running directly against a real gitlab.com project, SEON.N/gitlab-ci-components-catalog (public), via pipelines, releases, and the CI Lint API.

Overview

When first setting up a CI/CD pipeline, most people start by copying another project's .gitlab-ci.yml. That works at first, but as projects multiply, the same configuration gets duplicated everywhere and three problems keep recurring.

Discoverability — There's no way to know whether someone has already built the same build/test/deploy job. So every team rewrites a similar pipeline from scratch.
Reusability — You can pull in another file with include, but there's no versioning and no input validation. When the source file changes, the pipelines referencing it break without warning.
Contribution — There's no standard path to safely distribute a well-built pipeline piece across the organization and announce "here it is, go use it."

CI/CD components and the catalog exist to solve exactly these three. In one line: they turn pipeline configuration from "code you copy-paste" into "a versioned package."

Just as you don't copy a library's source wholesale but pull it in by version with npm install or a Go module, components let you pull pipeline pieces in by version, like a dependency. And the CI/CD Catalog is the marketplace that gathers those components in one place so you can search and discover them.

A CI/CD component is what GitLab defines as a "reusable single pipeline configuration unit." You pull it in with include just like before, but two things are fundamentally different.

Typed inputs (**spec:inputs**) — A component can declare string/number/boolean/array types, defaults, options (enum), and regex validation. Invalid values are rejected before the pipeline is even created.
Semantic versioning + catalog — Components are released with semantic versions and discovered in the CI/CD Catalog.

Compared with the existing include:

Method	What it pulls in	Typed inputs	Version/Catalog
`include:local`	a file in the same repo	No	No
`include:project`	a file from another project (ref)	No	No
`include:remote`	a file at a URL	No	No
`include:template`	a GitLab-provided template	No	No
`include:component`	a component (spec + job)	Yes	Yes (Catalog)

In one line: where include:local/include:project/include:remote "copy a file as-is," include:component "pulls in a dependency with an explicit version and input contract (spec)." The key difference is that you're pulling in not a plain file but a building block with a guaranteed input spec and version.

CI/CD components and the CI/CD Catalog reached GA in GitLab 17.0 (2024-05-16); before that they were experimental/beta. This article uses GitLab.com (always the latest version).

Where it's useful

Components shine when "you repeat the same thing across many projects." Common real-world use cases:

In this situation	Solve it with a component
Apply the same security scan or lint to every repository	Build a scan component once and put it in the catalog; each project gets the same checks with a few lines of `include`
Standardize a deploy routine (Cloud Run, Kubernetes, etc.)	Take the environment name and image tag as `inputs`, and reuse the same component across teams by changing only the values
A platform team wants to enforce a company-standard pipeline	Gather components at the group level and offer them as a catalog — you get governance (enforcement) and DRY (de-duplication) at the same time
Control breaking changes, like a library upgrade	Consumers pin a partial version like `@1` to auto-accept only non-breaking updates, or pin an exact version

The flagship example GitLab cited at GA was a Google Cloud Run deploy component, and the most common adoption driver is turning "jobs every team repeats the same way" — security scan, build, lint, deploy — into shared, organization-wide building blocks.

Version history and direction

Components and the catalog haven't stood still since they went GA in 17.0. GitLab has kept adding to this area in release after release.

Version	What was added/changed
Beta (2023-12)	CI/CD Catalog beta released
17.0 (2024-05)	Components + catalog reach full GA
18.0	The `release` job's standard image moved from `release-cli` to `glab`; `release-cli` is deprecated, removal planned for 20.0 (until then it falls back automatically when glab is absent)
18.5	Per-project component limit raised 30 → 100
18.6–18.7	Component context expression — a component can access its own metadata such as name and version
18.9	Catalog resource usage analytics introduced
19.0 (2026-05)	Detailed component usage for maintainers — track which project uses which version

The direction is clear: the core syntax (spec:inputs, semantic versioning, release, the catalog) has stayed stable since GA, while operational conveniences (usage analytics, usage detail, context expressions) and raised limits keep being layered on top.

So it's a stable foundation you learn once and use for a long time, and at the same time an area GitLab keeps actively expanding. This article's hands-on was run on gitlab.com (always latest), so the captured behavior already reflects the newest version.

The availability range is broad too. Components, the catalog, and spec:inputs all work across:

Tier: Free · Premium · Ultimate — all tiers
Offering: GitLab.com (SaaS) · GitLab Self-Managed · GitLab Dedicated — all offerings

Even on the free tier or self-managed, the core actions — build, publish, and consume components — work as-is. Just remember two things. First, 19.0's "detailed component usage" is Premium-and-up only. Second, a self-managed instance's catalog starts with zero published components, so you fill it by publishing your own or mirroring from GitLab.com, and the instance must be on 17.0 or later.

Architecture

The component project (templates/*.yml + .gitlab-ci.yml + README/LICENSE) publishes versions to the catalog via release, and any consumer project pulls them in with include: component@version.

Lifecycle

(1) Write the component and push to main → (2) the self-test pipeline actually runs the component → (3) tag a semantic version → (4) the release job publishes → (5) the version is registered in the catalog → (6) consumers include it. Because self-test runs first even on the tag pipeline, a broken component never gets published.

include resolution and input validation

include: component@version is resolved at pipeline creation time: version resolution (tag/SHA/partial/~latest) → fetch the template → input validation (type/options/regex) → $[[ inputs.x ]] interpolation → merge the job. If it's blocked at validation, the pipeline fails before a runner ever spins up.

Prerequisites

This hands-on uses only local glab, git, curl, and python3 (plus the PyYAML package for the local syntax check in Step 3). Use an already-installed, authenticated glab; if you don't have it, install it per the table below (to use it as a container, docker run the registry.gitlab.com/gitlab-org/cli image).

Tool	Version	macOS	Windows	Linux
GitLab	17.0+	— (SaaS)	—	—
`glab`	1.40+	`brew install glab`	`winget install glab.glab`	package manager, or `docker run --rm -it registry.gitlab.com/gitlab-org/cli:latest`
`git`	2.30+	preinstalled	preinstalled / `winget install Git.Git`	preinstalled / distro package
`curl`	7.x+	preinstalled	preinstalled	preinstalled
Token	—	a PAT with `api` + `write_repository` scope, or `glab auth login`	same	same
Runner	—	enable a shared/group runner on the project (to run pipelines)	same	same

glab, git, and curl take the same commands on macOS, Windows, and Linux, so the commands below apply to all three; any differences are noted where they occur.

Core concepts

1) Component directory structure

A component project places components under a top-level templates/ directory.

├── templates/
│   ├── greeting.yml          # single-file component
│   └── my-other/             # directory-form component
│       └── template.yml      # only this file is published
├── LICENSE.md
├── README.md
└── .gitlab-ci.yml

A single file is templates/<name>.yml; a more complex component is templates/<name>/template.yml. In the directory form, only template.yml is published — the rest (build/test helpers) are not.
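The two layouts can be sketched as a small discovery function. This is only an illustration of the naming rule described above, not GitLab's own discovery code; the function name list_components and the demo file helper.sh are hypothetical.

```python
import tempfile
from pathlib import Path


def list_components(project_root: str) -> list[str]:
    """Return the component names found under <project_root>/templates/."""
    templates = Path(project_root) / "templates"
    if not templates.is_dir():
        return []
    names = set()
    for entry in templates.iterdir():
        # Single-file form: templates/<name>.yml
        if entry.is_file() and entry.suffix == ".yml":
            names.add(entry.stem)
        # Directory form: templates/<name>/template.yml (other files are helpers)
        elif entry.is_dir() and (entry / "template.yml").is_file():
            names.add(entry.name)
    return sorted(names)


# Demo on a throwaway tree that mirrors the layout shown above
with tempfile.TemporaryDirectory() as root:
    base = Path(root) / "templates"
    (base / "my-other").mkdir(parents=True)
    (base / "greeting.yml").write_text("spec: {}\n---\n")
    (base / "my-other" / "template.yml").write_text("spec: {}\n---\n")
    (base / "my-other" / "helper.sh").write_text("#!/bin/sh\n")  # hypothetical helper
    print(list_components(root))  # ['greeting', 'my-other']
```

The helper file inside my-other/ does not produce a component of its own, which matches the rule that only template.yml is published in the directory form.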

2) `spec:inputs` — typed inputs

A component file is split into two YAML documents. Above --- is the spec (input declarations); below it is the actual job definition.

spec:
  inputs:
    stage:
      type: string      # string (default) / number / boolean / array
      default: test      # with a default it's optional; without one it's required
    style:
      options: [plain, banner]   # allowed-value whitelist (enum)
    version:
      regex: ^v?\d+\.\d+\.\d+$    # regex validation (RE2)
---
# interpolate with $[[ inputs.NAME ]] in the job definition

The key is the "why." Input validation happens at pipeline creation time (when the configuration is fetched). So invalid input is rejected before a runner ever spins up, saving cost and time. A pipeline can take up to 20 inputs.
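To make the validation rules concrete, here is a minimal Python model of what a spec like the one above checks: required versus default, type, options, and regex. It is a sketch under assumptions, not GitLab's implementation: validate_inputs and the error wording are made up for illustration, and Python's re module stands in for RE2 (the two agree on the simple pattern used here).

```python
import re

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
}


def validate_inputs(spec: dict, provided: dict) -> dict:
    """Return the resolved inputs, or raise ValueError on the first violation."""
    unknown = set(provided) - set(spec)
    if unknown:
        raise ValueError(f"unknown input(s): {sorted(unknown)}")
    resolved = {}
    for name, rules in spec.items():
        rules = rules or {}
        if name in provided:
            value = provided[name]
        elif "default" in rules:
            value = rules["default"]      # a default makes the input optional
        else:
            raise ValueError(f"`{name}` input: required value was not provided")
        kind = rules.get("type", "string")  # string is the default type
        if not TYPE_CHECKS[kind](value):
            raise ValueError(f"`{name}` input: value is not a {kind}")
        if "options" in rules and value not in rules["options"]:
            raise ValueError(f"`{name}` input: `{value}` is not an allowed option")
        if kind == "string" and "regex" in rules and not re.search(rules["regex"], value):
            raise ValueError(f"`{name}` input: value does not match the regex")
        resolved[name] = value
    return resolved


# The spec from the example above, written as a Python dict
SPEC = {
    "stage": {"type": "string", "default": "test"},
    "style": {"options": ["plain", "banner"]},
    "version": {"regex": r"^v?\d+\.\d+\.\d+$"},
}
print(validate_inputs(SPEC, {"style": "banner", "version": "v1.0.0"}))
# {'stage': 'test', 'style': 'banner', 'version': 'v1.0.0'}
```

Passing style: fancy or version: not-a-version raises immediately, which is the local equivalent of the pipeline being rejected before any job exists.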

3) Interpolation `$[[ inputs.x ]]`

Unlike the CI variable $VAR, input interpolation uses the $[[ inputs.name ]] syntax. It works in the job name, in scripts, and on array elements $[[ inputs.arr[0] ]]. Interpolation is evaluated once at config-fetch time and stays fixed for the whole pipeline.
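The substitution itself can be pictured as a one-pass text replacement. The sketch below is a simplified model, assuming plain input names only (it does not handle array indexing such as inputs.arr[0]), and the helper names interpolate and render are hypothetical.

```python
import re

# Matches $[[ inputs.NAME ]] with optional spaces inside the brackets
_INPUT_REF = re.compile(r"\$\[\[\s*inputs\.([A-Za-z_][A-Za-z0-9_-]*)\s*\]\]")


def render(value) -> str:
    """Render a value as text; booleans become the strings true/false."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def interpolate(text: str, inputs: dict) -> str:
    """Replace every $[[ inputs.NAME ]] in text; plain $VAR is left alone."""
    def replace(match):
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"unknown input: {name}")
        return render(inputs[name])
    return _INPUT_REF.sub(replace, text)


print(interpolate("greeting $[[ inputs.name ]]", {"name": "GitLab"}))
# greeting GitLab
print(interpolate('GREET_SHOUT: "$[[ inputs.shout ]]"', {"shout": True}))
# GREET_SHOUT: "true"
```

Because the replacement happens once on the configuration text, a CI variable such as $CI_COMMIT_TAG passes through untouched and is expanded later by the runner.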

4) Version references

Components are referenced in this priority order:

Commit SHA — @e3262fdd...
Tag — @v1.0.0 (catalog publishing requires a semantic-version tag)
Branch — @main
Partial version / latest — @1.2, @1, @~latest

~latest points to the latest released version (excluding pre-releases), so breaking changes can flow in automatically. In production, prefer a pinned version or a partial version like @1.
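How a partial version or ~latest maps to a concrete release can be modeled in a few lines. This is a simplified sketch, not GitLab's resolver: it only covers catalog versions (not the SHA/tag/branch priority), the function names are hypothetical, and it assumes pre-releases are skipped for partial versions as well as for ~latest.

```python
import re

_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(-[0-9A-Za-z.-]+)?$")


def parse(tag: str):
    """Return ((major, minor, patch), is_prerelease), or None if not semver."""
    m = _SEMVER.match(tag)
    if not m:
        return None
    return (int(m.group(1)), int(m.group(2)), int(m.group(3))), m.group(4) is not None


def resolve(ref: str, released: list[str]):
    """Resolve '~latest', '1', or '1.2' to the newest matching released tag."""
    stable = []
    for tag in released:
        parsed = parse(tag)
        if parsed and not parsed[1]:          # skip pre-releases (assumption)
            stable.append((parsed[0], tag))
    if ref == "~latest":
        prefix = ()
    elif re.fullmatch(r"\d+(\.\d+)?", ref):
        prefix = tuple(int(part) for part in ref.split("."))
    else:
        # Anything else is treated as an exact name that must exist as-is
        return ref if ref in released else None
    matching = [(ver, tag) for ver, tag in stable if ver[:len(prefix)] == prefix]
    return max(matching)[1] if matching else None   # numeric, not string, order


released = ["v1.0.0", "v1.2.0", "v1.10.1", "v2.0.0", "v2.1.0-rc1"]
for ref in ["1", "1.2", "2", "~latest", "v1.0.0", "1.0.0"]:
    print(f"@{ref} -> {resolve(ref, released)}")
```

Versions are compared as integer tuples, so v1.10.1 ranks above v1.2.0, and an exact reference that differs from the tag name (1.0.0 against v1.0.0) resolves to nothing.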

5) include path and `release`

include:
  - component: $CI_SERVER_FQDN/<project-path>/<component-name>@<version>

$CI_SERVER_FQDN is a predefined variable for the GitLab host FQDN, so the same config works across instances. And for a version to appear in the catalog, you must create the release with the `release` keyword (not the Releases API).

Hands-on steps

Step 1 — Create the project and clone

This step creates the empty project (the container) that will hold the components. To publish to the catalog, a project needs a description, a README, and components under templates/, so we prepare that frame first. There are no components yet — we just take the empty repo and set up the working directory.

We created it with a description via the API for clarity (the catalog requires a description). glab repo create works too.

# Get the token from glab config, or export a PAT directly
GITLAB_TOKEN=$(glab config get token --host gitlab.com)
REPO_NAME="gitlab-ci-components-catalog"
DESC="Reusable GitLab CI/CD components (greeting, semver-guard) published to the CI/CD Catalog. Hands-on samples."

# Create via API (or: glab repo create "SEON.N/${REPO_NAME}" --public --description "${DESC}")
curl -s --request POST --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
  --header "Content-Type: application/json" \
  --data "{\"name\":\"${REPO_NAME}\",\"path\":\"${REPO_NAME}\",\"namespace_id\":<your-namespace-id>,\"visibility\":\"public\",\"description\":\"${DESC}\"}" \
  "https://gitlab.com/api/v4/projects"

git clone "https://oauth2:${GITLAB_TOKEN}@gitlab.com/SEON.N/${REPO_NAME}.git" "/tmp/${REPO_NAME}"
cd "/tmp/${REPO_NAME}"
git symbolic-ref HEAD refs/heads/main
mkdir -p templates

Output

created: SEON.N/gitlab-ci-components-catalog | id 83321559 | vis public
Cloning into '/tmp/gitlab-ci-components-catalog'...
warning: You appear to have cloned an empty repository.

Note: A service account or CI environment may not have an SSH key, so do git push in the https://oauth2:${GITLAB_TOKEN}@... form. The token needs api (create) and write_repository (push) scope.

Step 2 — The `greeting` component (4 inputs + interpolation)

This step writes the first component. To show the two essentials — typed inputs and interpolation — at once, we build greeting, which takes four inputs and injects them into the job name and the script. It's the smallest example of a component whose behavior changes with its inputs.

templates/greeting.yml. It uses string/boolean/options/default, and interpolates into the job name and the script.

spec:
  inputs:
    stage:
      type: string
      default: test
    name:
      type: string
      description: "Who to greet. Required: no default."
    style:
      type: string
      default: plain
      options: [plain, banner]
    shout:
      type: boolean
      default: false
---
"greeting $[[ inputs.name ]]":          # interpolated into the job name too
  stage: $[[ inputs.stage ]]
  image: alpine:3.20
  variables:
    GREET_NAME: "$[[ inputs.name ]]"
    GREET_STYLE: "$[[ inputs.style ]]"
    GREET_SHOUT: "$[[ inputs.shout ]]"
  script:
    - |
      MSG="Hello, ${GREET_NAME}!"
      if [ "${GREET_SHOUT}" = "true" ]; then
        MSG="$(echo "${MSG}" | tr '[:lower:]' '[:upper:]')"
      fi
      if [ "${GREET_STYLE}" = "banner" ]; then
        LINE="$(echo "${MSG}" | sed 's/./=/g')"
        printf '%s\n%s\n%s\n' "${LINE}" "${MSG}" "${LINE}"
      else
        echo "${MSG}"
      fi

File: templates/greeting.yml

Note: Write multi-line scripts as a - | block scalar. If you write - echo "OK: ...", the : (colon+space) inside a YAML plain scalar is parsed as a mapping separator and causes a parse error (see Troubleshooting below).

Step 3 — The `semver-guard` component (regex) + `.gitlab-ci.yml`

This step builds a second component to show input validation (regex), and at the same time wires up .gitlab-ci.yml so the project self-tests its own components and releases from a tag. You assemble the component and the "publishing pipeline skeleton" together.

templates/semver-guard.yml validates its input with a regex.

spec:
  inputs:
    stage:
      type: string
      default: test
    version:
      type: string
      regex: ^v?\d+\.\d+\.\d+$
---
semver-guard:
  stage: $[[ inputs.stage ]]
  image: alpine:3.20
  variables:
    INPUT_VERSION: "$[[ inputs.version ]]"
  script:
    - |
      set -e
      echo "Validating version '${INPUT_VERSION}'"
      echo "${INPUT_VERSION}" | grep -Eq '^v?[0-9]+\.[0-9]+\.[0-9]+$'
      echo "OK - '${INPUT_VERSION}' is a valid semantic version"

The project's own .gitlab-ci.yml (a) self-tests the components at the current SHA, and (b) creates a release from a tag.

stages: [test, release]

include:
  - component: $CI_SERVER_FQDN/$CI_PROJECT_PATH/greeting@$CI_COMMIT_SHA
    inputs:
      name: GitLab
      style: banner
      shout: true
  - component: $CI_SERVER_FQDN/$CI_PROJECT_PATH/semver-guard@$CI_COMMIT_SHA
    inputs:
      version: v1.0.0

create-release:
  stage: release
  image: registry.gitlab.com/gitlab-org/cli:latest
  rules:
    - if: $CI_COMMIT_TAG
  script:
    - echo "Creating release for tag $CI_COMMIT_TAG"
  release:
    tag_name: $CI_COMMIT_TAG
    description: "Release $CI_COMMIT_TAG of the components."

File: .gitlab-ci.yml

Note: Using @$CI_COMMIT_SHA in the self-test pulls in exactly the components in the commit you just pushed, at that point in time — the "test yourself in your own pipeline" pattern. Before pushing, we validated the YAML syntax locally.

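# Requires PyYAML (pip install pyyaml); python3 alone does not ship the yaml module
python3 - <<'PY'
import yaml

# Parse every YAML document in each file; a syntax error raises yaml.YAMLError
for f in ["templates/greeting.yml", "templates/semver-guard.yml", ".gitlab-ci.yml"]:
    with open(f) as fh:
        docs = list(yaml.safe_load_all(fh))
    print("OK", f, len(docs), "doc")
PY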
# OK templates/greeting.yml 2 doc
# OK templates/semver-guard.yml 2 doc
# OK .gitlab-ci.yml 1 doc

Step 4 — Push and run the self-test pipeline (real output)

This step checks that the component you wrote actually runs. Pushing to main triggers the self-test pipeline, which runs the component you just made exactly as it is in that commit. It's the gate that filters out broken components before publishing.

git add .
git commit -m "Add greeting and semver-guard reusable CI/CD components"
git push -u origin main
# pushing to main triggers the self-test pipeline

Check the pipeline and job status via the API.

PID=83321559
curl -s --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
  "https://gitlab.com/api/v4/projects/${PID}/pipelines?per_page=1"

Real output (job list)

14845634397 | semver-guard     | test | success | 8.4 s
14845634396 | greeting GitLab  | test | success | 10.3 s
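The curl call above returns only the pipeline itself; a job list like this one comes from the pipeline's jobs endpoint (GET /projects/:id/pipelines/:pipeline_id/jobs). The following is a minimal sketch of fetching and formatting it. The function names are hypothetical, fetch_jobs needs network access and a token and is not called here, and the sample data is typed in by hand from the output above.

```python
import json
import urllib.request

API = "https://gitlab.com/api/v4"


def fetch_jobs(project_id: int, pipeline_id: int, token: str) -> list[dict]:
    """Call the pipeline jobs endpoint (requires network access and a token)."""
    request = urllib.request.Request(
        f"{API}/projects/{project_id}/pipelines/{pipeline_id}/jobs",
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def format_jobs(jobs: list[dict]) -> list[str]:
    """Turn the job objects into 'id | name | stage | status | duration' lines."""
    lines = []
    for job in jobs:
        duration = job.get("duration")
        took = f"{duration:.1f} s" if duration is not None else "-"
        lines.append(
            f"{job['id']} | {job['name']:<16} | {job['stage']} | {job['status']} | {took}"
        )
    return lines


# Hand-written sample mirroring the two jobs shown above
sample = [
    {"id": 14845634397, "name": "semver-guard", "stage": "test",
     "status": "success", "duration": 8.4},
    {"id": 14845634396, "name": "greeting GitLab", "stage": "test",
     "status": "success", "duration": 10.3},
]
print("\n".join(format_jobs(sample)))
```

A job that has not finished reports no duration, so the formatter prints a dash instead of failing.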

Notice the job is named greeting GitLab — the interpolation of "greeting $[[ inputs.name ]]" worked. The job logs (trace) look like this.

Real output (greeting job trace)

$ MSG="Hello, ${GREET_NAME}!" # collapsed multi-line command
==============
HELLO, GITLAB!
==============

Real output (semver-guard job trace)

$ set -e # collapsed multi-line command
Validating version 'v1.0.0'
OK - 'v1.0.0' is a valid semantic version

Note: Both style: banner (banner output) and shout: true (uppercasing) took effect, so HELLO, GITLAB! printed in a box. You can also see the boolean input interpolated into the script as the string "true".

Step 5 — Register the catalog resource + publish the version (real output)

This step puts the verified component in the catalog so it can be "consumed by version." First mark the project as a catalog resource, then push a semantic-version tag so the release job publishes that version to the catalog. From here, other projects can search for and use it.

Mark the project as a "catalog resource." In the UI it's the "CI/CD Catalog project" toggle under Settings > General > Visibility, but for automation use the GraphQL catalogResourcesCreate (no REST yet, issue 463043).

curl -s --request POST "https://gitlab.com/api/graphql" \
  --header "Authorization: Bearer ${GITLAB_TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{"query":"mutation { catalogResourcesCreate(input: { projectPath: \"SEON.N/gitlab-ci-components-catalog\" }) { errors } }"}'
# => {"data":{"catalogResourcesCreate":{"errors":[]}}}
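Hand-escaping the quotes inside a GraphQL query inside JSON inside a shell string is easy to get wrong. A small sketch that builds the same request body programmatically is shown here; the mutation text is the one used above, while the helper names catalog_create_payload and mutation_errors are hypothetical.

```python
import json


def catalog_create_payload(project_path: str) -> str:
    """Build the JSON body for the catalogResourcesCreate mutation."""
    # json.dumps quotes and escapes the path as a string literal for the query
    query = (
        "mutation { catalogResourcesCreate(input: { projectPath: "
        + json.dumps(project_path)
        + " }) { errors } }"
    )
    # A second json.dumps escapes the whole query for the HTTP request body
    return json.dumps({"query": query})


def mutation_errors(response_text: str) -> list:
    """Extract the errors list from the mutation's JSON response."""
    return json.loads(response_text)["data"]["catalogResourcesCreate"]["errors"]


body = catalog_create_payload("SEON.N/gitlab-ci-components-catalog")
print(body)

# The response captured above parses to an empty error list
print(mutation_errors('{"data":{"catalogResourcesCreate":{"errors":[]}}}'))  # []
```

The resulting string can be passed to curl with --data, or sent with any HTTP client, without writing a single backslash by hand.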

Now pushing a semantic-version tag runs the tag pipeline: self-test, then the release job.

git tag v1.0.0
git push origin v1.0.0

Real output (tag pipeline jobs)

14845637416 | create-release   | release | success
14845637415 | semver-guard     | test    | success
14845637414 | greeting GitLab  | test    | success

Real output (create-release job trace)

$ echo "Creating release for tag $CI_COMMIT_TAG"
Creating release for tag v1.0.0
• Creating or updating release  repo=SEON.N/gitlab-ci-components-catalog tag=v1.0.0
✓ Release created:    url=https://gitlab.com/SEON.N/gitlab-ci-components-catalog/-/releases/v1.0.0
✓ Release succeeded after 0.77 seconds.

Check that the version was registered in the catalog via GraphQL.

curl -s --request POST "https://gitlab.com/api/graphql" \
  --header "Authorization: Bearer ${GITLAB_TOKEN}" --header "Content-Type: application/json" \
  --data '{"query":"{ ciCatalogResource(fullPath: \"SEON.N/gitlab-ci-components-catalog\") { name webPath versions { count nodes { name } } } }"}'

Output

{"data":{"ciCatalogResource":{"name":"gitlab-ci-components-catalog",
  "webPath":"/SEON.N/gitlab-ci-components-catalog",
  "versions":{"count":1,"nodes":[{"name":"v1.0.0"}]}}}}

Note: The image the release job uses is, per the current official docs, registry.gitlab.com/gitlab-org/cli (glab) — changed from the former release-cli. Pushing only a tag without the release keyword does not create a version in the catalog.

Step 6 — Verify consumer include (CI Lint API, real output)

This step checks whether another project can actually consume the published version. The CI Lint API (POST /projects/:id/ci/lint) resolves the include and validates the inputs without spinning up a runner, so it alone confirms that version references and input validation behave as intended.

PID=83321559
curl -s --request POST --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{"include_jobs":true,"content":"stages: [test]\ninclude:\n  - component: gitlab.com/SEON.N/gitlab-ci-components-catalog/greeting@v1.0.0\n    inputs:\n      name: World\n      style: plain"}' \
  "https://gitlab.com/api/v4/projects/${PID}/ci/lint"

Output (three version references)

ref @v1.0.0     -> valid=True  jobs=['greeting World']  errors=[]
ref @~latest    -> valid=True  jobs=['greeting World']  errors=[]
ref @1.0.0      -> valid=False errors=["Component '.../greeting@1.0.0' - content not found"]
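The three results above come from repeating the CI Lint call with a different reference each time. A sketch of how the request body and the one-line summary could be produced is shown below; the helper names are hypothetical, the HTTP call itself is left out, and the sample responses are hand-written to mirror the shape of the output above rather than captured from the API.

```python
import json

COMPONENT = "gitlab.com/SEON.N/gitlab-ci-components-catalog/greeting"


def lint_payload(ref: str) -> str:
    """Build the CI Lint request body for one version reference."""
    content = (
        "stages: [test]\n"
        "include:\n"
        f"  - component: {COMPONENT}@{ref}\n"
        "    inputs:\n"
        "      name: World\n"
        "      style: plain"
    )
    return json.dumps({"include_jobs": True, "content": content})


def summarize(ref: str, result: dict) -> str:
    """Condense a CI Lint response into a single line."""
    jobs = [job["name"] for job in result.get("jobs") or []]
    return f"ref @{ref} -> valid={result['valid']} jobs={jobs} errors={result.get('errors', [])}"


# Hand-written sample responses (shape only, not captured output)
samples = {
    "v1.0.0": {"valid": True, "errors": [], "jobs": [{"name": "greeting World"}]},
    "1.0.0": {"valid": False, "errors": ["content not found"], "jobs": []},
}
for ref, result in samples.items():
    print(summarize(ref, result))
```

Each payload from lint_payload can be sent with the same curl command as above by passing it to --data.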

Since the tag is v1.0.0, the include must also be @v1.0.0 (or @~latest). @1.0.0 (no v prefix) differs from the tag name, so it returns "content not found." Next, validate semver-guard's regex input.

Real output (regex input pass/reject)

=== valid version v2.3.4 (regex pass) ===
valid: True | jobs: ['semver-guard'] | errors: []

=== invalid version 'not-a-version' (regex reject) ===
valid: False | errors: ['`.../semver-guard@v1.0.0`: `version` input: provided value does not match required RegEx pattern']

Note: An invalid version string is blocked by regex validation at the pipeline-creation stage, before the job ever runs. This is the core value of a component's typed/regex inputs.

Advanced hands-on

The basic loop — build the components (Steps 1–3), verify with self-test (Step 4), publish to the catalog (Step 5), confirm a consumer can pull them in (Step 6) — ends here. Below are advanced patterns common in practice: real consumption from another project, controlling breaking changes with version ranges, and composing multiple components.

Step 7 — Actually consume from a consumer project (usage)

The point of the catalog is consuming a component with include from another project, not the one that built it. So we created a separate consumer project (public) and actually pulled it in.

The consumer project's .gitlab-ci.yml only needs to pull the components in by version.

stages: [test]

include:
  - component: $CI_SERVER_FQDN/<your-namespace>/gitlab-ci-components-catalog/greeting@v1.0.0
    inputs:
      name: Consumer
      style: banner
  - component: $CI_SERVER_FQDN/<your-namespace>/gitlab-ci-components-catalog/semver-guard@v1.0.0
    inputs:
      version: v3.1.4

Real output (consumer pipeline jobs)

greeting Consumer | test | success
semver-guard      | test | success

The job named greeting Consumer confirms the interpolation worked with the consumer's input (name: Consumer). In other words, once a component is published, any project pulls it in with its own inputs.

Note: The catalog aggregates a per-component usage count. Query it via GraphQL.
glab api graphql -f query='{ ciCatalogResource(fullPath:"<group>/<project>"){ versions{ nodes{ name components{ nodes{ name last30DayUsageCount } } } } } }'
However, last30DayUsageCount is not reflected immediately after consumption (in our test it was still 0 on a same-day re-query). Check usage after the aggregation has refreshed.
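The nested response of that query is easier to read once flattened. The sketch below walks the structure using the field names from the query above; the function name usage_by_component is hypothetical and the sample response is hand-written, with zero counts matching what we saw on a same-day query.

```python
def usage_by_component(response: dict) -> dict:
    """Map 'version/component' to its last30DayUsageCount."""
    usage = {}
    resource = response["data"]["ciCatalogResource"]
    for version in resource["versions"]["nodes"]:
        for component in version["components"]["nodes"]:
            key = f"{version['name']}/{component['name']}"
            usage[key] = component["last30DayUsageCount"]
    return usage


# Hand-written sample in the shape the query returns
sample_response = {
    "data": {"ciCatalogResource": {"versions": {"nodes": [
        {"name": "v1.0.0", "components": {"nodes": [
            {"name": "greeting", "last30DayUsageCount": 0},
            {"name": "semver-guard", "last30DayUsageCount": 0},
        ]}},
    ]}}}
}
for key, count in usage_by_component(sample_response).items():
    print(f"{key}: {count}")
```

Feeding it the parsed JSON from glab api graphql gives one line per published component and version.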

Step 8 — v2.0.0 release and version ranges (controlling breaking changes)

Signal a breaking change with a MAJOR bump. We changed greeting's default style from plain to banner (a change that alters existing consumers' default output) and released v2.0.0.

# templates/greeting.yml: change style default from plain to banner (breaking)
git commit -am "greeting v2.0.0: change default style to banner (breaking change)"
git tag v2.0.0
git push origin main v2.0.0

Once the tag pipeline passes self-test and then release, two versions coexist in the catalog.

Real output (catalog versions)

{"versions":{"count":2,"nodes":[{"name":"v2.0.0"},{"name":"v1.0.0"}]}}

Now the version a consumer pulls in depends on which reference it uses. We verified with CI Lint.

Real output (version reference resolution)

@v1.0.0  -> valid     exact tag
@v2.0.0  -> valid     exact tag
@1       -> valid     latest in the 1.x range = v1.0.0
@2       -> valid     latest in the 2.x range = v2.0.0
@~latest -> valid     latest released (pre-releases excluded) = v2.0.0
@1.0.0   -> invalid   content not found (the tag is v1.0.0)

What matters is the operational strategy. Even when the breaking v2.0.0 ships, a consumer pinned to @1 keeps receiving 1.x and stays safe. A consumer on @~latest, however, automatically gets v2.0.0 and its behavior changes (here, the default output becomes banner). So production pipelines should use a partial version like @1 or a pinned version, and avoid @~latest.

Step 9 — Composing components (multiple components in one pipeline)

The real power of components is "assembly": bundling several single-purpose components as stages into a standard pipeline. Here we add one more component, lint, and compose the self-test as a lint → test flow.

templates/lint.yml:

spec:
  inputs:
    stage:
      type: string
      default: lint
---
component-lint:
  stage: $[[ inputs.stage ]]
  image: alpine:3.20
  script:
    - |
      echo "Linting component templates..."
      for f in templates/*.yml; do echo "checking ${f}"; done
      echo "OK - lint passed"

The .gitlab-ci.yml sets stages to [lint, test, release] and adds lint (the lint stage) before the existing greeting/semver-guard (test).

stages: [lint, test, release]

include:
  - component: $CI_SERVER_FQDN/$CI_PROJECT_PATH/lint@$CI_COMMIT_SHA
    inputs:
      stage: lint
  # greeting and semver-guard are the same as Step 3 (test stage)

Real output (pipeline graph)

The lint stage's component-lint must pass first, then the test stage's greeting/semver-guard run. Each component is versioned and published independently, but the consumer composes them as stages into a single standard pipeline — this is the heart of an "organization-standard pipeline catalog."

Verification

Check	Command/method	Expected
Component works	self-test pipeline	`greeting`/`semver-guard` jobs success
Interpolation	job name	`greeting GitLab`
Catalog publish	GraphQL `ciCatalogResource`	`versions.count == 1`, `v1.0.0`
Release	`GET /projects/:id/releases`	`tag_name: v1.0.0`
Consumer resolution	CI Lint `@v1.0.0`	`valid=true`, job merged
Input validation	CI Lint, invalid value	`valid=false`, RegEx error

On failure: if the pipeline doesn't start, check the project's runner setup (shared/group runner) and the .gitlab-ci.yml syntax (glab ci lint); if the version doesn't appear in the catalog, check (1) catalog-resource registration, (2) use of the release keyword, (3) presence of a description/README.

Production

Permission model & token rotation: Marking a project as a catalog project requires the Owner role, and publishing a version requires at least the Maintainer role; the token used in this hands-on needs api + write_repository scope. In CI, prefer CI_JOB_TOKEN or a group access token where possible, and set an expiry on PATs and rotate them regularly. A token with no expiry is especially risky, so always have an expiry/rotation policy.
Governance: Gather component projects at the group level to enforce a standard pipeline. Signal breaking changes with semantic versions (MAJOR.MINOR.PATCH); consumers pin @1 (a partial version) to auto-accept only non-breaking updates, or pin an exact version.
~latest caution: Convenient, but breaking changes flow in automatically. Production pipelines should prefer a pinned version.
Security hardening: Put regex/options on the inputs a component takes to shrink the arbitrary-command-injection surface. When taking an image tag as input, restrict it with a regex too.
Monitoring & observability: Wire release/pipeline failures to alerts — e.g., a webhook to a notification channel on pipeline failure. (Integrate with your environment's observability stack, e.g. Prometheus/Loki/Tempo.) Expose the pipeline status badge and the catalog version count as dashboard metrics.
Failure-recovery runbook: If you published a bad version: (1) bump a patch version and republish, (2) consumers pin to the last good version, (3) if a ~latest consumer broke, switch it to a pinned version immediately.
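For the security-hardening point above, it helps to try a candidate pattern against hostile values before putting it into spec:inputs. The sketch below is an illustration under assumptions: the pattern follows the usual container image tag rules (letters, digits, underscore, period, dash; no leading period or dash; at most 128 characters), Python's re stands in for RE2, and the name is_safe_tag is hypothetical.

```python
import re

# Candidate pattern for an image-tag input; in spec:inputs it would be
# written anchored, e.g. ^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$
TAG_PATTERN = r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}"


def is_safe_tag(value: str) -> bool:
    """True only if the whole value matches the tag pattern."""
    # fullmatch avoids Python's quirk where $ also matches before a final newline
    return re.fullmatch(TAG_PATTERN, value) is not None


candidates = ["3.20", "v1.0.0", "latest", "3.20; rm -rf /", "$(whoami)", "-rc1", ""]
for value in candidates:
    verdict = "accept" if is_safe_tag(value) else "reject"
    print(f"{value!r:20} -> {verdict}")
```

Only the first three candidates are accepted; anything containing spaces, shell metacharacters, or a leading dash is rejected before it could reach a script line.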

Common mistakes & troubleshooting

Symptom	Cause	Fix
`content not found` on `include`	tag is `v1.0.0` but referenced as `@1.0.0`	match the include version to the actual tag name (`@v1.0.0`), or use `@~latest`/`@1`
YAML parse error `expected <block end>`	a `:` (colon+space) in a plain-scalar script (`echo "OK: ..."`)	write multi-line scripts as a block scalar so colon+space isn't in a plain scalar
version doesn't appear in the catalog	project isn't registered as a catalog resource	enable via `catalogResourcesCreate` GraphQL or the UI toggle
registered but zero versions	pushed only a tag without the `release` keyword	add a `release:` job to the pipeline and re-push the tag
pipeline doesn't start at all	no active runner on the project (gitlab.com free tier may require account verification)	enable a shared/group runner; check the group's `shared_runners_setting`
401 with a bad token	using a token issued for a different host (e.g. a self-managed instance) or an expired token	use the correct host's token via `glab config get token --host gitlab.com`

Going further

`array` inputs and indexing: use structured inputs like $[[ inputs.servers[0].host ]] (max 5 indices per segment).
Strengthen component tests: add a negative test to the self-test that asserts invalid input is actually rejected, so a broken component never gets published.
Search the CI/CD Catalog: explore public components at https://gitlab.com/explore/catalog to learn reuse patterns.

Cleanup

This hands-on uses only git push and release, and stands up no extra infrastructure. gitlab.com pipelines consume a small amount of the project's CI minutes.

# After the hands-on, clean up (optional):
# - Delete the whole project only if you are sure you no longer need it. If you keep it, just revoke the token.
# - Manage the temporary token used for git push per your expiry/rotation policy.
# - Clean the local working directory: rm -rf /tmp/gitlab-ci-components-catalog

Cost/billing note: Publishing to the catalog itself costs nothing extra. But running pipelines consumes CI minutes. Use rules to avoid repeatedly triggering large self-tests.

References

CI/CD components | GitLab Docs — https://docs.gitlab.com/ci/components/
CI/CD inputs | GitLab Docs — https://docs.gitlab.com/ci/inputs/
CI/CD Catalog goes GA (GitLab Blog, 2024-05-08) — https://about.gitlab.com/blog/ci-cd-catalog-goes-ga-no-more-building-pipelines-from-scratch/
Introducing the CI/CD Catalog beta (GitLab Blog, 2023-12-21) — https://about.gitlab.com/blog/introducing-the-gitlab-ci-cd-catalog-beta/
GitLab 17.0 release (2024-05-16) — https://about.gitlab.com/blog/gitlab-17-0-release
CI/CD Catalog (explore) — https://gitlab.com/explore/catalog
GraphQL API reference — https://docs.gitlab.com/api/graphql/reference/

Hybrid k3s #5: Putting kubectl down — GitOps 1/3

SEON — Sun, 14 Jun 2026 13:03:11 +0000

0. About this series

This series is a record — written one piece at a time — of how I built the homelab in the image above, the one that's still running as I write this.

What started as a toy project from a simple "would this even work?" turned, through satisfying performance and an endless cycle of tearing down and rebuilding, into a real toy that takes the edge off the stress that builds up at work. It isn't a resource-rich cluster, but it's been more than enough to get a real taste of Kubernetes, and it keeps handing me the next thing I want to try.

6 nodes — 2 Lightsail servers (control plane + etcd) in the cloud (AWS Tokyo) + 4 Lima VM agents on a home (Sapporo) iMac
19 vCPU / 61 GiB total, 49 namespaces, 248 pods (150 running)
Deployed with ArgoCD, auth via Keycloak OIDC, with CloudNativePG, Vault, CrowdSec, Prometheus/Grafana and more running on top

Through part 4, I stood up the cluster and the CloudNativePG on it imperatively (helm, kubectl apply). This part is about turning all of that into GitOps with ArgoCD. The scope — from tool choice to cluster bootstrap to secret management — is too wide for one part, so I'll split it into three.

Part 5 (this one) · Design — why GitOps, and what to use (tool and structure), decided by comparison.

Part 6 · Bootstrap — install ArgoCD and stand up the cluster's skeleton with app-of-apps and ApplicationSet.

Part 7 · Apply — move CloudNativePG over to GitOps as the first target, and finish off secret (password) management.

This first part takes us as far as deciding the tool and the structure.

1. Background — the things I'd stood up imperatively started to pile up

When I brought up CloudNativePG in part 4, two kinds of commands were enough. I installed the operator with helm, and brought up the database cluster with kubectl apply.

# Part 4 — install the operator
helm upgrade --install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-system --create-namespace --wait

# Part 4 — apply demo-db.yaml (Cluster CRD)
kubectl apply -f demo-db.yaml

Kubernetes broadly distinguishes three ways of managing objects — imperative commands (kubectl create ...), imperative object configuration, and declarative object configuration (kubectl apply -f) (Kubernetes — Object Management).

The problem was where that YAML lived, and who applied it, and when. In my case the file was somewhere on my laptop, the person applying it was me, and the timing was "whenever it crossed my mind."

This is operating declarative manifests by hand, and as the things I was bringing up grew one by one, I soon hit a wall.

1-1. The walls I hit operating by hand

An app soon sits on top of the DB, an ingress in front of it, secrets and backups beside it. As I ran more and more services, this cluster grew to 248 pods across 49 namespaces. Running that scale by hand with apply, I ran into the following walls.

Drift — the cluster becomes the only truth.
- The moment you fix the cluster directly with kubectl edit or kubectl scale in a pinch, the original YAML and the actual state diverge. That change isn't recorded in any file, so re-applying the original later silently overwrites it or conflicts.
- Kubernetes itself works by having controllers continuously reconcile the current state toward the desired state (Kubernetes — Controllers), but if that "desired state" lives only in my head and in scattered files, there's no reference point for reconciliation at all.
- "What's running now is the real thing" — but that real thing isn't in code.
No history — you can't answer "why is it like this?"
- Why is replicas 3, who added this env var, when, and why — manual ops keeps no record of it. You're left leaning on shell history and memory.
Not reproducible — you can't rebuild the cluster.
- Tear down a node or build a new cluster, and you have to re-type all those applys again, in the right order and with the right dependencies. Thinking back to parts 1 and 2, where I tore down and rebuilt the cluster over and over, this wasn't somebody else's problem.
No audit or collaboration — there's no point to stop and review.
- There's no review of a change, no approval, no revert. Hit Enter and it goes straight to production.
- Honestly, for a homelab I use alone, this item hurts the least. But the real point of running this homelab is to get hands-on with a way of working that transfers directly to a work environment — an enterprise cluster operated by many people.
- The moment a team touches the same cluster, 'who / when / why' and review, approval, and rollback stop being optional and become essential.
- A single change can lead straight to an outage, and without a record to trace it, neither recovery nor accountability is possible.
- GitOps structurally removes this problem by making every change go through Git.
- GitLab explains that a merge commit into the main (trunk) branch itself becomes the audit trail, and the Merge/Pull Request becomes the place where review, approval, and collaboration happen (GitLab — What is GitOps).
- This aligns with one of CNCF OpenGitOps' core principles, "Versioned and Immutable" (a versioned, immutable history of state) (OpenGitOps Principles).

The biggest cause of these walls wasn't whether manifests existed — it was whether those manifests were gathered in one place as the single source of truth, and applied continuously and automatically.

Even declarative manifests bring drift, missing history, no reproducibility, and no audit when applied by hand. The point is to gather the declarations in one place in Git and continuously automate how they're applied; in an enterprise where many people touch the same cluster, audit and collaboration become essential.

2. GitOps

2-1. What GitOps is

GitOps, in one line, is "an operating model where you declare the desired state in Git, and keep the cluster always matching that declaration."

First, declare the desired state of the system (infrastructure and apps) with Git as the single source of truth.
Second, a software agent inside the cluster automatically pulls that declaration and converges the actual state to it.
- Humans only write "this is how it should be" into Git; reflecting and keeping it in the cluster is the agent's job.

Simply "putting YAML in Git" is not GitOps in itself.

The point is to nail Git down as the single source of truth, so that the cluster cannot change without going through Git. If the path of typing kubectl by hand stays open alongside it, Git is just a file store.

This approach was first named by Weaveworks' Alexis Richardson in a 2017 piece called Operations by Pull Request, and today the CNCF's OpenGitOps project standardizes and maintains its definition with four principles (OpenGitOps).

① Declarative
- Write the desired state not as "do this" (a command) but as "it should be this way" (a declaration).
- Commands depend on order and timing; a declaration converges to the same result whenever it's applied.
② Versioned and Immutable
- Keep that declaration somewhere, like Git, where versions remain and nothing can be changed arbitrarily.
- Every change is set in stone as a commit, leaving who, when, and why, and a revert becomes a rollback.
③ Pulled Automatically
- Rather than a human pushing it in, the agent pulls the declaration itself.
④ Continuously Reconciled
- The agent constantly observes the actual state and, when it diverges from the declaration, brings it back in line.

2-2. Why GitOps

Drift → corrected by ③ auto-pull + ④ continuous reconciliation.
- The agent constantly compares Git against the actual state, so even if someone diverges it by hand with kubectl edit, the next reconciliation reverts it (self-heal).
- "What's running now is the truth" gives way to "Git is the truth."
No history or audit → solved by ② versioning.
- Every change remains as a commit or Pull Request, so who, when, and why is traceable, and there are points for review, approval, and rollback.
- Especially important in an enterprise where several people touch the same cluster.
Not reproducible → solved by ① declaration + ② a single source.
- The desired state of the entire cluster is declared in one place in Git, so even rebuilding the cluster restores the same shape by re-applying that declaration.

In short, the value of GitOps isn't "because it's convenient" — it's that it structurally removes the problems that manual ops structurally carried. And that value splits once more on safety, depending on who performs ③ and ④ and in which direction.

2-3. Push delivery and Pull delivery

There are two ways to deliver a change to the actual cluster.

Push delivery is where a CI/CD pipeline outside the cluster pushes the change into the cluster. The pipeline builds and then applies the manifests, and gitops.tech points out the limitation that this approach "is only triggered when the environment repository changes, and (the cluster's) deviation isn't noticed on its own" (gitops.tech).

Pull delivery is where an agent (operator) inside the cluster watches Git directly and pulls the change in. As gitops.tech describes it, the operator "continuously compares the desired state in the environment repository with the actual deployed state, and aligns the infrastructure if there's a difference." That this compare-and-correct never stops is the decisive difference from Push.

2-4. Why Pull is safer and more robust

There are two clear reasons Pull is held up as the recommended GitOps approach.

First, security — credentials never leave the cluster.

Push requires an external CI to hold privileged credentials to connect to the cluster.
Pull, by contrast, has the deploying party inside the cluster, so "the external service doesn't need to know the credentials" (gitops.tech), and the connection uses only outbound (egress) from the cluster.
CNCF also, in a 2025 piece, summarizes the pull model's security benefit as "not exposing the cluster to external push traffic" (CNCF, 2025).

Second, self-heal — it reverts deviation on its own.

The Pull agent keeps comparing the actual state against Git, so even drift someone introduced directly with kubectl edit is reverted at the next reconciliation.
The drift that "a human had to revert" in section 1 is now reverted by the controller.

3. What to do GitOps with — ArgoCD vs Flux vs Fleet

In section 2 I explained what GitOps is (declare the desired state in Git → an agent converges to it automatically), why it addresses the four limits of manual ops, and why the Pull approach is safer and recommended.

Now I need to research the tool that will actually run that Pull.

There are several Kubernetes GitOps tools, but I picked three as serious candidates for real-world comparison: ArgoCD, Flux CD, and Rancher Fleet.

All three share the same essence — "make Git the single source and match the cluster to that declaration" — but their character clearly splits on how they structure controllers, how they divide CRDs, whether they embed a UI, and how many clusters they have in mind.

Let me go through each one's concept and architecture.

3-1. ArgoCD — an app-centric GitOps controller

ArgoCD is part of the Argo project that Intuit built and donated, and the official docs define it as "a declarative GitOps continuous delivery tool for Kubernetes" (Argo CD Docs). The application-controller that handles reconcile (matching the current "actual" state to the "desired" state), the repo-server that caches and renders Git, and the server that provides the API and UI all work as one suite.

Below is ArgoCD's detailed architecture, drawn against the latest stable version (v3.4.3, the same version my cluster runs). Around the three core components (API server, repository server, application controller) sit ApplicationSet, Redis, and Dex; the repository server pulls and renders Git, and the application controller compares the result against the live state and syncs it to the cluster.

ArgoCD has two distinctive traits.

Everything is grouped into "apps" centered on a CRD called Application.
- You declare "sync this Git path to this namespace of this cluster" in a single Application, and scale up with the app-of-apps pattern, where one app owns many, or with ApplicationSet, which auto-generates apps from a template.
It ships with a rich Web UI built in.
- The official docs cite "a Web UI that shows application activity in real time" as a core feature, and you handle sync status, diff, and rollback, plus SSO (OIDC) and RBAC, right from the screen (Argo CD Docs). Its maturity is solid too. Argo entered CNCF incubation in 2020 and reached Graduated on December 6, 2022 (CNCF — Argo Graduated).
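To make "sync this Git path to this namespace of this cluster" concrete, a minimal Application might look roughly like the sketch below. The repository URL, path, and names are hypothetical placeholders, not values taken from my cluster.

```yaml
# Minimal Application sketch; repoURL, path, and names are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-app
  namespace: argocd            # the namespace ArgoCD itself runs in
spec:
  project: default
  source:
    repoURL: https://example.com/org/config-repo.git   # "this Git repo"
    targetRevision: HEAD
    path: apps/demo-app                                 # "this Git path"
  destination:
    server: https://kubernetes.default.svc              # "this cluster" (the one ArgoCD runs in)
    namespace: demo-app                                  # "this namespace"
```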

3-2. Flux CD — a composable GitOps toolkit

Flux was originally built by Weaveworks, and in v2 it was rewritten on top of the Kubernetes controller-runtime and its own GitOps Toolkit. The official site introduces Flux as "a set of continuous and progressive delivery solutions for Kubernetes that are open and extensible" (Flux).

That word "set" captures Flux's character well.

If ArgoCD is one suite of controllers, Flux is a combination of several controllers split by purpose.

Below is Flux's detailed architecture per the latest version (v2.8) official docs.

Six controllers each own their CRDs: the source-controller pulls sources and exposes them as artifacts, the kustomize/helm controllers apply them via server-side apply (SSA), and the image controllers commit new images back to Git, closing the loop.

Concretely, source-controller (acquiring sources: Git, Helm, OCI, S3, etc.), kustomize-controller (applying Kustomize), helm-controller (Helm releases), notification-controller (notifications), and the image automation controllers each own their own CRDs (GitRepository, Kustomization, HelmRelease, etc.) and collaborate (Flux).
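As a rough sketch of how that division of labor looks in manifests, the pair below has a GitRepository (owned by source-controller) fetch the repo and a Kustomization (owned by kustomize-controller) apply one path from it. The names, URL, path, and intervals are assumptions for illustration only.

```yaml
# Hypothetical names and URL; shows how two Flux CRDs split "fetch" and "apply".
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: config-repo
  namespace: flux-system
spec:
  interval: 1m                 # how often source-controller checks Git
  url: https://example.com/org/config-repo.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: demo-app
  namespace: flux-system
spec:
  interval: 10m                # how often kustomize-controller reconciles
  sourceRef:
    kind: GitRepository
    name: config-repo          # consumes the artifact produced above
  path: ./apps/demo-app
  prune: true                  # remove resources that disappear from Git
```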

This finely divided structure allows free composition and extension and keeps the cluster footprint light; the trade-off is that there is no officially built-in Web UI.

You mostly check status with the CLI (flux), and if you need a screen you attach a separate ecosystem UI or a vendor-hosted product (Flux).

Its maturity is neck and neck with ArgoCD. Flux also reached CNCF Graduated on November 30, 2022 (CNCF — Flux Graduated), so the two tools graduated less than a week apart, with maturity that stands shoulder to shoulder.

3-3. Rancher Fleet — GitOps for hundreds of clusters

The third is SUSE's Rancher Fleet. Its starting point differs from the other two.

If ArgoCD and Flux start from "GitOps for one cluster" and expand toward multi-cluster, Fleet is designed from the start to aim at "large-scale multi-cluster."

AWS's guidance docs also introduce Fleet as a "GitOps-at-scale" tool "built to scale from a single cluster to thousands" (AWS — Rancher Fleet).

Its operating model is tuned to that purpose. The Fleet Manager on the management (upstream) cluster packages the contents of a repo pointed to by GitRepo into a Bundle, then fans out to many downstream clusters according to group and target settings (Fleet — Mapping to Downstream Clusters).
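A GitRepo that fans one path out to labeled downstream clusters could be sketched as follows. The repo URL, path, label, and the use of the fleet-default workspace are assumptions for illustration, not a configuration I run.

```yaml
# Hypothetical GitRepo sketch; URL, path, and labels are placeholders.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: demo-app
  namespace: fleet-default     # assumed workspace holding the downstream clusters
spec:
  repo: https://example.com/org/config-repo.git
  branch: main
  paths:
    - apps/demo-app            # contents of this path are packaged into a Bundle
  targets:
    - name: prod
      clusterSelector:
        matchLabels:
          env: prod            # only clusters carrying this label receive the Bundle
```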

You usually manage it all from Rancher's Continuous Delivery screen. It's a powerful model for an MSP or a large enterprise where clusters are scattered by the dozens or hundreds across data centers, regions, and customers.

That said, unlike ArgoCD and Flux, you should also note that Fleet is not a CNCF project but part of the SUSE Rancher ecosystem.

Below is Fleet's detailed architecture per the latest version (v0.15.0) official docs.

The upstream gitjob and fleet-controller turn Git into a Bundle and create per-target BundleDeployments, and each downstream's fleet-agent pulls them outbound and applies them (the controller never connects to downstream first, so it works behind NAT and firewalls).

3-4. Comparison in one table

Lining the three up on the same axes makes the differences clear.

Aspect	ArgoCD	Flux CD	Rancher Fleet
One-line definition	app-centric GitOps controller	composable GitOps toolkit	large-scale multi-cluster GitOps
Controller layout	one suite (controller · repo · server)	per-purpose controllers (GitOps Toolkit)	Fleet Manager → downstream
Core CRDs	`Application` · `ApplicationSet`	`GitRepository` · `Kustomization` · `HelmRelease`	`GitRepo` · `Bundle`
Web UI	built-in (status · diff · rollback · OIDC/RBAC)	none official (CLI + ecosystem UI)	Rancher UI integration
Target scale	single to multi-cluster	single to multi-cluster	hundreds to thousands of clusters
Governance	CNCF Graduated (2022-12)	CNCF Graduated (2022-11)	SUSE Rancher (non-CNCF)
Strength	app-level visibility · UI	lightweight · composability	large-scale cluster fan-out

On maturity (both CNCF Graduated) and on core behavior like Pull and self-heal, ArgoCD and Flux are effectively on par. The real fork was "do you work with apps through a screen (ArgoCD) vs compose controllers and work through the CLI (Flux)", and for Fleet, "how many clusters do you have."

3-5. So why ArgoCD?

My choice was ArgoCD. But this wasn't a question of "which of the three is superior" — it was a decision driven by my homelab's context.

There were three criteria.

First, my environment has one cluster (6 nodes, but a single cluster).
- Fleet's strength of fanning out to hundreds of clusters has no use for me; if anything, that management model is overkill.
Second, I had a strong desire to "see with my own eyes what had diverged."
- ArgoCD's built-in UI, where you can instantly check drift and sync status, diff and rollback on a screen, was more intuitive than CLI-centric Flux, both for learning and for operating.
Third, since the point of this homelab is to learn a way of working I can someday move to an enterprise, I needed a tool with rich material and examples and a guaranteed lifespan.
- It was also a tool I'd grown familiar with from using it on projects, and one I wanted to dig into more deeply.
- ArgoCD is CNCF Graduated, and in CNCF's 2025 ArgoCD end-user survey, adoption was overwhelming — about 60% of respondents' clusters deploy applications with ArgoCD (CNCF, 2025).

Flux is an excellent tool too. With equal maturity (CNCF Graduated), lighter and freely composable, it may suit a team that wants to automate ops in a CLI- and code-centric way even better. I personally prefer operating via CLI over a UI as well, but for my conditions ("I want to see apps on a screen, and I'm running a single cluster for the long haul as practice for an enterprise environment"), ArgoCD simply fit a notch better. Right now, ArgoCD runs on this cluster, reconciling 79 apps.

$ kubectl get statefulset,deploy -n argocd
NAME                                             READY
statefulset.apps/argocd-application-controller    1/1     # reconcile engine (pull/compare/sync)
deployment.apps/argocd-repo-server                1/1     # Git manifest cache/render
deployment.apps/argocd-server                     1/1     # API / UI
deployment.apps/argocd-applicationset-controller  1/1     # the auto-generator covered in a later part
# … dex / redis / notifications / image-updater

$ kubectl get applications -n argocd --no-headers | wc -l
79

And you can confirm that selfHeal is enabled for this reconcile too.

$ kubectl get application root -n argocd -o jsonpath='{.spec.syncPolicy.automated}'
{"prune":true,"selfHeal":true}
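In manifest form, that output corresponds to the syncPolicy block of an Application. The fragment below is a sketch of just that part, not a copy of my actual file.

```yaml
# Fragment of an Application spec; only the syncPolicy part is shown.
spec:
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # re-sync when the live state is changed outside Git
```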

4. What structure to build ArgoCD with — rendering, organization, repository, access

Deciding on ArgoCD doesn't mean I can put part 4's CloudNativePG into Git right away. Following the flow of GitOps (what and how do I write into Git → how does ArgoCD pull it → how is it applied to the cluster), points to decide on appear one after another along that path. Summarized, there are four.

In what format do I write the manifests — rendering
How do I register and manage those apps in ArgoCD — organization
Where (in what repository structure) do I keep those files — repository
How does ArgoCD access that repository — access (auth)

Only once these four are decided does the 'structure' to move CNPG into GitOps stand up. From here, axis by axis, let me look at what it is, why it must be decided, what the candidates are, and on what basis to choose.

4-1. Manifest rendering — Kustomize vs Helm vs plain

First, let me clarify what "manifest rendering" is.

To bring anything up in Kubernetes, you ultimately need YAML (a manifest) describing the target — a Deployment, a Service, a ConfigMap.

But even for the same app, replicas and image tags differ per environment (dev/prod), and similar apps multiply into many copies. At this point, "how you keep the source written, and how you produce the final YAML that actually goes into the cluster" — this process is what we call rendering.

ArgoCD doesn't force this rendering into one way; it looks at the files in the repo path and decides automatically.

If there's a kustomization.yaml it's Kustomize, if there's a Chart.yaml it's Helm, and if neither, it's treated as a directory of plain YAML (Argo CD Docs). So what we decide is "which of these three to write my manifests in." Let me look at how the three differ for the same goal (the same app at replicas 1 in dev, 3 in prod).

plain YAML — as-is, no processing

plain is literally no processing. Put fully filled-in, complete YAML like deployment.yaml and service.yaml in a directory, and ArgoCD applies it to the cluster unchanged, as-is.

There's no rendering step at all, so the text written in Git is exactly the cluster's state: what gets deployed can be read directly from the repo, and there's no syntax to learn. It is "declarative object configuration (kubectl apply -f)" itself (Kubernetes — Object Management).

The problem is repetition.

To keep environments like dev and prod that differ by just a line or two, you copy the whole file (splitting into separate directories or such) and fix only those lines, and even bumping one shared image tag means hand-editing every copied file.

The more targets, the faster duplication and omissions (fixing only one side) pile up. The image below shows that limit — two nearly identical files exist separately because of one replicas line.

Kustomize — layering with base + overlay (no templates)

Kustomize solves that repetition without copy-paste.

The Kubernetes official docs define it as "a standalone tool to customize Kubernetes objects through a kustomization file," and since 1.14 it's built into kubectl, usable directly with kubectl apply -k (Kubernetes — Kustomize). The core concepts are base and overlay.

Keep one copy of the shared manifest in base/ (e.g. a Deployment with replicas 1), and write only the per-environment differences as a patch in overlays/prod/'s kustomization.yaml (e.g. "replicas to 3," "prefix names with prod-"). Then kustomize build reads the base and overlays the patch to produce the final YAML.

The decisive trait is that there's no template language — it's not variable substitution like {{ }} but merging YAML on top of YAML to make another plain YAML, so the result reads as-is and "what changed and how" is visible.

Shared values (image tags, etc.) propagate to every overlay when you change just one place in base, so plain's copy-paste problem disappears. In return, Kustomize is weak at complex expressions like conditional branching or repeated generation.
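The base + overlay idea can be sketched with three small files. The app name, image, and directory names are hypothetical; each file is shown as one YAML document with its path in a comment.

```yaml
# file: base/deployment.yaml (shared manifest, replicas 1)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: nginx:1.27
---
# file: base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
---
# file: overlays/prod/kustomization.yaml (only the prod differences)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-              # "prefix names with prod-"
patches:
  - target:
      kind: Deployment
      name: demo-app
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
```

Running kustomize build overlays/prod (or kubectl apply -k overlays/prod) against this layout should yield a plain Deployment named prod-demo-app with replicas set to 3, with no template syntax anywhere in the source.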

The image below shows, with real files, how one base + a prod overlay's patch merge into the final YAML.

Helm — parameterizing with charts and values (a template engine)

Helm takes a different approach.

Calling itself "the package manager for Kubernetes" (Helm), it bundles an application into a package called a chart. The manifests in the chart's templates/ don't write values directly but place Go template variable slots like {{ .Values.replicas }}, and the actual values are written separately in values.yaml.

At deploy time, Helm slots the values into place (substitution) to render the final manifests.

So keeping the chart and just swapping the values lets you deploy with the same chart to dev and prod, and even other clusters.
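As a small sketch of "same chart, different values": assume a hypothetical chart named demo-app whose templates/deployment.yaml contains the line replicas: {{ .Values.replicas }}. The two values files below are then all that differs between dev and prod.

```yaml
# file: values.yaml (chart defaults, used as-is for dev; hypothetical chart "demo-app")
replicas: 1
---
# file: values-prod.yaml (prod override, passed to Helm with -f)
replicas: 3
```

Under those assumptions, helm template demo-app ./demo-app renders replicas: 1, and adding -f values-prod.yaml renders replicas: 3 from the same chart.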

On top of this, it supports conditionals (if), loops (range), shared helpers (_helpers.tpl), and dependencies on other charts (subcharts), giving it the strongest expressiveness.

That makes it especially good for pulling in complex software someone else has published, chart and all, and changing only the values to fit my environment (public charts usually come from official chart repositories).

Because a template language sits in between, "the text written in Git" and "the YAML that will actually be applied" are one step apart. So you need the habit of expanding the render result in advance with helm template to check it.

The image below shows the process of variable slots ({{ }}) being substituted with values into the final YAML.

Aspect	plain YAML	Kustomize	Helm
Processing	none (apply as-is)	base + overlay patch·merge	Go template variable substitution
Template language	none	none	yes (`{{ }}`)
Parameterization·expressiveness	none	medium (patch·field injection)	high (conditionals·loops·dependencies)
Output transparency	highest (source = result)	high (YAML→YAML)	low (must render to see)
Built into kubectl	apply only (`-f`)	yes (`-k`)	no (separate tool)
Reuse·distribution	low (copy-paste)	medium (base reuse)	high (shared via charts·repos)
Learning·complexity	lowest	low	medium to high
Best fit	a few static resources	manifests I declare myself	complex external public charts

In short, the three aren't a matter of better or worse but of purpose — plain for a small number of static resources, template-free and clean Kustomize for manifests I declare myself, and Helm for pulling in complex external public charts.

4-2. App organization — app-of-apps vs ApplicationSet

Next is "how to register those written manifests in ArgoCD."

ArgoCD handles the deploy unit as a CRD called Application.

It's a single manifest that says "sync this Git path to this namespace of this cluster." With one or two apps, you can write these Applications by hand, one at a time. But once you have dozens to bring up, writing the Applications themselves becomes work, and it's easy to miss some or let them drift apart.

So you have to decide "how to create Applications systematically (automatically, if possible)." ArgoCD offers two paths.

app-of-apps is, in the official docs' words, a pattern that "declares one ArgoCD app consisting only of other apps" (Argo CD — Cluster Bootstrapping). One parent root Application points at a Git path that holds the child Application manifests, and syncing just that root creates the children one after another.

It suits a bootstrap entry point that "stands up the cluster's skeleton in one shot." But because you write the child list directly, you have to add one more child Application by hand each time a new app appears.
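A root Application for this pattern might be sketched as below. The only thing that makes it "app-of-apps" is that its source path contains child Application manifests rather than workloads. The repo URL and the bootstrap directory name are hypothetical.

```yaml
# Root Application sketch: its source path holds only child Application manifests.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/config-repo.git   # placeholder URL
    targetRevision: HEAD
    path: bootstrap          # hypothetical directory of child Application YAMLs
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd        # child Applications are created in ArgoCD's namespace
```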

ApplicationSet goes one step further — in the official docs' words, a controller that "automates and flexibly manages Applications across many clusters and apps" (Argo CD — ApplicationSet).

The core is the generator.

A generator produces parameters, and those parameters are slotted into a single template to stamp out Applications. Generators include list (giving the list directly), cluster (scanning registered clusters), git (scanning a repo's folders and files), and matrix (multiplying two together).

In particular, the git generator automatically creates an Application for each folder under a set path (e.g. apps/*), so just adding a new folder makes the app appear on its own.

There's no need for a human to create the Application directly.
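The folder-scanning behavior can be sketched with a git directory generator. This is an illustration under assumptions: the repo URL, the apps/* layout, and the choice to name each app and namespace after its folder are all hypothetical.

```yaml
# ApplicationSet sketch: one Application is stamped out per folder under apps/.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: workloads
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - git:
        repoURL: https://example.com/org/config-repo.git   # placeholder URL
        revision: HEAD
        directories:
          - path: apps/*                # each matching folder yields one parameter set
  template:
    metadata:
      name: '{{.path.basename}}'        # folder name becomes the app name
    spec:
      project: default
      source:
        repoURL: https://example.com/org/config-repo.git
        targetRevision: HEAD
        path: '{{.path.path}}'          # full folder path inside the repo
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{.path.basename}}'
```

With a layout like this, adding a folder such as apps/new-service would be enough for a new Application to appear at the next reconciliation.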

The two aren't so much competitors as different layers. app-of-apps is good at making "one initial entry point," ApplicationSet at "mass-producing apps beneath it."

So they're commonly used together — root (app-of-apps) stands up the ApplicationSets, and each ApplicationSet scans folders to stamp out the actual apps.

4-3. Repository structure — monorepo vs polyrepo

Third is "in which repository to keep those declarations."

There's an order to this: first a principle to settle, and then the decision of whether to keep that repository as one or split it into many.

The principle is to separate config (manifests) from app source code.

The ArgoCD official guide nails it down, "strongly recommending that Kubernetes manifests live in a separate Git repository from the application source code" (Argo CD — Best Practices). The reasons follow.

① If source and config are in one repo and CI pushes manifest changes back to it, it's easy to create an infinite loop of build jobs and commit triggers; even a config-only change re-runs the app's build CI.
② Deploy history (config commits) and development history (source commits) get tangled, making the audit log messy.
③ It's hard to separate the permissions of "people who touch the code" and "people who deploy to production."

What remains is how to keep that "config repository."

monorepo gathers the entire cluster's declarations in one repo, separated by folders (platform/, workloads/, …)
- The whole picture of changes fits in one view, and the ApplicationSet's git generator only needs to scan one repo, keeping things simple.
- But once an organization gets very large, it becomes hard to divide permissions finely, as in "this folder for this team only."
polyrepo splits config repos per team or domain
- You can cleanly divide access permissions per repo, but it gets cumbersome to see the whole cluster at once or to make changes spanning multiple repos.

Neither of the two is the right answer in itself; the choice is a trade-off driven by org size and permission needs.

4-4. Repository access — HTTPS vs SSH deploy key vs GitHub App

The last is how ArgoCD reads that (usually private) repository.

Per the Pull model from section 2, ArgoCD's repo-server, the party doing the reading, is inside the cluster and uses only outbound connections. Still, reading a private repo needs credentials, and each method differs in the scope its permission reaches and whether it can be narrowed to read-only. The official docs support HTTPS (user/token), SSH private key (deploy key), GitHub App, TLS client certificates, and more (Argo CD — Private Repositories).

HTTPS · Personal Access Token
- You authenticate with the token in place of a password.
- Simplest, but the token easily broadens to reach many repos at the account level, and usually carries read/write permission together.
- If leaked, the blast radius is large.
SSH · Deploy Key
- GitHub's official docs define a deploy key as "an SSH key that grants access to a single repository," and specify that "it's read-only by default, and write access can be granted when adding it" (GitHub — Deploy keys).
- That is, its scope is limited to that one repository and it can be issued read-only, fitting the principle of least privilege best.
- You register the public key as the repository's deploy key and put the private key into ArgoCD.
GitHub App
- A fine-grained method that reaches only chosen repos and permissions per installation.
- The installation token is a short-lived token that expires in about an hour, so you use it with auto-renewal (GitHub — App installation auth), and auditing and revocation stay clean.
- Suits org and many-repo scale, but the initial setup is somewhat complex.

Whichever method, the credential stays only inside the cluster and uses only outbound connections — that's the same. What splits is "how far you can narrow the permission," and for a single repository, the deploy key, limited to that repo and read-only, is the simplest and safest.

4-5. What does my homelab's setup look like? — what, why, and so what do I gain

Rendering
- Kustomize by default + Helm alongside.
- The resources I declare myself have almost no per-environment branching, so Kustomize, where the result YAML is visible as-is with no template language, was simple and easy to debug.
- Conversely, charts someone else made well and published — like operators — I don't bother unpacking and porting; I pull them in as-is with Helm.
- What I use myself stays transparent, what others made gets reused, and the management burden is minimized on both sides.
Organization
- app-of-apps + ApplicationSet.
- I keep root (app-of-apps) as the bootstrap entry point, and beneath it the ApplicationSets scan folders with a git generator to mass-produce apps.
- Adding an app needs no hand-made Application (just add a folder), and rebuilding the cluster restores everything from the single root.
- Reproducibility and scalability come together.
Repository
- A config monorepo separated from source.
- Manifests are separated from app source (the official recommendation), but on a single-cluster homelab I gathered that config into one repo.
- Separation avoids CI loops, tangled history, and permission issues, while a single repo lets me see all changes in one view and keeps the generator simple.
Access
- A read-only SSH deploy key.
- On top of the Pull model that keeps credentials only inside the cluster, I narrowed the permission to that one repository, read-only.
- Even if the key leaks, it can't do more than read that repo, so the blast radius is structurally bound.

This setup's character, in one line, is "single cluster, one config monorepo, read-only pull" — simple, safe, and reproducible as a whole.

At the same time, changing just one axis at a time makes it a foundation that scales to an enterprise (repository to polyrepo, access to GitHub App, target to multi-cluster).

# ArgoCD repo connection secret — the key layout alone tells you the method.
$ kubectl get secret repo-seonology-k3s -n argocd -o jsonpath='{.data}' | jq 'keys'
[
  "sshPrivateKey",   # connect via SSH deploy key — credentials stay in the cluster
  "type",            # git
  "url"              # exactly one repo = monorepo
]
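
For reference, a repository Secret with that key layout can be declared roughly as follows. This is a sketch: the URL and the key material are placeholders, and a file holding a real private key should never be committed to Git.

```shell
# Sketch only: the URL and key material are placeholders, not real values.
cat > repo-secret-sketch.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: repo-seonology-k3s
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository   # tells ArgoCD this is a repo credential
stringData:
  type: git
  url: git@github.com:example/k3s-config.git      # hypothetical repo URL
  sshPrivateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    <read-only deploy key goes here>
    -----END OPENSSH PRIVATE KEY-----
EOF
# Applied once by hand during bootstrap:
#   kubectl apply -f repo-secret-sketch.yaml
```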

These decisions together finish the preparation to move part 4's imperatively-built CloudNativePG into GitOps. In the next part (part 6), I'll actually build this design by hand.

The path runs from creating the config repository → installing ArgoCD → connecting the repository with a read-only deploy key → standing up the root (app-of-apps) and ApplicationSet skeleton.

And in part 7, I'll lay CNPG on top of it — declaring the operator with Helm and the Cluster CR with Kustomize — and take it all the way to running it as GitOps.

Even after deciding to use ArgoCD, four more things need deciding — rendering (Kustomize by default + Helm alongside), organization (app-of-apps for bootstrap + ApplicationSet for mass production), repository (a config monorepo separated from source), access (a read-only SSH deploy key). The reason for each choice converges into one — "simple, safe, reproducible" — and at the same time becomes a foundation that scales to an enterprise by changing just one axis at a time.

5. Wrapping up — and what's next

This part didn't run a single command (not one kubectl apply).

Instead, it finished the design for putting that command down. I confirmed that GitOps's four principles fill exactly the four things that broke under hand-run ops (drift, history, reproducibility, audit) (sections 1 and 2), chose ArgoCD as the tool to run that Pull (section 3), and decided how to write, group, keep, and read the manifests on top of it, along four axes (section 4).

Spread out, it looks like a lot of decisions, but they all converge in one direction — simple, safe, and reproducible.

Rendering keeps Kustomize, whose result is visible as-is, by default while pulling in complex charts others made with Helm; organization uses app-of-apps + ApplicationSet, where everything is restored from a single root; the repository is a config monorepo separated from source; access is a read-only deploy key that reads only that one repo.

It's a plan that's both the simplest, safest starting point for running a single cluster, and one you can scale to an enterprise by changing just one axis at a time.

With the design done, from the next part on I build this by hand.

Part 6 · Bootstrap — install ArgoCD, connect the config repository with a read-only deploy key, then stand up the cluster's skeleton with root (app-of-apps) and ApplicationSet. And I'll see with my own eyes how ArgoCD reverts a change someone made directly with kubectl edit (self-heal).
Part 7 · Apply — move the CloudNativePG I brought up imperatively in part 4 into GitOps. I'll declare the operator with Helm and the Cluster CR with Kustomize, and finish off the last remaining homework, secret (password) management.

It's time to make a single commit take the place where I used to type kubectl apply by hand.

References / Sources

GitOps definition and principles — OpenGitOps (CNCF) · GitLab — What is GitOps · gitops.tech · CNCF — GitOps in 2025
Kubernetes basics — Object Management · Controllers
ArgoCD — Official docs · Cluster Bootstrapping (app-of-apps) · ApplicationSet · Best Practices · Private Repositories
Flux CD — Official site
Rancher Fleet — AWS Prescriptive Guidance · Fleet — GitRepo Targets
Rendering tools — Kubernetes — Kustomize · Helm
Governance and adoption — Argo CNCF Graduated (2022) · Flux CNCF Graduated (2022) · CNCF — ArgoCD end-user survey (2025)
Repository access — GitHub — Deploy keys · GitHub — App installation auth

Hybrid k3s #4: Building a unified database on k3s — five Postgres operators, and CloudNativePG

SEON — Tue, 09 Jun 2026 00:00:00 +0000

0. About this series

This series is a record — written one piece at a time — of how I actually built the homelab in the diagram above, the one that's still running as I write this.

What began as a toy project from a simple "could this even work?" turned, through satisfying performance and endless tearing-down-and-rebuilding, into a genuine toy that takes the edge off the stress built up at work. It isn't a resource-rich cluster, but it's been more than enough to get a real taste of Kubernetes, and it keeps handing me the next thing I want to try.

6 nodes — 2 Lightsail servers (control plane + etcd) in the cloud (AWS Tokyo) + 4 Lima VM agents on a home (Sapporo) iMac
19 vCPU / 61 GiB total, 49 namespaces, 248 pods (150 running)
Deployed with ArgoCD, auth via Keycloak OIDC, with CloudNativePG, Vault, CrowdSec, Prometheus/Grafana and more running on top

This time, on top of the six-node hybrid cluster I'd built up through part 3, I dissect five Operators for running PostgreSQL reliably, and end up building an HA cluster — and its backups — with CloudNativePG.

1. Kubernetes, databases, and the Operator

"Kubernetes supports stateful workloads; I do not." — Kelsey Hightower (2018)

That's a one-liner Kelsey Hightower left on X (Twitter) in 2018, the man widely known as a Kubernetes evangelist. "Kubernetes supports stateful workloads — but I don't," meaning "I wouldn't put a database on it myself." And it wasn't just his opinion: putting databases on Kubernetes was long frowned upon across the infrastructure industry.

The reasoning is clear. A Pod can go down at any moment (it's ephemeral), and nodes get swapped out without warning. A stateless app can simply be brought back up if it falls over, but a DB that holds data can have its fate decided the moment a single Pod disappears. So "leave the DB to a managed service like RDS or Cloud SQL" was the accepted wisdom for a long time.

Two things overturned that wisdom. The first was the maturing of StatefulSet and PersistentVolume. Even if a Pod restarts or moves to another node, it can keep the same volume and a stable network ID — which laid the groundwork for stateful workloads. The second was the Operator pattern.

The Operator is a concept CoreOS introduced in 2016 — in a phrase, "putting operational knowledge into software." It takes the operational work that used to live in an admin's head or in shell scripts — provisioning, version upgrades, failover, backups, point-in-time recovery (PITR) — and moves it into the code of a controller that runs right alongside the workload. You declare only the "desired state" in YAML, and the Operator continuously reconciles it against the current state, converging the two. The first examples were CoreOS's etcd Operator and Prometheus Operator.

The harder a piece of software is to operate — and a DB is exactly that — the more this pattern pays off. And the PostgreSQL ecosystem is where Operators compete most fiercely. In the next chapter I compare these solutions, each with its own architectural philosophy, one at a time.

2. Comparing five major Postgres Operator architectures

The major PostgreSQL Operators most widely used today across the CNCF ecosystem and enterprise environments are below. Each has its own architecture and trade-offs, and the right pick shifts with your infrastructure's requirements.

One caveat: my reason for picking one of these five leans heavily toward "what fits my homelab" and "what I personally wanted more hands-on experience with." Please read it knowing that's a different lens from evaluating them for production use at a company.

① Zalando Postgres Operator

The first one I looked at was the elder statesman of this space, Zalando Postgres Operator. Built by the German e-commerce company Zalando for running its own PostgreSQL, and hardened over years of running hundreds of clusters in-house, it's among the oldest Operators around.

The Postgres Operator delivers an easy to run highly-available PostgreSQL clusters on Kubernetes (K8s) powered by Patroni. — Zalando postgres-operator official README

At its core is a Docker image called Spilo. Spilo bundles PostgreSQL, the HA manager Patroni, and the S3 backup/restore tool WAL-G into a single image; the Operator itself sits on top as a relatively thin control layer that "stands up Spilo Pods once you declare the cluster you want via a CRD." The actual high availability (leader election and automatic failover) is handled by the Patroni inside each Pod, and if your application connections need pooling, you can stand up PgBouncer separately as a connection pooler.

Let me clear up a common misconception here: that "using Patroni means you need a separate external consensus store (DCS) like etcd or ZooKeeper." On Kubernetes, that's not the case. In Zalando Operator's defaults, Patroni uses Kubernetes resources themselves (Endpoints, or ConfigMaps) as the DCS, and by default the external etcd connection is simply left unset. Patroni takes a leader lock with a TTL (30s by default) on that K8s object and refreshes it periodically; if the leader vanishes, the remaining nodes compare WAL positions and elect a new one.

Its strength is the sheer weight of precedent and information. Because it's the oldest and most battle-tested in large-scale production, when you hit a problem a search usually turns up a prior case. The license is the permissive MIT, too. That said, this Operator was essentially built for Zalando's own needs, so there's no official commercial support; maintenance continues as of 2026, but the release cadence has visibly slowed compared to the newer CloudNativePG.

② CrunchyData PGO

Next is CrunchyData PGO. As its GitHub repo describes it — "Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service" — it was built by a database-focused company, and it shows: it's the Operator that puts the most weight on "data protection and backup."

HA itself is the same lineage as Zalando. Inside each Postgres Pod's database container, PostgreSQL and Patroni run together to handle automatic failover, and the consensus store (DCS) is the Kubernetes API (Endpoints lease) — no external etcd here either. Applications connect to the Primary and Replicas through PgBouncer (a connection pooler) and Services (rw/ro).

PGO's real strength is its backups. It integrates pgBackRest, the de facto standard PostgreSQL backup tool, as a sidecar container on each Postgres Pod plus a dedicated repo host Pod. With just spec.backups.pgbackrest configuration, it archives all transaction logs (WAL) to up to four repositories (S3, MinIO, GCS, Azure Blob), so even if a whole node is lost, point-in-time recovery (PITR) and disaster recovery (DR) remain possible. If you take backup and recovery seriously, it's the most reassuring choice.

But getting it into the homelab had a licensing catch. PGO's source code is Apache 2.0, but the production container images are bound by the Crunchy Data Developer Program terms, and using those images in production effectively requires a commercial agreement. It's fine for personal learning or a homelab, but measured against "can I extend this to in-house or commercial use whenever I want," it was an uneasy constraint.

③ Percona Operator for PostgreSQL

The third is Percona Operator for PostgreSQL. Built by Percona, which has run an open-source DB business for over 18 years, it's the choice where "fully open source" comes through most clearly.

Its architecture is rooted in the CrunchyData PGO we just saw. Percona hard-forked PGO (becoming a fully independent project from 3.0.0 onward) and grew it from there. So high availability is again handled by Patroni, the consensus store by the Kubernetes API (Endpoints lease), backups by pgBackRest, and connection pooling by PgBouncer, inheriting PGO's proven skeleton as-is.

Two things set Percona apart. One is PMM (Percona Monitoring and Management) integration. A PMM Client sidecar attaches to each Postgres Pod and ships query analytics (QAN), system metrics, and even Patroni's metrics to the PMM Server. Production-grade observability comes along without much extra setup.

The other was the clincher: the container images are fully open source (Apache 2.0), with no usage restrictions. That's the exact opposite of CrunchyData requiring a commercial agreement for production images. Percona itself markets this point as "Migrate to Freedom." Thinking about starting in a homelab and possibly extending to in-house or commercial use someday, that "no restrictions" was a big draw.

The single reason it still fell out of the final cut was weight. Because it carries PGO's lineage, each Pod gets several containers (database, pgBackRest, PMM), and you also have to stand up a PMM Server separately. It's a reasonable setup for an enterprise, but for my homelab splitting 19 vCPU, it was a touch heavy.

④ StackGres

The fourth is StackGres. Built by Spain's OnGres, this Operator reaches beyond a mere HA tool, billing itself as "a complete PostgreSQL platform (DBaaS) on Kubernetes."

The HA foundation is again Patroni, the consensus store the Kubernetes API (no external etcd); same as above so far. What sets StackGres apart is its "pack everything into one Pod (batteries-included)" design. A single Postgres Pod runs several containers together: alongside PostgreSQL+Patroni, an Envoy proxy (mandatory) that handles all traffic, the PgBouncer connection pooler, a postgres-exporter for metrics, fluent-bit for logs, and a cluster-controller that reconciles local state. The Envoy here isn't just a proxy; it parses the Postgres wire protocol and even produces connection statistics.

The operational experience is well thought out, too. A Web Console and REST API are built in by default, so nearly everything you'd do with kubectl can be handled from a UI instead. Backups go through the SGBackup and SGObjectStorage CRDs, with continuous archiving (base backup + WAL) to S3, MinIO, GCS, or Azure. True to "batteries included," it's the friendliest all-in-one for someone just getting started.

The problem was the license. StackGres's core code is AGPL 3.0. Using it as-is is fine, but the moment you put something on top and offer it as a service, the source-disclosure (copyleft) obligation can reach your own code. It isn't a problem right now, but thinking about the homelab-expansion scenario of "I might put my own service on this cluster and expose it externally," AGPL was a concern I'd rather avoid up front.

⑤ CloudNativePG (CNPG)

Last is CloudNativePG (CNPG). The latest arrival, only showing up in 2022, yet in just two years it overtook Zalando and CrunchyData by GitHub stars to become the most popular PostgreSQL Operator today. Built by EDB (EnterpriseDB), donated to the CNCF Sandbox in January 2025, licensed Apache 2.0.

The secret to CNPG vaulting to the front was, paradoxically, "subtraction." It strips out Patroni entirely — the thing the previous four shared — and uses no external DCS. Instead, each Pod's Instance Manager (a Go process running as PID 1) directly controls the PostgreSQL native binary, and the Kubernetes API (Endpoint Leases) is the single source of truth for state. Deciding the leader (primary) and failover is handled directly by the Operator (controller-manager) through reconciliation.

The result is extreme simplicity. Inside one Pod there's no Patroni, no Envoy, no pile of sidecars; just the Instance Manager and PostgreSQL. That means low overhead and easy debugging. Backups use the built-in Barman Cloud, continuously archiving WAL to S3-compatible storage for PITR, and connections are cleanly split across rw, ro, and r Services.

Governance is reassuring, too. Because it's a CNCF project, no single company can quietly shut it down, multiple vendors offer commercial support, and its development is the most active in this space. In the next chapter I'll line up the five in one table, then lay out why it ended up being CNPG.

A quick summary table

Operator	HA engine	Consensus store (DCS)	Pod composition	Backup	License	Governance · activity
① Zalando	Patroni	K8s API (Endpoints/ConfigMaps)	Spilo (PG+Patroni+WAL-G), separate Pooler	WAL-G → S3	MIT	in-house · slowing releases
② CrunchyData	Patroni	K8s API (Endpoints lease)	database + pgBackRest sidecar + repo host	pgBackRest (up to 4 repos)	Apache 2.0 · restricted prod images	OSS effectively commercialized
③ Percona	Patroni	K8s API (Endpoints lease)	database + pgBackRest + PMM sidecar	pgBackRest	Apache 2.0 · fully open (no restrictions)	stable company · active
④ StackGres	Patroni	K8s API	batteries (Envoy·PgBouncer·exporter·fluent-bit·controller)	SGBackup · continuous archiving	AGPL 3.0 (copyleft)	OnGres flagship · active
⑤ CloudNativePG	own Instance Manager (no Patroni)	K8s API (Endpoint Leases)	Instance Manager + PostgreSQL (two)	Barman Cloud	Apache 2.0	CNCF · #1 today · most active

Lined up, the differences are clear. ①–④ all use Patroni — the real difference is "what you stack on top of that Pod (backup, monitoring, proxy)" and "the license." And only ⑤ CNPG drops Patroni and adopts its own Instance Manager. The consensus store is the Kubernetes API for all five — the common myth that "using Patroni requires external etcd" is, on Kubernetes, no longer true.

My homelab's criteria were clear. Splitting 19 vCPU, it had to be lightweight; extending it to in-house or commercial use someday, it had to be free of license strings; and running it long-term, it needed governance that won't fade.

3. Why I chose CloudNativePG

After comparing all those Operators, the solution that best fit my current homelab (k3s-based) and my plans for it was CloudNativePG (CNPG). Three things clinched it.

① Extreme simplicity and lightness (Kubernetes Native)

By nature, a homelab's resources (CPU/memory) aren't as plentiful as an enterprise's. The other four Operators also use the Kubernetes API as their consensus store, but each still runs Patroni as a separate HA process in every Pod, usually with sidecars stacked on top, and every extra component adds state of its own to manage. CNPG drops that layer: with no Patroni and no external DCS (etcd, ZooKeeper), only PostgreSQL and the Instance Manager (a Go process) that wraps and manages it as PID 1 are running, with the Kubernetes API Server as the single source of truth. Overhead is minimal, the structure is intuitive, and debugging is far easier.

② Rock-solid backup/recovery with Barman (PITR)

The most important thing for a database is, above all, backups. CNPG embeds the cloud edition of Barman (Backup and Recovery Manager), one of the most widely used PostgreSQL backup tools. With just a few lines of config, it continuously archives every transaction log (WAL) to S3-compatible storage (in my case, MinIO built inside the homelab, or Cloudflare R2). Even if a node or disk physically dies, you can roll the data back to a past point in time (Point-In-Time Recovery) up to the last archived moment. CNPG sets the default archive_timeout to 5 minutes, which bounds the archive's RPO (Recovery Point Objective) at roughly 5 minutes; you can shrink that window further with synchronous replication or a shorter archive interval.

③ A declarative CRD — the ideal candidate for GitOps next time

CNPG's Cluster CRD is thoroughly declarative. The PostgreSQL version, instance count, resource limits, storage, backup policy — the cluster's entire state fits in a single YAML.

This time I install the Operator with helm and stand the cluster up directly with kubectl apply.

The fact that the whole cluster state fits in YAML means that definition can go straight into Git and be run declaratively. And next time I plan to GitOps the whole homelab, including this cluster, with ArgoCD — CNPG, which expresses everything as a single CRD, is the best-suited choice for that move.

A YAML on Git becomes the cluster state as-is; commit a change and ArgoCD syncs it, applying it through a rolling update that keeps interruption down to a brief switchover. Having decided to bring the database inside Kubernetes, only when its definition is managed as code, too, does it stop feeling half-finished.

4. Hands-on: installing the CNPG Operator

With CNPG, you deploy the Operator once cluster-wide with Helm, and from then on you can create a Cluster in any namespace. Add the official chart repo and install into the cnpg-system namespace.

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update
helm upgrade --install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-system \
  --create-namespace \
  --wait

Without --version, Helm installs the chart's latest release. To pin a specific version, add --version <chart-version>. (This article is based on Operator v1.24.)

Once installed, check that the Operator Pod (Deployment name cnpg-controller-manager) is Running.

kubectl get pods -n cnpg-system
# NAME                                      READY   STATUS    RESTARTS   AGE
# cnpg-controller-manager-8d447b4b6-xxxxx   1/1     Running   0          40s

All CNPG operations are done through custom resources (CRDs). Check that the main CRDs are registered.

kubectl get crd | grep postgresql.cnpg.io
# backups.postgresql.cnpg.io
# clusterimagecatalogs.postgresql.cnpg.io
# clusters.postgresql.cnpg.io
# imagecatalogs.postgresql.cnpg.io
# poolers.postgresql.cnpg.io
# scheduledbackups.postgresql.cnpg.io

Finally, install the cnpg kubectl plugin you'll use later to inspect cluster state and verify connections. With krew it's one line.

kubectl krew install cnpg
kubectl cnpg version
# Build: {Version:1.24.1 ...}

Without krew, you can also install it via CNPG's official install script.

The Operator and plugin are ready. Now it's time to stand up an actual database cluster.

5. Hands-on: deploying the first highly available PostgreSQL cluster

With the Operator ready, let's stand up a PostgreSQL cluster that actually holds data. CNPG packs the cluster's entire state — instance count, PostgreSQL version, resources, storage — into a single Cluster CRD.

Below is the manifest (demo-db.yaml) for a 3-node HA cluster I made for verification. Since it's for testing, resources are kept small (adjust as needed), and storage uses k3s's default local-path.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-db
  namespace: cnpg-demo
spec:
  instances: 3 # 1 Primary + 2 Replica
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4

  storage:
    size: 1Gi
    storageClass: local-path # k3s default (WaitForFirstConsumer)

  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "512Mi"
      cpu: "500m"

Create the namespace and apply.

kubectl create namespace cnpg-demo
kubectl apply -f demo-db.yaml

The Operator first bootstraps the Primary with initdb, then joins the Replicas one by one. (When pulling the PostgreSQL image for the first time on a home node, the initial startup can take a few minutes.) Once the status reads Cluster in healthy state, it's done.

kubectl get cluster demo-db -n cnpg-demo
# NAME      AGE   INSTANCES   READY   STATUS                     PRIMARY
# demo-db   24m   3           3       Cluster in healthy state   demo-db-1

The Pods consist of one Primary and two Replicas, and thanks to CNPG's default anti-affinity, they spread across separate nodes as much as possible. In my homelab, the three Pods landed on three different Lima nodes (agent-2/3/4).

kubectl get pods -n cnpg-demo
# NAME        READY   STATUS    RESTARTS   AGE
# demo-db-1   1/1     Running   0          13m
# demo-db-2   1/1     Running   0          6m
# demo-db-3   1/1     Running   0          3m

The Operator also creates three Services for connecting. Writes always go to the Primary via *-rw, reads spread across the Replicas via *-ro, and *-r includes both the Primary and Replicas.

kubectl get svc -n cnpg-demo
# NAME         TYPE        CLUSTER-IP      PORT(S)    AGE
# demo-db-r    ClusterIP   10.43.89.62     5432/TCP   27m
# demo-db-ro   ClusterIP   10.43.198.255   5432/TCP   27m
# demo-db-rw   ClusterIP   10.43.55.136    5432/TCP   27m
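
From an application Pod, these Services are reached by their in-cluster DNS names. The helper below is a small hypothetical sketch (the function name is made up) that builds the name for a given cluster, namespace, and role.

```shell
# Build the in-cluster DNS name of a CNPG Service.
# Usage: cnpg_service_host <cluster> <namespace> <rw|ro|r>
cnpg_service_host() {
  cluster="$1"; namespace="$2"; role="$3"
  case "$role" in
    rw|ro|r) printf '%s-%s.%s.svc\n' "$cluster" "$role" "$namespace" ;;
    *) echo "unknown role: $role (expected rw, ro, or r)" >&2; return 1 ;;
  esac
}

cnpg_service_host demo-db cnpg-demo rw   # writes: always the Primary
cnpg_service_host demo-db cnpg-demo ro   # reads: Replicas only
```

Pointing write traffic at the rw name means the application never needs to know which Pod is currently the Primary.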

6. Hands-on: verifying the connection and replication state

With the cluster up, let's actually connect and confirm it works. On cluster creation, CNPG auto-generates an application Secret (<cluster>-app). It holds the username, DB name, password, and a ready-to-use connection URI. (Admin/superuser access is disabled by default and enabled when needed via spec.enableSuperuserAccess: true, so there's no *-superuser Secret in the default setup.)

kubectl get secret demo-db-app -n cnpg-demo -o jsonpath='{.data.username}' | base64 -d; echo
# app
kubectl get secret demo-db-app -n cnpg-demo -o jsonpath='{.data.dbname}' | base64 -d; echo
# app

The easiest way to connect is the cnpg plugin from §4. Open a psql session straight to the Primary.

kubectl cnpg psql demo-db -n cnpg-demo

In the session, check the replication state. The Primary should be streaming WAL to the two Replicas.

postgres=# SELECT application_name, client_addr, state, sync_state
           FROM pg_stat_replication ORDER BY application_name;

 application_name | client_addr | state     | sync_state
------------------+-------------+-----------+------------
 demo-db-2        | 10.42.3.111 | streaming | async
 demo-db-3        | 10.42.6.236 | streaming | async
(2 rows)

Both Replicas are streaming. With no Patroni and no external DCS, just the Kubernetes API and each Pod's Instance Manager, a leader was elected and streaming replication was set up. The default replication is asynchronous (async), so commits on the Primary don't wait for the Replicas; if you need stronger guarantees, you can enable synchronous replication (minSyncReplicas/maxSyncReplicas) in the spec.
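
As a sketch of that option, a merge patch asking for exactly one synchronous standby could look like this. The file name is arbitrary, and the values are only an example; the trade-off is that every commit then waits for a Replica to confirm.

```shell
# Sketch: a merge patch that requests one synchronous standby.
cat > sync-patch.yaml <<'EOF'
spec:
  minSyncReplicas: 1
  maxSyncReplicas: 1
EOF
# On the cluster from this article, it could be applied with:
#   kubectl patch cluster demo-db -n cnpg-demo --type merge --patch-file sync-patch.yaml
```

After the change rolls out, re-running the pg_stat_replication query should show a sync_state other than async for at least one Replica.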

With that, a lightweight, simple, highly available PostgreSQL cluster is running on the homelab's k3s. One last thing remains — how to keep this data safe: adding backups.

7. The last piece that protects your data — backups to MinIO

Even with the cluster up, without backups you're only halfway there. When a whole node dies, or you drop a table by accident, only a base backup + WAL archive lets you roll back to a point in time (PITR).

S3-compatible API — any backend works

CNPG's backup engine, Barman Cloud, uploads backups to object storage via the S3-compatible API. The key point here is that "S3-compatible" is a single common interface. So the backend can be AWS S3, Cloudflare R2, or a self-hosted MinIO; you just change the endpoint and credentials in the config and it works as-is. (Azure Blob Storage and Google Cloud Storage are supported too, but through their own credential blocks rather than s3Credentials.)

What I chose this time is MinIO. MinIO is open-source object storage that implements the S3 API directly, and you can stand it up right inside the cluster. I had three reasons for picking MinIO over a managed S3 service:

Data sovereignty: backups never leave the homelab, not one step.
Zero cost: no cloud storage or transfer charges.
Simple access: it connects directly to minio.minio:9000 within the same k3s.

If I ever need off-site backups, I just change the endpoint to S3 or R2 in the same config and swap the credentials. That near-zero cost of swapping backends is the strength of S3-compatibility.

Adding the backup config

First, place the MinIO connection credentials as a Secret in the same namespace as the cluster (the key names are arbitrary; here, ACCESS_KEY/SECRET_KEY).

kubectl -n cnpg-demo create secret generic minio-backup-creds \
  --from-literal=ACCESS_KEY='<minio-access-key>' \
  --from-literal=SECRET_KEY='<minio-secret-key>'

Then add spec.backup.barmanObjectStore to the Cluster. Point it at the bucket path (destinationPath), the MinIO endpoint (endpointURL), and the Secret you just made.

spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://cnpg-demo/ # MinIO bucket
      endpointURL: http://minio.minio:9000 # MinIO inside the cluster
      s3Credentials:
        accessKeyId:
          name: minio-backup-creds
          key: ACCESS_KEY
        secretAccessKey:
          name: minio-backup-creds
          key: SECRET_KEY
      wal:
        compression: gzip

Once applied, the Operator takes over PostgreSQL's archive_command and begins continuous WAL archiving. When the cluster's ContinuousArchiving condition turns true, it's ready.

kubectl get cluster demo-db -n cnpg-demo \
  -o jsonpath='{.status.conditions[?(@.type=="ContinuousArchiving")].status}'; echo
# True

First backup and verification

WAL keeps flowing, but you need to take a base backup once as the reference point for recovery. A single Backup resource (saved here as backup.yaml) does it.

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: demo-db-backup-1
  namespace: cnpg-demo
spec:
  method: barmanObjectStore
  cluster:
    name: demo-db


kubectl apply -f backup.yaml
kubectl get backup -n cnpg-demo
# NAME               CLUSTER   METHOD              PHASE       ERROR
# demo-db-backup-1   demo-db   barmanObjectStore   completed

completed. Peeking into the MinIO bucket, the base backup sits under a timestamped directory.

# inside MinIO bucket cnpg-demo
demo-db/
└── base/
    └── 20260609T094230/ # base backup (WAL accumulates under wals/ from here)
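
A one-off Backup ages quickly, so in practice base backups are usually taken on a schedule. A minimal sketch using the ScheduledBackup CRD registered in §4 follows; the resource name and the 03:00 schedule are arbitrary choices for illustration.

```shell
# Sketch only: name and schedule are examples, not the article's actual setup.
cat > scheduled-backup-sketch.yaml <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: demo-db-daily            # hypothetical name
  namespace: cnpg-demo
spec:
  schedule: "0 0 3 * * *"        # six fields, seconds first: daily at 03:00:00
  backupOwnerReference: self
  method: barmanObjectStore
  cluster:
    name: demo-db
EOF
#   kubectl apply -f scheduled-backup-sketch.yaml
```

Note that the schedule uses six fields with seconds in the first position, unlike the five-field crontab format.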

Now this cluster can, even if a node dies, roll back to any point in time since the first base backup, using the base backup and WAL stacked in MinIO. Having brought the data inside Kubernetes, we've now also prepared the safety net that protects it, all in the same declarative way.
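
Recovery itself is also declarative: CNPG restores into a new Cluster that bootstraps from the object store. The manifest below is a sketch that reuses this article's bucket and Secret; the new cluster name and the target time are hypothetical, and the target must fall after the end of the base backup being restored.

```shell
# Sketch only: demo-db-restore and targetTime are made-up example values.
cat > restore-sketch.yaml <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-db-restore
  namespace: cnpg-demo
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4
  storage:
    size: 1Gi
    storageClass: local-path
  bootstrap:
    recovery:
      source: demo-db                          # refers to externalClusters below
      recoveryTarget:
        targetTime: "2026-06-09T10:00:00Z"     # hypothetical point in time
  externalClusters:
    - name: demo-db                            # matches the folder name in the bucket
      barmanObjectStore:
        destinationPath: s3://cnpg-demo/
        endpointURL: http://minio.minio:9000
        s3Credentials:
          accessKeyId:
            name: minio-backup-creds
            key: ACCESS_KEY
          secretAccessKey:
            name: minio-backup-creds
            key: SECRET_KEY
        wal:
          compression: gzip
EOF
#   kubectl apply -f restore-sketch.yaml
```

Restoring into a separate cluster leaves the original untouched, so the result can be checked before any traffic is switched over.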

8. Wrapping up — and what's next

I'd steer clear of spanning a CNPG cluster across nodes that sit on opposite sides of Tailscale.

CNPG is sensitive to inter-node network quality. The Primary constantly streams WAL to the Replicas, and each Pod's Instance Manager updates its state (lease) to the Kubernetes API on a short cycle. But this homelab's inter-node communication is a doubly encapsulated structure — flannel VXLAN layered again on top of a Tailscale (WireGuard) mesh — and the cloud (Tokyo)–home (Sapporo) leg is effectively a WAN. Placing the Primary and Replicas across that leg piles increased RTT and jitter onto a shrunken MTU, and WAL streaming breaks while leases expire. In fact, when I built a cluster spanning the Lightsail and Lima nodes, latency-driven connection drops and Pod restarts repeated endlessly.
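
To put rough numbers on the shrunken MTU: the arithmetic below assumes flannel's VXLAN rides on the Tailscale interface at its default MTU of 1280 and that the outer packets are IPv4. Both are assumptions for illustration, not measurements from this cluster.

```shell
# Assumed values: Tailscale's default interface MTU and VXLAN-over-IPv4 overhead.
tailscale_mtu=1280      # default MTU of the tailscale0 interface
vxlan_overhead=50       # outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14
pod_mtu=$((tailscale_mtu - vxlan_overhead))
tcp_mss=$((pod_mtu - 40))   # minus inner IPv4 (20) and TCP (20) headers
echo "pod MTU: ${pod_mtu}, TCP MSS: ${tcp_mss}"
```

On a plain 1500-byte Ethernet LAN the same TCP payload would be 1460 bytes per segment, so every WAL segment crossing the double tunnel takes noticeably more packets.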

So I strongly recommend keeping a CNPG cluster within the same low-latency leg, not crossing Tailscale. Pin the cluster to a single site (cloud nodes together, or home nodes together) with nodeSelector or affinity, and inter-node communication stays LAN-stable. If you need redundancy across sites, a per-site cluster plus an asynchronous Replica Cluster is safer than spreading a single cluster across the WAN.
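
One way to do that pinning, sketched under the assumption that the home nodes carry a label such as site=home (a hypothetical label that would have to be added first):

```shell
# Sketch: the label key/value (site=home) is hypothetical.
#   kubectl label node lima-k3s-agent-2 lima-k3s-agent-3 lima-k3s-agent-4 site=home
cat > pin-site-patch.yaml <<'EOF'
spec:
  affinity:
    nodeSelector:
      site: home                 # schedule instances only on nodes with this label
EOF
#   kubectl patch cluster demo-db -n cnpg-demo --type merge --patch-file pin-site-patch.yaml
```

Because the selector lives in the Cluster spec, it applies to every instance the Operator creates, including replacements after a failover.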

Looking back, CNPG's appeal came down to simplicity. Decide the leader with the Kubernetes API alone — no Patroni, no external DCS — and hand backups off to S3-compatible storage. Light, free (Apache 2.0, CNCF), and above all, everything from the cluster to the backup policy declared in a single YAML — that fit my 19-vCPU homelab, and what comes after it, perfectly.

And that "single YAML" is exactly the starting point for next time. This time I stood it up imperatively with helm and kubectl apply, but the cluster and its backups are, in the end, declarative manifests. Next time I'll GitOps the whole homelab, including this cluster, with ArgoCD — moving toward the picture where a YAML on Git is the cluster state itself, and a single commit becomes a deploy.

References

CloudNativePG — official documentation · GitHub
Zalando Postgres Operator — GitHub · Patroni docs
CrunchyData PGO — GitHub
Percona Operator for PostgreSQL — GitHub
StackGres (OnGres) — official site · GitLab
Operator pattern (CoreOS, 2016) — Introducing Operators
Comparing PostgreSQL Operators — Palark · simplyblock

Hybrid k3s #3: Pods couldn't talk to each other — flannel VXLAN and vmnet

SEON — Sat, 06 Jun 2026 00:00:00 +0000

0. About this series

This series is a record — written one piece at a time — of how I actually built the homelab in the diagram above, the one that's still running as I write this.

6 nodes — 2 Lightsail servers (control plane + etcd) in the cloud (AWS Tokyo) + 4 Lima VM agents on a home (Sapporo) iMac
19 vCPU / 61 GiB total, 49 namespaces, 248 pods (150 running)
Deployed with ArgoCD, auth via Keycloak OIDC, with CloudNativePG, Vault, CrowdSec, Prometheus/Grafana and more running on top

In #1 I stood up two cloud control-plane nodes, and in #2 I welcomed the home iMac as four Lima VM agents — and all six nodes went Ready. This article is about what came next: the nodes were all connected, yet the Pods themselves couldn't talk across nodes. I peer into flannel, follow where the packets actually go, and end up binding the VMs on the same iMac directly with vmnet.

1. Six nodes `Ready`, yet the Pods were strangers

The picture I left off on last time looked clean. Running kubectl get nodes, the two Tokyo servers and four Sapporo Lima agents were all Ready, with a Tailscale 100.x address in INTERNAL-IP.

$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP OS-IMAGE CONTAINER-RUNTIME
ip-172-26-2-70… Ready control-plane,etcd 143d v1.34.3+k3s1 100.99.x.x Amazon Linux 2023 containerd://2.1.5-k3s1
ip-172-26-3-146… Ready control-plane,etcd 143d v1.34.3+k3s1 100.71.x.x Amazon Linux 2023 containerd://2.1.5-k3s1
lima-k3s-agent Ready <none> 143d v1.34.3+k3s1 100.84.x.x Ubuntu 24.04.3 LTS containerd://2.1.5-k3s1
lima-k3s-agent-2 Ready <none> 143d v1.34.3+k3s1 100.98.x.x Ubuntu 24.04.3 LTS containerd://2.1.5-k3s1
lima-k3s-agent-3 Ready <none> 143d v1.34.3+k3s1 100.117.x.x Ubuntu 24.04.3 LTS containerd://2.1.5-k3s1
lima-k3s-agent-4 Ready <none> 143d v1.34.3+k3s1 100.90.x.x Ubuntu 25.10 containerd://2.1.5-k3s1

I thought I was done. So I eagerly started piling Pods on — and almost immediately hit a strange wall. Pods on the same node talked just fine, but Pods on different nodes couldn't reach each other. Services timed out in odd places, and some Pods couldn't even resolve DNS.

At first I thought, "every node is Ready — so why?" That was a misread. Ready and "the Pod network works" are at two different layers.

Node Ready — the control-plane path, where a node's kubelet trades heartbeats with the apiserver. That's exactly what I'd set up so far: making the node and the apiserver reach each other over Tailscale 100.x. As long as this path is alive, a node looks Ready.
Pod ↔ Pod (across node boundaries) — the data-plane path, where Pods on different nodes exchange packets directly. This is a completely separate road from the control plane, and the thing that lays it isn't the kubelet but the CNI (here, flannel).

All six being Ready with a 100.x (Tailscale) INTERNAL-IP only means the control plane is sound. Last time's work got only as far as "the nodes are recognized as members of one cluster"; the road that carries Pod traffic across node boundaries had not been verified yet.

Even with kubectl get nodes all Ready, inter-node Pod traffic isn't guaranteed. The control plane (node ↔ apiserver) and the data plane (Pod ↔ Pod) are separate paths, and the latter is the CNI's job. So the next question narrows to one thing — exactly how does flannel carry a Pod packet across node boundaries?

2. flannel and VXLAN — how a Pod packet crosses a node boundary

k3s uses flannel as its CNI and VXLAN as flannel's default backend. In #1 I brought the servers up with --flannel-backend vxlan, and the #2 agents inherited that setting as-is. (k3s Basic Network Options — flannel's default backend is vxlan; host-gw, wireguard-native, and none are the alternatives.)

Let me trace a Pod packet's journey in two cases.

Within the same node — every Pod hangs off the node's cni0 bridge. Two Pods on the same node meet directly at L2 on that bridge. They never cross a node boundary, so it's fast and never congested. (That's why "Pods on the same node talked" in §1.)
To another node — when the destination Pod is on a different node, the packet leaves cni0 and enters a virtual interface called flannel.1. This is VXLAN's VTEP (VXLAN Tunnel Endpoint). Here the original Pod packet (Ethernet frame and all) is encapsulated whole inside a UDP packet and sent to the peer node.

The "address" and "port" that receive this capsule are the crux.

The port is UDP 8472. On Linux, flannel's VXLAN backend uses the kernel default port 8472/udp (only on Windows does it use the IANA standard 4789). So nodes must be able to reach each other on 8472/udp. (flannel backends — "On Linux, defaults to kernel default, currently 8472" · k3s network requirements)
The address is each node's advertised public-ip. flannel advertises, per node, a destination (the VTEP's outer IP) that says "this is where I receive VXLAN capsules." The default is the IP of that node's default-route interface (the per-node real values are in §5). This "which address gets advertised" is what trips us up all the way through.

Encapsulation isn't free. The VXLAN header (outer IP/UDP + VXLAN + inner Ethernet) eats an extra 50 bytes per packet. So flannel sets flannel.1's MTU to the host interface's MTU minus 50. If the host is 1500, flannel.1 becomes 1450.

On an agent node (lima-k3s-agent), it looks like this:

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.2.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

$ ip -br link | grep -E 'eth0|lima0|tailscale0|flannel'
eth0 UP ... mtu 1500
lima0 UP ... mtu 1500
flannel.1 UNKNOWN ... mtu 1450 # 1500 - 50 (VXLAN)
tailscale0 UNKNOWN ... mtu 1280 # this 1280 bites in §5

That flannel.1 is a VXLAN device sending to port 8472 shows up in one line with -d (details):

$ ip -d link show flannel.1
5: flannel.1: <...> mtu 1450 qdisc noqueue state UNKNOWN ...
    vxlan id 1 local 192.168.105.2 dev lima0 srcport 0 0 dstport 8472 nolearning ttl auto ...

vxlan id 1 with dstport 8472: "VXLAN → send to 8472" is right there in that line. The local 192.168.105.2 dev lima0 part in the middle, i.e. "which address/interface this node sends VXLAN out of," is the real key this time, but why it has that value is something §7 untangles.
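To pull those fields out without reading the line by eye, a small shell sketch can parse the ip -d link output. The function name vtep_fields is hypothetical, and the sample input is the two-line output shown above, not a fresh capture.

```shell
#!/bin/sh
# Sketch: extract the VXLAN settings from `ip -d link show flannel.1` output.
# vtep_fields is a hypothetical helper name, not part of flannel or iproute2.

vtep_fields() {
  # Reads the command output on stdin and prints one summary line.
  awk '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "vxlan" && $(i+1) == "id") vni = $(i+2)
        if ($i == "local")   addr = $(i+1)
        if ($i == "dev")     dev  = $(i+1)
        if ($i == "dstport") port = $(i+1)
      }
    }
    END { printf "vni=%s local=%s dev=%s dstport=%s\n", vni, addr, dev, port }
  '
}

# Sample input copied from the output above (elided parts left as "...").
sample='5: flannel.1: <...> mtu 1450 qdisc noqueue state UNKNOWN ...
    vxlan id 1 local 192.168.105.2 dev lima0 srcport 0 0 dstport 8472 nolearning ttl auto ...'

printf '%s\n' "$sample" | vtep_fields
```

On a node, the same function would presumably be fed live with ip -d link show flannel.1 | vtep_fields; comparing the local and dev values across nodes shows at a glance which underlay each VTEP is bound to.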

MTU — the maximum size of a single packet

The output above showed mtu 1500, mtu 1450, and mtu 1280. Since this number bites hard in §5, let me pin it down here.

MTU (Maximum Transmission Unit) is the maximum size (in bytes) of a single packet an interface can carry at once. Ethernet's standard default is 1500, so an ordinary NIC starts at 1500. A packet larger than the MTU is split into fragments, or — if it can't be split — simply dropped. Fragmenting is slow, and dropping makes traffic look like it's stalled.

The crux is that the more you wrap, the less real size fits inside. Wrap a box in a bigger box and the outer size (1500) stays the same, but what fits inside shrinks by the padding. That's why each interface has a different MTU.

Interface	MTU	Why this value
`eth0` / `lima0`	1500	bare Ethernet default, no encapsulation
`flannel.1`	1450	1500 − 50. The ceiling so that one VXLAN-header (50B) wrap still stays under 1500
`tailscale0`	1280	WireGuard's encryption overhead + a conservative value safe on any link (IPv6's minimum MTU)

If flannel VXLAN runs over physical Ethernet (1500), 1450 fits nicely. The problem is when inter-node traffic rides over Tailscale: a Pod packet gets wrapped once by VXLAN and again by WireGuard on top, doubly wrapped, and the size that fits inside shrinks further to about 1280 − 50 = 1230. flannel still sends as if it had 1450, and when the tunnel can't accept that much, the gap erupts as fragmentation, drops, and retransmits. Why this "double encapsulation" wrecks even latency is covered in §5.
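The arithmetic above can be written out as a short shell sketch. The helper name inner_mtu is hypothetical, and the 50-byte figure is simply the VXLAN overhead quoted earlier; nothing here queries a real interface.

```shell
#!/bin/sh
# Sketch: how much room is left inside after one VXLAN wrap.
# inner_mtu is a hypothetical helper, not part of flannel or Tailscale.

VXLAN_OVERHEAD=50   # outer IP + UDP + VXLAN + inner Ethernet, per the text above

# inner_mtu <underlay_mtu>: largest packet that fits once VXLAN is added
inner_mtu() {
  echo $(( $1 - VXLAN_OVERHEAD ))
}

over_ethernet=$(inner_mtu 1500)    # VXLAN directly on eth0 / lima0
over_tailscale=$(inner_mtu 1280)   # VXLAN riding on top of tailscale0

echo "VXLAN over Ethernet : $over_ethernet"
echo "VXLAN over Tailscale: $over_tailscale"
# Gap between a 1450-byte packet and what the doubly wrapped path can carry
echo "oversize by         : $(( over_ethernet - over_tailscale )) bytes"
```

The last line is the point: a packet sized for 1450 is 220 bytes too large for the doubly wrapped path, which is where the fragmentation and drops come from.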

Inter-node Pod traffic ultimately reduces to "can the nodes exchange UDP 8472 to each other's public-ip?" If that's blocked, the Pod network breaks (§3); if the path is slow, the Pod network is slow (§5). It's enough to remember that tailscale0's MTU is 1280, and putting VXLAN on top of it shrinks the ceiling further.

3. Inter-node traffic failed, so I opened 8472

If §2's conclusion holds, cross-node Pod traffic comes down to "can the nodes exchange 8472 (VXLAN) to each other's public-ip?" So when traffic failed, there was one most-likely suspect — 8472 is blocked.

That's because in #1 I'd closed the firewall by the book. Cluster ports like the apiserver (6443), kubelet (10250), and flannel VXLAN (8472) aren't opened to the public net; they're reachable only inside the private network / tailnet — the right default for minimizing exposure (#1's firewall table). So "inter-node VXLAN is blocked on the public net, which is why inter-node Pod traffic fails" was a natural hypothesis.

Inside the node, flannel was indeed listening on 8472:

$ sudo ss -ulnp | grep 8472
UNCONN 0 0 0.0.0.0:8472 0.0.0.0:*

The port is open inside the node, yet it doesn't reach Pods on another node — so what's blocked isn't inside the node but the road between nodes, I figured. On a "just make it work" impulse, I broke the principle and opened 8472/udp inbound on the Lightsail firewall. Inter-node Pod traffic went through, and I ran it that way.

Note — not a recommended setup. Unlike 6443, 8472 isn't a port guarded by certificates and tokens; opening it to the public net itself widens the exposed surface. "It went through, so this is the answer" was what I thought at the time, but "opening the port made it work" doesn't mean "the port is why it worked" — this judgment gets overturned in §8 when I look at the actual route.

When inter-node Pod traffic fails, suspecting inter-node 8472/udp (VXLAN) reachability first is a reasonable starting point. But here I opened it to the public net, and that was a debt. The bill arrives in the very next section.

4. I added Longhorn and the restarts wouldn't stop

Now that Pods could talk, I wanted to run a stateful workload "like a real cluster." I picked Longhorn — distributed block storage for Kubernetes.

The reason is simple. k3s's default storage (local-path) is a hostPath tied to one node's disk, so it has no replication — when a Pod moves to another node, the data doesn't follow. To try "data survives even if a node dies," you need distributed storage that replicates data across multiple nodes. Longhorn is the CNCF project that fills that role: it stands up a dedicated controller (the Longhorn Engine) per volume, treats each volume like a microservice, and keeps synchronous replicas of it on multiple nodes' local disks. (Longhorn — What is Longhorn?)

Install was one Helm line, and at first it looked fine. Pods came up Running and volumes were created.

But almost immediately, errors and restarts began repeating endlessly. Volumes dropped to Degraded, replica syncs timed out, rebuilds spun up to fix them, that load made them fail again — a self-reinforcing vicious cycle. Even though I hadn't rebuilt the cluster, nodes got shaken, and other Pods on top got dragged in.

The cause was inter-node latency. Longhorn's synchronous replication waits, for every single write, until the replicas respond "written." That's why Longhorn's docs state plainly that "latency is far more important to a volume's stability than throughput or IOPS" (Longhorn Best Practices), and the troubleshooting guide goes further, recommending inter-node latency under 20ms when multiple volumes do I/O on one node at the same time (Longhorn KB — volume readonly / I/O error).

But latency between the home (lima) nodes was far above that. Measured, the replica's synchronous-write latency averaged 200ms+ — over ten times the recommended 20ms. Since synchronous replication eats that whole latency on every write, volumes couldn't climb out of Degraded, and rebuilds and restarts spun forever. In the end Longhorn effectively dropped the lima (home) nodes from storage, and the goal of "run stateful workloads on a real multi-node cluster" looked impossible on top of this latency.

There was one more thing that didn't add up. Those four too-slow home nodes are physically inside the same single iMac. They're VMs in one machine — so where does 200ms come from? The next section follows that route.

5. Same host, so why slow? — to reach the next node, packets went to Tokyo and back

§4's puzzle was this: four VMs inside one iMac, yet 200ms of inter-node latency. Let me follow where the packets actually go.

Clue 1 — there's no direct path between VMs on the same host. Printing each VM's address is odd:

$ limactl shell k3s-agent -- ip -4 addr show eth0 | grep inet
    inet 192.168.5.15/24 ... eth0
$ limactl shell k3s-agent-2 -- ip -4 addr show eth0 | grep inet
    inet 192.168.5.15/24 ... eth0

Both VMs' eth0 are 192.168.5.15. Lima's default user-mode network fixes the subnet at 192.168.5.0/24, and each VM gets the same address behind its own independent NAT. So they're unreachable directly from the host and from other guests (VMs) — they don't even know each other, despite being inside the same iMac. Lima's own docs note this limitation and point you to "use VMNet to access from the host or other guests" (Lima — user-mode network).

Clue 2 — with no direct path, even reaching the next VM takes the long-distance road. Since the VMs can't reach each other directly, inter-node packets (including Pod VXLAN) take the one common path — the Tailscale (100.x) that bound both sites in #1 and #2. Reaching the VM right next door is no different. Tailscale is a tool for connecting long distances across NAT, so when a direct (P2P) path isn't possible, it detours through a relay (DERP). Here's what tailscale netcheck said:

$ tailscale netcheck
   * MappingVariesByDestIP: true ← NAT where direct P2P is hard (endpoint-dependent mapping)
   * Nearest DERP: Tokyo (30.3ms)

With MappingVariesByDestIP: true, a direct connection is hard to establish, so here Tailscale fell back to the nearest DERP relay, Tokyo. A packet from Sapporo's VM A to its neighbor VM B left the house, went all the way to Tokyo, and came back to Sapporo.

Clue 3 — measure it and that detour is right there. Latency between VMs on the same iMac:

lima ↔ lima-2 : 87 ms (max 202, jitter 54)
lima ↔ lima-3 : 133 ms (max 268, jitter 92)

Tens to a hundred ms for the VM right next door — the cost of a Tokyo round-trip. And at this point the home nodes' Pod VXLAN, whose destination (public-ip) is a tailnet 100.x, has flannel VXLAN riding on top of Tailscale (WireGuard) as well — piling on §2's double encapsulation and the MTU-1280 squeeze. Since Longhorn's synchronous replication ate this round-trip on every write, 200ms was the obvious result.

Everything so far is about VMs inside the same iMac (lima ↔ lima). The cross-site link (home ↔ Tokyo) takes a different path (no double encapsulation there), covered separately in §8.

Nodes being on the same physical host doesn't make them fast. When the virtualization network isolates the VMs, even talking to the VM right next door can detour far away through the overlay. Nearby traffic should end nearby — and §6 and §7 find that road.

6. How to fix it

The problem was clear. The four VMs on one iMac have no LAN that reaches each other directly, so even traffic to the VM next door goes through the long-distance Tailscale + DERP. I lined up the candidates for fixing it.

(1) Pin flannel to tailscale0 (--flannel-iface=tailscale0): put VXLAN explicitly on the tailnet. But the root problem (a long-distance detour and double encap despite being on the same host) stays. And the servers use a VPC address; setting only the agents to tailscale0 makes the destinations disagree, and it breaks asymmetrically, with traffic passing one way only (you'd have to change both, touching #1's server config too). Latency doesn't drop either.
(2) flannel's host-gw backend: route directly to node IPs with no encapsulation; fastest. But it assumes direct L2 connectivity between all nodes (k3s docs). There's no L2 between Tokyo and Sapporo, and none between the user-mode-isolated VMs either.
(3) flannel's wireguard-native backend: encrypt with WireGuard instead of VXLAN. Good for security, but it still runs over the same detour path, so latency is unchanged.
(4) Force Tailscale into P2P: connect directly instead of via DERP. But what blocks the direct path is Lima's default user-mode network isolating the VMs, so it can't be solved from the Tailscale side alone.

What 1–4 have in common is that they just change the wrapping on a slow path. The path itself (leaving the same machine and coming back) stays, so latency doesn't drop. So I flipped the idea around.

Use the fact that they're on the same host: give the VMs a real LAN. Since the four live in one iMac, if the host lays a virtual LAN and lets the VMs talk directly at L2, they never go through Tailscale or DERP at all. This is exactly the method Lima's docs recommend for the user-mode isolation (VMNet), implemented on macOS via socket_vmnet. ← adopted.

The test for picking a candidate was one thing — "does it remove the root cause of the latency (no direct path)?" 1–4 try to "go faster" or "re-wrap with a different VPN" and leave the path alone, but vmnet changes the path itself into a LAN inside the house. Then Tailscale handles the long-distance Tokyo ↔ Sapporo leg, and same-host traffic ends at home. The next section applies it and checks the result.

7. Laying a LAN inside the house with socket_vmnet

7-1. vmnet and socket_vmnet

The problem from §5 was "the VMs on one iMac have no LAN that reaches each other directly." What fills that gap is vmnet.

vmnet is macOS's built-in virtual-networking framework (vmnet.framework) — Apple's official API that builds NAT, bridges, and host networking for VMs. But using it directly requires the VM process to hold root privileges and an entitlement, which is a hassle. socket_vmnet is a small daemon built by the Lima project that wraps this vmnet.framework and exposes it over a Unix socket. Only the socket_vmnet daemon runs as root; the VMs just connect to that socket — the VMs themselves don't need to run as root. (That's why the install in 7-2 puts the binary in a root-owned /opt and lays down a sudoers entry.) (socket_vmnet)

socket_vmnet offers three modes:

shared — a private subnet (192.168.105.0/24) + internet NAT. The VMs connected to the same socket_vmnet sit on one virtual switch (L2) and talk to each other directly. ← what we need.
bridged — joins the VMs straight onto the host's physical LAN (e.g. en0).
host — an isolated network with no internet.

(Lima — VMNet)

How the structure changes is the crux.

Before — each VM had only eth0 (Lima's default user-mode, 192.168.5.15, mutually isolated) and tailscale0 (100.x). With no direct path between VMs, inter-node traffic (including Pod VXLAN) leaked onto tailscale0, causing §5's long-distance detour.
After — each VM gets one more interface, lima0 (vmnet shared, 192.168.105.x). eth0 and tailscale0 stay as they were, and the k3s node's InternalIP is still the Tailscale 100.x, so the cluster's identity and membership don't change. Only the data path changes — lima0 becomes the node's default route, and per the rule from §2 ("flannel picks the default-route interface as its VXLAN destination / public-ip"), flannel moves its VXLAN destination to lima0. So Pod traffic between home nodes finishes over vmnet without going through the tailnet or DERP.

In short, it's an additive change — without rebuilding the cluster or touching the nodes' identity, you add one interface (lima0) and reroute only same-host traffic onto a fast LAN.

7-2. Install socket_vmnet (host = the iMac)

Because socket_vmnet is a daemon that runs as root, the binary must live on a root-owned path that a user can't tamper with. Lima discourages installing it via Homebrew for security reasons, so build it from source into /opt/socket_vmnet. (Lima — VMNet)

git clone https://github.com/lima-vm/socket_vmnet
cd socket_vmnet
git checkout v1.2.2 # check the latest stable tag on the releases page
make
sudo make PREFIX=/opt/socket_vmnet install.bin
# → /opt/socket_vmnet/bin/socket_vmnet (root-owned)

7-3. Register the Lima sudoers entry

So Lima can launch socket_vmnet as root, lay down a sudoers fragment.

limactl sudoers > etc_sudoers.d_lima
less etc_sudoers.d_lima # review the contents first
sudo install -o root etc_sudoers.d_lima /etc/sudoers.d/lima
rm etc_sudoers.d_lima

7-4. Attach the shared network to each VM

Add the shared network to each VM's ~/.lima/<vm>/lima.yaml. This network lays down 192.168.105.0/24 (gateway 192.168.105.1) and gives each VM an address in that range via a lima0 interface.

networks:
  - lima: shared
    interface: lima0

Note — duplicate networks: key. If a networks: key already exists in lima.yaml (even as a comment), appending another at the end causes a YAML duplicate-key parse error. Merge under the existing key.

7-5. Restart one at a time

Restart one VM at a time, confirming each goes Ready again (to minimize cluster impact).

for VM in k3s-agent k3s-agent-2 k3s-agent-3 k3s-agent-4; do
  limactl stop "$VM"
  limactl start "$VM"
  kubectl --context k3s-lightsail wait --for=condition=Ready node/lima-"$VM" --timeout=180s
done

On restart, k3s comes back up and flannel re-picks its interface. Since the default route is now lima0 (192.168.105.1), flannel re-advertises its VXLAN destination (public-ip) as the vmnet address per the rule from §2 / §7-1 — with no extra flag like --flannel-iface.
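Before trusting that flannel will follow the default route, it helps to confirm which interface actually holds it. The sketch below picks the lowest-metric default route from ip -4 route show default output. The name default_route_dev is hypothetical, the two sample routes are illustrative assumptions (the real table isn't shown here), and this only approximates the selection rule as described above; it is not flannel's own code.

```shell
#!/bin/sh
# Sketch: find the interface that owns the preferred (lowest-metric) default route.
# default_route_dev is a hypothetical helper; it only parses text on stdin.

default_route_dev() {
  awk '
    $1 == "default" {
      metric = 0; dev = ""          # a route printed without a metric counts as 0
      for (i = 2; i < NF; i++) {
        if ($i == "dev")    dev = $(i+1)
        if ($i == "metric") metric = $(i+1)
      }
      if (best == "" || metric + 0 < best_metric) { best = dev; best_metric = metric + 0 }
    }
    END { print best }
  '
}

# Assumed example of a VM after the shared network is attached (not captured output).
sample_routes='default via 192.168.105.1 dev lima0 proto dhcp src 192.168.105.2 metric 100
default via 192.168.5.2 dev eth0 proto dhcp src 192.168.5.15 metric 200'

printf '%s\n' "$sample_routes" | default_route_dev
```

Inside a VM this would be run as ip -4 route show default | default_route_dev. If it prints anything other than lima0, flannel would be expected to keep advertising the old address after the restart.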

7-6. Verify

Check that each VM got a vmnet address on lima0:

$ limactl shell k3s-agent -- ip -4 addr show lima0 | grep inet
    inet 192.168.105.2/24 ... lima0

Then re-measure ping between VMs on the same iMac:

$ limactl shell k3s-agent -- ping -c5 -q 192.168.105.3
5 packets transmitted, 5 received, 0% packet loss
rtt min/avg/max/mdev = 0.449/0.571/0.639/0.064 ms

The same-host VM-to-VM latency that was 87~133 ms dropped to the 0.5 ms range. The packets that had been round-tripping to Tokyo now finish inside the iMac.

By default, flannel picks "the default-route interface" as its VXLAN destination. So if you give a node a faster direct path (here, vmnet's lima0) and make it the default route, flannel switches over to it on its own — with no extra CNI config.

8. Result — what vmnet changed, and the truth about 8472

After applying it, the VXLAN destination (public-ip) that flannel advertises splits cleanly into the two sites:

$ kubectl --context k3s-lightsail get nodes \
    -o custom-columns='NAME:.metadata.name,PUBLIC-IP:.metadata.annotations.flannel\.alpha\.coreos\.com/public-ip'
NAME PUBLIC-IP
ip-172-26-2-70 172.26.2.70 # Tokyo (server) = VPC
ip-172-26-3-146 172.26.3.146
lima-k3s-agent 192.168.105.2 # home (lima) = vmnet
lima-k3s-agent-2 192.168.105.3
lima-k3s-agent-3 192.168.105.4
lima-k3s-agent-4 192.168.105.5

The four home nodes now exchange VXLAN directly over each other's vmnet address (192.168.105.x) — 0.5 ms.

The 8472 I opened in §3 — I went to close it and looked at the actual route

With the home nodes sorted out by vmnet, it was time to close the 8472 I'd opened against principle in §3. But the moment I went to close it, I got nervous — what if closing it breaks inter-node Pod traffic again? Because in §3 I'd thought "opening it made it work."

So before closing, instead of guessing I looked at the actual route. All you have to do is query the route from one node to a Pod on a node at the other site.

# from a home (lima) node to a Pod on a Tokyo (server) node
$ ip route get 10.42.0.235
10.42.0.235 dev tailscale0 table 52 src 100.84.x.x ...

dev tailscale0, not dev flannel.1. In other words, cross-site Pod traffic was flowing directly over Tailscale, not flannel VXLAN (8472).

Here's the mechanism. In this cluster, each node advertises its own pod CIDR (10.42.N.0/24) as a Tailscale subnet route and accepts the others' (accept-routes). So a remote node's Pod range bypasses flannel and travels directly over the encrypted Tailscale (WireGuard). This is also the official way k3s binds a distributed / multi-cloud cluster with Tailscale (k3s — Distributed/multicloud, Tailscale — Subnet routers). So the "double encapsulation, wrapping VXLAN over WireGuard" from §2 / §5 was the old path of the same host (lima ↔ lima); cross-site is a single layer of WireGuard.
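The same check can be scripted so it is easy to repeat for several Pod IPs. This is a minimal sketch: classify_pod_route is a hypothetical name, the first sample line is the ip route get output shown above, and the second (flannel.1) line is an invented example added only for contrast.

```shell
#!/bin/sh
# Sketch: classify which path a Pod IP takes, from `ip route get <pod-ip>` output.
# classify_pod_route is a hypothetical helper; it only parses text on stdin.

classify_pod_route() {
  # Take the word that follows the first "dev" token.
  dev=$(awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i+1); exit } }')
  case "$dev" in
    tailscale0) echo "tailscale" ;;     # Tailscale subnet route, WireGuard only
    flannel.1)  echo "vxlan" ;;         # flannel VXLAN, udp/8472 between nodes
    cni0)       echo "local-bridge" ;;  # Pod on the same node
    "")         echo "unknown" ;;
    *)          echo "other:$dev" ;;
  esac
}

# Line taken from the output above (cross-site, home node to Tokyo Pod).
printf '%s\n' '10.42.0.235 dev tailscale0 table 52 src 100.84.x.x' | classify_pod_route

# Invented line for contrast: a route that does go through the VTEP.
printf '%s\n' '10.42.3.7 via 10.42.3.0 dev flannel.1 src 10.42.2.0' | classify_pod_route
```

On a node, the input would come from ip route get <pod-ip> | classify_pod_route. Any destination that reports tailscale is not depending on 8472 being reachable along that leg.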

Having confirmed the route, I closed it — restricting it from public (0.0.0.0/0) to VPC-private (172.26.0.0/16). Then I re-measured after closing:

cross-site (home → Tokyo Pod) : 20~26 ms (Tailscale, unchanged)
home nodes (lima ↔ lima) : 0.4 ms (vmnet, unchanged)
servers (server ↔ server) : 0.3 ms (VPC VXLAN, unchanged)
6 nodes Ready : 6/6

Nothing broke. The public 8472 wasn't the actual route of inter-node traffic — it was just a leftover. 8472 is still in use, but only inside private networks — lima ↔ lima over vmnet, server ↔ server over VPC.

Why bother closing it? VXLAN is a protocol with no authentication and no encryption (RFC 7348's security considerations also state plainly that "VXLAN itself provides no authentication or encryption"). Its only identifier is the VNI, and flannel's default is 1, so anyone who can reach 8472 on the public net can inject packets into the Pod overlay (10.42.0.0/16). 6443 at least has a gate of certificates and tokens; 8472 doesn't even have that. So while I was at it, I also closed the apiserver (6443) and kubelet (10250) to the public net — VPC/tailnet only — and restricted SSH (22) to the tailnet.

"It worked after a change" doesn't mean "the change is why it worked." Opening 8472 and traffic going through was a fact, but what actually carried the traffic was Tailscale — the public 8472 was a leftover from the start. Before you open and close ports on a guess, read the actual route once with ip route get — that's the cheapest way to cut costly exposure.

A remaining limit — cross-site is far, by the distance

What vmnet fixed is inside the same host (between home nodes). The Tokyo-cloud ↔ Sapporo-home leg is physically far apart, so it's bound by Tailscale, and that latency (tens of ms) is a value set by distance that you can't shrink.

So I compensate with placement — keep workloads with a lot of inter-node synchronous traffic (latency-sensitive ones) gathered within the home node group. I use the node-type=lima label I'd put on the home nodes back in #2, via a nodeSelector.

$ kubectl --context k3s-lightsail get nodes -L node-type
NAME ... NODE-TYPE
ip-172-26-2-70 ... lightsail
ip-172-26-3-146 ... lightsail
lima-k3s-agent ... lima
lima-k3s-agent-2 ... lima
lima-k3s-agent-3 ... lima
lima-k3s-agent-4 ... lima

In a workload's manifest, you write it like this (if the label isn't there, set it first with kubectl label node lima-k3s-agent node-type=lima):

spec:
  template:
    spec:
      nodeSelector:
        node-type: lima # this Pod group only on home nodes (vmnet, 0.5ms)

Even within one cluster, inter-node latency isn't uniform. Accept the fast leg (same host = vmnet) and the slow leg (long distance = Tailscale), and place workloads by latency with a nodeSelector to steer around the slow leg.

9. Glossary — what came up this time

flannel / VXLAN — k3s's default CNI is flannel, its default backend VXLAN. It carries cross-node Pod packets encapsulated in UDP (8472).
VTEP / flannel.1 — the endpoint that wraps and unwraps VXLAN capsules. Exists per node as the flannel.1 interface.
flannel's public-ip — the destination each node advertises as "this is where I receive my VXLAN." Defaults to the node's default-route interface IP.
MTU — the maximum size of a single packet. Shrinks the more you encapsulate (Ethernet 1500 → VXLAN 1450 → 1230 over Tailscale).
double encapsulation — wrapping a VXLAN packet again in WireGuard. Overhead, MTU squeeze, and latency pile up. (In this cluster it only happened for lima ↔ lima before vmnet; not for cross-site.)
Tailscale / DERP — a WireGuard-based mesh VPN. When a direct (P2P) path isn't possible, it detours through a DERP relay.
Tailscale subnet route — a node advertises a range (here, its own pod CIDR) with --advertise-routes, another node receives it with --accept-routes, and that range's traffic travels over the tailnet (WireGuard). Cross-site Pod traffic flows over this road.
NAT (endpoint-dependent mapping) — a NAT where the port mapping varies by destination. Direct P2P is hard, so it falls back to DERP.
vmnet (vmnet.framework) — Apple's framework that provides NAT, bridges, and host networking to VMs on macOS.
socket_vmnet — a root daemon that exposes vmnet.framework over a Unix socket. VMs connect to the socket without root, get a shared LAN (192.168.105.0/24), and talk directly at L2 between VMs on the same host.
Lima user-mode network — Lima's default network (fixed at 192.168.5.0/24). Isolates VMs from each other and from the host.
node-type label / nodeSelector — a label on the nodes (here, lima/lightsail). Used in a nodeSelector to place workloads on a particular group of nodes.

10. Next

With inter-node traffic sorted out, I can finally run all sorts of services stably on these six nodes.

In the end, the important problem that Longhorn surfaced was solved well, and the cluster is still running stably today. That said, given the nodes' spec limits, I decided to give up on Longhorn for now. "Put it back and test it" stays as homework for when I have a roomier environment.

The setup I've written up so far lets me run a fair range of services; optimization and other small issues I've been handling as I operate. Once the material piles up a bit, I'd like to gather and organize that too.

Next time I plan to talk about CloudNativePG (CNPG). Right as the inter-node networking got solved, I set up CNPG to practice and verify running a service with internal clustering — and it's now serving as the main DB for quite a few services.

Thanks for reading all the way through.

References

k3s — Basic Network Options / Requirements / Distributed·multicloud (Tailscale integration): docs.k3s.io/networking/basic-network-options · /installation/requirements · /networking/distributed-multicloud
flannel — Backends (VXLAN · 8472 · MTU): github.com/flannel-io/flannel … backends.md
Longhorn — What is Longhorn / Best Practices / KB (volume readonly or I/O error): longhorn.io/docs/1.11.2/what-is-longhorn · /best-practices · /kb/troubleshooting-volume-readonly-or-io-error
Tailscale — Device connectivity / How NAT traversal works / Subnet routers: tailscale.com/kb/1411/device-connectivity · tailscale.com/blog/how-nat-traversal-works · tailscale.com/kb/1019/subnets
Lima / socket_vmnet — User-mode network / VMNet: lima-vm.io/docs/config/network/user · /vmnet · github.com/lima-vm/socket_vmnet
VXLAN security (no auth/encryption): RFC 7348 §7 Security Considerations

Hybrid k3s #2: Welcoming the sleeping iMac as a teammate (4 Lima VM agents)

SEON — Wed, 03 Jun 2026 00:00:00 +0000

0. About this series

This series is a record — written one piece at a time — of how I actually built the homelab in the diagram above, the one that's still running as I write this.

6 nodes — 2 Lightsail servers (control plane + etcd) in the cloud (AWS Tokyo) + 4 Lima VM agents on a home (Sapporo) iMac
19 vCPU / 61 GiB total, 49 namespaces, 248 pods (150 running)
Deployed with ArgoCD, auth via Keycloak OIDC, with CloudNativePG, Vault, CrowdSec, Prometheus/Grafana and more running on top

This article is about taking the two-node cloud cluster from #1 and welcoming the iMac that was gathering dust at home — split into 4 Lima VMs that join as agents.

1. Background — welcoming the sleeping iMac as a teammate

What I built in #1 was the cloud-side foundation.

I put k3s on two AWS Lightsail instances (8GB and 16GB), formed a two-node control plane + embedded etcd, and bound both nodes over Tailscale so they could call each other by a 100.x address. Instead of the textbook three nodes I went with two, taking out insurance with automatic etcd snapshots. The result was a cluster that was, in effect, just the "head." (Plenty of apps are crammed onto those nodes too, penny-pincher that I am.)

This time I'm adding the "limbs."

At home, a fairly old 64GB-RAM iMac sits idle. It's slow — it has an HDD — but memory is the one thing it has plenty of, and its macOS is new enough to run virtualization (vz), so as a host it's more than enough. The goal this time is to bring it in as a cluster worker.

But there was one thing to decide right at the start.

Bring the iMac in whole as a single node, or split it into several?

The easy path is whole.

Install Ubuntu on the iMac, stand up one k3s agent, and you're done. But the whole point of this homelab is "to handle Kubernetes like the real thing." With that in mind, I weighed the two options against the official guidance.

The limit of going whole (one node).

The Kubernetes docs recommend at least one instance per failure zone for fault tolerance. If the home side is a single node, that node is itself a single point of failure — and, more to the point, none of the practice that assumes multiple nodes is possible.

Cordoning and draining a node to shift its workloads off, spreading Pods across nodes (anti-affinity), rolling a node out and back in: with one node, all of it is meaningless.

The worth of splitting (multiple nodes).

Add several nodes and the blast radius shrinks, while spreading becomes possible. learnkube's worker-node sizing analysis shows this in numbers — with five nodes you can scatter five replicas onto separate nodes, so losing one node costs you at most one replica.

With only two nodes, no matter how many replicas you add, the effective spread tops out at two.
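The spread argument comes down to one line of arithmetic. The sketch below assumes the scheduler places replicas as evenly as possible (for example through anti-affinity); the function name worst_case_loss is made up for illustration and is not part of any tool.

```shell
# worst_case_loss REPLICAS NODES
# Prints how many replicas disappear, at worst, when a single node fails,
# assuming the replicas are spread as evenly as possible across the nodes.
worst_case_loss() {
  replicas=$1
  nodes=$2
  # Ceiling division: the fullest node holds ceil(replicas / nodes) replicas.
  echo $(( (replicas + nodes - 1) / nodes ))
}
```

With five replicas, worst_case_loss 5 5 prints 1, while worst_case_loss 5 2 prints 3: on two nodes, one failure takes out more than half of the replicas.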

Splitting isn't free, of course.

As that same article points out, every node reserves resources for kubelet and the OS — a 1 vCPU/4GB node gives up about 1.1GB, a 4 vCPU/32GB node about 3.66GB — so the finer you slice, the larger the system-overhead ratio. The pods-per-node count is also capped at 110 by default. In short, "infinitely fine" isn't the answer; you want a balance point of reasonably sized nodes in a reasonable number.

The conclusion was to split.

For a learning-focused homelab where you want to handle scheduling and failure "like a real cluster," the home side should be multiple nodes too. That 64GB of memory is what makes the luxury possible. (How many to split into is decided in §3.)

And here another question branches off. How do you split one physical machine into multiple nodes?

2. Lima VM

There are several ways to turn one physical iMac into multiple k3s nodes. I lined up the candidates and filtered them against this homelab's conditions ("run headless as a server around the clock, mass-produce identical machines reproducibly, keep macOS").

The biggest fork is whether to fake the nodes with containers or make real nodes with VMs.

Method	Character	For this situation
Bare-metal Ubuntu reinstall	Install Linux straight on the iMac, one node	Have to wipe macOS, and you still end up with one node → conflicts with the point of splitting, excluded
Docker + k3d	Stand up k3s in containers to mimic multi-node	Nodes are containers, so they share the host kernel (weak isolation); the "real node" feel is thin → underwhelming for learning
Multipass (Canonical)	Launch an Ubuntu VM in one line	VM = real node, fine, but Ubuntu only, and it leans toward one-shot launches rather than declaratively mass-producing identical VMs (the backend has been QEMU by default since 1.12)
VirtualBox / VMware Fusion	Traditional GUI hypervisors	Run on Intel Macs but heavy and GUI-centric; scripting N of them is a pain
UTM	A macOS GUI front end for QEMU	Nice for making 1–2 by GUI, but not a great fit for headless / reproducibility
Colima	"Containers on Lima," a Docker/k8s abstraction	Uses Lima underneath; its aim is providing a container runtime, not defining VMs directly
Lima	Headless Linux VMs from declarative YAML	One YAML reproduces the same VM any number of times, headless, containerd-friendly, native speed on the vz backend ← chosen

As the diagram shows, with k3d the nodes are containers sharing one kernel. It's fast and light, but because the kernels aren't isolated between nodes, it's a step removed from the feel of "operating real nodes." The rest (Multipass, VirtualBox, UTM, Lima) are VMs, each with its own kernel, so isolation is strong. What separates them further is management style and backend.

Lima (Linux Machines) is an open-source tool for standing up headless Linux VMs on macOS. I chose it for three reasons.

Declarative and reproducible. Write the VM's spec (CPU, memory, disk, distro) in YAML, and the same definition mass-produces four identical VMs as-is. That's a different level of reproducibility from clicking through a GUI four times.
Suits headless operation. The iMac runs as an always-on server, so no GUI is needed. Lima runs entirely from the CLI (limactl).
Fast backend. Since v1.0, on macOS (13.5+) Lima uses vz (Apple Virtualization.framework) as the default backend. Running an Intel VM on an Intel Mac this time means native virtualization, not emulation, so it's light. (On older macOS where vz isn't available, you can fall back to vmType: qemu.)

The third point is grounded in the Lima vmType docs.

How Lima works

It looks like "a tool for standing up Linux VMs on a Mac," but once you see the structure, it becomes clear why it gets reproducibility and speed at the same time.

limactl (host CLI) — Lima's core. One line, limactl start ./k3s-agent.yaml, builds and boots a VM exactly to the spec in the YAML. With no GUI, drop it in a script and loop four times to stand up four identical VMs.
lima.yaml (declarative config) — a single file holding the VM's CPU, memory, disk, distro, and provisioning scripts. It's the "VM blueprint," so sharing the same file gets anyone the same VM.
vz (Apple Virtualization.framework) — the hypervisor that actually runs the VM. It's Lima's default on macOS 13.5+, and since it runs an Intel guest on an Intel host, it runs natively with no emulation. Run systemd-detect-virt inside a joined VM and you'll see apple — the proof.
Guest VM (Ubuntu) — has its own Linux kernel, separate from the host. On top of it run containerd (bundled in k3s) and the k3s agent, and a separately installed Tailscale gives the tailscale0 interface a 100.x address. The VM joins the cloud nodes over this 100.x (§5·§6).
Host ↔ guest links — Lima sets up virtiofs file sharing, port-forwarding, and SSH automatically. So you drop straight into the VM from the Mac terminal and exchange files.

In short: "one blueprint (YAML) → limactl boots it via vz → a real node with its own kernel." For the conditions "the same machine, many times, by script, lightly," Lima fit best.

3. Splitting into meaningful units

Once splitting was decided, what's left is the count. There's a reason I went with four rather than two or eight.

The yardstick was RAM. What caps the node count is memory, not CPU. Several VMs can time-share CPU (oversubscription), but memory can't be shared that way: once it's allocated to a VM, it's gone.

Leave about half of the 64GB for the macOS host and headroom, and the VMs' share is around 32GB. How many pieces to split that into?

The minimum size of one node.

Slice too finely and the fixed overhead each node reserves for kubelet and the OS starts to stand out. In learnkube's numbers, a 1 vCPU/4GB node hands over about 1.1GB (28%!) to the system. Around 8GiB that ratio becomes bearable, leaving room to run a meaningful workload on top. So I made one VM = 8GiB the unit.

Balancing the count. 32GiB ÷ 8GiB = 4 nodes. The numbers lined up, and it fits the learning goal too.

Two is too few to call multi-node. Lose one and half is gone, and however many replicas you spread, the effective spread is two.
Eight shrinks 8GiB to 4GiB, pushing the per-node overhead ratio back up, while eight VMs fight over the host's cores — CPU contention and host-RAM pressure get rough. The more nodes there are, the more the node controller's health-check load grows too.
Four is the lowest count where you can drain one node and have three carry it, spread Pods with anti-affinity, and practice rolling a node out and back — while still keeping the 8GiB workload unit.
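The sizing reasoning above can be reproduced with two small helpers. This is only a sketch: node_count and overhead_pct are illustrative names, and the reserved-memory figures passed in are the ones quoted from the learnkube analysis, not values measured on this cluster.

```shell
# node_count BUDGET_GIB NODE_GIB
# How many equal-sized nodes fit into the memory budget set aside for VMs.
node_count() {
  echo $(( $1 / $2 ))
}

# overhead_pct RESERVED_GB TOTAL_GB
# Share of a node's memory reserved for the kubelet and the OS,
# printed as a percentage with one decimal place.
overhead_pct() {
  awk -v r="$1" -v t="$2" 'BEGIN { printf "%.1f\n", r / t * 100 }'
}
```

node_count 32 8 prints 4, and node_count 32 4 prints 8. overhead_pct 1.1 4 prints 27.5 (the roughly 28% cited above), while overhead_pct 3.66 32 prints 11.4, which is why larger nodes waste proportionally less.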

Why being short on CPU is fine.

I gave each one 3 vCPU, so four make 12 vCPU. That's more than the host's physical core count; honestly, the cores don't add up to that total (oversubscription). It still works because most of this homelab's workloads are warm-start: they sleep on minimal resources when unused and wake on a request. Not every Pod runs at full load at once; they wake and sleep on a stagger, so several VMs sharing the physical cores is no strain in real use. Unlike memory, CPU is time-shared, which is exactly why oversubscription holds up under virtualization.

To sum up, the initial design is 4 home nodes, each 3 vCPU / 8GiB / 300GiB disk / Ubuntu 24.04 LTS. Add #1's two cloud control-plane nodes and you reach the target of six.

Home node (initial, all 4 the same)	Value
vCPU	3
Memory	8 GiB
Disk	300 GiB
OS	Ubuntu 24.04 LTS
Role	k3s agent (workload only)

One thing up front. The above is the initial design, and three of the four are still like that. But while operating it (trying out a larger app, checking compatibility against a different OS version) I bumped just the fourth node (agent-4) to 4 vCPU / 16GiB and reinstalled it on Ubuntu 25.10. So when you run limactl list in §4, one machine looks different.

4. Installing Lima and defining the VM

From here it's hands-on. The order is ① install Lima → ② write the VM blueprint (YAML) → ③ start four → ④ verify. Every output below was taken on this actual iMac.

4-1. Install Lima (host = the iMac's macOS)

One line with Homebrew.

brew install lima
limactl --version


limactl version 2.0.3

Making vz the default backend needs macOS 13.5 or newer (this iMac meets that). On a lower version, change vmType to qemu in the YAML below — slower, but it works the same.

4-2. The VM blueprint — `k3s-agent.yaml`

To keep the four identical, I freeze the definition into one file. This YAML is the blueprint for all four nodes.

# k3s-agent.yaml — shared blueprint for the 4 home nodes (Lima)
images:
  - location: "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-amd64.img"
    arch: "x86_64"
cpus: 3
memory: "8GiB"
disk: "300GiB"
# vmType omitted → vz is automatic on macOS 13.5+ (use vmType: "qemu" on older macOS)
# mounts/containerd left at defaults. As a node, no host-directory sharing needed.

The points that matter:

images — the Ubuntu 24.04 LTS cloud image (x86_64, since it's Intel). The same image as Lima's default ubuntu-24.04 template. (On Apple Silicon, switch to an arch: "aarch64" image.)
cpus/memory/disk — the 3 vCPU / 8GiB / 300GiB decided in §3.
Omitting vmType is deliberate — on macOS 13.5+, vz is chosen automatically. That's why systemd-detect-virt reads apple inside the VM.

Note: the disk is thin-provisioned (sparse), but whatever the guest writes takes up real space on the host. disk: 300GiB is a ceiling, so each VM starts at roughly the image's size, but as four of them fill up they eat a good chunk of the host disk in total. On an HDD especially, leave generous room.

4-3. Start four

Same blueprint, only the name changes, four times.

for n in k3s-agent k3s-agent-2 k3s-agent-3 k3s-agent-4; do
  limactl start --name="$n" ./k3s-agent.yaml --tty=false
done

Lima prefixes the instance name with lima- to make the hostname: lima-k3s-agent, lima-k3s-agent-2, and so on. That name later shows up as-is as the node name in kubectl get nodes. (--tty=false is for automation: it creates the VM without opening an editor.)
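If the loop is re-run later, it helps to skip instances that already exist. The helper below is a sketch, not part of Lima: missing_instances is a hypothetical name, and it assumes the existing instance names arrive on stdin one per line (which is what limactl list --quiet is expected to print).

```shell
# missing_instances NAME...
# Reads existing instance names from stdin (one per line) and prints
# the requested names that are not among them.
missing_instances() {
  existing=$(cat)
  for name in "$@"; do
    # -x matches whole lines only, -F treats the name as a literal string.
    if ! printf '%s\n' "$existing" | grep -qxF -- "$name"; then
      printf '%s\n' "$name"
    fi
  done
}
```

One way to use it: limactl list --quiet | missing_instances k3s-agent k3s-agent-2 k3s-agent-3 k3s-agent-4 prints only the names that still need a limactl start.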

4-4. Verify (real output)

Check on the host that four came up.

limactl list


NAME          STATUS    SSH               CPUS   MEMORY   DISK     DIR
k3s-agent     Running   127.0.0.1:61372   3      8GiB     300GiB   ~/.lima/k3s-agent
k3s-agent-2   Running   127.0.0.1:61392   3      8GiB     300GiB   ~/.lima/k3s-agent-2
k3s-agent-3   Running   127.0.0.1:60490   3      8GiB     300GiB   ~/.lima/k3s-agent-3
k3s-agent-4   Running   127.0.0.1:61460   4      16GiB    300GiB   ~/.lima/k3s-agent-4

All four are Running.

Drop into a VM and check its spec, virtualization backend, and OS.

limactl shell k3s-agent -- nproc
limactl shell k3s-agent -- systemd-detect-virt
limactl shell k3s-agent -- free -h
limactl shell k3s-agent -- grep PRETTY_NAME /etc/os-release


3
apple
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       2.3Gi       744Mi       232Mi       5.3Gi       5.5Gi
Swap:             0B          0B          0B
PRETTY_NAME="Ubuntu 24.04.3 LTS"

3 vCPU, ~7.8GiB of memory, systemd-detect-virt reading apple (proof it's on vz), Ubuntu 24.04.3 LTS. (used/buff/cache are already loaded because this node is running k3s workloads right now — right after creation it'd be nearly empty.)

Only k3s-agent-4, swapped during operation, has a different OS.

limactl shell k3s-agent-4 -- grep PRETTY_NAME /etc/os-release
limactl shell k3s-agent-4 -- nproc


PRETTY_NAME="Ubuntu 25.10"
4

From the cluster's point of view, a different OS version is no problem — as long as the k3s version and container runtime line up (confirmed in §7).

At this point four Ubuntu VMs are up on the iMac (three at the initial spec, one beefed up for testing). They're still just four Linux machines with nothing to do with the cluster. Next is Tailscale (§5), which binds these four and the Tokyo cloud nodes into one private network.

5. Tailscale — binding the four into one private network with the Tokyo cluster

What §4 produced is four empty Ubuntu VMs on the iMac. To bind them with the Tokyo cloud nodes, there's a problem to solve first.

Home has no global IP. It's behind the router's NAT, so you can't open a connection from outside (the cloud) to a home VM first. You could punch a hole with port-forwarding + DDNS, but I'd rather not touch the router and expose the home IP to the internet. And the answer to this was already decided back in #1 — Tailscale.

Tailscale is a WireGuard-based mesh VPN. Every machine dials outbound, so even with both sides behind NAT they connect directly (falling back to a DERP relay if the direct path fails), and each machine gets a fixed private address in the 100.64.0.0/10 range. Since #1 already bound the two cloud nodes this way, this time it's just adding the four home VMs to the same tailnet.

Why install Tailscale per VM? Because the k3s agent uses a Tailscale 100.x as the address it advertises itself on (--node-ip) (confirmed with the real flags in §6). Each node needs one fixed 100.x, so Tailscale goes on each of the four VMs.

5-1. Install Tailscale on each VM

Same one line on all four.

curl -fsSL https://tailscale.com/install.sh | sh

Either run it by dropping into each VM with limactl shell, or put it in the provisioning script of §4-2's lima.yaml so it installs automatically when the VM is created.

5-2. Auth — an auth key, since it's headless

The VMs have no browser, so instead of interactive login I authenticate non-interactively with an auth key. Issue one Reusable key in the Tailscale admin console under Settings → Keys (reusable so the same key works for all four) and a tskey-auth-… string appears. It's shown only once, so copy it right then.

On each VM:

sudo tailscale up --auth-key=tskey-auth-XXXXXXXX...

I joined them on a tagless personal account — keeping it simple. To lock access down further, you can layer on tags (like tag:server) and an ACL policy.

5-3. Verify — six nodes on one tailnet

Check that each VM got a 100.x.

limactl shell k3s-agent -- tailscale ip -4


100.84.x.x

Look at the whole tailnet.

limactl shell k3s-agent -- tailscale status


100.84.x.x    lima-k3s-agent     me@…   linux   -
100.98.x.x    lima-k3s-agent-2   me@…   linux   active; direct
100.117.x.x   lima-k3s-agent-3   me@…   linux   active; direct
100.90.x.x    lima-k3s-agent-4   me@…   linux   active; direct
100.71.x.x    ip-172-26-3-146    me@…   linux   active; direct   # cloud server-A (#1)
100.99.x.x    ip-172-26-2-70     me@…   linux   active; direct   # cloud server-B (#1)
100.88.x.x    seon-mbp-m4        me@…   macOS   -                # work laptop

Other same-account devices (the iMac host, NAS, and so on) show up too, but I trimmed them above. active; direct is the mark of a direct connection with no DERP relay (NAT traversal succeeded). With that, the four home VMs and the two Tokyo cloud boxes reach each other by 100.x inside one tailnet.

5-4. Firewall — Tailscale opens zero new public ports

Let's stop here a moment and check: "did I just expose anything extra to the internet?"

Tailscale initiates all of its connections outbound (WireGuard over UDP, from local port 41641 by default, or a DERP relay over HTTPS on 443 if UDP is blocked). So even with Tailscale on the VMs, the number of new inbound firewall ports to open is zero. It extends only the private network, without widening the exposed surface. The orthodox way is to keep cluster ports like the apiserver (6443) off the public net and reachable only inside the tailnet (the laptop's connection goes over the tailnet too, in §7).

Note — "exactly zero public ports" is something I mean to finish in a later article. The apiserver (6443) can be closed on the public side the moment you switch to tailnet access. But closing inter-node Pod traffic (flannel VXLAN, UDP 8472) on the public net too needs extra config to run flannel over Tailscale (otherwise Pod-to-Pod traffic between nodes breaks — see the note in §6). That work and its limits (the overhead of double encapsulation, and so on) come later. What's certain in this article is "Tailscale itself adds no exposure at all."

Something I learned from running it: this is where a lot of time goes. Even within this article's scope, you end up opening a port to get Pods talking to each other. And even with the port open, the limited speed and latency of the Tailscale overlay make it a real struggle. Solving it is the next article's topic.

At this point six machines (2 cloud + 4 home) call each other by a fixed 100.x on one private network. They aren't bound into a single cluster yet — in the next §6, that 100.x goes straight into the k3s agent join.

6. Agent join — putting the Tailscale 100.x straight into node-ip

This is the article's goal. The four VMs now on one tailnet join as agents to the cloud cluster stood up in #1.

The method is almost the same as #1's server install. You pass the k3s install script two environment variables (K3S_URL, K3S_TOKEN) and the agent flags. The key is setting the address the node advertises itself on (--node-ip) to the Tailscale 100.x you got in §5. That gets the node to the apiserver over the tailnet and into the cluster.

6-1. Get the join token from the server

The token an agent uses to join lives on the server (#1's cluster-init node).

sudo cat /var/lib/rancher/k3s/server/node-token


K10<hash>::server:<random>

Copy this value (the K3S_TOKEN below). The token's location and meaning are laid out in the k3s token docs.

6-2. Join each VM as an agent

Run on each of the four VMs. K3S_URL is the server's Tailscale address; --node-ip/--node-external-ip are that VM's Tailscale 100.x.

# Placeholders are quoted so the shell does not treat < and > as redirections.
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.34.3+k3s1 \
  K3S_URL=https://100.71.x.x:6443 \
  K3S_TOKEN='K10<hash>::server:<random>' \
  sh -s - agent \
    --node-ip="<this VM's 100.x>" \
    --node-external-ip="<this VM's 100.x>"

Breaking down the flags:

K3S_URL=https://100.71.x.x:6443 — the apiserver on the server (#1 cluster-init). It's a Tailscale address, so it's reachable inside the tailnet even with 6443 closed on the public net.
K3S_TOKEN: the value from 6-1. With K3S_URL set, the k3s install script configures the node as an agent rather than a server, and the token is what authorizes it to join.
--node-ip=100.x / --node-external-ip=100.x — use the Tailscale address as this node's InternalIP / externally advertised address.
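Because a wrong address here is easy to miss, the two flags can be generated from the VM's own Tailscale address with a guard in front. This is a sketch under assumptions: in_tailnet_range and agent_ip_flags are hypothetical helper names, and the range check only inspects the first two octets of the address.

```shell
# in_tailnet_range IPV4
# Succeeds if the address lies in 100.64.0.0/10, that is, first octet 100
# and second octet between 64 and 127.
in_tailnet_range() {
  case $1 in
    100.6[4-9].*.*|100.[7-9][0-9].*.*|100.1[01][0-9].*.*|100.12[0-7].*.*) return 0 ;;
    *) return 1 ;;
  esac
}

# agent_ip_flags IPV4
# Prints the two k3s agent address flags for this node, or fails if the
# address is not in the Tailscale range.
agent_ip_flags() {
  if ! in_tailnet_range "$1"; then
    echo "refusing to join: $1 is not in 100.64.0.0/10" >&2
    return 1
  fi
  printf '%s %s\n' "--node-ip=$1" "--node-external-ip=$1"
}
```

On a VM, agent_ip_flags "$(tailscale ip -4)" would print the flag pair to append after sh -s - agent, and it stops early if the VM has not received a 100.x address yet.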

Peek at a node that actually joined and you'll see the same thing written there (real, token masked):

# /etc/systemd/system/k3s-agent.service
ExecStart=/usr/local/bin/k3s agent --node-external-ip=100.x.x.x --node-ip=100.x.x.x

# /etc/systemd/system/k3s-agent.service.env
K3S_URL='https://100.71.x.x:6443'
K3S_TOKEN='********'

The flag meanings are in the k3s agent reference. The CNI carries over the flannel vxlan decided on the server in #1 as-is, and all six nodes are pinned to k3s version v1.34.3+k3s1 (confirmed in §7).

Note — this article goes only as far as "node join (Ready)." Inter-node Pod traffic is a later networking story. With just --node-ip=100.x, a node reaches the apiserver over the tailnet and goes Ready (which is this article's goal: "the iMac joins as nodes"). But getting Pods on different nodes to talk requires flannel VXLAN to cross between nodes, and there's a trap here.

By default flannel advertises each node's default-route interface IP as the VXLAN destination (public-ip). Cloud nodes have that as a VPC private network (e.g., 172.26.x), home VMs as their own private network — so they sit on different underlays and may not reach each other directly.

So you need config to "run VXLAN over the tailnet (100.x)" (like --flannel-iface=tailscale0), but you must not change just one side (the agents). Setting it on agents only makes them disagree with the server's destination (VPC), and it breaks asymmetrically — traffic passes one way only. You have to align both server and agents (= touching #1's server config too).

This — "putting flannel properly on the tailnet + the overhead and limits of double encapsulation (VXLAN over WireGuard) + the optimization" — is the next article's topic. So this time I stop at nodes joining and going Ready.

6-3. (optional) Automate it in lima.yaml

Instead of repeating by hand on four VMs, put the Tailscale install (§5) and the agent join above into the provisioning script of §4's lima.yaml, and one limactl start carries it all the way to a finished node. The §2 promise of "one blueprint, four identical nodes" comes full circle here.

7. Verifying from the laptop — six nodes in one cluster

Following the orthodox approach, the apiserver (6443) isn't open on the public net and is reached only inside the tailnet. So to check, the laptop joins the same tailnet and the kubeconfig is pointed at the server's tailnet address. (The principle is the same whether the laptop is Mac, Windows, or Linux.)

7-1. Put the laptop on the tailnet (Mac / Windows / Linux)

Install the same Tailscale, on the same account, on the laptop too.

macOS — the GUI app via brew install --cask tailscale-app (or the App Store), then log in.
Windows — the installer from tailscale.com/download, then log in.
Linux — curl -fsSL https://tailscale.com/install.sh | sh → sudo tailscale up.

Once you log in, the laptop gets a 100.x and joins the same tailnet as the nodes. If tailscale status shows the cloud and home nodes, you're ready.

Note: the Homebrew package names are confusing. The macOS GUI app (menu bar) is the cask tailscale-app. brew install tailscale (the formula) installs only the command-line tools (the tailscale CLI and the tailscaled daemon). To log in via the GUI on the laptop, use brew install --cask tailscale-app.

7-2. Point kubeconfig at the tailnet address

The server's (#1 cluster-init) kubeconfig is at /etc/rancher/k3s/k3s.yaml, with the server field defaulting to https://127.0.0.1:6443. Bring this file to the laptop (e.g., ~/.kube/config) and change server to the server's Tailscale address.

kubectl config set-cluster default --server=https://100.71.x.x:6443

Note — TLS won't pass unless the cert SAN matches. Even after pointing server at the tailnet, if that 100.x isn't in the apiserver cert's SAN you'll be rejected with x509: certificate is valid for ... not .... When k3s brings up the server with --node-ip=100.x, it includes that address (the InternalIP) in the cert SAN automatically — peek at the cert and the 100.x is there:
X509v3 Subject Alternative Name:
DNS:kubernetes, DNS:kubernetes.default, ..., IP Address:10.43.0.1,
IP Address:100.71.x.x, IP Address:100.99.x.x, IP Address:127.0.0.1, ...
If the 100.x isn't in the SAN, add --tls-san=100.71.x.x on the server and re-issue the cert (k3s server docs).
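The SAN check can be scripted so it runs before kubectl ever reports an x509 error. The function below is a sketch: san_has_ip is a made-up name, and it assumes the certificate text on stdin uses the "IP Address:" labels shown in the excerpt above.

```shell
# san_has_ip IP
# Reads certificate text (as printed by `openssl x509 -noout -text`) from
# stdin and succeeds if IP appears as an "IP Address:" SAN entry.
san_has_ip() {
  # Put each comma-separated entry on its own line, trim surrounding
  # whitespace, then require an exact whole-line match.
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -qxF -- "IP Address:$1"
}
```

From the laptop, something like openssl s_client -connect 100.71.x.x:6443 </dev/null 2>/dev/null | openssl x509 -noout -text | san_has_ip 100.71.x.x would succeed only when the tailnet address is already covered by the certificate.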

7-3. Verify — `kubectl get nodes`

Check it straight from the laptop. The first things to check are whether all six are Ready, and whether the agents' INTERNAL-IP is 100.x.

kubectl get nodes -o wide


NAME               STATUS   ROLES                AGE    VERSION        INTERNAL-IP   OS-IMAGE             CONTAINER-RUNTIME
ip-172-26-2-70…    Ready    control-plane,etcd   140d   v1.34.3+k3s1   100.99.x.x    Amazon Linux 2023    containerd://2.1.5-k3s1
ip-172-26-3-146…   Ready    control-plane,etcd   140d   v1.34.3+k3s1   100.71.x.x    Amazon Linux 2023    containerd://2.1.5-k3s1
lima-k3s-agent     Ready    <none>               140d   v1.34.3+k3s1   100.84.x.x    Ubuntu 24.04.3 LTS   containerd://2.1.5-k3s1
lima-k3s-agent-2   Ready    <none>               140d   v1.34.3+k3s1   100.98.x.x    Ubuntu 24.04.3 LTS   containerd://2.1.5-k3s1
lima-k3s-agent-3   Ready    <none>               140d   v1.34.3+k3s1   100.117.x.x   Ubuntu 24.04.3 LTS   containerd://2.1.5-k3s1
lima-k3s-agent-4   Ready    <none>               140d   v1.34.3+k3s1   100.90.x.x    Ubuntu 25.10         containerd://2.1.5-k3s1

How to read it:

The two cloud nodes are control-plane,etcd (#1); the four home nodes are ROLES <none> — agents, workload only.
Every node's INTERNAL-IP is 100.x (Tailscale). That's the sign the node joined over the tailnet. If a LAN IP like 192.168.x shows up here, the node advertised itself wrong — so always check this column right after a join.
Version v1.34.3+k3s1 and runtime containerd://2.1.5-k3s1 are the same across all six. Only lima-k3s-agent-4 has a different OS (Ubuntu 25.10, that test node from §4), but since the k3s version and runtime match, it's no problem for joining.
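The INTERNAL-IP check is easy to automate. The filter below is a sketch with a hypothetical name (nodes_off_tailnet); it assumes the untrimmed kubectl get nodes -o wide --no-headers layout, where INTERNAL-IP is the sixth column.

```shell
# nodes_off_tailnet
# Reads `kubectl get nodes -o wide --no-headers` output from stdin and
# prints "NAME IP" for every node whose INTERNAL-IP (column 6) falls
# outside 100.64.0.0/10.
nodes_off_tailnet() {
  awk '{
    split($6, o, ".")
    if (o[1] + 0 != 100 || o[2] + 0 < 64 || o[2] + 0 > 127) print $1, $6
  }'
}
```

Running kubectl get nodes -o wide --no-headers | nodes_off_tailnet right after a join should print nothing; any line it does print names a node that advertised a LAN or VPC address instead.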

Label the home nodes to tell them apart from the cloud (real):

kubectl get nodes -L node-type


... ip-172-26-2-70 ... lightsail
... ip-172-26-3-146 ... lightsail
... lima-k3s-agent ... lima
... lima-k3s-agent-2 ... lima
... lima-k3s-agent-3 ... lima
... lima-k3s-agent-4 ... lima

When I later route workloads to home/cloud, I use this node-type label in a nodeSelector (placement strategy in a later installment).
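The article shows the labels but not how they were applied. One way it could be done is sketched below, assuming the node name prefix is enough to tell home from cloud; node_type_for and label_commands are illustrative names, and the commands are only printed, not executed.

```shell
# node_type_for NAME
# Maps a node name to the node-type label value used in this article:
# lima for the home VMs, lightsail for everything else.
node_type_for() {
  case $1 in
    lima-*) echo lima ;;
    *)      echo lightsail ;;
  esac
}

# label_commands
# Reads node names from stdin (one per line) and prints one
# `kubectl label` command per node without running it.
label_commands() {
  while read -r node; do
    [ -n "$node" ] || continue
    echo "kubectl label node $node node-type=$(node_type_for "$node") --overwrite"
  done
}
```

For a dry run, kubectl get nodes -o name | sed 's#^node/##' | label_commands lists the commands; piping that output to sh would apply them.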

By the way, in the current operating state, with the networking finished, Pods are spread across the six nodes like this (this is how it runs now, after the improvements, not right after this step):

ip-172-26-2-70 68 (cloud)
ip-172-26-3-146 14 (cloud)
lima-k3s-agent 26 (home)
lima-k3s-agent-2 18 (home)
lima-k3s-agent-3 25 (home)
lima-k3s-agent-4 96 (home, the larger-app test node)
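A per-node count like this can be produced from the node name of every Pod. The helper below is a sketch (pods_per_node is a made-up name, and it is not necessarily how the numbers above were gathered); it expects one node name per line on stdin and ignores empty lines, which is how unscheduled Pods would show up.

```shell
# pods_per_node
# Reads one node name per line from stdin and prints "COUNT NODE",
# sorted by node name. Empty lines (Pods with no node yet) are dropped.
pods_per_node() {
  sed '/^$/d' | sort | uniq -c | awk '{ print $1, $2 }'
}
```

A possible feed for it is kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | pods_per_node, which avoids parsing the human-readable table.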

Pods running on both the cloud and home nodes — that's what a hybrid cluster looks like.

With this, the home iMac became a worker in the cluster. Four empty Lima VMs → one private network with Tailscale → joined as k3s agents → 2 in Tokyo + 4 in Sapporo = all 6 nodes Ready. The apiserver (6443) is seen only over the tailnet, by both the nodes and the laptop.

8. Cost — the increase was zero

The new cost this round is effectively zero. I added four nodes, but everything I used was either free or already on hand.

Item	Cost (USD)
Lightsail server-A (8GB)	$44 / mo (unchanged from #1)
Lightsail server-B (16GB)	$84 / mo (unchanged from #1)
k3s / Lima	$0 (open source)
Tailscale Personal	$0
Home nodes (Lima VM ×4, iMac)	$0 (iMac I already had)
This round's increase	+$0

One at a time:

Cloud — the same two Lightsail boxes from #1. Instances added this round: zero → cloud increase $0.
k3s · Lima — both open source. Adding nodes costs no license fee.
Tailscale — the personal plan is free. Six nodes + a laptop is nowhere near the free-tier limit → $0.
Home nodes — reused the idle iMac → zero new purchase.

Note: I'm deliberately leaving electricity out as a number. The iMac runs 24/7, so power really does cost something. But that cost swings widely by region (Sapporo, Tokyo, and each reader's own country), contract plan, and season/usage, so putting one dollar figure on it would be wrong for most readers. Measuring your own draw (W) with a smart plug is the most accurate. The point is that the cloud and software increase is zero, and the only real added cost is "electricity for a machine I already own."

9. Glossary — what came up this time

A quick sweep of the terms.

Lima / limactl — an open-source tool for standing up headless Linux VMs on macOS from declarative YAML. limactl is its CLI.
vz (Apple Virtualization.framework) — macOS's built-in hypervisor. Lima's default on 13.5+. If systemd-detect-virt reads apple in the guest, it's on vz.
Guest VM vs container — a VM has its own kernel and strong isolation; a container shares the host kernel and has weak isolation. That's why this article chose VMs for "real nodes."
k3s server / agent — a server is the control plane (+etcd); an agent is a workload-only node. Pass K3S_URL+K3S_TOKEN at install and it joins as an agent.
tailnet / 100.x — the private mesh network Tailscale builds. Each device gets a fixed address in the 100.64.0.0/10 (CGNAT) range.
WireGuard / DERP — WireGuard is Tailscale's VPN engine; when a direct path isn't possible, it detours through a DERP relay.
auth key (Reusable) — a key for joining the tailnet non-interactively, without a browser. Reusable across multiple nodes.
flannel / VXLAN — flannel is k3s's default CNI, VXLAN its default backend. It carries inter-node Pod packets encapsulated in UDP (8472).
--node-ip / InternalIP — the address a node advertises to the cluster. Put a Tailscale 100.x here and the node joins over the tailnet (putting Pod-to-Pod traffic on the tailnet is separate config — next time).
node-token — the secret an agent uses to join. On the server at /var/lib/rancher/k3s/server/node-token.
node-type label / nodeSelector — a label on the nodes (here, lima/lightsail). Used later in a nodeSelector to route workloads to home/cloud.

10. Next

With six nodes in place, the next thing is what to put on top of them.

At first I got carried away and threw all kinds of things on — and the traffic wouldn't go through. I ended up ignoring the principle and opening port 8472 (udp / flannel VXLAN) to make communication work, and ran it that way. But the real trouble started once I brought in Longhorn and CNPG: latency on inter-node traffic set off a cascade of errors, with pods restarting over and over, and countless rounds of trial and error.

That's what I want to get into next time.

Thanks for reading all the way through.

Hybrid k3s #1: Cloud and home into one cluster — initial setup

SEON — Tue, 02 Jun 2026 00:00:00 +0000

0. About this series

This series is a record — written one piece at a time — of how I actually built the homelab shown in the diagram above, the one I'm running right now.

What started as a toy project from a simple "could this even work?" turned, through satisfying performance and endless tearing-down-and-rebuilding, into a genuine toy that relieves the stress built up at work.

It isn't a resource-rich cluster, but it has been more than enough to get a real taste of Kubernetes, and it keeps giving me new things I want to try next.

6 nodes — 2 Lightsail servers (control plane + etcd) in the cloud (AWS Tokyo) + 4 Lima VM agents on a home (Sapporo) iMac
19 vCPU / 61 GiB total, 49 namespaces, 248 pods (150 running)
Deployment via ArgoCD, authentication via Keycloak OIDC, with CloudNativePG, Vault, CrowdSec, Prometheus/Grafana, and more running on top

It wasn't easy, but it wasn't hard enough to give up on either — so I'm going to write up, one at a time, the things I learned while building it and the things I want to keep.

This first story is about the foundation — how I started from two control-plane nodes in the cloud.

1. Background

There was no grand blueprint to begin with. The starting point was ordinary.

Working with Kubernetes in my day job, things I want to dig into more keep coming up. Reading the docs is one thing; breaking and fixing a cluster with my own hands is another. There's an environment I can touch at work too, but a careless mistake there leads to noisy, annoying situations, so there's only so much I can try on it.

I needed a cluster I could run however I wanted.

As it happened, a 64GB-RAM iMac, more than 10 years old, was sitting mostly idle at home. It still performs well enough, but it has an HDD so it's slow, its OS is past end-of-support, and it has handed its seat to a MacBook Pro M4 and is now resting. On the cloud side, I already had two small Lightsail instances running personal services, and as those services grew, resources were gradually getting tight.

"What if I stopped keeping the idle home machine's resources and the cloud I'm already paying for separate, and used them as one?"

The urge to learn and the pressure on resources converged on a single idea — combine the cloud and home into one cluster. This article is the first dig: building the cloud-side foundation.

2. Why k3s — a choice under limited resources

First, let's prepare a Kubernetes (k8s) environment.

But for the resources I had in my cloud environment, standard k8s was too heavy. In my dreams I wanted to run wild on a multi-cluster with thousands of nodes; in reality it was two small AWS Lightsail instances at about $150/month and a single 10-plus-year-old iMac near retirement.

I had to pick "which Kubernetes to go with" first. Here's what my research turned up.

Option	Character	For this situation
Managed (EKS/GKE/AKS)	The cloud runs the control plane for you	Control-plane fee + node cost → conflicts with low cost / reusing idle gear, excluded
Vanilla Kubernetes (kubeadm)	Assemble upstream yourself	The most orthodox but heavy and hands-on → a burden for low-spec/small scale, excluded
k3s (Rancher/SUSE)	Single-binary lightweight distro	Lightweight distro — finalist
k0s · MicroK8s	Lightweight distros of a similar kind	Likewise lightweight distros — finalist
minikube · kind	For local dev/testing	Not meant for persistent multi-node operation → excluded

Filtering this way, the candidates narrowed to three lightweight distros: k3s, k0s, and MicroK8s. Digging deeper into the three:

Item	k3s (chosen)	k0s	MicroK8s
Maker	Rancher/SUSE	Mirantis	Canonical
Packaging	Single binary	Single binary	snap package (depends on snapd)
Default datastore	SQLite (kine); embedded etcd for HA	etcd standard (kine for other DBs too)	dqlite (distributed SQLite, Raft)
HA approach	Switches to etcd with multiple servers	Provided by default	Automatic HA at 3+ nodes
Control plane	server also runs workloads	Internal components as separate processes, control-plane isolation	Per node
Default CNI	flannel (lightweight, limited policy)	kube-router/calico	calico (HA variant)
Bundling	Essential components included (Traefik, ServiceLB, local-path…)	Minimal, easy to swap default components	Enable add-ons with `microk8s enable`

Why k3s.

All three are CNCF-compliant lightweight distros, but they differ in character.

k0s keeps the control plane separate from workloads, which is clean, but it ships with fewer things, so there's more to plug in yourself.

MicroK8s has the convenience of enabling add-ons with a single microk8s enable line, but in return it's tied to snap, and there are reported cases of dqlite CPU/consensus instability on write-heavy clusters. (GitHub Issue #3227)

k3s, on the other hand, has essential components bundled into a single binary, so the initial setup is the fastest, and the path of moving to embedded etcd with multiple servers fits naturally with this kind of "cloud + home HA." Add low-spec/ARM support and the depth of its docs and community, and for the goal of learning and low-cost operation at once, k3s fit best. (comparison sources: Palark · Portainer · nOps)

k3s repackages upstream Kubernetes as a single binary (under 100MB) while staying 100% compatible (CNCF certified). Its requirements are essentially just a modern kernel + cgroups, so it's no strain even on low-spec hardware. (What is K3s)

There are just three reasons it's light:

Single binary, single process. Components that run separately in regular Kubernetes — kube-apiserver, kube-scheduler, kube-controller-manager, kubelet, kube-proxy — are wrapped into one k3s process, with the containerd runtime built in. (Architecture)
Flexible datastore. A single server uses SQLite by default; for multiple servers, k3s uses embedded etcd, which is enabled by starting the first server with --cluster-init (external MySQL/Postgres are also possible). (Datastore)
Essential components included. flannel (CNI), CoreDNS, Traefik (Ingress), ServiceLB, local-path (storage), and metrics-server are brought up together at install time. That's that much less to assemble yourself.

As a bonus, k3s nodes come in two kinds — server (control plane + datastore) and agent (workload only) — which made it a good match for a hybrid setup like "cloud = server, home = agent." You'll see this in the diagrams from chapter 4 onward.

3. The control plane — three is the rule, but a two-node challenge

Originally I ran personal services in the cloud with Docker Compose. The small instance handled the DB , and the large instance handled several microservices. Moving these two to Kubernetes, my first worry was the control plane.

For Kubernetes to be stable, control-plane HA is the baseline. k3s's embedded etcd can't accept writes unless it keeps a majority (quorum), and the official HA guide recommends 3 or more servers (an odd number). With n servers the quorum is ⌊n/2⌋+1 (n/2 rounded down, plus one), and the server count minus the quorum is how many server failures you can tolerate.

servers	quorum	failures tolerated
1	1	0
2	2	0
3	2	1
4	3	1
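
The arithmetic behind this table can be reproduced with integer division. This is a minimal sketch in POSIX shell; the function names quorum and tolerated are illustrative only and are not part of k3s or etcd.

```shell
#!/bin/sh
# quorum N: votes needed for a majority among N etcd voting members.
# Shell arithmetic uses integer division, so N / 2 is already rounded down.
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated N: members that can fail while a majority still remains.
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  printf 'servers=%d quorum=%d tolerated=%d\n' "$n" "$(quorum "$n")" "$(tolerated "$n")"
done
```

Going from 3 to 4 servers raises the quorum without raising the number of tolerated failures, which is why odd server counts are recommended.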

The rule is three. But adding one more instance was tight on the wallet, so I changed the goal:

"I know three is the right answer, but for now let me run two as stably as possible."

In choosing two, I made two things clear.

First, don't pile everything on one node.

I once put the control plane and services all on a single node and got badly burned. Lightsail is a burstable CPU model: each plan has a per-vCPU baseline %, and when load stays above it for a while it spends the burst capacity it had accrued, dropping to baseline once it hits 0. With the control plane (apiserver, etcd) on the same node, the moment the CPU dries up, cluster control itself stops — so I split the load across two nodes.

node	plan	vCPU	baseline	role
server-A	8GB ($44/mo)	2	30%	cluster-init · control-plane+etcd+worker
server-B	16GB ($84/mo)	4	40%	join · control-plane+etcd+worker

Checking usage at the time of writing, both are below baseline (the sustainable zone), accruing burst (kubectl top nodes):

NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cp-8gb-init    482m         24%    4565Mi          58%
cp-16gb-join   1153m        28%    10096Mi         65%
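
The comparison against the baseline can be scripted rather than read by eye. The sketch below assumes the column order shown in the output above (node name first, CPU% third); below_baseline is a hypothetical helper, and the baseline percentage is passed in by hand from the plan table because kubectl does not know about Lightsail baselines.

```shell
#!/bin/sh
# below_baseline NODE BASELINE: read `kubectl top nodes --no-headers` style
# lines on stdin and report whether NODE's CPU% is under BASELINE percent.
below_baseline() {
  awk -v node="$1" -v base="$2" '
    $1 == node {
      pct = $3
      sub(/%/, "", pct)          # strip the trailing percent sign
      verdict = (pct + 0 < base + 0) ? "below" : "at-or-above"
      printf "%s cpu=%s%% baseline=%s%% %s\n", node, pct, base, verdict
    }'
}

# Usage against a live cluster (assumes kubectl and metrics-server are available):
#   kubectl top nodes --no-headers | below_baseline cp-8gb-init 30
#   kubectl top nodes --no-headers | below_baseline cp-16gb-join 40
```

A single reading only shows the current moment; burst capacity depends on how long usage stays above or below the baseline.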

Second, admit that two is not HA, and take out insurance.

As the table shows, with two nodes, losing even one loses quorum and writes stop (pods already running keep going under kubelet, so it's "no changes" rather than "total outage"). I cover that risk with etcd automatic snapshots. Since I gave no extra config, it runs with k3s defaults — 0 */12 * * * (twice a day), keep 5, stored at /var/lib/rancher/k3s/server/db/snapshots. (etcd-snapshot) Since they only pile up locally, pushing them to NAS/object storage later is a task I've left for the backup installment.
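
Since the snapshots exist only on the node's local disk, it helps to know which file is the newest before copying anything elsewhere. This is a minimal sketch, assuming the default snapshot directory quoted above; latest_snapshot is a hypothetical helper, not a k3s command, and it only reads the directory.

```shell
#!/bin/sh
# latest_snapshot [DIR]: print the path of the most recently modified file in
# DIR (default: the k3s snapshot directory). Prints nothing and fails if the
# directory is empty or unreadable.
latest_snapshot() {
  dir=${1:-/var/lib/rancher/k3s/server/db/snapshots}
  # ls -t sorts newest first; head keeps only the first entry.
  newest=$(ls -1t "$dir" 2>/dev/null | head -n 1)
  [ -n "$newest" ] && printf '%s/%s\n' "$dir" "$newest"
}

# Usage on a server node (the directory is normally readable by root only):
#   sudo sh -c '. ./latest_snapshot.sh; latest_snapshot'
```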

4. Today's star — Tailscale

The control plane is on Lightsail in Tokyo; the machine I'll use as a worker is the home iMac in Sapporo.

These two don't share a private network.

The home machine sits behind a router on a private IP (192.168.x), so it can't be reached directly from outside. Opening ports to expose it would put cluster ports like kubelet (10250) and VXLAN (8472) on the internet, which is dangerous. For k3s to bind nodes into one cluster, every node has to be able to reach every other node at one stable address, and the current setup doesn't provide that.

So I went looking for a method among VPNs and meshes.

Option	Character	For this situation
Direct port exposure + public IP	Expose as-is without a VPN	Effectively exposes kubelet/VXLAN to the internet → dangerous, dropped
raw WireGuard	Fast kernel VPN, manual keys/peers	Fast, but NAT traversal, key management, and access control are all manual
OpenVPN	Traditional hub-style VPN	Hub-centric rather than mesh, heavy to set up
ZeroTier	Managed mesh VPN	A solid candidate, similar in flavor
Tailscale	WireGuard + coordination (mesh)	Automatic NAT traversal, ACLs, MagicDNS, unattended keys, free for personal use ← chosen
Headscale	Self-hosted Tailscale control server	More freedom but the burden of self-operation → consider later

After a lot of trial and deliberation, I chose Tailscale. It's a WireGuard-based mesh VPN: install a daemon on each machine and log in, and the machine joins a private network (a tailnet) tied to your account and gets one address in the 100.x range. That address is reachable by the same value from anywhere, whether the machine is in Tokyo or behind a router in Sapporo, and Tailscale handles NAT traversal for you.

In other words, you can lay down a "virtual LAN" that puts the cloud and home on one plane. (Up to 100 machines can be registered for free.)

When k3s registers a node, it stamps the address given via --node-ip as that node's identity (InternalIP). So by setting this value to a Tailscale address from the start, a home node joining later lands on the same 100.x plane as-is. That's why I install Tailscale before k3s.

5. Tailscale: sign up · install · verify

The order is sign up → install → verify.

① Sign up. Log in at login.tailscale.com with an SSO account like Google, GitHub, or Microsoft, and a tailnet for that account is created automatically. There's no separate signup form; SSO is the signup.

② (For servers) Prepare an auth key. Cloud servers have no browser, so issue an auth key (tskey-…) in advance from the admin console under Settings → Keys. You can skip this if you'll connect interactively.

③ Install & connect. On each of the two cloud nodes (Amazon Linux 2023):

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up # authenticate via the printed URL (headless: --authkey tskey-…)
tailscale ip -4 # this node's 100.x address — used directly as --node-ip in ch.6

④ Verify. If both nodes appear in the admin console Machines page (login.tailscale.com/admin/machines) with their 100.x address and hostname, it worked.

You can also check from the node:

tailscale status # list of machines in the tailnet + each one's 100.x

With this, the two cloud nodes see each other by 100.x in one tailnet. Now I bring up k3s with these addresses. (Tailscale Linux install)
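
Before handing the address to k3s, it can be worth confirming that it really is a tailnet address and not a VPC or LAN one. Tailscale assigns IPv4 addresses from the 100.64.0.0/10 range, so a range check is enough. This is a minimal sketch; is_tailscale_ip is a hypothetical helper, not part of the tailscale CLI.

```shell
#!/bin/sh
# is_tailscale_ip ADDR: succeed only if ADDR is an IPv4 address inside 100.64.0.0/10.
# The body runs in a subshell, so the IFS change and the exit calls stay local to it.
is_tailscale_ip() (
  IFS=.
  set -f            # no filename globbing while splitting
  set -- $1         # split the dotted quad into positional parameters
  [ $# -eq 4 ] || exit 1
  for octet in "$@"; do
    case $octet in
      ''|*[!0-9]*) exit 1 ;;   # reject empty or non-numeric fields
    esac
    [ "$octet" -le 255 ] || exit 1
  done
  # 100.64.0.0/10 covers 100.64.0.0 through 100.127.255.255.
  [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ]
)

# Usage on a node that has already joined the tailnet:
#   ip=$(tailscale ip -4) && is_tailscale_ip "$ip" && echo "--node-ip $ip"
```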

6. Installing k3s (with Tailscale addresses)

Put the 100.x you got in chapter 5 straight into --node-ip.

server-A (8GB):

curl -sfL https://get.k3s.io | K3S_TOKEN=<shared-secret> INSTALL_K3S_VERSION=v1.34.3+k3s1 \
  sh -s - server \
    --cluster-init \
    --node-ip 100.71.x.x \
    --node-external-ip <publicA> \
    --advertise-address 100.71.x.x \
    --flannel-backend vxlan

--cluster-init — initializes embedded etcd as the first server. (server flags)
--node-ip 100.71.x.x — advertises the Tailscale address received in ch.5 as the InternalIP.
--node-external-ip / --advertise-address — public IP (for external exposure), apiserver advertise address (Tailscale).
--flannel-backend vxlan — CNI backend (the default, stated explicitly).

K3S_TOKEN can be a value you choose yourself, like a password, or left unset so that k3s generates one automatically. Either way you need the value to join other nodes, so save it somewhere or read it back on the first server from the path below.

/var/lib/rancher/k3s/server/node-token
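
If you choose the token yourself, a random value is safer than a memorable one. This is a minimal sketch, assuming /dev/urandom, head, od, and tr are available (true on typical Linux hosts); gen_token is a hypothetical helper name.

```shell
#!/bin/sh
# gen_token: print 32 random bytes as 64 lowercase hexadecimal characters.
gen_token() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Usage on the first server, before running the installer:
#   K3S_TOKEN=$(gen_token)
#   echo "$K3S_TOKEN"   # store this somewhere safe; joining nodes need it
```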

server-B (16GB) — joins as the second server. This node, too, joins the tailnet first, then just connects with the same token:

curl -sfL https://get.k3s.io | K3S_TOKEN=<shared-secret> INSTALL_K3S_VERSION=v1.34.3+k3s1 \
  sh -s - server \
    --server https://172.26.x.x:6443 \
    --node-ip 100.99.x.x

--server https://172.26.x.x:6443 = server-A's address (a private IP, since it's the same VPC).
--node-ip 100.99.x.x = this node's Tailscale address.

The two Lightsail boxes are in the same AWS VPC, so the join itself used the private IP, but the InternalIP advertised to the cluster is the Tailscale address (100.x) for both.

Firewall — open only the minimum externally. (requirements)

port	use	exposure
80 / 443	Traefik Ingress	all
22	SSH	my IP only
6443 / 2379-2380 / 8472 / 10250	apiserver·etcd·flannel·kubelet	closed publicly, private/Tailscale internal only

7. Cluster setup — complete with two nodes

Attaching the home iMac as an agent is covered in the next article.

For now I've built the cluster with two Lightsail boxes, Tailscale applied. Listing the nodes, you can confirm both are Ready on the same version and runtime.

kubectl get nodes -o wide

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
…3-146(8GB) Ready control-plane,etcd 139d v1.34.3+k3s1 100.71.x.x 52.x.x.x Amazon Linux 2023.7.20250512 6.1.134-…amzn2023.x86_64 containerd://2.1.5-k3s1
…2-70(16GB) Ready control-plane,etcd 139d v1.34.3+k3s1 100.99.x.x 3.x.x.x Amazon Linux 2023.9.20251105 6.1.156-…amzn2023.x86_64 containerd://2.1.5-k3s1

Check whether the two nodes are etcd voting members (look at Conditions in kubectl describe node <name>):

Conditions:
  Type Status Reason Message
  ---- ------ ------ -------
  EtcdIsVoter True MemberNotLearner Node is a voting member of the etcd cluster
  MemoryPressure False KubeletHasSufficientMemory kubelet has sufficient memory available
  DiskPressure False KubeletHasNoDiskPressure kubelet has no disk pressure
  PIDPressure False KubeletHasSufficientPID kubelet has sufficient PID available
  Ready True KubeletReady kubelet is posting ready status
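
The same check can be scripted by filtering the describe output for the EtcdIsVoter row. This is a minimal sketch, assuming the Conditions layout shown above (condition type in the first column, status in the second); is_etcd_voter is a hypothetical helper.

```shell
#!/bin/sh
# is_etcd_voter: read `kubectl describe node` output on stdin and succeed only
# if the EtcdIsVoter condition is present with status True.
is_etcd_voter() {
  awk '$1 == "EtcdIsVoter" { found = 1; voter = ($2 == "True") }
       END { exit !(found && voter) }'
}

# Usage against a live cluster:
#   kubectl describe node <name> | is_etcd_voter && echo "voting member"
```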

Check that the k3s default bundle came up too (kubectl get pods -n kube-system):

# kubectl get pods -n kube-system → k3s default bundle only (excerpt)
coredns-7f496c8d7d-nx9jc 1/1 Running 139d # DNS
local-path-provisioner-578895bd58-mgxpm 1/1 Running 139d # local storage (default SC)
metrics-server-7b9c9c4b9c-76ldg 1/1 Running 139d # metrics (kubectl top)
traefik-78df465dcc-66kn8 1/1 Running 9d # Ingress (server-A)
traefik-78df465dcc-gs4q7 1/1 Running 8d # Ingress (server-B) → one per node = 2 replicas
helm-install-traefik-crd-pmk4t 0/1 Completed 139d # Helm Job that installed the bundle (completed)
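
For a quick health pass over the bundle, the listing can be filtered down to pods that are neither Running nor Completed. This is a minimal sketch, assuming the pod name is in the first column and STATUS in the third, as in kubectl's default output; not_ready_pods is a hypothetical helper.

```shell
#!/bin/sh
# not_ready_pods: read `kubectl get pods --no-headers` lines on stdin and print
# the names of pods whose STATUS is neither Running nor Completed.
not_ready_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage against a live cluster (empty output means nothing needs attention):
#   kubectl get pods -n kube-system --no-headers | not_ready_pods
```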

That concludes setting up two cloud instances as a k3s cluster. Beyond installing k3s, I also configured Tailscale so that any machine able to run k3s can later join as an agent, regardless of where it is or what form it takes.

8. Next

The AWS Lightsail nodes are now formed into a cluster, and the groundwork for nodes to join is all set.

In the end it came down to one command per node, but this stage took more time than I expected.

To this two-node cluster, I'll now bring in the iMac resting at home, in earnest. I'll install Lima VMs on the iMac, create an agent on each, join them to the same tailnet, and write up the problems I ran into after joining — solving them along the way.

References

k3s — What is K3s / Architecture / Datastore: https://docs.k3s.io/ · /architecture · /datastore
k3s — HA Embedded etcd / Server flags / etcd-snapshot / Requirements: https://docs.k3s.io/datastore/ha-embedded · /cli/server · /cli/etcd-snapshot · /installation/requirements
Lightweight distro comparison (k3s·k0s·MicroK8s): https://palark.com/blog/small-local-kubernetes-comparison/ · https://www.portainer.io/blog/k0s-vs-k3s · https://www.nops.io/blog/k0s-vs-k3s-vs-k8s/
Tailscale — Linux install: https://tailscale.com/kb/1031/install-linux
AWS Lightsail — burst CPU / baseline: https://docs.aws.amazon.com/lightsail/latest/userguide/baseline-cpu-performance.html
