If you've ever tried to reuse code from a government repository, you know the pain. You find a promising project on some agency's Git instance, clone it, and then spend two hours figuring out that it depends on three internal services, has no documentation beyond a README that says "see confluence," and the license file is either missing or references a policy document behind a VPN.
The Netherlands recently soft-launched a national platform for government open-source code, and it highlights a problem I've been banging my head against for years: publishing code is easy, but publishing reusable code requires actual engineering discipline.
Let me walk you through the practical steps to make government (or any institutional) open-source code genuinely usable.
## The Root Cause: Code Without Context Is Just Text
The core issue isn't that governments don't publish code. Many do. The problem is threefold:
- **Missing metadata** — no standardized way to describe what the software does, who maintains it, or what it's compatible with
- **Implicit dependencies** — code that assumes access to internal APIs, databases, or auth systems
- **No contribution pathway** — even if you find a bug, there's no clear way to report or fix it
The Dutch initiative and similar European efforts lean on a standard called publiccode.yml to solve the first problem. But the other two? Those require actual code changes.
## Step 1: Add a `publiccode.yml` Descriptor
The publiccode.yml standard is a metadata file that sits in your repo root and describes the software in a machine-readable way. Think of it like package.json but for institutional software discovery.
Here's a minimal example:
```yaml
publiccodeYmlVersion: "0.4"
name: permit-tracker
url: "https://github.com/example-agency/permit-tracker"
releaseDate: "2026-03-15"
softwareVersion: "2.1.0"
developmentStatus: stable
softwareType: standalone/web
platforms:
  - web

# Categories help other agencies find your tool
categories:
  - document-management
  - workflow-management

description:
  en:
    shortDescription: "Tracks building permit applications through review stages"
    longDescription: |
      A web application for tracking building permit applications
      from submission through final approval. Supports multiple
      reviewer roles and automated status notifications.
    features:
      - "Multi-stage approval workflow"
      - "Email notifications on status change"
      - "REST API for integration"

legal:
  license: EUPL-1.2

maintenance:
  type: internal
  contacts:
    - name: "Infrastructure Team"
      email: "infra@example-agency.gov"
```
The key fields people skip: `developmentStatus` (so you know if it's actually production-ready), `maintenance.type` (so you know if anyone is still home), and `categories` (so discovery platforms can actually index it).
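If you want a quick sanity check before a discovery platform chokes on your file, a few lines of Python can flag missing keys. The `missing_fields` helper below is purely an illustrative sketch (not part of any official publiccode tooling); the field names themselves come from the standard:

```python
# Hypothetical pre-flight check: flag publiccode.yml top-level fields
# that discovery platforms and reusers rely on. Assumes the YAML has
# already been parsed into a dict (e.g. with PyYAML).

RECOMMENDED_FIELDS = [
    "publiccodeYmlVersion",
    "name",
    "url",
    "developmentStatus",  # is it actually production-ready?
    "softwareType",
    "description",
    "legal",              # license information
    "maintenance",        # is anyone still home?
]

def missing_fields(metadata: dict) -> list[str]:
    """Return recommended top-level keys absent from the parsed metadata."""
    return [f for f in RECOMMENDED_FIELDS if f not in metadata]

# A repo that published only the bare minimum:
sparse = {
    "publiccodeYmlVersion": "0.4",
    "name": "permit-tracker",
    "url": "https://example.org/permit-tracker",
}
print(missing_fields(sparse))
# -> ['developmentStatus', 'softwareType', 'description', 'legal', 'maintenance']
```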
## Step 2: Externalize Your Configuration
This is where most government repos fall apart. The code works perfectly on the agency's infrastructure and nowhere else.
The fix is boring but necessary: every environment-specific value must come from configuration, not code.
```python
import os
from dataclasses import dataclass, field


@dataclass
class AppConfig:
    # Database - no hardcoded internal hostnames
    db_host: str = field(
        default_factory=lambda: os.environ.get("DB_HOST", "localhost")
    )
    db_port: int = field(
        default_factory=lambda: int(os.environ.get("DB_PORT", "5432"))
    )
    db_name: str = field(
        default_factory=lambda: os.environ.get("DB_NAME", "permits")
    )

    # Auth - support multiple providers, not just the internal one
    auth_provider: str = field(
        default_factory=lambda: os.environ.get("AUTH_PROVIDER", "oidc")
    )
    auth_issuer_url: str = field(
        default_factory=lambda: os.environ.get("AUTH_ISSUER_URL", "")
    )

    # Feature flags for agency-specific functionality
    enable_legacy_api: bool = field(
        default_factory=lambda: os.environ.get(
            "ENABLE_LEGACY_API", "false"
        ).lower() == "true"
    )
```
The pattern is simple: sensible defaults for local development, environment variables for everything else. If someone can clone your repo, set five env vars, and have a running instance — you've won.
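One refinement worth adding on top of that pattern: fail fast at startup when a value with no safe default is missing, instead of failing at first request. A minimal sketch (the `REQUIRED_IN_PRODUCTION` list and `APP_ENV` convention are assumptions for illustration, not part of the config class above):

```python
import os

# Variables that have no sensible default outside local development.
# Which names belong here depends on your app; AUTH_ISSUER_URL is just
# the obvious candidate from the config example above.
REQUIRED_IN_PRODUCTION = ["AUTH_ISSUER_URL"]

def validate_environment(env: dict) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_IN_PRODUCTION if not env.get(name)]

# At startup: refuse to boot a misconfigured production instance.
missing = validate_environment(os.environ)
if missing and os.environ.get("APP_ENV") == "production":
    raise SystemExit(f"Missing required configuration: {', '.join(missing)}")
```

A clone-and-run user gets an immediate, named error instead of a cryptic auth failure twenty minutes into exploring the app.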
Provide an `.env.example` file (never `.env` itself) that documents every variable:

```bash
# .env.example — copy to .env and fill in your values
DB_HOST=localhost
DB_PORT=5432
DB_NAME=permits

# OIDC provider configuration
# For local dev, you can use a mock provider like oauth2-proxy
AUTH_PROVIDER=oidc
AUTH_ISSUER_URL=https://your-identity-provider/.well-known/openid-configuration

# Set to true only if you need backward compat with the v1 API
ENABLE_LEGACY_API=false
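In practice you'd load that file with something like python-dotenv, but it's worth seeing how little magic is involved. A stdlib-only sketch (illustrative, deliberately ignoring quoting and interpolation rules that real loaders handle):

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments, blank lines.

    A sketch only — real projects should use python-dotenv or their
    framework's equivalent. Existing environment variables win, so
    deployment config always overrides the local file.
    """
    try:
        with open(path) as fh:
            lines = fh.readlines()
    except FileNotFoundError:
        return  # no .env is fine; defaults and real env vars take over
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```

The `setdefault` is the important design choice: a `.env` file should never be able to override what the deployment environment explicitly set.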
## Step 3: Add a Docker Compose for Local Dev
Don't make people install PostgreSQL 15 with specific extensions and a Redis instance manually. Give them a one-command setup:
```yaml
# docker-compose.dev.yml
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: permits
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
    ports:
      - "5432:5432"
    volumes:
      - ./migrations/init.sql:/docker-entrypoint-initdb.d/init.sql

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  # Mock OIDC provider so auth works out of the box
  mock-auth:
    image: ghcr.io/navikt/mock-oauth2-server:2.1.10
    ports:
      - "8080:8080"
    environment:
      SERVER_PORT: 8080
```
The mock-auth service is the critical piece. If your app requires authentication (and government apps always do), provide a mock identity provider that works locally without any external accounts.
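Supporting a mock provider is easiest when the app selects its token verifier through one seam instead of hardwiring the agency's IdP. A sketch of that seam — the function names here are illustrative stand-ins, not a real library API:

```python
import os
from typing import Callable

def verify_oidc_token(token: str) -> bool:
    # In a real app: fetch the issuer's JWKS via AUTH_ISSUER_URL and
    # verify the token signature with a proper OIDC/JWT library.
    raise NotImplementedError("wire up a real OIDC library here")

def verify_mock_token(token: str) -> bool:
    # Local dev against the mock provider: accept any non-empty token.
    # Never ship a build where this path can be selected in production.
    return bool(token)

def get_verifier(provider: str = "") -> Callable[[str], bool]:
    """Pick a verifier from AUTH_PROVIDER — the same env var the
    AppConfig example reads — so swapping IdPs needs no code change."""
    provider = provider or os.environ.get("AUTH_PROVIDER", "oidc")
    return {"oidc": verify_oidc_token, "mock": verify_mock_token}[provider]
```

With that seam in place, `AUTH_PROVIDER=mock` in `.env` plus the compose file above gives a fully working local login flow with zero external accounts.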
## Step 4: Document the Architecture, Not Just the API
A common mistake: the README explains how to call the API but never explains why the system is shaped the way it is. For government software, the "why" matters enormously because design decisions are often driven by regulations, not technical preference.
Add an ARCHITECTURE.md with at minimum:
- **System context** — what external systems does this talk to and why
- **Data flow** — where does citizen/user data enter, get processed, and get stored
- **Regulatory constraints** — which design decisions were forced by policy (e.g., "data must not leave this jurisdiction," "audit logs must be immutable for 7 years")
- **Extension points** — where another agency would plug in their own implementation
This isn't busywork. I've seen teams spend weeks reverse-engineering why a system has a seemingly bizarre architecture, only to discover it was mandated by a compliance requirement. Save everyone the archaeology.
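As a starting point, a skeleton covering those four sections might look like this (the headings and comments are suggestions, not any formal standard):

```markdown
# Architecture

## System context
<!-- What external systems does this talk to, and why?
     e.g. national address registry (read-only), payment gateway -->

## Data flow
<!-- Where does citizen/user data enter, get processed, and get stored?
     Note retention periods and which stores hold personal data. -->

## Regulatory constraints
<!-- Which design decisions were forced by policy?
     e.g. "data must not leave this jurisdiction",
          "audit logs must be immutable for 7 years" -->

## Extension points
<!-- Where would another agency plug in their own implementation?
     e.g. auth provider, notification channel, document storage -->
```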
## Step 5: Automate Compliance Checks in CI
If you're publishing code under a specific license, verify it automatically. Tools like reuse from the FSFE can check that every file has proper license headers:
```yaml
# .github/workflows/compliance.yml
name: Compliance
on: [push, pull_request]

jobs:
  reuse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: REUSE Compliance Check
        uses: fsfe/reuse-action@v4

  publiccode:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate publiccode.yml
        uses: italia/publiccode-parser-action@v2
```
This catches the most common issue: someone adds a new file, forgets the license header, and suddenly the legal status of the entire repo is ambiguous.
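For reference, what the REUSE check actually wants is an SPDX header in every file. A compliant source file looks like this (the copyright holder and the small function are illustrative):

```python
# SPDX-FileCopyrightText: 2026 Example Agency <infra@example-agency.gov>
# SPDX-License-Identifier: EUPL-1.2
#
# The two SPDX lines above are what the REUSE tool checks for in every
# file — author/owner first, then the license identifier.

def permit_status_label(stage: int) -> str:
    """Tiny illustrative function so the example isn't header-only."""
    return {0: "submitted", 1: "in review", 2: "approved"}.get(stage, "unknown")
```

For files where comments are impossible (images, JSON), the REUSE spec lets you put the same information in an adjacent `.license` file instead.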
## Prevention: Build Reusability From Day One
Here's what I tell teams starting new government projects:
- **Start with the `publiccode.yml` before writing code.** It forces you to think about categorization and maintenance commitments early.
- **Run the app outside your network within the first sprint.** If it doesn't work on a developer's laptop with just Docker, fix it now — it only gets harder.
- **Treat configuration as a public API.** Changing env var names is a breaking change for everyone who deployed your software.
- **Set up the compliance CI pipeline in the first PR.** Retrofitting license headers across 500 files is miserable.
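When you do have to rename an env var, treat it like deprecating an API: keep reading the old name for a release or two and warn. A sketch, using a hypothetical `DATABASE_HOST` to `DB_HOST` rename:

```python
import warnings

def read_db_host(env: dict) -> str:
    """Read the DB host, honoring a deprecated legacy variable name.

    DATABASE_HOST -> DB_HOST is a made-up rename for illustration;
    the pattern is what matters: new name wins, old name still works
    but warns, and the local-dev default stays last.
    """
    if "DB_HOST" in env:
        return env["DB_HOST"]
    if "DATABASE_HOST" in env:  # legacy name, remove after a deprecation window
        warnings.warn(
            "DATABASE_HOST is deprecated; use DB_HOST", DeprecationWarning
        )
        return env["DATABASE_HOST"]
    return "localhost"
```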
The Dutch platform and similar initiatives across Europe are pushing in the right direction — centralizing discovery so agencies stop rebuilding the same permit tracker for the 47th time. But discovery only works if the code on the other end is actually usable.
The technical bar isn't high. It's just publiccode.yml, externalized config, a Docker Compose, architecture docs, and CI checks. Five things. If every government repo had these five things, we'd save an absurd amount of duplicated effort.
Now if only we could get them to stop using Java 8.