FHIR is HL7’s standard for healthcare data exchange. It powers EHRs, patient apps, and integrations between providers. The current versions are R4 (2019) and R5 (2023).
FHIR defines resources (Patient, Observation, Encounter, MedicationRequest, etc.) accessible via REST in JSON or XML. On top of this, you can build both patient-facing applications and system-to-system integrations.
Why This Matters
Patient-facing APIs let patients and trusted apps fetch diagnoses, prescriptions, labs, imaging, and discharge notes. In the US, this is required under the ONC Cures Act. Developers typically rely on SMART on FHIR (OAuth2, scopes like patient/*.read, PKCE, refresh tokens).
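Here is a minimal sketch of what that flow can look like in Go with golang.org/x/oauth2. The endpoints, client ID, and redirect URL are placeholders, and a real SMART launch also involves parameters (aud, launch) that I'm skipping here:

package smartauth

import (
    "context"

    "golang.org/x/oauth2"
)

// Placeholder endpoints and client ID; real values come from the provider's
// SMART configuration (.well-known/smart-configuration).
var conf = &oauth2.Config{
    ClientID:    "my-client-id",
    RedirectURL: "https://example.com/callback",
    Scopes:      []string{"openid", "fhirUser", "offline_access", "patient/*.read"},
    Endpoint: oauth2.Endpoint{
        AuthURL:  "https://provider.example.com/oauth2/authorize",
        TokenURL: "https://provider.example.com/oauth2/token",
    },
}

// AuthURL builds the authorization URL with a PKCE S256 challenge.
// The returned verifier must be kept for the code exchange.
func AuthURL(state string) (url, verifier string) {
    verifier = oauth2.GenerateVerifier()
    url = conf.AuthCodeURL(state, oauth2.S256ChallengeOption(verifier))
    return url, verifier
}

// Exchange trades the authorization code for tokens; the refresh token can be
// stored and reused later for offline syncs.
func Exchange(ctx context.Context, code, verifier string) (*oauth2.Token, error) {
    return conf.Exchange(ctx, code, oauth2.VerifierOption(verifier))
}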
In practice, “FHIR R4 support” varies widely. Providers differ in:
- request-per-minute limits,
- supported resource types,
- API quirks (e.g. Epic rejects Observation searches that don't include a category parameter).
Some resources can't even be queried directly by patient ID: you have to traverse references inside other resources, which leads to unpredictable crawling across the resource graph.
What I Built
To handle this, I built a Go project, go-fhir-storage (https://github.com/RomanGolovanov/go-fhir-storage), that does four things:
- Export — traverse the provider tree, download all resources, respect API limits.
- Process — normalize, deduplicate, fix cycles and serialization issues.
- Store — account → patient → snapshot model for offline access and consistency.
- Deliver — serve resources back quickly in FHIR-compatible format.
Problems to Solve
Patient apps
- Rate limits mean a large patient dataset takes tens of seconds to load.
- Some resources require iterative traversal via links.
- Heavy patients can take minutes to fully load.
Organizations
- Clinics and insurers sometimes need all Observations, DiagnosticReports, and Encounters at once, and they need them quickly to produce results.
- Direct API calls to Epic or Cerner are slow, brittle, and dependent on provider availability.
The Approach
I introduced an intermediate layer:
- fetch once from the provider, store locally;
- patient apps work against local data, not provider APIs;
- organizations get consistent snapshots.
This delivers fast UX for patients and reliable access for organizations.
Data Model: Account → Patient → Snapshot
Each account contains patients. Each patient has multiple snapshots: immutable slices created during sync.
Why snapshots?
- fast full-data display;
- no repeated existence checks during sync;
- consistent, duplicate-free exports;
- independence from provider uptime;
- reuse of immutable resources (e.g. old Observations, Binaries) to speed re-syncs.
Raw provider data is never exposed. Clients only see finalized snapshots.
Storage Schema
CREATE TABLE accounts (
    id SERIAL PRIMARY KEY,
    public_id UUID UNIQUE NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
);

CREATE TABLE providers (
    id SERIAL PRIMARY KEY,
    public_id UUID UNIQUE NOT NULL,
    fhir_api_url TEXT NOT NULL,
    oauth_auth_url TEXT NOT NULL,
    oauth_token_url TEXT NOT NULL,
    oauth_client_id TEXT NOT NULL,
    oauth_client_secret TEXT NOT NULL,
    provider_type TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
);

CREATE TABLE patients (
    id SERIAL PRIMARY KEY,
    public_id UUID UNIQUE NOT NULL,
    account_id INTEGER NOT NULL,
    provider_id INTEGER NOT NULL,
    provider_patient_id TEXT NOT NULL,
    name TEXT,
    created_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL,
    CONSTRAINT fk_patients_accounts FOREIGN KEY (account_id) REFERENCES accounts (id) ON DELETE CASCADE,
    CONSTRAINT fk_patients_providers FOREIGN KEY (provider_id) REFERENCES providers (id) ON DELETE CASCADE
);

CREATE TABLE snapshots (
    id SERIAL PRIMARY KEY,
    public_id UUID UNIQUE NOT NULL,
    patient_id INTEGER NOT NULL,
    status TEXT NOT NULL,
    status_description TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    CONSTRAINT fk_snapshots_patients FOREIGN KEY (patient_id) REFERENCES patients (id) ON DELETE CASCADE
);

CREATE TABLE fhir_resources (
    id SERIAL PRIMARY KEY,
    snapshot_id INT NOT NULL,
    resource_id VARCHAR(128) NOT NULL,
    resource_type VARCHAR(128) NOT NULL,
    resource_data JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    CONSTRAINT fk_fhir_resources_snapshots FOREIGN KEY (snapshot_id) REFERENCES snapshots (id) ON DELETE CASCADE
);
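To make the account → patient → snapshot model concrete, here is roughly what a read-side query could look like: every resource from a patient's most recent finished snapshot. This is an illustrative sketch, not a query from the project, and the 'completed' status value is an assumption.

-- Illustrative only: the status value 'completed' is assumed, not taken from the project.
SELECT r.resource_type, r.resource_data
FROM fhir_resources r
WHERE r.snapshot_id = (
    SELECT s.id
    FROM snapshots s
    WHERE s.patient_id = $1 AND s.status = 'completed'
    ORDER BY s.created_at DESC
    LIMIT 1
);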
Downloading from Providers
Each provider needs its own strategy:
- Epic — Observations must be queried per category (vital-signs, labs, etc.).
- Cerner — slower; needs 20 concurrent workers. Access tokens last only 10 minutes, so a refresh may be needed mid-sync (a refresh sketch follows after the Epic example).
- VA Lighthouse — closer to spec, but still requires extra calls.
Epic downloader configuration example:
// NewEpicDownloader creates a new instance of EpicDownloader
func NewEpicDownloader(repo *repository.FhirRepository) *EpicDownloader {
    return &EpicDownloader{
        repo: repo,
        observationCategories: []string{
            "vital-signs",
            "imaging",
            "laboratory",
            "social-history",
            "functional-mental-status",
            "core-characteristics",
            "genomics",
            "labor-delivery",
            "lda",
            "newborn-delivery",
            "obstetrics-gynecology",
            "periodontal",
            "smartdata",
        },
        medicationRequestCategories: []string{
            "inpatient",
            "outpatient",
            "community",
            "discharge",
        },
        diagnosticReportCategories: []string{
            "cardiology",
            "radiology",
            "pathology",
            "genetics",
            "laboratory",
            "microbiology",
            "toxicology",
            "cytology",
            "hearing",
            "neurology",
        },
    }
}
// DownloadSnapshot downloads FHIR resources for the given patient and stores them in the database
func (c *EpicDownloader) DownloadSnapshot(ctx context.Context, provider *model.Provider, patient *model.Patient, snapshot *model.Snapshot, refreshToken string) error {
    accessToken, err := getAccessToken(ctx, provider, refreshToken)
    if err != nil {
        return fmt.Errorf("failed to get token: %w", err)
    }

    downloader := NewResourceDownloader(provider, patient, snapshot, accessToken, c.repo, 5)

    downloader.AddResourceByIdLoader(model.ResourceTypePatient, patient.ProviderPatientID)
    downloader.AddResourceBundleByCategoriesLoader(model.ResourceTypeObservation, c.observationCategories)
    downloader.AddResourceBundleByCategoriesLoader(model.ResourceTypeMedicationRequest, c.medicationRequestCategories)
    downloader.AddResourceBundleByCategoriesLoader(model.ResourceTypeDiagnosticReport, c.diagnosticReportCategories)
    downloader.AddResourceBundleLoader(model.ResourceTypeAllergyIntolerance)
    downloader.AddResourceBundleLoader(model.ResourceTypeAppointment)
    downloader.AddResourceBundleLoader(model.ResourceTypeCondition)
    downloader.AddResourceBundleLoader(model.ResourceTypeDeviceRequest)
    downloader.AddResourceBundleLoader(model.ResourceTypeDevice)
    downloader.AddResourceBundleLoader(model.ResourceTypeDocumentReference)
    downloader.AddResourceBundleLoader(model.ResourceTypeEncounter)
    downloader.AddResourceBundleLoader(model.ResourceTypeImmunization)
    downloader.AddResourceBundleLoader(model.ResourceTypeProcedure)

    downloader.Run(ctx)

    return ctx.Err()
}
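The Cerner point above (10-minute access tokens, refreshes possibly needed mid-sync) deserves a quick illustration. The project handles tokens through its own getAccessToken helper; the sketch below only shows how an auto-refreshing token source could look with golang.org/x/oauth2, with placeholder client and endpoint values (a confidential client would also set ClientSecret):

package fhirsync

import (
    "context"

    "golang.org/x/oauth2"
)

// tokenSource returns a TokenSource that transparently refreshes the access
// token whenever it expires, so a long-running sync never uses a stale token.
func tokenSource(ctx context.Context, refreshToken string) oauth2.TokenSource {
    conf := &oauth2.Config{
        ClientID: "my-client-id",
        Endpoint: oauth2.Endpoint{
            TokenURL: "https://provider.example.com/oauth2/token",
        },
    }
    return conf.TokenSource(ctx, &oauth2.Token{RefreshToken: refreshToken})
}

// bearer asks the source for a current token before a request (or batch of
// requests); the refresh happens behind the scenes only when needed.
func bearer(ts oauth2.TokenSource) (string, error) {
    tok, err := ts.Token()
    if err != nil {
        return "", err
    }
    return "Bearer " + tok.AccessToken, nil
}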
Reference Traversal
Some resources only appear via nested Reference fields. To handle this, I use a FHIRReferenceVisitor: a manual traverser with type dispatch and a callback. It's not a strict GoF Visitor, but it solves the problem cleanly.
visitor := model.NewFHIRReferenceVisitor(func(ref *fhir.Reference, stack *types.ListStack[any], names *types.ListStack[string]) {
    fmt.Printf("found reference: %s (path: %v)\n", ref.Reference, names.ToSlice())
})

// start traversal of the resource
visitor.Visit(observation)
Two stacks preserve context: one for the object chain, one for field names.
This design gives:
- explicit control over traversed fields,
- simple extension for new resource types,
- flexible callbacks (logging, graph building, etc.).
Downside: verbose code and manual updates when FHIR evolves. For this use case, the tradeoff is fine.
func (v *FHIRReferenceVisitor) visitObservation(obj *fhir.Observation) {
    v.visitWithName("Observation", func() {
        visitSlice(v, obj.BasedOn, "BasedOn")
        visitSlice(v, obj.PartOf, "PartOf")
        visitObject(v, obj.Subject, "Subject")
        visitObject(v, obj.Encounter, "Encounter")
        visitObject(v, obj.Device, "Device")
        visitSlice(v, obj.HasMember, "HasMember")
        visitSlice(v, obj.DerivedFrom, "DerivedFrom")
    })
}
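For readers curious what those helpers do, here is a simplified, self-contained sketch of the two-stack mechanism. It uses plain slices instead of the project's types.ListStack and only a tiny Reference/Observation subset; the real FHIRReferenceVisitor is more complete.

package main

import "fmt"

// Reference is a stripped-down stand-in for fhir.Reference.
type Reference struct{ Reference string }

// Visitor walks resources and fires a callback for every Reference it finds,
// keeping two stacks: the chain of parent objects and the chain of field names.
type Visitor struct {
    objects []any
    names   []string
    onRef   func(ref *Reference, parents []any, path []string)
}

// visitWithName pushes the current object and field name for the duration of
// fn, so the callback can see the full path down to each reference.
func (v *Visitor) visitWithName(obj any, name string, fn func()) {
    v.objects = append(v.objects, obj)
    v.names = append(v.names, name)
    fn()
    v.objects = v.objects[:len(v.objects)-1]
    v.names = v.names[:len(v.names)-1]
}

func (v *Visitor) visitReference(ref *Reference, name string) {
    if ref == nil {
        return
    }
    v.visitWithName(ref, name, func() {
        // Hand the callback copies of both stacks.
        v.onRef(ref, append([]any(nil), v.objects...), append([]string(nil), v.names...))
    })
}

// Observation is a minimal example resource with a couple of reference fields.
type Observation struct {
    Subject   *Reference
    HasMember []Reference
}

func (v *Visitor) visitObservation(obs *Observation) {
    v.visitWithName(obs, "Observation", func() {
        v.visitReference(obs.Subject, "Subject")
        for i := range obs.HasMember {
            v.visitReference(&obs.HasMember[i], "HasMember")
        }
    })
}

func main() {
    v := &Visitor{onRef: func(ref *Reference, parents []any, path []string) {
        fmt.Printf("found reference: %s (path: %v, depth: %d)\n", ref.Reference, path, len(parents))
    }}
    v.visitObservation(&Observation{
        Subject:   &Reference{Reference: "Patient/123"},
        HasMember: []Reference{{Reference: "Observation/456"}},
    })
}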
Concurrency Control
To respect provider limits, requests run under a semaphore combined with a WaitGroup.
// executeLoaders executes all resource loaders
func (s *state) executeLoaders(ctx context.Context, ch chan<- *model.FHIRResourceRaw, loaders []loader, maxDegreeOfParallelism int) {
    var wgDownloads sync.WaitGroup

    // Semaphore to limit concurrency
    sem := make(chan struct{}, maxDegreeOfParallelism)

    for _, l := range loaders {
        wgDownloads.Add(1)
        go func(loader loader) {
            logger := logs.LoggerFromContext(ctx)
            defer wgDownloads.Done()

            // Acquire semaphore
            sem <- struct{}{}
            defer func() { <-sem }() // Release semaphore

            select {
            case <-ctx.Done():
                return
            default:
                if err := loader(ctx, ch); err != nil {
                    logger.Error("Error loading resource", "error", err)
                }
            }
        }(l)
    }

    wgDownloads.Wait()
}
Parallelism is tuned per provider (Cerner: 20, Epic/VA: 5).
Data Delivery and GZIP
Once data is stored in snapshots, the API can:
- return the latest completed snapshot,
- deliver specific resources or a full Bundle (sketched below).
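As a rough sketch of that Bundle path (not the project's real handler; the endpoint shape, the snapshotID parameter, and the JSON wrapping are my assumptions, though the table and column names match the schema above), the server mostly just wraps the stored JSONB documents in a collection Bundle:

package api

import (
    "database/sql"
    "encoding/json"
    "net/http"
)

// bundleEntry and bundle mirror just enough of the FHIR Bundle structure
// to wrap stored resources into a "collection" Bundle.
type bundleEntry struct {
    Resource json.RawMessage `json:"resource"`
}

type bundle struct {
    ResourceType string        `json:"resourceType"`
    Type         string        `json:"type"`
    Entry        []bundleEntry `json:"entry"`
}

// snapshotBundle serves every resource of one snapshot (e.g. the latest
// completed one, resolved with a query like the earlier SQL sketch) as a Bundle.
func snapshotBundle(db *sql.DB, snapshotID int64, w http.ResponseWriter, r *http.Request) {
    rows, err := db.QueryContext(r.Context(),
        `SELECT resource_data FROM fhir_resources WHERE snapshot_id = $1`, snapshotID)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer rows.Close()

    b := bundle{ResourceType: "Bundle", Type: "collection"}
    for rows.Next() {
        var data []byte
        if err := rows.Scan(&data); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        b.Entry = append(b.Entry, bundleEntry{Resource: data})
    }
    if err := rows.Err(); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "application/fhir+json")
    _ = json.NewEncoder(w).Encode(b)
}

Because resource_data already holds the original FHIR JSON, nothing needs to be re-serialized; json.RawMessage passes the stored documents through untouched.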
Since datasets are large and highly linked, bulk download makes sense. With GZIP, performance improves dramatically.
Example with ~9000 Observations:
- Without GZIP: ~15s, ~5 MB.
- With GZIP: ~1s, ~500 KB.
For clients, this is the difference between “instant load” and “wait half a minute.”
And it’s as easy as pie to set up compression:
// Create the server
server := &http.Server{
    Addr:    address + ":" + port,
    Handler: handlers.CompressHandler(r),
}
Results
I had prior experience with a full-featured FHIR service: microservices, PostgreSQL, JVM-based pods in Kubernetes. It worked but was heavy:
- serving large patients took up to a minute,
- full re-syncs could take 30 minutes,
- infrastructure was costly and resource-hungry.
With go-fhir-storage:
- re-syncs dropped to 1–2 minutes for heavy patients,
- data delivery is sub-second from snapshots,
- the whole service runs as a lightweight Go monolith,
- client bundle sizes stay in the hundreds of KB.
Takeaways
For FHIR, you don’t always need a complex microservice stack.
For the export → process → deliver flow, a focused Go monolith is faster, lighter, and easier to operate.
Snapshots, controlled crawling, tuned concurrency, and GZIP turn a slow provider API into a usable patient and organization experience.