Scott Coristine

Posted on Mar 25 • Originally published at signaturecare.ca

Building a Care Coordination System: What Developers Can Learn from Home Care Service Architecture

#healthcare #webdev

How the operational logic behind home care services maps surprisingly well to distributed systems design

When I started thinking about how home care agencies coordinate caregivers, clients, and schedules, I realized the underlying architecture is essentially a real-world implementation of several patterns we deal with in software every day: service discovery, state machines, event-driven workflows, and fault-tolerant scheduling.

This post breaks down the operational model behind home care coordination — using Signature Care's Montreal-based service model as a reference — and maps it to patterns you'll recognize from distributed systems.

Whether you're building scheduling software, a care platform, or just want a concrete mental model for complex coordination problems, there's a lot here to unpack.

The Core Problem: Coordinating Stateful, Human-Centric Services

Home care isn't a stateless API call. Every client has:

A dynamic need profile that changes over time
Availability constraints (medical appointments, family visits)
Caregiver preferences (language, personality fit, specialization)
Compliance requirements (medication schedules, care documentation)

If you were modeling this in code, a client's care state might look something like:

type CareNeedLevel = 'companion' | 'personal' | 'medical' | 'complex';

interface ClientProfile {
  id: string;
  needLevel: CareNeedLevel;
  languages: string[];           // e.g. ['fr', 'en'] — bilingual matters in Montreal
  scheduledVisits: Visit[];
  careplan: CarePlan;
  lastAssessmentDate: Date;
  escalationThreshold: number;   // trigger reassessment if score changes
}

interface Visit {
  caregiverId: string;
  scheduledAt: Date;
  serviceType: ServiceType[];
  completedAt?: Date;
  notes?: string;
}

The challenge: this profile is not static. It transitions through states, and the system needs to respond to those transitions.
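
One way to make "the system needs to respond" concrete is a small check that decides when a profile change should trigger reassessment. This is only a sketch: the source does not say how `escalationThreshold` is compared, so the absolute-difference rule and the `needs_reassessment` name are assumptions.

```python
def needs_reassessment(previous_score: float,
                       current_score: float,
                       escalation_threshold: float) -> bool:
    """Return True when the assessment score moved by at least the threshold.

    Assumption: a shift in either direction (improvement or decline)
    counts, because both can make the current care plan a poor fit.
    """
    return abs(current_score - previous_score) >= escalation_threshold


# Hypothetical scores for illustration only
if needs_reassessment(previous_score=3.0, current_score=5.5, escalation_threshold=2.0):
    print("REASSESSMENT_TRIGGERED")
```

A check like this would typically run after each visit or periodic check-in and emit an event rather than mutate the profile directly.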

State Machine: The Client Care Journey

The intake-to-care lifecycle maps cleanly to a finite state machine:

[INQUIRY] → [ASSESSMENT] → [PLAN_CREATED] → [CAREGIVER_MATCHED] → [ACTIVE_CARE] → [REASSESSMENT]
                                                                                          ↓
                                                                                    [PLAN_UPDATED]
                                                                                          ↓
                                                                                    [ACTIVE_CARE]

In code (using XState-style notation):

const careStateMachine = {
  id: 'clientCareJourney',
  initial: 'inquiry',
  states: {
    inquiry: {
      on: { ASSESSMENT_SCHEDULED: 'assessment' }
    },
    assessment: {
      on: {
        PLAN_APPROVED: 'caregiver_matching',
        NEEDS_CLARIFICATION: 'assessment'  // self-loop for complex cases
      }
    },
    caregiver_matching: {
      on: {
        MATCH_FOUND: 'active_care',
        NO_MATCH: 'escalated'             // fallback path
      }
    },
    active_care: {
      on: {
        REASSESSMENT_TRIGGERED: 'reassessment',
        SERVICE_ENDED: 'closed'
      }
    },
    reassessment: {
      on: {
        PLAN_UPDATED: 'active_care',
        ESCALATION_REQUIRED: 'escalated'
      }
    },
    escalated: {
      type: 'final'                       // hand-off to specialized coordination
    },
    closed: {
      type: 'final'                       // target of SERVICE_ENDED; service has ended
    }
  }
};

What's notable here is the reassessment loop — care isn't set-and-forget. Regular check-ins feed data back into the state machine and can trigger plan updates. This is analogous to a health check loop in a microservices architecture.
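
The same transition table can be exercised without any library. The sketch below mirrors the states and events above as a plain dictionary. The `transition` and `run` helpers and the `InvalidTransition` exception are illustrative names, and `closed` is assumed to be a terminal state.

```python
# State -> {event -> next state}, copied from the machine definition above.
TRANSITIONS = {
    "inquiry": {"ASSESSMENT_SCHEDULED": "assessment"},
    "assessment": {
        "PLAN_APPROVED": "caregiver_matching",
        "NEEDS_CLARIFICATION": "assessment",
    },
    "caregiver_matching": {"MATCH_FOUND": "active_care", "NO_MATCH": "escalated"},
    "active_care": {"REASSESSMENT_TRIGGERED": "reassessment", "SERVICE_ENDED": "closed"},
    "reassessment": {"PLAN_UPDATED": "active_care", "ESCALATION_REQUIRED": "escalated"},
    "escalated": {},  # terminal
    "closed": {},     # terminal (assumed)
}


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


def transition(state: str, event: str) -> str:
    try:
        return TRANSITIONS[state][event]
    except KeyError:
        raise InvalidTransition(f"{event!r} is not valid in state {state!r}") from None


def run(events: list[str], state: str = "inquiry") -> str:
    """Apply events in order and return the final state."""
    for event in events:
        state = transition(state, event)
    return state


final_state = run([
    "ASSESSMENT_SCHEDULED", "PLAN_APPROVED", "MATCH_FOUND",
    "REASSESSMENT_TRIGGERED", "PLAN_UPDATED",
])
print(final_state)
```

Rejecting unknown events loudly, rather than ignoring them, keeps a client from drifting into a state the coordination team never intended.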

Service Discovery: Caregiver Matching as a Constraint Satisfaction Problem

Matching a caregiver to a client is functionally a constraint satisfaction problem (CSP):

def find_matching_caregiver(client: ClientProfile, available_caregivers: list[Caregiver]) -> Caregiver | None:
    """
    Hard constraints (must match):
      - language compatibility
      - service type capability
      - geographic availability (Montreal zone)
      - schedule availability

    Soft constraints (scored):
      - personality/preference notes
      - continuity (has served client before)
      - specialization fit
    """
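    # NOTE: the geographic check listed in the docstring has no filter of its own;
    # this sketch assumes is_available() also accounts for the client's zone.
    hard_filtered = [
        cg for cg in available_caregivers
        if has_language_overlap(cg, client)
        and can_provide_services(cg, client.careplan.required_services)
        and is_available(cg, client.scheduledVisits)
    ]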

    if not hard_filtered:
        return None  # trigger escalation

    # Pick the highest-scoring candidate on soft constraints (first one wins ties)
    return max(hard_filtered, key=lambda cg: score_match(cg, client))


def score_match(caregiver: Caregiver, client: ClientProfile) -> float:
    score = 0.0
    if has_served_before(caregiver, client):
        score += 0.4   # continuity is highly valued
    if caregiver.specialization == client.needLevel:
        score += 0.35
    if caregiver.preferred_zones and client.zone in caregiver.preferred_zones:
        score += 0.25
    return score

The language requirement (French/English) is a hard constraint specific to the Montreal context. Agencies like Signature Care build it into their matching logic because it directly affects care quality.
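
The hard-constraint helpers referenced in the matching function are not defined in this post. A minimal sketch of two of them might look like the following, assuming caregivers and clients carry plain sets of language codes and service names. The `Caregiver` and `Client` dataclasses here are simplified, hypothetical stand-ins, not the full profile.

```python
from dataclasses import dataclass, field


@dataclass
class Caregiver:
    id: str
    languages: set[str] = field(default_factory=set)
    services: set[str] = field(default_factory=set)


@dataclass
class Client:
    id: str
    languages: set[str] = field(default_factory=set)
    required_services: set[str] = field(default_factory=set)


def has_language_overlap(cg: Caregiver, client: Client) -> bool:
    # At least one shared language is enough to communicate.
    return bool(cg.languages & client.languages)


def can_provide_services(cg: Caregiver, required: set[str]) -> bool:
    # Every required service must be covered; a partial match fails.
    return required <= cg.services


def hard_filter(client: Client, caregivers: list[Caregiver]) -> list[Caregiver]:
    return [
        cg for cg in caregivers
        if has_language_overlap(cg, client)
        and can_provide_services(cg, client.required_services)
    ]


client = Client("c1", {"fr"}, {"personal_care", "meal_prep"})
pool = [
    Caregiver("cg1", {"en"}, {"personal_care", "meal_prep"}),        # no shared language
    Caregiver("cg2", {"fr", "en"}, {"personal_care"}),               # missing a service
    Caregiver("cg3", {"fr", "en"}, {"personal_care", "meal_prep"}),  # passes both
]
eligible = [cg.id for cg in hard_filter(client, pool)]
print(eligible)
```

Note the asymmetry: language needs only one overlap, while services need full coverage. Getting that distinction wrong is an easy way to produce matches that look valid but fail in practice.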

Event-Driven Coordination: The Visit Lifecycle

Each visit generates a series of events that downstream systems need to process:

type VisitEvent =
  | { type: 'VISIT_CONFIRMED'; visitId: string; caregiverId: string }
  | { type: 'CAREGIVER_EN_ROUTE'; visitId: string; eta: Date }
  | { type: 'VISIT_STARTED'; visitId: string; startTime: Date }
  | { type: 'TASK_COMPLETED'; visitId: string; task: ServiceTask }
  | { type: 'CONCERN_FLAGGED'; visitId: string; severity: 'low' | 'high'; notes: string }
  | { type: 'VISIT_ENDED'; visitId: string; endTime: Date; summary: VisitSummary }
  | { type: 'VISIT_MISSED'; visitId: string; reason?: string };

// Event consumers (Partial: this excerpt does not register a handler for every event type)
const visitEventHandlers: Partial<Record<VisitEvent['type'], Handler>> = {
  VISIT_MISSED: triggerEscalationProtocol,
  CONCERN_FLAGGED: notifyCareCoordinator,
  VISIT_ENDED: updateClientRecord,
  // ...
};

The VISIT_MISSED event is the critical failure case. Unlike a failed HTTP request you can retry, a missed home care visit has real-world consequences — it needs immediate escalation, not exponential backoff.

This is a good reminder that when you're building systems that interface with physical-world events, your error handling semantics need to change.
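
To illustrate the difference in error-handling semantics, the sketch below routes events either to an immediate escalation path or to an ordinary processing queue. Treating high-severity `CONCERN_FLAGGED` events as urgent alongside `VISIT_MISSED` is an assumption, and `is_urgent` and `route_events` are hypothetical names.

```python
from typing import Callable

# Event types that always bypass the normal queue.
URGENT_TYPES = {"VISIT_MISSED"}


def is_urgent(event: dict) -> bool:
    if event["type"] in URGENT_TYPES:
        return True
    # Assumption: a high-severity concern is escalated like a missed visit.
    return event["type"] == "CONCERN_FLAGGED" and event.get("severity") == "high"


def route_events(events: list[dict],
                 escalate: Callable[[dict], None],
                 enqueue: Callable[[dict], None]) -> None:
    """Send urgent events to a human right away; queue everything else."""
    for event in events:
        if is_urgent(event):
            escalate(event)
        else:
            enqueue(event)


escalated: list[dict] = []
queued: list[dict] = []
route_events(
    [
        {"type": "VISIT_STARTED", "visitId": "v1"},
        {"type": "CONCERN_FLAGGED", "visitId": "v1", "severity": "low", "notes": "tired"},
        {"type": "VISIT_MISSED", "visitId": "v2"},
        {"type": "CONCERN_FLAGGED", "visitId": "v3", "severity": "high", "notes": "fall"},
    ],
    escalate=escalated.append,
    enqueue=queued.append,
)
print([e["visitId"] for e in escalated])
```

The point of the split is that urgent events never wait behind a retry or batch schedule designed for routine updates.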

Fault Tolerance: What Happens When the Primary Path Fails?

In distributed systems, we design for failure. Home care coordination does the same:

Primary Path:    Regular caregiver → scheduled visit → completed
Fallback L1:     Backup caregiver (same agency, pre-identified)
Fallback L2:     On-call coordinator dispatches available staff
Fallback L3:     Emergency escalation + family notification

Modeling this in code:

async def dispatch_visit(visit: Visit) -> DispatchResult:
    # Try primary caregiver
    if await confirm_caregiver(visit.primary_caregiver_id, visit):
        return DispatchResult(caregiver=visit.primary_caregiver_id, source='primary')

    # Fallback L1: pre-identified backup
    backup = await get_backup_caregiver(visit)
    if backup and await confirm_caregiver(backup.id, visit):
        return DispatchResult(caregiver=backup.id, source='backup_l1')

    # Fallback L2: on-call pool
    on_call = await query_on_call_pool(visit.zone, visit.required_services)
    if on_call:
        return DispatchResult(caregiver=on_call.id, source='on_call')

    # Fallback L3: escalate — this is not retryable silently
    await escalate_to_coordinator(visit, reason='no_caregiver_available')
    raise UnresolvableDispatchError(visit_id=visit.id)

Notice that the final fallback doesn't fail silently: it notifies a coordinator and then raises an exception, so the caller cannot treat the visit as dispatched. Some failures are not recoverable programmatically. Knowing when to stop automating is as important as building the automation.
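
The dispatch function above depends on helpers that are not shown. As a self-contained illustration of the same shape, the sketch below walks an ordered list of fallback tiers and escalates when all of them come up empty. The `dispatch_with_fallbacks` name, the tier callables, and the stub lookups are assumptions for demonstration only.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Finder = Callable[[str], Awaitable[Optional[str]]]


class UnresolvableDispatchError(Exception):
    def __init__(self, visit_id: str):
        super().__init__(f"no caregiver available for visit {visit_id}")
        self.visit_id = visit_id


async def dispatch_with_fallbacks(visit_id: str,
                                  tiers: list[tuple[str, Finder]],
                                  escalate: Callable[[str], Awaitable[None]]) -> tuple[str, str]:
    """Try each tier in order; return (caregiver_id, source) from the first hit."""
    for source, find in tiers:
        caregiver_id = await find(visit_id)
        if caregiver_id is not None:
            return caregiver_id, source
    # Every automated tier failed: notify a human, then fail loudly.
    await escalate(visit_id)
    raise UnresolvableDispatchError(visit_id)


# Stub lookups standing in for real confirmation calls
async def primary_unavailable(visit_id: str) -> Optional[str]:
    return None


async def backup_available(visit_id: str) -> Optional[str]:
    return "cg-backup"


async def notify_coordinator(visit_id: str) -> None:
    print(f"escalating visit {visit_id} to coordinator")


result = asyncio.run(dispatch_with_fallbacks(
    "visit-1",
    [("primary", primary_unavailable), ("backup_l1", backup_available)],
    notify_coordinator,
))
print(result)
```

Keeping the tiers as data makes the order of fallbacks explicit and easy to audit, which matters when the last step is a phone call to a family member.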

Data Architecture Considerations

If you were building a care coordination platform, your schema would need to handle a few interesting challenges:

1. Temporal Care Plans

Care plans aren't just current-state records — you need full history:

CREATE TABLE care_plan_versions (
  id            UUID PRIMARY KEY,
  client_id     UUID REFERENCES clients(id),
  version       INTEGER NOT NULL,
  valid_from    TIMESTAMPTZ NOT NULL,
  valid_until   TIMESTAMPTZ,             -- NULL = currently active
  plan_data     JSONB NOT NULL,
  created_by    UUID REFERENCES staff(id),
  change_reason TEXT
);

-- Query active plan
SELECT * FROM care_plan_versions
WHERE client_id = $1
  AND valid_from <= NOW()
  AND (valid_until IS NULL OR valid_until > NOW());
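
The versioning rule behind this table (close the active version, then append a new one) can be sketched in plain Python. This is an in-memory illustration under simplified assumptions, not a database layer. `PlanVersion`, `active_version`, and `add_version` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PlanVersion:
    version: int
    valid_from: datetime
    valid_until: Optional[datetime]  # None = currently active
    plan_data: dict


def active_version(versions: list[PlanVersion], at: datetime) -> Optional[PlanVersion]:
    """Same predicate as the SQL query: valid_from <= at < valid_until (or open-ended)."""
    for v in versions:
        if v.valid_from <= at and (v.valid_until is None or v.valid_until > at):
            return v
    return None


def add_version(versions: list[PlanVersion], plan_data: dict, at: datetime) -> PlanVersion:
    """Close the version active at `at`, then append its successor."""
    current = active_version(versions, at)
    if current is not None:
        current.valid_until = at
    new = PlanVersion(len(versions) + 1, at, None, plan_data)
    versions.append(new)
    return new


history: list[PlanVersion] = []
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 3, 1, tzinfo=timezone.utc)
add_version(history, {"needLevel": "companion"}, t1)
add_version(history, {"needLevel": "personal"}, t2)

# Point-in-time lookup: what plan applied in February?
february = datetime(2024, 2, 1, tzinfo=timezone.utc)
print(active_version(history, february).plan_data)
```

Because old versions are closed rather than overwritten, questions like "which plan was in force on the day of this incident?" stay answerable.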

2. Compliance Audit Trail

Every action taken during a visit needs to be logged immutably for regulatory compliance:

CREATE TABLE visit_audit_log (
  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  visit_id     UUID REFERENCES visits(id),
  event_type   TEXT NOT NULL,
  actor_id     UUID NOT NULL,
  occurred_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  payload      JSONB,
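  -- Append-only by convention: nothing in this definition blocks UPDATE or DELETE,
  -- so enforce it separately (e.g. revoke those privileges or add a trigger)
  CHECK (occurred_at <= NOW())  -- can't log future events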
);
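
A comment alone does not make a table append-only. One way to enforce it is with triggers that reject changes. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in, so it runs without a server. The schema is trimmed, and the trigger syntax differs from what PostgreSQL would need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit_audit_log (
  id          INTEGER PRIMARY KEY,
  visit_id    TEXT NOT NULL,
  event_type  TEXT NOT NULL,
  occurred_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Reject any attempt to modify or remove an existing row.
CREATE TRIGGER visit_audit_log_no_update
BEFORE UPDATE ON visit_audit_log
BEGIN
  SELECT RAISE(ABORT, 'visit_audit_log is append-only');
END;

CREATE TRIGGER visit_audit_log_no_delete
BEFORE DELETE ON visit_audit_log
BEGIN
  SELECT RAISE(ABORT, 'visit_audit_log is append-only');
END;
""")

conn.execute(
    "INSERT INTO visit_audit_log (visit_id, event_type) VALUES (?, ?)",
    ("v1", "VISIT_STARTED"),
)


def try_statement(sql: str) -> bool:
    """Return True if the statement succeeded, False if the database rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


update_ok = try_statement("UPDATE visit_audit_log SET event_type = 'EDITED'")
delete_ok = try_statement("DELETE FROM visit_audit_log")
row_count = conn.execute("SELECT COUNT(*) FROM visit_audit_log").fetchone()[0]
print(update_ok, delete_ok, row_count)
```

Enforcing the rule in the database rather than in application code means a buggy service or an ad hoc admin query cannot quietly rewrite history.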

3. Geographic Zone Management

In Montreal, service zones affect both caregiver assignment and billing rates:

interface ServiceZone {
  id: string;
  borough: string;           // e.g., 'Plateau-Mont-Royal', 'Côte-des-Neiges'
  postalPrefixes: string[];  // e.g., ['H2W', 'H2J']
  travelTimeSLA: number;     // max acceptable travel time in minutes
  surchargeMultiplier: number;
}
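
A zone record like this is typically consumed by a lookup from a client's postal code. The sketch below resolves a zone from the first three characters of the postal code and applies the surcharge. The SLA and multiplier values are made up for illustration, and `find_zone` and `visit_price` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceZone:
    id: str
    borough: str
    postal_prefixes: tuple[str, ...]
    travel_time_sla: int          # minutes
    surcharge_multiplier: float


def find_zone(postal_code: str, zones: list[ServiceZone]) -> Optional[ServiceZone]:
    """Match on the first three characters of the postal code."""
    prefix = postal_code.replace(" ", "").upper()[:3]
    for zone in zones:
        if prefix in zone.postal_prefixes:
            return zone
    return None  # outside every known zone


def visit_price(base_rate: float, zone: ServiceZone) -> float:
    return round(base_rate * zone.surcharge_multiplier, 2)


# Illustrative values only; the SLA and multiplier are not from any real rate card.
zones = [
    ServiceZone("plateau", "Plateau-Mont-Royal", ("H2W", "H2J"), 30, 1.25),
]

zone = find_zone("h2w 1a1", zones)
if zone is not None:
    print(zone.borough, visit_price(40.0, zone))
```

Returning `None` for an unknown prefix, instead of defaulting to some zone, forces the caller to decide explicitly how out-of-area requests are handled.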

Key Architectural Takeaways

Building scheduling and coordination systems — whether for home care, field services, or logistics — shares a common set of challenges:

Model state explicitly. Don't infer care/service status from fields like last_updated. Use a proper state machine.
Hard constraints vs. soft constraints matter. Not all matching criteria are equal. Language compatibility in a bilingual city isn't a "nice to have."
Design failure paths as carefully as success paths. What happens when the primary path fails? When the fallback fails? When all automated paths fail?
Append-only audit logs are non-negotiable in regulated domains. Build them from day one.
Human escalation is a valid system output. The best automated systems know their own limits.
