Samson Tanimawo

Service Maps: The Architectural Clarity Your Team Is Missing

"What Calls What?"

The question that launches a thousand Slack threads. In every microservices architecture I've worked with, nobody has a complete picture of how services interact.

Service maps fix this.

The Problem With Static Diagrams

Every team has architecture diagrams. They're all wrong. They were accurate on the day they were created, which was 18 months ago.

Reality of static diagrams:
  Created: January 2023 (accurate)
  Updated: March 2023 (still mostly accurate)
  Last touched: March 2023
  Current accuracy: ~40%
  New services since: 12
  Removed services: 3
  Changed dependencies: 27

Static diagrams are documentation debt.

Auto-Generated Service Maps

The solution is maps generated from actual traffic. There are three data sources:

1. Network Traffic (Infrastructure Level)

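# Build a service-to-service map from the connections pods actually hold open.
# get_pod_connections() and resolve_ip_to_service() are helpers you supply.
def build_network_map(k8s_client, namespace='production'):
    """Return the unique source -> target edges observed in a namespace."""
    services = k8s_client.list_namespaced_service(namespace)
    connections = set()

    for svc in services.items:
        selector = svc.spec.selector
        if not selector:
            # Services without a selector (e.g. ExternalName) have no pods to inspect
            continue
        # Get pods for this service
        pods = k8s_client.list_namespaced_pod(
            namespace,
            label_selector=','.join(f"{k}={v}" for k, v in selector.items())
        )
        # Check established connections
        for pod in pods.items:
            conns = get_pod_connections(pod.metadata.name)  # from /proc/net/tcp
            for conn in conns:
                target_svc = resolve_ip_to_service(conn.remote_ip)
                if target_svc:
                    # A set collapses the same edge seen from several pods
                    connections.add((svc.metadata.name, target_svc, conn.remote_port))

    return [
        {'source': source, 'target': target, 'port': port}
        for source, target, port in sorted(connections)
    ]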

2. Distributed Traces (Application Level)

-- Query trace data for service dependencies (PostgreSQL syntax)
SELECT 
  parent_service,
  child_service,
  COUNT(*) as call_count,
  AVG(duration_ms) as avg_latency,
  SUM(CASE WHEN status = 'ERROR' THEN 1 ELSE 0 END)::float / COUNT(*) as error_rate
FROM traces
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY parent_service, child_service
ORDER BY call_count DESC;
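The query assumes you already have parent_service and child_service columns. If your tracing backend only gives you raw spans, a minimal sketch of deriving those pairs might look like this. The span field names (span_id, parent_id, service) and the fraud-check service are assumptions for illustration, so adapt them to whatever your tracer exports.

```python
from collections import Counter


def edges_from_spans(spans):
    """Count cross-service calls from a flat list of span dicts."""
    service_of = {span["span_id"]: span["service"] for span in spans}
    calls = Counter()
    for span in spans:
        parent_service = service_of.get(span.get("parent_id"))
        child_service = span["service"]
        # Skip root spans and spans that stay inside one service
        if parent_service is not None and parent_service != child_service:
            calls[(parent_service, child_service)] += 1
    return calls


# Hypothetical spans from a single checkout request
spans = [
    {"span_id": "a", "parent_id": None, "service": "checkout-api"},
    {"span_id": "b", "parent_id": "a", "service": "payment-service"},
    {"span_id": "c", "parent_id": "a", "service": "checkout-api"},
    {"span_id": "d", "parent_id": "b", "service": "fraud-check"},
]

for (parent, child), count in edges_from_spans(spans).items():
    print(f"{parent} -> {child}: {count}")
```

Span "c" is dropped because it never leaves checkout-api, which keeps internal function-level spans from showing up as self-loops on the map.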

3. API Gateway Logs (Edge Level)

API Gateway → Which external services call which internal services
Load Balancer → Traffic distribution and routing rules

What a Good Service Map Shows

Beyond just "A calls B," a useful service map includes:

service_map_edge:
  source: checkout-api
  target: payment-service
  metadata:
    requests_per_second: 150
    error_rate: 0.2%
    p99_latency: 120ms
    protocol: gRPC
    circuit_breaker: enabled
    retry_policy: 3x with exponential backoff
    owner_team: payments
    tier: critical
    last_deploy: 2024-03-14T15:30:00Z

Color-code by health:

  • Green: < 1% error rate, latency within SLO
  • Yellow: Approaching SLO limits
  • Red: SLO breached
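As a rough sketch, the color rules could be coded like this. Two things here are my assumptions, not rules from any tool: the 1% error rate doubles as the error SLO, and "approaching" means anything above 80% of a limit. EdgeHealth and latency_slo_ms are illustrative names.

```python
from dataclasses import dataclass

MAX_ERROR_RATE = 0.01  # the 1% error-rate limit from the list above
WARN_FRACTION = 0.8    # assumption: "approaching" = above 80% of a limit


@dataclass
class EdgeHealth:
    error_rate: float      # fraction of failed requests, e.g. 0.002 for 0.2%
    p99_latency_ms: float
    latency_slo_ms: float  # hypothetical per-edge latency SLO


def health_color(edge: EdgeHealth) -> str:
    """Map an edge's error rate and latency to green, yellow or red."""
    breached = (
        edge.error_rate >= MAX_ERROR_RATE
        or edge.p99_latency_ms > edge.latency_slo_ms
    )
    if breached:
        return "red"
    approaching = (
        edge.error_rate >= WARN_FRACTION * MAX_ERROR_RATE
        or edge.p99_latency_ms >= WARN_FRACTION * edge.latency_slo_ms
    )
    return "yellow" if approaching else "green"


print(health_color(EdgeHealth(0.002, 120, 200)))
print(health_color(EdgeHealth(0.002, 170, 200)))
print(health_color(EdgeHealth(0.02, 120, 200)))
```

The worse of the two signals wins, so an edge with a perfect error rate still turns yellow once its p99 creeps toward the latency SLO.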

The Incident Response Advantage

During an incident, a live service map immediately answers:

  1. Impact scope: What services depend on the broken one?
  2. Blast radius: How many users are affected?
  3. Root cause direction: Is the problem upstream or downstream?
  4. Recovery order: Which services need to come back first?

Before service maps (incident response):
  "Is anyone else affected?" → 15 minutes of Slack investigation

After service maps:
  *click on broken service* → instantly see all dependents
  Answer: 4 downstream services, 3 are degraded
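Under the hood, "see all dependents" is a graph walk over the map's edges, and recovery order is a topological sort. Here is a minimal sketch, assuming edges are stored as (caller, callee) pairs. The service names are made up, and graphlib needs Python 3.9 or newer.

```python
from collections import defaultdict, deque
from graphlib import TopologicalSorter

# Hypothetical edges: (caller, callee)
EDGES = [
    ("web-frontend", "checkout-api"),
    ("checkout-api", "payment-service"),
    ("checkout-api", "inventory-service"),
    ("payment-service", "postgres-payments"),
]


def dependents_of(broken, edges):
    """Return every service that directly or transitively calls `broken`."""
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    seen, queue = set(), deque([broken])
    while queue:
        for caller in callers[queue.popleft()]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


def recovery_order(edges):
    """List services so each one appears after everything it calls."""
    calls = defaultdict(set)
    for caller, callee in edges:
        calls[caller].add(callee)
    # TopologicalSorter expects node -> set of nodes that must come first
    return list(TopologicalSorter(calls).static_order())


print(sorted(dependents_of("payment-service", EDGES)))
print(recovery_order(EDGES))
```

One caveat: static_order() raises graphlib.CycleError if two services call each other, so a real map with circular dependencies needs those cycles handled before sorting.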

Start Simple

You don't need a fancy tool to get started:

  1. Export your K8s services and network policies
  2. Add trace-based dependencies
  3. Render with a simple graph library
  4. Update automatically via cron job
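For the rendering step, one low-effort option is to emit Graphviz DOT text and let the dot command draw it. This is only a sketch: to_dot is an illustrative name, and the edges are placeholders for whatever steps 1 and 2 produce.

```python
def to_dot(edges):
    """Render (source, target, label) triples as Graphviz DOT text."""
    lines = ["digraph service_map {", "  rankdir=LR;"]
    for source, target, label in sorted(edges):
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Placeholder edges; in practice these come from the network and trace data
edges = {
    ("checkout-api", "payment-service", "gRPC"),
    ("checkout-api", "inventory-service", "HTTP"),
}

print(to_dot(edges))
```

Save the output to a file such as service_map.dot and run dot -Tsvg service_map.dot -o service_map.svg to get an image. Scheduling that pair of commands is all the cron job has to do.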

A basic but accurate map beats a fancy but outdated one every time.

If you want auto-generated, always-accurate service maps with AI-powered impact analysis, check out what we're building at Nova AI Ops.


Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com
