Clay Roach

Posted on Sep 3 • Originally published at dev.to

Day 20: Service Topology Implementation with Critical Request Paths

#ai #observability #claude #visualization

Today I completed the Service Topology feature, replacing the previous AI Insights view with a three-panel visualization system. The work shows how far AI-assisted development can take an enterprise-style feature in a single 4-hour session.

Implementation Overview

The 4-hour development session produced:

Service Topology visualization with interactive network graph
Critical Request Paths analysis using Sankey flow diagrams
Real-time service health indicators with RED (Rate, Errors, Duration) metrics
AI-powered analysis panel for selected services
Global analysis controls integrated into menu bar
Live/Demo mode toggle for data source switching

Technical Architecture

The Service Topology feature uses a three-panel layout: a Critical Request Paths panel, an interactive topology graph, and an AI analysis panel.

Critical Request Paths Panel

interface CriticalPath {
  id: string
  name: string
  description?: string
  priority: 'critical' | 'high' | 'medium' | 'low'
  services: string[]
  edges: Array<{ source: string; target: string }>
  metrics: {
    requestCount: number
    avgLatency: number
    p99Latency: number
    errorRate: number
  }
}

Multi-select functionality with Cmd/Ctrl+Click enables simultaneous path comparison.
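As a rough illustration of the Cmd/Ctrl+Click behavior, the selection update can be kept as a small pure function. This is a minimal sketch, not the project's code: `togglePathSelection` and `ModifierKeys` are hypothetical names, and it assumes a plain click replaces the selection while a modifier click toggles one path.

```typescript
// Hypothetical helper: names and signature are assumptions, not the project's API.
interface ModifierKeys {
  metaKey: boolean // Cmd on macOS
  ctrlKey: boolean // Ctrl on Windows/Linux
}

const togglePathSelection = (
  selected: Set<string>,
  pathId: string,
  keys: ModifierKeys
): Set<string> => {
  const multi = keys.metaKey || keys.ctrlKey
  if (!multi) {
    // Plain click: replace the selection with the clicked path
    return new Set<string>([pathId])
  }
  // Modifier click: toggle the clicked path and keep the rest
  const next = new Set<string>(selected)
  if (next.has(pathId)) {
    next.delete(pathId)
  } else {
    next.add(pathId)
  }
  return next
}
```

Returning a new Set instead of mutating the old one keeps the function easy to use with React state, where a changed reference is what triggers a re-render.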

Interactive Service Topology Graph

Node sizing uses logarithmic scaling for visual clarity:

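const calculateNodeSize = (rate: number, maxRate: number) => {
  const minSize = 30
  const maxSize = 80
  // Guard: with maxRate = 0 the ratio below would be 0 / 0 (NaN)
  if (maxRate <= 0) return minSize
  // log(rate + 1) maps rate = 0 to minSize and rate = maxRate to maxSize
  const scaleFactor = Math.log(rate + 1) / Math.log(maxRate + 1)
  return minSize + (maxSize - minSize) * scaleFactor
}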

const getHealthColor = (errorRate: number): string => {
  if (errorRate > 0.05) return '#ff4d4f' // >5% errors
  if (errorRate > 0.01) return '#faad14' // 1-5% errors
  return '#52c41a' // <1% errors
}
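To make the effect of the logarithmic scale concrete, the sketch below compares it with a linear scale. It repeats the sizing function so the snippet runs on its own, with a guard added for a `maxRate` of 0. `linearNodeSize` and the sample rates are made up for the comparison.

```typescript
// Copy of the sizing function, guarded so maxRate = 0 does not yield NaN
const calculateNodeSize = (rate: number, maxRate: number): number => {
  const minSize = 30
  const maxSize = 80
  if (maxRate <= 0) return minSize
  const scaleFactor = Math.log(rate + 1) / Math.log(maxRate + 1)
  return minSize + (maxSize - minSize) * scaleFactor
}

// Hypothetical linear alternative, shown only for comparison
const linearNodeSize = (rate: number, maxRate: number): number => {
  if (maxRate <= 0) return 30
  return 30 + (80 - 30) * (rate / maxRate)
}

const sampleMaxRate = 1000 // made-up peak request rate
for (const rate of [0, 10, 100, 1000]) {
  console.log(
    rate,
    calculateNodeSize(rate, sampleMaxRate).toFixed(1),
    linearNodeSize(rate, sampleMaxRate).toFixed(1)
  )
}
```

With these sample rates, a service at 10 requests gets a size of roughly 47 on the log scale but only about 30.5 on the linear one, so low-traffic services stay visibly distinct from idle ones.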

AI Analysis Panel

The panel grades each RED metric on a 0-2 severity scale and reports the worst grade as the service status:

export const generateHealthExplanation = (
  serviceName: string,
  metrics: ServiceMetricsDetail
): HealthExplanation => {
  const errorSeverity = metrics.errorRate > 0.05 ? 2 : 
                        metrics.errorRate > 0.01 ? 1 : 0
  const latencySeverity = metrics.duration > 500 ? 2 : 
                          metrics.duration > 100 ? 1 : 0
  const rateSeverity = metrics.rate < 1 ? 2 : 
                       metrics.rate < 10 ? 1 : 0

  const maxSeverity = Math.max(errorSeverity, latencySeverity, rateSeverity)
  const status = maxSeverity === 2 ? 'critical' : 
                 maxSeverity === 1 ? 'warning' : 'healthy'

  return {
    status,
    summary: generateSummary(serviceName, metrics, status),
    impactedMetrics: analyzeMetrics(metrics),
    recommendations: generateRecommendations(metrics, status)
  }
}
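The helpers `generateSummary`, `analyzeMetrics`, and `generateRecommendations` are not shown in this post. As one possible shape for the last of them, here is a minimal sketch that reuses the thresholds above. The `ServiceMetricsDetail` fields are inferred from how they are read in `generateHealthExplanation`, and the recommendation wording is invented for illustration.

```typescript
// Assumed shape: only the fields read by generateHealthExplanation are known.
interface ServiceMetricsDetail {
  rate: number      // request rate (unit not stated in the post)
  errorRate: number // fraction of failed requests, e.g. 0.05 = 5%
  duration: number  // latency (unit not stated; thresholds are 100 and 500)
}

type HealthStatus = 'healthy' | 'warning' | 'critical'

// Hypothetical implementation: one recommendation per metric past its warning threshold
const generateRecommendations = (
  metrics: ServiceMetricsDetail,
  status: HealthStatus
): string[] => {
  if (status === 'healthy') return []
  const recommendations: string[] = []
  if (metrics.errorRate > 0.01) {
    const percent = (metrics.errorRate * 100).toFixed(1)
    recommendations.push(`Error rate is ${percent}%: inspect recent failing traces`)
  }
  if (metrics.duration > 100) {
    recommendations.push(`Latency is ${metrics.duration}: look for slow downstream calls`)
  }
  if (metrics.rate < 10) {
    recommendations.push(`Request rate is ${metrics.rate}: check for upstream outages`)
  }
  return recommendations
}
```

Keeping the thresholds identical to the severity checks means every non-healthy status produces at least one recommendation.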

Development Metrics

Quantifiable progress from today's implementation:

Lines of Code: 2,500 across 12 TypeScript files
Components Created: 8 React components
Test Coverage: 12 e2e tests passing, 7 skipped for compatibility
Development Time: 4 hours focused work
Refactoring Iterations: 3 major cycles

Technical Implementation Details

Sankey Diagram for Request Flow

Each edge of the selected path becomes a Sankey link, sized by request volume and colored by the target service's error rate:

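// `path`, `services`, and `nodes` are read from the enclosing component scope
const getSankeyOption = (): EChartsOption => {
  const links = path.edges.map((edge) => {
    const sourceService = services.find(s => s.id === edge.source)
    const targetService = services.find(s => s.id === edge.target)
    // Link width: the smaller of the two endpoint rates
    // (falls back to 100 when a rate is missing or 0)
    const volume = Math.min(
      sourceService?.metrics?.rate || 100,
      targetService?.metrics?.rate || 100
    )
    // Link color and opacity follow the target service's error rate
    const errorRate = targetService?.metrics?.errorRate || 0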

    return {
      source: edge.source,
      target: edge.target,
      value: volume,
      lineStyle: {
        color: getServiceColor(errorRate),
        opacity: errorRate > 0.01 ? 0.9 : 0.6
      }
    }
  })

  return {
    series: [{
      type: 'sankey',
      emphasis: { focus: 'adjacency' },
      data: nodes,
      links: links
    }]
  }
}
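The `nodes` array passed to the Sankey series is not shown above. ECharts matches each link's `source` and `target` against node `name` values, so one way to build it is to collect the unique endpoints of the path's edges. This is a sketch under that assumption, and `buildSankeyNodes` is a hypothetical helper name.

```typescript
interface PathEdge {
  source: string
  target: string
}

interface SankeyNode {
  name: string // must match the source/target strings used in the links
}

// Hypothetical helper: one node per unique edge endpoint, in first-seen order
const buildSankeyNodes = (edges: PathEdge[]): SankeyNode[] => {
  const names = new Set<string>()
  for (const edge of edges) {
    names.add(edge.source)
    names.add(edge.target)
  }
  return Array.from(names, (name) => ({ name }))
}

// Made-up checkout path for illustration
const demoNodes = buildSankeyNodes([
  { source: 'frontend', target: 'cart' },
  { source: 'cart', target: 'checkout' },
  { source: 'frontend', target: 'checkout' }
])
console.log(demoNodes.map((n) => n.name)) // frontend, cart, checkout
```

Deriving nodes from the edges guarantees every link endpoint has a matching node. ECharts also expects Sankey input to be acyclic, so a path containing a call cycle would need its back-edges removed first.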

Service Neighbor Visibility

When a service is selected, the graph shows only that service and its direct upstream and downstream neighbors:

// `edges` (the topology edge list) is read from the enclosing scope
const getVisibleServices = (selectedService: string, allServices: ServiceNode[]) => {
  const neighbors = new Set<string>()

  edges.forEach(edge => {
    if (edge.source === selectedService) neighbors.add(edge.target)
    if (edge.target === selectedService) neighbors.add(edge.source)
  })

  return allServices.filter(service => 
    service.id === selectedService || neighbors.has(service.id)
  )
}
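For a quick check of the neighbor filter outside the component, the variant below takes the edge list as an explicit parameter instead of reading it from scope. The `ServiceNode` and `Edge` shapes are trimmed to the fields the filter needs, and the four-service chain is made up.

```typescript
interface ServiceNode {
  id: string
}

interface Edge {
  source: string
  target: string
}

// Same logic as the in-component version, with `edges` passed in explicitly
const getVisibleServices = (
  selectedService: string,
  allServices: ServiceNode[],
  edges: Edge[]
): ServiceNode[] => {
  const neighbors = new Set<string>()
  for (const edge of edges) {
    if (edge.source === selectedService) neighbors.add(edge.target)
    if (edge.target === selectedService) neighbors.add(edge.source)
  }
  return allServices.filter(
    (service) => service.id === selectedService || neighbors.has(service.id)
  )
}

// Made-up chain: frontend -> cart -> checkout -> payment
const demoServices: ServiceNode[] = [
  { id: 'frontend' },
  { id: 'cart' },
  { id: 'checkout' },
  { id: 'payment' }
]
const demoEdges: Edge[] = [
  { source: 'frontend', target: 'cart' },
  { source: 'cart', target: 'checkout' },
  { source: 'checkout', target: 'payment' }
]

const visible = getVisibleServices('cart', demoServices, demoEdges)
console.log(visible.map((s) => s.id)) // frontend, cart, checkout
```

Selecting `cart` hides `payment`, which is two hops away. Only direct callers and callees remain visible.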

Data Source Management

A hook returns mock or live fetchers depending on the `useMockData` flag in the app store:

const useDataSource = () => {
  const { useMockData } = useAppStore()

  return useMemo(() => ({
    fetchTopology: useMockData 
      ? () => Promise.resolve(getMockTopologyData())
      : () => fetchRealTopologyData(),
    fetchMetrics: useMockData
      ? () => Promise.resolve(getMockMetrics())
      : () => fetchRealMetrics()
  }), [useMockData])
}
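The same switch can be sketched without React, which makes it easier to unit test. This is an illustrative variant, not the project's code: `createDataSource` and its parameters are hypothetical, and the fetchers are injected so no real backend is needed.

```typescript
interface DataSource<T> {
  fetchTopology: () => Promise<T>
}

// Hypothetical factory: picks the mock or live fetcher once, up front
function createDataSource<T>(
  useMockData: boolean,
  getMock: () => T,
  fetchLive: () => Promise<T>
): DataSource<T> {
  return {
    // Mock data is wrapped in a resolved promise so callers see one async API
    fetchTopology: useMockData ? () => Promise.resolve(getMock()) : fetchLive
  }
}

// Demo mode with made-up topology data; the live fetcher is never called here
const demoSource = createDataSource(
  true,
  () => ({ services: ['frontend', 'cart'] }),
  () => Promise.reject(new Error('live backend not wired up in this sketch'))
)

demoSource.fetchTopology().then((topology) => {
  console.log(topology.services.length)
})
```

Because both branches return a promise, components that consume the data source do not need to know which mode is active.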

Visual Documentation

Screenshots from PR #39 implementation:

Main Topology View

Critical paths, interactive topology, and AI analysis panels

Checkout Flow Path

Sankey diagram showing request volumes and error rates

Test Coverage

The e2e suite covers the following scenarios (test bodies omitted here):

describe('Service Topology Comprehensive Validation', () => {
  test('should display all Service Topology components correctly')
  test('should handle path selection in critical paths panel')
  test('should display topology graph with nodes and edges')
  test('should show service details on node click')
  test('should handle Live/Demo mode switching')
  test('should filter services based on health status')
  test('should highlight selected paths in topology')
  test('should show AI analysis for selected service')
  test('should handle multi-select with Cmd/Ctrl+Click')
  test('should maintain state across panel interactions')
  test('should handle error states gracefully')
  test('should perform smoothly with large datasets')
})

4-Hour Development Breakdown

Hour 1: Requirements analysis and component architecture
Hour 2: ECharts topology graph implementation
Hour 3: Sankey diagram and path visualization
Hour 4: AI analysis panel and test suite

Performance Considerations

Current limitations and planned optimizations:

Graph rendering slows with >100 nodes
WebSocket integration needed for real-time updates
Mobile viewport requires responsive design adjustments
Export functionality pending for diagram sharing

Implementation Insights

Effective Patterns

Component isolation simplified parallel development
Mock data first approach accelerated UI iteration
TypeScript interfaces caught data-shape mismatches at compile time rather than at runtime
Effect-TS patterns provided type-safe service boundaries

Areas Requiring Refinement

Large dataset performance optimization
Real-time data streaming integration
Mobile-responsive layout adaptation
Diagram export capabilities

Next Steps

Tomorrow's implementation priorities:

Connect to live OpenTelemetry data streams
Implement autoencoder-based anomaly detection
Optimize rendering for enterprise-scale graphs
Add time-series topology evolution

Summary

Day 20 delivered a complete Service Topology implementation with critical path analysis, interactive visualization, and AI-powered insights. The 4-hour session produced 2,500 lines of code across 12 TypeScript files, with 12 e2e tests passing and 7 skipped; the performance and mobile limitations listed above remain open.

Progress: Day 20 of 30 complete
Feature: Service Topology with Critical Request Paths
Code: 2,500 LOC added
Tests: 12 passing, 7 skipped
PR: #39

Part of the 30-Day AI-Native Observability Platform series. Building enterprise observability with AI-assisted development and 4-hour focused workdays.
