DEV Community

Oluwadara James Odukoya

Monitoring Autonomous Systems Telemetry: Building an HFT-Grade Network Analysis Pipeline for UDP-based Protocols

A production-ready guide to capturing, decoding, and visualizing real-time telemetry from distributed autonomous vehicle systems using Wireshark, Python, and custom protocol dissectors.


Introduction: Why Network Monitoring Matters for Autonomous Systems

When I started working with distributed autonomous vehicle simulations in the cloud, I quickly realized something: the code running inside the autopilot is only half the battle. The other half is the network—specifically, understanding what's happening on the wire between your vehicles and the rest of your infrastructure.

In aerospace and autonomous systems, timing is everything. A 100-millisecond latency spike can make the difference between a successful waypoint transition and a loss-of-signal scenario. Yet most developers treat their telemetry pipeline like a black box: data goes out, data comes back, hopefully things work.

This article shows you how to open that box.

I'll walk you through building a real-time network monitoring system for autonomous vehicle telemetry: a Wireshark protocol dissector, a Python analysis tool, and an architecture for feeding interactive dashboards. By the end, you'll have packet-level visibility into your telemetry links, the kind you need to debug distributed systems in production.


The Architecture: Why These Tools?

Before diving into code, let's talk about why we're using this specific tech stack. These decisions matter.

Why UDP for Telemetry?

If you've worked with MAVLink or similar drone autopilot protocols, you've probably asked: "Why UDP instead of TCP?"

The answer is latency consistency. TCP guarantees delivery at the cost of variable delays and buffering overhead. UDP is connectionless and stateless—it fires packets as fast as possible with minimal overhead. For a system that needs to stream attitude data (roll, pitch, yaw) at 50+ Hz, that matters.

The tradeoff: UDP doesn't guarantee delivery. But for telemetry, you usually don't want guaranteed delivery. If an attitude message streamed at 50 Hz gets dropped, the next one arrives 20 ms later. With TCP, a lost segment holds back everything queued behind it until the retransmission arrives (head-of-line blocking), so the receiver gets a burst of stale samples late instead of one fresh sample on time. Telemetry is about current state, not historical completeness.

This architectural decision cascades through everything: why we use Wireshark (packet-level visibility), why we need tight timestamps (microsecond precision), why we stream live to dashboards (not batch processing).
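The "current state, not historical completeness" idea can be sketched as a receiver that keeps only the newest sample and ignores late arrivals. This is a minimal illustration, not part of the pipeline below: the `Sample` and `LatestState` names are hypothetical, and it assumes an 8-bit sequence counter that wraps at 256, as MAVLink's does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    seq: int          # 8-bit sequence number, wraps at 256
    recv_time: float  # seconds on the receiver's clock
    payload: bytes


class LatestState:
    """Keep only the newest sample; late or duplicate datagrams are ignored."""

    def __init__(self) -> None:
        self.current: Optional[Sample] = None
        self.dropped_stale = 0

    def offer(self, sample: Sample) -> bool:
        """Return True if the sample replaced the current state."""
        if self.current is not None:
            # Modular distance handles the wrap from 255 back to 0
            ahead = (sample.seq - self.current.seq) % 256
            if ahead == 0 or ahead >= 128:
                # Duplicate, or older than what we already hold
                self.dropped_stale += 1
                return False
        self.current = sample
        return True

    def staleness_ms(self, now: float) -> float:
        """Age of the held sample in milliseconds."""
        if self.current is None:
            raise ValueError("no sample received yet")
        return (now - self.current.recv_time) * 1000.0


state = LatestState()
state.offer(Sample(254, 0.00, b"a"))
state.offer(Sample(0, 0.04, b"b"))       # seq 255 was lost; we simply move on
state.offer(Sample(255, 0.05, b"late"))  # arrives out of order, ignored
```

At 50 Hz, one lost datagram costs one extra 20 ms period of staleness, and nothing waits for a retransmission.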

Why Wireshark?

Wireshark is the de facto standard because:

  1. Packet-level precision: Every bit, every timestamp, every retransmission is visible
  2. Plugin architecture: Custom dissectors mean you can decode protocols Wireshark doesn't parse by default (like MAVLink, which is an open protocol) in real time
  3. CLI tools (tshark): Scriptable analysis without GUI overhead
  4. Production-grade reliability: A mature, widely deployed tool that network engineers across industries rely on

But here's the key: Wireshark alone isn't enough. It's great for visual inspection, but for automated analysis of network characteristics (jitter, latency, throughput), you need custom analysis scripts.

Why Python?

Python handles the bridging layer. It reads Wireshark's output (tshark) and transforms raw packet data into actionable metrics. Why Python over C++?

  • Development velocity: Script-test-iterate in minutes vs. recompile cycles
  • Data processing: Pandas for time-series analysis beats manual buffer management
  • Web integration: Flask serves dashboards without extra infrastructure

For production systems running this 24/7, you might rewrite the analysis engine in C++ for performance. But for development and prototyping, Python's the right tool.


Building the Foundation: Custom MAVLink Dissector

Here's where things get interesting. Out of the box, Wireshark sees MAVLink as "UDP on port 14540" and shows you raw bytes. We need to teach it what those bytes mean.

Creating a Lua Dissector

Wireshark supports plugins in Lua (lightweight, embedded interpreter). Here's a minimal dissector for MAVLink v2.0 telemetry:

-- mavlink_dissector.lua
-- Decodes MAVLink v2.0 protocol from autonomous vehicle telemetry

-- Define the protocol
mavlink_proto = Proto("mavlink", "MAVLink Autopilot Protocol")

-- Define field extractors
local f = mavlink_proto.fields
f.magic = ProtoField.uint8("mavlink.magic", "Magic Byte", base.HEX)
f.length = ProtoField.uint8("mavlink.length", "Payload Length", base.DEC)
f.seq = ProtoField.uint8("mavlink.seq", "Sequence", base.DEC)
f.sysid = ProtoField.uint8("mavlink.sysid", "System ID", base.DEC)
f.compid = ProtoField.uint8("mavlink.compid", "Component ID", base.DEC)
f.msgid = ProtoField.uint24("mavlink.msgid", "Message ID", base.DEC)

-- Message type mappings (subset of 270+ defined messages)
local msg_names = {
    [0] = "HEARTBEAT",           -- System status check
    [1] = "SYS_STATUS",          -- Battery, sensor health, CPU load
    [24] = "GPS_RAW_INT",        -- Raw GPS coordinates
    [30] = "ATTITUDE",           -- Roll, pitch, yaw in radians
    [31] = "ATTITUDE_QUATERNION",-- Same data, quaternion format
    [33] = "GLOBAL_POSITION_INT",-- Fused position, lat/lon in degrees * 1e7
    [147] = "BATTERY_STATUS",    -- Battery health details
}

-- Main dissector function
function mavlink_proto.dissector(buffer, pinfo, tree)
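    -- Sanity check: a MAVLink v2 frame has a 10-byte header plus a
    -- 2-byte checksum, so anything under 12 bytes cannot be valid
    -- (the msgid read at offset 7 below needs at least 10 bytes)
    if buffer:len() < 12 then return end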

    pinfo.cols.protocol = "MAVLink"

    local magic = buffer(0,1):uint()

    if magic == 0xFD then
        -- MAVLink 2.0 format
        local payload_len = buffer(1,1):uint()
        local seq = buffer(4,1):uint()
        local sysid = buffer(5,1):uint()
        local compid = buffer(6,1):uint()
        local msgid = buffer(7,3):le_uint()

        -- Build the tree display
        local subtree = tree:add(mavlink_proto, buffer(), 
            string.format("MAVLink v2.0 - %s", msg_names[msgid] or "Unknown"))

        subtree:add(f.magic, buffer(0,1))
        subtree:add(f.length, buffer(1,1))
        subtree:add(f.seq, buffer(4,1))
        subtree:add(f.sysid, buffer(5,1)):append_text(" (Vehicle ID)")
        subtree:add(f.compid, buffer(6,1)):append_text(" (Autopilot)")
        -- msgid is little-endian on the wire, so use add_le to display it correctly
        subtree:add_le(f.msgid, buffer(7,3)):append_text(" (" .. (msg_names[msgid] or "MSG_" .. msgid) .. ")")

        -- Info column shows the message type
        pinfo.cols.info = msg_names[msgid] or "Message " .. msgid
    end
end

-- Register with UDP port table
local udp_dissector_table = DissectorTable.get("udp.port")
udp_dissector_table:add(14540, mavlink_proto) -- Vehicle 1 telemetry
udp_dissector_table:add(14541, mavlink_proto) -- Vehicle 2 telemetry
udp_dissector_table:add(14542, mavlink_proto) -- Vehicle 3 telemetry

Installation:

mkdir -p ~/.local/lib/wireshark/plugins
cp mavlink_dissector.lua ~/.local/lib/wireshark/plugins/

# Verify
tshark -G protocols | grep mavlink
# Output: MAVLink Autopilot Protocol    MAVLINK    mavlink    T    T    T

Now when you capture packets, Wireshark understands the protocol structure and displays meaningful information instead of raw hex.
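To cross-check the byte offsets the dissector relies on, here is the same MAVLink v2 header layout parsed in Python. It is a sketch for verification only: the `MavlinkV2Header` and `parse_v2_header` names are my own, and it reads header fields without validating the checksum or handling signed frames.

```python
import struct
from typing import NamedTuple, Optional

MAVLINK_V2_MAGIC = 0xFD
HEADER_LEN = 10    # magic, len, two flag bytes, seq, sysid, compid, 3-byte msgid
CHECKSUM_LEN = 2


class MavlinkV2Header(NamedTuple):
    payload_len: int
    incompat_flags: int
    compat_flags: int
    seq: int
    sysid: int
    compid: int
    msgid: int


def parse_v2_header(datagram: bytes) -> Optional[MavlinkV2Header]:
    """Return the header fields, or None if this is not a v2 frame."""
    if len(datagram) < HEADER_LEN + CHECKSUM_LEN:
        return None
    if datagram[0] != MAVLINK_V2_MAGIC:
        return None
    # Bytes 1..6 are single-byte fields in wire order
    payload_len, incompat, compat, seq, sysid, compid = struct.unpack_from(
        "<6B", datagram, 1
    )
    # Bytes 7..9 hold the 24-bit message ID, little-endian
    msgid = int.from_bytes(datagram[7:10], "little")
    return MavlinkV2Header(payload_len, incompat, compat, seq, sysid, compid, msgid)


# Hand-built ATTITUDE frame (msgid 30) with a zeroed payload and checksum
frame = bytes([0xFD, 28, 0, 0, 42, 1, 1, 30, 0, 0]) + bytes(28) + bytes(2)
header = parse_v2_header(frame)
print(header)
```

Bytes 2 and 3 are the incompatibility and compatibility flags, which is why the sequence number sits at offset 4 rather than 2.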


Streaming Captures from Remote Servers

Here's a practical challenge: your vehicles are running in the cloud (EC2, Oracle Cloud, etc.), but you want to analyze telemetry on your local machine. Copying capture files is too slow for real-time analysis.

SSH Streaming Pipeline

#!/bin/bash
# stream_telemetry.sh
# Real-time packet capture from remote autopilot server

REMOTE_HOST="autonomous-sim.example.com"
SSH_KEY="$HOME/.ssh/autopilot-key"
SSH_USER="ubuntu"

echo "Streaming telemetry from $REMOTE_HOST..."

# Use named pipe for streaming
mkfifo /tmp/mavlink_stream 2>/dev/null || true

# Register cleanup first: the ssh command below blocks, so a trap set
# after it would be skipped if the script is interrupted (Ctrl-C, timeout)
trap 'kill "$WIRESHARK_PID" 2>/dev/null; rm -f /tmp/mavlink_stream' EXIT

# Start Wireshark listening on the pipe
wireshark -k -i /tmp/mavlink_stream &
WIRESHARK_PID=$!

sleep 2

# Stream tcpdump from remote server through the pipe
ssh -i "$SSH_KEY" "$SSH_USER@$REMOTE_HOST" \
    "sudo tcpdump -s 0 -U -n -w - -i eth0 'udp port 14540 or udp port 14541 or udp port 14542'" \
    > /tmp/mavlink_stream

Why this approach?

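  • -s 0: Capture full packets (no truncation)
  • -U: Packet-buffered output; each packet is flushed as soon as it is captured, which keeps the stream live
  • -w -: Write the capture in pcap format to stdout instead of a file
  • Named pipes: A FIFO makes the remote stream look like a local capture source to Wireshark

Timestamps are assigned by tcpdump on the remote host at capture time, so the SSH hop does not add delay to them. What you see is packet timing as observed at the server's interface; measuring WAN delay would require a second capture point or synchronized clocks.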
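If the pipe seems to deliver nothing, it helps to confirm that what arrives is really a pcap stream. The sketch below parses the 24-byte global header that `tcpdump -w -` writes before the first packet. The function name is hypothetical, and it handles only the classic microsecond-resolution magic number, not the nanosecond variant or pcapng.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4          # classic pcap, microsecond timestamps
PCAP_MAGIC_SWAPPED = 0xD4C3B2A1  # same file written on an opposite-endian host


def parse_pcap_global_header(raw: bytes) -> dict:
    """Decode the fixed 24-byte header at the start of a pcap stream."""
    if len(raw) < 24:
        raise ValueError("need at least 24 bytes for the pcap global header")
    magic = struct.unpack_from("<I", raw)[0]
    if magic == PCAP_MAGIC:
        endian, order = "<", "little"
    elif magic == PCAP_MAGIC_SWAPPED:
        endian, order = ">", "big"
    else:
        raise ValueError(f"not a classic pcap stream (magic 0x{magic:08X})")
    # magic, version major/minor, timezone offset, sigfigs, snaplen, link type
    _, major, minor, _tz, _sigfigs, snaplen, linktype = struct.unpack_from(
        endian + "IHHiIII", raw
    )
    return {
        "byte_order": order,
        "version": (major, minor),
        "snaplen": snaplen,    # maximum bytes stored per packet
        "linktype": linktype,  # 1 means Ethernet
    }


# Synthetic header for illustration: version 2.4, snaplen 262144, Ethernet
example = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 262144, 1)
print(parse_pcap_global_header(example))
```

In practice you would read the first 24 bytes from a saved capture file rather than from the pipe Wireshark is consuming, since a FIFO can only be drained once.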


Real-Time Metrics Extraction: Python Analysis Engine

Once you're capturing packets, the next step is extracting meaningful metrics. Here's an analyzer that does it:

#!/usr/bin/env python3
"""
mavlink_analyzer.py - Extract network metrics from telemetry captures
Measures inter-packet timing (mean interval, jitter) and throughput for autonomous systems
"""

import subprocess
import pandas as pd
import numpy as np
from pathlib import Path
from dataclasses import dataclass
import json

@dataclass
class VehicleMetrics:
    """Container for per-vehicle network characteristics"""
    system_id: int
    packet_count: int
    duration_sec: float
    throughput_kbps: float
    latency_ms: float
    jitter_ms: float
    quality_score: float

class TelemetryAnalyzer:
    def __init__(self, pcapng_file: str):
        self.pcapng = pcapng_file
        self.msg_types = {
            0: "HEARTBEAT", 24: "GPS_RAW_INT", 30: "ATTITUDE",
            33: "GLOBAL_POSITION_INT", 147: "BATTERY_STATUS",
        }

    def extract_packets(self) -> pd.DataFrame:
        """Parse capture file with tshark"""
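        # Pass an argument list instead of a shell string, so unusual
        # file names cannot be interpreted by the shell
        cmd = [
            "tshark", "-r", self.pcapng, "-Y", "mavlink", "-T", "fields",
            "-e", "frame.time_relative", "-e", "frame.time_delta",
            "-e", "mavlink.sysid", "-e", "mavlink.msgid", "-e", "frame.len",
        ]

        result = subprocess.run(cmd, capture_output=True, text=True)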

        lines = result.stdout.strip().split('\n')
        packets = []

        for line in lines:
            try:
                parts = line.split('\t')
                packets.append({
                    'time': float(parts[0]),
                    'delta': float(parts[1]),  # time since previous captured frame (any vehicle)
                    'vehicle_id': int(parts[2]),
                    'msg_type': int(parts[3]),
                    'bytes': int(parts[4]),
                })
            except (ValueError, IndexError):
                continue

        return pd.DataFrame(packets)

    def compute_metrics(self, df: pd.DataFrame) -> dict:
        """Analyze network characteristics"""
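        metrics = {}
        if df.empty:
            return metrics  # no MAVLink packets were decoded

        for vehicle_id in df['vehicle_id'].unique():
            v_data = df[df['vehicle_id'] == vehicle_id]

            # Timing analysis. frame.time_delta is measured against the
            # previous frame from ANY vehicle, so derive per-vehicle gaps
            # from this vehicle's own timestamps instead
            deltas_ms = v_data['time'].diff().dropna().values * 1000
            duration = v_data['time'].iloc[-1] - v_data['time'].iloc[0]
            if len(deltas_ms) == 0 or duration <= 0:
                continue  # need at least two packets spread over time

            # Throughput
            total_bytes = v_data['bytes'].sum()
            throughput_kbps = (total_bytes * 8) / duration / 1000

            # "Latency" here is the mean inter-packet interval and jitter is
            # its standard deviation: proxies for delivery regularity, not a
            # true one-way delay
            latency_mean = np.mean(deltas_ms)
            jitter_stdev = np.std(deltas_ms)

            # Quality score: lower latency and jitter = higher score
            quality = max(0, 100 - (latency_mean * 2) - (jitter_stdev * 5))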

            metrics[vehicle_id] = VehicleMetrics(
                system_id=vehicle_id,
                packet_count=len(v_data),
                duration_sec=duration,
                throughput_kbps=throughput_kbps,
                latency_ms=latency_mean,
                jitter_ms=jitter_stdev,
                quality_score=quality,
            )

        return metrics

    def generate_report(self) -> dict:
        """Full analysis pipeline"""
        df = self.extract_packets()
        metrics = self.compute_metrics(df)

        return {
            'timestamp': str(pd.Timestamp.now()),
            'file': self.pcapng,
            'vehicles': {
                f"vehicle_{m.system_id}": {
                    'packets': m.packet_count,
                    'duration_sec': m.duration_sec,
                    'throughput_kbps': round(m.throughput_kbps, 2),
                    'latency_ms': round(m.latency_ms, 3),
                    'jitter_ms': round(m.jitter_ms, 3),
                    'quality': round(m.quality_score, 1),
                }
                for m in metrics.values()
            }
        }

if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print("Usage: python3 mavlink_analyzer.py <capture.pcapng>")
        sys.exit(1)

    analyzer = TelemetryAnalyzer(sys.argv[1])
    report = analyzer.generate_report()

    # Print formatted report
    print("\n" + "="*70)
    print("AUTONOMOUS VEHICLE TELEMETRY ANALYSIS")
    print("="*70)

    for vehicle_name, metrics in report['vehicles'].items():
        print(f"\n{vehicle_name.upper()}:")
        print(f"  Packets: {metrics['packets']}")
        print(f"  Throughput: {metrics['throughput_kbps']} Kbps")
        print(f"  Latency: {metrics['latency_ms']} ms")
        print(f"  Jitter: {metrics['jitter_ms']} ms")
        print(f"  Quality: {metrics['quality']}/100")

    # Export JSON for dashboards
    with open('metrics_report.json', 'w') as f:
        json.dump(report, f, indent=2)

Usage:

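# Capture 60 seconds to a file. stream_telemetry.sh feeds Wireshark through
# the named pipe and writes no capture data to stdout, so capture directly.
# (tcpdump writes classic pcap; tshark detects the format from the file
# contents, not the extension.)
timeout 60 ssh -i ~/.ssh/autopilot-key ubuntu@autonomous-sim.example.com \
    "sudo tcpdump -s 0 -U -n -w - -i eth0 'udp port 14540 or udp port 14541 or udp port 14542'" \
    > telemetry.pcapng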

python3 mavlink_analyzer.py telemetry.pcapng

# Output:
# ======================================================================
# AUTONOMOUS VEHICLE TELEMETRY ANALYSIS
# ======================================================================
# 
# VEHICLE_1:
#   Packets: 4520
#   Throughput: 45.32 Kbps
#   Latency: 15.234 ms
#   Jitter: 2.104 ms
#   Quality: 59.0/100

Why These Metrics Matter

Let me translate what you're actually measuring:

| Metric | What It Reveals | Industry Standard |
|---|---|---|
| Throughput | How much data your vehicles send. Baseline for network capacity planning | 20-60 Kbps per vehicle |
| Latency | Average delay between vehicles and ground station; the analyzer approximates it with the mean inter-packet interval. Critical for command-response loops | < 50 ms for LTE, < 5 ms for LAN |
| Jitter | Variation in packet timing (standard deviation of the inter-packet interval). High jitter breaks control loops more than high latency | < 5 ms for autonomous systems |
| Quality Score | Composite metric I use to flag degradation and trigger alerts | 90+ = Excellent |

In aerospace, these aren't abstract numbers. A jitter spike from 2ms to 20ms might mean a loss-of-link event is imminent. Network degradation often precedes system failures by minutes.
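Drops are the other signal worth tracking, and the MAVLink sequence number makes them easy to estimate. The sketch below counts gaps in the 8-bit sequence counter. It assumes packets arrive in order and that the sequence values come from a single sender (one system and component ID); a reordered packet would be miscounted as a large gap. The `estimate_loss` helper is hypothetical, and the values could be exported by adding `-e mavlink.seq` to the tshark call, since the dissector defines that field.

```python
from typing import Iterable


def estimate_loss(seqs: Iterable[int]) -> dict:
    """Estimate dropped packets from gaps in an 8-bit sequence counter."""
    received = 0
    lost = 0
    prev = None
    for seq in seqs:
        received += 1
        if prev is not None:
            # Distance modulo 256; 1 means the packets were consecutive
            gap = (seq - prev) % 256
            if gap > 1:
                lost += gap - 1
        prev = seq
    expected = received + lost
    loss_pct = 100.0 * lost / expected if expected else 0.0
    return {"received": received, "lost": lost, "loss_pct": loss_pct}


# Sequence wraps from 255 to 0, then packets 2 and 3 go missing
print(estimate_loss([253, 254, 255, 0, 1, 4]))
```

Because the counter wraps every 256 packets, a burst that drops 256 or more consecutive packets is invisible to this method, so treat the result as a lower bound.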


Deployment Architecture

For production systems monitoring real vehicles, you typically want:

┌─────────────────────────────────────────────────────────┐
│ Vehicles (Cloud EC2/Oracle)                             │
│  - PX4 Autopilot (UDP 14540-14542)                      │
│  - tcpdump capture service                              │
└────────────────┬────────────────────────────────────────┘
                 │ SSH tunnel (encrypted)
                 ↓
┌─────────────────────────────────────────────────────────┐
│ Analysis Engine (Your Machine / CI/CD)                  │
│  - tshark (real-time packet analysis)                   │
│  - Python analyzer (60-sec batches)                     │
│  - Flask API server (metrics export)                    │
└────────────────┬────────────────────────────────────────┘
                 │ REST API
                 ↓
┌─────────────────────────────────────────────────────────┐
│ Web Dashboard (JavaScript)                              │
│  - Real-time charts (Chart.js)                          │
│  - Historical trend analysis                            │
│  - Anomaly alerts                                       │
└─────────────────────────────────────────────────────────┘

This architecture decouples capture (cloud), analysis (local), and visualization (browser), which means you can scale each component independently.
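The "metrics export" box can be very small. The sketch below serves the analyzer's `metrics_report.json` over HTTP using Python's standard-library `http.server` as a stand-in for Flask, so it runs without extra dependencies. The `/metrics` path, the port, and the function names are assumptions of mine, not part of the original setup.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from typing import Tuple

REPORT_PATH = Path("metrics_report.json")  # file written by mavlink_analyzer.py


def build_response(path: str, report_file: Path = REPORT_PATH) -> Tuple[int, bytes]:
    """Map a request path to an HTTP status code and a JSON body."""
    if path != "/metrics":
        return 404, json.dumps({"error": "not found"}).encode()
    if not report_file.exists():
        return 503, json.dumps({"error": "no report yet"}).encode()
    return 200, report_file.read_bytes()


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving the latest report on localhost only."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; the dashboard can then poll `http://127.0.0.1:8000/metrics` each time the analyzer finishes a batch. Keeping the routing logic in `build_response` makes it testable without opening a socket.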


Key Takeaways for Aerospace/Autonomous Systems

  1. UDP is the right protocol for telemetry, but you must understand its limitations (no retransmission = occasional drops are normal)
  2. Network monitoring isn't optional—it's as critical as flight control. Aerospace companies spend as much time on ground station software as autopilot firmware
  3. Protocol dissectors are non-negotiable for understanding binary formats like MAVLink that Wireshark doesn't decode out of the box. Wireshark + Lua gives you that visibility
  4. Python as a scripting layer between low-level tools (tcpdump) and high-level dashboards (JavaScript) is a proven pattern in aerospace
  5. Real-time metrics (latency, jitter, throughput) are better predictors of system health than log files. They surface problems before they become failures

Next Steps

The code in this article is a working starting point. To go further:

  • Add alerting: Trigger emails/Slack when jitter exceeds thresholds
  • Implement rate limiting: Adapt data transmission when network degrades
  • Build CI/CD integration: Capture telemetry from every test run, compare against baselines
  • Extend for other protocols: TCP, RTP, custom serialization formats—the same architecture applies
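As a starting point for the alerting item, here is a minimal sketch that scans the report structure produced by `generate_report()` and returns a message for each breach. The `find_alerts` name is hypothetical, and the default thresholds (5 ms jitter, quality below 90) are simply borrowed from the targets table above; tune them to your own links. Delivery by email or Slack is left out.

```python
from typing import List


def find_alerts(
    report: dict,
    jitter_limit_ms: float = 5.0,
    quality_floor: float = 90.0,
) -> List[str]:
    """Return one human-readable message per threshold breach."""
    alerts: List[str] = []
    for name, m in sorted(report.get("vehicles", {}).items()):
        if m["jitter_ms"] > jitter_limit_ms:
            alerts.append(
                f"{name}: jitter {m['jitter_ms']} ms exceeds {jitter_limit_ms} ms"
            )
        if m["quality"] < quality_floor:
            alerts.append(
                f"{name}: quality {m['quality']} below {quality_floor}"
            )
    return alerts


# Illustrative report with made-up values in the analyzer's output shape
sample_report = {
    "vehicles": {
        "vehicle_1": {"jitter_ms": 2.1, "quality": 95.0},
        "vehicle_2": {"jitter_ms": 20.0, "quality": 40.0},
    }
}
for message in find_alerts(sample_report):
    print(message)
```

The same function works for the CI/CD idea: run it against each test run's report and fail the build when the list is non-empty.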



Have you built monitoring systems for distributed aerospace applications? Share your approach in the comments—I'd love to hear about edge cases you've encountered.
