Debjit Dey

Tracking Chaos: Building a Real-Time Flight Anomaly Engine with Django, Celery, and Machine Learning

Imagine walking outside on a quiet afternoon. You hear a sharp roar overhead, pull out your phone, and open a flight-tracking app. You find a tiny airplane icon ✈️ gliding smoothly along a solid line. It looks clean, structured, and completely predictable.

But what if that airplane icon suddenly starts making frantic, tight loops over a residential area? What if it begins a terrifyingly rapid descent, or behaves in a way that defies normal flight paths?

If a human air traffic controller isn’t watching that specific screen, how can software automatically flag that something is wrong?

When I set out to build SkyWatch Live, an open-source airspace and satellite tracking dashboard, my focus quickly shifted from simply drawing dots on a map to a much more interesting engineering problem: How do you build a real-time machine learning pipeline that can detect unusual airborne behavior across thousands of concurrent flights using chaotic public data?

Repo: https://github.com/debjit450/skywatch-live

Whether you are an AI engineer or someone who has never looked at aviation data in your life, the architecture behind processing volatile, high-frequency telemetry into clean, explainable alerts contains lessons that apply to any real-time streaming system.


The Operational Blueprint

Before diving into the math, it helps to see the pipeline that keeps everything alive. The application runs a split architecture:

  • The UI Layer: A React 19 and TanStack Start dashboard using MapLibre GL and deck.gl to paint real-time positions and historical playback tracks smoothly at 60 frames per second.
  • The Ingestion & Inference Layer: A Django ASGI core backed by Celery background workers, a fast Redis state cache, and a durable PostgreSQL time-series database.

Every 15 seconds, a scheduled Celery Beat task pulls raw telemetry vectors from multiple fragmented public radio networks. It de-duplicates them, caches the latest snapshot in Redis, and broadcasts them down a live WebSocket channel directly to the map.
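
The de-duplication step in that loop can be sketched in a few lines. This is a minimal illustration rather than the project's actual code: it assumes each report is a dict with hypothetical keys `icao24` (the airframe identifier) and `last_contact` (a Unix timestamp), and it keeps only the freshest report per aircraft when several networks see the same plane.

```python
from typing import Iterable


def dedupe_latest(records: Iterable[dict]) -> list[dict]:
    """Keep only the most recent report per aircraft.

    Assumes hypothetical keys: "icao24" (aircraft id) and
    "last_contact" (Unix seconds). Reports missing either are skipped.
    """
    latest: dict[str, dict] = {}
    for rec in records:
        key = rec.get("icao24")
        ts = rec.get("last_contact")
        if key is None or ts is None:
            continue  # malformed report: ignore it rather than guess
        current = latest.get(key)
        if current is None or ts > current["last_contact"]:
            latest[key] = rec
    return list(latest.values())
```

The resulting list is what would then be cached as the latest snapshot and pushed to the map.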

But as soon as that data hits the database, a post-commit hook hands the raw flight data over to our dedicated machine learning pipeline.


The 3-Gate Anomaly Pipeline

If you feed raw, noisy public data straight into a complex neural network, your system will instantly drown in false positives. External sensors glitch, transponders experience signal dropouts, and rate limits throttle incoming coordinates.

To prevent false alarms, SkyWatch Live runs data through three computational gates.

Gate 1: Deterministic Physical Rules

The first line of defense doesn't use AI at all. It uses fast, binary checks written in raw Python to catch immediate high-signal events:

  • Emergency Squawk Codes: Radio transponders broadcasting specific numbers like 7700 (general emergency) or 7600 (radio communication failure).
  • Kinematic Violations: Physical impossibilities, such as a non-military cargo plane suddenly executing a 90-degree turn mid-air or dropping altitude faster than its structural limits allow.
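
Here is a rough sketch of what such rule checks might look like. The `State` fields, the function names, and both rate limits are assumptions chosen for illustration; real limits depend on the aircraft type. The 7500 code (unlawful interference) is a standard emergency squawk added alongside the two mentioned above.

```python
from dataclasses import dataclass

# Standard transponder emergency codes.
EMERGENCY_SQUAWKS = {
    "7500": "unlawful interference",
    "7600": "radio communication failure",
    "7700": "general emergency",
}

# Illustrative limits only (assumptions, not values from the project).
MAX_TURN_RATE_DEG_S = 10.0
MAX_DESCENT_RATE_M_S = 50.0


@dataclass
class State:
    t: float          # Unix seconds
    heading: float    # degrees, 0-360
    altitude: float   # metres
    squawk: str = ""


def heading_delta(a: float, b: float) -> float:
    """Smallest absolute angle between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def rule_flags(prev: State, cur: State) -> list[str]:
    """Return the list of rule violations between two consecutive states."""
    flags = []
    if cur.squawk in EMERGENCY_SQUAWKS:
        flags.append(f"squawk {cur.squawk}: {EMERGENCY_SQUAWKS[cur.squawk]}")
    dt = cur.t - prev.t
    if dt <= 0:
        return flags  # out-of-order or duplicate sample: skip kinematics
    if heading_delta(prev.heading, cur.heading) / dt > MAX_TURN_RATE_DEG_S:
        flags.append("turn rate exceeds limit")
    if (prev.altitude - cur.altitude) / dt > MAX_DESCENT_RATE_M_S:
        flags.append("descent rate exceeds limit")
    return flags
```

Note that `heading_delta` handles the wrap from 359° to 0°, so a plane turning from 350° to 10° counts as a 20° turn, not a 340° one.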

Gate 2: Feature Engineering & Spatial Grids

If a flight path passes basic physics checks, the pipeline extracts hidden behavioral features out of raw coordinate sequences (latitude, longitude, altitude, heading, velocity). This happens inside backend/ml/features.py:

  • Spatial Indexing: The engine dynamically hashes coordinate points into a geometric grid to calculate local proximity metrics (spotting close-proximity events or dense airspace deviations).
  • Angular Velocity Tracking: By computing statistical variance across rolling windows of an aircraft's heading history, the system converts raw directional degrees into a "loitering score" that exposes circling or tracking patterns.
  • The Behavioral Baseline: The telemetry is cross-referenced with a historical AircraftProfile table, checking whether the current aircraft type is operating outside its typical operational envelope.
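
Two of these features are easy to sketch. The code below is not taken from backend/ml/features.py; the function names, the 0.5° cell size, and the window length are assumptions. It also swaps plain variance for circular variance, because ordinary variance treats headings of 359° and 1° as far apart when they are nearly identical.

```python
import math
from collections import deque


def grid_cell(lat: float, lon: float, cell_deg: float = 0.5) -> tuple[int, int]:
    """Hash a coordinate into an integer grid cell of cell_deg degrees."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def circular_variance(headings_deg) -> float:
    """0.0 for a constant heading, near 1.0 when headings cover the compass."""
    n = len(headings_deg)
    if n == 0:
        return 0.0
    s = sum(math.sin(math.radians(h)) for h in headings_deg)
    c = sum(math.cos(math.radians(h)) for h in headings_deg)
    return 1.0 - math.hypot(s, c) / n


class LoiterTracker:
    """Rolling window of recent headings for one aircraft."""

    def __init__(self, window: int = 24):
        self.headings = deque(maxlen=window)

    def update(self, heading_deg: float) -> float:
        """Add a heading and return the current loitering score."""
        self.headings.append(heading_deg)
        return circular_variance(self.headings)
```

Aircraft that share a `grid_cell` key (or sit in adjacent cells) are the only candidates that need a precise distance check, which keeps proximity lookups cheap. A plane flying straight keeps its loitering score near zero, while one that completes a full circle inside the window pushes it toward one.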

Gate 3: The Statistical Machine Learning Ensemble

Once the features are compiled into a normalized vector, they are scored by a three-model ensemble powered by scikit-learn. Blending different algorithmic approaches balances out the blind spots of any single model:

  1. Isolation Forest: A tree-based model that isolates anomalies by randomly partitioning features. Because anomalies take fewer splits to isolate than normal points, they end up on short paths close to the root of each tree. It is well suited to spotting global outliers.
  2. Local Outlier Factor (LOF): A density-based algorithm that measures how locally isolated a data point is relative to its surrounding neighborhood. This catches contextual anomalies, such as a plane flying at a speed that is normal globally but highly irregular for that specific crowded corridor.
  3. MLP Autoencoder: A neural network that compresses the multi-dimensional feature vector through a narrow bottleneck layer and then tries to reconstruct the original on the other side.
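
A compact sketch of such an ensemble is shown below. It is one plausible way to wire the three models together with scikit-learn, not the project's implementation: the class name, the hyperparameters, and the choice of `MLPRegressor` trained to reproduce its own input as the autoencoder are all assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


class AnomalyEnsemble:
    """Three detectors sharing one scaler; hyperparameters are illustrative."""

    def __init__(self, bottleneck: int = 2, seed: int = 0):
        self.scaler = StandardScaler()
        self.iforest = IsolationForest(n_estimators=100, random_state=seed)
        # novelty=True lets LOF score unseen samples after fit().
        self.lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
        # An MLP trained to reproduce its input acts as a small autoencoder.
        self.autoencoder = MLPRegressor(
            hidden_layer_sizes=(8, bottleneck, 8), max_iter=500, random_state=seed
        )

    def fit(self, X: np.ndarray) -> "AnomalyEnsemble":
        Xs = self.scaler.fit_transform(X)
        self.iforest.fit(Xs)
        self.lof.fit(Xs)
        self.autoencoder.fit(Xs, Xs)
        return self

    def scores(self, X: np.ndarray) -> dict[str, np.ndarray]:
        """Per-model scores where higher means more anomalous."""
        Xs = self.scaler.transform(X)
        recon = self.autoencoder.predict(Xs)
        return {
            # score_samples returns higher = more normal, so negate it.
            "iforest": -self.iforest.score_samples(Xs),
            "lof": -self.lof.score_samples(Xs),
            "autoencoder": np.mean((Xs - recon) ** 2, axis=1),
        }
```

The three scores live on different scales, so they would need to be normalized (for example, converted to ranks or percentiles against the training set) before being blended into a single confidence value.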

The autoencoder flags structural anomalies by computing the mean squared error (MSE) E between the original feature vector x and the reconstructed output x̂ across n dimensions:

$$E = \frac{1}{n} \sum_{i=1}^{n} \left(x_i - \hat{x}_i\right)^2$$

If the structural configuration of a flight path is weird, the autoencoder fails to reconstruct it accurately, causing the error E to spike beyond a dynamic, self-calibrating threshold.
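
The article does not spell out how the threshold calibrates itself, so the sketch below shows one common approach as an assumption: compute the reconstruction error E as the mean of squared per-dimension differences, then flag it when it exceeds the mean plus k standard deviations of recently observed normal errors. The class name, window size, k, and warm-up length are all hypothetical.

```python
import statistics
from collections import deque


def reconstruction_error(x, x_hat) -> float:
    """Mean squared error between a feature vector and its reconstruction."""
    if len(x) != len(x_hat) or not x:
        raise ValueError("vectors must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


class RollingThreshold:
    """Flags errors above mean + k * stdev of recent non-flagged errors."""

    def __init__(self, window: int = 500, k: float = 3.0, warmup: int = 30):
        self.errors = deque(maxlen=window)
        self.k = k
        self.warmup = warmup

    def is_anomalous(self, error: float) -> bool:
        if len(self.errors) < self.warmup:
            self.errors.append(error)
            return False  # not enough history to judge yet
        mean = statistics.fmean(self.errors)
        limit = mean + self.k * statistics.pstdev(self.errors)
        flagged = error > limit
        if not flagged:
            # Only normal errors recalibrate the baseline, so a burst of
            # anomalies cannot drag the threshold upward.
            self.errors.append(error)
        return flagged
```

Excluding flagged errors from the window is a design choice: it keeps the baseline stable, at the cost of adapting more slowly if the underlying traffic pattern really does change.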


Going Deeper: Time-Series Sequence Analysis

While spatial snapshots catch immediate deviations, flight data is fundamentally a time-dependent sequence. To track subtle, slow-building irregularities over time, SkyWatch Live features an optional deep learning path using a Long Short-Term Memory (LSTM) network inside backend/ml/lstm.py.

[State at t-3] ──► [State at t-2] ──► [State at t-1] ──► [Current State t]
                                                               │
                                                               ▼
                                                    Deep Sequence Inference
                                                               │
                                                               ▼
                                                    Trajectory Anomaly Score
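
The diagram above amounts to feeding fixed-length windows of consecutive states to the sequence model. A minimal sketch of that windowing step follows, assuming a window of four states to match the diagram; the function name is hypothetical and the states can be any per-timestep feature vectors.

```python
def sliding_windows(states, length=4):
    """Return every run of `length` consecutive states, oldest first."""
    if length <= 0:
        raise ValueError("length must be positive")
    # A track shorter than `length` yields no windows at all.
    return [states[i:i + length] for i in range(len(states) - length + 1)]
```

Each window becomes one training or inference sample, so a track with fewer points than the window length simply produces nothing to score yet.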


Engineering for Accessibility: TensorFlow and Keras are intentionally excluded from the project's default dependency file. This ensures open-source contributors can download, run, and modify the UI or standard ingestion loops without requiring massive deep-learning runtimes or expensive GPUs. The LSTM modules load conditionally, initializing only when a user explicitly activates them via python manage.py train_lstm_anomaly.
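
One way to implement this kind of conditional loading is to check for the package before importing it. The helper names below are hypothetical and the sketch is not taken from the repository; it only uses the standard library's `importlib`.

```python
import importlib
import importlib.util


def load_optional(module_name: str):
    """Import and return a top-level module if installed, else None."""
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)


def require_lstm_backend():
    """Return the TensorFlow module, or fail with a clear message."""
    tf = load_optional("tensorflow")
    if tf is None:
        raise RuntimeError(
            "The LSTM path needs TensorFlow, which is not part of the "
            "default dependencies. Install it to enable this feature."
        )
    return tf
```

Calling `require_lstm_backend()` only from the training command and the LSTM inference path means every other part of the application starts without ever touching the heavy dependency.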


Explainability: Busting the "Black Box"

An anomaly alert is completely useless if a user doesn't know why it went off. If the map flashes red, an operator needs to know what triggered the alarm instantly.

To fix this, our explainability module breaks the ensemble's scores down into an explicit, human-readable payload. When the UI hits /api/v1/anomalies/<id>/explanation/, it receives a clean breakdown:

{
  "anomaly_id": "8f3b2a-7c",
  "detector_type": "Ensemble_LOF",
  "severity": "CRITICAL",
  "confidence_score": 0.91,
  "explanation": "Triggered due to an 87% deviation in rolling heading variance (circling behavior) paired with atypical low-velocity thresholds for this airframe profile."
}
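
A payload like this could be assembled from per-feature deviations. The sketch below is an assumption about how that might work, not the project's explainability module: it takes hypothetical z-scores for each feature, picks the largest ones, and turns them into a sentence. The severity cut-offs are also illustrative.

```python
def explain(anomaly_id: str, detector: str, score: float,
            z_scores: dict[str, float], top_k: int = 2) -> dict:
    """Build a plain-text explanation from per-feature z-scores."""
    ranked = sorted(z_scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    parts = [
        f"{name} is {abs(z):.1f} standard deviations "
        f"{'above' if z > 0 else 'below'} baseline"
        for name, z in ranked[:top_k]
    ]
    if parts:
        text = "Triggered because " + " and ".join(parts) + "."
    else:
        text = "No dominant feature identified."
    # Illustrative cut-offs (assumptions, not the project's values).
    severity = "CRITICAL" if score >= 0.9 else "WARNING" if score >= 0.7 else "INFO"
    return {
        "anomaly_id": anomaly_id,
        "detector_type": detector,
        "severity": severity,
        "confidence_score": round(score, 2),
        "explanation": text,
    }
```

Ranking by absolute deviation keeps the message short: the operator sees the two features that contributed most instead of the full feature vector.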


The frontend reads this payload to display clear warnings alongside the live flight profile. Users can even submit structured feedback to mark a detection as a false alarm, creating a clean, labeled dataset that our automated Celery jobs use to retrain the models every week.


Building Systems That Expect Uncertainty

SkyWatch Live shows that you can build highly performant, intelligent monitoring tools out of raw public data if you architect your pipeline to expect imperfections. By separating your fast live-state caches from your analytical data stores and guarding your machine learning models with strict validation layers, you build a system that tells the truth about chaotic real-world inputs.

If you have a laptop, a curious mindset, and want to dig into the background tasks, training engines, or the mapping layer, the repository is open source and ready for setup. Hop into the codebase, explore how the real-time data flows, and let me know your thoughts!

👉 GitHub Repository: debjit450/skywatch-live

