Who this is for: platform/data engineers working on shipment visibility, cold-chain monitoring, alerting and dashboards.
What you’ll get: a reproducible way (GPT12-X) to synthesize telemetry, inject incidents, publish to MQTT or save as NDJSON/CSV, so you can validate alert rules and dashboards quickly and safely.
Why simulate?
Real data is expensive, slow to obtain and hard to control. Synthetic data lets you:
- Reproduce delays, route deviations, temperature breaches, door events and more on demand
- Regression-test alert and dashboard changes in minutes
- Cover edge cases (GPS jamming/drift, sudden battery drop, humidity spikes)
- Exercise the whole pipeline: MQTT → streaming jobs → alerts/dashboards, or NDJSON/CSV → lakehouse/BI
Incident types we simulate
- DELAY — prolonged standstill (warehouse/traffic)
- ROUTE_DEVIATION — geofence/corridor deviation
- TEMP_EXCURSION — cold-chain breach
- SHOCK — handling impact
- DOOR_OPEN — unauthorized door event
- BATTERY_DRAIN — abnormal battery drop
- GPS_JAMMING — missing/invalid GNSS fix
- HUMIDITY_ANOMALY — humidity out of range
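To catch typos in event labels (for example in hand-written alert rules or when loading NDJSON), it can help to keep the eight labels above as a closed TypeScript union. This is a minimal sketch; `INCIDENT_TYPES`, `isIncidentType` and `unknownEvents` are hypothetical names, not identifiers taken from the GPT12-X script.

```typescript
// The eight incident labels listed above, as a closed set (names are illustrative).
const INCIDENT_TYPES = [
  "DELAY",
  "ROUTE_DEVIATION",
  "TEMP_EXCURSION",
  "SHOCK",
  "DOOR_OPEN",
  "BATTERY_DRAIN",
  "GPS_JAMMING",
  "HUMIDITY_ANOMALY",
] as const;

type IncidentType = (typeof INCIDENT_TYPES)[number];

// Type guard: narrows an arbitrary string (e.g. from a parsed record) to IncidentType.
function isIncidentType(value: string): value is IncidentType {
  return (INCIDENT_TYPES as readonly string[]).includes(value);
}

// Returns the labels in an `events` array that are not known incident types.
function unknownEvents(events: string[]): string[] {
  return events.filter((e) => !isIncidentType(e));
}

// unknownEvents(["DELAY", "TEMP_EXCURSION", "FIRE"]) returns ["FIRE"]
```

The check is case-sensitive on purpose, so `"gps_jamming"` is reported as unknown rather than silently accepted.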
Unified data model
```typescript
interface TelemetryRecord {
  device_id: string;
  shipment_id: string;
  ts: string;             // ISO 8601 timestamp
  lat: number | null;
  lon: number | null;
  speed_kph: number | null;
  temp_c: number | null;
  humidity: number | null;
  shock_g: number | null;
  door_open: boolean | null;
  battery_pct: number | null;
  events: string[];       // e.g. ["TEMP_EXCURSION"]
  meta?: { route: string; step: number }; // helpful for replay/debug
}
```
Sample record
```json
{
  "device_id": "ELK-SIM-204913",
  "shipment_id": "SHP-1736389123456-0",
  "ts": "2025-09-08T06:02:00.000Z",
  "lat": 34.0643,
  "lon": -118.2519,
  "speed_kph": 42.7,
  "temp_c": 5.1,
  "humidity": 67,
  "shock_g": 0,
  "door_open": false,
  "battery_pct": 82,
  "events": ["DELAY"],
  "meta": { "route": "US-LA-CHI-NYC", "step": 2 }
}
```
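Before loading NDJSON into a lakehouse or replaying it, a per-line schema check against the unified data model catches drift early. The sketch below assumes every nullable field must still be present (as in the model, where fields are nullable but not optional, except `meta`); `parseTelemetryLine` is a hypothetical helper, not part of GPT12-X.

```typescript
// Mirrors the unified data model; parseTelemetryLine is a hypothetical helper.
interface TelemetryRecord {
  device_id: string;
  shipment_id: string;
  ts: string;
  lat: number | null;
  lon: number | null;
  speed_kph: number | null;
  temp_c: number | null;
  humidity: number | null;
  shock_g: number | null;
  door_open: boolean | null;
  battery_pct: number | null;
  events: string[];
  meta?: { route: string; step: number };
}

const STRING_FIELDS = ["device_id", "shipment_id", "ts"];
const NUMBER_OR_NULL_FIELDS = ["lat", "lon", "speed_kph", "temp_c", "humidity", "shock_g", "battery_pct"];

// Parses one NDJSON line; throws on the first field that does not match the model.
function parseTelemetryLine(line: string): TelemetryRecord {
  const parsed: unknown = JSON.parse(line);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("expected a JSON object");
  }
  const r = parsed as Record<string, unknown>;
  for (const k of STRING_FIELDS) {
    if (typeof r[k] !== "string") throw new Error(`${k}: expected string`);
  }
  if (Number.isNaN(Date.parse(r["ts"] as string))) throw new Error("ts: unparseable timestamp");
  for (const k of NUMBER_OR_NULL_FIELDS) {
    const v = r[k];
    // Nullable fields must still be present: a missing key (undefined) fails here.
    if (v !== null && (typeof v !== "number" || !Number.isFinite(v))) {
      throw new Error(`${k}: expected finite number or null`);
    }
  }
  if (r["door_open"] !== null && typeof r["door_open"] !== "boolean") {
    throw new Error("door_open: expected boolean or null");
  }
  const events = r["events"];
  if (!Array.isArray(events) || !events.every((e) => typeof e === "string")) {
    throw new Error("events: expected string[]");
  }
  const meta = r["meta"];
  if (meta !== undefined) {
    const m = meta as Record<string, unknown> | null;
    if (m === null || typeof m !== "object" || typeof m["route"] !== "string" || typeof m["step"] !== "number") {
      throw new Error("meta: expected { route: string; step: number }");
    }
  }
  return r as unknown as TelemetryRecord;
}
```

The sample record above, serialized on one line, passes this check; a record with a numeric `device_id` or a missing `lat` key is rejected with the offending field name.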
GPT12-X at a glance
GPT12-X is a single Node.js CLI script that generates tracks along predefined routes and injects incidents at configurable rates, then:
- publishes live to MQTT (for streaming consumption), or
- writes NDJSON/CSV for offline analysis/replay.
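How the script builds its tracks is not shown here, so the following is only a minimal sketch of the idea: linear interpolation in lat/lon between route waypoints, one sample per `--interval`. The waypoint coordinates are approximate city centres chosen for illustration, each segment gets an equal share of the trip, and the result follows straight lines rather than roads.

```typescript
// Hypothetical route table: approximate city-centre waypoints, not the script's data.
type LatLon = { lat: number; lon: number };

const ROUTES: Record<string, LatLon[]> = {
  "US-LA-CHI-NYC": [
    { lat: 34.0522, lon: -118.2437 }, // Los Angeles
    { lat: 41.8781, lon: -87.6298 },  // Chicago
    { lat: 40.7128, lon: -74.006 },   // New York
  ],
  "CN-SZ-SH": [
    { lat: 22.5431, lon: 114.0579 },  // Shenzhen
    { lat: 31.2304, lon: 121.4737 },  // Shanghai
  ],
};

// Position at progress t in [0, 1]; each segment is treated as an equal share of the trip.
function positionAt(route: LatLon[], t: number): LatLon {
  const clamped = Math.min(Math.max(t, 0), 1);
  const segments = route.length - 1;
  const x = clamped * segments;
  const i = Math.min(Math.floor(x), segments - 1); // segment index
  const f = x - i;                                  // fraction within the segment
  const a = route[i];
  const b = route[i + 1];
  return { lat: a.lat + (b.lat - a.lat) * f, lon: a.lon + (b.lon - a.lon) * f };
}

// One sample per interval, including both endpoints.
function track(routeName: string, minutes: number, intervalSec: number): LatLon[] {
  const route = ROUTES[routeName];
  if (!route || route.length < 2) throw new Error(`unknown route: ${routeName}`);
  const steps = Math.max(1, Math.floor((minutes * 60) / intervalSec));
  const points: LatLon[] = [];
  for (let s = 0; s <= steps; s++) points.push(positionAt(route, s / steps));
  return points;
}
```

With these assumptions, `track("US-LA-CHI-NYC", 180, 60)` yields 181 points from Los Angeles to New York, passing Chicago at the midpoint. Interpolating by distance instead of by segment count gives more realistic speeds when segments differ in length.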
Prereqs: Node.js ≥ 18
CLI options (quick reference)
| Option | Type | Default | Description |
|---|---|---|---|
| `--route` | string | `US-LA-CHI-NYC` | Predefined route (e.g. `US-LA-CHI-NYC`, `CN-SZ-SH`) |
| `--minutes` | number | 180 | Total duration (minutes) |
| `--interval` | number | 60 | Sampling / MQTT publish interval (seconds) |
| `--shipments` | number | 1 | Number of concurrent shipments |
| `--coldchain` | boolean | true | 2–8 °C baseline (if false → ambient) |
| `--incident-rate` | number | 0.18 | Incident intensity (average incidents per hour) |
| `--mqtt` | string | – | Broker URL, `mqtt(s)://host:port` |
| `--topic` | string | `sim/telemetry` | MQTT topic |
| `--out` | string | – | Output NDJSON path |
| `--csv` | string | – | Output CSV path |
| `--username` / `--password` | string | – | MQTT auth |
| `--insecure` | boolean | false | Allow self-signed certificates (testing only) |
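The options table maps naturally onto a typed config object. Below is a minimal sketch of a parser that applies the documented defaults and rejects unknown flags; it is hypothetical and not the parser inside `gpt12x-sim.js`. One assumption to flag loudly: boolean options accept an optional explicit value, so `--coldchain false` switches to the ambient baseline and a bare `--insecure` means true.

```typescript
// Hypothetical parser for the documented flags (not the one inside gpt12x-sim.js).
// Assumption: boolean flags take an optional explicit value, e.g. `--coldchain false`.
interface SimOptions {
  route: string;
  minutes: number;
  interval: number;
  shipments: number;
  coldchain: boolean;
  incidentRate: number;
  topic: string;
  insecure: boolean;
  mqtt?: string;
  out?: string;
  csv?: string;
  username?: string;
  password?: string;
}

function parseOptions(argv: string[]): SimOptions {
  // Defaults copied from the options table.
  const o: SimOptions = {
    route: "US-LA-CHI-NYC", minutes: 180, interval: 60, shipments: 1,
    coldchain: true, incidentRate: 0.18, topic: "sim/telemetry", insecure: false,
  };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    const next: string | undefined = argv[i + 1];
    if (flag === "--coldchain" || flag === "--insecure") {
      let value = true;
      // Consume the next token only if it is literally "true" or "false".
      if (next === "true" || next === "false") { value = next === "true"; i++; }
      if (flag === "--coldchain") o.coldchain = value; else o.insecure = value;
      continue;
    }
    if (next === undefined || next.startsWith("--")) throw new Error(`${flag} expects a value`);
    i++; // consume the value token
    const num = Number(next);
    switch (flag) {
      case "--route": o.route = next; break;
      case "--minutes": o.minutes = num; break;
      case "--interval": o.interval = num; break;
      case "--shipments": o.shipments = num; break;
      case "--incident-rate": o.incidentRate = num; break;
      case "--mqtt": o.mqtt = next; break;
      case "--topic": o.topic = next; break;
      case "--out": o.out = next; break;
      case "--csv": o.csv = next; break;
      case "--username": o.username = next; break;
      case "--password": o.password = next; break;
      default: throw new Error(`unknown option: ${flag}`);
    }
  }
  // NaN fails these comparisons, so non-numeric values are rejected too.
  if (!(o.minutes > 0 && o.interval > 0 && o.shipments >= 1)) {
    throw new Error("--minutes, --interval and --shipments must be positive");
  }
  if (!(o.incidentRate >= 0)) throw new Error("--incident-rate must be >= 0");
  return o;
}
```

In a Node.js entry point this would be called as `parseOptions(process.argv.slice(2))`. Keeping it a pure function of a string array makes it easy to unit-test without spawning the CLI.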
Tips
- In production tests, keep `--interval` fixed for a steady event cadence.
- Interpret `--incident-rate` as a per-hour average (Poisson-like): 0.25 ≈ one incident every 4 hours.
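Reading `--incident-rate` as the rate λ of a Poisson process gives two useful formulas: the chance of at least one incident in one sampling interval Δt is 1 − e^(−λ·Δt), and the gaps between incidents are exponentially distributed with mean 1/λ hours. The sketch below assumes exactly that model; the seeded PRNG (the public-domain mulberry32 algorithm) is an addition for reproducible schedules, not something the post says GPT12-X uses.

```typescript
// Assumptions: Poisson arrivals at ratePerHour, and a seeded PRNG (mulberry32)
// so that a given seed always yields the same incident schedule.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// P(at least one incident in an interval of dtSec seconds) = 1 - exp(-λ·Δt), Δt in hours.
function incidentProbability(ratePerHour: number, dtSec: number): number {
  return 1 - Math.exp(-ratePerHour * (dtSec / 3600));
}

// Incident start offsets (minutes) over a run, drawn from exponential
// inter-arrival times with mean 1/λ hours (inverse-CDF sampling).
function incidentStarts(ratePerHour: number, minutes: number, rand: () => number): number[] {
  const starts: number[] = [];
  if (ratePerHour <= 0) return starts;
  let t = 0;
  while (true) {
    t += (-Math.log(1 - rand()) / ratePerHour) * 60; // 1 - rand() is in (0, 1], so log is finite
    if (t >= minutes) return starts;
    starts.push(t);
  }
}

console.log(incidentProbability(0.25, 60).toFixed(5)); // 0.00416
console.log(incidentStarts(0.18, 180, mulberry32(42)));
```

A practical consequence, if the rate applies per shipment: at the default 0.18 per hour, a 180-minute run averages 0.54 incidents, and about e^(−0.54) ≈ 58% of such runs contain no incident at all. Raise the rate or lengthen the run when a test must exercise every incident type.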
Quickstart (copy & run)
```bash
# Two shipments, 3 hours, 60-second interval; write NDJSON
node gpt12x-sim.js --minutes 180 --interval 60 --shipments 2 --out gpt12x.ndjson

# Publish to local MQTT (topic sim/telemetry) for 60 minutes
node gpt12x-sim.js --mqtt mqtt://localhost:1883 --topic sim/telemetry --minutes 60

# Generate CSV on the CN South → East route
node gpt12x-sim.js --route CN-SZ-SH --minutes 120 --csv gpt12x.csv
```
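The post does not document the CSV column layout, so here is one plausible flattening of the unified data model, offered as a sketch under assumptions: columns follow the model's field order, `events` becomes a single `;`-joined cell, `meta` is flattened into `route` and `step`, nulls become empty cells, and quoting follows RFC 4180. Check the script's real output before relying on any of these choices.

```typescript
// Assumed CSV layout (the script's actual layout is not documented in this post).
interface Row {
  device_id: string; shipment_id: string; ts: string;
  lat: number | null; lon: number | null; speed_kph: number | null;
  temp_c: number | null; humidity: number | null; shock_g: number | null;
  door_open: boolean | null; battery_pct: number | null;
  events: string[]; meta?: { route: string; step: number };
}

const COLUMNS = [
  "device_id", "shipment_id", "ts", "lat", "lon", "speed_kph", "temp_c",
  "humidity", "shock_g", "door_open", "battery_pct", "events", "route", "step",
];

// RFC 4180 quoting: quote cells containing a comma, quote or line break; double inner quotes.
function cell(v: string | number | boolean | null | undefined): string {
  if (v === null || v === undefined) return ""; // nulls and missing meta become empty cells
  const s = String(v);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(records: Row[]): string {
  const lines = [COLUMNS.join(",")];
  for (const r of records) {
    lines.push([
      r.device_id, r.shipment_id, r.ts, r.lat, r.lon, r.speed_kph, r.temp_c,
      r.humidity, r.shock_g, r.door_open, r.battery_pct,
      r.events.join(";"), r.meta?.route, r.meta?.step,
    ].map(cell).join(","));
  }
  return lines.join("\r\n") + "\r\n"; // RFC 4180 uses CRLF line endings
}
```

Joining `events` into one cell keeps exactly one row per sample, which suits BI tools; an alternative is one row per (sample, event) pair, which is easier to aggregate but duplicates the sensor values.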
Use it to validate alerts & dashboards
Rules
- Temperature breach (`TEMP_EXCURSION`): trigger immediately and confirm the notification arrives within SLA; clear only after the temperature has stayed in the safe range for N minutes.
- Route deviation (`ROUTE_DEVIATION`): compare positions against the geofence/corridor; escalate only after M consecutive deviating samples.
- Delay/stall (`DELAY`): near-zero speed and minimal positional change for T minutes.
- Door (`DOOR_OPEN`): an opening outside authorized stops triggers an alert (combine with the geofence).
- Battery (`BATTERY_DRAIN`): a sudden drop or a discharge slope above threshold → warn; below the lower bound → escalate.
- GPS quality (`GPS_JAMMING`): flag jamming/drift, trigger a device self-check and set data-quality flags.
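Three of these rules (temperature breach with clear-after-N-minutes, M consecutive route deviations, and stall detection over T minutes) can be sketched as small state machines over a time-ordered stream of samples. All thresholds below (2–8 °C band, N = 10, M = 3, T = 30, 2 km/h, 200 m) are illustrative defaults rather than values prescribed by GPT12-X, and `offCorridor` is assumed to come from an upstream geofence/corridor check.

```typescript
// Illustrative rule sketches; thresholds are parameters, not GPT12-X values.
interface Sample {
  tsMs: number;            // epoch milliseconds; samples are in time order
  tempC: number | null;
  speedKph: number | null;
  lat: number | null;
  lon: number | null;
  offCorridor: boolean;    // assumed output of an upstream geofence/corridor check
}

// TEMP_EXCURSION: raise on the first out-of-band reading; clear only after the
// temperature has stayed inside [lo, hi] for clearAfterMin minutes.
function tempAlertStates(samples: Sample[], lo = 2, hi = 8, clearAfterMin = 10): boolean[] {
  const out: boolean[] = [];
  let active = false;
  let safeSince: number | null = null;
  for (const s of samples) {
    if (s.tempC !== null) {               // a missing reading leaves the state unchanged
      if (s.tempC < lo || s.tempC > hi) {
        active = true;
        safeSince = null;
      } else if (active) {
        if (safeSince === null) safeSince = s.tsMs;
        if (s.tsMs - safeSince >= clearAfterMin * 60_000) {
          active = false;
          safeSince = null;
        }
      }
    }
    out.push(active);
  }
  return out;
}

// ROUTE_DEVIATION: escalate once m consecutive samples are off-corridor.
function deviationEscalations(samples: Sample[], m = 3): boolean[] {
  let run = 0;
  return samples.map((s) => {
    run = s.offCorridor ? run + 1 : 0;
    return run >= m;
  });
}

// Great-circle distance in km (haversine formula, mean Earth radius).
function haversineKm(a: { lat: number; lon: number }, b: { lat: number; lon: number }): number {
  const rad = Math.PI / 180;
  const h = Math.sin(((b.lat - a.lat) * rad) / 2) ** 2 +
    Math.cos(a.lat * rad) * Math.cos(b.lat * rad) * Math.sin(((b.lon - a.lon) * rad) / 2) ** 2;
  return 2 * 6371.0088 * Math.asin(Math.sqrt(h));
}

// DELAY: speed <= maxKph and within maxKm of where the stop began, for at least tMin minutes.
function stallFlags(samples: Sample[], tMin = 30, maxKph = 2, maxKm = 0.2): boolean[] {
  const out: boolean[] = [];
  let anchor: { lat: number; lon: number; tsMs: number } | null = null;
  for (const s of samples) {
    if (s.speedKph === null || s.speedKph > maxKph || s.lat === null || s.lon === null) {
      anchor = null;                      // moving or no fix: reset the stop window
      out.push(false);
      continue;
    }
    const here = { lat: s.lat, lon: s.lon, tsMs: s.tsMs };
    if (anchor === null || haversineKm(anchor, here) > maxKm) anchor = here;
    out.push(here.tsMs - anchor.tsMs >= tMin * 60_000);
  }
  return out;
}
```

Running these over simulated tracks with known injected incidents is a quick way to tune N, M and T before touching the production alert service. Note the design choice in `stallFlags`: a lost GNSS fix resets the stop window, so a `GPS_JAMMING` episode during a stop can delay a `DELAY` alert.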
Visualization
- Map polyline + incident bubbles; link with timeline brushing.
- Time series (temp/speed/battery) with colored anomalies.
- Incident histograms by type/time/route/device.
- Data-quality tiles: GNSS accuracy, gaps, deviation rate, incident coverage, etc.
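The incident histogram and two of the data-quality tiles reduce to simple aggregates over parsed records. A minimal sketch, assuming records of each device arrive in timestamp order; `summarize` and the chosen metrics (share of records without a fix, largest sampling gap) are illustrative, using field names from the unified data model.

```typescript
// Illustrative aggregates for an incident histogram and data-quality tiles.
interface Rec {
  device_id: string;
  ts: string;
  lat: number | null;
  lon: number | null;
  events: string[];
}

interface Summary {
  incidentsByType: Map<string, number>; // records carrying each label
  gnssMissingRate: number;              // share of records without a position fix
  maxGapSec: number;                    // largest gap between consecutive samples of one device
}

// Assumes each device's records are in timestamp order.
function summarize(records: Rec[]): Summary {
  const incidentsByType = new Map<string, number>();
  const lastTsByDevice = new Map<string, number>();
  let missing = 0;
  let maxGapSec = 0;
  for (const r of records) {
    for (const e of r.events) incidentsByType.set(e, (incidentsByType.get(e) ?? 0) + 1);
    if (r.lat === null || r.lon === null) missing++;
    const t = Date.parse(r.ts);
    const prev = lastTsByDevice.get(r.device_id);
    if (prev !== undefined) maxGapSec = Math.max(maxGapSec, (t - prev) / 1000);
    lastTsByDevice.set(r.device_id, t);
  }
  return {
    incidentsByType,
    gnssMissingRate: records.length ? missing / records.length : 0,
    maxGapSec,
  };
}
```

This counts records carrying each label, so an incident that spans several samples is counted once per sample. For a histogram of incident episodes, collapse consecutive records with the same label per device first. A `maxGapSec` well above `--interval` points to dropped messages somewhere in the pipeline.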
Pipelines
- Streaming: MQTT → Flink/Spark/Kafka Streams → alert service/metrics store (Influx/TSDB/ClickHouse).
- Batch: NDJSON/CSV → Lakehouse (Iceberg/Hudi/Delta) → BI/Notebook.
FAQ
Q: Synthetic ≠ real. How to close the gap?
A: Parameterize the generator with distributions from real devices (speed/dwell, temp drift, congestion windows, etc.).
Q: Can I mix synthetic with real?
A: Yes. Tag synthetic records (e.g., `meta.synthetic=true`) and use the mix to load-test throughput/latency and to measure alert false negatives and false positives.
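Because synthetic records carry the injected incident labels in `events`, they double as ground truth for scoring alerts. The sketch below assumes you have already joined each record with your alert service's decision for it (for example on `device_id` + `ts`) and scores per record, which is a simplification; `confusion` is a hypothetical helper.

```typescript
// Per-record scoring of one alert rule against injected ground truth (a simplification).
interface Scored {
  events: string[]; // injected labels from the generator
  alerted: boolean; // alert service decision for the same record (joined upstream)
}

interface Confusion {
  tp: number; fp: number; fn: number; tn: number;
  precision: number; // NaN when nothing was alerted
  recall: number;    // NaN when the incident never occurred
}

function confusion(rows: Scored[], incident: string): Confusion {
  let tp = 0, fp = 0, fn = 0, tn = 0;
  for (const r of rows) {
    const truth = r.events.includes(incident);
    if (truth && r.alerted) tp++;
    else if (!truth && r.alerted) fp++;
    else if (truth && !r.alerted) fn++;
    else tn++;
  }
  const precision = tp + fp ? tp / (tp + fp) : NaN;
  const recall = tp + fn ? tp / (tp + fn) : NaN;
  return { tp, fp, fn, tn, precision, recall };
}
```

Rules with a clear-after-N-minutes window will show some record-level "false positives" while the alert is still clearing; if that matters, score per incident episode instead of per record.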
Q: More realistic routes?
A: Import multi-segment Polyline/GeoJSON or use road-network APIs (OSRM/Valhalla/Mapbox Directions) and add congestion models.
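When importing GeoJSON route geometry, the most common bug is swapped coordinates: GeoJSON (RFC 7946) orders each position as [longitude, latitude], while most telemetry, including the data model here, stores `lat` before `lon`. A minimal sketch of the conversion plus cumulative great-circle distance, which a generator could use to place samples proportionally to distance; the function names are hypothetical.

```typescript
// GeoJSON positions are [longitude, latitude] or [longitude, latitude, altitude] (RFC 7946).
type Position = [number, number] | [number, number, number];
interface LineString { type: "LineString"; coordinates: Position[] }
interface Waypoint { lat: number; lon: number }

function lineStringToWaypoints(geom: LineString): Waypoint[] {
  if (geom.type !== "LineString" || geom.coordinates.length < 2) {
    throw new Error("expected a LineString with at least two positions");
  }
  return geom.coordinates.map(([lon, lat]) => ({ lat, lon })); // note the swap
}

// Cumulative haversine distance (km) from the first waypoint to each waypoint.
function cumulativeKm(points: Waypoint[]): number[] {
  const R = 6371.0088; // mean Earth radius, km
  const rad = Math.PI / 180;
  const out = [0];
  for (let i = 1; i < points.length; i++) {
    const a = points[i - 1];
    const b = points[i];
    const h = Math.sin(((b.lat - a.lat) * rad) / 2) ** 2 +
      Math.cos(a.lat * rad) * Math.cos(b.lat * rad) * Math.sin(((b.lon - a.lon) * rad) / 2) ** 2;
    out.push(out[i - 1] + 2 * R * Math.asin(Math.sqrt(h)));
  }
  return out;
}
```

Road-network geometries often contain thousands of vertices; dividing the last cumulative distance by the run duration gives the implied average speed, a quick sanity check before generating a track.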
Extensions
- More sensors: light, CO₂, tilt, vibration spectrum
- Road-network & congestion modeling by POI/time-of-day
- Statistical control of intensity/duration (Poisson, Exponential, Gaussian mixture)
- Mixed fleets across routes/time-zones/holidays
- Blend with real devices for stress tests
License & Disclaimer
- Code and examples are provided for educational and testing use under the MIT License.
- TLV/payload examples are demonstrative, not any vendor’s production protocol/spec.
CTA
Want the full script (or a Git/Gist link), plus ready-made routes and incident-distribution templates?
Tell me where to host it and I’ll add it to this post (or as an Appendix).