applekoiot

Posted on Sep 9

Guide — Simulating Supply Chain Incidents with GPT12-X Test Data

#iot #supplychain #node #mqtt

Who this is for: platform/data engineers working on shipment visibility, cold-chain monitoring, alerting and dashboards.

What you’ll get: a reproducible way, using GPT12-X, to synthesize telemetry, inject incidents, and either publish to MQTT or save as NDJSON/CSV, so you can validate alert rules and dashboards quickly and safely.

Why simulate?

Real data is expensive, slow to obtain and hard to control. Synthetic data lets you:

Reproduce delays/route deviations/temperature breaches/door events/etc. on demand
Regression-test alert/dashboard changes in minutes
Cover edge cases (GPS jamming/drift, sudden battery drop, humidity spikes)
Exercise the whole pipeline: MQTT → streaming jobs → alerts/dashboards, or NDJSON/CSV → lakehouse/BI

Incident types we simulate

DELAY — prolonged standstill (warehouse/traffic)
ROUTE_DEVIATION — geofence/corridor deviation
TEMP_EXCURSION — cold-chain breach
SHOCK — handling impact
DOOR_OPEN — unauthorized door event
BATTERY_DRAIN — abnormal battery drop
GPS_JAMMING — missing/invalid GNSS fix
HUMIDITY_ANOMALY — humidity out of range
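
To make the list concrete, here is a minimal sketch of how each incident type might map onto telemetry fields. The `applyIncident` helper and every magnitude in it (12 °C, 6 g, a 20-point battery drop, and so on) are assumptions for illustration, not necessarily what GPT12-X does internally. Field names follow the unified data model in this post.

```javascript
// Hypothetical helper: returns a copy of `record` with one incident applied.
// All magnitudes are illustrative assumptions, not GPT12-X's actual values.
function applyIncident(record, type) {
  const r = { ...record, events: [...record.events, type] };
  switch (type) {
    case 'DELAY':
      r.speed_kph = 0; // vehicle is standing still
      break;
    case 'ROUTE_DEVIATION':
      if (r.lat !== null) r.lat += 0.5; // roughly 55 km north of the route
      break;
    case 'TEMP_EXCURSION':
      r.temp_c = 12; // above a 2-8 °C cold-chain band
      break;
    case 'SHOCK':
      r.shock_g = 6;
      break;
    case 'DOOR_OPEN':
      r.door_open = true;
      break;
    case 'BATTERY_DRAIN':
      if (r.battery_pct !== null) r.battery_pct = Math.max(0, r.battery_pct - 20);
      break;
    case 'GPS_JAMMING':
      r.lat = null; // no valid GNSS fix
      r.lon = null;
      r.speed_kph = null;
      break;
    case 'HUMIDITY_ANOMALY':
      r.humidity = 98;
      break;
    default:
      throw new Error(`Unknown incident type: ${type}`);
  }
  return r;
}
```

Returning a copy rather than mutating the input keeps the clean baseline track available for comparison.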

Unified data model

interface TelemetryRecord {
  device_id: string;
  shipment_id: string;
  ts: string;              // ISO 8601 timestamp, e.g. "2025-09-08T06:02:00.000Z"
  lat: number | null;
  lon: number | null;
  speed_kph: number | null;
  temp_c: number | null;
  humidity: number | null;
  shock_g: number | null;
  door_open: boolean | null;
  battery_pct: number | null;
  events: string[];        // e.g. ["TEMP_EXCURSION"]
  meta?: { route: string; step: number; }  // helpful for replay/debug
}

Sample record

{
  "device_id": "ELK-SIM-204913",
  "shipment_id": "SHP-1736389123456-0",
  "ts": "2025-09-08T06:02:00.000Z",
  "lat": 34.0643,
  "lon": -118.2519,
  "speed_kph": 0,
  "temp_c": 5.1,
  "humidity": 67,
  "shock_g": 0,
  "door_open": false,
  "battery_pct": 82,
  "events": ["DELAY"],
  "meta": { "route": "US-LA-CHI-NYC", "step": 2 }
}
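
Before feeding records into alert rules, it can help to check that they match the model. The sketch below is one possible validator; `validateRecord` and `NULLABLE_FIELDS` are hypothetical names, and the check only covers field types, not value ranges.

```javascript
// Nullable fields from the data model and the typeof each must have when set.
const NULLABLE_FIELDS = {
  lat: 'number', lon: 'number', speed_kph: 'number', temp_c: 'number',
  humidity: 'number', shock_g: 'number', door_open: 'boolean', battery_pct: 'number',
};

// Returns a list of problems; an empty list means the record matches the model.
function validateRecord(rec) {
  const errors = [];
  for (const key of ['device_id', 'shipment_id', 'ts']) {
    if (typeof rec[key] !== 'string') errors.push(`${key} must be a string`);
  }
  if (typeof rec.ts === 'string' && Number.isNaN(Date.parse(rec.ts))) {
    errors.push('ts must be a parseable timestamp');
  }
  for (const [key, type] of Object.entries(NULLABLE_FIELDS)) {
    const v = rec[key];
    // A missing (undefined) field is reported too: the model requires null, not absence.
    if (v !== null && typeof v !== type) errors.push(`${key} must be ${type} or null`);
  }
  if (!Array.isArray(rec.events) || rec.events.some((e) => typeof e !== 'string')) {
    errors.push('events must be an array of strings');
  }
  return errors;
}
```

Collecting all problems instead of throwing on the first one makes it easier to spot systematic generator bugs in a single pass.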

GPT12-X at a glance

GPT12-X is a single Node.js CLI script that generates tracks along predefined routes and injects incidents at configurable rates, then:

publishes live to MQTT (for streaming consumption), or
writes NDJSON/CSV for offline analysis/replay.
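
The two offline formats are simple to produce. The following is a minimal sketch of the serialization step, not the script's actual code: the column order in `CSV_COLUMNS` and the `|` separator for multiple events are assumptions.

```javascript
// Assumed CSV column order; `meta` is left out of the flat CSV in this sketch.
const CSV_COLUMNS = [
  'device_id', 'shipment_id', 'ts', 'lat', 'lon', 'speed_kph',
  'temp_c', 'humidity', 'shock_g', 'door_open', 'battery_pct', 'events',
];

// NDJSON: one JSON document per line.
function toNdjsonLine(record) {
  return JSON.stringify(record) + '\n';
}

// Render one value as a CSV cell: null becomes empty, arrays are joined
// with "|", and cells containing quotes, commas, or newlines are quoted.
function csvCell(value) {
  if (value === null || value === undefined) return '';
  const s = Array.isArray(value) ? value.join('|') : String(value);
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header row plus one row per record.
function toCsv(records) {
  const rows = records.map((r) => CSV_COLUMNS.map((c) => csvCell(r[c])).join(','));
  return [CSV_COLUMNS.join(','), ...rows].join('\n') + '\n';
}
```

NDJSON keeps nested fields such as `meta` intact, which is why it tends to be the better choice for replay, while CSV is convenient for quick inspection in BI tools.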

Prereqs: Node.js ≥ 18

CLI options (quick reference)

Option	Type	Default	Description
`--route`	string	`US-LA-CHI-NYC`	Predefined route (e.g. `US-LA-CHI-NYC`, `CN-SZ-SH`)
`--minutes`	number	180	Total duration (minutes)
`--interval`	number	60	Sampling / MQTT publish interval (s)
`--shipments`	number	1	Concurrent shipments
`--coldchain`	boolean	true	2–8 °C baseline (if false → ambient)
`--incident-rate`	number	0.18	Incident intensity (per hour)
`--mqtt`	string	–	`mqtt(s)://host:port`
`--topic`	string	`sim/telemetry`	MQTT topic
`--out`	string	–	Output NDJSON path
`--csv`	string	–	Output CSV path
`--username` / `--password`	string	–	MQTT auth
`--insecure`	boolean	false	Allow self-signed cert (test only)

Tips

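In production tests, keep --interval fixed so samples arrive at a steady cadence and time-window rules behave consistently.
Interpret --incident-rate as a per-hour average (Poisson-like): 0.25 ≈ one incident every 4 hours on average; the default 0.18 ≈ one every 5.6 hours.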
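
To see what a per-hour rate means at each sample, here is a sketch that converts it into a per-interval probability, assuming incidents follow a Poisson process: p = 1 − exp(−rate × interval / 3600). Whether GPT12-X applies the rate per shipment or globally is not stated here, so treat this as per shipment by assumption. The seeded `mulberry32` generator is a common small PRNG used only to keep runs repeatable; all function names are hypothetical.

```javascript
// Probability of at least one incident within one sampling interval,
// assuming Poisson arrivals at `ratePerHour`.
function incidentProbability(ratePerHour, intervalSeconds) {
  return 1 - Math.exp(-ratePerHour * (intervalSeconds / 3600));
}

// Expected number of incidents over a run of `minutes`.
function expectedIncidents(ratePerHour, minutes) {
  return ratePerHour * (minutes / 60);
}

// Small seeded PRNG (mulberry32) so the same seed gives the same run.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Indices of the sampling steps at which an incident starts.
function sampleIncidentSteps(ratePerHour, intervalSeconds, minutes, seed) {
  const rng = mulberry32(seed);
  const p = incidentProbability(ratePerHour, intervalSeconds);
  const steps = Math.floor((minutes * 60) / intervalSeconds);
  const hits = [];
  for (let i = 0; i < steps; i++) {
    if (rng() < p) hits.push(i);
  }
  return hits;
}

console.log(expectedIncidents(0.25, 240)); // 1 incident expected over 4 hours
```

With a 60-second interval and a rate of 0.25, the per-sample probability is only about 0.4%, so short runs will often contain no incidents at all. Raise the rate or the duration when a test needs guaranteed coverage.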

Quickstart (copy & run)

# Two shipments, 3 hours, 60-second interval; write NDJSON
node gpt12x-sim.js --minutes 180 --interval 60 --shipments 2 --out gpt12x.ndjson

# Publish to local MQTT (topic sim/telemetry) for 60 minutes
node gpt12x-sim.js --mqtt mqtt://localhost:1883 --topic sim/telemetry --minutes 60

# Generate CSV on the CN South → East route
node gpt12x-sim.js --route CN-SZ-SH --minutes 120 --csv gpt12x.csv
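
After a run, a quick sanity check is to count how many samples carry each incident tag. This sketch assumes the NDJSON output holds one record per line in the unified data model; `parseNdjson` and `countEvents` are hypothetical helper names.

```javascript
// Parse NDJSON text into records, skipping blank lines.
function parseNdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Count tagged samples per incident type. This counts samples, not
// distinct incidents: one long DELAY spans many consecutive samples.
function countEvents(records) {
  const counts = {};
  for (const r of records) {
    for (const e of r.events) {
      counts[e] = (counts[e] || 0) + 1;
    }
  }
  return counts;
}
```

To apply it to the first quickstart output, read the file with `require('node:fs').readFileSync('gpt12x.ndjson', 'utf8')` and pass the text to `parseNdjson`.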

Use it to validate alerts & dashboards

Rules

Temperature breach (TEMP_EXCURSION): trigger immediately; confirm the notification arrives within SLA; clear once the temperature has stayed in the safe range for N minutes.
Route deviation: compare positions against the geofence/corridor; require M consecutive deviating samples before escalating.
Delay/stall: near-zero speed and minimal positional delta sustained for T minutes.
Door: an open event outside authorized stops triggers an alert (combine with geofence).
Battery: a sudden drop or a discharge slope above threshold → warn; level below the lower bound → escalate.
GPS quality: mark GPS_JAMMING/drift, trigger a device self-check, and set data-quality flags.
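
Two of these rules can be sketched in a few lines. The code below is an illustration under assumptions, not a reference implementation: the thresholds (30 minutes, 2 km/h, 100 m, a 10-minute clear window) and the function names are made up, the 2–8 °C band comes from the cold-chain baseline, and records are assumed to be sorted by `ts` for a single shipment.

```javascript
// Great-circle distance in metres between two points (haversine formula).
function haversineMeters(lat1, lon1, lat2, lon2) {
  const R = 6371000; // mean Earth radius in metres
  const toRad = (d) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Delay/stall: true if the last `minutes` of data show near-zero speed
// and almost no movement. Missing fixes or speeds count as "not stalled".
function isStalled(records, { minutes = 30, maxSpeedKph = 2, maxMoveMeters = 100 } = {}) {
  if (records.length === 0) return false;
  const endMs = Date.parse(records[records.length - 1].ts);
  const startMs = endMs - minutes * 60000;
  if (Date.parse(records[0].ts) > startMs) return false; // window not fully covered
  const win = records.filter((r) => Date.parse(r.ts) >= startMs);
  return win.every((r) =>
    r.speed_kph !== null && r.speed_kph <= maxSpeedKph &&
    r.lat !== null && r.lon !== null &&
    haversineMeters(win[0].lat, win[0].lon, r.lat, r.lon) <= maxMoveMeters);
}

// Temperature: the alert opens on the first out-of-range reading and clears
// only after `clearMinutes` of continuous in-range readings.
function tempAlertTimeline(records, { minC = 2, maxC = 8, clearMinutes = 10 } = {}) {
  let active = false;
  let safeSinceMs = null;
  return records.map((r) => {
    const t = Date.parse(r.ts);
    const inRange = r.temp_c !== null && r.temp_c >= minC && r.temp_c <= maxC;
    if (!inRange) {
      if (r.temp_c !== null) active = true; // a null reading neither opens nor clears
      safeSinceMs = null;
    } else {
      if (safeSinceMs === null) safeSinceMs = t;
      if (active && t - safeSinceMs >= clearMinutes * 60000) active = false;
    }
    return { ts: r.ts, active };
  });
}
```

The clear window acts as hysteresis: without it, a temperature hovering around the boundary would open and close the alert on every sample.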

Visualization

Map polyline + incident bubbles; link with timeline brushing.
Time series (temp/speed/battery) with colored anomalies.
Incident histograms by type/time/route/device.
Data-quality tiles: GNSS accuracy, gaps, deviation rate, incident coverage, etc.
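
For the data-quality tiles, two of the simpler metrics (sampling gaps and missing GNSS fixes) can be computed directly from the records. This is a sketch with assumed definitions: a "gap" is any spacing larger than 1.5 times the expected interval, and `dataQuality` is a hypothetical helper that expects one shipment's records sorted by `ts`.

```javascript
// Summarize sampling gaps and missing GNSS fixes for one shipment.
function dataQuality(records, expectedIntervalSeconds = 60) {
  let gaps = 0;
  let missingFix = 0;
  for (let i = 0; i < records.length; i++) {
    const r = records[i];
    if (r.lat === null || r.lon === null) missingFix++;
    if (i > 0) {
      const dt = (Date.parse(r.ts) - Date.parse(records[i - 1].ts)) / 1000;
      // Assumed tolerance: anything beyond 1.5x the interval counts as a gap.
      if (dt > 1.5 * expectedIntervalSeconds) gaps++;
    }
  }
  const n = records.length;
  return { samples: n, gaps, missingFixRatio: n === 0 ? 0 : missingFix / n };
}
```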

Pipelines

Streaming: MQTT → Flink/Spark/Kafka Streams → alert service/metrics store (Influx/TSDB/ClickHouse).
Batch: NDJSON/CSV → Lakehouse (Iceberg/Hudi/Delta) → BI/Notebook.

FAQ

Q: Synthetic ≠ real. How to close the gap?

A: Parameterize the generator with distributions from real devices (speed/dwell, temp drift, congestion windows, etc.).
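
One simple way to do this, sketched below, is to sample from the empirical distribution of a real measurement using inverse-CDF lookup with linear interpolation between sorted observations. The helper name `makeEmpiricalSampler` is hypothetical, and the dwell times in the usage line are made-up numbers, not real device data.

```javascript
// Build a sampler that draws values following the shape of `observations`.
// `rng` must return numbers in [0, 1); it defaults to Math.random.
function makeEmpiricalSampler(observations, rng = Math.random) {
  if (observations.length === 0) throw new Error('need at least one observation');
  const sorted = [...observations].sort((a, b) => a - b);
  return function sample() {
    const pos = rng() * (sorted.length - 1); // fractional index into sorted data
    const lo = Math.floor(pos);
    const hi = Math.min(sorted.length - 1, lo + 1);
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
  };
}

// Usage with illustrative (made-up) warehouse dwell times in minutes.
const sampleDwellMinutes = makeEmpiricalSampler([12, 15, 18, 25, 40, 95]);
console.log(sampleDwellMinutes());
```

Sampled values never fall outside the observed minimum and maximum, so this approach will not produce tail events rarer than those already in the data. A fitted parametric distribution is the alternative when such extremes matter.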

Q: Can I mix synthetic with real?

A: Yes. Tag synthetic records (e.g., meta.synthetic=true) so they can be filtered out downstream, then use the mixed stream to load-test throughput and latency and to measure alert false negatives and false positives.
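
Because synthetic records carry their injected incidents in `events`, they double as ground truth. The sketch below tags records and scores an alert rule per sample. It assumes alerts can be keyed by `shipment_id|ts`, which is a simplification, and both helper names are hypothetical.

```javascript
// Mark a record as synthetic without mutating the original.
function tagSynthetic(record) {
  return { ...record, meta: { ...(record.meta || {}), synthetic: true } };
}

// Compare fired alerts with the injected ground truth for one incident type.
// `firedKeys` holds "shipment_id|ts" strings for samples where the alert fired.
// Real (untagged) records are skipped because they have no ground truth.
function scoreAlerts(records, firedKeys, type) {
  const fired = new Set(firedKeys);
  const counts = { tp: 0, fp: 0, fn: 0, tn: 0 };
  for (const r of records) {
    if (!r.meta || r.meta.synthetic !== true) continue;
    const truth = r.events.includes(type);
    const alerted = fired.has(`${r.shipment_id}|${r.ts}`);
    if (truth && alerted) counts.tp++;
    else if (!truth && alerted) counts.fp++;
    else if (truth && !alerted) counts.fn++;
    else counts.tn++;
  }
  return counts;
}
```

Per-sample scoring is strict for rules that intentionally wait before firing (for example, "M consecutive deviations"), so a per-incident comparison may be fairer for those.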

Q: More realistic routes?

A: Import multi-segment Polyline/GeoJSON or use road-network APIs (OSRM/Valhalla/Mapbox Directions) and add congestion models.
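
As a starting point for imported routes, here is a sketch that finds the position at a given fraction of a multi-segment polyline. It takes coordinates in GeoJSON order (`[lon, lat]`), measures segment lengths with the haversine formula, and interpolates linearly inside each segment, which is an approximation that is adequate for short segments. `pointAlong` is a hypothetical helper name.

```javascript
// Great-circle distance in metres between two points (haversine formula).
function haversineMeters(lat1, lon1, lat2, lon2) {
  const R = 6371000; // mean Earth radius in metres
  const toRad = (d) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Position at `fraction` (0..1) of the total length of a polyline whose
// coordinates are [lon, lat] pairs, as in a GeoJSON LineString.
function pointAlong(coords, fraction) {
  if (coords.length < 2) throw new Error('need at least two points');
  const f = Math.min(1, Math.max(0, fraction));
  const lengths = [];
  let total = 0;
  for (let i = 1; i < coords.length; i++) {
    const d = haversineMeters(coords[i - 1][1], coords[i - 1][0], coords[i][1], coords[i][0]);
    lengths.push(d);
    total += d;
  }
  let remaining = f * total;
  for (let i = 0; i < lengths.length; i++) {
    const isLast = i === lengths.length - 1;
    if (remaining <= lengths[i] || isLast) {
      // Linear interpolation inside the segment (approximation).
      const t = lengths[i] === 0 ? 0 : Math.min(1, remaining / lengths[i]);
      const [lon1, lat1] = coords[i];
      const [lon2, lat2] = coords[i + 1];
      return { lat: lat1 + t * (lat2 - lat1), lon: lon1 + t * (lon2 - lon1) };
    }
    remaining -= lengths[i];
  }
}
```

Stepping `fraction` from 0 to 1 at a constant rate gives a constant-speed track. A congestion model would instead vary the step size per segment or time of day.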

Extensions

More sensors: light, CO₂, tilt, vibration spectrum
Road-network & congestion modeling by POI/time-of-day
Statistical control of intensity/duration (Poisson, Exponential, Gaussian mixture)
Mixed fleets across routes/time-zones/holidays
Blend with real devices for stress tests

License & Disclaimer

Code and examples are provided under the MIT license for educational and testing purposes.
TLV/payload examples are demonstrative, not any vendor’s production protocol/spec.

CTA

Want the full script (or a Git/Gist link), plus ready-made routes and incident-distribution templates?

Tell me where to host it and I’ll add it to this post (or as an Appendix).
