Ronny Elsner

Understanding Industrial IoT Protocols for AI Documentation

TL;DR

To auto-generate documentation from IoT systems, you need to understand how industrial protocols structure data. Here's what I learned parsing CAN, OPC UA, and MQTT for AI processing—without sharing the secret sauce.


The Challenge

In my last post, I explained why Digital Twin documentation doesn't scale.

Now let's talk about the technical challenge:

How do you teach AI to understand industrial IoT protocols?


Why Industrial Protocols Are Different

Consumer IoT:
Temperature sensor → JSON → Cloud → Dashboard
{
  "device": "temp_sensor_1",
  "value": 23.5,
  "unit": "celsius"
}

Simple. Self-explanatory. Human-readable.

Industrial IoT:
CAN Frame: ID 0x18FF5017 Data: 0x1A 0x2B 0x3C 0x4D 0x5E 0x6F 0x70 0x81
What does this mean?

Which component sent it?
What's being measured?
What are the units?
How do you decode it?

The answer is buried in:

  • DBC files (CAN database)
  • NodeSets (OPC UA)
  • Topic hierarchies (MQTT)
  • Manufacturer documentation
  • Tribal knowledge
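
To make the gap concrete, here is a minimal sketch of how much can be read from the identifier alone. It assumes the frame above follows SAE J1939 conventions, which this post does not state; the function name `split_j1939_id` is made up for the example, and the extended data page bit is ignored.

```python
def split_j1939_id(can_id: int) -> dict:
    """Split a 29-bit CAN ID into J1939 fields (assumes the J1939 layout applies)."""
    priority = (can_id >> 26) & 0x7
    data_page = (can_id >> 24) & 0x1
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    source_address = can_id & 0xFF

    if pdu_format >= 240:
        # PDU2 (broadcast): the PDU-specific byte is part of the PGN.
        pgn = (data_page << 16) | (pdu_format << 8) | pdu_specific
        destination = None
    else:
        # PDU1 (peer-to-peer): the PDU-specific byte is the destination address.
        pgn = (data_page << 16) | (pdu_format << 8)
        destination = pdu_specific

    return {
        "priority": priority,
        "pgn": pgn,
        "source_address": source_address,
        "destination": destination,
    }


fields = split_j1939_id(0x18FF5017)
print(fields)
```

Under that assumption the example ID carries priority 6, PGN 0xFF50, and source address 0x17. PGNs from 0xFF00 to 0xFFFF are set aside in J1939 for manufacturer-specific messages, so even a fully split ID says nothing about what the eight data bytes measure. That part still has to come from the sources listed above.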

The Three Protocol Types I Focus On

1. CAN Bus (Controller Area Network)

Why it matters:

  • Automotive standard
  • Industrial machinery
  • Real-time critical
  • Deterministic timing

The complexity:

  • Binary data requires decoding
  • DBC files define structure
  • Signals span multiple bytes
  • Bit-level operations needed
  • Byte order matters (endianness)

What AI needs to understand:

  • Message definitions
  • Signal mappings
  • Scaling factors
  • Physical units
  • Update rates
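
As an illustration of the decoding step, the sketch below extracts one unsigned little-endian signal from the example frame and applies a scaling factor and offset. The signal definition (`CoolantTemp`, start bit 8, 16 bits, factor 0.03125, offset -273) is invented for the example; a real one would come from the DBC file. Signed signals and Motorola (big-endian) bit numbering are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalDef:
    name: str
    start_bit: int  # position of the least significant bit, Intel numbering
    length: int     # number of bits
    factor: float
    offset: float
    unit: str


def decode_intel(data: bytes, sig: SignalDef) -> float:
    """Decode an unsigned little-endian signal into its physical value."""
    frame_as_int = int.from_bytes(data, "little")
    raw = (frame_as_int >> sig.start_bit) & ((1 << sig.length) - 1)
    return raw * sig.factor + sig.offset


frame = bytes([0x1A, 0x2B, 0x3C, 0x4D, 0x5E, 0x6F, 0x70, 0x81])

# Hypothetical signal definition, not taken from any real DBC file.
coolant = SignalDef("CoolantTemp", start_bit=8, length=16,
                    factor=0.03125, offset=-273.0, unit="degC")

value = decode_intel(frame, coolant)
print(f"{coolant.name} = {value} {coolant.unit}")

# Byte order matters: the same two bytes give two different raw values.
as_little = int.from_bytes(frame[1:3], "little")
as_big = int.from_bytes(frame[1:3], "big")
print(hex(as_little), hex(as_big))
```

The last two lines show why endianness has to be part of the definition: bytes 0x2B 0x3C read as 0x3C2B in one order and 0x2B3C in the other, and nothing in the frame itself says which is right.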

2. OPC UA (Open Platform Communications Unified Architecture)

Why it matters:

  • Industry 4.0 standard
  • Machine-to-machine communication
  • Rich metadata
  • Security built-in

The complexity:

  • Node hierarchies (tree structures)
  • Type systems
  • References between nodes
  • Historical data
  • Events and alarms

What AI needs to understand:

  • Node relationships
  • Data types
  • Semantic meaning
  • Access levels
  • Update mechanisms
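
A rough sketch of the kind of structure a documentation tool has to walk. It models a node tree with plain dataclasses rather than a real OPC UA client library, and it collapses OPC UA's typed references into a simple parent-child list. The node names (`Machine`, `Spindle`, `SpeedSetpoint`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    browse_name: str
    node_class: str                   # "Object" or "Variable"
    data_type: Optional[str] = None   # only meaningful for variables
    writable: bool = False
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, parents: Tuple[str, ...] = ()) -> Iterator[Tuple[str, Node]]:
    """Yield (path, node) for the node and everything below it, depth first."""
    path = parents + (node.browse_name,)
    yield "/".join(path), node
    for child in node.children:
        yield from walk(child, path)


# Hypothetical address space for illustration only.
machine = Node("Machine", "Object", children=[
    Node("Spindle", "Object", children=[
        Node("Temperature", "Variable", data_type="Double"),
        Node("SpeedSetpoint", "Variable", data_type="Double", writable=True),
    ]),
])

variables = [
    (path, node.data_type, "read/write" if node.writable else "read-only")
    for path, node in walk(machine)
    if node.node_class == "Variable"
]
for row in variables:
    print(row)
```

Even this reduced model surfaces three of the items above: where a value sits in the hierarchy, its data type, and its access level.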

3. MQTT (Message Queuing Telemetry Transport)

Why it matters:

  • IoT standard
  • Lightweight
  • Publish/subscribe model
  • Cloud integration

The complexity:

  • Topic naming conventions
  • Payload formats (often custom)
  • QoS levels
  • Retained messages
  • Will messages

What AI needs to understand:

  • Topic hierarchies
  • Payload structure
  • Device identification
  • Data frequency
  • Error handling
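
A small sketch of turning one MQTT message into a structured record. The four-level topic convention (`site/line/device/measurement`) is an assumption made for this example; MQTT itself imposes no such layout. Payloads that are not JSON are kept as hex instead of being guessed at.

```python
import json

# Hypothetical naming convention; real deployments define their own.
TOPIC_FIELDS = ("site", "line", "device", "measurement")


def parse_message(topic: str, payload: bytes) -> dict:
    """Map a topic and payload onto named fields, flagging unknown payloads."""
    levels = topic.split("/")
    if len(levels) != len(TOPIC_FIELDS):
        raise ValueError(f"unexpected topic depth: {topic!r}")
    record = dict(zip(TOPIC_FIELDS, levels))

    try:
        body = json.loads(payload)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        record.update(payload_format="unknown", raw=payload.hex())
        return record

    if not isinstance(body, dict):
        body = {"value": body}  # bare number or string payload
    record.update(payload_format="json",
                  value=body.get("value"), unit=body.get("unit"))
    return record


reading = parse_message(
    "plant1/line3/temp_sensor_1/temperature",
    b'{"value": 23.5, "unit": "celsius"}',
)
opaque = parse_message("plant1/line3/press_7/status", bytes([0x1A, 0x2B, 0x3C]))
print(reading)
print(opaque)
```

The second call is the common industrial case: the topic identifies the device, but the payload is a custom binary format that needs its own decoder.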

The Architecture Challenge

To generate documentation, the system needs to:

1. Parse Protocol Definitions

Extract structure from:

  • CAN DBC files
  • OPC UA NodeSets (XML)
  • MQTT topic schemas
  • Custom payload formats
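
For the CAN case, a minimal sketch of extracting structure from DBC text. It handles only `BO_` (message) and `SG_` (signal) lines with two regular expressions, and the DBC snippet itself is invented. This is not the parser used in this project; an established DBC library is usually the better choice for real files.

```python
import re

# Invented DBC fragment for illustration.
DBC_TEXT = """
BO_ 2566869015 EngineData: 8 ECU1
 SG_ CoolantTemp : 8|16@1+ (0.03125,-273) [-40|85] "degC" Dashboard
 SG_ OilPressure : 24|8@1+ (4,0) [0|1000] "kPa" Dashboard
"""

MESSAGE_RE = re.compile(r"^BO_ (\d+) (\w+)\s*: (\d+) (\w+)")
SIGNAL_RE = re.compile(
    r'^\s*SG_ (\w+) : (\d+)\|(\d+)@([01])([+-]) '
    r'\(([^,]+),([^)]+)\) \[([^|]+)\|([^\]]+)\] "([^"]*)"'
)


def parse_dbc(text: str) -> list:
    """Return a list of messages, each with its signals (subset of DBC only)."""
    messages = []
    for line in text.splitlines():
        msg = MESSAGE_RE.match(line)
        if msg:
            raw_id = int(msg.group(1))
            messages.append({
                "frame_id": raw_id & 0x1FFFFFFF,       # strip the flag bit
                "extended": bool(raw_id & 0x80000000),  # bit 31 marks 29-bit IDs
                "name": msg.group(2),
                "length": int(msg.group(3)),
                "signals": [],
            })
            continue
        sig = SIGNAL_RE.match(line)
        if sig and messages:
            messages[-1]["signals"].append({
                "name": sig.group(1),
                "start_bit": int(sig.group(2)),
                "length": int(sig.group(3)),
                "byte_order": "little_endian" if sig.group(4) == "1" else "big_endian",
                "signed": sig.group(5) == "-",
                "factor": float(sig.group(6)),
                "offset": float(sig.group(7)),
                "min": float(sig.group(8)),
                "max": float(sig.group(9)),
                "unit": sig.group(10),
            })
    return messages


messages = parse_dbc(DBC_TEXT)
print(hex(messages[0]["frame_id"]), [s["name"] for s in messages[0]["signals"]])
```

The output of a step like this (names, bit positions, scaling, units, ranges) is the raw material for everything downstream.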

2. Understand Semantics

Figure out:

  • What is being measured?
  • Why does this data point exist?
  • How does it relate to other signals?
  • What are normal vs. abnormal values?

3. Extract Context

Determine:

  • Component location
  • System role
  • Dependencies
  • Update frequency
  • Error conditions

4. Generate Content

Create:

  • Human-readable descriptions
  • Technical specifications
  • Troubleshooting guides
  • Training materials

The Validation Problem

The hardest challenge isn't generation—it's validation.

AI can hallucinate. In technical documentation, that's dangerous.

Example hallucination:
AI Generated: "Sensor range: -40°C to +125°C"
Reality: Sensor range: -40°C to +85°C
Difference: 40°C
Impact: Potential equipment damage

My validation approach:

Layer 1: Protocol-Level Validation

  • Does the signal exist in the DBC?
  • Are units consistent?
  • Do value ranges match specifications?
  • Are calculations correct?
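
A simplified sketch of what a protocol-level check could look like, using the hallucinated sensor range from above. The `SPEC` dictionary stands in for values parsed from a DBC file, and its structure and the name `check_claim` are assumptions for this example, not the project's actual validation code.

```python
# Hypothetical specification, as it might be extracted from a DBC file.
SPEC = {
    "CoolantTemp": {"unit": "degC", "min": -40.0, "max": 85.0},
}


def check_claim(claim: dict, spec: dict) -> list:
    """Compare one generated statement against the protocol definition."""
    reference = spec.get(claim["signal"])
    if reference is None:
        return [f"signal {claim['signal']!r} not found in definition"]

    issues = []
    if claim["unit"] != reference["unit"]:
        issues.append(f"unit: generated {claim['unit']}, specified {reference['unit']}")
    for bound in ("min", "max"):
        if claim[bound] != reference[bound]:
            issues.append(
                f"{bound}: generated {claim[bound]}, specified {reference[bound]}"
            )
    return issues


generated = {"signal": "CoolantTemp", "unit": "degC", "min": -40.0, "max": 125.0}
issues = check_claim(generated, SPEC)
print(issues)

unknown = check_claim({"signal": "FuelLevel", "unit": "%", "min": 0, "max": 100}, SPEC)
print(unknown)
```

The point of this layer is that it needs no model at all: every check is a lookup against a definition that already exists.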

Layer 2: Physics-Based Validation

  • Is this temperature realistic for this component?
  • Does this pressure make physical sense?
  • Are related signals consistent?
  • Do timing relationships check out?
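
One way to picture a physics-based check is a pair of plausibility rules: a value must stay inside a physically sensible band, and it cannot change faster than the component allows. The limits below are placeholders chosen for the example, and the function assumes strictly increasing timestamps.

```python
def plausibility_issues(samples: list, limits: dict) -> list:
    """Check (time_s, value) samples against range and rate-of-change limits."""
    issues = []
    for t, value in samples:
        if not limits["low"] <= value <= limits["high"]:
            issues.append(
                f"t={t}s: {value} outside [{limits['low']}, {limits['high']}]"
            )
    # Compare each sample with its successor; timestamps must be increasing.
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rate = abs(v1 - v0) / (t1 - t0)
        if rate > limits["max_rate"]:
            issues.append(
                f"t={t1}s: change of {rate:.1f}/s exceeds {limits['max_rate']}/s"
            )
    return issues


# Placeholder limits for a coolant temperature in degC; not from a datasheet.
limits = {"low": -40.0, "high": 130.0, "max_rate": 5.0}
samples = [(0, 80.0), (1, 81.0), (2, 140.0), (3, 82.0)]

issues = plausibility_issues(samples, limits)
for issue in issues:
    print(issue)
```

Here a single spike trips three rules: one range violation and two impossible jumps, on the way up and on the way back down.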

Layer 3: Cross-Reference Validation

  • Compare against existing documentation
  • Check manufacturer datasheets
  • Verify with similar components
  • Flag deviations for human review

Layer 4: Human Review

  • Technical expert reviews flagged items
  • Approves or corrects
  • System learns from corrections
  • Confidence scores improve

What I Don't Share: The AI Part

What I can't reveal:

  • Specific prompt engineering techniques
  • Training data strategies
  • Model fine-tuning approaches
  • Proprietary validation algorithms

Why?

This is the competitive advantage. The "secret sauce."

What I can share:

  • The architecture approach
  • Validation strategies
  • Integration patterns
  • Lessons learned

The Technical Challenges I'm Still Solving

1. Proprietary Protocols

Many manufacturers use custom protocols.

Current approach:

  • Support standard protocols first
  • Build adapter framework
  • Work with manufacturers on custom parsers
  • Community-contributed adapters

2. Multi-Protocol Systems

Real systems use multiple protocols simultaneously.

A typical split:

  • CAN for vehicle bus
  • OPC UA for machine control
  • MQTT for cloud telemetry

The challenge: all three reference the same physical components.

Solution approach:

  • Unified data model
  • Cross-protocol mapping
  • Relationship inference
  • Consistency validation
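
A minimal sketch of the unified-model idea: one record per physical component, with a binding for each protocol that exposes it, plus a consistency check across those bindings. All component names, addresses, and units here are invented, and the mapping is written by hand rather than inferred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    protocol: str
    address: str  # signal name, node path, or topic, depending on protocol
    unit: str


# Hypothetical cross-protocol mapping keyed by physical component.
COMPONENTS = {
    "coolant_temp_sensor": [
        Binding("CAN", "EngineData/CoolantTemp", "degC"),
        Binding("OPC UA", "Machine/Engine/CoolantTemperature", "degC"),
        Binding("MQTT", "plant1/line3/engine/coolant_temp", "degF"),
    ],
    "oil_pressure_sensor": [
        Binding("CAN", "EngineData/OilPressure", "kPa"),
        Binding("MQTT", "plant1/line3/engine/oil_pressure", "kPa"),
    ],
}


def unit_conflicts(components: dict) -> dict:
    """Return components whose bindings disagree on the physical unit."""
    conflicts = {}
    for name, bindings in components.items():
        units = {binding.unit for binding in bindings}
        if len(units) > 1:
            conflicts[name] = sorted(units)
    return conflicts


conflicts = unit_conflicts(COMPONENTS)
print(conflicts)
```

A conflict like this is not necessarily an error, since a gateway may convert units on purpose, but it is exactly the kind of deviation worth flagging before it ends up in generated documentation.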

3. Real-Time vs. Historical Data

Documentation needs both:

  • Real-time: Current system state
  • Historical: Behavior patterns, trends

Open questions:

  • When to use live data?
  • When to use historical patterns?
  • How to detect anomalies?
  • How to update documentation automatically?

4. Safety-Critical Components

Some components are safety-critical.

Extra requirements:

  • Higher validation standards
  • Mandatory human review
  • Audit trails
  • Certification compliance

Approach:

  • Component classification
  • Escalation rules
  • Enhanced validation
  • Regulatory compliance checks
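
A small sketch of how classification and escalation rules could fit together: safety-critical components always go to a human, everything else is routed by confidence, and every decision is written to an audit trail. The threshold of 0.9, the component names, and the in-memory log are illustrative assumptions only.

```python
from datetime import datetime, timezone
from enum import Enum


class Criticality(Enum):
    STANDARD = "standard"
    SAFETY_CRITICAL = "safety_critical"


audit_log = []  # stand-in for a persistent, append-only audit trail


def route(component: str, criticality: Criticality,
          confidence: float, threshold: float = 0.9) -> str:
    """Decide whether generated content needs human review, and record why."""
    if criticality is Criticality.SAFETY_CRITICAL:
        decision = "human_review"
        reason = "safety-critical: review is mandatory"
    elif confidence < threshold:
        decision = "human_review"
        reason = f"confidence {confidence} below {threshold}"
    else:
        decision = "auto_approve"
        reason = f"confidence {confidence} meets {threshold}"

    audit_log.append({
        "component": component,
        "decision": decision,
        "reason": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return decision


brake = route("brake_pressure_sensor", Criticality.SAFETY_CRITICAL, confidence=0.99)
lamp = route("cabin_light_status", Criticality.STANDARD, confidence=0.95)
print(brake, lamp, len(audit_log))
```

The important property is the first branch: for safety-critical components, a high confidence score never bypasses the reviewer.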

Lessons from 25 Years of Industrial Testing

1. Context Is Everything

A temperature sensor reading 85°C:

  • In an engine bay: Normal
  • In a passenger cabin: Emergency
  • In a battery pack: Critical failure

AI must understand context.
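
The 85°C example can be sketched as a lookup keyed by location. The warning and critical thresholds below are placeholders picked so the example reproduces the three cases above; they are not engineering limits.

```python
# Placeholder (warning, critical) thresholds in degC per location.
LIMITS_C = {
    "engine_bay": (105.0, 120.0),
    "passenger_cabin": (35.0, 50.0),
    "battery_pack": (45.0, 60.0),
}


def classify(location: str, temp_c: float) -> str:
    """Interpret a temperature reading in the context of where it was taken."""
    warning, critical = LIMITS_C[location]
    if temp_c >= critical:
        return "critical"
    if temp_c >= warning:
        return "warning"
    return "normal"


results = {location: classify(location, 85.0) for location in LIMITS_C}
print(results)
```

The reading is identical in all three calls; only the context changes the verdict.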

2. Edge Cases Matter More Than Normal Cases

Normal operation is easy to document.

Edge cases are where:

  • Systems fail
  • Technicians get confused
  • Documentation is most valuable

Focus on edge cases.

3. Different Audiences Need Different Details

Engineer needs:

  • Signal definitions
  • Bit-level encoding
  • Update rates
  • Protocol details

Technician needs:

  • What it measures
  • Normal values
  • How to troubleshoot
  • When to escalate

Same component, different documentation.
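
As a sketch of that idea, the two functions below render one signal record for two audiences. The record and all its values are hypothetical, and plain string templates stand in for whatever generation step a real system would use.

```python
# Hypothetical signal record; every value is made up for illustration.
SIGNAL = {
    "name": "CoolantTemp",
    "measures": "engine coolant temperature",
    "start_bit": 8,
    "length": 16,
    "factor": 0.03125,
    "offset": -273.0,
    "unit": "degC",
    "cycle_ms": 100,
    "normal_low": 70.0,
    "normal_high": 105.0,
}


def render_engineer(sig: dict) -> str:
    """Encoding-level view: bit layout, scaling, update rate."""
    return (
        f"{sig['name']}: bits {sig['start_bit']}|{sig['length']}, "
        f"physical = raw * {sig['factor']} + ({sig['offset']}) {sig['unit']}, "
        f"cycle {sig['cycle_ms']} ms"
    )


def render_technician(sig: dict) -> str:
    """Operational view: meaning, normal range, when to escalate."""
    return (
        f"{sig['name']} shows the {sig['measures']}. "
        f"Normal: {sig['normal_low']} to {sig['normal_high']} {sig['unit']}. "
        f"Escalate if the reading stays outside this range."
    )


engineer_view = render_engineer(SIGNAL)
technician_view = render_technician(SIGNAL)
print(engineer_view)
print(technician_view)
```

Both views draw on the same record, so a change to the signal definition updates both documents at once.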

4. Documentation Rots Quickly

System updated? Documentation is outdated.

Key insight:
Treat documentation as code—regenerate on every system change.
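
One simple way to detect "a system change" is to fingerprint the protocol definitions and regenerate when the fingerprint moves. The sketch below hashes file contents held in memory; the file names and contents are invented, and a real setup would more likely hook into version control or CI.

```python
import hashlib


def fingerprint(definitions: dict) -> str:
    """Hash a {file name: bytes} mapping in a stable, order-independent way."""
    digest = hashlib.sha256()
    for name in sorted(definitions):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator so name and content cannot blur
        digest.update(definitions[name])
        digest.update(b"\0")
    return digest.hexdigest()


def needs_regeneration(definitions: dict, last_fingerprint: str) -> bool:
    return fingerprint(definitions) != last_fingerprint


# Invented definition files for illustration.
definitions = {
    "engine.dbc": b"BO_ 2566869015 EngineData: 8 ECU1\n",
    "machine.nodeset.xml": b"<UANodeSet/>\n",
}
published = fingerprint(definitions)
unchanged = needs_regeneration(definitions, published)

definitions["engine.dbc"] += b' SG_ CoolantTemp : 8|16@1+ (0.03125,-273) [-40|85] "degC" Dashboard\n'
changed = needs_regeneration(definitions, published)
print(unchanged, changed)
```

Storing the fingerprint alongside each generated document also makes it easy to tell which system version a given page describes.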


The Architecture I'm Building Toward

Industrial IoT System
        ↓
Protocol Parsers (CAN/OPC UA/MQTT)
        ↓
Semantic Understanding Layer
        ↓
Context Enrichment
        ↓
AI Generation Engine
        ↓
Multi-Layer Validation
        ↓
Human Review Interface
        ↓
Output Generator (Docs/Training/Test Data)
        ↓
Version Control & Distribution

Each layer has:

  • Clear interfaces
  • Validation rules
  • Error handling
  • Logging & monitoring

What's Working

Protocol Parsing:

  • CAN DBC: ✅ Solid
  • OPC UA: ✅ Good for standard NodeSets
  • MQTT: ⚠️ Depends on payload format

Semantic Understanding:

  • Standard components: ✅ Good
  • Custom components: ⚠️ Needs training
  • Complex relationships: 🚧 Work in progress

Content Generation:

  • Technical docs: ✅ Production-ready
  • Training materials: ✅ Good
  • Test data: ⚠️ Needs more physics validation

Validation:

  • Protocol-level: ✅ Reliable
  • Physics-based: ⚠️ Component-dependent
  • Cross-reference: 🚧 Being refined

What I'm Testing in Beta

Accuracy in real-world scenarios:

  • Automotive testing systems
  • Manufacturing lines
  • Building automation
  • Energy monitoring

Questions I need answered:

  1. Does validation catch enough errors?
  2. Is human review burden acceptable?
  3. Does it save time vs. manual documentation?
  4. What breaks in edge cases?

This is why I need beta testers.


Technical Decisions Still Open

1. On-Premise vs. Cloud

Tradeoffs:

  • Security (on-premise wins)
  • Updates (cloud wins)
  • Performance (depends)
  • Cost (complicated)

2. API-First vs. UI-First

Considerations:

  • Engineers want API
  • Managers want UI
  • Both needed eventually
  • Which first?

3. Open-Source Strategy

What to open-source:

  • Protocol parsers? (community benefit)
  • Validation framework? (industry standard?)
  • Full system? (competitive risk?)

4. Industry Focus

Should I specialize?

  • Automotive first? (largest market)
  • Manufacturing? (most pain)
  • Building tech? (fastest adoption)
  • Platform for all? (harder to differentiate)

Questions for the Community

For Protocol Experts:

  1. What protocols am I missing?
  2. What's the hardest protocol to parse?
  3. Any gotchas I should know about?

For Industrial IoT Engineers:

  1. What's your validation approach?
  2. How do you handle protocol updates?
  3. What documentation format do you prefer?

For AI/ML Engineers:

  1. How would you validate technical content?
  2. What's your hallucination detection strategy?
  3. Any red flags in my approach?

Next Article

Part 3: "25 Years of Industrial Testing - Lessons for AI Documentation"

Topics:

  • Pre-production testing insights
  • Common failure patterns in documentation
  • What makes documentation actually useful
  • Test data generation strategies
  • Human factors in automation

Beta Program Status

Spots filled: 3/10
Applications: 14

Still accepting applications from:

  • Industrial IoT teams
  • Digital Twin implementers
  • Protocol experts
  • Technical documentation managers

Apply if you:

  • Have real IoT data to test with
  • Can commit to weekly feedback
  • Want to influence product direction
  • Need this problem solved

Contact: DM here


Questions about the technical approach? Concerns about feasibility? Tell me what I'm missing. 👇

#iot #protocols #architecture #industrialautomation #digitaltwin
