TL;DR
To auto-generate documentation from IoT systems, you need to understand how industrial protocols structure data. Here's what I learned parsing CAN, OPC UA, and MQTT for AI processing—without sharing the secret sauce.
The Challenge
In my last post, I explained why Digital Twin documentation doesn't scale.
Now let's talk about the technical challenge:
How do you teach AI to understand industrial IoT protocols?
Why Industrial Protocols Are Different
Consumer IoT:
Temperature sensor → JSON → Cloud → Dashboard
{
  "device": "temp_sensor_1",
  "value": 23.5,
  "unit": "celsius"
}
Simple. Self-explanatory. Human-readable.
Industrial IoT:
CAN Frame: ID 0x18FF5017 Data: 0x1A 0x2B 0x3C 0x4D 0x5E 0x6F 0x70 0x81
What does this mean?
Which component sent it?
What's being measured?
What are the units?
How do you decode it?
The answer is buried in:
- DBC files (CAN database)
- NodeSets (OPC UA)
- Topic hierarchies (MQTT)
- Manufacturer documentation
- Tribal knowledge
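As a partial illustration, the 29-bit identifier in the frame above can be split into fields if the bus follows SAE J1939. That is an assumption; nothing above says which higher-layer protocol this frame uses. The function name is illustrative, and the sketch ignores the extended data page bit. Even under that assumption the ID only yields priority, parameter group, and sender address. The meaning of the eight data bytes still has to come from one of the sources listed above.

```python
from typing import NamedTuple


class J1939Id(NamedTuple):
    priority: int        # 0 (highest) to 7 (lowest)
    pgn: int             # parameter group number
    source_address: int  # address of the sending node


def split_j1939_id(can_id: int) -> J1939Id:
    """Split a 29-bit extended CAN ID, assuming the J1939 layout."""
    priority = (can_id >> 26) & 0x7
    data_page = (can_id >> 24) & 0x1
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    source_address = can_id & 0xFF
    if pdu_format >= 240:
        # PDU2 (broadcast): the PDU-specific byte is part of the PGN.
        pgn = (data_page << 16) | (pdu_format << 8) | pdu_specific
    else:
        # PDU1 (peer-to-peer): the PDU-specific byte is a destination address.
        pgn = (data_page << 16) | (pdu_format << 8)
    return J1939Id(priority, pgn, source_address)


frame_id = split_j1939_id(0x18FF5017)
print(frame_id)  # J1939Id(priority=6, pgn=65360, source_address=23)
```

If the J1939 assumption holds, PGN 0xFF50 sits in the range reserved for manufacturer-specific (Proprietary B) messages, which is exactly the case where the payload cannot be decoded without a DBC file or vendor documentation.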
The Three Protocols I Focus On
1. CAN Bus (Controller Area Network)
Why it matters:
- Automotive standard
- Industrial machinery
- Real-time critical
- Deterministic timing
The complexity:
- Binary data requires decoding
- DBC files define structure
- Signals span multiple bytes
- Bit-level operations needed
- Byte order matters (endianness)
What AI needs to understand:
- Message definitions
- Signal mappings
- Scaling factors
- Physical units
- Update rates
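The decoding steps above (bit position, width, byte order, scaling, unit) can be sketched in a few lines. This is a minimal sketch for unsigned little-endian (Intel) signals only. The `Signal` class, the signal names, and the scale factors are hypothetical; real definitions would come from the DBC file. The payload is the one from the example frame earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    start_bit: int   # position of the least significant bit (Intel numbering)
    length: int      # width in bits
    scale: float     # physical = raw * scale + offset
    offset: float
    unit: str


def decode_intel(data: bytes, sig: Signal) -> float:
    """Decode an unsigned little-endian (Intel) signal from a CAN payload."""
    frame = int.from_bytes(data, "little")
    raw = (frame >> sig.start_bit) & ((1 << sig.length) - 1)
    return raw * sig.scale + sig.offset


payload = bytes([0x1A, 0x2B, 0x3C, 0x4D, 0x5E, 0x6F, 0x70, 0x81])

# Hypothetical definitions; real ones would come from the DBC file.
speed = Signal("EngineSpeed", start_bit=0, length=16, scale=0.125, offset=0.0, unit="rpm")
level = Signal("TankLevel", start_bit=20, length=12, scale=1.0, offset=0.0, unit="counts")

speed_value = decode_intel(payload, speed)
level_value = decode_intel(payload, level)

# Reading the same two bytes in the other byte order gives a different number.
raw_big_endian = int.from_bytes(payload[:2], "big")

print(speed_value, level_value, raw_big_endian)  # 1379.25 1235.0 6699
```

The 12-bit signal starts in the middle of byte 2 and ends in byte 3, which is why bit-level extraction is needed. The last line shows why byte order matters: the first two bytes read as 11034 in little-endian order and 6699 in big-endian order.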
2. OPC UA (Open Platform Communications Unified Architecture)
Why it matters:
- Industry 4.0 standard
- Machine-to-machine communication
- Rich metadata
- Security built-in
The complexity:
- Node hierarchies (tree structures)
- Type systems
- References between nodes
- Historical data
- Events and alarms
What AI needs to understand:
- Node relationships
- Data types
- Semantic meaning
- Access levels
- Update mechanisms
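To make the node hierarchy concrete, here is a minimal sketch that reads variables, data types, and parent relationships from a NodeSet XML fragment using only the standard library. The fragment is made up for illustration (node IDs, names, and the pump are hypothetical), and a real NodeSet contains far more: aliases, type definitions, and namespace tables that this sketch ignores.

```python
import xml.etree.ElementTree as ET

NS = {"ua": "http://opcfoundation.org/UA/2011/03/UANodeSet.xsd"}

# Made-up fragment in NodeSet2 style; node IDs and names are illustrative.
NODESET = """\
<UANodeSet xmlns="http://opcfoundation.org/UA/2011/03/UANodeSet.xsd">
  <UAObject NodeId="ns=1;i=1000" BrowseName="1:Pump1">
    <DisplayName>Pump1</DisplayName>
  </UAObject>
  <UAVariable NodeId="ns=1;i=1001" BrowseName="1:OutletPressure"
              ParentNodeId="ns=1;i=1000" DataType="Double">
    <DisplayName>OutletPressure</DisplayName>
    <References>
      <Reference ReferenceType="HasComponent" IsForward="false">ns=1;i=1000</Reference>
    </References>
  </UAVariable>
</UANodeSet>
"""


def list_variables(xml_text: str) -> list[dict]:
    """Return one dict per UAVariable with its type, parent, and references."""
    root = ET.fromstring(xml_text)
    # Map every node ID to its display name so parents can be resolved.
    names = {node.get("NodeId"): node.findtext("ua:DisplayName", namespaces=NS)
             for node in root}
    variables = []
    for var in root.findall("ua:UAVariable", NS):
        refs = [(ref.get("ReferenceType"), ref.text)
                for ref in var.findall("ua:References/ua:Reference", NS)]
        variables.append({
            "node_id": var.get("NodeId"),
            "name": var.findtext("ua:DisplayName", namespaces=NS),
            "data_type": var.get("DataType"),
            "parent": names.get(var.get("ParentNodeId")),
            "references": refs,
        })
    return variables


variables = list_variables(NODESET)
print(variables[0])
```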
3. MQTT (originally MQ Telemetry Transport)
Why it matters:
- IoT standard
- Lightweight
- Publish/subscribe model
- Cloud integration
The complexity:
- Topic naming conventions
- Payload formats (often custom)
- QoS levels
- Retained messages
- Will messages
What AI needs to understand:
- Topic hierarchies
- Payload structure
- Device identification
- Data frequency
- Error handling
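A small sketch of how topic hierarchy and payload structure combine into one record. The four-level topic convention (`site/line/device/measurement`) is an assumption made for this example, not an MQTT rule, and the payload reuses the JSON shape from the consumer example earlier. Custom binary payloads would need their own decoder.

```python
import json
from typing import NamedTuple


class Reading(NamedTuple):
    site: str
    line: str
    device: str
    measurement: str
    value: float
    unit: str


def parse_message(topic: str, payload: bytes) -> Reading:
    """Combine topic levels and a JSON payload into one record."""
    levels = topic.split("/")
    if len(levels) != 4:
        # The topic does not follow the assumed convention.
        raise ValueError(f"unexpected topic shape: {topic!r}")
    body = json.loads(payload)
    return Reading(*levels, value=float(body["value"]), unit=body.get("unit", ""))


reading = parse_message(
    "plant1/line3/temp_sensor_1/temperature",
    b'{"value": 23.5, "unit": "celsius"}',
)
print(reading)
```

Device identification comes from the topic here, not the payload, which is one reason inconsistent topic naming makes MQTT systems hard to document automatically.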
The Architecture Challenge
To generate documentation, the system needs to:
1. Parse Protocol Definitions
Extract structure from:
- CAN DBC files
- OPC UA NodeSets (XML)
- MQTT topic schemas
- Custom payload formats
2. Understand Semantics
Figure out:
- What is being measured?
- Why does this data point exist?
- How does it relate to other signals?
- What are normal vs. abnormal values?
3. Extract Context
Determine:
- Component location
- System role
- Dependencies
- Update frequency
- Error conditions
4. Generate Content
Create:
- Human-readable descriptions
- Technical specifications
- Troubleshooting guides
- Training materials
The Validation Problem
The hardest challenge isn't generation—it's validation.
AI can hallucinate. In technical documentation, that's dangerous.
Example hallucination:
AI Generated: "Sensor range: -40°C to +125°C"
Reality: Sensor range: -40°C to +85°C
Difference: upper limit overstated by 40°C
Impact: Potential equipment damage
My validation approach:
Layer 1: Protocol-Level Validation
- Does the signal exist in the DBC?
- Are units consistent?
- Do value ranges match specifications?
- Are calculations correct?
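A minimal sketch of what a protocol-level check could look like, applied to the hallucinated range above. The `REFERENCE` table, the signal name, and the function are hypothetical; in practice the reference values would be extracted from the DBC file or an equivalent definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalSpec:
    unit: str
    minimum: float
    maximum: float


# Hypothetical reference data, as it might be extracted from a DBC file.
REFERENCE = {"CoolantTemp": SignalSpec(unit="degC", minimum=-40.0, maximum=85.0)}


def check_claim(name: str, unit: str, minimum: float, maximum: float) -> list[str]:
    """Return a list of problems found in one generated range claim."""
    spec = REFERENCE.get(name)
    if spec is None:
        return [f"{name}: signal not found in reference"]
    problems = []
    if unit != spec.unit:
        problems.append(f"{name}: unit {unit!r} does not match {spec.unit!r}")
    if minimum != spec.minimum:
        problems.append(f"{name}: minimum {minimum} does not match {spec.minimum}")
    if maximum != spec.maximum:
        problems.append(f"{name}: maximum {maximum} does not match {spec.maximum}")
    return problems


problems = check_claim("CoolantTemp", "degC", -40.0, 125.0)
print(problems)  # ['CoolantTemp: maximum 125.0 does not match 85.0']
```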
Layer 2: Physics-Based Validation
- Is this temperature realistic for this component?
- Does this pressure make physical sense?
- Are related signals consistent?
- Do timing relationships check out?
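One way to check that related signals are consistent is to test them against a known physical relationship. The sketch below uses DC electrical power (P = V × I) as the example. The 5% tolerance and the function name are assumptions chosen for illustration; the right tolerance depends on sensor accuracy.

```python
import math


def power_is_consistent(voltage_v: float, current_a: float,
                        reported_power_w: float, rel_tol: float = 0.05) -> bool:
    """Check a reported DC power value against P = V * I.

    The comparison is relative to the larger of the two values, with a
    small absolute tolerance so that readings near zero still compare sanely.
    """
    expected_w = voltage_v * current_a
    return math.isclose(reported_power_w, expected_w, rel_tol=rel_tol, abs_tol=1e-9)


print(power_is_consistent(48.0, 10.0, 470.0))  # True
print(power_is_consistent(48.0, 10.0, 900.0))  # False
```

A failed check does not say which of the three signals is wrong, only that the set is inconsistent, so it is better suited to flagging items for review than to automatic correction.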
Layer 3: Cross-Reference Validation
- Compare against existing documentation
- Check manufacturer datasheets
- Verify with similar components
- Flag deviations for human review
Layer 4: Human Review
- Technical expert reviews flagged items
- Approves or corrects
- System learns from corrections
- Confidence scores improve
What I Don't Share: The AI Part
What I can't reveal:
- Specific prompt engineering techniques
- Training data strategies
- Model fine-tuning approaches
- Proprietary validation algorithms
Why?
This is the competitive advantage. The "secret sauce."
What I can share:
- The architecture approach
- Validation strategies
- Integration patterns
- Lessons learned
The Technical Challenges I'm Still Solving
1. Proprietary Protocols
Many manufacturers use custom protocols.
Current approach:
- Support standard protocols first
- Build adapter framework
- Work with manufacturers on custom parsers
- Community-contributed adapters
2. Multi-Protocol Systems
Real systems use multiple protocols simultaneously.
A typical split:
- CAN for vehicle bus
- OPC UA for machine control
- MQTT for cloud telemetry
They all reference the same physical components.
Solution approach:
- Unified data model
- Cross-protocol mapping
- Relationship inference
- Consistency validation
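A minimal sketch of a unified data model, assuming each physical component collects the data points that refer to it across protocols. The class names, addresses, and units are hypothetical. The consistency check shown covers only units, the simplest thing to compare.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    protocol: str   # "can", "opcua", or "mqtt"
    address: str    # signal name, node ID, or topic
    unit: str


@dataclass
class Component:
    component_id: str
    points: list[DataPoint] = field(default_factory=list)

    def unit_conflicts(self) -> set[str]:
        """Return all units in use if the protocols disagree, else an empty set."""
        units = {point.unit for point in self.points}
        return units if len(units) > 1 else set()


motor = Component("motor_1", [
    DataPoint("can", "MotorTemp", "degC"),
    DataPoint("opcua", "ns=1;i=2001", "degC"),
    DataPoint("mqtt", "plant1/line3/motor_1/temperature", "degF"),
])
conflicts = motor.unit_conflicts()
print(sorted(conflicts))  # ['degC', 'degF']
```

The hard part in practice is the mapping itself: deciding that a CAN signal, an OPC UA node, and an MQTT topic all describe the same sensor. This sketch assumes that mapping already exists.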
3. Real-Time vs. Historical Data
Documentation needs both:
- Real-time: Current system state
- Historical: Behavior patterns, trends
Open questions:
- When to use live data?
- When to use historical patterns?
- How to detect anomalies?
- How to update documentation automatically?
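As one possible starting point for the anomaly question, a live value can be compared against the spread of historical values. The sketch below uses a plain z-score, which is a generic statistical technique swapped in for illustration, not a description of how this system does it. The threshold of 3 standard deviations and the sample data are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], live: float, threshold: float = 3.0) -> bool:
    """Flag a live value that lies more than `threshold` sample standard
    deviations away from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    spread = stdev(history)
    if spread == 0:
        # Perfectly constant history: any different value stands out.
        return live != history[0]
    return abs(live - mean(history)) / spread > threshold


history = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2]  # made-up temperature log
print(is_anomalous(history, 70.4))  # False
print(is_anomalous(history, 85.0))  # True
```

A z-score assumes roughly stable operating conditions. Signals that legitimately change with load or operating mode would need history segmented by mode before a check like this is meaningful.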
4. Safety-Critical Components
Some components are safety-critical.
Extra requirements:
- Higher validation standards
- Mandatory human review
- Audit trails
- Certification compliance
Approach:
- Component classification
- Escalation rules
- Enhanced validation
- Regulatory compliance checks
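Component classification and escalation rules can be expressed as a small decision function. The two-level classification and the specific rules below are assumptions for illustration; a real scheme would follow whichever safety standard applies to the system.

```python
from enum import Enum


class Criticality(Enum):
    STANDARD = 1
    SAFETY_CRITICAL = 2


def review_plan(criticality: Criticality, validation_passed: bool) -> dict:
    """Decide the review steps for one generated document section."""
    safety = criticality is Criticality.SAFETY_CRITICAL
    return {
        # Safety-critical content is always reviewed; other content only on failure.
        "human_review": safety or not validation_passed,
        # Keep an audit trail for anything safety-critical.
        "audit_trail": safety,
        # Only validated, non-safety content may be published without a person.
        "auto_publish": validation_passed and not safety,
    }


print(review_plan(Criticality.SAFETY_CRITICAL, validation_passed=True))
print(review_plan(Criticality.STANDARD, validation_passed=True))
```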
Lessons from 25 Years of Industrial Testing
1. Context Is Everything
A temperature sensor reading 85°C:
- In an engine bay: Normal
- In a passenger cabin: Emergency
- In a battery pack: Critical failure
AI must understand context.
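The 85°C example can be written down as location-specific limits. The threshold values here are hypothetical placeholders, not real specifications, and the labels are simplified to three levels.

```python
# Hypothetical (warning, critical) limits in degrees Celsius per location;
# real values would come from component specifications.
LIMITS = {
    "engine_bay": (105.0, 120.0),
    "passenger_cabin": (35.0, 50.0),
    "battery_pack": (45.0, 60.0),
}


def classify_temperature(location: str, celsius: float) -> str:
    """Classify one reading using the limits for its location."""
    warn_at, critical_at = LIMITS[location]
    if celsius >= critical_at:
        return "critical"
    if celsius >= warn_at:
        return "warning"
    return "normal"


for place in LIMITS:
    print(place, classify_temperature(place, 85.0))
```

The same number produces three different outcomes, which is why a generated description of "85°C" is meaningless without knowing where the sensor is installed.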
2. Edge Cases Matter More Than Normal Cases
Normal operation is easy to document.
Edge cases are where:
- Systems fail
- Technicians get confused
- Documentation is most valuable
Focus on edge cases.
3. Different Audiences Need Different Details
Engineer needs:
- Signal definitions
- Bit-level encoding
- Update rates
- Protocol details
Technician needs:
- What it measures
- Normal values
- How to troubleshoot
- When to escalate
Same component, different documentation.
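A rough sketch of rendering one signal definition for the two audiences above. The `SignalDoc` fields, the example signal, and the output wording are all hypothetical; the point is only that both views are generated from the same underlying record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalDoc:
    name: str
    measures: str
    unit: str
    start_bit: int
    length: int
    cycle_ms: int
    normal_low: float
    normal_high: float


def render(doc: SignalDoc, audience: str) -> str:
    """Render one signal for the given audience ("engineer" or "technician")."""
    if audience == "engineer":
        last_bit = doc.start_bit + doc.length - 1
        return (f"{doc.name}: bits {doc.start_bit}..{last_bit}, "
                f"unit {doc.unit}, cycle {doc.cycle_ms} ms")
    if audience == "technician":
        return (f"{doc.name} measures {doc.measures}. "
                f"Normal range: {doc.normal_low} to {doc.normal_high} {doc.unit}.")
    raise ValueError(f"unknown audience: {audience!r}")


coolant = SignalDoc("CoolantTemp", "engine coolant temperature", "degC",
                    start_bit=16, length=8, cycle_ms=100,
                    normal_low=70.0, normal_high=105.0)
print(render(coolant, "engineer"))
print(render(coolant, "technician"))
```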
4. Documentation Rots Quickly
System updated? Documentation is outdated.
Key insight:
Treat documentation as code—regenerate on every system change.
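One simple way to trigger regeneration is to fingerprint the protocol definition files and compare against the fingerprint stored with the last generated docs. This is a sketch under that assumption; the file names and contents below are placeholders, and a real setup would more likely hook into version control or CI.

```python
import hashlib


def fingerprint(definitions: dict[str, bytes]) -> str:
    """Hash protocol definition files (name -> content) in a stable order."""
    digest = hashlib.sha256()
    for name in sorted(definitions):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(definitions[name])
        digest.update(b"\0")
    return digest.hexdigest()


def needs_regeneration(definitions: dict[str, bytes], last_fingerprint: str) -> bool:
    """True if any definition file changed since the docs were last generated."""
    return fingerprint(definitions) != last_fingerprint


# Placeholder contents standing in for real definition files.
current = {"vehicle.dbc": b"signal definitions v1", "machine.xml": b"<UANodeSet/>"}
stored = fingerprint(current)

updated = dict(current, **{"vehicle.dbc": b"signal definitions v2"})
print(needs_regeneration(current, stored))  # False
print(needs_regeneration(updated, stored))  # True
```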
The Architecture I'm Building Toward
Industrial IoT System
↓
Protocol Parsers (CAN/OPC UA/MQTT)
↓
Semantic Understanding Layer
↓
Context Enrichment
↓
AI Generation Engine
↓
Multi-Layer Validation
↓
Human Review Interface
↓
Output Generator (Docs/Training/Test Data)
↓
Version Control & Distribution
Each layer has:
- Clear interfaces
- Validation rules
- Error handling
- Logging & monitoring
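The layered design can be sketched as a chain of stages that share one interface, with logging and error handling wrapped around each step. The stage names and the dict-based record are assumptions for illustration; the two stages below are trivial stand-ins, not real parsers or validators.

```python
import logging
from typing import Callable

logger = logging.getLogger("doc_pipeline")

# Every stage takes a record and returns an enriched record.
Stage = Callable[[dict], dict]


def run_pipeline(stages: list[tuple[str, Stage]], record: dict) -> dict:
    """Run the stages in order, logging each step and re-raising failures."""
    for name, stage in stages:
        logger.info("running stage %s", name)
        try:
            record = stage(record)
        except Exception:
            logger.exception("stage %s failed", name)
            raise
    return record


def parse(record: dict) -> dict:
    # Stand-in for a protocol parser.
    return {**record, "signals": ["CoolantTemp"]}


def validate(record: dict) -> dict:
    # Stand-in for a validation layer: here, just require at least one signal.
    return {**record, "valid": len(record["signals"]) > 0}


result = run_pipeline([("parse", parse), ("validate", validate)],
                      {"source": "vehicle.dbc"})
print(result)
```

Keeping every stage behind the same function signature makes it easy to insert, reorder, or test layers in isolation.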
What's Working
Protocol Parsing:
- CAN DBC: ✅ Solid
- OPC UA: ✅ Good for standard NodeSets
- MQTT: ⚠️ Depends on payload format
Semantic Understanding:
- Standard components: ✅ Good
- Custom components: ⚠️ Needs training
- Complex relationships: 🚧 Work in progress
Content Generation:
- Technical docs: ✅ Production-ready
- Training materials: ✅ Good
- Test data: ⚠️ Needs more physics validation
Validation:
- Protocol-level: ✅ Reliable
- Physics-based: ⚠️ Component-dependent
- Cross-reference: 🚧 Being refined
What I'm Testing in Beta
Accuracy in real-world scenarios:
- Automotive testing systems
- Manufacturing lines
- Building automation
- Energy monitoring
Questions I need answered:
- Does validation catch enough errors?
- Is human review burden acceptable?
- Does it save time vs. manual documentation?
- What breaks in edge cases?
This is why I need beta testers.
Technical Decisions Still Open
1. On-Premise vs. Cloud
Tradeoffs:
- Security (on-premise wins)
- Updates (cloud wins)
- Performance (depends)
- Cost (complicated)
2. API-First vs. UI-First
Considerations:
- Engineers want API
- Managers want UI
- Both needed eventually
- Which first?
3. Open-Source Strategy
What to open-source:
- Protocol parsers? (community benefit)
- Validation framework? (industry standard?)
- Full system? (competitive risk?)
4. Industry Focus
Where should I specialize?
- Automotive first? (largest market)
- Manufacturing? (most pain)
- Building tech? (fastest adoption)
- Platform for all? (harder to differentiate)
Questions for the Community
For Protocol Experts:
- What protocols am I missing?
- What's the hardest protocol to parse?
- Any gotchas I should know about?
For Industrial IoT Engineers:
- What's your validation approach?
- How do you handle protocol updates?
- What documentation format do you prefer?
For AI/ML Engineers:
- How would you validate technical content?
- What's your hallucination detection strategy?
- Any red flags in my approach?
Next Article
Part 3: "25 Years of Industrial Testing - Lessons for AI Documentation"
Topics:
- Pre-production testing insights
- Common failure patterns in documentation
- What makes documentation actually useful
- Test data generation strategies
- Human factors in automation
Beta Program Status
Spots filled: 3/10
Applications: 14
Still accepting applications from:
- Industrial IoT teams
- Digital Twin implementers
- Protocol experts
- Technical documentation managers
Apply if you:
- Have real IoT data to test with
- Can commit to weekly feedback
- Want to influence product direction
- Need this problem solved
Contact: DM here
Questions about the technical approach? Concerns about feasibility? Tell me what I'm missing. 👇