Most engineers who move from software or ML into industrial AIoT go through the same disorientation. The skills transfer and the fundamentals apply, but the environment has properties that invalidate assumptions so deeply embedded in standard engineering practice that they feel less like assumptions and more like physics.
Three of those invalidated assumptions are responsible for most of the interesting engineering work in this space—and for most of the ways that well-built systems fail when they first meet a real industrial environment.
Problem one: Your model's performance metric is not your customer's success metric
In most ML contexts, model performance metrics and customer success metrics are closely correlated. A recommendation model with a higher click-through rate produces a better user experience. An image classifier with higher accuracy produces fewer errors in the application. The metric you optimize during development is a reasonable proxy for the outcome the customer cares about.
In industrial AIoT, this correlation breaks down in ways that are not immediately obvious but become very clear after a first real deployment.
Consider a predictive maintenance system. Your evaluation metric is precision and recall on a held-out test set. Your customer's success metric is reduction in unplanned downtime. These are related, but the relationship is mediated by a factor that does not appear in your evaluation framework: the operations team's decision to act on your system's alerts.
A model with 85% precision and 90% recall generates alerts that are wrong 15% of the time. In consumer applications, a 15% error rate is often acceptable. In a manufacturing environment where each alert requires a maintenance team to interrupt a running production line, a 15% false alarm share (1 − precision) means roughly one in every six or seven interruptions was unnecessary. After enough unnecessary interruptions, the operations team develops its own mental model of when to trust the system and when not to. That is a rational response, but it decouples the system's outputs from the customer's decision-making in ways that your monitoring dashboards will not capture.
```js
// What your evaluation pipeline measures (counts are illustrative):
const true_positives = 153, false_positives = 27, false_negatives = 17;
const precision = true_positives / (true_positives + false_positives); // 0.85
const recall = true_positives / (true_positives + false_negatives); // 0.90

// What your customer actually experiences:
const alert_follow_through_rate_month_1 = 0.97; // team investigates almost everything
const alert_follow_through_rate_month_3 = 0.71; // team starting to filter by intuition
const alert_follow_through_rate_month_6 = 0.43; // team has built their own mental model

// Your model metrics have not changed.
// Your customer's operational value has collapsed.
```
The engineering response is not better model accuracy—it is a richer evaluation framework that includes operational context. Not "is the alert correct?" but "is the alert actionable given what the operations team knows?" That reframe changes which model architectures are most valuable, which data you need during training, and what post-deployment monitoring needs to track.
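One way to make that gap visible is to fold follow-through into the metric itself. The sketch below is a minimal illustration, not an established metric: `effectiveRecall` is a hypothetical helper, and it assumes the team acts on true and false alerts at the same rate, which a team filtering by intuition probably does not.

```js
// Hypothetical helper: share of real failures that are both flagged by the
// model AND acted on by the operations team.
// Assumption: follow-through is identical for true and false alerts.
function effectiveRecall(modelRecall, followThroughRate) {
  return modelRecall * followThroughRate;
}

const modelRecall = 0.9; // recall reported by the evaluation pipeline
const followThroughByMonth = { month1: 0.97, month3: 0.71, month6: 0.43 };

// Combine the unchanged model metric with the changing human behavior.
const effectiveByMonth = Object.fromEntries(
  Object.entries(followThroughByMonth).map(([month, rate]) => [
    month,
    effectiveRecall(modelRecall, rate),
  ])
);

for (const [month, value] of Object.entries(effectiveByMonth)) {
  console.log(`${month}: effective recall ~ ${value.toFixed(2)}`);
}
```

Under that simplifying assumption, effective recall falls from roughly 0.87 in month one to roughly 0.39 in month six while the model's own recall stays at 0.90. Tracking follow-through after deployment is what exposes the drop.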
Problem two: Edge inference constraints are set by physics, not by engineering preferences
Web and mobile engineers usually choose edge inference for performance or privacy reasons, and the cloud stays available as a fallback. If network latency is acceptable, you can fall back to cloud inference. If the device cannot handle the model, you can serve from a cloud endpoint.
Industrial environments remove this flexibility for the cases that matter most.
A worker safety system needs to detect a zone breach and trigger an alert within two seconds. During the morning shift change, when the most workers are moving and the most equipment is operating, the facility's wireless network is at peak congestion. The round-trip time (RTT) to the cloud inference endpoint is 800 ms under normal conditions and 2,400 ms under shift-change load. Your two-second requirement is met under normal load and missed under shift-change load, before any inference time is even counted. Since the cases where it is missed are precisely the high-activity cases where safety events are most likely, the system's safety guarantee is weakest exactly when it is most needed.
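The arithmetic behind that failure mode is short enough to write down. This sketch uses the RTT figures and the two-second deadline from the scenario above. `meetsDeadline` and its `onDeviceOverheadMs` parameter are hypothetical names introduced for illustration.

```js
// Checks whether a cloud round trip leaves room to meet a hard alert deadline.
// onDeviceOverheadMs is a hypothetical allowance for capture and actuation.
function meetsDeadline(rttMs, deadlineMs, onDeviceOverheadMs = 0) {
  return rttMs + onDeviceOverheadMs <= deadlineMs;
}

const DEADLINE_MS = 2000; // two-second safety requirement
const rttByCondition = { normal: 800, shiftChange: 2400 }; // figures from the scenario

for (const [condition, rtt] of Object.entries(rttByCondition)) {
  const slackMs = DEADLINE_MS - rtt; // negative slack means the deadline is already missed
  console.log(
    `${condition}: slack ${slackMs} ms, deadline met: ${meetsDeadline(rtt, DEADLINE_MS)}`
  );
}
```

Under normal load there are 1,200 ms of slack left for inference and actuation. Under shift-change load the network alone overshoots the deadline by 400 ms, so no amount of model optimization on the cloud side can recover it.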
The only architecture that meets the requirement unconditionally is edge inference on a device that does not depend on network availability for its core decision loop. This means your model needs to fit within the compute and memory budget of hardware that costs tens of dollars, draws milliwatts of power, and operates in a temperature range that may include a freezer warehouse or a foundry floor.
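A first-pass check on whether a model fits a device can be done before any profiling. The sketch below estimates weight storage only. The parameter count and flash budget are hypothetical numbers chosen for illustration, and 8-bit quantization is named here as one common way to shrink weights, not as something the scenario above prescribes.

```js
// Rough weight-storage estimate: parameter count times bytes per parameter.
// Ignores activations, runtime overhead, and firmware, so it is a lower bound.
function weightBytes(paramCount, bitsPerParam) {
  return Math.ceil((paramCount * bitsPerParam) / 8);
}

function fitsBudget(paramCount, bitsPerParam, budgetBytes) {
  return weightBytes(paramCount, bitsPerParam) <= budgetBytes;
}

// Hypothetical numbers, for illustration only.
const PARAM_COUNT = 250000; // a small detection model
const FLASH_BUDGET_BYTES = 512 * 1024; // space reserved for weights

console.log("float32 fits:", fitsBudget(PARAM_COUNT, 32, FLASH_BUDGET_BYTES));
console.log("int8 fits:", fitsBudget(PARAM_COUNT, 8, FLASH_BUDGET_BYTES));
```

With these assumed numbers the 32-bit weights need about 1 MB and do not fit, while the 8-bit version needs about 250 KB and does. A real decision would also have to account for activation memory, inference latency, and the accuracy cost of quantization.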
The tradeoffs between model complexity, inference latency, power budget, and hardware cost do not have clean solutions. They require engineering judgment that is informed by the specific operational context of each deployment — and that judgment gets better with repeated exposure to the ways these tradeoffs play out in real environments. Organizations building across multiple industrial contexts, like Aperture Venture Studio with its portfolio of AIoT ventures on a shared platform, develop systematic approaches to these tradeoffs that single-deployment teams take much longer to acquire.
Problem three: The data you need most is the hardest to get
Industrial AI faces a class imbalance problem that is structural rather than incidental. The events you most want to predict (equipment failures, safety incidents, inventory stockouts) are, by definition, rare in environments that are being managed reasonably well.
The consequence is that your most important models are the ones with the least training data. A bearing failure model for a specific class of equipment might have access to four or five labeled failure sequences across an entire customer history. A safety incident detection model might have fewer than ten labeled examples across multiple facility deployments.
Standard class imbalance techniques—SMOTE, cost-sensitive learning, and threshold adjustment—help but do not solve the core problem, which is that rare industrial events have complex, high-dimensional precursor signatures that are difficult to learn from a handful of examples. The approaches that work best combine domain knowledge (from the engineers and operators who have seen failures before) with transfer learning from related equipment types and anomaly detection framings that do not require labeled failure examples at all.
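To show what an anomaly detection framing looks like in its simplest form, the sketch below fits a baseline on readings from normal operation only and flags readings that sit far from it. It is a single-feature z-score check, far simpler than the high-dimensional precursor signatures described above. The function names, the threshold of 3, and the sample readings are all assumptions made for illustration.

```js
// Fit a baseline on readings from normal operation only (no failure labels).
function fitBaseline(normalReadings) {
  const n = normalReadings.length;
  if (n < 2) throw new Error("need at least two readings to estimate spread");
  const mean = normalReadings.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance (divides by n - 1).
  const variance =
    normalReadings.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  return { mean, std: Math.sqrt(variance) };
}

// Flag a reading whose distance from the baseline mean exceeds zThreshold
// standard deviations.
function isAnomalous(reading, baseline, zThreshold = 3) {
  const z = Math.abs(reading - baseline.mean) / baseline.std;
  return z > zThreshold;
}

// Hypothetical vibration amplitudes (arbitrary units) from a healthy bearing.
const healthy = [1.02, 0.98, 1.01, 0.99, 1.0, 1.03, 0.97, 1.0];
const baseline = fitBaseline(healthy);

console.log(isAnomalous(1.01, baseline)); // within normal spread
console.log(isAnomalous(1.45, baseline)); // far outside normal spread
```

The point of the framing is that nothing in `fitBaseline` needs a labeled failure. Domain knowledge from operators then goes into choosing which signals to baseline and how tight the threshold should be.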
Getting this right requires working closely enough with industrial operations teams that their institutional knowledge can be encoded into the model development process—which is a different kind of collaboration than most ML teams are practiced at and a skill that compounds significantly with experience.
What's the hardest version of any of these three problems you've encountered in practice? Curious what approaches this community has found that work.