DEV Community

Sudeeksha Chagarlamudi

The Safety Workaround That Worked, Until It Didn't

Power it on and it's supposed to be ready to go on its own, with no operator steps. Then one day, after a couple of restarts, it just... wasn't. Here's the clever safety workaround that quietly caused it.

The setup

I had to build a single test tool that could act as either of two opposite safety systems. It only ever connected to one of them at a time, but both behaviours had to be crammed into one safety system, sharing the same connection ID, over FSoE.
Quick context: FSoE (Fail Safe over EtherCAT, also called Safety over EtherCAT) is the protocol Beckhoff's TwinSAFE uses to carry safety signals such as emergency stops over the same EtherCAT network as ordinary I/O. It's trustworthy precisely because it's strict: every connection has a unique ID that's checked every cycle, with a master deciding what's safe and a slave carrying it out. So my problem was a contradiction: two opposite personalities forced to share one identity.
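To make that strictness concrete, here is a toy Python model of the per-cycle identity check. It is a simplified sketch, not the real protocol: actual FSoE frames also carry a command byte and CRCs and are supervised by a watchdog, none of which is modelled here. `Frame`, `ToyEndpoint`, the plain sequence counter and the example ID `0x0101` are all hypothetical.

```python
from dataclasses import dataclass

RUN = "run"
SAFE = "safe"  # fail-safe state: outputs de-energised


@dataclass(frozen=True)
class Frame:
    conn_id: int    # connection ID carried in every frame
    seq: int        # hypothetical plain counter; real FSoE protects sequencing differently
    payload: bytes  # safety data, unused by the check itself


class ToyEndpoint:
    """Toy receiver that trusts a frame only if its ID and sequence are exactly as expected."""

    def __init__(self, expected_conn_id: int) -> None:
        self.expected_conn_id = expected_conn_id
        self.next_seq = 0
        self.state = RUN

    def on_cycle(self, frame: Frame) -> str:
        if self.state == SAFE:
            return SAFE  # latched: recovery needs an explicit reset, not modelled here
        if frame.conn_id != self.expected_conn_id or frame.seq != self.next_seq:
            self.state = SAFE  # any mismatch drops straight to the safe state
        else:
            self.next_seq += 1
        return self.state


endpoint = ToyEndpoint(expected_conn_id=0x0101)
print(endpoint.on_cycle(Frame(0x0101, 0, b"\x01")))  # run
print(endpoint.on_cycle(Frame(0x0202, 1, b"\x01")))  # safe: unexpected connection ID
print(endpoint.on_cycle(Frame(0x0101, 2, b"\x01")))  # safe: stays latched
```

The point of the model: one frame with the wrong identity is enough to drop the receiver into its safe state, and it stays there. A connection ID is meant to pin down exactly one relationship.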

The clever bit

The textbook move would've been two separate, validated configs with only one ever active. My constraints didn't allow that. So I used TE9000 (Beckhoff's TwinCAT 3 Safety Editor) to switch the behaviour between master and slave at runtime, reusing the same connection ID.
The whole thing is a standalone, portable unit: a Beckhoff compact PC paired with a safety PLC. I set the application to launch automatically on startup. Power on, ready to go. It worked, and I was proud of it.

The bit that bit me

Then, after switching between the two safety configurations and restarting a few times, the auto-start sometimes just... didn't. On the second or third restart, the application that was supposed to come up on its own wouldn't, and I'd have to launch it manually from the tool.
I still don't have a clean explanation. My suspicion is that repeatedly mutating the safety configuration at runtime left the system in a state where the boot sequence and the safety side weren't reliably in step, so the auto-start quietly failed. But "usually starts" is not a phrase you want anywhere near a portable safety tool someone trusts to come up ready.

The lesson

Safety systems are supposed to be boringly predictable, including at startup. Power-on should land you in the same known state every single time. The moment you make the safety configuration something that changes at runtime, you risk introducing states the boot process can't always recover from cleanly. And an intermittent failure is the worst kind: it hides until it doesn't.
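One way to hold a unit to that standard is to test it the way it failed: power it on many times in a row and check that it always lands in the same state. Below is a minimal Python sketch of such a check. `boot` is a hypothetical stand-in for "power-cycle the unit and read back its state"; on real hardware that would mean an automated power cycle plus a status query, which is not shown. The flaky unit is simulated (it needs a manual start on every third power-on) purely to show what the check reports; it says nothing about the actual root cause.

```python
from collections import Counter
from typing import Callable, Hashable


def check_boot_determinism(boot: Callable[[], Hashable], cycles: int = 20) -> Counter:
    """Call `boot` repeatedly and count each distinct state the unit lands in."""
    return Counter(boot() for _ in range(cycles))


def assert_deterministic(outcomes: Counter) -> None:
    """Fail loudly unless every power-on produced the same state."""
    if len(outcomes) != 1:
        raise AssertionError(f"power-on is not deterministic: {dict(outcomes)}")


def make_flaky_boot(fail_every: int = 3) -> Callable[[], str]:
    """Simulated unit (hypothetical) that needs a manual start on every `fail_every`-th power-on."""
    count = 0

    def boot() -> str:
        nonlocal count
        count += 1
        return "manual_start_needed" if count % fail_every == 0 else "app_running"

    return boot


good = check_boot_determinism(lambda: "app_running")
assert_deterministic(good)  # passes: one state across all 20 power-ons

flaky = check_boot_determinism(make_flaky_boot())
print(flaky)  # Counter({'app_running': 14, 'manual_start_needed': 6})
```

A single power-on test would have passed on the flaky unit; only the repetition exposes it.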
The fix was never a cleverer switch. It was not switching at all: separate, statically validated configurations, with only one ever active. Convenience and predictability were pulling opposite ways, and on a safety system, predictability wins. Every time.
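The fix can be sketched the same way: two separately validated configurations, a record of what each looked like when it was validated, and a runtime that activates exactly one per power-on and offers no way to switch. This is a generic Python sketch under stated assumptions, not TwinSAFE's mechanism: `SafetyConfig`, `SafetyRuntime`, the SHA-256 fingerprint and the example field values are hypothetical, and in a real system the approved fingerprints would come from the validation records rather than being computed at import time as they are here for the demo.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from types import MappingProxyType
from typing import Optional


@dataclass(frozen=True)  # frozen: a config cannot be mutated once built
class SafetyConfig:
    name: str
    role: str      # "master" or "slave"
    conn_id: int

    def fingerprint(self) -> str:
        # Stable digest of the contents: sorted keys make the JSON byte-for-byte repeatable.
        blob = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# Two separate configurations (hypothetical values). Both keep the shared
# connection ID from the original constraint; only one is ever active.
MASTER_CFG = SafetyConfig("tester_as_master", "master", 0x0101)
SLAVE_CFG = SafetyConfig("tester_as_slave", "slave", 0x0101)

# Demo only: in practice these fingerprints come from the validation records.
APPROVED = MappingProxyType({c.name: c.fingerprint() for c in (MASTER_CFG, SLAVE_CFG)})


class SafetyRuntime:
    """Holds the one active config for this power-on. There is deliberately no switch method."""

    def __init__(self) -> None:
        self._active: Optional[SafetyConfig] = None

    def activate(self, candidate: SafetyConfig) -> SafetyConfig:
        if self._active is not None:
            raise RuntimeError("a config is already active; changing it requires a restart")
        if APPROVED.get(candidate.name) != candidate.fingerprint():
            raise RuntimeError(f"{candidate.name!r} does not match a validated config; staying safe")
        self._active = candidate
        return candidate


runtime = SafetyRuntime()
runtime.activate(MASTER_CFG)      # OK: matches its validated fingerprint
# runtime.activate(SLAVE_CFG)     # would raise: no switching without a restart
```

Changing roles means restarting with the other validated configuration, never editing the active one in place.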
Work with TwinSAFE, FSoE, or any safety PLC? Be deeply suspicious of anything that changes safety behaviour while the system is live. Working and reliable are not the same thing.
