DEV Community

dvnc0

Keeping Mission Critical Code Running

After pushing a pretty nice little defect to production at work, one that broke arguably some of our most mission-critical code, my mind went on its normal spiral of why it happened and how I can avoid it in the future. It was not a spectacular defect. It was pretty boring and fixed pretty quickly, but it got through to production and noisily broke critical code. As my mind raced around, I remembered a few articles I had stumbled on at some point about NASA and the Jet Propulsion Laboratory's rules for critical code. This is code meant to go to Mars and remain functional or fail as gracefully as possible, so I figured it was probably a good area to dig around in.

Why NASA?

The JPL Power of 10 rules are designed to limit specific C coding practices that make C code harder to peer review and statically analyze. In turn, these practices make C code more prone to errors and complex defects that could cause production failures in mission-critical code on other planets. Imagine being that developer. NASA also focuses on defensive programming as a way to limit defects in systems running millions of miles away from Earth. I'm a PHP developer, so these rules don't translate 100% to PHP, but they make you think about common patterns that can be eliminated to make critical code safer. I'm not going to talk about all the rules here, but you can read them here or in this PDF.

Relating these rules to PHP

Like I said, not all of these rules translate directly to PHP, but the idea of eliminating coding practices that are prone to errors or make code hard to review does. What can we do in PHP to make critical code safer?

  • Use simple control flow logic and avoid nested loops and deeply nested conditionals. Deep nesting makes code harder to follow, test, and statically analyze, which increases the likelihood of defects. Mission-critical code should use the simplest control flow possible.
  • Always leave errors and warnings on when running your code and never ignore them.
  • Wrap code in modular, reusable classes that can be tested in isolation. This also helps restrict scope.
  • Avoid global variables, which can often introduce unintended side effects into your system
  • Use types for all returns and arguments when possible. Clearly define interfaces and validate they are being fulfilled.
  • Use static analysis to catch errors early
  • Use graceful error handling to ensure mission-critical code continues to function even after errors
  • Keep functions and classes as small and simple as possible
  • Avoid references because they can introduce unintended behavior that is hard to catch and replicate
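To make a few of these concrete, here is a minimal sketch of what they might look like together, assuming PHP 8.0 or newer. The `OrderTotalCalculator` class and its tax calculation are hypothetical and exist only for illustration. The sketch uses strict types, typed arguments and returns, a guard clause instead of nested conditionals, no globals, and no references.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical example: compute an order total in cents.
 * All names here are illustrative, not from any real codebase.
 */
final class OrderTotalCalculator
{
    // Configuration is passed in explicitly instead of being read from a global.
    public function __construct(private int $taxRatePercent)
    {
        if ($taxRatePercent < 0 || $taxRatePercent > 100) {
            throw new InvalidArgumentException('Tax rate must be between 0 and 100.');
        }
    }

    /**
     * @param array<int> $lineItemCents Prices in cents, passed by value (no references).
     */
    public function total(array $lineItemCents): int
    {
        // Guard clause: handle the empty case first and return early.
        if ($lineItemCents === []) {
            return 0;
        }

        $subtotal = 0;
        foreach ($lineItemCents as $cents) {
            // Array elements are not covered by the type declaration, so check them here.
            if (!is_int($cents) || $cents < 0) {
                throw new InvalidArgumentException('Each line item must be a non-negative integer.');
            }
            $subtotal += $cents;
        }

        // intdiv() keeps the result an int, so the declared return type holds.
        return $subtotal + intdiv($subtotal * $this->taxRatePercent, 100);
    }
}
```

With a 10% rate, `(new OrderTotalCalculator(10))->total([1000, 500])` returns `1650`. Because the class is small, has one loop, and has no hidden state, it is also an easy target for static analysis and unit tests.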

These are loosely inspired by the Power of Ten Rules and the JPL guidelines. Going further down the rabbit hole, we can take this to another level by reviewing NASA coding standards, defensive programming, and reliability, which provides summaries of some NASA defensive coding practices.

  • As the article says: test, test, and test. That means unit tests and integration tests of mission-critical code. NASA operates on the idea that code must work flawlessly in production, so testing is their best guarantee.
  • Monitor production code for potential issues
  • Treat warnings as errors, because eventually, they will be
  • Use extreme caution when including code you do not control as part of your mission-critical code. It can change, and it can introduce errors
  • Assume your code will be attacked and that it will fail, and defend against both
  • Validate all input
  • Be able to replicate the production environment locally and enable fast development cycles
  • Review code using both automated tools and peer review
  • All warnings should be fixed before the code is sent to production
  • 100% unit test coverage
  • Test for edge cases and all branches of a function
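As a rough sketch of how "treat warnings as errors", "validate all input", and graceful failure could fit together in PHP 8.0 or newer: the handler below turns every warning, notice, and deprecation into an `ErrorException`, and a small validator rejects anything outside an allowed range. The `parseRetryCount` and `retryCountOrDefault` functions, the 0 to 10 range, and the default of 3 are all assumptions made up for this example.

```php
<?php
declare(strict_types=1);

// Promote every warning, notice, and deprecation to an exception so it
// cannot be silently ignored. Fatal errors are not routed through this handler.
error_reporting(E_ALL);
set_error_handler(
    static function (int $severity, string $message, string $file, int $line): bool {
        throw new ErrorException($message, 0, $severity, $file, $line);
    }
);

/**
 * Hypothetical validator: accepts only a whole number of retries from 0 to 10.
 */
function parseRetryCount(mixed $raw): int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 10],
    ]);
    if ($value === false) {
        throw new InvalidArgumentException('Retry count must be an integer from 0 to 10.');
    }
    return $value;
}

/**
 * Hypothetical wrapper: log the problem and fall back to a safe default
 * so the calling code can keep running.
 */
function retryCountOrDefault(mixed $raw, int $default = 3): int
{
    try {
        return parseRetryCount($raw);
    } catch (InvalidArgumentException $e) {
        error_log('Invalid retry count, using default: ' . $e->getMessage());
        return $default;
    }
}
```

The edge cases are the interesting part to test: `'0'` and `'10'` sit on the boundaries and should be accepted, while `'-1'`, `'11'`, and `'abc'` should all fall back to the default. Whether falling back is the right call or the error should stop everything depends on how critical the value is.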

Some additional items to think about

These are things not mentioned directly in the NASA reading, but they can really help when it comes to developers practicing defensive coding and peer code reviews.

  • Ensure your code is well documented and that documentation is up to date
  • Develop and follow coding standards and best practices

Conclusion

These are all things that we can do not only in PHP code but in any code. Treating mission-critical code as code that must run correctly every time it runs, and taking a defensive stance to ensure that it does, improves your product and the reliability of that product. My defect was the result of a mix of testing failures. Good thing I could fix it and push it live and not have to wait for it to travel millions of miles. What rules do you follow to ensure your critical code stays functional?
