Input-based attacks like Buffer Overflows, Cross-Site Scripting (XSS), and XXE are common in today’s software. And they do not go away. But why is that? Shouldn’t one assume that existing frameworks handle input correctly, and free developers from struggling with correctly implementing input handling over and over again? Sadly, the answer is no.
In this post I wrap up some ideas from Language Security (LangSec), which offers a general approach to this problem and provides some tools to fix it.
Although there are people who dislike type-safe programming languages, from a security point of view they help a lot in securing software. Type-safety assures that data within a program has a well-defined structure one can rely on. Input-based attacks arise whenever data enters or leaves this well-typed environment.
Attackers exploit assumptions the program makes about its input to drive it into an undesired state. The result can be execution of arbitrary code, or undesired output such as injections like XSS. Every program that handles input therefore has to validate all assumptions it makes about that input before using it. There are two points during input processing where this validation has to be performed.
First, while reading input, a parser needs to reject all inputs that do not conform to the expected format. Second, when creating output in a specific format, an unparser has to ensure that all input that interferes with the output format is encoded. Input may end up in various locations within the output, and each location may require a different encoding, so encoding cannot be performed by the parser when reading input.
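The two validation points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the username format, the `/users/` URL layout, and the function names `parse_username` and `render_profile_link` are hypothetical and not taken from any of the tools mentioned in this post. Only `re`, `html.escape`, and `urllib.parse.quote` from the standard library are used.

```python
import html
import re
from urllib.parse import quote

# Hypothetical input format: 1 to 32 ASCII letters, digits, or underscores.
USERNAME = re.compile(r"[A-Za-z0-9_]{1,32}")


def parse_username(raw: str) -> str:
    """Parser step: reject everything that does not match the expected format."""
    if USERNAME.fullmatch(raw) is None:
        raise ValueError("input does not match the expected username format")
    return raw


def render_profile_link(display_name: str) -> str:
    """Unparser step: encode the same value per output location."""
    # Location 1: a URL path segment that sits inside an HTML attribute.
    # It is percent-encoded first, then HTML-escaped for the attribute.
    href_part = html.escape(quote(display_name, safe=""), quote=True)
    # Location 2: plain HTML text, which only needs HTML escaping.
    text_part = html.escape(display_name)
    return f'<a href="/users/{href_part}">{text_part}</a>'
```

The same string ends up percent-encoded in the link target and HTML-escaped in the link text, which is why the encoding decision has to wait until the output location is known.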
Therefore parsers, unparsers, and encoding are key points where software security is ensured.
The history of vulnerabilities has shown that creating your own parser and unparser is as bad as rolling your own crypto. Some parser and unparser construction kits with a focus on security are listed at Langsec. Currently MCHammerCoder is the weapon of choice to automate the creation of parsers and unparsers using Hammer. If those tools do not support your favorite programming language, please feel free to adapt the concepts they use, ask for help on the LangSec mailing list, or visit the LangSec Hackathon.
In the future, every programming language should offer easy-to-use means of creating such parsers and unparsers, so that developers can write programs that work as expected even if some hacker tries to exploit them.
For those of you still wondering about the “Don’t fear the Grammar” from the title, now is the time: we have to talk about grammars. Perhaps you remember a theoretical computer science class where you heard about context-free grammars. Sadly, computers are not almighty and cannot reliably recognize languages more complex than deterministic context-free ones (see the sidebar on page two of The Halting Problems of Network Stack Insecurity). They also cannot "sensibly" resolve ambiguity if it is present in a specification; even humans occasionally fail at it. This is what makes speech recognition fail sometimes. Where security is concerned, adversaries exploit such failures. Why is this theoretical result important to developers? Because every program that reads input implicitly defines a grammar for the input it accepts. If that input language is more complex than a context-free language, no program can make absolutely sure whether a given input belongs to it, so there will always be edge cases attackers are able to exploit.
So when creating your next custom format, use at most a context-free grammar to define it. This way you are assured that the format does not get more complex and thereby inherently vulnerable. The tools mentioned before follow this principle, so by using them when defining a new format you do not need to worry about the complexity theory behind LangSec. To get started with Hammer, try the Hammer Primer.
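To make "define the format with a context-free grammar" concrete, here is a small sketch of a format whose grammar is written down first and whose recognizer follows it rule by rule. The format (nested lists of non-negative integers) and all names are hypothetical, and the parser is written by hand purely for illustration. It does not use Hammer or its API; in practice the advice above is to generate such a parser from the grammar with a construction kit.

```python
# Grammar of the hypothetical format:
#   value := INT | list
#   list  := "[" [ value { "," value } ] "]"
#   INT   := DIGIT { DIGIT }

class ParseError(ValueError):
    """Raised for any input that is not in the language."""


def parse(text: str):
    """Accept the whole input or reject it; trailing bytes are never ignored."""
    value, pos = _value(text, 0)
    if pos != len(text):
        raise ParseError(f"unexpected trailing input at offset {pos}")
    return value


def _value(text: str, pos: int):
    if pos < len(text) and text[pos] == "[":
        return _list(text, pos)
    return _int(text, pos)


def _int(text: str, pos: int):
    end = pos
    # Only ASCII digits are allowed, so str.isdigit() is deliberately avoided.
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == pos:
        raise ParseError(f"expected a digit or '[' at offset {pos}")
    return int(text[pos:end]), end


def _list(text: str, pos: int):
    pos += 1  # consume "["
    items = []
    if pos < len(text) and text[pos] == "]":
        return items, pos + 1
    while True:
        item, pos = _value(text, pos)
        items.append(item)
        if pos < len(text) and text[pos] == ",":
            pos += 1
        elif pos < len(text) and text[pos] == "]":
            return items, pos + 1
        else:
            raise ParseError(f"expected ',' or ']' at offset {pos}")
```

Each function corresponds to exactly one grammar rule, so what the program accepts can be checked against the grammar instead of being an accident of the code. A real implementation would also bound the nesting depth explicitly; this sketch relies on Python's recursion limit, which is an assumption worth replacing.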
If this was too much of a rush through the topic, take a look at the current Usenix ;login: article Curing the Vulnerable Parser: Design Patterns for Secure Input Handling and at all the papers and presentations at Langsec.