DEV Community: Iroro Chadere

XML External Entity (XXE) Injection: A Complete Guide for Developers

Iroro Chadere — Sun, 04 Jan 2026 17:35:33 +0000

XXE (XML External Entity) injection is a vulnerability that turns standard XML features into security nightmares. Imagine three weeks after adding XML support to your API, you discover your application has been leaking AWS credentials to attackers. The culprit? A seemingly innocent XML parser doing exactly what it was designed to do.

Let's break down exactly how it works and how to prevent it.

Why XML Parsers Are Different

If you've worked with JSON APIs, you know the parser's job is straightforward: read the data structure and deserialize it. The JSON itself can't tell the parser to fetch external files or make network requests.

XML operates differently. XML includes a feature called Document Type Definitions (DTDs), a system originally designed to define structure and validation rules for documents. DTDs support entities, which work like variables you can define and reference throughout your document.

Here's a simple internal entity:

<!DOCTYPE note [
  <!ENTITY company "TechCorp">
]>
<note>
  <message>Welcome to &company;</message>
</note>

When parsed, &company; gets replaced with "TechCorp". This is completely safe because the entity value is defined right there in the document—there's no external data source involved.
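To see the substitution happen, here is a minimal sketch that feeds the document above to Python's standard-library ElementTree. The choice of Python and the helper name read_message are assumptions made for illustration; any conforming parser should expand an internal entity the same way.

```python
import xml.etree.ElementTree as ET

# The same document as above: one internal entity, no external references.
NOTE_XML = """<!DOCTYPE note [
  <!ENTITY company "TechCorp">
]>
<note>
  <message>Welcome to &company;</message>
</note>"""


def read_message(xml_text: str) -> str:
    """Parse the note and return the text of its <message> element."""
    root = ET.fromstring(xml_text)
    return root.findtext("message")


if __name__ == "__main__":
    print(read_message(NOTE_XML))  # Welcome to TechCorp
```

The application code never sees &company; at all. By the time it reads the element text, the parser has already done the replacement.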

But XML also supports external entity references to files on the filesystem or URLs on the network:

<!DOCTYPE data [
  <!ENTITY external SYSTEM "file:///etc/passwd">
]>
<data>&external;</data>

This is where things get dangerous. The SYSTEM keyword tells the parser to go fetch content from an external source. Many XML parsers, particularly older versions and some default configurations, will resolve this external entity without question. They'll read /etc/passwd from the filesystem and insert its contents into your document.

The parser doesn't distinguish between "content the developer intended to include" and "content an attacker specified in a malicious payload." It just follows instructions. If your application reflects that parsed content back to users in an API response, displays it in an admin panel, or logs it somewhere accessible, an attacker just read arbitrary files from your server.

The Three Main Attack Vectors

Direct File Disclosure

The most straightforward attack. An attacker sends XML with an external entity pointing to a sensitive file:

<!DOCTYPE order [
  <!ENTITY xxe SYSTEM "file:///var/www/app/config/database.yml">
]>
<order>
  <details>&xxe;</details>
</order>

Here's what happens step by step. Your application receives this XML, maybe via an API endpoint that accepts XML data. The XML parser encounters the <!DOCTYPE> declaration and processes the DTD. The parser sees the entity definition and reads that file from disk. When the parser reaches &xxe; in the document body, it replaces it with the file's contents. Your application processes the parsed XML: maybe it stores the details field in a database or returns it in an API response. The attacker receives your database credentials.

Real-world target files include /etc/passwd for user account information on Linux systems, /var/www/app/.env for environment variables and secrets, /proc/self/environ for environment variables of the current process which often contains secrets on cloud platforms, C:\Windows\System32\drivers\etc\hosts for system configuration on Windows, and ~/.ssh/id_rsa for private SSH keys.

The key requirement for this attack to work: your application must echo back or expose the parsed content somehow. This happens more often than you'd think: debugging endpoints that return parsed data, error messages that include XML values, or admin dashboards that display imported data.

Server-Side Request Forgery via XXE

Instead of reading local files, attackers can make your server send HTTP requests to arbitrary URLs:

<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/admin-role">
]>
<data>&xxe;</data>

This example targets AWS's metadata service, an internal HTTP endpoint available to EC2 instances that returns sensitive information about the instance, including IAM credentials.

Here's why this is so dangerous. The metadata service at 169.254.169.254 is only accessible from inside the AWS network. You can't reach it from your laptop or from the internet. But when your server's XML parser encounters SYSTEM "http://169.254.169.254/...", it makes an HTTP request from inside your infrastructure. The parser fetches the response which contains AWS credentials and inserts it into the parsed document.

If your application returns this parsed content, the attacker now has credentials to read from and write to S3 buckets, query and modify databases, spin up new EC2 instances, and access any resource the role has permissions for.

This works because your server can reach internal services that should never be internet-accessible. The attacker is using your server as a proxy into your private network. They could target http://internal-api.company.local/admin/users for internal admin APIs, http://localhost:8080/actuator/health for Spring Boot management endpoints, http://192.168.1.50:9200/_cat/indices for Elasticsearch clusters on your internal network, or http://consul.service.consul:8500/v1/kv/ for service discovery systems.

The XML parser doesn't care that these are internal resources. It just follows the URL and returns whatever it finds.

Denial of Service Through Entity Expansion

This attack is called "Billion Laughs" and it's devastatingly simple:

<!DOCTYPE data [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
]>
<data>&lol4;</data>

Let's trace what happens when the parser expands &lol4;. The entity lol4 contains 10 references to lol3. Each lol3 contains 10 references to lol2, so that's 10 times 10 equals 100 lol2 references. Each lol2 contains 10 references to lol, so that's 100 times 10 equals 1,000 lol references. Each lol contains the string "lol", so that's 1,000 times 3 characters equals 3,000 characters.
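The arithmetic above generalizes to any depth. The sketch below is only a calculator for the expanded size, not a parser; the function name expanded_length and its parameters are made up for this illustration.

```python
def expanded_length(levels: int, fanout: int = 10, base: str = "lol") -> int:
    """Characters produced by fully expanding the top entity of the chain.

    `levels` counts every entity in the chain, including the base one,
    so lol, lol2, lol3, lol4 is 4 levels.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    # Each level above the base multiplies the output by `fanout`.
    return len(base) * fanout ** (levels - 1)


if __name__ == "__main__":
    for levels in (4, 7, 10):
        print(levels, expanded_length(levels))
    # 4 3000
    # 7 3000000
    # 10 3000000000
```

Ten entity definitions, which fit in well under a kilobyte of XML, expand to three billion characters.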

Now imagine adding more levels. With just a few more entity definitions, you can create billions of repetitions. By the time the parser finishes expanding, it's trying to hold billions of strings in memory. Your application crashes. Your server runs out of memory and the operating system kills the process. If this happens repeatedly, you've got a full denial of service.

The attack works because entity expansion happens during parsing before your application code even sees the data. You can't validate or limit it at the application layer because the parser consumes resources before you get a chance to intervene.

Where XXE Hides in Modern Applications

"Nobody uses XML anymore" is a dangerous assumption. XML processing happens in places you might not expect. SAML authentication systems use XML to pass authentication assertions between identity providers and applications. SOAP APIs are still common in enterprise integrations, especially with older systems. Office documents like .docx, .xlsx, and .pptx are ZIP files containing XML files that define document structure. SVG images are XML-based. RSS and Atom feeds for blogs and podcasts use XML. Configuration files like Maven POMs, Android layouts, and Spring configuration files are all XML.

The File Upload Problem

The most dangerous scenario is file uploads. This deserves special attention because it's often overlooked.

Here's why file uploads are particularly vulnerable. First, developers don't realize they're parsing XML. When you add a feature to let users upload profile pictures and accept SVG files, you might not think "I'm processing XML." But SVGs are XML documents. If you validate them, extract metadata, or resize them using a library that parses the XML structure, you're running an XML parser on untrusted input.

Second, the parser runs automatically. Many image processing libraries will automatically parse XML-based formats. You call image.open('avatar.svg') thinking you're just loading an image, but behind the scenes, an XML parser is processing the entire document structure including any DTDs and external entities.

Third, users expect file uploads to be safe. Unlike an API endpoint that explicitly accepts XML data, file uploads feel like simple data storage. Developers might thoroughly secure their API parsing but forget that uploaded files need the same scrutiny.

Example attack: An attacker uploads this SVG as their profile picture:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <text x="10" y="20">&xxe;</text>
</svg>

If your application processes this SVG to validate its dimensions before storing it, generate a thumbnail preview, extract and display metadata, or render it server-side for a preview, then your XML parser reads /etc/passwd and potentially exposes its contents.

The attacker might see the file contents in an error message like "Invalid text content: root:x:0:0:root:/root:/bin/bash...", a thumbnail that renders the text, server logs that your application saves, or admin panels that show "problematic uploads."

Office documents have the same issue. A .docx file is a ZIP archive containing these XML files: word/document.xml for the actual document content, word/styles.xml for formatting information, and [Content_Types].xml for file type mappings.

If your application accepts document uploads and uses a library to extract text, count words, or index content, you're parsing XML. An attacker can embed malicious entities in these XML files:

<!-- word/document.xml -->
<?xml version="1.0"?>
<!DOCTYPE document [
  <!ENTITY xxe SYSTEM "file:///var/www/app/.env">
]>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:t>&xxe;</w:t></w:p>
  </w:body>
</w:document>

When your document processor extracts text to index it for search or display a preview, it exposes your environment variables.
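As a rough illustration of how many XML parts hide inside one upload, the sketch below builds a tiny ZIP with the same layout in memory and flags any part whose raw bytes contain a DOCTYPE. The helper names are hypothetical, and the byte scan is only a cheap pre-filter under stated assumptions: it would miss a part encoded as UTF-16, and it ignores XML parts that don't end in .xml (such as .rels files). It does not replace a hardened parser.

```python
import io
import zipfile


def build_sample_docx() -> bytes:
    """Build a tiny in-memory ZIP that mimics the layout of a .docx file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/styles.xml", "<styles/>")
        archive.writestr(
            "word/document.xml",
            '<!DOCTYPE document [<!ENTITY xxe SYSTEM "file:///var/www/app/.env">]>'
            "<document>&xxe;</document>",
        )
    return buffer.getvalue()


def parts_with_doctype(docx_bytes: bytes) -> list:
    """Return the names of .xml parts whose raw bytes contain a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".xml") and b"<!DOCTYPE" in archive.read(name):
                flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(parts_with_doctype(build_sample_docx()))  # ['word/document.xml']
```

Documents produced by ordinary Office tooling aren't expected to carry a DOCTYPE in any part, so a hit here is a strong signal to reject the upload.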

How to Fix XXE: Secure Configuration

The most effective defense is disabling external entity processing entirely. Unless you specifically need DTDs, and you probably don't, turn them off.

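The reason this works: you're removing the dangerous functionality at its source. If the parser rejects DTDs outright, all three attack vectors we discussed become impossible, because every one of them starts with a <!DOCTYPE> declaration. If you only disable external entities, file disclosure and SSRF are blocked, but entity expansion attacks like Billion Laughs use internal entities and still need a DTD ban or an expansion limit. Depending on the setting, the parser will either skip the entities or throw an error, but it won't make network requests or read files.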

Let's walk through secure configurations for different languages and why each setting matters.

Java with DocumentBuilderFactory

Java's XML parsing ecosystem is complex because there are multiple parsers (DOM, SAX, StAX) and multiple ways to configure them. DocumentBuilderFactory is the most common for building DOM parsers.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

// Option 1: Disable DTDs entirely (most secure)
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

// Option 2: If you must support DTDs, disable external entities
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

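// Defense in depth: disable XInclude and keep entity references unexpanded in the DOM.
// (Billion Laughs itself is stopped by Option 1, which rejects any DOCTYPE.)
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);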

DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(xmlInput);

Breaking down each setting: Setting disallow-doctype-decl to true is the nuclear option. It completely disables DTD processing, which means the parser will throw an error if it encounters <!DOCTYPE>. This is the most secure configuration because it eliminates the entire attack surface. Use this unless you have a specific business requirement for DTD validation.

Setting external-general-entities to false means general entities (the ones you reference with &entityName;) won't fetch external resources. The parser will still process internal entities defined within the document itself, but it won't make network requests or read files.

Setting external-parameter-entities to false affects parameter entities (referenced with %entityName;) which are used within DTDs themselves. They get less attention than general entities, but they are the building block of blind (out-of-band) XXE attacks, where no parsed content is echoed back. Disabling them prevents attackers from using DTDs to pull in external DTD fragments.

Setting XIncludeAware to false disables XInclude, which is a separate W3C standard for including external XML documents. It uses <xi:include> elements instead of entities, but it has the same security implications. Disabling it prevents another vector for including external content.

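Setting ExpandEntityReferences to false asks the DOM builder to keep entity reference nodes in the tree instead of replacing them with their expanded content. On its own it isn't a reliable XXE defense, but combined with the other settings, it provides defense in depth.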

For SAX parsers:

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

SAXParser parser = factory.newSAXParser();

SAX parsers are event-driven, meaning they call your code as they encounter elements rather than building a full document tree in memory. They're often used for large documents, but they have the same XXE vulnerabilities.

Python with defusedxml

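Python's standard library includes multiple XML parsing modules like xml.etree.ElementTree, xml.dom.minidom, and xml.sax, and none of them are hardened against maliciously constructed input. The Python documentation even includes warnings about this. Exactly what is exploitable depends on your Python and Expat versions: recent releases no longer resolve external general entities by default, but entity expansion attacks like Billion Laughs have historically worked against all of these modules.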

The defusedxml library is a wrapper around these standard modules that applies secure defaults:

# Don't use this for untrusted input - not hardened against malicious XML
# from xml.etree import ElementTree

# Use this instead
from defusedxml import ElementTree

tree = ElementTree.parse('data.xml')
root = tree.getroot()

What defusedxml does under the hood: by default it raises an exception as soon as a document declares an entity, which shuts down both external entity resolution and entity expansion bombs, and it blocks external references. If you don't need DTDs at all, pass forbid_dtd=True to reject any document that contains a DOCTYPE.

The beautiful part: for parsing, it's close to a drop-in replacement. You change one import line, and calls like parse() and fromstring() keep working the way they do in the standard library.

Why this matters: with the standard library, you have to know which behaviors are risky in your particular Python version and actively opt into security. defusedxml eliminates this complexity. Install it with pip install defusedxml and use it everywhere you process untrusted XML.
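To make the "reject the DOCTYPE" idea concrete, here is a minimal sketch using only the standard library's Expat bindings. It is not defusedxml's actual code, and the names DoctypeForbidden and reject_doctype are invented for this example. It only shows that a parser can be told to abort before any entity is ever referenced.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Raise DoctypeForbidden if the document declares a DOCTYPE.

    Malformed XML still raises expat.ExpatError as usual.
    """
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort as soon as the parser reports a DOCTYPE, before the
        # document body (and any entity reference in it) is reached.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


if __name__ == "__main__":
    reject_doctype(b"<data>ok</data>")  # passes silently
    try:
        reject_doctype(
            b'<!DOCTYPE data [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
            b"<data>&xxe;</data>"
        )
    except DoctypeForbidden as exc:
        print("rejected:", exc)
```

In real code, prefer defusedxml over a hand-rolled check like this. The sketch is here to show the mechanism, not to replace the library.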

For lxml, a popular alternative XML library:

from lxml import etree

parser = etree.XMLParser(
    resolve_entities=False,  # Don't resolve external entities
    no_network=True,         # Disable all network access
    dtd_validation=False,    # Don't validate against DTDs
    load_dtd=False           # Don't load external DTDs
)

tree = etree.parse('data.xml', parser)

lxml is faster than the standard library and has more features, but it requires explicit security configuration. The settings above disable all the dangerous functionality.

Node.js with libxmljs

Node.js has several XML parsing libraries. libxmljs is a popular choice that binds to the C library libxml2. It's fast but requires careful configuration:

const libxmljs = require('libxmljs');

const doc = libxmljs.parseXml(xmlString, {
  noent: false,  // Disable entity substitution
  nonet: true    // Prevent network access
});

Understanding these options: noent maps to libxml2's XML_PARSE_NOENT flag, which, despite its name, turns entity substitution on (the name refers to leaving no entity nodes in the parsed tree). Setting it to false keeps entity substitution disabled. The naming is confusing, but it's how libxml2 works. With substitution disabled, the parser won't replace &xxe; with external content.

The nonet option explicitly prevents all network access. Even if entity substitution is enabled, the parser won't make HTTP requests. This is defense in depth.

Alternative: xml2js is a pure JavaScript parser:

const xml2js = require('xml2js');

const parser = new xml2js.Parser({
  explicitCharkey: true,
  // xml2js doesn't support external entities by default
  // It's safer because it's pure JavaScript (no C bindings)
});

parser.parseString(xmlString, (err, result) => {
  if (err) {
    // Handle parsing error
    return;
  }
  // Process result
});

xml2js is a pure JavaScript implementation, which means it doesn't have the same feature set as libxml2-based parsers. Importantly, it doesn't support external entities at all, not because of security configuration, but because the feature was never implemented. This makes it inherently safer for untrusted input, though it's slower for large documents.

PHP with libxml

PHP uses libxml2 for XML parsing, the same C library that libxmljs binds to. It has a global setting that affects all XML operations:

// Disable external entity loading globally
// (needed on PHP < 8.0; the function is deprecated as of PHP 8.0)
libxml_disable_entity_loader(true);

$dom = new DOMDocument();
// The LIBXML_NONET flag prevents network access
$dom->loadXML($xmlString, LIBXML_NONET);

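Why global configuration matters in PHP: libxml_disable_entity_loader(true) affects every XML operation in your PHP process. This is both good and bad. Good because you can't accidentally forget to secure one XML parser. Bad because if any library or dependency needs external entities, it will break. In practice, most applications don't need external entities, so setting this globally is the right approach on older PHP versions. The function is deprecated as of PHP 8.0, because libxml2 2.9.0 and later no longer load external entities by default. On those versions, the main thing is not to re-enable substitution with LIBXML_NOENT.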

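A warning about the other libxml options:

// Load XML with network access disabled.
// Do NOT pass LIBXML_NOENT, LIBXML_DTDLOAD, or LIBXML_DTDATTR for untrusted
// input. Despite their names, these flags ENABLE entity substitution,
// external DTD loading, and DTD attribute defaulting.
$dom = new DOMDocument();
$dom->loadXML($xmlString, LIBXML_NONET);

These flags are bitwise options combined with the pipe operator, so it's tempting to stack as many as possible. Resist that: LIBXML_NOENT in particular is the flag that makes a PHP application vulnerable to XXE, so the secure configuration is the one that leaves it out.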

The pattern across all languages is consistent: you're telling the parser to ignore DTDs and external references. Once configured, XXE becomes impossible because the parser simply won't execute the dangerous operations.

Defense in Depth: Additional Layers

Disabling external entities is your primary defense, but secure systems use multiple layers. If a developer accidentally uses an insecure parser or if you need to support legacy systems, these additional protections can prevent exploitation.

Input Validation with XML Schemas

If your application only accepts XML with a specific structure, which is common in APIs, enforce that structure as part of parsing, before your application logic touches the data. XML Schema (XSD) lets you define exactly what your XML should look like:

<!-- schema.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <xs:element name="amount" type="xs:decimal"/>
        <xs:element name="customer" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

This schema says: "An order must have an id (string), amount (decimal), and customer (string), in that order. Nothing else is allowed."

Validate while parsing:

from lxml import etree

# Load your schema
schema_doc = etree.parse('schema.xsd')
schema = etree.XMLSchema(schema_doc)

# Create a parser that enforces the schema AND disables dangerous features
parser = etree.XMLParser(
    schema=schema,
    no_network=True,
    resolve_entities=False
)

try:
    doc = etree.parse('input.xml', parser)
    # If we get here, the XML is well-formed and matches the schema
except etree.XMLSyntaxError as e:
    # Malformed XML or, with schema= set on the parser, a schema violation
    print(f"Invalid XML: {e}")
except etree.DocumentInvalid as e:
    # Raised by explicit checks such as schema.assertValid(doc)
    print(f"Schema validation failed: {e}")

Why schema validation helps: It rejects unexpected structure early, so documents with extra or missing elements never reach your application logic. It limits what attackers can send. Even if they bypass your entity protections, they can't include arbitrary elements that might trigger other vulnerabilities in your application logic. It catches malformed attacks. Many XXE payloads will violate your schema's structural or type requirements once the entity is expanded. What it does not do is reject the <!DOCTYPE> itself: the DTD is handled by the parser before XSD validation runs, which is why secure parser settings still matter.

Real-world example:

<!-- Attacker's XXE payload -->
<!DOCTYPE order [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<order>
  <id>&xxe;</id>
  <amount>100.00</amount>
  <customer>John Doe</customer>
</order>

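If your schema restricts id to a maximum of 20 characters (for example with an xs:maxLength facet, which the sample schema above doesn't include) and the expanded /etc/passwd content is 1,500 characters, the validation fails. The content never reaches your application logic. Keep in mind that by then the parser has already read the file, so make sure validation error messages aren't echoed back to the client.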

Caveat: Schema validation alone doesn't protect you. Older XML validators might resolve entities during validation. Always combine schema validation with secure parser configuration.

Sanitize File Uploads

For file types that contain XML like SVG and Office documents, apply extra scrutiny. The challenge here is that you often need to process these files; you can't just store them as opaque blobs.

SVG files are particularly dangerous because they're images that can contain executable code (JavaScript via <script> tags) and can reference external resources. Use a library specifically designed for sanitization:

const DOMPurify = require('isomorphic-dompurify');

function sanitizeSVG(svgContent) {
  // DOMPurify removes dangerous elements while preserving valid SVG
  return DOMPurify.sanitize(svgContent, {
    USE_PROFILES: { svg: true, svgFilters: true },
    ADD_TAGS: ['use'],  // Allow specific SVG features you need
    ADD_ATTR: ['href']  // Allow specific attributes
  });
}

// Process only the sanitized version
const uploadedContent = req.file.buffer.toString();
const cleanSVG = sanitizeSVG(uploadedContent);

// Now you can safely parse it
const doc = libxmljs.parseXml(cleanSVG, { noent: false, nonet: true });

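What DOMPurify is designed to remove: <script> tags for JavaScript in SVG, event handlers like onload="malicious()", <foreignObject> elements which can embed HTML with scripts, and references to external stylesheets that could leak data. Because it re-serializes markup from a parsed DOM, <!DOCTYPE> declarations and entity definitions typically don't survive either, but treat that as a side effect rather than a guarantee and keep the hardened parser settings on whatever parses the output.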

The result is a clean SVG that only contains drawing instructions.

For Office documents:

import zipfile

from defusedxml import DefusedXmlException, ElementTree

# WordprocessingML namespace used by the <w:t> text runs
W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

def extract_text_from_docx(file_path):
    """
    Safely extract text from a .docx file.
    """
    try:
        with zipfile.ZipFile(file_path, 'r') as docx:
            # Read only the main document part, never arbitrary ZIP entries
            xml_content = docx.read('word/document.xml')
    except zipfile.BadZipFile:
        raise ValueError("Invalid document file")
    except KeyError:
        # ZipFile.read raises KeyError when the entry is missing
        raise ValueError("Document has no word/document.xml part")

    try:
        # Parse with defusedxml, which rejects entity declarations by default
        root = ElementTree.fromstring(xml_content)
    except DefusedXmlException:
        raise ValueError("Document contains forbidden XML constructs")
    except ElementTree.ParseError:
        raise ValueError("Malformed XML in document")

    # Collect the text of every <w:t> run
    runs = root.findall(f'.//{W_NS}t')
    return ' '.join(run.text for run in runs if run.text)

Why this is safer: We're explicitly reading only word/document.xml, not processing arbitrary files from the ZIP. We're using defusedxml which won't resolve external entities. We're only extracting text content, not executing any macros or scripts. We catch parsing errors and treat them as invalid uploads.

Alternative approach: Don't parse at all. If your use case is just storage and download, like in a document management system, the safest approach is:

// Just store the file
await s3.putObject({
  Bucket: 'uploads',
  Key: `documents/${userId}/${filename}`,
  Body: fileBuffer,
  ContentType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  ContentDisposition: 'attachment' // Force download, not inline display
});

// Serve it back with security headers
res.set({
  'Content-Type': 'application/octet-stream',  // Generic binary
  'Content-Disposition': `attachment; filename="${filename}"`,
  'X-Content-Type-Options': 'nosniff',  // Don't MIME-sniff
  'Content-Security-Policy': "default-src 'none'"  // No script execution
});

By not parsing the file, you avoid the entire vulnerability. Let users download files and open them in desktop applications where they control the security environment.

Network-Level Controls

Even with secure parsers, add network restrictions as a last line of defense. This protects you if a developer adds a new dependency with a vulnerable parser, a zero-day vulnerability is discovered in your XML library, or an attacker finds a way to bypass your parser configuration.

Firewall rules:

# Example iptables rule (Linux)
# Block outbound requests from application servers to internal networks

iptables -A OUTPUT -d 169.254.169.254 -j REJECT  # Block AWS metadata service
iptables -A OUTPUT -d 10.0.0.0/8 -j REJECT       # Block private network (10.x.x.x)
iptables -A OUTPUT -d 172.16.0.0/12 -j REJECT    # Block private network (172.16-31.x.x)
iptables -A OUTPUT -d 192.168.0.0/16 -j REJECT   # Block private network (192.168.x.x)

This prevents your application servers from making requests to internal resources, even if XXE somehow succeeds. The parser tries to fetch http://169.254.169.254, but the request is blocked at the network layer.
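The same ranges can be mirrored in application code, for example in a unit test for your egress policy or as a check before any server-side fetch. The sketch below uses Python's standard ipaddress module; the names BLOCKED_NETWORKS and is_blocked_destination are assumptions for this example. It only handles IP literals, so a hostname would need to be resolved first.

```python
import ipaddress

# The same destinations as the iptables rules above.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("169.254.169.254/32"),  # cloud metadata service
    ipaddress.ip_network("10.0.0.0/8"),          # private network (10.x.x.x)
    ipaddress.ip_network("172.16.0.0/12"),       # private network (172.16-31.x.x)
    ipaddress.ip_network("192.168.0.0/16"),      # private network (192.168.x.x)
]


def is_blocked_destination(ip_text: str) -> bool:
    """Return True if the IP literal falls inside any blocked network."""
    address = ipaddress.ip_address(ip_text)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("169.254.169.254", "10.1.2.3", "172.32.0.1", "8.8.8.8"):
        print(candidate, is_blocked_destination(candidate))
```

An application-level check like this complements the firewall rather than replacing it, since a vulnerable parser makes its own requests without going through your code.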

Cloud security groups (AWS example):

# Security Group configuration
Outbound:
  - Protocol: TCP
    Port: 443
    Destination: 0.0.0.0/0
    Description: "HTTPS to internet only"

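  # Security groups are allow-only: there is no DENY action.
  # Anything not explicitly allowed above is blocked. Use network ACLs
  # if you need explicit deny rules for ranges like 10.0.0.0/8.

Cloud platforms let you define security groups that act as virtual firewalls. AWS security groups only support allow rules, so the approach is to allow as little outbound traffic as possible, for example HTTPS only, and to use network ACLs for explicit denies of internal ranges. Note that an allow rule for 0.0.0.0/0 also covers private addresses on that port.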

IMDSv2 (Instance Metadata Service version 2) for AWS. AWS introduced IMDSv2 to make SSRF attacks harder:

# IMDSv2 requires a session token
# Step 1: Get a token (requires PUT request with headers)
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Step 2: Use the token to access metadata
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/

XXE attacks can't do PUT requests or set custom headers, so they can't get the token. Enable IMDSv2 on all EC2 instances:

aws ec2 modify-instance-metadata-options \
  --instance-id i-1234567890abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 1

Allowlists for legitimate external resources. If your application legitimately needs to fetch external XML, like RSS feeds, use an allowlist:

import logging
from urllib.parse import urlparse

from lxml import etree

logger = logging.getLogger(__name__)

ALLOWED_DOMAINS = {
    'rss.example.com',
    'feeds.partner.com',
}

def is_allowed_url(url):
    # Compare the bare hostname so ports and credentials in the URL
    # can't be used to slip past the check
    return urlparse(url).hostname in ALLOWED_DOMAINS

# In your XML entity resolver
class SecureEntityResolver(etree.Resolver):
    def resolve(self, url, public_id, context):
        if not is_allowed_url(url):
            # Log the attempt
            logger.warning("Blocked XXE attempt to: %s", url)
            # Return empty content
            return self.resolve_string("", context)

        # Allow only specific resources
        return None  # Use default resolution

This provides defense in depth. Even if entities are enabled, only pre-approved domains can be accessed.

Testing Your Application

To verify your defenses work, test with actual XXE payloads in a safe environment. Never test on production.

Create a test server to receive callbacks:

# Start a simple HTTP server to detect SSRF attempts
python3 -m http.server 8000

Test file disclosure:

curl -X POST http://localhost:3000/api/parse \
  -H "Content-Type: application/xml" \
  -d '<!DOCTYPE test [<!ENTITY xxe SYSTEM "file:///etc/hostname">]><data>&xxe;</data>'

Expected result if properly secured: The application either rejects the document or returns <data>&xxe;</data> literally (entity not expanded).

Vulnerable result: The application returns the contents of /etc/hostname.

Test SSRF:

curl -X POST http://localhost:3000/api/parse \
  -H "Content-Type: application/xml" \
  -d '<!DOCTYPE test [<!ENTITY xxe SYSTEM "http://your-test-server:8000/xxe-test">]><data>&xxe;</data>'

Watch your test server logs. If you see a request to /xxe-test, your application is vulnerable: the XML parser made an outbound HTTP request.

Test denial of service, but be careful with this:

curl -X POST http://localhost:3000/api/parse \
  -H "Content-Type: application/xml" \
  -d '<!DOCTYPE lolz [<!ENTITY lol "lol"><!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;"><!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">]><lolz>&lol3;</lolz>'

Expected result: The request is rejected or times out gracefully.

Vulnerable result: The application hangs or crashes with out-of-memory errors.
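If you want to run these probes from a test suite instead of the command line, the payloads can be generated programmatically. The sketch below reproduces the three request bodies used in the curl commands above. The function names are made up for this example, and sending the payloads (with whatever HTTP client your tests use) is left out on purpose.

```python
def file_probe(path: str = "/etc/hostname") -> str:
    """Payload that asks the parser to read a local file."""
    return (
        f'<!DOCTYPE test [<!ENTITY xxe SYSTEM "file://{path}">]>'
        "<data>&xxe;</data>"
    )


def ssrf_probe(callback_url: str) -> str:
    """Payload that asks the parser to fetch a URL you control."""
    return (
        f'<!DOCTYPE test [<!ENTITY xxe SYSTEM "{callback_url}">]>'
        "<data>&xxe;</data>"
    )


def expansion_probe(levels: int = 3, fanout: int = 10) -> str:
    """Small nested-entity payload; keep `levels` low outside a sandbox."""
    entities = ['<!ENTITY lol "lol">']
    previous = "lol"
    for level in range(2, levels + 1):
        name = f"lol{level}"
        references = f"&{previous};" * fanout
        entities.append(f'<!ENTITY {name} "{references}">')
        previous = name
    return "<!DOCTYPE lolz [" + "".join(entities) + f"]><lolz>&{previous};</lolz>"


if __name__ == "__main__":
    print(file_probe())
    print(ssrf_probe("http://your-test-server:8000/xxe-test"))
    print(len(expansion_probe()))
```

Keeping the expansion depth as a parameter makes it easy to start with a harmless payload and only raise it inside a disposable environment.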

Automated testing with OWASP ZAP or Burp Suite. These tools have built-in XXE detection:

# OWASP ZAP's full scan runs the active scanner (the baseline scan is passive only)
docker run -t owasp/zap2docker-stable zap-full-scan.py \
  -t http://localhost:3000 \
  -r report.html

Burp Suite Professional has XXE scanning in its active scanner. It will automatically test various payloads and entity expansion techniques.

When You Can't Disable XML

Some scenarios require XML processing with external entities: legacy system integrations where you don't control the format and the vendor requires DTD validation, standards compliance where some industry specifications require DTD support, and content management systems where users expect to use XInclude for modular documents.

These are rare, but they exist. In these cases, you need a different approach because completely disabling external entities breaks functionality.

Use a separate, isolated service. Run XML processing in a dedicated service with no access to sensitive resources.

The Bottom Line

XXE exists because XML parsers are powerful by default and most developers don't know how to lock them down. The vulnerability isn't complex; it's a dangerous default configuration that's been around for decades.

Disable external entities in your XML parser configuration. This is non-negotiable for any parser processing untrusted input. Validate input structure before parsing using XML schemas to enforce exactly what your application expects. Treat any XML from untrusted sources as dangerous, including API inputs, file uploads, data from partners, and anything users can control. For file uploads containing XML like SVG and Office documents, sanitize or avoid parsing entirely. Add network-level controls as defense in depth by blocking access to metadata services and internal networks from application servers.

XXE is completely preventable. The vulnerability only exists when developers don't configure their parsers securely or don't realize they're processing XML. Test your applications, review your dependencies, and make secure XML parsing the default in your codebase.

Top 10 Free APIs You Need to Use in Your Projects

Iroro Chadere — Wed, 07 Aug 2024 12:40:53 +0000

Finding reliable and free APIs can be a game-changer for your development projects. Here’s a list of the top 10 free APIs that you can integrate into your applications to enhance functionality and save time:

1. OpenWeatherMap

OpenWeatherMap provides current weather data, forecasts, and historical data to millions of developers for free. It's perfect for any weather-related applications.

2. NewsAPI

NewsAPI allows you to fetch the latest headlines from various news sources worldwide. It’s great for building news aggregators or keeping your users updated with current events.

3. Unsplash API

Unsplash offers high-quality, royalty-free images. Their API allows you to access a vast library of stunning photos, perfect for enhancing your website or app's visual appeal.

4. OpenAI GPT-3

OpenAI's GPT-3 API provides access to advanced natural language processing capabilities. It has a free tier, and it’s versatile enough for building chatbots, content generation tools, and more.

5. CoinGecko

CoinGecko API provides comprehensive data on cryptocurrencies, including current prices, market data, and historical information. It’s ideal for any crypto-related projects.

6. NASA API

NASA's API offers access to a treasure trove of data, including images, videos, and information about space missions. It’s a fantastic resource for educational and space-themed projects.

7. The Cat API

The Cat API provides random pictures of cats, perfect for adding a bit of fun to your application. It’s widely used in various projects for entertainment and light-hearted content.

8. REST Countries

REST Countries API provides information about countries, including population, languages, and more. It’s useful for applications that need to display or utilize geographical data.

9. JokeAPI

JokeAPI delivers a wide range of jokes in a simple, easy-to-use format. It’s perfect for adding humor to your projects or creating entertainment apps.

10. Open Library

Open Library API allows access to a vast collection of book data. It’s excellent for building book-related applications, such as reading lists or book discovery tools.

These free APIs can significantly enhance your projects by providing valuable data and functionality without additional costs. Explore them and see how they can add value to your next development endeavor!

If you're building APIs, you'll likely want to test, document, and maybe even mock them. With Apidog, you can do all of that with ease. With its easy-to-use dashboard and fast performance, Apidog is a good choice to consider if you're building APIs.

API Security Scanning Tools: Ensuring the Safety of Your APIs

Iroro Chadere — Thu, 25 Jul 2024 21:05:38 +0000

APIs (Application Programming Interfaces) are the invisible connectors powering modern web and mobile applications. They enable different systems to communicate and share data seamlessly, making them indispensable in today's software ecosystem. However, their pivotal role also makes them prime targets for cyberattacks. This article delves into the critical world of API security scanning tools, exploring their importance, key features, popular options, and best practices for implementation.

Why API Security is Crucial

APIs have become the backbone of modern software architectures, facilitating seamless integration and data exchange across various platforms. However, this widespread adoption has made APIs attractive targets for cybercriminals. Understanding the significance of API security involves recognizing the common threats and real-world implications of security breaches.

Injection Attacks: Malicious actors exploit vulnerabilities to inject harmful code or commands into APIs. This can lead to unauthorized access, data corruption, or system compromise.
Broken Authentication: Weak or improperly implemented authentication mechanisms can be bypassed, allowing attackers to gain unauthorized access to sensitive data.
Data Exposure: Inadequate access controls and lack of data encryption can result in sensitive information being exposed to unauthorized parties.
Rate Limiting Bypasses: Attackers may overwhelm an API with excessive requests, causing denial of service (DoS) attacks that disrupt normal operations.

Real-world incidents, such as the Facebook API data breach and the T-Mobile API vulnerability, underscore the devastating impact of security lapses. These breaches compromised millions of user records, leading to significant financial and reputational damage.

Key Features of API Security Scanning Tools

API security scanning tools are designed to identify and mitigate vulnerabilities, ensuring the safety and integrity of your APIs. Key features of these tools include:

Automated Vulnerability Detection: Continuous scanning for known vulnerabilities, misconfigurations, and security flaws helps identify and address issues promptly.
CI/CD Integration: Seamless integration with continuous integration/continuous deployment pipelines ensures that security checks are an integral part of the development process.
Comprehensive Reporting and Analytics: Detailed reports and analytics provide insights into vulnerabilities, helping developers understand and prioritize remediation efforts.
Support for Various API Types: These tools can scan REST, SOAP, GraphQL, and other API types, ensuring comprehensive security coverage.

Popular API Security Scanning Tools

Several tools have established themselves as leaders in API security scanning. Here are some of the most popular options:

OWASP ZAP: An open-source tool known for its robust vulnerability scanning capabilities.
Burp Suite: A comprehensive suite of tools for web application security testing.
Apidog: A versatile tool that offers API development, testing, and security scanning functionalities.
Postman: Beyond its API development features, Postman offers security testing functionalities.
Acunetix: A powerful web vulnerability scanner with extensive API scanning features.
Nessus: Known for its network vulnerability scanning, Nessus also supports API security testing.
APIsec: A specialized tool focused on API security with advanced scanning and reporting features.

In-Depth Tool Reviews

OWASP ZAP

Overview: The OWASP Zed Attack Proxy (ZAP) is a popular open-source security tool that originated as an Open Web Application Security Project (OWASP) project. It is designed to help developers and security professionals identify vulnerabilities in web applications and APIs.

Key Features:

Automated scanners for finding common vulnerabilities.
Passive and active scanning capabilities.
Integration with CI/CD pipelines.
Extensive community support and documentation.

Use Cases: Ideal for developers and security teams looking for a robust, open-source solution to integrate into their development workflows.

Pros:

Free and open-source.
Regular updates and a large community.
Comprehensive scanning capabilities.

Cons:

Can be resource-intensive.
Steeper learning curve for beginners.

Burp Suite

Overview: Burp Suite is a comprehensive platform for performing security testing of web applications. Its advanced tools and features make it a favorite among security professionals.

Key Features:

Advanced scanning for various vulnerabilities.
Intruder tool for automated customized attacks.
Extensible through plugins and integrations.
Detailed reporting and analysis.

Use Cases: Best suited for security professionals and penetration testers who need a powerful and flexible tool for in-depth security assessments.

Pros:

Highly customizable and extensible.
Powerful suite of tools for various testing needs.
Detailed and actionable reports.

Cons:

High cost for the professional version.
Requires expertise to fully utilize its capabilities.

Apidog

Overview: Apidog is a versatile tool that stands out for its comprehensive approach to API development, testing, and security. Designed with both developers and security professionals in mind, Apidog simplifies the process of creating and securing APIs, making it an invaluable asset in any API lifecycle.

Key Features:

API Design and Documentation: Apidog offers intuitive tools for designing and documenting APIs, ensuring that your API specifications are clear and comprehensive. The platform supports OpenAPI and Swagger, allowing for seamless integration with other tools and services.
Automated Security Scanning: Apidog includes robust security scanning capabilities that automatically detect vulnerabilities in your APIs. It scans for common security issues such as SQL injection, cross-site scripting (XSS), and broken authentication, providing detailed reports and remediation guidance.
Mock Server: The mock server functionality allows you to simulate API responses without needing a fully developed backend. This is particularly useful for testing and development purposes, enabling you to validate API behavior early in the development cycle.
Testing and Debugging: Apidog provides comprehensive testing tools, including automated test generation, test scripts, and debugging features. These tools help ensure that your APIs function correctly and securely before deployment.
CI/CD Integration: Apidog seamlessly integrates with popular CI/CD pipelines, enabling continuous security checks and automated testing as part of your development workflow. This integration ensures that security is a continuous process rather than an afterthought.
User-Friendly Interface: With its clean and intuitive interface, Apidog makes it easy for both beginners and experienced professionals to navigate and utilize its features effectively. Detailed documentation and tutorials further enhance the user experience.

Use Cases:

API Development Teams: Apidog is ideal for development teams that need a unified platform for designing, testing, and securing APIs. Its comprehensive feature set supports the entire API lifecycle, from conception to deployment.
Security Professionals: For security teams, Apidog provides advanced scanning and reporting tools that help identify and mitigate vulnerabilities, ensuring that APIs are secure before they go live.
DevOps Engineers: The CI/CD integration makes Apidog a valuable tool for DevOps engineers looking to incorporate security checks into their automated pipelines, ensuring continuous security monitoring and compliance.

Pros:

All-in-One Solution: Combines API design, testing, and security in a single platform, reducing the need for multiple tools.
Ease of Use: User-friendly interface and detailed documentation make it accessible to users of all skill levels.
Comprehensive Security Features: Robust security scanning and detailed reporting help ensure that APIs are secure and compliant.
Scalability: Suitable for projects of all sizes, from small startups to large enterprises.

Cons:

Newer Tool: As a relatively new tool in the market, Apidog may still be evolving, and users might encounter occasional bugs or missing features compared to more established tools.
Advanced Features: While it covers a broad range of functionalities, some highly specialized or advanced security features might not be as developed as those in dedicated security tools like Burp Suite or Nessus.

By integrating Apidog into your API development workflow, you can streamline the process of designing, testing, and securing your APIs. Its comprehensive feature set and user-friendly design make it an excellent choice for teams looking to enhance their API security without sacrificing efficiency or ease of use.

Postman

Overview: Postman is widely known for its API development and testing capabilities. It also offers security testing features that help identify and mitigate potential vulnerabilities in APIs.

Key Features:

Automated testing and monitoring.
Security scanning and vulnerability detection.
Collaboration features for teams.
Detailed reporting and analytics.

Use Cases: Ideal for development teams looking to integrate API security testing into their existing Postman workflows.

Pros:

Well-established and widely used tool.
Comprehensive API testing and security features.
Strong community and support.

Cons:

Security features are secondary to its primary development functions.
Can become complex with large-scale API projects.

Acunetix

Overview: Acunetix is a powerful web vulnerability scanner with extensive features for API security testing. It offers automated scanning and detailed reporting to help secure APIs.

Key Features:

Automated vulnerability scanning for web applications and APIs.
Detailed reports with remediation advice.
Integration with popular CI/CD tools.
Support for various API types.

Use Cases: Suitable for organizations needing a robust, automated solution for continuous API security monitoring.

Pros:

Extensive scanning capabilities.
Detailed and actionable reports.
Strong support and regular updates.

Cons:

High cost for enterprise use.
Can be resource-intensive.

Nessus

Overview: Nessus is a well-known vulnerability scanner with capabilities to test network infrastructures and APIs. It helps identify security issues across a wide range of environments.

Key Features:

Comprehensive vulnerability scanning.
Integration with CI/CD pipelines.
Detailed reporting and remediation guidance.
Support for various environments and API types.

Use Cases: Best suited for organizations needing a comprehensive vulnerability management tool that includes API security testing.

Pros:

Extensive vulnerability database.
Detailed and actionable reporting.
Strong support and regular updates.

Cons:

High cost for the professional version.
Requires expertise to fully utilize its capabilities.

APIsec

Overview: APIsec is a specialized tool focused on API security. It provides advanced scanning and reporting features to help identify and mitigate vulnerabilities in APIs.

Key Features:

Automated security testing for APIs.
Detailed reporting and analytics.
Integration with CI/CD pipelines.
Support for various API types.

Use Cases: Ideal for organizations looking for a specialized tool dedicated to API security.

Pros:

Specialized focus on API security.
Advanced scanning and reporting capabilities.
Easy integration with existing workflows.

Cons:

Higher cost compared to general vulnerability scanners.
May require expertise to fully leverage its features.

Implementing API Security Scanning in Your Workflow

Integrating API security scanning into your development workflow is crucial for maintaining the security and integrity of your APIs. Here are some best practices:

CI/CD Integration: Ensure that security scans are part of your CI/CD pipelines. This allows for continuous security checks and immediate detection of vulnerabilities during the development process.
Regular Scans: Schedule regular security scans to keep your APIs protected against new vulnerabilities and emerging threats.
Analyze and Act on Scan Results: Review scan reports thoroughly and prioritize remediation efforts based on the severity of identified vulnerabilities.
Collaboration and Documentation: Encourage collaboration between development and security teams. Maintain comprehensive documentation of your security processes and scan results.

Best Practices for API Security

Beyond using security scanning tools, adopting best practices for API security is essential. Here are some recommendations:

Secure Authentication and Authorization: Implement strong authentication mechanisms (e.g., OAuth, JWT) and enforce proper authorization checks.
Input Validation and Sanitization: Validate and sanitize all inputs to prevent injection attacks and other vulnerabilities.
Rate Limiting and Throttling: Implement rate limiting to protect your APIs from abuse and DoS attacks.
Regular Security Audits and Updates: Conduct regular security audits and keep your software and dependencies up to date to address known vulnerabilities.
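To make the rate limiting recommendation concrete, here is a minimal sketch of a token bucket, one common way to implement it. The class name, the `capacity` and `refillPerSecond` parameters, and the injectable `now` clock are all illustrative choices rather than part of any particular library.

```javascript
// Minimal in-memory token bucket. Each request spends one token; tokens
// refill continuously over time up to `capacity`.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock (milliseconds), handy for testing
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true if the request may proceed, false if it should be rejected
  // (for an HTTP API, typically with a 429 response).
  tryRemoveToken() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

In practice you would likely keep one bucket per client (for example, keyed by API key in a Map). A deployment with several server instances would need a shared store instead of in-process memory.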

Conclusion

API security scanning tools are indispensable in the modern software development landscape. They help identify and mitigate vulnerabilities, ensuring the safety and integrity of your APIs. By integrating these tools into your development workflows and adhering to best practices, you can protect your applications and data from potential threats. For further reading, consider exploring the documentation and community forums of the tools covered above.

My new Blog design

Iroro Chadere — Fri, 27 Oct 2023 14:10:22 +0000

Hey guys! I'm so happy to announce that the side blog project I've been working on is finally out, and I'm loving it!

BrightsideCodes took a lot of time to make due to many reasons and a lack of motivation. But today, I'm happy to finally release it.
You can visit brightsidecodes.com and learn more about it.

If you're wondering whether I'll still be here, the simple answer is yes. I'll still be here posting content.

Server-side Rendering (SSR) vs. Client-side Rendering (CSR) in React

Iroro Chadere — Wed, 30 Aug 2023 10:07:01 +0000

In the world of modern web development, choosing the right rendering approach for your React applications is crucial. Server-side Rendering (SSR) and Client-side Rendering (CSR) are two prominent methods for delivering content to users. Each approach has its own set of advantages, disadvantages, and use cases. In this article, we'll dive deep into the differences between SSR and CSR in React, exploring their benefits, drawbacks, and best practices, including how to fetch data in each approach.

Introduction

Before delving into the specifics, let's understand the fundamental concepts of SSR and CSR. Both techniques relate to how the content of a web page is generated and delivered to the user's browser.

Server-side Rendering (SSR): With SSR, the initial HTML is generated on the server and sent to the client's browser. This means that the user receives a fully-rendered page right from the start, which can improve perceived loading speed and search engine optimization (SEO).
Client-side Rendering (CSR): In CSR, the initial HTML is minimal, often containing only a loading script. The majority of the content is generated on the client side using JavaScript. This approach allows for dynamic content updates without full-page reloads, resulting in a smoother user experience.
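To make the difference in the initial HTML tangible, here is a rough sketch of what the first response could look like in each approach. The helper names are hypothetical, and plain string templates stand in for React, so treat it as an illustration rather than how any framework actually renders.

```javascript
// Escape text before placing it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// SSR: the server already has the data, so the content is in the HTML.
function renderSsrResponse(data) {
  return `<div id="root"><h1>${escapeHtml(data.title)}</h1><p>${escapeHtml(data.description)}</p></div>`;
}

// CSR: the server sends an empty mount point plus a script that will fetch
// the data and build the content in the browser.
// `bundleUrl` is assumed to be a trusted constant, not user input.
function renderCsrResponse(bundleUrl) {
  return `<div id="root"></div><script src="${bundleUrl}"></script>`;
}
```

A crawler or a user on a slow device sees real content in the first case, and only an empty container in the second until the script has run.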

The Differences

Initial Page Load

SSR: When a user requests a page, the server generates the complete HTML for that page, including data fetched from APIs or databases. This pre-rendered HTML is sent to the client's browser, providing a fast initial loading experience.
CSR: The initial HTML sent to the browser is minimal and includes JavaScript bundles. The page's content is fetched and rendered on the client side after JavaScript execution. This may lead to a slower initial load, especially on slower devices or connections.

SEO and Social Sharing

SSR: Search engines can easily crawl and index the content, as it's available in the initial HTML response. This can lead to better SEO and improved social sharing previews.
CSR: Search engines may have difficulties indexing dynamic content that is generated on the client side. Special techniques like server-side rendering for specific routes are often required to ensure proper indexing.

Performance

SSR: The user gets a fully-rendered page on the first load, which can result in a faster perceived performance. However, subsequent interactions might involve more round trips to the server.
CSR: While the initial load might be slower, subsequent interactions within the app can be faster, as only the necessary components are updated without full-page reloads.

Fetching Data in SSR

In SSR, fetching data is often done on the server side, as part of the initial rendering process. This data can then be included in the pre-rendered HTML that is sent to the client. Here's a simplified example of how you might fetch data in an SSR setup using React and Next.js:

// pages/index.js (Next.js page)
import React from 'react';

function HomePage({ data }) {
  return (
    <div>
      <h1>{data.title}</h1>
      <p>{data.description}</p>
    </div>
  );
}

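export async function getServerSideProps() {
  // Runs on the server for every request, before the page is rendered
  const response = await fetch('https://api.example.com/data');

  if (!response.ok) {
    // Throwing here makes Next.js render its error page
    throw new Error(`Request failed with status ${response.status}`);
  }

  const data = await response.json();

  return {
    props: {
      data,
    },
  };
}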

export default HomePage;

In this example, the getServerSideProps function is called on the server side before rendering the page. It fetches data from an API and includes it in the props that are passed to the HomePage component.

Fetching Data in CSR

In CSR, data fetching typically happens on the client side, often triggered by user interactions or component lifecycle events. This allows for dynamic content updates without requiring a full page reload. Here's a basic example of data fetching in a CSR scenario using React's useEffect hook:

// components/PostList.js
import React, { useState, useEffect } from 'react';

function PostList() {
  const [posts, setPosts] = useState([]);

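  useEffect(() => {
    // Runs once after the first render (empty dependency array)
    fetch('https://api.example.com/posts')
      .then(response => {
        if (!response.ok) throw new Error(`Request failed: ${response.status}`);
        return response.json();
      })
      .then(data => setPosts(data))
      .catch(error => console.error('Failed to load posts:', error));
  }, []);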

  return (
    <ul>
      {posts.map(post => (
        <li key={post.id}>{post.title}</li>
      ))}
    </ul>
  );
}

export default PostList;

In this example, the useEffect hook is used to fetch a list of posts from an API after the component mounts. The fetched data is then used to update the component's state.
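The example above only tracks the list of posts. In a real CSR page you usually also want to show a loading indicator and an error message. One possible way to model those phases is a small reducer. The state shape and action names below are assumptions made for illustration. The function is plain JavaScript, though it could be passed to React's useReducer hook.

```javascript
// Hypothetical state for a client-side request: it starts out loading.
const initialState = { status: 'loading', posts: [], error: null };

// Moves the state to 'success' or 'error' depending on how the fetch ended.
function postsReducer(state, action) {
  switch (action.type) {
    case 'success':
      return { status: 'success', posts: action.posts, error: null };
    case 'failure':
      return { status: 'error', posts: [], error: action.message };
    default:
      // Unknown actions leave the state untouched.
      return state;
  }
}
```

The component would dispatch `{ type: 'success', posts }` when the fetch resolves and `{ type: 'failure', message }` when it rejects, then render based on `status`.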

Conclusion

Understanding how to fetch data in both Server-side Rendering (SSR) and Client-side Rendering (CSR) scenarios is crucial for building robust and performant React applications. SSR involves fetching data on the server side during the initial rendering, while CSR involves fetching data on the client side as part of user interactions or component lifecycle events. By choosing the appropriate approach based on your application's needs, you can ensure a smooth and efficient user experience.

How to Fix: Generating static pages (0/8)TypeError: Cannot read properties of undefined (reading 'data')

Iroro Chadere — Sun, 09 Jul 2023 11:03:28 +0000

When generating static pages in web development, developers often encounter errors that can hinder progress. One common error is the "TypeError: Cannot read properties of undefined (reading 'data')." This article aims to provide a comprehensive understanding of this error and offer effective solutions to resolve it. Through a real-life scenario and step-by-step guidance, we will explore the causes of this error and demonstrate how to fix it.

A few days ago, after I finished developing a side project I was working on, I pushed it to GitHub, which is connected to Vercel. However, when Vercel started to run yarn build, I got an error: Generating static pages (0/8)TypeError: Cannot read properties of undefined (reading 'data')

What could be the problem? I asked myself. Do you know what happened next? I started to google.

Sadly, I didn't find enough material that addressed the issue. I was tired, and I really wanted to build the application so I could see it live!

But hold on, why is the app working on my local machine, but not working when I run yarn build?

Understanding the Error:

The "TypeError: Cannot read properties of undefined (reading 'data')" error message typically occurs when a variable or object is accessed without being properly defined or initialized. In the context of generating static pages, this error commonly arises when there is an issue with the data being used or accessed during the page generation process.

Sounds confusing? Don't worry. Let's break it down.

In my case, I was trying to build a blog, and I have a blog/[slug] page.

The error always showed up on the slug page. That's because I was using getStaticPaths, and I didn't want every blog post to have to be generated up front. After all, that would slow down the build as the number of posts grows. I don't want that!

So that posts missing from the build can still be generated later, I did the following:

export async function getStaticPaths() {
  const client = createClient();

  const allPosts = await client.getAllByType('post');

  return {
    paths: allPosts.map((post) => post.url),
    fallback: true,
  };
}

This is where the error comes from: I asked getStaticPaths to use fallback: true.

On its own, this is a reasonable choice. I don't want every blog post to have to exist at build time!

But since fallback is true, I need to tell Next.js that

hey buddy, some of my blog posts are not generated yet, so if a visitor navigates to /blog/not-yet-found-post, don't return an error page immediately. Instead, show a fallback version of the page while you run getStaticProps for that post in the background. If the post is on the server, render its content, else return a 404.

Do you get the point now? With fallback: true, Next.js renders a fallback version of the page with empty props, so the post is undefined at that moment. The main issue in my case was that my page didn't handle that state: it tried to read the post's data before it existed.

According to the Next.js documentation, the fix is simply to import { useRouter } from 'next/router'; and use it to check whether the page is in its fallback state, like so:

import { useRouter } from 'next/router'

function Post({ post }) {
  const router = useRouter()

  // If the page is not yet generated, this will be displayed
  // initially until getStaticProps() finishes running
  if (router.isFallback) {
    return <div>Loading...</div>
  }

  // Render post...
}

// This function gets called at build time
export async function getStaticPaths() {
  return {
    // Only `/posts/1` and `/posts/2` are generated at build time
    paths: [{ params: { id: '1' } }, { params: { id: '2' } }],
    // Enable statically generating additional pages
    // For example: `/posts/3`
    fallback: true,
  }
}

// This also gets called at build time
export async function getStaticProps({ params }) {
  // params contains the post `id`.
  // If the route is like /posts/1, then params.id is 1
  const res = await fetch(`https://.../posts/${params.id}`)
  const post = await res.json()

  // Pass post data to the page via props
  return {
    props: { post },
    // Re-generate the post at most once per second
    // if a request comes in
    revalidate: 1,
  }
}

export default Post

So by checking router.isFallback, I was able to fix the issue: the page shows a loading state while Next.js generates the page in the background, fetches the post the user is trying to access, and then shows it to them.
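To see why the check matters, here is a simplified stand-in for the page component, written as plain functions that return a string instead of JSX. The `post.data.title` shape is an assumption based on my setup, and the `isFallback` argument plays the role of router.isFallback. Calling the unguarded version with empty props fails with the same kind of TypeError as in the build log.

```javascript
// Without a guard: reading props.post.data throws when props.post is
// undefined, which is the case for the fallback render (empty props).
function renderPostUnguarded(props) {
  return props.post.data.title;
}

// With a guard: return a placeholder while the real props are not there yet.
function renderPostGuarded(props, isFallback) {
  if (isFallback) {
    return 'Loading...';
  }
  return props.post.data.title;
}
```

Once getStaticProps has finished for that path, the component is rendered again with real props and the guard is skipped.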

Conclusion

When generating static pages, encountering errors like "TypeError: Cannot read properties of undefined (reading 'data')" is not uncommon. However, with a systematic approach to understanding the error's origin and implementing the appropriate solutions, developers can overcome this issue effectively.
I hope this blog post has helped you understand and fix the error: Generating static pages (0/8)TypeError: Cannot read properties of undefined (reading 'data').

Let me know in the comments if it helped you, or if you have any questions.

All the best!