Fading Coder

HTTP File Upload Internals and Encodings Explained

HTTP uploads place data in the request body. How that body is encoded depends on the Content-Type chosen by the client (usually a browser). File inputs in forms are transmitted using multipart/form-data; simple text-only forms often use application/x-www-form-urlencoded. A request’s start-line and headers are followed by a blank line, then the body that carries fields and/or file bytes.
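
The layout described above (start-line, headers, blank line, body) can be illustrated with a small Node.js sketch. The function name `splitRequest` and the sample request are assumptions made for illustration, and the code expects the complete request to already be in memory.

```javascript
// Split a raw HTTP request into start-line, headers, and body.
// `raw` is a Buffer holding the complete request (hypothetical input).
function splitRequest(raw) {
  const sep = raw.indexOf('\r\n\r\n');            // blank line that ends the headers
  if (sep < 0) throw new Error('header terminator not found');
  const head = raw.subarray(0, sep).toString('latin1');
  const [startLine, ...headerLines] = head.split('\r\n');
  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    if (colon > 0) {
      // Header names are case-insensitive, so store them lower-cased
      headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
    }
  }
  return { startLine, headers, body: raw.subarray(sep + 4) };
}

const demoRequest = Buffer.from(
  'POST / HTTP/1.1\r\nHost: localhost:8000\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello'
);
const parsedRequest = splitRequest(demoRequest);
console.log(parsedRequest.startLine);               // POST / HTTP/1.1
console.log(parsedRequest.headers['content-type']); // text/plain
console.log(parsedRequest.body.toString());         // hello
```

The body is kept as a Buffer rather than a string so that binary uploads survive untouched.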

Encodings for form submission

  • application/x-www-form-urlencoded: key=value pairs joined with &. Spaces become + and reserved or non-ASCII characters are percent-encoded as UTF-8 bytes. Not suitable for sending file bytes.
  • multipart/form-data: the body is split into parts separated by a unique boundary string. Each part carries its own headers (e.g., Content-Disposition, Content-Type) and the raw bytes for that field. This is used for file inputs.
  • text/plain: rarely used; ambiguous and not reliable for production.
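
As a quick illustration of the urlencoded rules, the standard `URLSearchParams` class (available in browsers and Node.js) produces this serialization. The field names and values here are made up for the example.

```javascript
// Serialize two text fields the way a urlencoded form submission would
const fields = new URLSearchParams();
fields.append('alpha', 'x ζ y');   // spaces and a non-ASCII character
fields.append('note', 'a&b=c');    // characters that are reserved in this encoding

const encodedForm = fields.toString();
console.log(encodedForm); // alpha=x+%CE%B6+y&note=a%26b%3Dc
```

Spaces turn into `+`, the Greek letter becomes its two UTF-8 bytes in `%HH` form, and `&` and `=` inside a value are escaped so they cannot be confused with separators.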

Hands-on inspection with a local echo

  1. Minimal test page (multipart/form-data):
<!doctype html>
<meta charset="utf-8">
<title>Upload probe</title>
<form action="http://localhost:8000" method="post" enctype="multipart/form-data">
  <p><input type="text" name="alpha" value="x ζ y">
  <p><input type="file" name="doc1">
  <p><input type="file" name="doc2">
  <p><button>Send</button>
</form>
  2. Create sample files:
echo 'Plain content A.' > one.txt
echo '<h1>Markup B</h1>' > two.html
# 3 bytes: 'A', 0x00, 'Z' (octal escape, since \x escapes are not portable across printf implementations)
printf 'A\000Z' > bin.dat
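  3. Run a simple listener that prints whatever it receives (one connection per run):
# OpenBSD/macOS netcat syntax; traditional netcat needs: nc -l -p 8000
nc -l 8000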
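  4. Submit the form in a browser and observe the raw request in the terminal. The browser keeps waiting because nc never sends a response; stop nc once the request has been printed. A typical multipart/form-data request looks like: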
POST / HTTP/1.1
Host: localhost:8000
Content-Type: multipart/form-data; boundary=----mP9XxwJ2ae1Q0pQyJYyB
Content-Length: 812

------mP9XxwJ2ae1Q0pQyJYyB
Content-Disposition: form-data; name="alpha"

x ζ y
------mP9XxwJ2ae1Q0pQyJYyB
Content-Disposition: form-data; name="doc1"; filename="one.txt"
Content-Type: text/plain

Plain content A.
------mP9XxwJ2ae1Q0pQyJYyB
Content-Disposition: form-data; name="doc2"; filename="bin.dat"
Content-Type: application/octet-stream

A\x00Z (raw bytes here)
------mP9XxwJ2ae1Q0pQyJYyB--

Key observations for multipart/form-data

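  • Boundary: The boundary parameter of the Content-Type header defines the separator. In the body, each delimiter line is -- followed by the boundary, and the final delimiter has an extra trailing --. The browser selects a value unlikely to occur in payloads.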
  • Part headers: Each part starts with headers. Content-Disposition includes name and, for files, filename. A file part can carry a part-specific Content-Type.
  • Bytes are unencoded: Inside a part’s body, data is transmitted verbatim (binary-safe). The server reads until the next boundary sequence.
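
The observations above can be turned into a minimal parser sketch in Node.js. The function name `parseMultipart` is hypothetical, and the code assumes the whole, well-formed body is already in memory; it is meant for inspection, not as a hardened implementation.

```javascript
// Minimal multipart/form-data parser for inspection purposes.
// `body` is a Buffer, `boundary` is the value from the Content-Type header.
function parseMultipart(body, boundary) {
  const delimiter = Buffer.from('\r\n--' + boundary);
  // Prepend CRLF so the first delimiter matches the same pattern as the others
  const data = Buffer.concat([Buffer.from('\r\n'), body]);
  const parts = [];
  let pos = data.indexOf(delimiter);
  while (pos !== -1) {
    const afterDelim = pos + delimiter.length;
    // "--" right after the boundary marks the closing delimiter
    if (data.subarray(afterDelim, afterDelim + 2).toString('latin1') === '--') break;
    const headerStart = afterDelim + 2;                // skip CRLF after the delimiter
    const headerEnd = data.indexOf('\r\n\r\n', headerStart);
    const next = data.indexOf(delimiter, headerEnd + 4);
    if (headerEnd === -1 || next === -1) break;        // truncated input
    parts.push({
      headers: data.subarray(headerStart, headerEnd).toString('utf8').split('\r\n'),
      content: data.subarray(headerEnd + 4, next),     // raw bytes, untouched
    });
    pos = next;
  }
  return parts;
}

const demoBoundary = '----mP9XxwJ2ae1Q0pQyJYyB';
const sampleBody = Buffer.concat([
  Buffer.from(
    `--${demoBoundary}\r\n` +
    'Content-Disposition: form-data; name="alpha"\r\n\r\n' +
    'x ζ y\r\n' +
    `--${demoBoundary}\r\n` +
    'Content-Disposition: form-data; name="doc2"; filename="bin.dat"\r\n' +
    'Content-Type: application/octet-stream\r\n\r\n'
  ),
  Buffer.from([0x41, 0x00, 0x5a]),                     // A, NUL, Z
  Buffer.from(`\r\n--${demoBoundary}--\r\n`),
]);

for (const part of parseMultipart(sampleBody, demoBoundary)) {
  console.log(part.headers[0], '->', part.content.toString('hex'));
}
```

Working on Buffers instead of strings is what keeps the NUL byte in the second part intact.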

Comparing with application/x-www-form-urlencoded

Change the form to use application/x-www-form-urlencoded and resubmit:

<form action="http://localhost:8000" method="post" enctype="application/x-www-form-urlencoded">
  <input type="text" name="alpha" value="x ζ y">
  <input type="file" name="doc1">
  <button>Send</button>
</form>

Typical result captured by nc:

POST / HTTP/1.1
Host: localhost:8000
Content-Type: application/x-www-form-urlencoded
Content-Length: 29

alpha=x+%CE%B6+y&doc1=one.txt
  • Only field names and simple values are sent. File contents are not transmitted; for a file input the browser sends just the file's name. Spaces become + and non-ASCII characters are percent-encoded (UTF-8 → %HH sequences). This encoding is not suitable for files.
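
On the receiving side, a body like the one captured above can be decoded with the standard `URLSearchParams` class. This is a small sketch; the variable names are illustrative.

```javascript
// Decode a captured application/x-www-form-urlencoded body
const capturedBody = 'alpha=x+%CE%B6+y&doc1=one.txt';
const decodedForm = new URLSearchParams(capturedBody);

console.log(decodedForm.get('alpha')); // x ζ y
console.log(decodedForm.get('doc1'));  // one.txt
```

The `doc1` entry is just a string: nothing in the body carries the bytes of one.txt, which is why the server cannot recover the file from this encoding.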

Sending a file as the entire request body (no form)

When you want to upload exactly one file without form fields, place the file bytes directly in the HTTP body and set an appropriate Content-Type. Content-Disposition may be supplied for a filename hint, though servers often rely on URL or other metadata.

Fetch example:

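async function uploadBlob(file, url) {
  const headers = new Headers();
  headers.set('Content-Type', file.type || 'application/octet-stream');
  // Header values cannot carry arbitrary Unicode or quotes, so the name is sent
  // as percent-encoded UTF-8 via the extended filename* parameter
  headers.set('Content-Disposition', `attachment; filename*=UTF-8''${encodeURIComponent(file.name)}`);

  // The File object is sent as-is; the request body is exactly the file bytes
  const res = await fetch(url, { method: 'POST', headers, body: file });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res;
}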

XMLHttpRequest variant:

function putFile(file, target) {
  const xhr = new XMLHttpRequest();
  xhr.open('POST', target, true);
  xhr.setRequestHeader('Content-Type', file.type || 'application/octet-stream');
  // Percent-encode the name so non-Latin-1 characters and quotes cannot break the header
  xhr.setRequestHeader('Content-Disposition', `attachment; filename*=UTF-8''${encodeURIComponent(file.name)}`);
  xhr.send(file);
}

Minimal server-side inspection (Java)

The snippet below accepts one connection and dumps the raw HTTP request to stdout: headers as text, body as a hex dump. It uses a small buffer. Adjust the port as needed.

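import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class DumpHttpOnce {
  public static void main(String[] args) throws IOException {
    try (ServerSocket server = new ServerSocket(8081);
         Socket client = server.accept();
         InputStream in = new BufferedInputStream(client.getInputStream())) {

      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      byte[] tmp = new byte[4096];
      int sep = -1;     // offset of the CRLFCRLF that ends the headers
      long total = -1;  // full request size, known once the headers are parsed
      int r;
      // Read until the headers and the declared Content-Length bytes have arrived.
      // Chunked request bodies are not handled by this sketch.
      while ((total < 0 || buf.size() < total) && (r = in.read(tmp)) != -1) {
        buf.write(tmp, 0, r);
        if (sep < 0) {
          // ISO-8859-1 maps one byte to one char, so string indexes are byte offsets
          String soFar = new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
          sep = soFar.indexOf("\r\n\r\n");
          if (sep >= 0) total = sep + 4 + contentLength(soFar.substring(0, sep));
        }
      }
      byte[] data = buf.toByteArray();
      if (sep < 0) sep = data.length; // no header terminator seen: treat everything as headers

      System.out.println(new String(data, 0, sep, StandardCharsets.ISO_8859_1));
      System.out.println();
      System.out.println("-- BODY (hex dump) --");
      for (int i = sep + 4; i < data.length; i++) {
        System.out.printf("%02X ", data[i] & 0xFF);
        if (((i - (sep + 4) + 1) % 16) == 0) System.out.println();
      }
      System.out.println();

      // Send a minimal response so the browser does not wait indefinitely
      OutputStream out = client.getOutputStream();
      out.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
          .getBytes(StandardCharsets.US_ASCII));
      out.flush();
    }
  }

  // Returns the Content-Length header value, or 0 when the header is absent
  private static long contentLength(String headers) {
    for (String line : headers.split("\r\n")) {
      int colon = line.indexOf(':');
      if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
        return Long.parseLong(line.substring(colon + 1).trim());
      }
    }
    return 0;
  }
}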

Pointing the earlier HTML form at http://localhost:8081 and uploading a small test file produces output similar to:

POST / HTTP/1.1
Host: localhost:8081
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryn3pU7xLw3f9CqB9H
Content-Length: 196

-- BODY (hex dump) --
2D 2D 2D 2D 2D 2D 57 65 62 4B 69 74 46 6F 72 6D ...

Request anatomy and file metadata

  • Start-line and headers precede a blank line.
  • The body follows; its framing is defined by Content-Length or Transfer-Encoding: chunked.
  • For multipart/form-data:
    • Each part begins with headers, commonly:
      • Content-Disposition: form-data; name="..."; filename="..." (filename is present only on file parts)
      • Content-Type: per-part media type (e.g., image/png). If omitted, servers may treat as application/octet-stream.
    • The part body is the raw field value or file bytes.
    • The closing delimiter ends with -- after the boundary.
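
To make the header anatomy above concrete, here is a simplified sketch that pulls parameters out of a Content-Type or Content-Disposition value. The helper name `parseHeaderParams` is hypothetical, and the parsing is deliberately naive: it ignores extended parameters such as filename*= and would mis-split a quoted value that contains a semicolon.

```javascript
// Parse `key=value` / `key="value"` parameters from a header value.
// Simplified: no support for filename*=, backslash escapes, or ';' inside quotes.
function parseHeaderParams(value) {
  const [main, ...rest] = value.split(';');
  const params = {};
  for (const piece of rest) {
    const eq = piece.indexOf('=');
    if (eq < 0) continue;
    const key = piece.slice(0, eq).trim().toLowerCase();
    // Strip one pair of surrounding double quotes if present
    params[key] = piece.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1');
  }
  return { value: main.trim().toLowerCase(), params };
}

const requestType = parseHeaderParams('multipart/form-data; boundary=----mP9XxwJ2ae1Q0pQyJYyB');
console.log(requestType.value);                  // multipart/form-data
console.log('--' + requestType.params.boundary); // delimiter line used in the body

const disposition = parseHeaderParams('form-data; name="doc1"; filename="one.txt"');
console.log(disposition.params.name, disposition.params.filename); // doc1 one.txt
```

A part is a file part exactly when `params.filename` is defined; plain text fields only carry `name`.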

MIME type and filename handling

  • Browsers typically derive the per-part Content-Type from the selected file’s extension and/or OS-provided type mapping; they may fall back to application/octet-stream.
  • The filename appears in the part’s Content-Disposition; servers should not rely solely on it for storage paths and must sanitize it.
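
One possible way to sanitize a client-supplied filename is sketched below. The function name `safeFilename`, the allowed character set, the 100-character limit, and the `upload.bin` fallback are all assumptions chosen for illustration; real applications often generate their own storage names instead.

```javascript
// Reduce a client-supplied filename to a conservative, path-free form
function safeFilename(raw) {
  // Keep only the last path segment, whether the client used / or \
  const base = String(raw).split(/[\\/]/).pop();
  // Replace anything outside a small allow-list, then drop leading dots
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, '_').replace(/^\.+/, '');
  // Cap the length and fall back to a placeholder when nothing usable remains
  return cleaned.slice(0, 100) || 'upload.bin';
}

console.log(safeFilename('../../etc/passwd'));                   // passwd
console.log(safeFilename('C:\\Users\\me\\report final.pdf'));    // report_final.pdf
console.log(safeFilename('..'));                                 // upload.bin
```

This allow-list replaces non-ASCII characters with underscores, which is a trade-off: it is safe but lossy for international filenames.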

Why multipart for files

  • Binary-safe: no percent-encoding overhead; bytes are transmitted as-is.
  • Structured: multiple fields and files in one request, each self-described.
  • application/x-www-form-urlencoded inflates non-ASCII bytes and cannot carry binary content reliably, making it unsuitable for file uploads.
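
The size overhead mentioned above can be made tangible with a short sketch. The helper `percentEncodeBytes` is hypothetical; it applies the urlencoded escaping rules byte by byte to arbitrary binary data.

```javascript
// Percent-encode raw bytes following the application/x-www-form-urlencoded rules
function percentEncodeBytes(bytes) {
  let out = '';
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (/[A-Za-z0-9*\-._]/.test(ch)) out += ch;   // left as-is in this encoding
    else if (b === 0x20) out += '+';              // space
    else out += '%' + b.toString(16).toUpperCase().padStart(2, '0');
  }
  return out;
}

const binarySample = Buffer.from([0x41, 0x00, 0x5a, 0xff, 0x89, 0x50]);
const inflated = percentEncodeBytes(binarySample);
console.log(inflated);                                 // A%00Z%FF%89P
console.log(binarySample.length, '->', inflated.length); // 6 -> 12
```

Every byte outside the small safe set costs three characters, so typical binary data grows to well over twice its size, while a multipart part carries the same bytes unchanged.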
