Fading Coder

One Final Commit for the Last Sprint


Implementing Large File Uploads with Chunking, Hash Verification, and Resumable Transfers


Large file uploads can encounter issues such as prolonged upload times, failure recovery requiring full re-uploads, and server-side size limits. Chunked uploading addresses these by splitting a file into smaller segments, uploading them individually, and reassembling them on the server. This reduces the cost of failures and can accelerate uploads through parallel transfers.

Project Setup

Frontend: Vue 3 with Vite.
Backend: Node.js using Express, with packages including multiparty, fs-extra, cors, and body-parser, plus nodemon for development.

File Reading

Listen for the change event on an input element to access selected files:

const handleFileSelect = (event) => {
  const selectedFiles = event.target.files;
  if (!selectedFiles) return;
  console.log(selectedFiles[0]);
};

Chunking Files

Use the slice method of the Blob interface to divide files into chunks. Define a constant for the chunk size (e.g., 2 MB):

const CHUNK_SIZE = 2 * 1024 * 1024;

const splitIntoChunks = (file) => {
  const chunks = [];
  let position = 0;
  while (position < file.size) {
    chunks.push({
      data: file.slice(position, position + CHUNK_SIZE),
    });
    position += CHUNK_SIZE;
  }
  return chunks;
};
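To sanity-check the chunk math, the splitter above can be exercised against an in-memory Blob (a sketch assuming Node 18+, where Blob is available globally; the 5 MB size is arbitrary):

```javascript
const CHUNK_SIZE = 2 * 1024 * 1024; // 2 MB, matching the constant above

const splitIntoChunks = (file) => {
  const chunks = [];
  let position = 0;
  while (position < file.size) {
    chunks.push({ data: file.slice(position, position + CHUNK_SIZE) });
    position += CHUNK_SIZE;
  }
  return chunks;
};

// A fake 5 MB "file" should split into 2 MB + 2 MB + 1 MB = 3 chunks
const fakeFile = new Blob([new Uint8Array(5 * 1024 * 1024)]);
const chunks = splitIntoChunks(fakeFile);
console.log(chunks.length);       // 3
console.log(chunks[2].data.size); // 1048576 (the final, smaller chunk)
```

Note that slice never reads past the end of the Blob, so the last chunk is simply shorter and needs no special handling.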

Hash Calculation

Generate a unique hash based on file content using spark-md5 to differentiate files and enable instant uploads for duplicate content. Optimize by sampling parts of intermediate chunks:

import sparkMD5 from 'spark-md5';

const computeFileHash = async (chunks) => {
  return new Promise((resolve) => {
    const hasher = new sparkMD5.ArrayBuffer();
    const samples = [];

    chunks.forEach((chunk, idx) => {
      if (idx === 0 || idx === chunks.length - 1) {
        samples.push(chunk.data);
      } else {
        samples.push(chunk.data.slice(0, 2));
        samples.push(chunk.data.slice(CHUNK_SIZE / 2, CHUNK_SIZE / 2 + 2));
        samples.push(chunk.data.slice(CHUNK_SIZE - 2, CHUNK_SIZE));
      }
    });

    const reader = new FileReader();
    reader.readAsArrayBuffer(new Blob(samples));
    reader.onload = (e) => {
      hasher.append(e.target.result);
      resolve(hasher.end());
    };
  });
};
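The sampling scheme trades a little uniqueness for speed: the first and last chunks are hashed in full, while every middle chunk contributes only two bytes from its start, middle, and end. A rough sketch of how little data that leaves to hash (assumes Node 18+ Blob; the 10 MB size is illustrative):

```javascript
const CHUNK_SIZE = 2 * 1024 * 1024;

// Five 2 MB chunks standing in for a 10 MB file
const chunks = Array.from({ length: 5 }, () => ({
  data: new Blob([new Uint8Array(CHUNK_SIZE)]),
}));

const samples = [];
chunks.forEach((chunk, idx) => {
  if (idx === 0 || idx === chunks.length - 1) {
    samples.push(chunk.data); // first and last chunk in full
  } else {
    samples.push(chunk.data.slice(0, 2));                               // head
    samples.push(chunk.data.slice(CHUNK_SIZE / 2, CHUNK_SIZE / 2 + 2)); // middle
    samples.push(chunk.data.slice(CHUNK_SIZE - 2, CHUNK_SIZE));         // tail
  }
});

// 2 full chunks + 3 middle chunks × 6 bytes each
console.log(new Blob(samples).size); // 4194322 of the original 10485760 bytes
```

Hashing roughly 4 MB instead of 10 MB is already a win; the gap widens as files grow, since middle chunks always contribute a fixed 6 bytes.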

Uploading Chunks

Frontend Implementation

Limit concurrent requests to avoid browser overload. Use FormData to send chunk data and metadata:

const uploadSegments = async (chunks, fileHash, fileName) => {
  const requests = chunks.map((chunk, idx) => ({
    segmentHash: `${fileHash}-${idx}`,
    segment: chunk.data,
  }));

  const forms = requests.map(({ segment, segmentHash }) => {
    const form = new FormData();
    form.append('segment', segment);
    form.append('segmentHash', segmentHash);
    form.append('fileName', fileName);
    form.append('fileHash', fileHash);
    return form;
  });

  let currentIndex = 0;
  const maxConcurrent = 6;
  const activeRequests = [];

  while (currentIndex < forms.length) {
    const request = fetch('http://localhost:3000/upload', {
      method: 'POST',
      body: forms[currentIndex],
    });

    request.then(() => {
      const index = activeRequests.indexOf(request);
      if (index > -1) activeRequests.splice(index, 1);
    });
    activeRequests.push(request);

    if (activeRequests.length >= maxConcurrent) {
      await Promise.race(activeRequests);
    }
    currentIndex++;
  }
  await Promise.all(activeRequests);
};
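Individual chunk requests can also fail transiently (network hiccups, server restarts) even with resumable transfers in place. One option is to wrap each fetch in a small retry helper; `withRetry` below is a hypothetical sketch, not part of the code above:

```javascript
// Retries an async operation, waiting briefly between attempts.
const withRetry = async (operation, attempts = 3, delayMs = 500) => {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, delayMs));
      }
    }
  }
  throw lastError; // all attempts exhausted
};

// Usage inside the upload loop (sketch):
// const request = withRetry(() =>
//   fetch('http://localhost:3000/upload', { method: 'POST', body: form })
// );
```

Retried chunks are idempotent on the server side, since each one is written to a file named after its segment hash.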

Backend Implementation

Store chunks temporarily in a directory named after the file hash:

const path = require('path');
const express = require('express');
const cors = require('cors');
const bodyParser = require('body-parser');
const fse = require('fs-extra');
const multiparty = require('multiparty');

const app = express();
app.use(cors());
app.use(bodyParser.json()); // parses the JSON bodies of /merge and /verify

const UPLOAD_PATH = path.resolve(__dirname, 'uploads');

app.post('/upload', async (req, res) => {
  const form = new multiparty.Form();
  form.parse(req, async (err, fields, files) => {
    if (err) {
      return res.status(400).json({ success: false, message: 'Upload failed' });
    }
    const segmentHash = fields.segmentHash[0];
    const fileHash = fields.fileHash[0];
    const segmentDir = path.resolve(UPLOAD_PATH, fileHash);

    if (!fse.existsSync(segmentDir)) {
      await fse.mkdirs(segmentDir);
    }

    const tempPath = files.segment[0].path;
    await fse.move(tempPath, path.resolve(segmentDir, segmentHash));
    res.json({ success: true, message: 'Chunk received' });
  });
});

Merging Chunks

Frontend Implementation

Send a merge request after all chunks are uploaded:

const requestMerge = (fileHash, fileName, chunkSize) => {
  fetch('http://localhost:3000/merge', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      fileHash,
      fileName,
      chunkSize,
    }),
  })
    .then((res) => res.json())
    .then(() => alert('Upload complete'));
};

Backend Implementation

Read and concatenate chunks in order, then delete the temporary directory:

const getExtension = (filename) => {
  const dot = filename.lastIndexOf('.');
  return dot === -1 ? '' : filename.slice(dot); // '' for extensionless names
};

const mergeChunks = async (filePath, fileHash, chunkSize) => {
  const segmentDir = path.resolve(UPLOAD_PATH, fileHash);
  const segments = await fse.readdir(segmentDir);
  segments.sort((a, b) => parseInt(a.split('-')[1]) - parseInt(b.split('-')[1]));

  // Create the target file up front so each parallel stream can open it
  // with the 'r+' flag (the default 'w' would truncate it on every open).
  await fse.ensureFile(filePath);

  const mergeTasks = segments.map((segment, idx) => {
    return new Promise((resolve) => {
      const readStream = fse.createReadStream(path.resolve(segmentDir, segment));
      // Write streams only honor a `start` offset; there is no `end` option.
      const writeStream = fse.createWriteStream(filePath, {
        flags: 'r+',
        start: idx * chunkSize,
      });
      readStream.on('end', () => {
        fse.unlinkSync(path.resolve(segmentDir, segment));
        resolve();
      });
      readStream.pipe(writeStream);
    });
  });

  await Promise.all(mergeTasks);
  fse.rmdirSync(segmentDir);
};

app.post('/merge', async (req, res) => {
  const { fileHash, fileName, chunkSize } = req.body;
  const finalPath = path.resolve(UPLOAD_PATH, `${fileHash}${getExtension(fileName)}`);
  if (fse.existsSync(finalPath)) {
    return res.json({ success: true, message: 'File already exists' });
  }
  const segmentDir = path.resolve(UPLOAD_PATH, fileHash);
  if (!fse.existsSync(segmentDir)) {
    return res.status(400).json({ success: false, message: 'No chunks to merge' });
  }
  await mergeChunks(finalPath, fileHash, chunkSize);
  res.json({ success: true, message: 'Merge successful' });
});

Instant Upload and Resumable Transfers

Frontend Implementation

Verify file existence and uploaded chunks before uploading:

const checkUploadStatus = async (fileHash, fileName) => {
  const response = await fetch('http://localhost:3000/verify', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileHash, fileName }),
  });
  return response.json();
};

const handleUpload = async (event) => {
  const file = event.target.files[0];
  const chunks = splitIntoChunks(file);
  const fileHash = await computeFileHash(chunks);
  const { data } = await checkUploadStatus(fileHash, file.name);

  if (!data.shouldUpload) {
    alert('Instant upload: File already exists');
    return;
  }

  const filteredChunks = chunks.filter((_, idx) => {
    return !data.uploadedList.includes(`${fileHash}-${idx}`);
  });
  await uploadSegments(filteredChunks, fileHash, file.name);
  requestMerge(fileHash, file.name, CHUNK_SIZE);
};

Backend Implementation

Check for existing files and list uploaded chunks:

const listUploadedSegments = async (fileHash) => {
  const segmentDir = path.resolve(UPLOAD_PATH, fileHash);
  return fse.existsSync(segmentDir) ? await fse.readdir(segmentDir) : [];
};

app.post('/verify', async (req, res) => {
  const { fileHash, fileName } = req.body;
  const finalPath = path.resolve(UPLOAD_PATH, `${fileHash}${getExtension(fileName)}`);
  if (fse.existsSync(finalPath)) {
    return res.json({ data: { shouldUpload: false } });
  }
  const uploadedSegments = await listUploadedSegments(fileHash);
  res.json({ data: { shouldUpload: true, uploadedList: uploadedSegments } });
});
