Implementing Large File Uploads with Chunking, Hash Verification, and Resumable Transfers
Large file uploads can encounter issues such as prolonged upload times, failure recovery requiring full re-uploads, and server-side size restrictions. Chunked uploading addresses these by splitting files into smaller segments, uploading them individually, and reassembling them on the server. This approach reduces the cost of a failure and can accelerate uploads through parallel transfers.
Project Setup
Frontend: Vue 3 with Vite.
Backend: Node.js using Express, with packages including multiparty, fs-extra, cors, body-parser, and nodemon for development.
File Reading
Listen for the change event on an input element to access selected files:
const handleFileSelect = (event) => {
  const selectedFiles = event.target.files;
  if (!selectedFiles) return;
  console.log(selectedFiles[0]);
};
Chunking Files
Use the slice method from the Blob interface (which File inherits) to divide files into chunks. Define a constant for the chunk size (e.g., 2 MB):
const CHUNK_SIZE = 2 * 1024 * 1024;

const splitIntoChunks = (file) => {
  const chunks = [];
  let position = 0;
  let index = 0;
  while (position < file.size) {
    chunks.push({
      // Record the chunk's original index so its name stays stable even if
      // already-uploaded chunks are filtered out during a resumed transfer.
      index,
      data: file.slice(position, position + CHUNK_SIZE),
    });
    position += CHUNK_SIZE;
    index++;
  }
  return chunks;
};
Hash Calculation
Generate a hash from the file's content using spark-md5 so the server can identify a file regardless of its name, which enables instant uploads for duplicate content. Hashing every byte of a large file is slow, so optimize by sampling only parts of the intermediate chunks:
import sparkMD5 from 'spark-md5';

const computeFileHash = async (chunks) => {
  return new Promise((resolve) => {
    const hasher = new sparkMD5.ArrayBuffer();
    const samples = [];
    chunks.forEach((chunk, idx) => {
      if (idx === 0 || idx === chunks.length - 1) {
        // Hash the first and last chunks in full.
        samples.push(chunk.data);
      } else {
        // For middle chunks, sample two bytes from the front, middle, and end.
        samples.push(chunk.data.slice(0, 2));
        samples.push(chunk.data.slice(CHUNK_SIZE / 2, CHUNK_SIZE / 2 + 2));
        samples.push(chunk.data.slice(CHUNK_SIZE - 2, CHUNK_SIZE));
      }
    });
    const reader = new FileReader();
    reader.onload = (e) => {
      hasher.append(e.target.result);
      resolve(hasher.end());
    };
    reader.readAsArrayBuffer(new Blob(samples));
  });
};
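To see why sampling pays off, consider how many bytes are actually hashed: the first and last chunks in full, plus six bytes per middle chunk. A rough estimate, where `sampledBytes` is a hypothetical helper for illustration and not part of the upload code above:

```javascript
// Sketch: estimating how many bytes the sampling strategy actually hashes.
const CHUNK_SIZE = 2 * 1024 * 1024;

const sampledBytes = (fileSize) => {
  const chunkCount = Math.ceil(fileSize / CHUNK_SIZE);
  if (chunkCount <= 2) return fileSize; // first/last chunks hashed in full
  const lastChunk = fileSize - (chunkCount - 1) * CHUNK_SIZE;
  // Full first chunk + full last chunk + 3 two-byte samples per middle chunk.
  return CHUNK_SIZE + lastChunk + (chunkCount - 2) * 6;
};

// A 1 GB file hashes only ~4 MB of data instead of the full gigabyte.
console.log(sampledBytes(1024 * 1024 * 1024));
```

The trade-off is that sampling is not a hash of the complete content, so two different files could in principle collide; for deduplication of ordinary uploads this is usually an acceptable risk.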
Uploading Chunks
Frontend Implementation
Limit concurrent requests to avoid browser overload. Use FormData to send chunk data and metadata:
const uploadSegments = async (chunks, fileHash, fileName) => {
  const requests = chunks.map((chunk, idx) => ({
    // Prefer the chunk's original index so names stay correct when resuming
    // with a filtered chunk list; fall back to the array position.
    segmentHash: `${fileHash}-${chunk.index ?? idx}`,
    segment: chunk.data,
  }));
  const forms = requests.map(({ segment, segmentHash }) => {
    const form = new FormData();
    form.append('segment', segment);
    form.append('segmentHash', segmentHash);
    form.append('fileName', fileName);
    form.append('fileHash', fileHash);
    return form;
  });
  let currentIndex = 0;
  const maxConcurrent = 6;
  const activeRequests = [];
  while (currentIndex < forms.length) {
    const request = fetch('http://localhost:3000/upload', {
      method: 'POST',
      body: forms[currentIndex],
    });
    // Remove the request from the pool whether it succeeds or fails, so a
    // failed chunk cannot occupy a concurrency slot forever.
    const tracked = request.finally(() => {
      const index = activeRequests.indexOf(tracked);
      if (index > -1) activeRequests.splice(index, 1);
    });
    activeRequests.push(tracked);
    if (activeRequests.length >= maxConcurrent) {
      await Promise.race(activeRequests);
    }
    currentIndex++;
  }
  await Promise.all(activeRequests);
};
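The pool above has no retry logic, so a transient network error loses that chunk until the next resume. A small wrapper can retry individual chunk uploads a few times before giving up. This is a sketch with illustrative names; `uploadFn` stands in for the fetch call:

```javascript
// Sketch: retrying a failed chunk upload a few times before giving up.
const uploadWithRetry = async (uploadFn, maxAttempts = 3) => {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await uploadFn();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // all attempts exhausted
};

// Example: a flaky upload that fails twice, then succeeds on the third try.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw new Error('network error');
  return 'ok';
};

uploadWithRetry(flaky).then((result) => console.log(result, calls)); // ok 3
```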
Backend Implementation
Store chunks temporarily in a directory named after the file hash:
const path = require('path');
const express = require('express');
const multiparty = require('multiparty');
const fse = require('fs-extra');

const app = express();
app.use(require('cors')());
app.use(require('body-parser').json());

const UPLOAD_PATH = path.resolve(__dirname, 'uploads');

app.post('/upload', async (req, res) => {
  const form = new multiparty.Form();
  form.parse(req, async (err, fields, files) => {
    if (err) {
      return res.status(400).json({ success: false, message: 'Upload failed' });
    }
    const segmentHash = fields.segmentHash[0];
    const fileHash = fields.fileHash[0];
    const segmentDir = path.resolve(UPLOAD_PATH, fileHash);
    if (!fse.existsSync(segmentDir)) {
      await fse.mkdirs(segmentDir);
    }
    // multiparty writes each uploaded part to a temp file; move it into place.
    const tempPath = files.segment[0].path;
    await fse.move(tempPath, path.resolve(segmentDir, segmentHash));
    res.json({ success: true, message: 'Chunk received' });
  });
});
Merging Chunks
Frontend Implementation
Send a merge request after all chunks are uploaded:
const requestMerge = (fileHash, fileName, chunkSize) => {
  fetch('http://localhost:3000/merge', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      fileHash,
      fileName,
      chunkSize,
    }),
  })
    .then((res) => res.json())
    .then(() => alert('Upload complete'));
};
Backend Implementation
Read and concatenate chunks in order, then delete the temporary directory:
const getExtension = (filename) => filename.slice(filename.lastIndexOf('.'));

const mergeChunks = async (filePath, fileHash, chunkSize) => {
  const segmentDir = path.resolve(UPLOAD_PATH, fileHash);
  const segments = await fse.readdir(segmentDir);
  // Sort numerically by chunk index (the MD5 hash itself contains no '-').
  segments.sort((a, b) => parseInt(a.split('-')[1]) - parseInt(b.split('-')[1]));
  // Pre-create the target file so each stream can open it with the 'r+' flag;
  // the default 'w' flag would truncate the file on every open and clobber
  // data written by the other streams.
  await fse.ensureFile(filePath);
  const mergeTasks = segments.map((segment, idx) => {
    return new Promise((resolve, reject) => {
      const segmentPath = path.resolve(segmentDir, segment);
      const readStream = fse.createReadStream(segmentPath);
      const writeStream = fse.createWriteStream(filePath, {
        flags: 'r+',
        start: idx * chunkSize,
      });
      // Resolve once the write side has flushed, then remove the chunk file.
      writeStream.on('finish', () => {
        fse.unlinkSync(segmentPath);
        resolve();
      });
      readStream.on('error', reject);
      readStream.pipe(writeStream);
    });
  });
  await Promise.all(mergeTasks);
  fse.rmdirSync(segmentDir);
};
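The numeric comparator in the merge step matters: a plain lexicographic sort would place an index like `10` before `2` and corrupt the merged file. A quick demonstration with illustrative names:

```javascript
// Sketch: why the merge step sorts chunk names numerically.
// Lexicographic sorting misorders indices once they reach double digits.
const names = ['abc-0', 'abc-1', 'abc-10', 'abc-11', 'abc-2'];

const lexicographic = [...names].sort();
const numeric = [...names].sort(
  (a, b) => parseInt(a.split('-')[1]) - parseInt(b.split('-')[1])
);

console.log(lexicographic); // [ 'abc-0', 'abc-1', 'abc-10', 'abc-11', 'abc-2' ]
console.log(numeric);       // [ 'abc-0', 'abc-1', 'abc-2', 'abc-10', 'abc-11' ]
```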
app.post('/merge', async (req, res) => {
  const { fileHash, fileName, chunkSize } = req.body;
  const finalPath = path.resolve(UPLOAD_PATH, `${fileHash}${getExtension(fileName)}`);
  if (fse.existsSync(finalPath)) {
    return res.json({ success: true, message: 'File already exists' });
  }
  const segmentDir = path.resolve(UPLOAD_PATH, fileHash);
  if (!fse.existsSync(segmentDir)) {
    return res.status(400).json({ success: false, message: 'No chunks to merge' });
  }
  await mergeChunks(finalPath, fileHash, chunkSize);
  res.json({ success: true, message: 'Merge successful' });
});
Instant Upload and Resumable Transfers
Frontend Implementation
Verify file existence and uploaded chunks before uploading:
const checkUploadStatus = async (fileHash, fileName) => {
  const response = await fetch('http://localhost:3000/verify', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileHash, fileName }),
  });
  return response.json();
};

const handleUpload = async (event) => {
  const file = event.target.files[0];
  if (!file) return;
  const chunks = splitIntoChunks(file);
  const fileHash = await computeFileHash(chunks);
  const { data } = await checkUploadStatus(fileHash, file.name);
  if (!data.shouldUpload) {
    alert('Instant upload: File already exists');
    return;
  }
  // Skip chunks the server already has.
  const filteredChunks = chunks.filter((_, idx) => {
    return !data.uploadedList.includes(`${fileHash}-${idx}`);
  });
  await uploadSegments(filteredChunks, fileHash, file.name);
  requestMerge(fileHash, file.name, CHUNK_SIZE);
};
Backend Implementation
Check for existing files and list uploaded chunks:
const listUploadedSegments = async (fileHash) => {
  const segmentDir = path.resolve(UPLOAD_PATH, fileHash);
  return fse.existsSync(segmentDir) ? await fse.readdir(segmentDir) : [];
};

app.post('/verify', async (req, res) => {
  const { fileHash, fileName } = req.body;
  const finalPath = path.resolve(UPLOAD_PATH, `${fileHash}${getExtension(fileName)}`);
  if (fse.existsSync(finalPath)) {
    return res.json({ data: { shouldUpload: false } });
  }
  const uploadedSegments = await listUploadedSegments(fileHash);
  res.json({ data: { shouldUpload: true, uploadedList: uploadedSegments } });
});