When we need to upload relatively large files, we often encounter the following problems:
The upload time is quite long.
Once an error occurs during the upload process, the entire file needs to be re-uploaded.
Generally, the server sets limits on the file size.
These issues can lead to a poor user experience during the upload process. To address these problems, we can use the chunked upload method.
Principle of Chunked Upload
The principle of chunked upload is similar to cutting a large cake into small pieces.
First, we divide the large file to be uploaded into many small chunks, with each chunk having the same size. For example, each chunk can be 2MB in size. Then, we upload these chunks to the server one by one. During the upload process, we can upload multiple chunks simultaneously or one at a time. After uploading each chunk, the server will save these chunks and record their order and location information.
Once all the chunks have been uploaded, the server will stitch these chunks together in the correct order to restore the complete large file. Finally, we have successfully uploaded the entire large file.
The advantage of chunked upload is that it can reduce the risk of upload failure. If a problem occurs during the upload process, you only need to re-upload the faulty chunk, rather than the entire large file.
In addition, chunked upload can also speed up the upload process. Since we can upload multiple chunks simultaneously, we can fully utilize the network bandwidth. This enables us to complete the file upload process more quickly.
Implementation of Chunked Upload
1. Read the file
By listening to the change event of the input element, when a local file is selected, you can obtain the corresponding file in the callback function.
const handleUpload = (e: Event) => {
  const files = (e.target as HTMLInputElement).files
  if (!files) {
    return
  }
  console.log(files[0])
}
2. File Chunking
The core of file chunking is to use the slice method of the Blob object. In the previous step, the selected file we obtained is a File object, which inherits from the Blob object. Therefore, we can use the slice method to divide the file into chunks. The usage is as follows:
let blob = instanceOfBlob.slice([start[, end[, contentType]]]);
start and end represent the indices within the Blob, indicating the starting and ending positions of the bytes to be copied into the new Blob. contentType assigns a new document type to the new Blob, which is not needed in this case. Next, let's use the slice method to implement file chunking.
const CHUNK_SIZE = 1024 * 1024

const createFileChunks = (file: File) => {
  const fileChunkList = []
  let cur = 0
  while (cur < file.size) {
    fileChunkList.push({
      file: file.slice(cur, cur + CHUNK_SIZE),
    })
    cur += CHUNK_SIZE
  }
  return fileChunkList
}
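As a quick sanity check of the chunking logic, here is a self-contained version of the same function run against an in-memory Blob (this assumes an environment that provides `Blob`, e.g. a browser or Node 18+):

```typescript
const CHUNK_SIZE = 1024 * 1024 // 1MB, same as above

// Same chunking logic as above, restated so the snippet is self-contained
const createFileChunks = (file: Blob) => {
  const fileChunkList: { file: Blob }[] = []
  let cur = 0
  while (cur < file.size) {
    fileChunkList.push({ file: file.slice(cur, cur + CHUNK_SIZE) })
    cur += CHUNK_SIZE
  }
  return fileChunkList
}

// A 2.5MB blob should produce three chunks: 1MB, 1MB, and 0.5MB
const blob = new Blob([new Uint8Array(2.5 * CHUNK_SIZE)])
const chunks = createFileChunks(blob)
console.log(chunks.length)       // 3
console.log(chunks[2].file.size) // 524288
```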
3. Hash Calculation
Let's first consider a question: When uploading files to the server, how can we distinguish different files? Can we distinguish them by file names?
The answer is no. Since we can modify file names at will, we cannot use file names for distinction. However, the content of each file is different. We can distinguish files based on their content. So, how can we do it specifically?
We can generate a unique hash value from the file content. You may have noticed that the file names produced by webpack all contain a string of characters: that string is a hash generated from the file content, and it changes whenever the content changes. We can use the same approach here to distinguish different files. Moreover, this method also lets us implement an "instant upload" feature. How can we achieve this?
When the server handles an upload request, it first checks whether it already has a record of the file's hash. If two users, A and B, upload files with identical content one after the other, the hashes of the two files will be the same.
When A uploads the file, a hash is generated from its content and the file is stored on the server. When B later uploads the same file, the server finds that this hash has already been recorded, which means a file with identical content was uploaded before. The server therefore does not need to process B's upload at all, and from B's point of view the upload completes instantly.
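Conceptually, the server-side check boils down to a hash lookup. A minimal sketch (the `uploadedHashes` set and the hash values are illustrative stand-ins for whatever store of known file hashes the server keeps):

```typescript
// Illustrative store of hashes for files the server already has
const uploadedHashes = new Set<string>(['aaa111'])

// If the hash is already known, the client can skip the upload entirely
function canInstantUpload(fileHash: string): boolean {
  return uploadedHashes.has(fileHash)
}

console.log(canInstantUpload('aaa111')) // true  -> "instant upload"
console.log(canInstantUpload('bbb222')) // false -> upload normally
```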
So, how do we calculate the hash of a file? We can use a library called spark-md5, so we need to install it first (npm install spark-md5).
In the previous step, we obtained all the chunks of the file, and we can use them to compute the file's hash. However, if the file is extremely large, feeding the full content of every chunk into the calculation would be very time-consuming. Instead, we can adopt the following strategy:
1) Include all the content of the first and the last chunks in the calculation.
2) For the remaining middle chunks, take 2 bytes from the beginning, the end, and the middle respectively for the calculation.
In this way, we can ensure that all chunks are involved in the calculation without consuming too much time.
import sparkMD5 from 'spark-md5'

const calculateHash = (fileChunks: Array<{ file: Blob }>) => {
  return new Promise(resolve => {
    const spark = new sparkMD5.ArrayBuffer()
    const chunks: Blob[] = []
    fileChunks.forEach((chunk, index) => {
      if (index === 0 || index === fileChunks.length - 1) {
        // The first and last chunks participate in full
        chunks.push(chunk.file)
      } else {
        // Middle chunks: sample 2 bytes from the start, middle, and end
        chunks.push(chunk.file.slice(0, 2))
        chunks.push(chunk.file.slice(CHUNK_SIZE / 2, CHUNK_SIZE / 2 + 2))
        chunks.push(chunk.file.slice(CHUNK_SIZE - 2, CHUNK_SIZE))
      }
    })
    const reader = new FileReader()
    reader.readAsArrayBuffer(new Blob(chunks))
    reader.onload = (e: ProgressEvent<FileReader>) => {
      spark.append((e.target as FileReader).result as ArrayBuffer)
      resolve(spark.end())
    }
  })
}
4. File Upload
Front-end Implementation
We have completed the pre - operations for uploading. Next, let's see how to upload these chunks.
Let's analyze the situation with a 1GB file. Suppose the size of each chunk is 1MB, then the total number of chunks will be 1024. If we send these 1024 chunks simultaneously, the browser definitely won't be able to handle it. The reason is that there are too many chunk files, and the browser creates too many requests at once.
This is also unnecessary. Take the Chrome browser as an example: its default limit on concurrent requests to the same origin is only 6. Excessive requests won't improve the upload speed; instead, they will impose a huge burden on the browser. Therefore, it's necessary to limit the number of concurrent front-end requests.
So, how can we do it?
We cap the number of concurrent requests at a maximum, for example 6, so that the browser has at most 6 requests in flight at any time. As soon as one of them gets a response, we initiate a new request, and repeat until all requests have been sent.
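This pattern can also be factored into a small reusable helper. Below is a minimal sketch (the helper name and signature are my own, not part of any library):

```typescript
// Run async tasks with at most `limit` of them in flight at any time
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length)
  let next = 0

  // Each worker repeatedly pulls the next task until none remain.
  // JavaScript is single-threaded, so `next++` needs no locking.
  async function worker() {
    while (next < tasks.length) {
      const index = next++
      results[index] = await tasks[index]()
    }
  }

  // Start up to `limit` workers and wait until all of them finish
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker)
  await Promise.all(workers)
  return results
}
```

Each chunk upload would then be wrapped as a `() => fetch(...)` thunk and passed to the helper, which limits concurrency while preserving the order of results.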
When uploading files, we usually use the FormData object. We need to put the file to be transferred and additional information into this FormData object.
const uploadChunks = async (fileChunks: Array<{ file: Blob }>) => {
  const data = fileChunks.map(({ file }, index) => ({
    fileHash: fileHash.value,
    index,
    chunkHash: `${fileHash.value}-${index}`,
    chunk: file,
    size: file.size,
  }))

  const formDatas = data.map(({ chunk, chunkHash }) => {
    const formData = new FormData()
    formData.append('chunk', chunk)
    formData.append('chunkHash', chunkHash)
    formData.append('fileName', fileName.value)
    formData.append('fileHash', fileHash.value)
    return formData
  })

  let index = 0
  const max = 6 // maximum number of concurrent requests
  const taskPool: Promise<Response>[] = []

  while (index < formDatas.length) {
    const task = fetch('http://127.0.0.1:3000/upload', {
      method: 'POST',
      body: formDatas[index],
    })
    // Remove the finished request from the pool. Note the second argument
    // to splice: without it, every task after that index would be removed too.
    task.then(() => {
      taskPool.splice(taskPool.indexOf(task), 1)
    })
    taskPool.push(task)
    if (taskPool.length === max) {
      // Wait until at least one in-flight request settles
      await Promise.race(taskPool)
    }
    index++
    percentage.value = (index / formDatas.length * 100).toFixed(0)
  }
  // Wait for the remaining in-flight requests
  await Promise.all(taskPool)
}
Back-end Implementation
To handle the uploaded files on the back-end, we use the multiparty package, so we need to install it first (npm install multiparty) and then import it.
When processing each uploaded chunk, we should temporarily store them in a specific location on the server for easy retrieval during the merging process. To distinguish the chunks of different files, we use the hash value corresponding to the file as the name of the folder, and put all the chunks of this file into this folder.
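With this scheme, the server's uploads directory ends up looking roughly like this (`<fileHash>` stands for the actual hash value):

```
uploads/
  <fileHash>/          one folder per file, named by the file's hash
    <fileHash>-0       chunk 0
    <fileHash>-1       chunk 1
    ...
```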
const UPLOAD_DIR = path.resolve(__dirname, 'uploads')

app.post('/upload', async (req, res) => {
  const form = new multiparty.Form()
  form.parse(req, async function (err, fields, files) {
    if (err) {
      res.status(500).json({
        ok: false,
        msg: 'upload fail'
      })
      return // stop here so we don't touch fields/files after an error
    }
    const chunkHash = fields['chunkHash'][0]
    const fileName = fields['fileName'][0]
    const fileHash = fields['fileHash'][0]
    // All chunks of one file go into a folder named after the file's hash
    const chunkDir = path.resolve(UPLOAD_DIR, fileHash)
    if (!fse.existsSync(chunkDir)) {
      await fse.mkdirs(chunkDir)
    }
    // Move the temporary file multiparty created into the chunk folder
    const oldPath = files.chunk[0].path
    await fse.move(oldPath, path.resolve(chunkDir, chunkHash))
    res.status(200).json({
      ok: true,
      msg: 'received file chunk'
    })
  })
})
5. File Merging
In the previous step, we have successfully uploaded all the file chunks to the server. After the upload is completed, we can merge all these chunks into a single, complete file. Let's implement this process below.
Front-end Implementation
The front - end only needs to send a merge request to the server. Moreover, to distinguish the file to be merged, it is necessary to pass the hash value of the file.
const mergeRequest = () => {
  fetch('http://127.0.0.1:3000/merge', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      size: CHUNK_SIZE,
      fileHash: fileHash.value,
      fileName: fileName.value,
    }),
  })
    .then((response) => response.json())
    .then(() => {
      alert('upload success')
    })
}
Back-end Implementation
Previously, we have been able to upload all the file chunks to the server and store them in the corresponding directory. When merging, we need to retrieve all the chunks from the corresponding folder and then use file read - write operations to merge the files. After the merging is completed, we can name the generated file with the hash value and store it in the corresponding location.
Here is the sample code to achieve file merging:
const extractExt = (filename) => {
  return filename.slice(filename.lastIndexOf('.'), filename.length)
}

const pipeStream = (path, writeStream) => {
  return new Promise((resolve) => {
    const readStream = fse.createReadStream(path)
    readStream.on('end', () => {
      // Delete the chunk once its content has been written
      fse.unlinkSync(path)
      resolve()
    })
    readStream.pipe(writeStream)
  })
}

async function mergeFileChunk(filePath, fileHash, size) {
  const chunkDir = path.resolve(UPLOAD_DIR, fileHash)
  const chunkPaths = await fse.readdir(chunkDir)
  // Chunk names look like `${fileHash}-${index}`, so sort by index
  chunkPaths.sort((a, b) => {
    return Number(a.split('-')[1]) - Number(b.split('-')[1])
  })
  const list = chunkPaths.map((chunkPath, index) => {
    return pipeStream(
      path.resolve(chunkDir, chunkPath),
      // Each chunk is written at its own offset; note that createWriteStream
      // only accepts a `start` option, not `end`
      fse.createWriteStream(filePath, {
        start: index * size
      })
    )
  })
  await Promise.all(list)
  // Every chunk was deleted in pipeStream, so the folder is now empty
  fse.rmdirSync(chunkDir)
}
app.post('/merge', async (req, res) => {
  const { fileHash, fileName, size } = req.body
  // If the merged file already exists, the upload is effectively done
  const filePath = path.resolve(UPLOAD_DIR, `${fileHash}${extractExt(fileName)}`)
  if (fse.existsSync(filePath)) {
    res.status(200).json({
      ok: true,
      msg: 'upload success'
    })
    return
  }
  const chunkDir = path.resolve(UPLOAD_DIR, fileHash)
  if (!fse.existsSync(chunkDir)) {
    res.status(200).json({
      ok: false,
      msg: 'upload fail, please reupload'
    })
    return
  }
  await mergeFileChunk(filePath, fileHash, size)
  res.status(200).json({
    ok: true,
    msg: 'merge success'
  })
})
