This week, I refactored file uploads to AWS S3 to run asynchronously using asyncio and aioboto3 in one of the existing FastAPI apps I had deployed.
aioboto3 is an async wrapper around boto3, the AWS SDK for Python, which is what the app uses to talk to S3.
The biggest bottleneck in the app was file uploads: users could upload a large batch of files, and handling them sequentially was inefficient and made for a poor user experience.
Since uploads are mostly I/O bound, they benefit a lot from being done asynchronously, which should be efficient and low-overhead compared to, say, multiple threads or processes.
However, it was not so simple.
Initially, I read the entire file into memory with await file.read() and wrapped it in BytesIO. However, this approach scales poorly for large files or batches. A better solution is to pass file.file (a streamed file-like object provided by FastAPI) directly to upload_fileobj, which streams efficiently using multipart uploads under the hood.
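To make that concrete, here is a minimal sketch of the two approaches; aioclient is an aioboto3 S3 client, and the bucket and key arguments are placeholders rather than the app's real values.

from io import BytesIO

from fastapi import UploadFile


async def upload_buffered(file: UploadFile, aioclient, bucket: str, key: str):
    # Reads the whole file into RAM first: fine for small files, costly for big ones.
    data = await file.read()
    await aioclient.upload_fileobj(BytesIO(data), bucket, key)


async def upload_streamed(file: UploadFile, aioclient, bucket: str, key: str):
    # Hands the underlying file-like object straight to aioboto3, which streams
    # it in chunks (multipart upload for larger objects).
    await aioclient.upload_fileobj(file.file, bucket, key)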
When batches of very small files were uploaded, performance was close to the theoretical minimum, which is the time it takes to upload the largest single file. But with larger files (>10 MB), performance got worse, and as the files grew I stopped seeing any difference between the sync and async uploads.
Here's why:
Network Bandwidth - Uploading many large files asynchronously won’t necessarily improve performance if your internet upload bandwidth is the bottleneck. In such cases, the total throughput is capped by your available bandwidth. For example, if you upload five 100 MB files concurrently with a 100 Mbps (megabits per second) upload speed (which is about 12.5 MBps), the combined data transfer rate is still limited to 12.5 MB per second. So, whether you upload them concurrently or sequentially, the total time taken will be similar — you're just dividing the same bandwidth across multiple uploads.
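As a back-of-the-envelope check of that example:

upload_speed_mbps = 100                       # megabits per second
upload_speed_mbytes = upload_speed_mbps / 8   # = 12.5 megabytes per second

total_mb = 5 * 100                            # five 100 MB files
min_seconds = total_mb / upload_speed_mbytes  # = 40 s, concurrent or sequential
print(f"Bandwidth-bound lower limit: {min_seconds:.0f} s")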
Coroutines - By default, we start one coroutine per file being uploaded. This is not really safe: when the number of files and their sizes are large, it can overload the RAM, disk I/O, network, and the asyncio event loop, which can end up even slower than handling each file sequentially.
To prevent overwhelming system resources, we can use a semaphore, which limits how many operations (in this context, coroutines) can run at the same time. I used asyncio.Semaphore for this.
Finding the right concurrency limit for your use case takes some trial and error, since it depends on your system details, bandwidth, and expected file and batch sizes.
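Stripped of the S3 details, the pattern looks like this; the limit of 5 is only an illustrative starting point to tune from.

import asyncio

sem = asyncio.Semaphore(5)  # at most 5 coroutines do work at the same time


async def bounded(coro):
    async with sem:
        return await coro


async def main():
    # 20 one-second jobs with a limit of 5 finish in roughly 4 seconds.
    await asyncio.gather(*(bounded(asyncio.sleep(1)) for _ in range(20)))


asyncio.run(main())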
Anyway, I hope this article is useful and helps readers implement asynchronous code in their own codebases. Please find the code attached below.
Upload coroutine
import asyncio

from fastapi import UploadFile

# `settings` is the app's configuration object (bucket name, concurrency limit, etc.).
upload_sem = asyncio.Semaphore(settings.CONCURRENT_UPLOADS)


async def upload_file(file: UploadFile, aioclient, user_id: str) -> str:
    async with upload_sem:
        file_name = f"{user_id}/media/{file.filename}"
        # Pass the underlying file object so aioboto3 streams it (multipart
        # upload under the hood) instead of buffering everything in RAM.
        await aioclient.upload_fileobj(
            file.file,
            settings.AWS_S3_BUCKET_NAME,
            file_name,
            ExtraArgs={
                "ACL": "public-read",
                "ContentType": file.content_type,
            },
        )
    return file.filename
Concurrent uploads
# Inside the FastAPI route handler; `files`, `user_id`, and `aioboto_session`
# (an aioboto3.Session) come from the surrounding scope.
try:
    async with aioboto_session.client(
        service_name="s3",
        region_name=settings.AWS_REGION,
        aws_access_key_id=settings.AWS_ACCESS_KEY,
        aws_secret_access_key=settings.AWS_SECRET_KEY,
    ) as aioclient:
        # One coroutine per file; the semaphore inside upload_file caps concurrency.
        result = await asyncio.gather(
            *(upload_file(file, aioclient, user_id) for file in files)
        )
except NoCredentialsError:  # from botocore.exceptions
    return JSONResponse(  # from fastapi.responses
        content={"error": "AWS credentials not found"}, status_code=500
    )
except Exception as e:
    return JSONResponse(content={"error": str(e)}, status_code=500)
return result
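For orientation, here is a rough sketch of the route handler that the fragment above could sit in; the endpoint path, the user_id parameter, and the session setup are assumptions rather than the original code.

import aioboto3
from fastapi import FastAPI, UploadFile

app = FastAPI()
aioboto_session = aioboto3.Session()  # created once, reused across requests


@app.post("/media")
async def upload_media(files: list[UploadFile], user_id: str):
    # The try/except block shown under "Concurrent uploads" goes here,
    # returning either `result` or a JSONResponse with the error.
    ...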