DynamoDB has a 400KB item limit. That's fine for most data. But sometimes you need to store something bigger - a PDF, an image, a JSON blob that grew too large.
The usual solution? Store the file in S3, save the metadata in DynamoDB. It's a common pattern. But it takes work.
The manual way
Here's what you normally do:
- Upload the file to S3
- Get the bucket and key back
- Save those in DynamoDB
- When reading, fetch the S3 metadata
- Download from S3 if you need the content
- When deleting, remove from both places
- Handle errors if one succeeds and the other fails
That's a lot of code for something that should be simple. And you have to do it every time.
# The manual way - lots of code
import boto3
s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('documents')
# Upload to S3
s3.put_object(
    Bucket='my-bucket',
    Key=f'documents/{doc_id}/file.pdf',
    Body=file_content,
    ContentType='application/pdf'
)
# Save metadata to DynamoDB
table.put_item(Item={
    'pk': f'DOC#{doc_id}',
    'sk': 'METADATA',
    's3_bucket': 'my-bucket',
    's3_key': f'documents/{doc_id}/file.pdf',
    'content_type': 'application/pdf',
    'size': len(file_content),
})
# Later, to read...
response = table.get_item(Key={'pk': f'DOC#{doc_id}', 'sk': 'METADATA'})
item = response['Item']
s3_response = s3.get_object(Bucket=item['s3_bucket'], Key=item['s3_key'])
content = s3_response['Body'].read()
# And to delete...
table.delete_item(Key={'pk': f'DOC#{doc_id}', 'sk': 'METADATA'})
s3.delete_object(Bucket='my-bucket', Key=f'documents/{doc_id}/file.pdf')
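And that still doesn't cover the last item on the list above: what happens when one write succeeds and the other fails. Here's a rough sketch of the kind of rollback you end up adding, reusing the bucket, key, and table names from the example (this is my own illustration, not part of any library):
from botocore.exceptions import ClientError
key = f'documents/{doc_id}/file.pdf'
s3.put_object(Bucket='my-bucket', Key=key, Body=file_content, ContentType='application/pdf')
try:
    table.put_item(Item={
        'pk': f'DOC#{doc_id}',
        'sk': 'METADATA',
        's3_bucket': 'my-bucket',
        's3_key': key,
        'content_type': 'application/pdf',
        'size': len(file_content),
    })
except ClientError:
    # The metadata write failed - remove the uploaded object so S3 doesn't keep an orphan
    s3.delete_object(Bucket='my-bucket', Key=key)
    raise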
It works. But you end up writing this boilerplate for every model that stores files. And in pure Python, all that serialization and network handling adds up - especially in Lambda, where you pay for every millisecond.
A better way
pydynox is a DynamoDB library with a Rust core. If you haven't heard of it, check out my intro post.
It has an S3Attribute that handles all of this. You define it once, and the library takes care of uploads, downloads, and cleanup.
from pydynox import Model, ModelConfig
from pydynox.attributes import StringAttribute, S3Attribute
class Document(Model):
    model_config = ModelConfig(table="documents")

    pk = StringAttribute(hash_key=True)
    sk = StringAttribute(range_key=True)
    title = StringAttribute()
    file = S3Attribute(bucket="my-bucket")
Now saving a file is one line:
from pydynox import S3File
doc = Document(
    pk="DOC#123",
    sk="METADATA",
    title="Contract",
    file=S3File(data=pdf_bytes, content_type="application/pdf"),
)
doc.save() # Uploads to S3, saves metadata to DynamoDB
Reading is just as simple:
doc = Document.get(pk="DOC#123", sk="METADATA")
# Get the S3 metadata
print(doc.file.bucket) # my-bucket
print(doc.file.key) # documents/DOC#123/file
print(doc.file.size) # 1234567
# Download the content when you need it
content = doc.file.download()
# Or get a presigned URL for direct access
url = doc.file.presigned_url(expires_in=3600)
Deleting cleans up both places:
doc.delete() # Removes from DynamoDB AND S3
How it works
When you call save():
- pydynox uploads the file to S3
- Stores the S3 metadata (bucket, key, size, etag) in DynamoDB
- If the upload fails, nothing is saved
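I won't pin down the exact exception pydynox raises on a failed upload, so treat this as a sketch of what that guarantee means in practice - if save() blows up, there's no half-written record to clean up:
doc = Document(
    pk="DOC#123",
    sk="METADATA",
    title="Contract",
    file=S3File(data=pdf_bytes, content_type="application/pdf"),
)
try:
    doc.save()
except Exception:
    # Upload failed before the DynamoDB write, so no item exists for DOC#123.
    # Log and retry as needed - there's nothing to roll back.
    raise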
When you call delete():
- Deletes from DynamoDB first
- Then deletes from S3
- If S3 delete fails, the DynamoDB record is already gone (orphaned S3 objects can be cleaned up with lifecycle rules)
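If you'd rather not wait for lifecycle rules, a small sweep with plain boto3 also works. This is only a sketch - it assumes the bucket, table, and key layout from the examples above (documents/{pk}/{filename}):
import boto3
s3 = boto3.client('s3')
table = boto3.resource('dynamodb').Table('documents')
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket', Prefix='documents/'):
    for obj in page.get('Contents', []):
        pk = obj['Key'].split('/')[1]  # e.g. DOC#123
        item = table.get_item(Key={'pk': pk, 'sk': 'METADATA'}).get('Item')
        if item is None:
            # No DynamoDB record points at this object - it's an orphan
            s3.delete_object(Bucket='my-bucket', Key=obj['Key'])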
The S3 key is built from the partition key + the filename you pass in S3File. You can also set a prefix:
# Key will be: uploads/documents/{pk}/{sk}/report.pdf
file = S3Attribute(bucket="my-bucket", prefix="uploads/documents/")
# When saving:
doc.file = S3File(data=pdf_bytes, name="report.pdf")
Async works too
If you're using async:
doc = Document(
    pk="DOC#123",
    sk="METADATA",
    title="Contract",
    file=S3File(data=pdf_bytes),
)
await doc.async_save()
# Later
doc = await Document.async_get(pk="DOC#123", sk="METADATA")
content = await doc.file.async_download()
When to use this
Use S3Attribute when:
- Your data might exceed 400KB
- You're storing files (PDFs, images, JSON blobs)
- You want presigned URLs for direct downloads
- You don't want to manage S3 uploads manually
Don't use it when:
- Your data is always small (just use a regular attribute)
- You need complex S3 features (versioning, lifecycle rules on specific objects)
The pattern is common
This isn't a new idea. You've always been able to do this with boto3 - upload to S3, save the metadata in DynamoDB. The difference is that the library does it automatically instead of you writing the same code over and over.
One attribute. One line to save. One line to delete. The library handles the rest.
Links
- pydynox: https://github.com/leandrodamascena/pydynox
- S3Attribute docs: https://leandrodamascena.github.io/pydynox/guides/s3-attribute/
- Install:
pip install pydynox