Your application must accept user‑uploaded video files (average size ≈ 150 MB, peak size ≈ 1 GB) and store them in Amazon S3.
Requirements:
- Direct client upload (no server‑side proxy) with minimal latency.
- Server‑side validation of file type and size before the object becomes publicly accessible.
- Automatic virus scanning after upload.
- Retention policy: keep objects for 90 days, then archive to Glacier Deep Archive.
- Access control: only authenticated users can read their own files; no public read access.
Design the S3 architecture and supporting AWS services to satisfy these constraints. Include the steps a client follows, the AWS resources you would configure, and the trade-offs involved (cost, latency, complexity).
1. High-Level Flow (client-side)
- Client requests a pre-signed PUT URL from your backend API (e.g., `/upload-url?filename=video.mp4`).
- Backend generates a pre-signed URL using its `s3:PutObject` permission, limited to the specific bucket, a key prefix (e.g., `uploads/{userId}/{uuid}.mp4`), and a short expiration (5 minutes).
- Client uploads the file directly to S3 via HTTP PUT using the URL.
- S3 emits an `ObjectCreated` event, which triggers the Lambda validation and scanning workflow.
- If validation passes, Lambda moves the object to the final location (`private/{userId}/{uuid}.mp4`) and updates a DynamoDB record that tracks ownership and status.
- If validation fails, Lambda deletes the object and optionally notifies the user.
2. Required AWS Resources

| Resource | Purpose | Key Configuration |
|---|---|---|
| S3 Bucket (`my-media-bucket`) | Store raw uploads and final objects | Block all public access (bucket policy + account settings); optionally enable Object Lock for tamper evidence; enable Versioning (helps with accidental deletes). |
| IAM Role for Backend API | Generate pre-signed URLs | Policy: `s3:PutObject` on `uploads/${userId}/*`; optionally add a condition on `s3:x-amz-content-sha256` to bind the payload hash. |
| Lambda Function (`ValidateAndScan`) | Validate MIME type and size, run antivirus, move object | Triggered by S3 `ObjectCreated` on the `uploads/` prefix. Permissions: `s3:GetObject`, `s3:PutObject`, `s3:DeleteObject`, `dynamodb:PutItem`. Timeout: up to 15 min (the Lambda maximum) for scanning large files. |
| Amazon GuardDuty / Amazon Macie (optional) | Continuous threat detection on the bucket | Can be enabled at the account level; not required for the per-file scan. |
| ClamAV (via Lambda) | Virus scanning | Ship the ClamAV binaries and signature database in a Lambda layer; refresh the database daily via a scheduled Lambda. |
| DynamoDB Table (`UserFiles`) | Metadata: owner, status, S3 key, timestamps | Partition key: `fileId` (UUID); alternatively use `userId` as partition key with `fileId` as sort key for per-user queries. |
| S3 Lifecycle Policy | 90-day transition to Glacier Deep Archive, then expiration | Rule 1: transition after 90 days to the `DEEP_ARCHIVE` storage class. Rule 2: expire after 7 years (or as required). |
| API Gateway / ALB | Expose the `/upload-url` endpoint (and an optional download endpoint) | Use an IAM authorizer or Cognito for user authentication. |
| CloudWatch Alarms | Monitor failed validations and scan errors | Metrics: Lambda `Errors`, plus custom metrics such as `ValidationFailures`. |
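The two lifecycle rules in the table can be expressed as a single bucket lifecycle configuration. This is a sketch: the `private/` prefix matches the final object location used above, and the 2,555-day (roughly 7-year) expiration is an assumption to be replaced by your actual retention requirement:

```json
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Filter": { "Prefix": "private/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 90, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```

This can be applied with `aws s3api put-bucket-lifecycle-configuration` or the equivalent infrastructure-as-code resource.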
3. Detailed Lambda Validation & Scanning Logic (pseudo‑code)
```go
import (
	"context"
	"fmt"
	"net/url"
	"time"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/google/uuid"
)

// Package-level clients (s3Client, dynamoClient) and helpers (downloadToTmp,
// deleteObject, copyObject, notifyUser, allowedMime, userIDFromKey,
// scanWithClamAV) are omitted for brevity.

func handler(ctx context.Context, s3Event events.S3Event) error {
	for _, record := range s3Event.Records {
		bucket := record.S3.Bucket.Name
		// Object keys in S3 event notifications are URL-encoded.
		key, err := url.QueryUnescape(record.S3.Object.Key)
		if err != nil {
			return err
		}

		// 1. Get object metadata (size, content-type).
		head, err := s3Client.HeadObject(ctx, &s3.HeadObjectInput{Bucket: &bucket, Key: &key})
		if err != nil {
			return err
		}

		// 2. Size check (SDK v2 returns a *int64).
		if aws.ToInt64(head.ContentLength) > 1<<30 { // > 1 GiB
			deleteObject(ctx, bucket, key)
			notifyUser("File too large")
			continue
		}

		// 3. Content-type whitelist (e.g., video/mp4, video/webm).
		if !allowedMime(aws.ToString(head.ContentType)) {
			deleteObject(ctx, bucket, key)
			notifyUser("Unsupported file type")
			continue
		}

		// 4. Download to ephemeral storage. /tmp defaults to 512 MB but is
		// configurable up to 10 GB, so size it to cover the 1 GB peak.
		tmpPath, err := downloadToTmp(ctx, bucket, key)
		if err != nil {
			return err
		}

		// 5. Virus scan with ClamAV.
		infected, err := scanWithClamAV(tmpPath)
		if err != nil || infected {
			deleteObject(ctx, bucket, key)
			notifyUser("File failed virus scan")
			continue
		}

		// 6. Move to final location (copy + delete).
		fileID := uuid.NewString()
		destKey := fmt.Sprintf("private/%s/%s", userIDFromKey(key), fileID)
		copyObject(ctx, bucket, key, destKey)
		deleteObject(ctx, bucket, key)

		// 7. Record metadata in DynamoDB, reusing the same fileID as the key.
		putItem := dynamodb.PutItemInput{
			TableName: aws.String("UserFiles"),
			Item: map[string]types.AttributeValue{
				"fileId":   &types.AttributeValueMemberS{Value: fileID},
				"userId":   &types.AttributeValueMemberS{Value: userIDFromKey(key)},
				"s3Key":    &types.AttributeValueMemberS{Value: destKey},
				"status":   &types.AttributeValueMemberS{Value: "READY"},
				"uploaded": &types.AttributeValueMemberS{Value: time.Now().Format(time.RFC3339)},
			},
		}
		if _, err := dynamoClient.PutItem(ctx, &putItem); err != nil {
			return err
		}
	}
	return nil
}
```
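The pseudo-code references two small helpers, `allowedMime` and `userIDFromKey`. One possible implementation, assuming upload keys follow the `uploads/{userId}/{uuid}.mp4` layout described earlier:

```go
package main

import "strings"

// allowedMime reports whether a Content-Type is on the video whitelist.
// Extend the list to match the formats your application accepts.
func allowedMime(contentType string) bool {
	switch strings.ToLower(strings.TrimSpace(contentType)) {
	case "video/mp4", "video/webm", "video/quicktime":
		return true
	}
	return false
}

// userIDFromKey extracts the user ID from a key shaped like
// "uploads/{userId}/{uuid}.mp4". It returns "" for any other shape,
// which callers should treat as a validation failure.
func userIDFromKey(key string) string {
	parts := strings.Split(key, "/")
	if len(parts) != 3 || parts[0] != "uploads" {
		return ""
	}
	return parts[1]
}
```

Keeping these pure functions separate from the AWS calls makes them trivial to unit-test.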
4. Summary of Steps for a Client
- Authenticate (Cognito, JWT, etc.).
- `POST /upload-url` with filename and size → receive a pre-signed PUT URL.
- `PUT` the file directly to S3 using the URL.
- Poll (or receive a webhook) for upload status (metadata stored in DynamoDB).
- `GET` the file via a signed download URL generated by the backend (or via an API that checks ownership).
This architecture delivers low‑latency, secure uploads, enforces validation and virus scanning, automatically manages retention, and isolates each user’s data while keeping operational costs reasonable.