Ercan for AWS Community Builders

Secure Your Media Files by Removing Metadata with AWS Lambda

Images and videos often carry metadata that reveals a surprising amount about the file and the person who created it. This metadata, such as EXIF data in images, can include sensitive details like GPS location, device model, and capture time. To protect user privacy and improve security, businesses in many industries can benefit from removing this metadata from media files. In this blog post, we’ll walk through a simple AWS Lambda script that automatically removes metadata from images and videos uploaded to S3 buckets.
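
To make the idea of embedded metadata concrete, here is a minimal sketch that checks whether the raw bytes of a JPEG contain an EXIF block. It relies only on the JPEG segment layout (EXIF lives in an APP1 segment whose payload starts with `Exif\0\0`) and the standard library. The function name `has_exif_segment` is hypothetical and not part of the original script, and the scan is deliberately simple, so treat it as an illustration rather than a full JPEG parser.

```python
import struct


def has_exif_segment(jpeg_bytes: bytes) -> bool:
    """Return True if the JPEG bytes contain an APP1 segment holding EXIF data.

    Hypothetical helper for illustration; it only walks the header segments.
    """
    if not jpeg_bytes.startswith(b"\xff\xd8"):  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG file")

    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            break  # not positioned at a marker: stop scanning
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:
            break  # SOS: compressed image data follows, no more header segments
        # The segment length is big-endian and includes its own two bytes
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

Running a check like this on a file before and after processing is one quick way to confirm that the EXIF block is really gone.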

Industries That Can Benefit:

  1. Social Media Platforms: Social media platforms handle a massive number of media uploads every day. By removing metadata from images and videos, these platforms can better protect user privacy and minimize the risk of unintentional information leaks.
  2. E-Commerce: E-commerce websites often display user-generated content, such as product images and reviews. Stripping metadata from these media files ensures that customers’ private information is not inadvertently exposed.
  3. Healthcare: The healthcare industry deals with sensitive patient information, including images and videos from medical procedures. Removing metadata from these files is an important step toward complying with privacy regulations and protecting patient confidentiality.
  4. News and Media: Journalists and media organizations publish images and videos that may contain sensitive information about sources or locations. Stripping metadata can help protect this information and maintain the integrity of their reporting.
  5. Education: Educational institutions often host and share various media files, such as lecture videos, research images, and student presentations. Removing metadata from these files ensures that private information about students, faculty, and research subjects is protected.

Benefits of Removing Metadata:

  1. Enhanced Privacy: Stripping metadata from media files helps protect sensitive information about users, locations, and devices, safeguarding user privacy.
  2. Security: By removing metadata, you reduce the risk of accidentally leaking sensitive information, which could be exploited by malicious actors.
  3. Compliance: Removing metadata can help organizations comply with data protection regulations, such as GDPR or HIPAA, that require safeguarding user data.
  4. Simplified Management: Automating metadata removal with AWS Lambda reduces the manual work needed to process media files, streamlining media management across your organization.

The Lambda function below is triggered by an S3 upload event, strips the metadata from the uploaded image or video, and writes the cleaned file back to S3:

import os
import json
import boto3
from PIL import Image
from io import BytesIO
import av

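def remove_exif_from_image(file_content):
    img = Image.open(BytesIO(file_content))
    # Keep the source format so the output still matches the object's file extension
    image_format = img.format or 'JPEG'

    # Rebuild the image from raw pixel data only, leaving EXIF and other metadata behind
    clean = Image.frombytes(img.mode, img.size, img.tobytes())
    if img.mode == 'P':
        clean.putpalette(img.getpalette())  # palette images need their colour table restored

    # Save the image without EXIF data (quality only affects lossy formats such as JPEG)
    output = BytesIO()
    clean.save(output, format=image_format, quality=95)
    output.seek(0)

    return output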

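def remove_metadata_from_video(file_content, container_format='mp4'):
    input_container = av.open(BytesIO(file_content), mode='r')
    output = BytesIO()
    # A BytesIO has no file name to infer the container from, so pass the format explicitly
    output_container = av.open(output, mode='w', format=container_format)

    # Add every output stream up front; streams cannot be added once muxing has started
    output_streams = {}
    for stream in input_container.streams:
        codec = stream.codec_context
        if stream.type == 'video':
            output_stream = output_container.add_stream(codec.name, rate=stream.average_rate)
            # Keep the source resolution instead of the encoder defaults
            output_stream.width, output_stream.height = codec.width, codec.height
        elif stream.type == 'audio':
            output_stream = output_container.add_stream(codec.name, rate=codec.sample_rate)
        else:
            continue  # subtitle and data streams are dropped
        output_streams[stream.index] = output_stream

    # Demux once and re-encode each frame into its matching stream; metadata tags are not copied
    for packet in input_container.demux():
        output_stream = output_streams.get(packet.stream.index)
        if output_stream is not None:
            for frame in packet.decode():
                output_container.mux(output_stream.encode(frame))

    # Flush each encoder so the frames it was still buffering are written too
    for output_stream in output_streams.values():
        output_container.mux(output_stream.encode(None))
    output_container.close()
    input_container.close()
    output.seek(0)

    return output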

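def lambda_handler(event, context):
    # Imported locally so the handler is self-contained
    from urllib.parse import unquote_plus

    s3 = boto3.client('s3')

    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    # Keys in S3 events are URL-encoded (spaces arrive as '+'), so decode before use
    key = unquote_plus(record['s3']['object']['key'])

    # Download the file
    s3_object = s3.get_object(Bucket=bucket, Key=key)

    # Skip objects this function already wrote. Without this check, uploading back to
    # the source bucket would trigger the function again in an endless loop.
    # 'metadata-stripped' is an arbitrary marker name chosen for this script.
    if s3_object.get('Metadata', {}).get('metadata-stripped') == 'true':
        return {
            'statusCode': 200,
            'body': json.dumps('File already processed.')
        }

    file_content = s3_object['Body'].read()

    # Remove metadata from the image or video
    if key.lower().endswith(('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff', '.webp')):
        output = remove_exif_from_image(file_content)
    elif key.lower().endswith(('.mp4', '.mkv', '.avi', '.mov', '.webm', '.flv')):
        output = remove_metadata_from_video(file_content)
    else:
        return {
            'statusCode': 400,
            'body': json.dumps('Unsupported file format.')
        }

    # Upload the processed file to NEW_BUCKET if it is set, otherwise to the source bucket
    new_bucket = os.environ.get('NEW_BUCKET') or bucket
    s3.put_object(Bucket=new_bucket, Key=key, Body=output,
                  Metadata={'metadata-stripped': 'true'})

    return {
        'statusCode': 200,
        'body': json.dumps('File processed and uploaded.')
    }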
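
Keys in S3 event notifications arrive URL-encoded (a space in the object name shows up as `+`), so it is worth decoding them before using them in further S3 calls. The sketch below shows, with the standard library only, how one event record can be decoded and routed by file extension. The helper `classify_record`, the bucket name `example-uploads`, and the sample key are all made up for illustration; the extension lists are the ones used in the script.

```python
from urllib.parse import unquote_plus

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff', '.webp')
VIDEO_EXTENSIONS = ('.mp4', '.mkv', '.avi', '.mov', '.webm', '.flv')


def classify_record(record: dict) -> tuple:
    """Return (bucket, decoded_key, kind) for one S3 event record (hypothetical helper)."""
    bucket = record['s3']['bucket']['name']
    key = unquote_plus(record['s3']['object']['key'])  # undo the URL encoding
    lowered = key.lower()
    if lowered.endswith(IMAGE_EXTENSIONS):
        kind = 'image'
    elif lowered.endswith(VIDEO_EXTENSIONS):
        kind = 'video'
    else:
        kind = 'unsupported'
    return bucket, key, kind


# A trimmed-down event with only the fields the script reads (illustrative values)
sample_event = {
    'Records': [
        {'s3': {'bucket': {'name': 'example-uploads'},
                'object': {'key': 'holiday+photos/IMG_0001.JPG'}}}
    ]
}

print(classify_record(sample_event['Records'][0]))
# ('example-uploads', 'holiday photos/IMG_0001.JPG', 'image')
```

An event like `sample_event` can also be pasted into the Lambda console as a test event, as long as the bucket and key point at a real object.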

Please note that the PyAV library (imported as av) requires FFmpeg shared libraries, which may not be available in the default Lambda environment. Pillow is not part of the default Lambda Python runtime either. You’ll need to create a custom Lambda layer that includes Pillow, PyAV, and the FFmpeg shared libraries. You can follow the official guide to create a custom Lambda layer for FFmpeg.
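
One way to confirm that a layer is attached correctly is to check, from inside the function, that the third-party modules can be located. The sketch below uses only the standard library; the helper name `missing_dependencies` is hypothetical. Note that it only checks whether the modules can be found on the import path. It does not prove that the FFmpeg shared libraries load, so actually importing `av` in a test invocation is still the real check.

```python
import importlib.util


def missing_dependencies(modules=("PIL", "av")):
    """Return the names of modules that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", missing if missing else "none")
```

Calling `missing_dependencies()` at the top of a temporary test handler and logging the result makes a packaging problem visible before the first real upload hits the function.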

Here is the GitHub repository: https://github.com/flightlesstux/EXIF-Metadata-Remover

Conclusion

The AWS Lambda script we’ve provided makes it easy to remove metadata from images and videos uploaded to S3 buckets, improving privacy and security across a wide range of industries. By implementing this solution, you can protect user information, reduce the risk of accidental leaks, and support compliance with data protection regulations.
