When AWS announced Amazon S3 Files, the obvious question was whether it actually makes day-to-day code simpler.
The interesting part is not the marketing line. The interesting part is this: can I take Kafka events, write them with normal filesystem calls like open() or write_bytes(), and still end up with objects in S3 without pushing SDK calls into the application code?
I built a small demo to check exactly that. The setup is simple: Confluent Cloud Kafka, AWS Lambda, AWS CDK, and an Amazon S3 Files mount. This page shows how the demo is wired, where CDK is still rough, and what the read cache looks like when you run the same workload twice.
What matters here
Amazon S3
S3 is still just object storage. It is durable, cheap, and scales well, but objects are not files. You do not get normal filesystem behavior. You cannot treat a bucket like a mounted directory and expect things like os.listdir(), append-style writes, or file-style access patterns to behave the way they would on a POSIX filesystem.
For a lot of workloads that is fine. For some, it is just friction.
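For contrast, the access pattern the rest of this post is after looks like this: plain `pathlib` calls against a directory. This is a local sketch, with a temp dir standing in for the mount that Lambda would see at `/mnt/events`.

```python
import tempfile
from pathlib import Path

# A temp dir stands in for the mounted filesystem; on Lambda this
# would be a path like /mnt/events.
mount = Path(tempfile.mkdtemp())

# Normal filesystem behavior: nested directories, file-style writes and reads.
hour_dir = mount / "small" / "2026/04/13/10"
hour_dir.mkdir(parents=True)
(hour_dir / "evt-1.json").write_bytes(b'{"id": 1}')

print((hour_dir / "evt-1.json").read_bytes())  # b'{"id": 1}'
```

None of this works against a raw S3 bucket without an SDK in between, which is exactly the gap S3 Files is meant to close.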
Amazon EFS and the caching layer
Amazon EFS gives you a shared POSIX-compatible filesystem over NFS. Lambda, EC2, and ECS can all mount it and use plain filesystem calls.
The useful bit in this demo is the cache behavior. If a file is not already warm, EFS may need to pull it from backing storage first. Once it is in the SSD-backed cache, later reads are much faster.
Amazon S3 Files
Amazon S3 Files sits between those two worlds. You mount it over NFS, but the backing store is an S3 bucket. From the application point of view, you write files. Under the hood, those files are synced to S3. When you read them back, the first access may come from S3 and later accesses can come from the EFS cache.
That gives you a pretty nice mix:
- S3 pricing and durability
- normal filesystem calls instead of SDK-heavy application code
- compatibility with the rest of the S3 ecosystem
- cached reads for data that gets reused
There is one behavior that matters a lot: the cache import threshold. You can define rules per prefix so only files below a configured size are imported into the fast cache. Files above that threshold are still in S3, but they are not treated the same way on read. That threshold was the main thing I wanted to test.
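To make the threshold concrete, here is a small sketch of that rule logic. The prefixes and `sizeLessThan` values mirror the demo config used later in this post; the rule shape is my own paraphrase for illustration, not an official S3 Files API.

```python
# Per-prefix import rules, as used in this demo (illustrative shape).
IMPORT_RULES = [
    {"prefix": "small/", "size_less_than": 131072},   # 128 KB
    {"prefix": "large/", "size_less_than": 2097152},  # 2 MB
]

def will_be_cached(key: str, size_bytes: int) -> bool:
    """True if some rule's prefix matches and the file is under its threshold."""
    return any(
        key.startswith(rule["prefix"]) and size_bytes < rule["size_less_than"]
        for rule in IMPORT_RULES
    )

print(will_be_cached("small/evt.json", 4 * 1024))    # True  (4 KB < 128 KB)
print(will_be_cached("large/evt.json", 920 * 1024))  # True  (920 KB < 2 MB)
print(will_be_cached("large/evt.json", 3 * 2**20))   # False (3 MB is over the threshold)
```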
What the demo does
The Confluent Cloud topic gets two kinds of synthetic events:
| Class | Size | S3 Files prefix | Cache rule |
|---|---|---|---|
| `small` | ~4 KB | `small/` | `sizeLessThan: 131072` (128 KB) → cached |
| `large` | ~920 KB | `large/` | `sizeLessThan: 2097152` (2 MB) → cached |
A consumer Lambda is triggered by Lambda Event Source Mapping (ESM) and writes each record to the S3 Files NFS mount at /mnt/events. A replayer Lambda, invoked on-demand, walks the mount, reads every file, and re-publishes events to a replay Kafka topic. It also returns the average read latency so the cache effect is visible.
The simplest experiment is to run the replayer twice against the same prefix:
- first run -> cold read, files are fetched from S3
- second run -> warm read, files come from cache
That is enough to see whether S3 Files behaves the way the docs imply.
Architecture
Infrastructure with AWS CDK
The stack is CDK v2 in TypeScript. The main limitation right now is that there are no L2 constructs for S3 Files yet. CloudFormation already has the resource types, so the practical approach is to use L1 CfnResource for the S3 Files pieces and move on.
The S3 bucket
S3 Files requires versioning. The rest is ordinary bucket setup.
```typescript
const eventsBucket = new s3.Bucket(this, "EventsBucket", {
  encryption: s3.BucketEncryption.S3_MANAGED,
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  versioned: true, // required by S3 Files
  removalPolicy: RemovalPolicy.DESTROY,
  autoDeleteObjects: true,
  lifecycleRules: [{ expiration: Duration.days(90) }],
});
```
The S3 Files service role
S3 Files needs a role so it can read and write the bucket on your behalf. The trust relationship is a bit odd the first time you see it: the trusted principal is `elasticfilesystem.amazonaws.com`, scoped down to your account and to S3 Files filesystem ARNs via `aws:SourceAccount` and `aws:SourceArn` conditions. That is worth calling out.
```typescript
const s3FilesServiceRole = new iam.Role(this, "S3FilesServiceRole", {
  assumedBy: new iam.ServicePrincipal("elasticfilesystem.amazonaws.com", {
    conditions: {
      StringEquals: { "aws:SourceAccount": this.account },
      ArnLike: {
        "aws:SourceArn": `arn:aws:s3files:${this.region}:${this.account}:file-system/*`,
      },
    },
  }),
});

s3FilesServiceRole.addToPrincipalPolicy(
  new iam.PolicyStatement({
    actions: ["s3:*"], // broad for the demo; scope this down for production
    resources: [eventsBucket.bucketArn, `${eventsBucket.bucketArn}/*`],
  }),
);
```
The S3 Files filesystem, mount targets, and access point
Since there is no higher-level CDK support yet, this part uses raw CloudFormation resources.
```typescript
const s3FilesFs = new CfnResource(this, "S3FilesFileSystem", {
  type: "AWS::S3Files::FileSystem",
  properties: {
    Bucket: eventsBucket.bucketArn,
    RoleArn: s3FilesServiceRole.roleArn,
    AcceptBucketWarning: true, // required: S3 Files warns about versioning costs
  },
});

const fileSystemId = s3FilesFs.getAtt("FileSystemId").toString();

// One mount target per AZ – Lambda must be in the same subnet.
const mountTargets = mountSubnets.map((subnet, i) => {
  const mt = new CfnResource(this, `S3FilesMountTarget${i}`, {
    type: "AWS::S3Files::MountTarget",
    properties: {
      FileSystemId: fileSystemId,
      SubnetId: subnet.subnetId,
      SecurityGroups: [mountTargetSg.securityGroupId],
    },
  });
  mt.addDependency(s3FilesFs);
  return mt;
});

// Access point: UID/GID 1000 matches the Lambda runtime user.
const s3FilesAccessPoint = new CfnResource(this, "S3FilesAccessPoint", {
  type: "AWS::S3Files::AccessPoint",
  properties: {
    FileSystemId: fileSystemId,
    PosixUser: { Uid: "1000", Gid: "1000" },
    RootDirectory: {
      Path: "/events",
      CreationPermissions: { OwnerUid: "1000", OwnerGid: "1000", Permissions: "755" },
    },
  },
});
```
Attaching the mount to a Lambda function
This is where the current CDK story is still rough.
```typescript
const attachS3FilesMount = (fn: lambda.Function, write: boolean): void => {
  // FileSystemConfigs escape-hatch (see pitfalls section below)
  const cfnFn = fn.node.defaultChild as lambda.CfnFunction;
  cfnFn.addPropertyOverride("FileSystemConfigs", [
    {
      Arn: s3FilesAccessPoint.getAtt("AccessPointArn"),
      LocalMountPath: "/mnt/events",
    },
  ]);

  // Lambda must not start until mount targets are ready.
  for (const mt of mountTargets) cfnFn.addDependency(mt);

  // IAM: s3files permissions.
  const actions = ["s3files:ClientMount", "s3files:ClientRootAccess"];
  if (write) actions.push("s3files:ClientWrite");
  fn.addToRolePolicy(new iam.PolicyStatement({
    actions,
    resources: [`arn:aws:s3files:${this.region}:${this.account}:file-system/*`],
  }));

  // Secrets Manager + KMS for the Confluent credentials.
  confluentSecret.grantRead(fn);
  fn.addToRolePolicy(new iam.PolicyStatement({
    actions: ["kms:Decrypt", "kms:DescribeKey"],
    resources: ["*"],
  }));
};
```
The consumer Lambda with ESM
The consumer Lambda gets Kafka records from Lambda ESM and writes them straight to the mounted path.
```typescript
const consumerFn = new lambda.Function(this, "ConsumerFn", {
  runtime: lambda.Runtime.PYTHON_3_12,
  handler: "handler.handler",
  code: bundle("./lambda/consumer"),
  vpc,
  vpcSubnets: { subnets: mountSubnets },
  securityGroups: [lambdaSg],
  timeout: Duration.minutes(5),
  environment: {
    MOUNT_PATH: "/mnt/events",
    SMALL_PREFIX: "small/",
    LARGE_PREFIX: "large/",
    SMALL_SIZE_THRESHOLD: String(128 * 1024),
  },
});

attachS3FilesMount(consumerFn, true);

// ESM with BASIC_AUTH (SASL/PLAIN) – no confluent-kafka library needed in Lambda.
new lambda.EventSourceMapping(this, "LambdaConsumer", {
  target: consumerFn,
  kafkaBootstrapServers: [kafkaBootstrapServer],
  kafkaTopic: KAFKA_TOPIC,
  batchSize: 100,
  startingPosition: lambda.StartingPosition.LATEST,
  provisionedPollerConfig: {
    minimumPollers: 1,
    maximumPollers: 5,
  },
  sourceAccessConfigurations: [
    {
      type: lambda.SourceAccessConfigurationType.BASIC_AUTH,
      uri: confluentSecret.secretArn, // {username: api_key, password: api_secret}
    },
  ],
});
```
The consumer handler
Because ESM already handles polling, batching, and offset management, the Lambda code itself is boring in a good way. It decodes the record value, decides whether the payload is small or large, and writes a file to the mounted path.
```python
import base64, logging, os, uuid
from datetime import datetime, timezone
from pathlib import Path

MOUNT_PATH = os.environ.get("MOUNT_PATH", "/mnt/events")
SMALL_PREFIX = os.environ.get("SMALL_PREFIX", "small/")
LARGE_PREFIX = os.environ.get("LARGE_PREFIX", "large/")
SMALL_THRESHOLD = int(os.environ.get("SMALL_SIZE_THRESHOLD", str(128 * 1024)))


def _write(prefix: str, value: bytes) -> None:
    now = datetime.now(timezone.utc)
    dir_path = Path(MOUNT_PATH) / prefix / now.strftime("%Y/%m/%d/%H")
    dir_path.mkdir(parents=True, exist_ok=True)
    (dir_path / f"evt-{uuid.uuid4().hex}.json").write_bytes(value)


def handler(event: dict, context):
    small = large = 0
    for partition_records in event.get("records", {}).values():
        for rec in partition_records:
            value = base64.b64decode(rec["value"])
            if len(value) < SMALL_THRESHOLD:
                _write(SMALL_PREFIX, value)
                small += 1
            else:
                _write(LARGE_PREFIX, value)
                large += 1
    return {"written_small": small, "written_large": large}
```
That is the whole point of the demo. The application code does not know anything about S3 APIs. It just writes bytes to a path.
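A quick way to sanity-check that routing logic locally is to feed a hand-built, ESM-shaped payload through the same decision. ESM delivers Kafka records grouped per topic-partition, with base64-encoded values; the function below is a minimal re-implementation of the handler's classification step for illustration, not the handler itself.

```python
import base64

SMALL_THRESHOLD = 128 * 1024  # matches SMALL_SIZE_THRESHOLD in the stack

def classify(event: dict) -> dict:
    """Replicates the consumer's small/large routing on an ESM-shaped event."""
    counts = {"small": 0, "large": 0}
    for partition_records in event.get("records", {}).values():
        for rec in partition_records:
            value = base64.b64decode(rec["value"])
            counts["small" if len(value) < SMALL_THRESHOLD else "large"] += 1
    return counts

# Fake ESM payload: one tiny record, one ~200 KB record.
event = {
    "records": {
        "demo-topic-0": [
            {"value": base64.b64encode(b'{"id": 1}').decode()},
            {"value": base64.b64encode(b"x" * 200_000).decode()},  # > 128 KB
        ]
    }
}

print(classify(event))  # {'small': 1, 'large': 1}
```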
The replayer handler
The replayer is equally simple. It walks the mounted directory, measures read_bytes(), and optionally pushes the content back to a replay topic. The latency number in the response is enough to see whether you are reading from cold storage or a warm cache.
```python
def handler(event: dict, context):
    source_prefix = event.get("source_prefix", "")
    dry_run = event.get("dry_run", True)
    root = Path(MOUNT_PATH) / source_prefix if source_prefix else Path(MOUNT_PATH)
    producer = None if dry_run else _make_producer(_get_credentials())

    files_read = published = 0
    total_read_ms = 0.0
    for fpath in sorted(root.rglob("*")):
        if not fpath.is_file() or fpath.suffix.lower() not in (".json", ".ndjson"):
            continue
        t0 = time.monotonic()
        data = fpath.read_bytes()  # ← this is the measured operation
        read_ms = (time.monotonic() - t0) * 1000
        total_read_ms += read_ms
        files_read += 1
        if producer:
            for evt in _parse(fpath.suffix, data):
                producer.produce(REPLAY_TOPIC, value=evt)
                published += 1
            producer.poll(0)  # serve delivery callbacks without blocking
    if producer:
        producer.flush()

    return {
        "files_read": files_read,
        "published": published,
        "avg_read_ms": round(total_read_ms / files_read if files_read else 0, 2),
        "dry_run": dry_run,
    }
```
Run it twice against the same prefix and the difference is obvious.
```bash
# First run – cold cache
aws lambda invoke --function-name <ReplayerFnArn> \
  --cli-binary-format raw-in-base64-out \
  --payload '{"source_prefix": "small/2026/04/13/", "dry_run": true}' \
  /tmp/response.json && cat /tmp/response.json
# → {"avg_read_ms": 87.4, ...}

# Second run – warm cache
aws lambda invoke --function-name <ReplayerFnArn> \
  --cli-binary-format raw-in-base64-out \
  --payload '{"source_prefix": "small/2026/04/13/", "dry_run": true}' \
  /tmp/response.json && cat /tmp/response.json
# → {"avg_read_ms": 1.2, ...}
```
The annoying part: CDK L2 does not understand S3 Files yet
This was the part that ate the most time.
CDK already has the nice lambda.FileSystem abstraction, so at first glance it looks like mounting a filesystem into Lambda should be trivial. In practice it is only built for EFS access points. S3 Files access points use a different ARN namespace, so CDK rejects them before you even get to deployment.
My first attempt was the obvious one:
```typescript
// ❌ This does not compile – FileSystemConfig expects an EFS access point
new lambda.Function(this, "ConsumerFn", {
  filesystem: lambda.FileSystem.fromEfsAccessPoint(
    s3FilesAccessPoint, // CfnResource – wrong type
    "/mnt/events",
  ),
  ...
});
```
That path goes nowhere.
The working solution was to stop fighting the L2 and set FileSystemConfigs directly on the underlying CfnFunction. It is not pretty, but it works and still stays dynamic because the access point ARN is resolved at deploy time.
```typescript
const cfnFn = fn.node.defaultChild as lambda.CfnFunction;
cfnFn.addPropertyOverride("FileSystemConfigs", [
  {
    Arn: s3FilesAccessPoint.getAtt("AccessPointArn"), // IResolvable – resolved at deploy time
    LocalMountPath: "/mnt/events",
  },
]);
```
Once you bypass the L2, there is one more thing to remember: you also lose the automatic dependency wiring that CDK normally adds for EFS mounts. Without an explicit dependency on the mount targets, CloudFormation can create the Lambda before the NFS endpoint is ready. When that happens, the function fails during init with an NFS connection error.
```typescript
for (const mt of mountTargets) cfnFn.addDependency(mt);
```
So yes, the escape hatch works, but it also means you need to handle the lifecycle details yourself.
Apply the sync rules after deployment
Another rough edge: ImportDataRules and ExpirationDataRules are not CloudFormation properties on AWS::S3Files::FileSystem. You cannot finish the full setup in one CDK deploy. The sync rules need to be applied after the stack is up.
```bash
FS_ID=$(aws cloudformation describe-stacks \
  --stack-name AmazonS3FilesAndLambdaStack \
  --query "Stacks[0].Outputs[?OutputKey=='S3FilesFileSystemId'].OutputValue" \
  --output text)

aws s3files put-synchronization-configuration \
  --file-system-id "${FS_ID}" \
  --import-data-rules '[
    {"prefix":"small/","trigger":"ON_FILE_ACCESS","sizeLessThan":131072},
    {"prefix":"large/","trigger":"ON_FILE_ACCESS","sizeLessThan":2097152}
  ]' \
  --expiration-data-rules '[{"daysAfterLastAccess":30}]'
```
This config does three things:

- files under `small/` smaller than 128 KB are imported into the cache on first access
- files under `large/` smaller than 2 MB are also imported into the cache on first access
- cached data that has not been touched for 30 days is eligible for eviction
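The expiration rule is simple enough to state as code. This helper is purely illustrative; the real eviction decision happens inside the service, based on the `daysAfterLastAccess` value configured above.

```python
from datetime import datetime, timedelta, timezone

# daysAfterLastAccess from the expiration rule in the demo config.
DAYS_AFTER_LAST_ACCESS = 30

def eviction_eligible(last_access: datetime, now: datetime) -> bool:
    """True once cached data has gone untouched for the configured period."""
    return (now - last_access) >= timedelta(days=DAYS_AFTER_LAST_ACCESS)

now = datetime(2026, 4, 13, tzinfo=timezone.utc)
print(eviction_eligible(now - timedelta(days=45), now))  # True
print(eviction_eligible(now - timedelta(days=7), now))   # False
```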
That second step is easy enough, but it is still a separate step, which is worth knowing before you try to wrap this in a clean IaC story.
Deploying
```bash
# Clone the repo
git clone https://github.com/<your-org>/<your-repo>
cd amazon-s3-files-kafka-lambda-demo

# Install CDK dependencies
npm install

# Bootstrap your account/region (once per account)
npx cdk bootstrap

# Deploy – provide your VPC name, secret ARN, and Confluent bootstrap server
npx cdk deploy \
  --context vpcName=my-vpc \
  --context confluentSecretArn=arn:aws:secretsmanager:eu-central-1:123456789012:secret:confluent/dev/api-key/my-cluster \
  --context kafkaBootstrapServer=pkc-xxxxx.eu-central-1.aws.confluent.cloud:9092
```
The Confluent Secrets Manager secret must look like this:
```json
{
  "endpoint": "pkc-xxxxx.eu-central-1.aws.confluent.cloud:9092",
  "username": "YOUR_API_KEY",
  "password": "YOUR_API_SECRET"
}
```
Producing test events
I used a synthetic producer so the demo can be run without any real source system.
```bash
export BOOTSTRAP_SERVERS=pkc-xxxxx.eu-central-1.aws.confluent.cloud:9092
export API_KEY=<your-api-key>
export API_SECRET=<your-api-secret>
export NUM_SMALL=1000   # ~4 KB each → routes to small/
export NUM_LARGE=1000   # ~920 KB each → routes to large/

uv run ./scripts/produce-demo-events.py
```
The payload sizes are deliberate. Small events are far below the 128 KB threshold. Large events are above 128 KB but below 2 MB, so they still match the larger cache rule.
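A two-line check makes the intent explicit (sizes and thresholds taken from the demo config):

```python
SMALL_RULE = 128 * 1024       # cache rule for small/
LARGE_RULE = 2 * 1024 * 1024  # cache rule for large/

small_payload = 4 * 1024    # well below the small/ threshold
large_payload = 920 * 1024  # above 128 KB, but below 2 MB

print(small_payload < SMALL_RULE)               # True
print(SMALL_RULE < large_payload < LARGE_RULE)  # True
```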
Benchmarking
The numbers below came from a real run in eu-central-1. The producer sent 1,000 small events of around 4 KB and 1,000 large events of around 920 KB. The consumer Lambda wrote them to the mounted S3 Files path in batches of up to 100 records through Lambda ESM.
NFS write latency in the consumer Lambda
Write latency was measured around each write_bytes() call and published to CloudWatch as StatisticValues.
| Size class | Payload | Avg write latency | Implied NFS throughput |
|---|---|---|---|
| `small` | ~4 KB | 11–15 ms | ~0.3 MB/s (latency-bound: RTT dominates) |
| `large` | ~920 KB | 62–64 ms | ~14 MB/s (throughput-bound: single NFS stream) |
The small-file case is mostly round-trip overhead. Payload size barely matters there. The large-file case looks more like what you would expect from a normal throughput-bound transfer over NFS.
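The "implied throughput" column is just payload size divided by latency; plugging in midpoints of the measured ranges reproduces the table values to within rounding:

```python
def implied_throughput_mib_s(payload_bytes: int, latency_ms: float) -> float:
    """Payload over wall-clock latency, expressed in MiB/s."""
    return payload_bytes / (latency_ms / 1000) / (1024 * 1024)

# Midpoints of the measured latency ranges from the table above.
print(round(implied_throughput_mib_s(4 * 1024, 13), 2))    # ~0.3  (small: latency-bound)
print(round(implied_throughput_mib_s(920 * 1024, 63), 1))  # ~14.3 (large: throughput-bound)
```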
The CloudWatch metric definition:
```python
_cw.put_metric_data(
    Namespace="S3FilesDemo",
    MetricData=[{
        "MetricName": "WriteLatencyMs",
        "Dimensions": [{"Name": "SizeClass", "Value": "small"}],
        "StatisticValues": {
            "SampleCount": float(len(latencies)),
            "Sum": sum(latencies),
            "Minimum": min(latencies),
            "Maximum": max(latencies),
        },
        "Unit": "Milliseconds",
    }],
)
```

Using `StatisticValues` instead of individual data points collapses each batch into a single `PutMetricData` call, which keeps custom-metric API volume down. One caveat: CloudWatch cannot derive percentiles such as p99 from pre-aggregated statistics, so this approach gives you Minimum, Maximum, Average, and Sum only. If you need real percentiles, publish raw samples via the `Values` and `Counts` fields instead.
NFS read latency in the replayer Lambda - cold vs warm
The replayer reads each file with read_bytes() and records the latency. Run it twice against the same prefix and you get the cold-read vs warm-read comparison immediately.
```bash
# Cold read – files not yet in EFS SSD cache
aws lambda invoke --function-name <ReplayerFnArn> \
  --cli-binary-format raw-in-base64-out \
  --payload '{"source_prefix": "small/2026/04/13/", "dry_run": true}' \
  /tmp/r1.json && jq '{avg_small: .avg_read_ms_small, avg_large: .avg_read_ms_large}' /tmp/r1.json

# Warm read – same files, now cached in EFS
aws lambda invoke --function-name <ReplayerFnArn> \
  --cli-binary-format raw-in-base64-out \
  --payload '{"source_prefix": "small/2026/04/13/", "dry_run": true}' \
  /tmp/r2.json && jq '{avg_small: .avg_read_ms_small, avg_large: .avg_read_ms_large}' /tmp/r2.json
```
The response includes per-class averages:
```jsonc
// First invocation (cold)
{ "avg_read_ms_small": 94.3, "avg_read_ms_large": 187.6 }

// Second invocation (warm)
{ "avg_read_ms_small": 1.1, "avg_read_ms_large": 4.8 }
```
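Expressed as a speedup factor, using those same representative numbers:

```python
cold = {"avg_read_ms_small": 94.3, "avg_read_ms_large": 187.6}
warm = {"avg_read_ms_small": 1.1, "avg_read_ms_large": 4.8}

# Warm-read speedup per size class.
for key in cold:
    print(key, f"{cold[key] / warm[key]:.0f}x faster warm")
```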
Numbers above are representative; your latency depends on VPC-to-S3 distance, mount target AZ alignment, and EFS cache state at invocation time.
The important part is not the exact number. It is the shape of the result. The first read pays the fetch cost. The second read is dramatically faster because the working set is already warm.
What to look at in CloudWatch
Open CloudWatch → Metrics → S3FilesDemo and look at these four metrics together:
| Metric | Stat | What to look for |
|---|---|---|
| `WriteLatencyMs` | Maximum | Tail latency spikes during large ESM batches |
| `ReadLatencyMs` | Average | Clear drop between cold and warm replayer runs |
| `FilesWritten` | Sum | Total files written in the selected time window |
| `FilesRead` | Sum | Confirms the replayer actually walked the whole prefix |

(Since the metrics are published as pre-aggregated `StatisticValues`, Maximum is the closest available proxy for tail latency.)
S3 Files Pricing
| | Price |
|---|---|
| High-performance storage* | $0.36/GB-mo |
| **Data access** | |
| File reads from high-performance storage | $0.04/GB |
| File reads directly from S3 bucket** | Free |
| File writes | $0.07/GB |

* Pricing is quoted monthly and prorated by the hour.
Takeaways
- S3 Files is useful when you want file semantics on top of S3. The main value here is not magic performance. It is simpler application code. The ingest path can just write bytes to a mounted path and let the backing service deal with S3.
- Lambda ESM works well with Confluent Cloud in this setup. `BASIC_AUTH` was enough for SASL/PLAIN, and the consumer Lambda did not need to embed a Kafka client just to read records.
- CDK support is still immature. You can make this work today, but expect to use L1 resources and escape hatches. This is not a clean all-L2 CDK experience yet.
- Cold and warm reads are very different. That is the behavior to understand before using S3 Files in a real workload. For archive and replay style flows, that can be perfectly fine. For anything latency-sensitive, cache behavior is not a detail - it is the design constraint.
The full source code is available at https://github.com/dobeerman/amazon-s3-files-kafka-lambda-demo

Top comments (1)
Amid all the noise about S3 Files, here is an article and sample code that shows how to actually use this new S3 feature! Thank you for writing this.
I especially appreciate that you cover the need to mix some CloudFormation with CDK code, establish explicit dependencies, and deploy in 2 steps.
Even for people who might not use S3 Files right away, learning about the Kafka + AWS Lambda pattern is valuable. "Because ESM already handles polling, batching, and offset management, the Lambda code itself is boring in a good way." Using Lambda to read from Kafka is a great way to avoid bringing in dependencies and writing code that is not related to the actual application.