DEV Community

Ant(on) Weiss for Otomato

Mount S3 Objects to Kubernetes Pods

This post describes how to mount an S3 bucket to all the nodes in an EKS cluster and make it available to pods as a hostPath volume. Yes, we're aware of the security implications of hostPath volumes, but in this case it's less of an issue - the actual access is granted to the S3 bucket (not the host filesystem), and permissions are scoped per serviceAccount.

Goofys

We're using goofys as the mounting utility. It's a "high-performance, POSIX-ish Amazon S3 file system written in Go" based on FUSE (file system in user space) technology.

Daemonset

In order to provide the mount transparently we need to run a DaemonSet - so the mount is created on every node in the cluster.
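Conceptually, each DaemonSet pod needs privileged access to /dev/fuse and a hostPath directory with shared mount propagation, and runs goofys in the foreground. Here's a condensed sketch of what such a DaemonSet looks like - treat it as illustrative, not the chart source; names and paths follow the chart's defaults, and the bucket name and region are placeholders:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: s3-mounter
  namespace: otomount
spec:
  selector:
    matchLabels:
      app: s3-mounter
  template:
    metadata:
      labels:
        app: s3-mounter
    spec:
      serviceAccountName: s3-mounter   # annotated with the IAM role from step 3
      containers:
      - name: mounter
        image: otomato/goofys
        securityContext:
          privileged: true             # required for FUSE mounts
        command: ["/bin/sh"]
        args: ["-c", "mkdir -p /var/s3fs && ./goofys -f my-bucket /var/s3fs"]
        volumeMounts:
        - name: devfuse
          mountPath: /dev/fuse
        - name: mntdatas3fs
          mountPath: /var/s3fs:shared  # shared propagation exposes the mount on the host
      volumes:
      - name: devfuse
        hostPath:
          path: /dev/fuse
      - name: mntdatas3fs
        hostPath:
          path: /mnt/s3data
```

The goofys `-f` flag keeps the process in the foreground, which is what keeps the container (and therefore the mount) alive.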

The Dockerfile and the Helm Chart

We've built our own goofys Docker image based on Alpine Linux and a Helm chart that installs the DaemonSet.

The image is available on our Docker Hub repo here: https://hub.docker.com/r/otomato/goofys

The Dockerfile and the Helm chart can be found here: https://github.com/otomato-gh/s3-mounter

S3 Access per ServiceAccount

The Helm chart currently assumes that S3 access is granted via an IAM Role attached to a Kubernetes serviceAccount. We may add API access key support in the future if needed.

HowTo

Here's how to set it all up:

1. OIDC Provider for EKS

Make sure you have an IAM OIDC identity provider for your cluster. If you're not sure, check the cluster's OIDC issuer with the following command (you'll need the AWS CLI and eksctl installed):

aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text

Example output:

https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E

List the IAM OIDC providers in your account. Replace EXAMPLED539D4633E53DE1B716D3041E with the value returned from the previous command.

aws iam list-open-id-connect-providers | grep EXAMPLED539D4633E53DE1B716D3041E

Example output:

"Arn": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"

If output is returned from the previous command, then you already have a provider for your cluster. If no output is returned, then you must create an IAM OIDC provider with the following command. Replace cluster_name with your own value.

eksctl utils associate-iam-oidc-provider --cluster cluster_name --approve

2. Create a Managed Policy for Bucket Access

Create a JSON file named policy.json with the appropriate policy definition. For example, the following snippet creates a policy document that allows full access to a bucket named my-kubernetes-bucket (note that object-level actions also require the /* resource):

read -r -d '' MY_POLICY <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::my-kubernetes-bucket",
                "arn:aws:s3:::my-kubernetes-bucket/*"
            ]
        }
    ]
}
EOF
echo "${MY_POLICY}" > policy.json
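A stray comma or quote in the heredoc will make IAM reject the document, so it's worth validating the generated file locally first. A quick sketch (this assumes python3 is available on your machine; any JSON linter works):

```shell
# Check that policy.json parses as valid JSON before sending it to IAM
python3 -m json.tool policy.json > /dev/null \
  && echo "policy.json: valid JSON" \
  || echo "policy.json: INVALID"
```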

Create the managed policy by running:

aws iam create-policy --policy-name kubernetes-s3-access --policy-document file://policy.json

Example output (truncated):

{
    "Policy": {
        "PolicyName": "kubernetes-s3-access",
        "PolicyId": "ANPAS3DOMWSIX73USJOHK",
        "Arn": "arn:aws:iam::04968064045764:policy/kubernetes-s3-access",
        ...
    }
}

Note the policy ARN for the next step.

3. Create a Role for S3 Access

Set your AWS account ID to an environment variable with the following command:

ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)

Set your OIDC identity provider to an environment variable with the following command. Replace the example values with your own values:

OIDC_PROVIDER=$(aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
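The sed expression simply strips the https:// scheme - IAM expects the bare provider host and path, not a URL. Illustrated with the example issuer URL from step 1:

```shell
# Show what the sed in the previous command produces (illustrative only)
ISSUER="https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
echo "$ISSUER" | sed -e "s/^https:\/\///"
# → oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E
```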

Copy the following code block to your computer and replace my-namespace and my-service-account with the namespace and serviceAccount that will mount the bucket (with the chart's defaults in step 4, that's the s3-mounter serviceAccount in the otomount namespace).

read -r -d '' TRUST_RELATIONSHIP <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:my-namespace:my-service-account"
        }
      }
    }
  ]
}
EOF
echo "${TRUST_RELATIONSHIP}" > trust.json

Then run the modified code block to create a file named trust.json.

Run the following AWS CLI command to create the role:

aws iam create-role --role-name eks-otomounter-role --assume-role-policy-document file://trust.json --description "Mount s3 bucket to EKS"

Attach the IAM policy to the role, replacing IAM_POLICY_ARN with the policy ARN you noted at the end of the previous section:

aws iam attach-role-policy --role-name eks-otomounter-role --policy-arn IAM_POLICY_ARN

4. Finally - Install the S3 Mounter!

  • Add the helm repo to your repo list:

helm repo add otomount https://otomato-gh.github.io/s3-mounter

  • Inspect its configurable values:

helm show values otomount/s3-otomount

The values you'll want to set are at the end:

bucketName: my-bucket
iamRoleARN: my-role
mountPath: /var/s3
hostPath: /mnt/s3data

  • Install the chart, providing your own values:

helm upgrade --install s3-mounter otomount/s3-otomount \
  --namespace otomount --create-namespace \
  --set bucketName=<your-bucket-name> \
  --set iamRoleARN=<your-role-arn>

This uses the default hostPath for the mount, i.e. /mnt/s3data.

5. Use the Mounted S3 Bucket in Your Deployments

Here's an example pod definition that gives its container access to the mounted bucket:

apiVersion: v1
kind: Pod
metadata:
  name: sleeper
spec:
  containers:
  - command:
    - sleep
    - infinity
    image: ubuntu
    name: ubuntu
    volumeMounts:
    - mountPath: /mydata:shared
      name: s3data
  volumes:
  - hostPath:
      path: /mnt/s3data
    name: s3data

Note the :shared suffix - it's a mount propagation modifier appended to the mountPath field that allows this volume to be shared by multiple pods/containers on the same node.
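If you prefer not to rely on the :shared suffix, Kubernetes also exposes mount propagation as an explicit field on the volume mount. A hypothetical equivalent of the volumeMounts section above (HostToContainer makes mounts created on the host visible inside the container):

```yaml
    volumeMounts:
    - mountPath: /mydata
      name: s3data
      mountPropagation: HostToContainer
```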

And that's it! You can now access your bucket. If you've created the pod from our example, you can exec into it to verify:

kubectl exec sleeper -- ls /mydata

Note: running this on your cluster will cost you a few additional $ for S3 API calls that goofys performs to maintain the mount. So remember to monitor your cloud costs. But you should do that anyway, right?

Happy delivering!

Top comments (16)

Eldad Assis

Cool setup. Have you tested its speed?

Ant(on) Weiss

No, speed wasn't a consideration here. The main motivation was providing an easy and transparent way to upload files and make them accessible to pods - S3 gives users an easy and secure UI for that. Goofys is supposedly quite performant compared to other FUSE implementations (e.g. s3fs), but we haven't benchmarked it ourselves.

Eldad Assis

Thx! Would love to know numbers if you ever do try it :-)

Oleksii Smiichuk

Hi, can I set multiple bucketNames?
I need to interact with a few S3 buckets for different tasks.

Ant(on) Weiss

hi @oleksiihead
no support for this right now.
to add it one would need to do something like:

  1. modify the Dockerfile to replace the container startup command with an entrypoint script that mounts buckets in a loop.
  2. modify the Helm chart to receive a dictionary of bucket names and mount points and pass these into the DaemonSet

If you get to do this - please submit a PR.

Lidor Ettinger

Amazing approach!
Thx for sharing in details.

Big Bunny • Edited

I tried to follow the sharing method to complete the entire demo, but unfortunately it didn't work, because the goofys mount directory in the daemonset (/var/s3fs) is not the same as the directory shared with the host (/var/s3fs:shared):

/otomato # df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/nvme0n1p1           50.0G      4.7G     45.3G   9% /var/s3fs:shared
poc-s3goofys-source       1.0P         0      1.0P   0% /var/s3fs

Is there any configuration I missed?

Daemonset.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: s3-mounter
  name: s3-mounter
  namespace: otomount
spec:
  selector:
    matchLabels:
      app: s3-mounter
  template:
    metadata:
      labels:
        app: s3-mounter
    spec:
      serviceAccountName: s3-mounter
      containers:
      - name: mounter 
        image: otomato/goofys
        securityContext:
          privileged: true
        command: ["/bin/sh"]
        args: ["-c", "mkdir -p /var/s3fs && ./goofys --region xxxxx -f poc-s3goofys-source /var/s3fs"]
        volumeMounts:
          - name: devfuse
            mountPath: /dev/fuse
          - name: mntdatas3fs
            mountPath: /var/s3fs:shared
      volumes:
        - name: devfuse
          hostPath:
            path: /dev/fuse
        - name: mntdatas3fs
          hostPath:
            path: /mnt/s3data
Ant(on) Weiss

What's your node OS? Is mount propagation enabled in the container runtime? See this note here: kubernetes.io/docs/concepts/storag...

dirai09

I have tried this and the other similar option mentioned in this blog: blog.meain.io/2020/mounting-s3-buc.... In neither case was mounting to the hostPath successful on a cluster managed by AWS EKS.

Ant(on) Weiss

Hi @dirai09 , this was originally tested on AWS EKS. I haven't tested it since but it should in theory still work. What is the error you're getting when trying to mount the hostPath?
Also - can you share your config in a gist?

Randy Gupta • Edited

Nice approach. However, you might want to have a look at JuiceFS:

github.com/juicedata/juicefs

That has quite a good performance due to the combination with Redis and it is made with Kubernetes in mind.

Sarav AK

Thanks for the wonderful suggestion @randy

behrooz hasanbeygi

With a high number of files this will fail you, due to the nature of the S3 API - for small files the HTTP response will be bigger than the files themselves.

I think mounting S3 is a bad idea. If you have enough development resources, it's better to write a client that connects directly to S3 and caches the list of S3 files for better performance.
But it's a fun thing to do. Also, CephFS with the RADOS gateway will give you better performance in Kubernetes.

Ant(on) Weiss

good to know. not an issue in our case - we have a small number of large files there. And I agree it's not such a great idea in general - both performance-wise and because of the hidden complexity. But it solved our specific itch and may help others solve theirs.

dirai09

Hi,
I don't think I am able to mount the volumes on the hostPath. Am I missing something here?

Jane

I ran into an issue where goofys doesn't reload the content of a small txt file. It updates the timestamp, though. Do you know what could be wrong?
I have goofys running inside a container.