Ceph S3 Object Storage from Fluentd (EFK stack)
Fluentd's documentation can be hard to follow when it comes to using Ceph's S3-compatible object storage as a log destination. This post shows how to store logs in Ceph's S3 object storage using Fluentd.
Ceph Storage with Rook
Follow the steps provided in Rook’s Github documentation for setting up Rook with Ceph storage.
https://github.com/rook/rook/blob/master/Documentation/ceph-quickstart.md
Setting Up EFK stack on Kubernetes cluster
The easiest way is to clone the official Kubernetes git repo:
git clone https://github.com/kubernetes/kubernetes.git
Navigate to kubernetes/cluster/addons/fluentd-elasticsearch/ to find the deployment YAMLs for:
- Elasticsearch (StatefulSet)
- Fluentd (DaemonSet)
- Kibana
cd kubernetes/cluster/addons/fluentd-elasticsearch/
kubectl create -f es-service.yaml
kubectl create -f es-statefulset.yaml
kubectl create -f fluentd-es-configmap.yaml
kubectl create -f fluentd-es-ds.yaml
kubectl create -f kibana-deployment.yaml
kubectl create -f kibana-service.yaml
**Note:** For development/testing purposes you can edit kibana-service.yaml and set the Service `type` to **NodePort** to expose the Kibana dashboard outside the cluster.
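As a sketch, the relevant part of kibana-service.yaml would then look like this (only `type` changes; the other fields reflect the upstream manifest):

```yaml
# kibana-service.yaml (excerpt)
apiVersion: v1
kind: Service
metadata:
  name: kibana-logging
  namespace: kube-system
spec:
  type: NodePort   # default is ClusterIP; NodePort exposes it on every node
  ports:
    - port: 5601
      protocol: TCP
      targetPort: ui
  selector:
    k8s-app: kibana-logging
```

After applying, `kubectl -n kube-system get svc kibana-logging` shows the assigned node port.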
The out_s3 output plugin writes records to the Amazon S3 cloud object storage service; by default, it creates files on an hourly basis.
Besides AWS itself, Fluentd's out_s3 also supports S3-compatible object storage implementations. Ceph provides S3-compatible object storage functionality with an interface that covers a large subset of the Amazon S3 RESTful API.
Installation
out_s3 is included in td-agent by default.
**Note:** Fluentd gem users need to install the fluent-plugin-s3 gem (`fluent-gem install fluent-plugin-s3`). For details, refer to the Plugin Management article.
Example Configuration
This config pushes the logs of all services running in the cluster to Ceph's S3 object storage in JSON format:
<match **>
  @type s3
  aws_key_id CEPH_S3_KEY_ID
  aws_sec_key CEPH_S3_SECRET_KEY
  s3_bucket CEPH_S3_BUCKET_NAME
  s3_endpoint CEPH_S3_URL_WITH_STORE_NAME
  # some Ceph setups also need path-style bucket addressing:
  # force_path_style true
  path logs
  # objects are gzipped by default; store_as json keeps them as plain JSON
  store_as json
  <buffer tag,time>
    @type file
    path /var/log/fluent/s3
    timekey 3600 # 1-hour partition
    timekey_wait 10m
    timekey_use_utc true
    chunk_limit_size 256m
  </buffer>
</match>
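The bucket credentials used above come from a Rook object store user. Assuming the default my-store object store from the Rook docs, a user can be created with a manifest like this (a sketch; the names follow the Rook examples):

```yaml
# object-user.yaml -- creates an S3 user on the my-store object store;
# Rook then generates a Secret named rook-ceph-object-user-my-store-my-user
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: my-user
  namespace: rook-ceph
spec:
  store: my-store
  displayName: fluentd-logs-user
```

Apply it with `kubectl create -f object-user.yaml`; the generated Secret holds the AccessKey and SecretKey used below.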
You can connect to Ceph's S3 using the s3cmd tool.
Install s3cmd:
sudo apt-get update
sudo apt-get install s3cmd
To consume the S3 storage, export the connection details:
export AWS_HOST=<host>
export AWS_ENDPOINT=<endpoint>
export AWS_ACCESS_KEY_ID=<accessKey>
export AWS_SECRET_ACCESS_KEY=<secretKey>
Host: The DNS host name where the rgw service is found in the cluster. Assuming you are using the default rook-ceph cluster, it will be rook-ceph-rgw-my-store.rook-ceph.
Endpoint: The endpoint where the rgw service is listening. Run kubectl -n rook-ceph get svc rook-ceph-rgw-my-store, then combine the clusterIP and the port.
Access key: kubectl -n rook-ceph get secret rook-ceph-object-user-my-store-my-user -o yaml | grep AccessKey | awk '{print $2}' | base64 --decode
Secret key: kubectl -n rook-ceph get secret rook-ceph-object-user-my-store-my-user -o yaml | grep SecretKey | awk '{print $2}' | base64 --decode
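The base64 --decode step matters because Kubernetes Secrets store their data base64-encoded. A minimal local illustration of that step (the encoded string is a sample, not a real key):

```shell
# Secrets hold base64-encoded values; decode them before exporting.
ENCODED_KEY='QUtJQUVYQU1QTEU='   # sample value, not a real access key
echo "$ENCODED_KEY" | base64 --decode
# prints: AKIAEXAMPLE
```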
**s3cmd:** listing files in the S3 bucket
s3cmd ls s3://S3_BUCKET_NAME --no-ssl --host=$AWS_HOST
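Instead of passing --host on every call, s3cmd can also read the connection details from a ~/.s3cfg file. A minimal sketch, assuming the host and keys gathered above (host_bucket is shown path-style here; adjust it if your RGW serves virtual-host-style buckets):

```ini
[default]
access_key = <accessKey>
secret_key = <secretKey>
host_base = <host>
host_bucket = <host>/%(bucket)
use_https = False
```

With this in place, `s3cmd ls s3://S3_BUCKET_NAME` works without extra flags.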
Summary
We have deployed the EFK stack on Kubernetes alongside a Rook-managed Ceph storage cluster, created a Ceph object store, configured Fluentd's out_s3 plugin to push logs to it, and accessed the storage with the s3cmd tool.