CockroachDB and persistent volumes
When deployed on Kubernetes or OpenShift, CockroachDB uses persistent volumes (PVs) to store database data, metadata, state, user data, log files, and configuration files. These volumes are typically file-system mounts mapped to the disks/SSDs where the data is physically stored in a distributed fashion. When you operate CockroachDB and run queries, data must be read and written, and those operations translate to frequent or continuous disk reads and writes.
Managing the disk: IOPS & throughput
On cloud-managed orchestrators, reading or writing data on a PV consumes IOPS and uses part of the available I/O throughput. Both are limiting factors that can result in bandwidth saturation or, worse, throttling by the cloud provider under heavier workloads. This condition shows up as a combination of low CPU usage and high disk latencies, which you can see in the Hardware dashboard metrics and charts of the CockroachDB DB Console.
Divide & conquer
To overcome these limitations, CockroachDB lets you take advantage of multiple, independent PVs to separate where the cockroach runtime data goes. Logging is a good candidate to move out of the critical path by giving it its own volume/storage. This also helps with performance tuning, since your SQL data and schemas live on their own dedicated volume. In fact, the production readiness recommendation is to split the data and the logs into separate PVs.
Typical CockroachDB deployments
Most CockroachDB clusters use a single PVC assigned to each node in a StatefulSet. The default configuration in both Helm- and Operator-managed environments creates this 1:1 mapping as follows:
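As a reference, here is an abridged sketch of that default single-volume layout. It is based on the datadir claim and mount that also appear in the full manifest later in this post; the names, sizes, and storage classes come from my Helm values and will differ in yours:

kind: StatefulSet
apiVersion: apps/v1
spec:
  volumeClaimTemplates:
    # One PVC per pod: data, logs, and everything else share this volume
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: datadir
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  template:
    spec:
      containers:
        - name: db
          volumeMounts:
            - name: datadir
              mountPath: /cockroach/cockroach-data/
      volumes:
        - name: datadir
          persistentVolumeClaim:
            claimName: datadir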
Our planned deployment with multiple PVs
By introducing a second PV dedicated to logs, we split the workload, effectively doubling the I/O channels, and each volume can be configured independently. Storage for logs can be significantly smaller than the cockroach-data PV, since logs are rotated/truncated while your business data grows over time. The illustration highlights the logical infrastructure layout between nodes and PVs.
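On the sizing point: the file sinks can be capped so log growth stays bounded. A minimal sketch, assuming the max-file-size and max-group-size file-sink parameters apply here; the values are illustrative, so check the log configuration documentation in the references before relying on them:

file-defaults:
  dir: /cockroach/cockroach-logs
  # Cap each log file, and the total size retained per file group,
  # so a small logsdir PV is comfortably enough (illustrative values)
  max-file-size: 10MiB
  max-group-size: 100MiB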
…to the implementation
We need to make additions to the StatefulSet template, along with custom log-configuration settings, to direct CockroachDB logs to the new destination PV.
The logging “secret” configuration
This resource is the one-stop shop for all of your customized logging properties: the log sinks (which write logs to different destinations, including over the network), the logging channels mapped to each sink, the format of the log messages, any redaction flags, and the buffering and maximum-size settings.
The following log configuration is the smallest, simplest configuration we will use as a starting point. Here we keep most defaults and only adjust the file-defaults destination path for the actual files; that path will be mounted on a separate PV defined in the StatefulSet template.
file-defaults:
  dir: /cockroach/cockroach-logs
sinks:
  file-groups:
    default:
      channels:
        - ALL
For a comprehensive explanation of this fragment, along with working examples and code fragments, please refer to the CockroachDB log configuration documentation so you can tailor the logging to your needs.
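The StatefulSet below mounts this configuration from a Secret named zlamal-cockroachdb-log-config, with the content stored under the key log-config.yaml (both names are taken from the manifest that follows). One way to package it, sketched as a plain Secret manifest:

apiVersion: v1
kind: Secret
metadata:
  name: zlamal-cockroachdb-log-config
type: Opaque
stringData:
  # The key becomes the filename under the /cockroach/log-config mount,
  # matching the --log-config-file flag passed to cockroach start below
  log-config.yaml: |
    file-defaults:
      dir: /cockroach/cockroach-logs
    sinks:
      file-groups:
        default:
          channels:
            - ALL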
The StatefulSet template configuration
This StatefulSet fragment highlights only the template properties added to define the new PVC and the mount points for both the log-config secret and the new logs folder. A full, complete StatefulSet example follows this fragment to show the entirety of an actual solution I deployed.
kind: StatefulSet
apiVersion: apps/v1
spec:
  volumeClaimTemplates:
    # ...
    # ...
    # Fragment 1
    # New volumeClaimTemplate to generate Log PVC & PV
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: logsdir
        labels:
          app.kubernetes.io/instance: zlamal
          app.kubernetes.io/name: cockroachdb
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        volumeMode: Filesystem
  template:
    spec:
      containers:
        - # ...
          # ...
          volumeMounts:
            # ...
            # ...
            # Fragment 2
            # Additional mount-points for path to logs and log-config
            - name: logsdir
              mountPath: /cockroach/cockroach-logs/
            - name: log-config
              readOnly: true
              mountPath: /cockroach/log-config
          # Fragment 3
          # Addition of a new “cockroach start” parameter --log-config-file=...
          # This parameter points CRDB to the mounted log-config secret
          args:
            - shell
            - '-ecx'
            - |-
              exec /cockroach/cockroach start --log-config-file=/cockroach/log-config/log-config.yaml --join=... --advertise-host=... --certs-dir=/cockroach/cockroach-certs/ --http-port=8081 --port=26257 --cache=11% --max-sql-memory=10%
      volumes:
        - name: datadir
          persistentVolumeClaim:
            claimName: datadir
        # Fragment 4
        # Establish the logical YAML reference to the logging directory
        - name: logsdir
          persistentVolumeClaim:
            claimName: logsdir
        # Fragment 5
        # Establish logical YAML reference to the log-config secret resource
        - name: log-config
          secret:
            secretName: zlamal-cockroachdb-log-config
            defaultMode: 420
  # ...
  # ...
Here is the complete StatefulSet with these changes, including tags/labels specific to my cluster, as a reference example that you can copy and edit to make your own (e.g. sizes, storage classes, IOPS, tags/labels, etc.):
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: zlamal-cockroachdb
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: zlamal
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: cockroachdb
    helm.sh/chart: cockroachdb-14.0.4
spec:
  serviceName: zlamal-cockroachdb
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: datadir
        labels:
          app.kubernetes.io/instance: zlamal
          app.kubernetes.io/name: cockroachdb
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        volumeMode: Filesystem
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: logsdir
        labels:
          app.kubernetes.io/instance: zlamal
          app.kubernetes.io/name: cockroachdb
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        volumeMode: Filesystem
  template:
    metadata:
      labels:
        app.kubernetes.io/component: cockroachdb
        app.kubernetes.io/instance: zlamal
        app.kubernetes.io/name: cockroachdb
    spec:
      restartPolicy: Always
      initContainers:
        - resources: {}
          terminationMessagePath: /dev/termination-log
          name: copy-certs
          command:
            - /bin/sh
            - '-c'
            - cp -f /certs/* /cockroach-certs/; chmod 0400 /cockroach-certs/*.key
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: certs
              mountPath: /cockroach-certs/
            - name: certs-secret
              mountPath: /certs/
          terminationMessagePolicy: File
          image: busybox
      serviceAccountName: zlamal-cockroachdb
      schedulerName: default-scheduler
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/component: cockroachdb
                    app.kubernetes.io/instance: zlamal
                    app.kubernetes.io/name: cockroachdb
                topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 300
      securityContext: {}
      containers:
        - resources: {}
          readinessProbe:
            httpGet:
              path: /health?ready=1
              port: http
              scheme: HTTPS
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 2
          terminationMessagePath: /dev/termination-log
          name: db
          livenessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTPS
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: STATEFULSET_NAME
              value: zlamal-cockroachdb
            - name: STATEFULSET_FQDN
              value: zlamal-cockroachdb.mz-helm-v11.svc.cluster.local
            - name: COCKROACH_CHANNEL
              value: kubernetes-helm
          ports:
            - name: grpc
              containerPort: 26257
              protocol: TCP
            - name: http
              containerPort: 8081
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: datadir
              mountPath: /cockroach/cockroach-data/
            - name: logsdir
              mountPath: /cockroach/cockroach-logs/
            - name: log-config
              readOnly: true
              mountPath: /cockroach/log-config
            - name: certs
              mountPath: /cockroach/cockroach-certs/
          terminationMessagePolicy: File
          image: 'cockroachdb/cockroach:v23.2.1'
          args:
            - shell
            - '-ecx'
            - |-
              exec /cockroach/cockroach start --log-config-file=/cockroach/log-config/log-config.yaml --join=${STATEFULSET_NAME}-0.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-1.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-2.${STATEFULSET_FQDN}:26257 --advertise-host=$(hostname).${STATEFULSET_FQDN} --certs-dir=/cockroach/cockroach-certs/ --http-port=8081 --port=26257 --cache=11% --max-sql-memory=10%
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app.kubernetes.io/component: cockroachdb
              app.kubernetes.io/instance: zlamal
              app.kubernetes.io/name: cockroachdb
      serviceAccount: zlamal-cockroachdb
      volumes:
        - name: datadir
          persistentVolumeClaim:
            claimName: datadir
        - name: logsdir
          persistentVolumeClaim:
            claimName: logsdir
        - name: log-config
          secret:
            secretName: zlamal-cockroachdb-log-config
            defaultMode: 420
        - name: certs
          emptyDir: {}
        - name: certs-secret
          projected:
            sources:
              - secret:
                  name: zlamal-cockroachdb-node-secret
                  items:
                    - key: ca.crt
                      path: ca.crt
                      mode: 256
                    - key: tls.crt
                      path: node.crt
                      mode: 256
                    - key: tls.key
                      path: node.key
                      mode: 256
            defaultMode: 420
      dnsPolicy: ClusterFirst
  podManagementPolicy: Parallel
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app.kubernetes.io/component: cockroachdb
      app.kubernetes.io/instance: zlamal
      app.kubernetes.io/name: cockroachdb
Conclusion & References
This is a versatile addition to the standard StatefulSet: IOPS can now be managed per PV, and the plumbing is in place for further log customization. DB admins can easily change the logging channels in a running environment by editing a single log-config file that is saved as a Secret object.
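For example, a hypothetical edit to that secret could route the operator-facing channels to their own file group while everything else keeps flowing to the default files (the exact channel-selection syntax is covered in the log configuration documentation linked below):

file-defaults:
  dir: /cockroach/cockroach-logs
sinks:
  file-groups:
    default:
      # everything except the operational channels stays in the default files
      channels: all except ops,health
    ops:
      # OPS and HEALTH get their own file group (and their own log files)
      channels: [OPS, HEALTH]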
Cockroach Logging Overview
Cockroach log configuration
Cockroach start: logging
Production recommendations
Top comments (2)
Mark, I'm guessing you could take a similar approach to having multiple data store devices?
Yes indeed! Adding additional data-stores is an ideal solution to address several use-cases:
CRDB on high-vCPU worker-nodes: Our production readiness guidelines recommend against worker nodes with more than 32 vCPUs. If you are bound to servers of that size, an additional store lets CockroachDB put the extra compute to work, because per-store activity such as GC, compactions, replica management, WAL writes, and monitoring runs in parallel across the stores. In the end you will leverage the additional CPU and spend less time waiting on I/O operations. (A sketch of the extra store wiring follows at the end of this reply.)
You can create custom stores dedicated to specialized activities such as encryption at rest for a subset of your data. Encrypting and decrypting data usually carries a performance cost, and you may not want to pay it for all of your data, perhaps just the few tables that hold PII. This is nicely written up with tangible examples in this blog: cockroachlabs.com/blog/selective-e...
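For the first use case, here is a rough sketch of what a second data store could look like, following the same pattern used for logsdir above. The datadir2 name, its mount path, and the size are hypothetical placeholders, and the explicit --store flags would replace the implicit single store:

spec:
  volumeClaimTemplates:
    # ... existing datadir and logsdir claims ...
    # Hypothetical second data store PVC
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: datadir2
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        volumeMode: Filesystem
  template:
    spec:
      containers:
        - name: db
          volumeMounts:
            # ... existing mounts ...
            - name: datadir2
              mountPath: /cockroach/cockroach-data2/
          args:
            - shell
            - '-ecx'
            - |-
              exec /cockroach/cockroach start --store=path=/cockroach/cockroach-data --store=path=/cockroach/cockroach-data2 --log-config-file=/cockroach/log-config/log-config.yaml --join=... --advertise-host=... --certs-dir=/cockroach/cockroach-certs/ --http-port=8081 --port=26257
      volumes:
        # ... existing volumes ...
        - name: datadir2
          persistentVolumeClaim:
            claimName: datadir2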