Discussion on: Kubernetes, are you ready for production?

View post

I just started with k8s and used minikube for benchmarking redis (single instance, redis-benchmark with default settings). Here's a very, very short version of the result:

docker:                   100,000 requests/second
k8s/vm-driver=none:                 --- ERROR --- 
k8s/vm-driver=virtualbox:   7,500 requests/second

I wonder why VirtualBox is the default setting of minikube. I mean - the performance gap is huge compared to (native) docker performance! The performance with vm-driver=none would be even more interesting, because it would create docker containers as well, but under the control of minikube. Unfortunately I couldn't run this benchmark, because "sudo minikube start" resulted in a timeout of the dns service (I opened an issue).

Thorsten Hirsch • Sep 16 '19

Now I could complete my benchmark:

docker:                   100,000 requests/second
k8s/vm-driver=none:        28,500 requests/second
k8s/vm-driver=virtualbox:   7,500 requests/second

Oh... that's less than I thought. Why is vm-driver=none so much slower than (native) docker? There are no additional instances of redis in k8s (I removed the 2 pods with the slave instances), so there's no redis replication happening that could harm the performance.

Is k8's load balancing (and all the other stuff) eating up 70% of the performance on my desktop computer? Is the performance decrease less on big server hardware?

Adi Polak • Sep 19 '19

This is an interesting point. I know for a fact that when running things like Kafka and Spark you get better performance when using more than one server, balancing a big load of work. When used correctly the more data they process, the less time it takes them to process the next event/piece of data. When traditional solutions will probably break or fail.

Hence performing at best for big data/events streams at scale workloads.

BTW, Spark for a fact doesn't perform well on small loads, it takes more time than traditional operations on data.

I think that for K8s there is also an overhead when using it for one task/machine at a small load scale. Managing the pods, the networking between master and works/pods through the kubelet ( even when running locally).

I know Kafka and Spark are way different from K8s, but they rely on similar principals and challenges of distributed systems. Some of the challenges come from network bandwidth, lack of shared memory, distributed lock and much more. Which makes me wonder if ppl will be interested in reading a blog post about it.. 🤔

BTW, I like how your research shows that for one task, docker is good on its own. and for more. the vm= none, is none means bare metal? or actually native docker? from the documentation, it says that this is the vm supplied by the user.

BTW, documentation also say that When run in none mode, minikube has no built-in resource limit mechanism, which means you could deploy pods which would consume all of the host's resources. which can explain the better performance. What are the virtualbox limitations? can it be a low RAM? and than Radis need to write and read from disk and not from memory?