At my previous company I dealt with Kubernetes in every way you could imagine, and for each way you can interact with Kubernetes, there are many projects clamoring to be the de-facto standard. Sometimes a couple of projects beat out the rest to become the top choices while still not being the obvious default, as with deployment tooling (Helm vs. Kustomize). As a developer looking to use Kubernetes in a serious way, this is good news: it means fewer technologies to evaluate, so your journey of gathering data and building a compelling case to present to your colleagues can begin sooner.
Fortunately and unfortunately, Ingress for Kubernetes clusters is far from having an obvious choice. This means there is likely an Ingress controller well suited to your organization’s needs, but it also means far more due diligence for you, the developer, to evaluate the options and make the best choice. Being completely thorough may not be an option if you’re working at a small company and facing deadlines, yet changing your Ingress controller later is non-trivial, so a careful evaluation is critical. And it’s not only the technology you’re evaluating, but the ecosystem and community behind it. Here I offer my experience going through this evaluation process.
In this blog, we take a look at some of the challenges of routing traffic to complex Kubernetes applications and the iterations of our implementation as we worked to make this function at scale. We explore the different architectures and technologies we used, and how they performed. The context for this journey is a project I worked on which required serving HTTPS and TCP traffic, using SNI, to a multi-tenant Kubernetes cluster; in a Kubernetes cluster, that means we need some kind of Ingress controller or Gateway to enable this traffic flow.
In a previous iteration of the project’s architecture, we tried using the NGINX Ingress controller to handle traffic for the whole cluster. Adding a new tenant deployed applications and services, which in turn created routes. Adding these new routing rules to the Ingress triggered NGINX configuration reloads, which dropped requests and broke user connections. The result was an availability problem: adding new tenants, and consequently new routing rules, could affect connectivity for all users.
We re-evaluated that architecture and went with a two-tier design that reduced the dynamic configuration on the edge Ingress. At the edge, all we really needed was basic SNI sniffing and routing to a namespace router that could then do more fine-grained routing closer to the application services. We ended up going with k8sniff as the edge Ingress controller to do the SNI sniffing and forward the request to a router in the tenant’s namespace. That namespace contained the NGINX router for that particular tenant’s applications and services. NGINX in this case had fairly static routing rules, and any impact from a configuration change was limited to that one tenant.
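To make the second tier concrete, here is a rough sketch of what a per-tenant route could look like: an Ingress resource living in the tenant’s own namespace and served by that tenant’s own NGINX instance. The tenant name, hostnames, service names, and the tenant-scoped ingress-class annotation are all illustrative assumptions, not the project’s actual configuration.

```yaml
# Hypothetical per-tenant route: lives in tenant-a's namespace and is
# handled by tenant-a's own NGINX, so config reloads only affect tenant-a.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-routes
  namespace: tenant-a
  annotations:
    kubernetes.io/ingress.class: "nginx-tenant-a"  # assumed tenant-scoped controller class
spec:
  tls:
    - hosts:
        - app.tenant-a.example.com   # the SNI name the edge tier matches on
      secretName: tenant-a-tls
  rules:
    - host: app.tenant-a.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app            # the tenant's application Service
                port:
                  number: 8080
```

Because each tenant gets its own copy of this configuration, a reload caused by one tenant’s route changes cannot drop connections for any other tenant.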
This architecture wasn’t perfect either, however. We needed to fork k8sniff to make changes such as adding per-backend connection metrics for observability, fixing data corruption issues, and fixing large CPU and memory spikes caused by informer cache invalidation. We also hit other scaling issues and realized we would have to maintain all of this ourselves, which is not something we were keen to do. So we started exploring other alternatives.
I started with this article to become familiar with all the Kubernetes Ingress controller options and then narrowed my choices based on the project’s requirements. After mulling it over, I decided to further limit my choices to Ingress controllers that were based on the Envoy proxy since, in my opinion, it appeared to be the de-facto proxy moving forward.
Envoy is an L7 proxy built to be dynamic (dynamic configuration reloads, no hot restarts, API-driven, etc.) that nicely addresses some of the problems cloud-native applications suffer from (lack of observability, missing resilience measures, etc.). We also wanted to be able to proxy HTTPS and TCP through the same port. We found two promising projects which act as control planes for Envoy: Ambassador from Datawire and Gloo from Solo.io.
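The “dynamic” part is what makes a separate control plane possible at all: Envoy can be bootstrapped with almost no static configuration, fetching listeners and clusters from a management server over the xDS API instead of reloading files. The sketch below shows a minimal bootstrap of this shape; the node ID, control-plane address, and port are illustrative assumptions, not the project’s actual setup.

```yaml
# Hypothetical Envoy bootstrap: all routing config is delivered dynamically
# over xDS from a control plane, so route changes apply without restarts
# or dropped connections.
node:
  id: edge-proxy-1            # illustrative node identity
  cluster: edge
dynamic_resources:
  ads_config:                 # aggregate all xDS streams over one gRPC connection
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster
  lds_config: { ads: {}, resource_api_version: V3 }   # listeners come from ADS
  cds_config: { ads: {}, resource_api_version: V3 }   # clusters come from ADS
static_resources:
  clusters:
    - name: xds_cluster       # the only static piece: how to reach the control plane
      type: STRICT_DNS
      connect_timeout: 1s
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}   # xDS requires HTTP/2
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: control-plane.example.svc.cluster.local  # assumed address
                      port_value: 9977
```

Both Ambassador and Gloo play the role of that management server; where they differ, as we found out, is in how the control plane is packaged and scaled.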
We started down the path of Ambassador but experienced some limitations. In the next few sections, we explore what happened.
We started with Ambassador as its documentation was approachable at the time, and its configuration CRDs seemed easy to manage. We also found the out-of-the-box dev environment provided by Ambassador quite attractive should we be required to make any changes. As it turns out, we did: we had to add functionality to Ambassador to satisfy our project’s requirements.
We began evaluating Ambassador as a replacement for our homemade k8sniff+NGINX solution (we used the Ambassador API Gateway 1.0.0 directly, not their new Edge Stack -- we couldn’t get Edge Stack to work). We were able to start replacing our home-grown solution with Ambassador to handle routes for all tenants. We were particularly active on the Ambassador Slack channel, gathering as much knowledge and insight as we could in order to optimally achieve our use case. Again, migrating Ingress solutions is non-trivial!
As part of the evaluation process, we wanted to understand how to scale our edge gateway and what dimensions affected the scale.
Unfortunately, we couldn’t get very far with Ambassador. Before even reaching a realistic test (we had 10 tenants with their associated applications and services deployed), we noticed Ambassador was struggling to keep up with Mapping changes (some changes took almost two minutes to become effective) and was consuming a lot of memory (around 2.3 GB). In fact, even without configuration changes, the Ambassador pod was in a continuous state of crashing.
A quick breakdown of the project’s dimensions: each namespace had about 13 Deployments and 60 Kubernetes Services, as well as a handful each of Secrets, ConfigMaps, and Jobs. For Ambassador, this translated to roughly 20 Ambassador custom resources per namespace (a mix of Mappings, TCPMappings, and TLSContexts; not every Service needed a Mapping).
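For readers unfamiliar with Ambassador’s configuration model, the custom resources counted above look roughly like the following sketch. The names, hosts, ports, and services are illustrative assumptions, not taken from the project.

```yaml
# Hypothetical Ambassador 1.x resources for one tenant service.
apiVersion: getambassador.io/v2
kind: Mapping                     # HTTP route: host + prefix -> Service
metadata:
  name: app-http
  namespace: tenant-a
spec:
  host: app.tenant-a.example.com
  prefix: /
  service: app.tenant-a:8080
---
apiVersion: getambassador.io/v2
kind: TLSContext                  # TLS termination for the tenant's hostname
metadata:
  name: tenant-a-tls
  namespace: tenant-a
spec:
  hosts:
    - app.tenant-a.example.com
  secret: tenant-a-cert           # TLS secret in the same namespace
---
apiVersion: getambassador.io/v2
kind: TCPMapping                  # raw TCP forwarding rather than HTTP routing
metadata:
  name: app-tcp
  namespace: tenant-a
spec:
  port: 9443
  service: tcp-app.tenant-a:9443
```

Multiply a set like this by every tenant namespace and you get the configuration volume Ambassador had to keep re-processing.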
At this point, we figured we had hit some configuration limitation we could tune later, and simply proceeded to scale up the number of replicas. After doing that, we saw the same behavior from all of the replicas (each using 2.3 GB). Each one seemed to be doing the same thing (serving the control plane), struggling to keep up, and falling over. This is when we realized that Ambassador’s architecture was limiting its own scalability: the control plane duplicated its work in every replica.
We brought this observation to the Datawire/Ambassador Slack and didn’t get much response. To be fair, this was right when they had launched 1.0.0, so things were understandably overwhelming for a company of 20-30 employees. After digging into the code, we saw that the Ambassador implementation relies on watching Kubernetes and writing snippets of configuration to the file system, while other processes within the same container pick up those snippets, write Envoy configuration to the file system, and eventually serve that config to Envoy. All of these processes “kick” each other with POSIX SIGHUP signals on file changes.
We could not move forward with Ambassador as our Ingress controller, but we still wanted one based on Envoy. So, despite initially feeling intimidated by the docs (compared to Ambassador’s), we looked at the Gloo API Gateway from Solo.io.
We joined the Solo.io Slack and started to dig into Gloo. Gloo is also an API Gateway built on Envoy, but the first thing we noticed was that its control plane was separate from the Envoy proxy. The second thing we noticed, which only became more apparent as we familiarized ourselves with the Envoy documentation, was that Gloo’s CRDs actually weren’t that scary. They were the closest “zero-cost” mapping of fields to upstream Envoy structs we had found.
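As an illustration of that close mapping, a Gloo VirtualService looks roughly like the sketch below: the virtual host, domains, matchers, and route action line up nearly one-to-one with Envoy’s own route-configuration structs. The names, domains, and upstream are illustrative assumptions.

```yaml
# Hypothetical Gloo VirtualService for one tenant; the spec mirrors
# Envoy's VirtualHost/Route structs fairly directly.
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: tenant-a
  namespace: gloo-system
spec:
  sslConfig:
    sniDomains:
      - app.tenant-a.example.com    # SNI match, as our edge tier required
    secretRef:
      name: tenant-a-tls
      namespace: gloo-system
  virtualHost:
    domains:
      - app.tenant-a.example.com    # -> Envoy VirtualHost.domains
    routes:
      - matchers:
          - prefix: /               # -> Envoy RouteMatch.prefix
        routeAction:
          single:
            upstream:
              name: tenant-a-app-8080   # Gloo Upstream, i.e. an Envoy Cluster
              namespace: gloo-system
```

Once you have read Envoy’s documentation, this one-to-one correspondence means there is very little new mental model to learn on the Gloo side.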
At first, we didn’t do much tuning and found some issues. We worked closely with the Solo.io folks in their Slack and were able to quickly resolve some of them. And by “resolve some of the issues” I mean one of their developers and I would chat, they would make some changes and push a new container image to their personal Docker Hub, and I would reference it in my Gloo Deployment YAML to try in the soak tests we were running. This happened several times as we pushed further and further with Gloo, hitting issues that got their engineers fired up to fix quickly while we kept pushing.
As we started to explore its API, we also found it easy to work with, with many options for tuning. Not all of the options were immediately easy to find, and we do wish Solo.io’s docs offered better guidance around tuning, but with help from their Slack we came to understand the knobs. We found this very powerful, because you can trim Gloo down to be ultra-efficient.
Once we put Gloo into the test harness we had set up for testing Ambassador, we noticed that it scaled a lot better. There were some areas that needed tweaking and tuning, but we were happy to have the options to do so. We found that by turning off features like automatic function discovery and the additional Kubernetes Service mapping Gloo does out of the box, resource usage ended up significantly lower per tenant than what we saw with Ambassador (~200 MB per tenant with Ambassador vs. ~15 MB with Gloo -- even NGINX was around 110 MB at idle per tenant), and it scaled quite well.
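As a sketch of the kind of trimming described above, Gloo’s function discovery can be switched off through its Settings resource. The field layout here is an assumption based on Gloo’s documented Settings CRD; fully disabling the discovery component itself is typically done via the Helm chart rather than this resource.

```yaml
# Hypothetical Settings tweak: stop Gloo's function discovery (FDS) from
# introspecting every Service, one source of per-tenant overhead.
apiVersion: gloo.solo.io/v1
kind: Settings
metadata:
  name: default
  namespace: gloo-system
spec:
  discovery:
    fdsMode: DISABLED   # don't probe services for functions (e.g. Swagger/gRPC)
```

Turning off discovery features you don’t need is exactly the kind of knob that let us drive per-tenant overhead down to the ~15 MB figure above.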
In fact, we wanted to push our infrastructure as far as we could to see where Gloo would fall over. We created a cluster sized the same as our staging cluster and created over 200 tenants.
Gloo held up.
We ran into other infrastructure limits, but we were quite happy with Gloo’s sub-linear scaling: the more load we added, the more efficiently Gloo scaled, and the cheaper each tenant became.
Envoy Proxy is a powerful component for doing edge routing, but just as important is selecting the right control plane. The quality of the Gloo API, and the responsiveness of Solo.io’s community and engineers, were a big part of why we were able to begin replacing our infrastructure with a more cost-effective solution.