Starship - a dynamic layer 4 load balancer

Bob Gordon ・ 3 min read

This is my very first post, please be kind!

Running thousands of pods on Kubernetes across multiple clusters or regions is relatively simple nowadays with Google Cloud.

But what about multi-tenant clusters, where users' pods are distributed randomly across a pool of clusters?

To put it simply: how do you route a.com, b.com, and so on to the correct cluster, dynamically, based on their host name?

This is a problem we ran into at The Remote Company recently, and we had to find a solution. We also needed to work around the 110-pods-per-node limit in Kubernetes, which stopped us from using one very large cluster.

We knew we would hit these limits very quickly, and we wanted to overcome them from the very beginning. We launched Ycode Beta, our no-code product, to thousands of users last month. Starship has been handling all the traffic from day 1, and it continues to scale inside our infrastructure along with our clients' web apps.

So first of all, we started with a good old Google search to see what was already out there; there's no point in reinventing the wheel. Unfortunately nothing suited our needs: we wanted a flexible solution, something we could fit into our current infrastructure without too much effort.

In the end we concluded that we had to build something bespoke. I had already built a dynamic Nginx SNI load balancer in the past and was confident that the same stack would work here; Nginx + Lua is a nice combination.

We started researching the various modules that OpenResty provides in its core. Because we needed to route HTTP/S traffic we had to use ngx_stream_lua_module, and we didn't want to terminate TLS here; that would be done downstream on the Nginx Ingress.

From there we began thinking about the API and how to interact with Lua. We needed a cache layer, and having used https://github.com/thibaultcha/lua-resty-mlcache in the past it was again a no-brainer: it provides a really nice approach for caching data at various levels inside the Nginx processes.
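For reference, wiring mlcache up in init.lua looks roughly like this. This is a minimal sketch; the cache name, shared dict name, sizes, and TTLs are illustrative assumptions, not our production values:

```lua
-- lualib/init.lua (sketch)
local mlcache = require "resty.mlcache"

-- L1 = per-worker Lua LRU cache, L2 = the "cache_shared" shared dict
-- (declared with `lua_shared_dict cache_shared 10m;` in the nginx config),
-- L3 = the callback that queries the internal API on a miss
local cache, err = mlcache.new("hosts", "cache_shared", {
    lru_size = 1000, -- number of entries in the per-worker LRU
    ttl      = 60,   -- cache hits for 60 seconds
    neg_ttl  = 10,   -- cache misses (nil) for 10 seconds
})
if not cache then
    error("failed to create mlcache: " .. err)
end
```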

The API is simple: on each request, look up the host (via ngx.var.host or ngx.var.ssl_preread_server_name, depending on the request subsystem) and call a cache callback that queries an internal API for the upstream IP to proxy to.

stream {
    ...

    init_by_lua_file "lualib/init.lua";

    server {
        listen 443;
        ...

        set $endpoint "default";

        preread_by_lua_block {
            stream.get_cluster()
        }

        proxy_pass $endpoint;
        ssl_preread on;
    }
}


For http we actually had to use another server block. I didn't like this, but as the current stream Lua module doesn't have access_by_lua_* (https://github.com/openresty/stream-lua-nginx-module#todo), we can't read the host of a plain TCP request (not easily, anyway).
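A sketch of what that second server block can look like on the http side, where access_by_lua_block is available and ngx.var.host can be read directly (names here are illustrative):

```nginx
http {
    ...

    server {
        listen 80;

        set $endpoint "default";

        access_by_lua_block {
            # same lookup as the stream block, but via ngx.var.host
            stream.get_cluster()
        }

        location / {
            proxy_pass http://$endpoint;
        }
    }
}
```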

function _M.get_cluster()
    ...

    local res, err = stream_cache:get(host, nil, api.cache_callback, host)

    ...

    ngx.var.endpoint = res
end

function _M.cache_callback(host)

    local httpapi = http.new()

    -- uri points at the internal lookup API
    local res, err = httpapi:request_uri(uri, {
        method = "GET"
    })

    ...

    return res
end

The internal API would respond with a simple JSON response.

{
    "app": "A Simple App",
    "host": "a.com",
    "endpoint": "10.0.0.24"
}
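The elided part of cache_callback would decode that body and hand back just the endpoint, along these lines (a sketch using cjson.safe; error handling trimmed for brevity):

```lua
local cjson = require "cjson.safe"

-- inside cache_callback, after request_uri returns:
if not res or res.status ~= 200 then
    return nil -- mlcache will negative-cache this miss
end

local body = cjson.decode(res.body)
if not body or not body.endpoint then
    return nil
end

return body.endpoint -- e.g. "10.0.0.24"
```

Returning only the endpoint string keeps the cached value small and means ngx.var.endpoint can be set directly from the cache result.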

The JSON response includes an endpoint: an internal Google Load Balancer IP, so all traffic must go through Starship. The initial request then just continues on to the desired upstream cluster, and Kubernetes does its magic, handles TLS, etc.

With this idea built and working locally, with a mock API server and Docker, we were able to build on the above API and add fallback error pages and some error checking/logging.

If the API request fails or the host doesn't exist, the internal API returns nil and our cache stores this nil value for a specific time; in this case we show a 404 (App not found) page if the request is HTTP. We unfortunately can't return a page if the request is HTTPS; we can, however, give a 495 response, which is a little nicer than the default browser error.
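The fallback branch in get_cluster() can be sketched like this, keying off ngx.config.subsystem to pick the right behaviour (a sketch; the status codes and page text are illustrative):

```lua
-- in get_cluster(), when the cache returned nil for this host:
if not res then
    if ngx.config.subsystem == "http" then
        -- plain HTTP: we can render a friendly 404 page
        ngx.status = 404
        ngx.say("App not found")
        return ngx.exit(ngx.HTTP_NOT_FOUND)
    end
    -- stream subsystem (TLS preread): no HTTP response is
    -- possible here, so just terminate the connection
    return ngx.exit(ngx.ERROR)
end
```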

This has worked out quite well for us so far, and as we scale we may have to tweak some config. This could probably have been achieved with simple scripting of Nginx configs, or even with HAProxy, but I am sure that would have been more complicated, and harder to maintain and scale.

Here at The Remote Company "We are big believers in the power of keeping it simple"

If you have any ideas or a way to improve on this then please reach out to me πŸ‘
