<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Bond</title>
    <description>The latest articles on DEV Community by David Bond (@davidsbond).</description>
    <link>https://dev.to/davidsbond</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F91645%2F4f2dc4e9-6f2c-44e7-86d6-b9cc6547d31e.jpg</url>
      <title>DEV Community: David Bond</title>
      <link>https://dev.to/davidsbond</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/davidsbond"/>
    <language>en</language>
    <item>
      <title>Go: Creating Dynamic Kubernetes Informers</title>
      <dc:creator>David Bond</dc:creator>
      <pubDate>Wed, 04 Aug 2021 00:11:28 +0000</pubDate>
      <link>https://dev.to/davidsbond/go-creating-dynamic-kubernetes-informers-1npi</link>
      <guid>https://dev.to/davidsbond/go-creating-dynamic-kubernetes-informers-1npi</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Recently, I published v1.0.0 of &lt;a href="https://github.com/davidsbond/kollect"&gt;Kollect&lt;/a&gt;, a dynamic Kubernetes informer that&lt;br&gt;
publishes changes in cluster resources to a configurable event bus. At the heart of this project is a dynamic informer,&lt;br&gt;
a method of handling add/update/delete notifications of arbitrary cluster resources (including those added as a &lt;code&gt;CustomResourceDefinition&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;This kind of tooling is quite powerful, as you can perform operations on arbitrary resources without knowing their structure.&lt;br&gt;
This is especially useful in situations where you cannot import the canonical types for those resources from public&lt;br&gt;
repositories, or they're too large and complex to write your own types without a lot of time and effort.&lt;/p&gt;

&lt;p&gt;For example, using Kollect (or your own informer), you can track changes to your resources in real time, or check that&lt;br&gt;
resources follow best practices as they change, using a tool like &lt;a href="https://www.openpolicyagent.org/"&gt;Open Policy Agent&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, I'll attempt to get you started writing a dynamic Kubernetes informer that will allow you to perform operations&lt;br&gt;
when any resources of your choosing change.&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Every Go project starts by initialising a new &lt;a href="https://blog.golang.org/using-go-modules"&gt;Go module&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go mod init github.com/myname/myinformer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a &lt;code&gt;main.go&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cluster Authentication
&lt;/h2&gt;

&lt;p&gt;To start handling notifications, we need to authenticate with the cluster that we're running in or&lt;br&gt;
against. This gives us two separate methods of authentication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubeconfig - Pointing directly to a kubeconfig file that our application accesses on startup. This would be used typically
when your program is not running in the cluster it is watching.&lt;/li&gt;
&lt;li&gt;In-cluster - Obtaining permissions based on the &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/"&gt;ServiceAccount&lt;/a&gt; 
associated with the &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/"&gt;Pod&lt;/a&gt; that our program is running in within the cluster we want to watch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're going to use the &lt;code&gt;k8s.io/client-go&lt;/code&gt; package, so you'll need to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go get k8s.io/client-go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we've downloaded the dependency, let's update our &lt;code&gt;main.go&lt;/code&gt; to create a Kubernetes API cluster config based on where&lt;br&gt;
our application is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;

    &lt;span class="s"&gt;"k8s.io/client-go/dynamic"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/rest"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/tools/clientcmd"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;kubeConfig&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"KUBECONFIG"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;clusterConfig&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;kubeConfig&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clientcmd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BuildConfigFromFlags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kubeConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InClusterConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalln&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;clusterClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;dynamic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewForConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalln&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code above checks for the presence of the &lt;code&gt;KUBECONFIG&lt;/code&gt; environment variable. If it is present, we create our cluster&lt;br&gt;
configuration using the &lt;code&gt;clientcmd&lt;/code&gt; package; otherwise, we use the &lt;code&gt;rest&lt;/code&gt; package to assume credentials from the &lt;code&gt;Pod&lt;/code&gt;&lt;br&gt;
we're running in. Then we create a new dynamic client.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;dynamic&lt;/code&gt; package allows us to query cluster resources as &lt;code&gt;unstructured.Unstructured&lt;/code&gt; types. These are essentially&lt;br&gt;
wrappers around &lt;code&gt;map[string]interface{}&lt;/code&gt; with helper methods for obtaining Kubernetes resource specifics such as the&lt;br&gt;
API version, group, kind, labels and annotations.&lt;/p&gt;
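&lt;p&gt;To make that idea concrete, here's a standalone sketch (plain Go, no client-go imports) of how a nested lookup over &lt;code&gt;map[string]interface{}&lt;/code&gt; works; the &lt;code&gt;nestedString&lt;/code&gt; helper here is an illustrative stand-in for &lt;code&gt;unstructured.NestedString&lt;/code&gt;:&lt;/p&gt;

```go
package main

import "fmt"

// nestedString walks a map[string]interface{} by the given field path,
// mirroring what unstructured.NestedString does for an Unstructured object.
func nestedString(obj map[string]interface{}, fields ...string) (string, bool) {
	var current interface{} = obj
	for _, field := range fields {
		m, ok := current.(map[string]interface{})
		if !ok {
			return "", false
		}
		current, ok = m[field]
		if !ok {
			return "", false
		}
	}
	s, ok := current.(string)
	return s, ok
}

func main() {
	// A hand-built object shaped like what the dynamic client returns.
	deployment := map[string]interface{}{
		"apiVersion": "apps/v1",
		"kind":       "Deployment",
		"metadata": map[string]interface{}{
			"name":      "my-deployment",
			"namespace": "default",
		},
	}

	name, _ := nestedString(deployment, "metadata", "name")
	fmt.Println(name)
}
```

&lt;p&gt;In real code you'd use the equivalent helpers on the &lt;code&gt;Unstructured&lt;/code&gt; value itself, such as &lt;code&gt;GetName&lt;/code&gt;, &lt;code&gt;GetNamespace&lt;/code&gt; and &lt;code&gt;GetLabels&lt;/code&gt;.&lt;/p&gt;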
&lt;h2&gt;
  
  
  Monitoring Resources
&lt;/h2&gt;

&lt;p&gt;Now that we've authenticated with the cluster, we can start monitoring resources. We'll do this with the &lt;code&gt;dynamicinformer&lt;/code&gt;&lt;br&gt;
package. We also need to decide which resources we want to watch and create an informer for each one. In this&lt;br&gt;
example, we'll create a single informer that watches &lt;code&gt;Deployment&lt;/code&gt; resources, but you can easily extend it to watch&lt;br&gt;
multiple resources.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"k8s.io/client-go/dynamic"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/rest"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/tools/clientcmd"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/dynamic/dynamicinformer"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/apimachinery/pkg/runtime/schema"&lt;/span&gt;
    &lt;span class="n"&gt;corev1&lt;/span&gt; &lt;span class="s"&gt;"k8s.io/api/core/v1"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;kubeConfig&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"KUBECONFIG"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;clusterConfig&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;kubeConfig&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clientcmd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BuildConfigFromFlags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kubeConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InClusterConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalln&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;clusterClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;dynamic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewForConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalln&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GroupVersionResource&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Group&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"apps"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Version&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Resource&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"deployments"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;factory&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;dynamicinformer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewFilteredDynamicSharedInformerFactory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clusterClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Minute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;corev1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespaceAll&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;informer&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ForResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Informer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that when we call &lt;code&gt;NewFilteredDynamicSharedInformerFactory&lt;/code&gt;, we pass in &lt;code&gt;corev1.NamespaceAll&lt;/code&gt; as the namespace to&lt;br&gt;
watch resources in. This causes the informer to watch all namespaces within the cluster. You can restrict this to&lt;br&gt;
a specific namespace, or filter by namespace in the handler methods.&lt;/p&gt;
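&lt;p&gt;If you keep &lt;code&gt;corev1.NamespaceAll&lt;/code&gt;, filtering inside the handlers can look like the sketch below. Note that the &lt;code&gt;filterNamespace&lt;/code&gt; wrapper and its string-based handler signature are illustrative, not part of client-go:&lt;/p&gt;

```go
package main

import "fmt"

// filterNamespace wraps a handler so it only fires for objects in the given
// namespace. In a real informer you would pull the namespace off the
// *unstructured.Unstructured object via GetNamespace.
func filterNamespace(namespace string, handler func(ns, name string)) func(ns, name string) {
	return func(ns, name string) {
		if ns != namespace {
			return // drop events from other namespaces
		}
		handler(ns, name)
	}
}

func main() {
	onAdd := filterNamespace("production", func(ns, name string) {
		fmt.Printf("added %s/%s\n", ns, name)
	})

	onAdd("production", "api") // handled
	onAdd("staging", "api")    // ignored
}
```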

&lt;p&gt;Now that we've created a new informer that will watch for changes in &lt;code&gt;Deployment&lt;/code&gt; resources, we need to register handler&lt;br&gt;
functions for add, update and delete events. This is done via the &lt;code&gt;informer.AddEventHandler&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;
    &lt;span class="s"&gt;"os/signal"&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;

    &lt;span class="s"&gt;"k8s.io/client-go/dynamic"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/rest"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/tools/clientcmd"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/dynamic/dynamicinformer"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/apimachinery/pkg/runtime/schema"&lt;/span&gt;
    &lt;span class="n"&gt;corev1&lt;/span&gt; &lt;span class="s"&gt;"k8s.io/api/core/v1"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/tools/cache"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;kubeConfig&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"KUBECONFIG"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;clusterConfig&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;kubeConfig&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clientcmd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BuildConfigFromFlags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kubeConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InClusterConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalln&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;clusterClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;dynamic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewForConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalln&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GroupVersionResource&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Group&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"apps"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Version&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Resource&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"deployments"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;factory&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;dynamicinformer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewFilteredDynamicSharedInformerFactory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clusterClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Minute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;corev1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespaceAll&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;informer&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ForResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Informer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;informer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddEventHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResourceEventHandlerFuncs&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;AddFunc&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;unstructured&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unstructured&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;UpdateFunc&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldObj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newObj&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
        &lt;span class="n"&gt;DeleteFunc&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NotifyContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Interrupt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;informer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that the parameters to &lt;code&gt;AddFunc&lt;/code&gt;, &lt;code&gt;UpdateFunc&lt;/code&gt; and &lt;code&gt;DeleteFunc&lt;/code&gt; are passed as &lt;code&gt;interface{}&lt;/code&gt;. Because we're&lt;br&gt;
using the &lt;code&gt;dynamicinformer&lt;/code&gt; package, we can assume these are instances of &lt;code&gt;*unstructured.Unstructured&lt;/code&gt; and safely cast them.&lt;/p&gt;

&lt;p&gt;We're also creating a &lt;code&gt;context.Context&lt;/code&gt; that is cancelled on an &lt;code&gt;os.Interrupt&lt;/code&gt; signal. This allows us to prevent the application&lt;br&gt;
from exiting until it receives an interrupt signal. Its &lt;code&gt;Done&lt;/code&gt; channel is passed to &lt;code&gt;informer.Run&lt;/code&gt;, to keep the informer&lt;br&gt;
alive until execution is cancelled.&lt;/p&gt;

&lt;p&gt;From here, your handling logic is your own: do what you want when resources are added, updated or deleted. Further sections&lt;br&gt;
in this post will cover additional considerations regarding cache syncing and using RBAC to give your &lt;code&gt;Pod&lt;/code&gt; access to&lt;br&gt;
the Kubernetes API.&lt;/p&gt;
&lt;h2&gt;
  
  
  Cache Syncing
&lt;/h2&gt;

&lt;p&gt;When an informer starts, it builds a cache of all the resources it watches; this cache is lost when the application&lt;br&gt;
restarts. This means that on startup, each of your handler functions will be invoked as the initial state is rebuilt. If this&lt;br&gt;
is not desirable for your use case, you can wait until the caches are synced before performing any updates, using the&lt;br&gt;
&lt;code&gt;cache.WaitForCacheSync&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"sync"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;
    &lt;span class="s"&gt;"os/signal"&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;

    &lt;span class="s"&gt;"k8s.io/client-go/dynamic"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/rest"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/tools/clientcmd"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/dynamic/dynamicinformer"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/apimachinery/pkg/runtime/schema"&lt;/span&gt;
    &lt;span class="n"&gt;corev1&lt;/span&gt; &lt;span class="s"&gt;"k8s.io/api/core/v1"&lt;/span&gt;
    &lt;span class="s"&gt;"k8s.io/client-go/tools/cache"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;kubeConfig&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"KUBECONFIG"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;clusterConfig&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;kubeConfig&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clientcmd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BuildConfigFromFlags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kubeConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InClusterConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalln&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;clusterClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;dynamic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewForConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clusterConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalln&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GroupVersionResource&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Group&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"apps"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Version&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Resource&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"deployments"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;factory&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;dynamicinformer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewFilteredDynamicSharedInformerFactory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clusterClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Minute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;corev1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespaceAll&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;informer&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ForResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Informer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;mux&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RWMutex&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;synced&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
    &lt;span class="n"&gt;informer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddEventHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResourceEventHandlerFuncs&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;AddFunc&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RLock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUnlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;synced&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c"&gt;// Handler logic&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;UpdateFunc&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldObj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newObj&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RLock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUnlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;synced&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c"&gt;// Handler logic&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;DeleteFunc&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RLock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUnlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;synced&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c"&gt;// Handler logic&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NotifyContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Interrupt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;informer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="n"&gt;isSynced&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitForCacheSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;informer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HasSynced&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;synced&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;isSynced&lt;/span&gt;
    &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;isSynced&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"failed to sync"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the code above, we use a boolean &lt;code&gt;synced&lt;/code&gt; to ensure our handler functions only do their work once the caches have&lt;br&gt;
finished syncing, i.e. once the initial state of the watched resources has been built. We've had to make some modifications,&lt;br&gt;
like starting the informer asynchronously using a &lt;code&gt;go&lt;/code&gt; statement, as the caches will not start building until &lt;code&gt;informer.Run&lt;/code&gt;&lt;br&gt;
is called.&lt;/p&gt;

&lt;p&gt;It may seem unintuitive at first, but we also don't hold the write lock while waiting on &lt;code&gt;WaitForCacheSync&lt;/code&gt;. The handler&lt;br&gt;
functions are invoked while the cache is syncing, and holding the write lock for the duration of the sync would simply queue those&lt;br&gt;
invocations up: the updates that occurred while the cache was syncing would still trigger our handler functions once the lock was&lt;br&gt;
released. Instead, we capture the result in a local variable and only reassign &lt;code&gt;synced&lt;/code&gt; under a brief lock once we're sure&lt;br&gt;
the cache sync is complete.&lt;/p&gt;
&lt;h2&gt;
  
  
  RBAC
&lt;/h2&gt;

&lt;p&gt;Finally, when running within a cluster, we need to use RBAC to grant the &lt;code&gt;ServiceAccount&lt;/code&gt; the appropriate&lt;br&gt;
permissions to monitor resources of our choosing. This is done using the &lt;code&gt;Role&lt;/code&gt;/&lt;code&gt;RoleBinding&lt;/code&gt; resources (if you're handling&lt;br&gt;
things at the namespace level) or the &lt;code&gt;ClusterRole&lt;/code&gt;/&lt;code&gt;ClusterRoleBinding&lt;/code&gt; resources (if you're handling things at the cluster&lt;br&gt;
level). You can view the full documentation for these resources &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's create a &lt;code&gt;ServiceAccount&lt;/code&gt;, &lt;code&gt;ClusterRole&lt;/code&gt; and &lt;code&gt;ClusterRoleBinding&lt;/code&gt; to match our code above. It will allow us to watch&lt;br&gt;
all &lt;code&gt;Deployment&lt;/code&gt; resources in all namespaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myinformer&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mynamespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deployment-informer&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apps/v1"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployments"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read-secrets-global&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myinformer&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mynamespace&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deployment-informer&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you deploy your application within the cluster, set the &lt;code&gt;serviceAccountName&lt;/code&gt; field in the pod specification to&lt;br&gt;
the &lt;code&gt;myinformer&lt;/code&gt; one created above. This will provide the &lt;code&gt;Pod&lt;/code&gt; with access to the Kubernetes API, specifically to perform&lt;br&gt;
&lt;code&gt;get&lt;/code&gt;, &lt;code&gt;list&lt;/code&gt; and &lt;code&gt;watch&lt;/code&gt; requests on &lt;code&gt;Deployment&lt;/code&gt; resources.&lt;/p&gt;
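For example, a `Deployment` running the informer might reference that service account like this (a sketch; the image name is a placeholder):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myinformer
  namespace: mynamespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myinformer
  template:
    metadata:
      labels:
        app: myinformer
    spec:
      # Grants the pod the ClusterRole's get/list/watch permissions.
      serviceAccountName: myinformer
      containers:
      - name: myinformer
        image: myinformer:latest # placeholder image
```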

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Hopefully this post has given you enough insight into the world of Kubernetes informers to implement your own. As mentioned&lt;br&gt;
at the start, I used code like this to implement &lt;a href="https://github.com/davidsbond/kollect"&gt;Kollect&lt;/a&gt;, and it works as well&lt;br&gt;
as you would expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/davidsbond/kollect"&gt;https://github.com/davidsbond/kollect&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.openpolicyagent.org"&gt;https://www.openpolicyagent.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.golang.org/using-go-modules"&gt;https://blog.golang.org/using-go-modules&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account"&gt;https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/workloads/pods"&gt;https://kubernetes.io/docs/concepts/workloads/pods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/"&gt;https://kubernetes.io/docs/reference/access-authn-authz/rbac&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>kubernetes</category>
      <category>informer</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Homelab: Accessing my k3s cluster securely from anywhere with Tailscale</title>
      <dc:creator>David Bond</dc:creator>
      <pubDate>Wed, 30 Dec 2020 00:00:00 +0000</pubDate>
      <link>https://dev.to/davidsbond/networking-accessing-my-k3s-cluster-securely-from-anywhere-with-tailscale-3k5n</link>
      <guid>https://dev.to/davidsbond/networking-accessing-my-k3s-cluster-securely-from-anywhere-with-tailscale-3k5n</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;At home, I run my own &lt;a href="https://github.com/davidsbond/homelab" rel="noopener noreferrer"&gt;k3s cluster&lt;/a&gt; on 4 Raspberry Pi 4Bs. In order to access the&lt;br&gt;
services I run from anywhere, without exposing my cluster to the open internet, I use &lt;a href="https://tailscale.com/" rel="noopener noreferrer"&gt;Tailscale&lt;/a&gt;, &lt;br&gt;
a service designed to make a private VPN really easy to set up. I run a bunch of services, including (but not &lt;br&gt;
limited to) a password manager, a Google Photos alternative and finance management tools.&lt;/p&gt;

&lt;p&gt;This post aims to describe how my cluster is set up to use Tailscale, allowing me to resolve DNS via Cloudflare while restricting&lt;br&gt;
access solely to me (or anyone I share my Tailscale machines with). This allows me to go to &lt;code&gt;https://bitwarden.homelab.dsb.dev&lt;/code&gt;&lt;br&gt;
on any device connected to my Tailscale network and access my own password manager instance.&lt;/p&gt;
&lt;h2&gt;
  
  
  Cluster Setup
&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://k3s.io" rel="noopener noreferrer"&gt;k3s&lt;/a&gt; cluster consists of four nodes. Each one is a &lt;a href="https://www.raspberrypi.org/products/raspberry-pi-4-model-b/" rel="noopener noreferrer"&gt;Raspberry Pi 4B+&lt;/a&gt;, &lt;br&gt;
the 8GB model. I've been pleasantly surprised with how much you can run on these small machines, every year they seem to &lt;br&gt;
pack more and more power into a credit card's worth of space. The &lt;a href="https://github.com/davidsbond/homelab" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;br&gt;
has a full overview of the setup you can view for yourself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.dsb.dev%2Fimages%2F2020-12-30-accessing-my-k3s-cluster-from-anywhere-with-tailscale%2F1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.dsb.dev%2Fimages%2F2020-12-30-accessing-my-k3s-cluster-from-anywhere-with-tailscale%2F1.jpg" alt="Physical Cluster"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Installing Tailscale
&lt;/h3&gt;

&lt;p&gt;Each node in the cluster is running &lt;a href="https://ubuntu.com/raspberry-pi" rel="noopener noreferrer"&gt;Ubuntu for Raspberry Pi&lt;/a&gt;, so installing Tailscale is&lt;br&gt;
as simple as following the &lt;a href="https://tailscale.com/kb/1039/install-ubuntu-2004" rel="noopener noreferrer"&gt;instructions for ubuntu&lt;/a&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add Tailscale’s package signing key and repository
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pkgs.Tailscale.com/stable/ubuntu/focal.gpg | &lt;span class="nb"&gt;sudo &lt;/span&gt;apt-key add -
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://pkgs.Tailscale.com/stable/ubuntu/focal.list | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/apt/sources.list.d/Tailscale.list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Personally, I use the &lt;code&gt;unstable&lt;/code&gt; repository instead, because I like to be bleeding edge. It's worth adding that I keep &lt;br&gt;
regular backups of everything on my cluster, just in case my bleeding-edge tendencies end up with me breaking my cluster&lt;br&gt;
or losing access to things I need.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Tailscale
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;Tailscale
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol&gt;
&lt;li&gt;Authenticate and connect your machine to your Tailscale network
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;Tailscale up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;I did this for each node; you could use a terminal multiplexer (like &lt;a href="https://github.com/tmux/tmux" rel="noopener noreferrer"&gt;tmux&lt;/a&gt;) to speed things&lt;br&gt;
up a bit.&lt;/p&gt;
&lt;h3&gt;
  
  
  Installing k3s
&lt;/h3&gt;

&lt;p&gt;Next, we need to install k3s on each node to get a cluster running. The &lt;a href="https://rancher.com/docs/k3s/latest/en/quick-start/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&lt;br&gt;
is the authoritative source for this, but I'm also going to outline the quickstart steps here.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the k3s server on the control-plane node, setting the node's advertise address to its Tailscale IP rather than
the local network IP via the &lt;code&gt;--bind-address&lt;/code&gt; flag. This is optional, but it saves you having to set up
a static IP address for your machine, and it means the cluster nodes will communicate via the Tailscale network.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | sh &lt;span class="nt"&gt;-s&lt;/span&gt; - &lt;span class="nt"&gt;--bind-address&lt;/span&gt; &amp;lt;TAILSCALE_IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Rancher have provided a simple script to get things up and running quickly. I'd advise taking a look at it first rather&lt;br&gt;
than blindly running a script off the internet. Once complete, grab the token your agent nodes will need to join the&lt;br&gt;
cluster. This is stored at &lt;code&gt;/var/lib/rancher/k3s/server/node-token&lt;/code&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the k3s agent on all the other nodes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here, we set up each agent node in the cluster. Once again, Rancher have provided a simple script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | &lt;span class="nv"&gt;K3S_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://&amp;lt;SERVER_TAILSCALE_IP&amp;gt;:6443 &lt;span class="nv"&gt;K3S_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;NODE_TOKEN&amp;gt; sh -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SERVER_TAILSCALE_IP&lt;/code&gt; is the Tailscale IP address of your control-plane node. &lt;code&gt;NODE_TOKEN&lt;/code&gt; is the token mentioned in the&lt;br&gt;
previous step. After following these steps, you've now got a k3s cluster running, where all nodes communicate via &lt;br&gt;
Tailscale. Go you!&lt;/p&gt;
&lt;h3&gt;
  
  
  Ingresses
&lt;/h3&gt;

&lt;p&gt;By default, k3s comes with &lt;a href="https://traefik.io/" rel="noopener noreferrer"&gt;Traefik&lt;/a&gt; already deployed. Because I'm using a &lt;code&gt;.dev&lt;/code&gt; domain, I also&lt;br&gt;
needed to ensure everything I serve on my domain was using &lt;code&gt;https&lt;/code&gt;. To do this, I've added &lt;a href="https://cert-manager.io/" rel="noopener noreferrer"&gt;cert-manager&lt;/a&gt;&lt;br&gt;
to my cluster. Cert-manager allows me to generate TLS certificates for my ingresses automatically via &lt;a href="https://letsencrypt.org/" rel="noopener noreferrer"&gt;letsencrypt&lt;/a&gt;.&lt;br&gt;
All I have to do is add additional annotations to my &lt;code&gt;Ingress&lt;/code&gt; resources.&lt;/p&gt;

&lt;p&gt;If you also want to use cert-manager, it's easiest to follow &lt;a href="https://cert-manager.io/docs/installation/kubernetes/" rel="noopener noreferrer"&gt;their instructions&lt;/a&gt;, &lt;br&gt;
as explaining it all here would be out of scope for this blog post (you may not care about using HTTPS at all).&lt;br&gt;
In brief, my cert-manager deployment authenticates with Cloudflare using an API key with limited permissions and solves a&lt;br&gt;
DNS-01 challenge. You can read more about DNS-01 challenges &lt;a href="https://letsencrypt.org/docs/challenge-types/#dns-01-challenge" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
You can also see my cert-manager deployment &lt;a href="https://github.com/davidsbond/homelab/tree/master/manifests/cert-manager" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
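As an illustrative sketch (not my exact manifest; the email and secret names are placeholders), a cert-manager `ClusterIssuer` that solves DNS-01 challenges through Cloudflare looks roughly like this:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: cloudflare
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com # placeholder email
    privateKeySecretRef:
      name: cloudflare-issuer-key # placeholder secret name
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            name: cloudflare-api-token # placeholder secret name
            key: api-token
```

The issuer's name is what the `cert-manager.io/cluster-issuer: cloudflare` annotation on an `Ingress` refers to.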

&lt;p&gt;Here's my &lt;code&gt;Ingress&lt;/code&gt; resource for my Bitwarden deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bitwarden&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;kubernetes.io/ingress.class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik&lt;/span&gt;
    &lt;span class="na"&gt;traefik.ingress.kubernetes.io/router.tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;traefik.ingress.kubernetes.io/router.entrypoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https&lt;/span&gt;
    &lt;span class="na"&gt;cert-manager.io/cluster-issuer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cloudflare&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;bitwarden.homelab.dsb.dev&lt;/span&gt;
    &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bitwarden-tls&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bitwarden.homelab.dsb.dev&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bitwarden&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here I'm telling Traefik that any inbound requests for &lt;code&gt;bitwarden.homelab.dsb.dev&lt;/code&gt; should route to a &lt;code&gt;Service&lt;/code&gt; resource&lt;br&gt;
named &lt;code&gt;bitwarden&lt;/code&gt;, and that TLS certificates should be stored in a secret named &lt;code&gt;bitwarden-tls&lt;/code&gt;, issued via cert-manager.&lt;/p&gt;
&lt;h2&gt;
  
  
  Cloudflare DNS
&lt;/h2&gt;

&lt;p&gt;The last step is to set up appropriate DNS records to route requests to the cluster when connected to the Tailscale&lt;br&gt;
network. For my use-case, I want any subdomain of &lt;code&gt;homelab.dsb.dev&lt;/code&gt; to go straight to the cluster. This way, I don't&lt;br&gt;
need DNS records for each individual application I want to expose.&lt;/p&gt;

&lt;p&gt;I manage these records using &lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;, and the setup is fairly straightforward. To start&lt;br&gt;
with, I needed to set up the cloudflare provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"cloudflare"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;email&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudflare_email&lt;/span&gt;
  &lt;span class="nx"&gt;api_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudflare_api_key&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All you need to provide is the email address for your Cloudflare account and your API key. Next, you need to&lt;br&gt;
grab the zone identifier for your domain. Since I have a single domain, &lt;code&gt;dsb.dev&lt;/code&gt;, I just created a simple&lt;br&gt;
data source that returns all my Cloudflare zones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"cloudflare_zones"&lt;/span&gt; &lt;span class="s2"&gt;"dsb_dev"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;filter&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have multiple domains, you're going to want to modify that filter to return the one you care about. You can see&lt;br&gt;
the documentation for that &lt;a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/data-sources/zones#filter" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
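As a sketch, a filtered lookup for a single zone might look like this (the `name` argument here is illustrative; substitute your own domain):

```terraform
# Only return the zone for one specific domain instead of every zone
# on the account.
data "cloudflare_zones" "dsb_dev" {
  filter {
    name = "dsb.dev"
  }
}
```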

&lt;p&gt;Lastly, I needed to create a DNS record for each node in the cluster, using its Tailscale IP address. The name of each&lt;br&gt;
record is &lt;code&gt;*.homelab&lt;/code&gt;, which means any request to a subdomain of &lt;code&gt;homelab.dsb.dev&lt;/code&gt; gets sent straight to the&lt;br&gt;
cluster, provided you have access to the Tailscale network:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"cloudflare_record"&lt;/span&gt; &lt;span class="s2"&gt;"homelab_0"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudflare_zones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsb_dev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;zones&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*.homelab"&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;homelab_0_ip&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A"&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"cloudflare_record"&lt;/span&gt; &lt;span class="s2"&gt;"homelab_1"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudflare_zones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsb_dev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;zones&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*.homelab"&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;homelab_1_ip&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A"&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"cloudflare_record"&lt;/span&gt; &lt;span class="s2"&gt;"homelab_2"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudflare_zones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsb_dev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;zones&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*.homelab"&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;homelab_2_ip&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A"&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"cloudflare_record"&lt;/span&gt; &lt;span class="s2"&gt;"homelab_3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudflare_zones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsb_dev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;zones&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*.homelab"&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;homelab_3_ip&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A"&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can view the full Terraform configuration &lt;a href="https://github.com/davidsbond/homelab/tree/master/terraform/cloudflare" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I run Traefik as a &lt;code&gt;DaemonSet&lt;/code&gt; in my cluster, meaning whichever node receives a request can route it to the appropriate&lt;br&gt;
service regardless of the node it's running on. This gives me some basic load balancing. The main caveat is that&lt;br&gt;
each new node I add also needs a new DNS record, but since this is my homelab, I don't plan on growing the cluster&lt;br&gt;
to a size where I'd need to automate this.&lt;/p&gt;
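If the per-node repetition ever did become a problem, Terraform's `count` could collapse the four records into a single resource. This is a sketch only; the `homelab_ips` list variable is hypothetical and doesn't exist in my actual configuration:

```terraform
# Hypothetical: one record per node IP, driven by a list variable.
variable "homelab_ips" {
  type = list(string)
}

resource "cloudflare_record" "homelab" {
  count   = length(var.homelab_ips)
  zone_id = lookup(data.cloudflare_zones.dsb_dev.zones[0], "id")
  name    = "*.homelab"
  value   = var.homelab_ips[count.index]
  type    = "A"
  ttl     = 3600
}
```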

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;The above setup allows me to access all the applications running on my home k3s cluster from anywhere, provided&lt;br&gt;
I have a connection to the Tailscale network. This works great for me, especially since Tailscale also has an Android&lt;br&gt;
app, which lets me access my password manager and other applications on my phone, all without exposing my cluster&lt;br&gt;
to the public internet!&lt;/p&gt;

&lt;p&gt;Combining it with cert-manager also gives me the ability to secure everything with HTTPS and use an FQDN on a domain&lt;br&gt;
that I own.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/davidsbond/homelab" rel="noopener noreferrer"&gt;https://github.com/davidsbond/homelab&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tailscale.com/" rel="noopener noreferrer"&gt;https://tailscale.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://k3s.io" rel="noopener noreferrer"&gt;https://k3s.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.raspberrypi.org/products/raspberry-pi-4-model-b/" rel="noopener noreferrer"&gt;https://www.raspberrypi.org/products/raspberry-pi-4-model-b/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ubuntu.com/raspberry-pi" rel="noopener noreferrer"&gt;https://ubuntu.com/raspberry-pi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tailscale.com/kb/1039/install-ubuntu-2004" rel="noopener noreferrer"&gt;https://tailscale.com/kb/1039/install-ubuntu-2004&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/tmux/tmux" rel="noopener noreferrer"&gt;https://github.com/tmux/tmux&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rancher.com/docs/k3s/latest/en/quick-start/" rel="noopener noreferrer"&gt;https://rancher.com/docs/k3s/latest/en/quick-start/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://traefik.io/" rel="noopener noreferrer"&gt;https://traefik.io/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cert-manager.io/" rel="noopener noreferrer"&gt;https://cert-manager.io/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://letsencrypt.org/" rel="noopener noreferrer"&gt;https://letsencrypt.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cert-manager.io/docs/installation/kubernetes/" rel="noopener noreferrer"&gt;https://cert-manager.io/docs/installation/kubernetes/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://letsencrypt.org/docs/challenge-types/#dns-01-challenge" rel="noopener noreferrer"&gt;https://letsencrypt.org/docs/challenge-types/#dns-01-challenge&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/davidsbond/homelab/tree/master/manifests/cert-manager" rel="noopener noreferrer"&gt;https://github.com/davidsbond/homelab/tree/master/manifests/cert-manager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;https://www.terraform.io/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/data-sources/zones#filter" rel="noopener noreferrer"&gt;https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/data-sources/zones#filter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/davidsbond/homelab/tree/master/terraform/cloudflare" rel="noopener noreferrer"&gt;https://github.com/davidsbond/homelab/tree/master/terraform/cloudflare&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>k3s</category>
      <category>tailscale</category>
      <category>cloudflare</category>
      <category>certmanager</category>
    </item>
    <item>
      <title>Go: Structuring repositories with protocol buffers</title>
      <dc:creator>David Bond</dc:creator>
      <pubDate>Sun, 01 Mar 2020 00:00:00 +0000</pubDate>
      <link>https://dev.to/davidsbond/golang-structuring-repositories-with-protocol-buffers-3012</link>
      <guid>https://dev.to/davidsbond/golang-structuring-repositories-with-protocol-buffers-3012</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;In my current position at &lt;a href="https://www.utilitywarehouse.co.uk/"&gt;Utility Warehouse&lt;/a&gt;, my team keeps our go code for all our services within a monorepo. This includes all our protocol buffer definitions that are used to generate client/service code to allow our services to interact.&lt;/p&gt;

&lt;p&gt;This post aims to outline how we organise our protocol buffer code and how we perform code generation to ensure all services are up-to-date with the latest contracts when things change. It assumes you're already familiar with protocol buffers.&lt;/p&gt;

&lt;p&gt;You could also use the structure explained in this post to create a single repository that contains all proto definitions for all services (or whatever else you use them for) and serve it as a &lt;a href="https://blog.golang.org/using-go-modules"&gt;go module&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are protocol buffers?
&lt;/h3&gt;

&lt;p&gt;Taken from &lt;a href="https://developers.google.com/protocol-buffers"&gt;Google’s documentation&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On top of using them for service-to-service communication, we also use them as the serialization format for the event-sourced aspects of our systems, where proto messages are sent over the wire via &lt;a href="https://kafka.apache.org/"&gt;Apache Kafka&lt;/a&gt; and &lt;a href="https://nats.io/"&gt;NATS&lt;/a&gt;. This also allows systems that consume/produce events to always have the most up-to-date definitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  The ‘proto’ directory
&lt;/h3&gt;

&lt;p&gt;At the top level of our repository lives the &lt;code&gt;proto&lt;/code&gt; directory. This is where all &lt;code&gt;.proto&lt;/code&gt; files live, as well as third-party definitions (such as those &lt;a href="https://github.com/googleapis/googleapis/tree/master/google/type"&gt;provided by google&lt;/a&gt; or from other teams within the business).&lt;/p&gt;

&lt;p&gt;Our team is the partner platform, so our specific proto definitions are found in a subdirectory named &lt;code&gt;partner&lt;/code&gt;. Below this are the different domains we deal with. Subdirectories here include aspects such as &lt;code&gt;identity&lt;/code&gt;, or &lt;code&gt;document&lt;/code&gt; for services that deal with authentication or the management of individual partner’s documents.&lt;/p&gt;

&lt;p&gt;Below these are either versioned or service directories. Let’s say we have a gRPC API that serves documents for a partner, the proto definitions are found under &lt;code&gt;partner/document/service/v*&lt;/code&gt; (where &lt;code&gt;*&lt;/code&gt; is the major version number for the service). Alternatively, if we have domain objects we want to share across multiple proto packages, we keep those under &lt;code&gt;partner/document/v*&lt;/code&gt;. Using versioned directories like this allows us to version our proto packages easily and have the package names reflect the location of those files within the repository.&lt;/p&gt;

&lt;p&gt;Here’s a full example of what this looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
└── proto
    ├── partner
    │   └── document
    │   ├── service
    │   │   └── v1
    | | └── service.proto # gRPC service definitions, DTOs etc
    │   └── v1
    |   └── models.proto # Shared domain objects

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Writing protocol buffer definitions
&lt;/h3&gt;

&lt;p&gt;Next, let’s take a look at how we actually define our protocol buffers. There’s nothing particularly out of the ordinary here that you wouldn’t see in most other definitions. The most important part is the &lt;code&gt;package&lt;/code&gt; declaration. We make sure our package names reflect the relative location of the protocol buffer files. In the example above, the packages are named &lt;code&gt;partner.document.service.v1&lt;/code&gt; and &lt;code&gt;partner.document.v1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here’s an example of what the top of our &lt;code&gt;.proto&lt;/code&gt; files look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;syntax = "proto3";

// Additional imports go here

package partner.document.service.v1;

option go_package = "github.com/utilitywarehouse/&amp;lt;repo&amp;gt;/proto/gen/go/partner/document/service/v1;document";

// Service &amp;amp; message definitions go here

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’re also using the &lt;a href="https://buf.build/"&gt;buf&lt;/a&gt; tool in order to lint our files and check for breaking changes.&lt;/p&gt;
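For reference, a minimal buf configuration might look something like the following sketch; the exact schema and rule sets depend on the buf version you're running, so treat this as illustrative:

```yaml
# buf.yaml - lint with the default rule set and flag breaking
# changes at the file level.
version: v1
lint:
  use:
    - DEFAULT
breaking:
  use:
    - FILE
```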

&lt;h3&gt;
  
  
  Generating code from protocol buffers
&lt;/h3&gt;

&lt;p&gt;Finally, we need to generate our code so we can use it in our go services. We commit and keep all our generated source code within the repository along with the definitions. This means that when code is regenerated, all services that depend on that generated code are updated at once.&lt;/p&gt;

&lt;p&gt;To achieve our code generation, we use a bash script that finds all directories containing at least one &lt;code&gt;.proto&lt;/code&gt; file and runs the &lt;code&gt;protoc&lt;/code&gt; command. This outputs our generated code in directories relative to the respective &lt;code&gt;.proto&lt;/code&gt; files within a &lt;code&gt;proto/gen/go&lt;/code&gt; subdirectory. If we wanted to extend this to other languages (Java, TypeScript etc.), these would be kept underneath &lt;code&gt;proto/gen/&amp;lt;language_name&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The script lives at &lt;code&gt;proto/generate.sh&lt;/code&gt;, the important part looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env bash

# Get current directory.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &amp;gt;/dev/null 2&amp;gt;&amp;amp;1 &amp;amp;&amp;amp; pwd )"

# Find all directories containing at least one proto file.
# Based on: https://buf.build/docs/migration-prototool#prototool-generate.
for dir in $(find ${DIR}/partner -name '*.proto' -print0 | xargs -0 -n1 dirname | sort | uniq); do
  files=$(find "${dir}" -name '*.proto')

  # Generate all files with protoc-gen-go.
  protoc -I ${DIR} --go_out=plugins=grpc,paths=source_relative:${DIR}/gen/go ${files}
done

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also have some extra utilities, such as running additional generators when certain &lt;code&gt;import&lt;/code&gt; directives are used within the proto definitions. For example, if &lt;a href="https://github.com/mwitkow/go-proto-validators"&gt;go-proto-validators&lt;/a&gt; are used within a definition, we will also generate code using &lt;code&gt;--govalidators_out&lt;/code&gt;. Rinse and repeat for some additional tooling and some internal ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generated package names
&lt;/h3&gt;

&lt;p&gt;If you’re particular like me, you may not like the Go package names you get as a result of this. In the example above, you end up with a package name of &lt;code&gt;partner_document_v1&lt;/code&gt;, which isn’t pretty to look at unless you alias it when importing it.&lt;/p&gt;

&lt;p&gt;To solve this, you can specify &lt;code&gt;option go_package&lt;/code&gt; in order to override the generated package name. This is purely optional, but it allows us to have package names like &lt;code&gt;document&lt;/code&gt; instead. You can read more about this option &lt;a href="https://developers.google.com/protocol-buffers/docs/reference/go-generated"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.utilitywarehouse.co.uk/"&gt;https://www.utilitywarehouse.co.uk/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.golang.org/using-go-modules"&gt;https://blog.golang.org/using-go-modules&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/protocol-buffers"&gt;https://developers.google.com/protocol-buffers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kafka.apache.org/"&gt;https://kafka.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nats.io/"&gt;https://nats.io/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://buf.build/"&gt;https://buf.build/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/googleapis/googleapis/tree/master/google/type"&gt;https://github.com/googleapis/googleapis/tree/master/google/type&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/mwitkow/go-proto-validators"&gt;https://github.com/mwitkow/go-proto-validators&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/protocol-buffers/docs/reference/go-generated"&gt;https://developers.google.com/protocol-buffers/docs/reference/go-generated&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>protobuf</category>
      <category>structure</category>
      <category>grpc</category>
    </item>
    <item>
      <title>Go: Creating gRPC interceptors</title>
      <dc:creator>David Bond</dc:creator>
      <pubDate>Fri, 14 Jun 2019 00:00:00 +0000</pubDate>
      <link>https://dev.to/davidsbond/golang-creating-grpc-interceptors-5el5</link>
      <guid>https://dev.to/davidsbond/golang-creating-grpc-interceptors-5el5</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Just like when building HTTP APIs, sometimes you need middleware that applies to your HTTP handlers for things like request validation, authentication etc. In &lt;a href="https://grpc.io/"&gt;gRPC&lt;/a&gt; this is no different. Methods for authentication need to be applied to both servers and clients in an ‘all or none’ fashion. For the uninitiated, gRPC describes itself as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A modern open source high performance RPC framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking and authentication. It is also applicable in last mile of distributed computing to connect devices, mobile applications and browsers to backend services.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The key difference here is that in HTTP we create middleware for handlers (purely on the server side). With gRPC we can create middleware for both inbound calls on the server side and outbound calls on the client side. This post aims to outline how you can create simple gRPC interceptors that act as middleware for your clients and servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interceptor Types
&lt;/h3&gt;

&lt;p&gt;In gRPC there are two kinds of interceptors, &lt;strong&gt;unary&lt;/strong&gt; and &lt;strong&gt;stream&lt;/strong&gt;. Unary interceptors handle single request/response RPC calls whereas stream interceptors handle RPC calls where streams of messages are written in either direction. You can get more in-depth details on the differences between them &lt;a href="https://grpc.io/docs/guides/concepts/#rpc-life-cycle"&gt;here&lt;/a&gt;. On top of this, you can create interceptors that apply to both servers and clients.&lt;/p&gt;

&lt;h4&gt;
  
  
  Unary Client Interceptors
&lt;/h4&gt;

&lt;p&gt;In situations where we have a simple call &amp;amp; response, we need to create a unary client interceptor. This is a function that matches the signature of &lt;code&gt;grpc.UnaryClientInterceptor&lt;/code&gt; and looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func Interceptor(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
  // Do some things and invoke `invoker` to finish the request
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This signature has a lot of parameters, so let’s look at each one and what they’re for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ctx context.Context&lt;/code&gt; - This is the request context and will be used primarily for timeouts. It can also be used to add/read request metadata.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;method string&lt;/code&gt; - The name of the RPC method being called.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;req interface{}&lt;/code&gt; - The request instance; this is an &lt;code&gt;interface{}&lt;/code&gt; as reflection is used for marshalling.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reply interface{}&lt;/code&gt; - The response instance; works the same way as the &lt;code&gt;req&lt;/code&gt; parameter.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cc *grpc.ClientConn&lt;/code&gt; - The underlying client connection to the server.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;invoker grpc.UnaryInvoker&lt;/code&gt; - The RPC invocation method. Similarly to &lt;a href="https://gist.github.com/gbbr/935f26e50080ae99eedc822d8c273a89#file-middleware_funcs-go"&gt;HTTP middleware&lt;/a&gt; where you call &lt;code&gt;ServeHTTP&lt;/code&gt;, this needs to be invoked for the RPC call to be made.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;opts ...grpc.CallOption&lt;/code&gt; - The &lt;code&gt;grpc.CallOption&lt;/code&gt; instances used to configure the gRPC call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With all of these, we get a lot of information about the call being made. This makes it quite straightforward to create things like &lt;a href="https://github.com/grpc-ecosystem/go-grpc-middleware/blob/master/logging/logrus/client_interceptors.go"&gt;logging middleware&lt;/a&gt; that will write out RPC call information.&lt;/p&gt;

&lt;h4&gt;
  
  
  Unary Server Interceptors
&lt;/h4&gt;

&lt;p&gt;Server interceptors look fairly similar to the client, with the exception that they allow us to modify the response returned from the gRPC call. Here’s the function signature, it’s defined as &lt;code&gt;grpc.UnaryServerInterceptor&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func Interceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
    // Invoke 'handler' to use your gRPC server implementation and get
    // the response.
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Like with the client, there are a few different params here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ctx context.Context&lt;/code&gt; - This is the request context and will be used primarily for timeouts. It can also be used to add/read request metadata.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;req interface{}&lt;/code&gt; - The inbound request.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;info *grpc.UnaryServerInfo&lt;/code&gt; - Information about the gRPC server that is handling the request.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;handler grpc.UnaryHandler&lt;/code&gt; - The handler for the inbound request; you’ll need to invoke this, otherwise the client won’t get a response.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Stream Client Interceptors
&lt;/h4&gt;

&lt;p&gt;Stream interceptors work in much the same way; here’s the signature of &lt;code&gt;grpc.StreamClientInterceptor&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func Interceptor(ctx context.Context, desc *grpc.StreamDesc, cc *grpc.ClientConn, method string, streamer grpc.Streamer, opts ...grpc.CallOption) (grpc.ClientStream, error) {
    // Call 'streamer' to write messages to the stream before this function returns
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ctx context.Context&lt;/code&gt; - This is the request context and will be used primarily for timeouts. It can also be used to add/read request metadata.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;desc *grpc.StreamDesc&lt;/code&gt; - Represents a streaming RPC service’s method specification.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cc *grpc.ClientConn&lt;/code&gt; - The underlying client connection to the server.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;method string&lt;/code&gt; - The name of the gRPC method being called.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;streamer grpc.Streamer&lt;/code&gt; - Called by the interceptor to create a stream.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Stream Server Interceptors
&lt;/h4&gt;

&lt;p&gt;Below is the signature of &lt;code&gt;grpc.StreamServerInterceptor&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func Interceptor(srv interface{}, stream grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
    // Call 'handler' to invoke the stream handler before this function returns
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;srv interface{}&lt;/code&gt; - The server implementation.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stream grpc.ServerStream&lt;/code&gt; - Defines the server-side behavior of a streaming RPC.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;info *grpc.StreamServerInfo&lt;/code&gt; - Various information about the streaming RPC on the server side.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;handler grpc.StreamHandler&lt;/code&gt; - The handler called by the gRPC server to complete the execution of a streaming RPC.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Creating an interceptor
&lt;/h3&gt;

&lt;p&gt;For this post, let’s say we have a gRPC client and server that authenticate via a JWT token that we obtain via an HTTP API. If the provided JWT token is no longer valid, the server will return an appropriate status code that will be detected by the interceptor, triggering a call to the HTTP API to refresh the token. We’re going to use a unary client interceptor to achieve this, but the code can be easily ported for client streams and servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; There are plenty of open-source implementations for token-based authentication on gRPC, the code in this post is just to serve as an example. Ideally, you’ll want something stronger than just a username and password combo. You can check out lots of different interceptor implementations in the &lt;a href="https://github.com/grpc-ecosystem/go-grpc-middleware"&gt;grpc-ecosystem/go-grpc-middleware&lt;/a&gt; repository&lt;/p&gt;

&lt;p&gt;To start, we’ll need a type to store our JWT token and authentication details; we’re going to use basic auth to obtain the token.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type (
    JWTInterceptor struct {
        http *http.Client // The HTTP client for calling the token-serving API
        token string // The JWT token that will be used in every call to the server
        username string // The username for basic authentication
        password string // The password for basic authentication
        endpoint string // The HTTP endpoint to hit to obtain tokens
    }
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we’ll need our unary client interceptor that will add the JWT token to the request metadata for each outbound call; we’re following the &lt;a href="https://oauth.net/2/bearer-tokens/"&gt;bearer token&lt;/a&gt; approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (jwt *JWTInterceptor) UnaryClientInterceptor(ctx context.Context, method string, req interface{}, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
    // Add the current bearer token to the metadata and call the RPC
    // command
    ctx = metadata.AppendToOutgoingContext(ctx, "authorization", "bearer "+jwt.token)
    return invoker(ctx, method, req, reply, cc, opts...)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above will work for as long as the JWT is valid. If the token has an expiry, we will eventually no longer be able to make calls to the server, so we need a method that can call the HTTP API that serves us tokens. The API accepts a JSON body and returns the token in the response body, so we’ll also need some types to represent those.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type (
    authResponse struct {
        Token string `json:"token"`
    }

    authRequest struct {
        Username string `json:"username"`
        Password string `json:"password"`
    }
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are the functions for obtaining new JWTs. The API gives back a 200 response with a JSON-encoded body containing the token. It returns errors using &lt;code&gt;http.Error&lt;/code&gt;, so those are just string responses. Once we have the token, we set it on the &lt;code&gt;JWTInterceptor&lt;/code&gt; struct for later use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (jwt *JWTInterceptor) refreshBearerToken() error {
    resp, err := jwt.performAuthRequest()

    if err != nil {
        return err
    }

    var respBody authResponse
    if err = json.NewDecoder(resp.Body).Decode(&amp;amp;respBody); err != nil {
        return err
    }

    jwt.token = respBody.Token

    return resp.Body.Close()
}

func (jwt *JWTInterceptor) performAuthRequest() (*http.Response, error) {
    body := authRequest{
        Username: jwt.username,
        Password: jwt.password,
    }

    data, err := json.Marshal(body)

    if err != nil {
        return nil, err
    }

    buff := bytes.NewBuffer(data)
    resp, err := jwt.http.Post(jwt.endpoint, "application/json", buff)

    if err != nil {
        return resp, err
    }

    if resp.StatusCode != http.StatusOK {
        // Read the whole error body; a single Read call is not guaranteed
        // to fill the buffer and can return io.EOF alongside valid data.
        out, err := io.ReadAll(resp.Body)
        if err != nil {
            return resp, err
        }

        return resp, fmt.Errorf("unexpected authentication response: %s", string(out))
    }

    return resp, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these defined, we can update our interceptor logic like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (jwt *JWTInterceptor) UnaryClientInterceptor(ctx context.Context, method string, req interface{}, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
    // Create a new context with the token and make the first request
    authCtx := metadata.AppendToOutgoingContext(ctx, "authorization", "bearer "+jwt.token)
    err := invoker(authCtx, method, req, reply, cc, opts...)

    // If we got an unauthenticated response from the gRPC service, refresh the token
    if status.Code(err) == codes.Unauthenticated {
        if err = jwt.refreshBearerToken(); err != nil {
            return err
        }

        // Create a new context with the new token. We don't want to reuse 'authCtx' here
        // because we've already appended the invalid token to it. gRPC metadata values
        // accumulate in a slice rather than being replaced like HTTP headers, so the
        // stale token would still be sent alongside the new one.
        updatedAuthCtx := metadata.AppendToOutgoingContext(ctx, "authorization", "bearer "+jwt.token)
        err = invoker(updatedAuthCtx, method, req, reply, cc, opts...)
    }

    return err
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing an interceptor
&lt;/h3&gt;

&lt;p&gt;Now that we’ve written the interceptor, we need some tests. It can be a little tricky to assert values within a context when your packages don’t define the keys that are used. Luckily, the &lt;code&gt;google.golang.org/grpc/metadata&lt;/code&gt; package contains methods we can use to get the information we need and assert that it is what we expect. We’re going to implement our own version of the &lt;code&gt;invoker&lt;/code&gt; method that will assert the existence of the JWT in the metadata. We can then call the &lt;code&gt;JWTInterceptor.UnaryClientInterceptor&lt;/code&gt; method directly in our test, without connecting to or mocking a gRPC service.&lt;/p&gt;

&lt;p&gt;I normally write using &lt;a href="https://github.com/golang/go/wiki/TableDrivenTests"&gt;table driven tests&lt;/a&gt;, but for the sake of brevity I’ll just go through the steps you can take to pull the token out from the context and check its value.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In your custom invoker function, pull the outgoing metadata using &lt;code&gt;metadata.FromOutgoingContext(ctx)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Convert your outbound context into an inbound one using &lt;code&gt;metadata.NewIncomingContext(ctx, md)&lt;/code&gt; with the metadata from above.&lt;/li&gt;
&lt;li&gt;Extract the JWT token using &lt;code&gt;github.com/grpc-ecosystem/go-grpc-middleware/auth&lt;/code&gt; and the &lt;code&gt;AuthFromMD&lt;/code&gt; method.&lt;/li&gt;
&lt;li&gt;If the token isn’t what you expect or is blank, return &lt;code&gt;codes.Unauthenticated&lt;/code&gt; using the &lt;code&gt;google.golang.org/grpc/codes&lt;/code&gt; package.&lt;/li&gt;
&lt;li&gt;Use an HTTP mock to catch the request for a token and handle it (either using the standard library or an HTTP mocking package like &lt;a href="https://github.com/h2non/gock"&gt;gock&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Using an interceptor
&lt;/h3&gt;

&lt;p&gt;With our interceptor written, we can apply it using the &lt;code&gt;grpc.With...&lt;/code&gt; methods like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Create a new interceptor
jwt := &amp;amp;JWTInterceptor{
    // Set up all the members here
}

// NOTE: supply real transport credentials in production; WithInsecure
// is only suitable for local development.
conn, err := grpc.Dial("localhost:5000", grpc.WithInsecure(), grpc.WithUnaryInterceptor(jwt.UnaryClientInterceptor))

// Perform the rest of your client setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works the same for servers as well. When you create your server, you’ll have the option of providing unary/stream server interceptors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://grpc.io/"&gt;https://grpc.io/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grpc.io/docs/guides/concepts/#rpc-life-cycle"&gt;https://grpc.io/docs/guides/concepts/#rpc-life-cycle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/grpc-ecosystem/go-grpc-middleware/blob/master/logging/logrus/client_interceptors.go"&gt;https://github.com/grpc-ecosystem/go-grpc-middleware/blob/master/logging/logrus/client_interceptors.go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gist.github.com/gbbr/935f26e50080ae99eedc822d8c273a89#file-middleware_funcs-go"&gt;https://gist.github.com/gbbr/935f26e50080ae99eedc822d8c273a89#file-middleware_funcs-go&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://oauth.net/2/bearer-tokens/"&gt;https://oauth.net/2/bearer-tokens/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/grpc-ecosystem/go-grpc-middleware"&gt;https://github.com/grpc-ecosystem/go-grpc-middleware&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/golang/go/wiki/TableDrivenTests"&gt;https://github.com/golang/go/wiki/TableDrivenTests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/h2non/gock"&gt;https://github.com/h2non/gock&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>grpc</category>
      <category>middleware</category>
      <category>interceptors</category>
    </item>
    <item>
      <title>Go: Creating distributed systems using memberlist</title>
      <dc:creator>David Bond</dc:creator>
      <pubDate>Sun, 14 Apr 2019 00:00:00 +0000</pubDate>
      <link>https://dev.to/davidsbond/golang-creating-distributed-systems-using-memberlist-2fa9</link>
      <guid>https://dev.to/davidsbond/golang-creating-distributed-systems-using-memberlist-2fa9</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;As scaling requirements have increased steadily throughout enterprise software, the need to create distributed systems has grown with them, leading to a variety of incredibly scalable products that rely on a distributed architecture. Wikipedia describes a distributed system as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Examples of these systems range from data stores to event buses and beyond. Because there are so many applications, there are also many off-the-shelf implementations of the underlying communication protocols that allow us to easily build self-discovering, distributed systems. This post aims to go into detail on the &lt;a href="https://github.com/hashicorp/memberlist"&gt;memberlist&lt;/a&gt; package and demonstrate how you can start building a distributed system using it.&lt;/p&gt;

&lt;p&gt;I’ve used the library to create &lt;a href="https://github.com/davidsbond/sse-cluster"&gt;sse-cluster&lt;/a&gt;, a scalable &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events"&gt;Server-Sent Events&lt;/a&gt; broker. It utilises memberlist to discover new nodes and propagate events to clients spread across different nodes. It was born from a need to scale an existing SSE implementation, and it’s a half-decent reference for using the package, though I have yet to delve much into fine-tuning the configuration.&lt;/p&gt;

&lt;p&gt;So what is &lt;a href="https://github.com/hashicorp/memberlist"&gt;memberlist&lt;/a&gt;?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Memberlist is a Go library that manages cluster membership and member failure detection using a gossip-based protocol.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sounds great, but what is a gossip-based protocol?&lt;/p&gt;

&lt;p&gt;Imagine a team of developers who like to spread rumours about their coworkers. Let’s say every hour the developers congregate around the water cooler (or some equally banal office space). Each developer pairs off with another randomly and shares their new rumours with each other.&lt;/p&gt;

&lt;p&gt;At the start of the day, Chris starts a new rumour: commenting to Alex that he believes that Mick is paid twice as much as everyone else. At the next meeting, Alex tells Marc, while Chris repeats the idea to David. After each rendezvous, the number of developers who have heard the rumour doubles (except in scenarios where a rumour has already been heard via another developer and has effectively been spread twice). Distributed systems typically implement this type of protocol with a form of random “peer selection”: with a given frequency, each machine picks another machine at random and shares any hot, spicy rumours.&lt;/p&gt;

&lt;p&gt;This is a loose description of how an implementation of a gossip protocol may work. The memberlist package utilises &lt;a href="https://prakhar.me/articles/swim/"&gt;SWIM&lt;/a&gt; but has been modified to increase propagation speeds, convergence rates and general robustness in the face of processing issues (like networking delays). &lt;a href="https://www.hashicorp.com/"&gt;Hashicorp&lt;/a&gt; have released a paper on this named &lt;a href="https://arxiv.org/abs/1707.00788"&gt;Lifeguard: SWIM-ing with Situational Awareness&lt;/a&gt;, which goes into full detail on these modifications.&lt;/p&gt;

&lt;p&gt;With this package, we’re able to create a self-aware cluster of nodes that can perform whatever tasks we see fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a simple cluster
&lt;/h3&gt;

&lt;p&gt;To start, we’ll need to define our configuration. The package contains some methods for generating default configuration based on the environment you intend to run your cluster in. Here they are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/hashicorp/memberlist/blob/master/config.go#L226"&gt;DefaultLANConfig&lt;/a&gt; (Best for local networks): 

&lt;ul&gt;
&lt;li&gt;Uses the hostname as the node name&lt;/li&gt;
&lt;li&gt;Uses &lt;code&gt;7946&lt;/code&gt; as the port for gossip communication&lt;/li&gt;
&lt;li&gt;Has a 10 second TCP timeout&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/hashicorp/memberlist/blob/master/config.go#L283"&gt;DefaultLocalConfig&lt;/a&gt; (Best for loopback environments): 

&lt;ul&gt;
&lt;li&gt;Based on &lt;code&gt;DefaultLANConfig&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Has a 1 second TCP timeout&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/hashicorp/memberlist/blob/master/config.go#L267"&gt;DefaultWANConfig&lt;/a&gt; (Best for nodes on WAN environments): 

&lt;ul&gt;
&lt;li&gt;Based on &lt;code&gt;DefaultLANConfig&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Has a 30 second TCP timeout&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re going to run a 3 node cluster on a development machine, so we currently only need &lt;code&gt;DefaultLocalConfig&lt;/code&gt;. We can initialize it like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config := memberlist.DefaultLocalConfig()

list, err := memberlist.Create(config)

if err != nil {
  panic(err)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we want, we can also broadcast some custom metadata for each node in the cluster. This is useful if you want to use slightly varying configuration between nodes but still want them to communicate. This does not impact the operation of the memberlist itself, but can be used when building applications on top of it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node := list.LocalNode()

// You can provide a byte representation of any metadata here. You can broadcast the
// config for each node in some serialized format like JSON. By default, this is
// limited to 512 bytes, so may not be suitable for large amounts of data.
node.Meta = []byte("some metadata")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gets us as far as running a single node cluster. In order to join an existing cluster, we can use the &lt;code&gt;list.Join()&lt;/code&gt; method to connect to one or more existing nodes. We can extend the example above to connect to an existing cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Create an array of nodes we can join. If you're using a loopback
// environment you'll need to make sure each node is using its own
// port. This can be set with the configuration's BindPort field.
nodes := []string{
  "0.0.0.0:7946",
}

if _, err := list.Join(nodes); err != nil {
  panic(err)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here, we’ve successfully configured the client and joined an existing cluster. The package will output some logs so you can see the nodes syncing with each other as well as any errors they run into. On top of this, we need to gracefully leave the memberlist once we’re done. If we don’t handle a graceful exit, the other nodes in the cluster will treat it as a dead node, rather than one that has left.&lt;/p&gt;

&lt;p&gt;To do this, we need to listen for a signal to exit the application, catch it and leave the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Create a channel to listen for exit signals
stop := make(chan os.Signal, 1)

// Register the signals we want to be notified of. These three indicate
// exit signals, similar to CTRL+C.
signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP)

&amp;lt;-stop

// Leave the cluster with a 5 second timeout. If leaving takes more than 5
// seconds we return.
if err := list.Leave(time.Second * 5); err != nil {
  panic(err)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Communication between members
&lt;/h3&gt;

&lt;p&gt;Now that we can join and leave the cluster, we can use the member list to perform distributed operations.&lt;/p&gt;

&lt;p&gt;Let’s create a simple messaging system. We could take a message via HTTP on a single node and propagate it to the next node in the cluster. This gives us an eventually consistent system that could be adapted into some sort of event bus.&lt;/p&gt;

&lt;p&gt;This is by no means an optimal solution but demonstrates the power of service discovery in a clustered environment.&lt;/p&gt;

&lt;p&gt;Let’s start with a node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type (
  // The Node type represents a single node in the cluster, it contains
  // the list of other members in the cluster and an HTTP client for
  // directly messaging other nodes.
  Node struct {
    memberlist *memberlist.Memberlist
    http *http.Client
  }
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Imagine this node receives a message from an HTTP handler that just takes the entire request body and forwards it to another node. We can implement a method that will iterate over members in the list and attempt to forward a message. Once the message has been successfully forwarded to a single node, it stops handling it. This means we have eventual consistency where &lt;strong&gt;eventually&lt;/strong&gt; all nodes receive all messages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (n *Node) HandleMessage(msg []byte) {
  // Iterate over all members in the cluster
  for _, member := range n.memberlist.Members() {
    // We also need to make sure we don't send the message to the node
    // currently processing it
    if member.Name == n.memberlist.LocalNode().Name {
      continue
    }

    // Memberlist gives us the IP address of every member. In this example,
    // they all handle HTTP traffic on port 8080. You can also provide custom
    // metadata for your node to provide interoperability between nodes with
    // varying configurations.
    url := fmt.Sprintf("http://%s:8080/publish", member.Addr)
    resp, err := n.http.Post(url, "application/json", bytes.NewBuffer(msg))

    if err != nil {
      // handle error and try next node
      continue
    }

    if resp.StatusCode != http.StatusOK {
      // handle unexpected status code and try next node
      continue
    }

    // Otherwise, we've forwarded the message and can do
    // something else.
    break
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hopefully, this post has outlined how you can use the &lt;code&gt;memberlist&lt;/code&gt; package to implement a clustered application. The library is very powerful and allows you to focus on the actual logic your cluster depends on rather than the underlying network infrastructure. In my experience, the time taken for members to synchronise is negligible, but you should keep in mind that the protocol is eventually consistent.&lt;/p&gt;

&lt;p&gt;In the example above, we can’t guarantee that our message will be propagated to every single node if there is a lot of traffic in terms of nodes joining/leaving. Ideally, new members should join in a controlled manner and only when necessary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Distributed_computing"&gt;https://en.wikipedia.org/wiki/Distributed_computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/hashicorp/memberlist"&gt;https://github.com/hashicorp/memberlist&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/davidsbond/sse-cluster"&gt;https://github.com/davidsbond/sse-cluster&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events"&gt;https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://prakhar.me/articles/swim/"&gt;https://prakhar.me/articles/swim/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.hashicorp.com/"&gt;https://www.hashicorp.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1707.00788"&gt;https://arxiv.org/abs/1707.00788&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>hashicorp</category>
      <category>memberlist</category>
      <category>clusters</category>
    </item>
    <item>
      <title>Go: Reverse engineering an AKAI MPD26 using gousb</title>
      <dc:creator>David Bond</dc:creator>
      <pubDate>Fri, 14 Dec 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/davidsbond/golang-reverse-engineering-an-akai-mpd26-using-gousb-3b49</link>
      <guid>https://dev.to/davidsbond/golang-reverse-engineering-an-akai-mpd26-using-gousb-3b49</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;The other day, I discovered &lt;a href="https://github.com/google/gousb"&gt;Google’s gousb&lt;/a&gt; package, a low-level interface for interacting with USB devices in Golang. At the time of writing, it’s fairly one-of-a-kind; I haven’t seen many Golang packages attempt to tackle interfacing with USB devices, and I was keen to give it a try.&lt;/p&gt;

&lt;p&gt;I perused the pile of dead tech sitting around my flat. After some solid thought, I decided to reverse engineer an old &lt;a href="http://www.akaipro.com/products/legacy/mpd-26"&gt;AKAI MPD26&lt;/a&gt; sampler. These things were a super popular choice back when they were first released. Nowadays, there are far fancier samplers available with much more feature-rich interfaces. Unfortunately, I never really got deep into creating electronic music or good at using a sampler, so this seemed like a way to make it a worthwhile purchase.&lt;/p&gt;

&lt;p&gt;To start, let’s examine the different parts of the sampler we want to be able to read from. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 faders: used for things like managing the volume of various channels. You would assign these to something in your DAW that they can manipulate.&lt;/li&gt;
&lt;li&gt;6 knobs: geared more towards manipulating automation applied to audio tracks, but these could easily be used like a fader and vice versa.&lt;/li&gt;
&lt;li&gt;16 pressure-sensitive pads: used to trigger the sounds you want to hear.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a fairly simple setup. There are a lot more buttons and knobs that modify the output of the aforementioned controls. For example, a ‘note repeat’ button which will cause pads to keep triggering if pressure is maintained on them.&lt;/p&gt;

&lt;p&gt;I decided to set out some goals for how I’d like my interface to the sampler to work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I want to implement it in Golang&lt;/li&gt;
&lt;li&gt;It should provide a way to read values from individual aspects of the sampler using channels&lt;/li&gt;
&lt;li&gt;It should abstract away as much of the nastiness of interfacing with USB devices as possible&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Connecting to the USB interface
&lt;/h3&gt;

&lt;p&gt;In all honesty, I had never done any programming work related to USB devices before, so I didn’t really know what I was getting myself into. Luckily, the gousb library provides a really simple interface. However, it requires some background reading on how connections with USB devices work.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://godoc.org/github.com/google/gousb"&gt;godoc page&lt;/a&gt; for the library has a pretty good explanation of how it works under the hood. I wish I’d read it first before trying to bruteforce my way in.&lt;/p&gt;

&lt;h4&gt;
  
  
  Figuring out which USB device to use
&lt;/h4&gt;

&lt;p&gt;The first challenge is figuring out which of the USB ports on the host machine is actually connected to the sampler. To do this, we need to know the product and vendor identifiers for the USB device.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://electronics.stackexchange.com/questions/80815/what-is-a-product-id-in-usb-and-do-i-need-to-buy-it-for-my-project"&gt;This question&lt;/a&gt; on stack overflow has a good explanation of what these identifiers are:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Vendor ID or VID is a 16-bit number which you have to buy from the USB Foundation. If you want to make USB device (and fully play by the rules) the VID identifies your organisation.&lt;/p&gt;

&lt;p&gt;The Product ID or PID is also a 16-bit number but is under your control. When you purchase a VID you have the right to use that with every possible PID so this gives you 65536 possible VID:PID combinations. The intention is that a VID:PID combination should uniquely identify a particular product globally.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AKAI MPD26 will already have a product and vendor identifier, so how do we find those? It’s actually fairly simple if you use the &lt;code&gt;lsusb&lt;/code&gt; command on UNIX systems. After plugging in the device, I was able to locate it pretty easily.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; lsusb -v
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using this command, I was able to determine the product and vendor identifiers: &lt;code&gt;0x0078&lt;/code&gt; and &lt;code&gt;0x09e8&lt;/code&gt;. With these, we can call the &lt;code&gt;gousb.Context.OpenDevices()&lt;/code&gt; method. This method takes an argument of &lt;code&gt;func(desc *gousb.DeviceDesc) bool&lt;/code&gt;. For each connected USB device, the provided function is executed and should return &lt;code&gt;true&lt;/code&gt; if we’ve found a device we’re interested in accessing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const (
  product = 0x0078
  vendor = 0x09e8
)

func example() {
  ctx := gousb.NewContext()
  devices, _ := ctx.OpenDevices(findMPD26(product, vendor))

  // Do something with the device.
}

func findMPD26(product, vendor uint16) func(desc *gousb.DeviceDesc) bool {
  return func(desc *gousb.DeviceDesc) bool {
    return desc.Product == gousb.ID(product) &amp;amp;&amp;amp; desc.Vendor == gousb.ID(vendor)
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using this code, we get back an array of devices with one element, the sampler!&lt;/p&gt;

&lt;h3&gt;
  
  
  Reading from the USB device
&lt;/h3&gt;

&lt;p&gt;When dealing with a USB device, we need to obtain three things: a configuration, an interface and an endpoint.&lt;/p&gt;

&lt;p&gt;The library defines USB configuration as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A config descriptor determines the list of available USB interfaces on the device.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Interfaces are defined too:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Each interface is a virtual device within the physical USB device and its active config. There can be many interfaces active concurrently. Interfaces are enumerated sequentially starting from zero.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And finally, endpoints:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An endpoint can be considered similar to a UDP/IP port, except the data transfers are unidirectional.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What we’re after is that endpoint, that is where we will be able to read data from the device and react to it. To get it, we need to figure out the correct configuration, obtain the interface and then the endpoint.&lt;/p&gt;

&lt;p&gt;My first attempt at connecting to the USB device failed for a couple of reasons. I tried to use some of the convenience methods available in the &lt;code&gt;gousb&lt;/code&gt; library, mainly the &lt;code&gt;DefaultInterface&lt;/code&gt; and &lt;code&gt;ActiveConfigNum&lt;/code&gt; methods.&lt;/p&gt;

&lt;p&gt;Here’s the documentation for &lt;code&gt;DefaultInterface&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;DefaultInterface opens interface #0 with alternate setting #0 of the currently active config. It’s intended as a shortcut for devices that have the simplest interface of a single config, interface and alternate setting. The done func should be called to release the claimed interface and config.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And &lt;code&gt;ActiveConfigNum&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ActiveConfigNum returns the config id of the active configuration. The value corresponds to the ConfigInfo.Config field of one of the ConfigInfos of this Device.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;DefaultInterface&lt;/code&gt; should allow you to skip finding an appropriate configuration so you can just get straight to your desired endpoint. I’m not sure if it’s something to do with my machine, or the device itself, but this would return an error for me each time. I had the same issue with the &lt;code&gt;ActiveConfigNum&lt;/code&gt; method.&lt;/p&gt;

&lt;p&gt;When trying to connect to the device, I’d get the following error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;libusb: device or resource busy [code -6]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is because the kernel has already assigned a driver to the USB device. In this case, &lt;code&gt;pulseaudio&lt;/code&gt; was claiming the USB device as soon as it was plugged in, since it’s an audio interface. I was able to debug this using the &lt;code&gt;journalctl&lt;/code&gt; command while reconnecting the USB device.&lt;/p&gt;

&lt;p&gt;This command is used to view &lt;code&gt;systemd&lt;/code&gt; logs and should let us know what is happening to our USB device whenever it is plugged in. Using the &lt;code&gt;-f&lt;/code&gt; flag allows us to read the most recent logs in real time. From this, I found that the &lt;code&gt;pulseaudio&lt;/code&gt; driver would claim the device as soon as it was plugged in, so we can’t use it!&lt;/p&gt;

&lt;p&gt;The fix is nice and easy, the &lt;code&gt;gousb&lt;/code&gt; library provides a method on the &lt;code&gt;Device&lt;/code&gt; type called &lt;code&gt;SetAutoDetach&lt;/code&gt; that will take the device away from &lt;code&gt;pulseaudio&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;SetAutoDetach enables/disables automatic kernel driver detachment. When autodetach is enabled gousb will automatically detach the kernel driver on the interface and reattach it when releasing the interface. Automatic kernel driver detachment is disabled on newly opened device handles by default.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const (
  product = 0x0078
  vendor = 0x09e8
)

func example() {
  ctx := gousb.NewContext()
  devices, _ := ctx.OpenDevices(findMPD26(product, vendor))

  // Detach the device from whichever process already
  // has it.
  devices[0].SetAutoDetach(true)
}

func findMPD26(product, vendor uint16) func(desc *gousb.DeviceDesc) bool {
  return func(desc *gousb.DeviceDesc) bool {
    return desc.Product == gousb.ID(product) &amp;amp;&amp;amp; desc.Vendor == gousb.ID(vendor)
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next issue I faced was in the &lt;code&gt;ActiveConfigNum&lt;/code&gt; and &lt;code&gt;DefaultInterface&lt;/code&gt; methods. The configuration that the USB device was using would not allow me to use these methods. This means we have to make our own decisions on which config and interface to use.&lt;/p&gt;

&lt;p&gt;To work around this, I decided to manually loop through configurations, then available interfaces. Once we get an interface we can use, we find the &lt;code&gt;IN&lt;/code&gt; endpoint we can read from.&lt;/p&gt;

&lt;p&gt;This code is a little ugly, and I’ve excluded the error handling for brevity. I’m sure there’s a nicer way of doing this, but for the sake of learning it serves its purpose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Iterate through configurations
for num := range devices[0].Desc.Configs {
  config, _ := devices[0].Config(num)

  // In a scenario where we have an error, we can continue
  // to the next config. The same is true for interfaces and
  // endpoints. Note that these defers only run when the
  // surrounding function returns, not on each iteration.
  defer config.Close()

  // Iterate through available interfaces for this configuration
  for _, desc := range config.Desc.Interfaces {
    intf, _ := config.Interface(desc.Number, 0)

    // Iterate through endpoints available for this interface.
    for _, endpointDesc := range intf.Setting.Endpoints {
      // We only want to read, so we're looking for IN endpoints.
      if endpointDesc.Direction == gousb.EndpointDirectionIn {
        endpoint, _ := intf.InEndpoint(endpointDesc.Number)

        // When we get here, we have an endpoint where we can
        // read data from the USB device
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To stitch this all together we need a type that can hold all the contextual information about the USB device we’re interacting with. This is the aptly named &lt;code&gt;MPD26&lt;/code&gt; type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type MPD26 struct {
  // Fields for interacting with the USB connection
  context *gousb.Context
  device *gousb.Device
  intf *gousb.Interface
  endpoint *gousb.InEndpoint

  // Additional fields we'll get to later
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What we need now is a method that will constantly read from the endpoint and write values to channels. I’ve created an unexported method named &lt;code&gt;read&lt;/code&gt; that runs an infinite loop in its own goroutine once the connection to the USB device is successful. Once again, error handling is omitted for brevity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (mpd *MPD26) read(interval time.Duration, maxSize int) {
  ticker := time.NewTicker(interval)
  defer ticker.Stop()

  for {
    select {
    case &amp;lt;-ticker.C:
      buff := make([]byte, maxSize)
      n, _ := mpd.endpoint.Read(buff)

      data := buff[:n]
      // Do something with this data
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll notice this method takes in two parameters, &lt;code&gt;interval&lt;/code&gt; and &lt;code&gt;maxSize&lt;/code&gt;. The &lt;code&gt;interval&lt;/code&gt; parameter determines how often we should attempt to read data from the USB device. It’s important to note that calling the &lt;code&gt;mpd.endpoint.Read&lt;/code&gt; method blocks further execution if there’s no data to read, so using this interval just ensures we don’t read too often from the device. The &lt;code&gt;maxSize&lt;/code&gt; parameter determines the maximum size of the buffer we should use when reading data. Both of these values can be obtained from the device configuration we looked at earlier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mpd := &amp;amp;MPD26{
  context: ctx,
  device: devices[0],
  intf: intf,
  endpoint: endpoint,
}

// The endpoint description defines the poll interval and max packet
// size.
go mpd.read(endpointDesc.PollInterval, endpointDesc.MaxPacketSize)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To start with, let’s just print the contents of the byte array to &lt;code&gt;stdout&lt;/code&gt; so that we can see how the values differ based on the controls we’re using. Below are some samples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[11, 176, 1, 127] # Output when moving the first fader
[11, 176, 11, 127] # Output when moving the first knob
[9, 144, 36, 127] # Output when triggering a pad
[8, 144, 26, 0] # Output when releasing a pad
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Reverse engineering serial data
&lt;/h3&gt;

&lt;p&gt;We’re going to use the output we get reading the raw USB data to make some assumptions about which values mean what. Luckily, the values we’re getting are MIDI, so anything varying between 0 and 127 is usually a good candidate for the value of the control you’re looking at. Based on the console output, it seems that the last byte in the array is always the MIDI value of the control.&lt;/p&gt;

&lt;p&gt;This means the first 3 bytes should indicate the control we’re using. I’ve yet to figure out what every byte in the array represents, but there are consistent values for certain controls, so we can use these to update the respective state of a control in the library.&lt;/p&gt;

&lt;h4&gt;
  
  
  Faders &amp;amp; Knobs
&lt;/h4&gt;

&lt;p&gt;The faders and knobs were the easiest controls to get working. They only have a number to identify them and a value between 0 and 127. After playing with all of them, the first two bytes are consistently &lt;code&gt;[11, 176]&lt;/code&gt;. We can use this information to create a method to identify if a message is for the value of a control:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func isControl(data []byte) bool {
  // Knobs and faders all share the same two bytes in common, first and second
  // are always 11 and 176
  return data[0] == 11 &amp;amp;&amp;amp; data[1] == 176
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Easy enough. The next challenge is to determine if we’re handling the change of a knob or a fader. This can be determined using the third byte in the array, which contains values from 1 to 6 for faders and 11 to 16 for the knobs. Using these, we can create two new helper methods to identify the types of control we’re getting a message for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func isFader(data []byte) bool {
  // A fader is a control where the value of the third byte is always
  // 1 to 6
  return isControl(data) &amp;amp;&amp;amp; data[2] &amp;gt;= 1 &amp;amp;&amp;amp; data[2] &amp;lt;= 6
}

func isKnob(data []byte) bool {
  // A knob is a control where the value of the third byte is always
  // 11 to 16
  return isControl(data) &amp;amp;&amp;amp; data[2] &amp;gt;= 11 &amp;amp;&amp;amp; data[2] &amp;lt;= 16
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Pads
&lt;/h4&gt;

&lt;p&gt;The pads have a little more logic to them, but work the same way. The first byte determines whether the pad has been pressed or released, the second byte is always 144 and the third byte is a number between 36 and 51 that identifies the unique pad being pressed/released. Here’s our method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func isPad(data []byte) bool {
 return (data[0] == 9 || data[0] == 8) &amp;amp;&amp;amp; data[1] == 144 &amp;amp;&amp;amp; (data[2] &amp;gt;= 36 &amp;amp;&amp;amp; data[2] &amp;lt;= 51)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
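&lt;p&gt;Putting the three helpers together, we can sanity-check them against the raw samples captured earlier. Here’s a self-contained sketch; the &lt;code&gt;classify&lt;/code&gt; wrapper is mine, not part of the library:&lt;/p&gt;

```go
package main

import "fmt"

// The helpers below mirror the ones defined in the post.
func isControl(data []byte) bool {
	return data[0] == 11 && data[1] == 176
}

func isFader(data []byte) bool {
	return isControl(data) && data[2] >= 1 && data[2] <= 6
}

func isKnob(data []byte) bool {
	return isControl(data) && data[2] >= 11 && data[2] <= 16
}

func isPad(data []byte) bool {
	return (data[0] == 9 || data[0] == 8) && data[1] == 144 && data[2] >= 36 && data[2] <= 51
}

// classify maps a raw 4-byte message to a control type.
func classify(data []byte) string {
	switch {
	case isFader(data):
		return "fader"
	case isKnob(data):
		return "knob"
	case isPad(data):
		return "pad"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(classify([]byte{11, 176, 1, 127}))  // fader: moving the first fader
	fmt.Println(classify([]byte{11, 176, 11, 127})) // knob: moving the first knob
	fmt.Println(classify([]byte{9, 144, 36, 127}))  // pad: triggering a pad
}
```

&lt;p&gt;Running this prints &lt;code&gt;fader&lt;/code&gt;, &lt;code&gt;knob&lt;/code&gt; and &lt;code&gt;pad&lt;/code&gt; for the three samples.&lt;/p&gt;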



&lt;h3&gt;
  
  
  Creating the Golang API
&lt;/h3&gt;

&lt;p&gt;Now we need to expose this data in a nice way so that people can build things in Go using an MPD26. Earlier we saw code for reading the serial data, but we need a way to get that data out in a format that would make sense to someone looking directly at the sampler. We also want things to work asynchronously: waiting to read from a pad shouldn’t block a read from a fader.&lt;/p&gt;

&lt;p&gt;For the asynchronous output, we’re going to use channels. I’ve added the following fields to the &lt;code&gt;MPD26&lt;/code&gt; type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Channels for various components
faders map[int]chan int
knobs map[int]chan int
pads map[int]chan int
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’ve also updated the &lt;code&gt;read&lt;/code&gt; method to make a call to a &lt;code&gt;parseMessage&lt;/code&gt; method that classifies the type of input and writes to the correct channel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (mpd *MPD26) parseMessage(msg []byte) {
 defer mpd.waitGroup.Done()

 // Discard invalid messages.
 if len(msg) &amp;lt; 4 {
  return
 }

 mpd.waitGroup.Add(1)

 if isFader(msg) {
  go mpd.handleFader(msg)
  return
 }

 if isKnob(msg) {
  go mpd.handleKnob(msg)
  return
 }

 if isPad(msg) {
  go mpd.handlePad(msg)
 }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, we now have three more methods for handling each kind of input: &lt;code&gt;handlePad&lt;/code&gt;, &lt;code&gt;handleKnob&lt;/code&gt; and &lt;code&gt;handleFader&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (mpd *MPD26) handlePad(data []byte) {
 defer mpd.waitGroup.Done()

 num := int(data[2]) - 35
 val := int(data[3])

 channel, ok := mpd.pads[num]

 if !ok {
  return
 }

 channel &amp;lt;- val
}

func (mpd *MPD26) handleKnob(data []byte) {
 defer mpd.waitGroup.Done()

 var num int
 val := int(data[3])

 switch data[2] {
 case 12:
  num = 6
 case 11:
  num = 5
 case 14:
  num = 4
 case 13:
  num = 3
 case 16:
  num = 2
 case 15:
  num = 1
 default:
  return
 }

 // Check if there's a channel already listening
 // to this knob, if so, write to it. Otherwise
 // ignore the message.
 channel, ok := mpd.knobs[num]

 if !ok {
  return
 }

 channel &amp;lt;- val
}

func (mpd *MPD26) handleFader(data []byte) {
 defer mpd.waitGroup.Done()

 num := int(data[2])
 val := int(data[3])

 // Check if there's a channel already listening
 // to this fader, if so, write to it. Otherwise
 // ignore the message.
 channel, ok := mpd.faders[num]

 if !ok {
  return
 }

 channel &amp;lt;- val
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we just need some exported functions on the &lt;code&gt;MPD26&lt;/code&gt; type that someone can use to get the pad/fader/knob they want to read from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (mpd *MPD26) Fader(id int) &amp;lt;-chan int {
    channel, ok := mpd.faders[id]

    if !ok {
        channel = make(chan int)
        mpd.faders[id] = channel
    }

    return channel
}

func (mpd *MPD26) Pad(id int) &amp;lt;-chan int {
    channel, ok := mpd.pads[id]

    if !ok {
        channel = make(chan int)
        mpd.pads[id] = channel
    }

    return channel
}

func (mpd *MPD26) Knob(id int) &amp;lt;-chan int {
    channel, ok := mpd.knobs[id]

    if !ok {
        channel = make(chan int)
        mpd.knobs[id] = channel
    }

    return channel
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With all these in place, we can now connect and read from the sampler. In future, I’d like to hook this up to an audio library like &lt;a href="https://github.com/faiface/beep"&gt;beep&lt;/a&gt; in order to get some actual output. But for now, we’ve got a working interface with the sampler!&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/google/gousb"&gt;https://github.com/google/gousb&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.akaipro.com/products/legacy/mpd-26"&gt;http://www.akaipro.com/products/legacy/mpd-26&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://electronics.stackexchange.com/questions/80815/what-is-a-product-id-in-usb-and-do-i-need-to-buy-it-for-my-project"&gt;https://electronics.stackexchange.com/questions/80815/what-is-a-product-id-in-usb-and-do-i-need-to-buy-it-for-my-project&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://godoc.org/github.com/google/gousb"&gt;https://godoc.org/github.com/google/gousb&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/faiface/beep"&gt;https://github.com/faiface/beep&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>usb</category>
      <category>akai</category>
      <category>mpd26</category>
    </item>
    <item>
      <title>Go: Implementing kafka consumers using sarama-cluster</title>
      <dc:creator>David Bond</dc:creator>
      <pubDate>Wed, 22 Aug 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/davidsbond/golang-implementing-kafka-consumers-using-sarama-cluster-4fko</link>
      <guid>https://dev.to/davidsbond/golang-implementing-kafka-consumers-using-sarama-cluster-4fko</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Nowadays it seems as though more and more companies are using event-based architectures to provide communication between services across various domains. &lt;a href="http://confluent.io/"&gt;Confluent&lt;/a&gt; maintain a &lt;a href="https://cwiki.apache.org/confluence/display/KAFKA/Powered+By"&gt;huge list&lt;/a&gt; of companies actively using &lt;a href="https://kafka.apache.org/"&gt;Apache Kafka&lt;/a&gt;, a high performance messaging system and the subject of this post.&lt;/p&gt;

&lt;p&gt;Kafka has been so heavily adopted in part due to its high performance and the large number of client libraries available in a multitude of languages.&lt;/p&gt;

&lt;p&gt;The concept is fairly simple: clients either produce or consume events that are categorised under “topics”. For example, a company like LinkedIn may produce an event against a &lt;code&gt;user_created&lt;/code&gt; topic after a successful sign-up, allowing multiple services to asynchronously react and perform respective processing regarding that user. One service might handle sending me a welcome email, whereas another will attempt to identify other users I may want to connect with.&lt;/p&gt;

&lt;p&gt;Kafka events are divided into “partitions”. These are parallel event streams that allow multiple consumers to process events from the same topic. Every event contains what is called an “offset”, a number that represents where an event resides in the sequence of all events in a partition. Imagine all events for a topic partition are stored as an array; the offset would be the index where a particular event is located in time. This allows consumers to specify a starting point from which to consume events, letting them avoid reprocessing events they have already handled, or deliberately replay events produced earlier in time.&lt;/p&gt;
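&lt;p&gt;The array analogy can be sketched in a few lines of Go. This is a toy model of a single partition, not the Kafka client API:&lt;/p&gt;

```go
package main

import "fmt"

// eventsFrom returns the events a consumer would read when resuming
// from a committed offset: everything at that index and after.
func eventsFrom(partition []string, committed int) []string {
	return partition[committed:]
}

func main() {
	// A toy model of a single topic partition: an append-only log in
	// which an event's offset is simply its index in the array.
	partition := []string{"user_created:a", "user_created:b", "user_created:c"}

	// A consumer that committed offset 1 resumes there, so it never
	// re-reads offset 0 and never skips offsets 1 and 2.
	for i, event := range eventsFrom(partition, 1) {
		fmt.Printf("offset %d: %s\n", 1+i, event)
	}
}
```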

&lt;p&gt;Consumers can then form “groups”, where each consumer reads one or more unique partitions to spread the consumption of a topic across multiple consumers. This is especially useful when running replicated services and can increase event throughput.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing a Kafka consumer
&lt;/h3&gt;

&lt;p&gt;There aren’t a huge number of viable options when it comes to implementing a Kafka consumer in Go. This tutorial focuses on &lt;a href="https://github.com/bsm/sarama-cluster"&gt;sarama-cluster&lt;/a&gt;, a balanced consumer implementation built on top of the existing &lt;a href="https://github.com/shopify/sarama"&gt;sarama&lt;/a&gt; client library by &lt;a href="https://www.shopify.com"&gt;Shopify&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The library has a concise API that makes getting started fairly simple. The first step is to define our consumer configuration. We can use the &lt;code&gt;NewConfig&lt;/code&gt; method, which creates a default configuration with some sensible starting values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Create a configuration with some sane default values
config := cluster.NewConfig()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Authentication
&lt;/h4&gt;

&lt;p&gt;If you’re sensible, the Kafka instance you’re connecting to will have some form of authentication. The &lt;code&gt;sarama-cluster&lt;/code&gt; library supports both TLS and SASL authentication methods.&lt;/p&gt;

&lt;p&gt;If you’re using TLS certificates, you can populate the &lt;code&gt;config.TLS&lt;/code&gt; struct field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config := cluster.NewConfig()

// Load an X509 certificate pair like you would for any other TLS
// configuration
cert, err := tls.LoadX509KeyPair("cert.pem", "cert.key")

if err != nil {
  panic(err)
}

ca, err := ioutil.ReadFile("ca.pem")

if err != nil {
  panic(err)
}

pool := x509.NewCertPool()
pool.AppendCertsFromPEM(ca)

// Avoid naming the variable "tls", which would shadow the tls package.
tlsConfig := &amp;amp;tls.Config{
  Certificates: []tls.Certificate{cert},
  RootCAs: pool,
}

config.Net.TLS.Enable = true
config.Net.TLS.Config = tlsConfig

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s important to note that if you’re running your consumer within a Docker image, you’ll need to install &lt;code&gt;ca-certificates&lt;/code&gt; in order to create an X.509 certificate pool. In a Dockerfile based on Alpine, this looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM alpine

RUN apk add --update ca-certificates

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, if you’re using SASL for authentication, you can populate the &lt;code&gt;config.SASL&lt;/code&gt; struct field like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config := cluster.NewConfig()

// Set your SASL username and password
config.SASL.User = "username"
config.SASL.Password = "password"

// Enable SASL
config.SASL.Enable = true

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Implementing the consumer
&lt;/h4&gt;

&lt;p&gt;Now that we’ve created a configuration with our authentication method of choice, we can create a consumer that will allow us to handle events for specified topics. You’re going to need to know the addresses of your Kafka brokers, the name of your consumer group and each topic you wish to consume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;consumer, err := cluster.NewConsumer(
  []string{"broker-address-1", "broker-address-2"},
  "group-id",
  []string{"topic-1", "topic-2", "topic-3"},
  config)

if err != nil {
  panic(err)
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;sarama-cluster&lt;/code&gt; library allows you to specify a consumer mode within the config. It’s important to understand the difference, as your implementation will differ based on what you’ve chosen. The mode is set via the &lt;code&gt;config.Group.Mode&lt;/code&gt; struct field and has two options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ConsumerModeMultiplex&lt;/code&gt; - By default, messages and errors from the subscribed topics and partitions are all multiplexed and made available through the consumer’s &lt;code&gt;Messages()&lt;/code&gt; and &lt;code&gt;Errors()&lt;/code&gt; channels.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ConsumerModePartitions&lt;/code&gt; - Users who require low-level access can enable &lt;code&gt;ConsumerModePartitions&lt;/code&gt; where individual partitions are exposed on the &lt;code&gt;Partitions()&lt;/code&gt; channel. Messages and errors must then be consumed on the partitions themselves.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When using &lt;code&gt;ConsumerModeMultiplex&lt;/code&gt;, all messages come from a single channel exposed via the &lt;code&gt;Messages()&lt;/code&gt; method. Reading these messages looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// The loop will iterate each time a message is written to the underlying channel
for msg := range consumer.Messages() {
  // Now we can access the individual fields of the message and react
  // based on msg.Topic
  switch msg.Topic {
  case "topic-1":
    handleTopic1(msg.Value)
    // ...
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want a more low-level implementation where you can react to partition changes yourself, you’re going to want to use &lt;code&gt;ConsumerModePartitions&lt;/code&gt;. This provides you the individual partitions via the &lt;code&gt;consumer.Partitions()&lt;/code&gt; method. This exposes an underlying channel that partitions are written to when the consumer group rebalances. You can then use each partition to read messages and errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Every time the consumer is balanced, we'll get a new partition to read from
for partition := range consumer.Partitions() {
  // From here, we know exactly which topic we're consuming via partition.Topic(). So won't need any
  // branching logic based on the topic.
  for msg := range consumer.Messages() {
    // Now we can access the individual fields of the message
    handleTopic1(msg.Value)   
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;ConsumerModePartitions&lt;/code&gt; way of doing things requires more oversight in your consumer code. For one, you’re going to need to gracefully handle the partition channel closing during a rebalance, which will occur when adding new consumers to the group. You’re also going to need to manually call the &lt;code&gt;partition.Close()&lt;/code&gt; method when you’re done consuming.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling errors &amp;amp; rebalances
&lt;/h3&gt;

&lt;p&gt;Should you add more consumers to the group, the existing ones will experience a rebalance. This is where the assignment of partitions to each consumer changes for an optimal spread across consumers. The &lt;code&gt;consumer&lt;/code&gt; instance we’ve created already exposes a &lt;code&gt;Notifications()&lt;/code&gt; channel from which we can log/react to these changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  for notification := range consumer.Notifications() {
    // The type of notification we've received, will be
    // rebalance start, rebalance ok or error
    fmt.Println(notification.Type)

    // The topic/partitions that are currently read by the consumer
    fmt.Println(notification.Current)

    // The topic/partitions that were claimed in the last rebalance
    fmt.Println(notification.Claimed)

    // The topic/partitions that were released in the last rebalance
    fmt.Println(notification.Released)
  }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Errors are just as easy to read and are made available via the &lt;code&gt;consumer.Errors()&lt;/code&gt; channel, which yields standard &lt;code&gt;error&lt;/code&gt; values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  for err := range consumer.Errors() {
    // React to the error
  }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In order to enable the reading of notifications and errors, we need to make some small changes to our configuration, like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config.Consumer.Return.Errors = true
config.Group.Return.Notifications = true

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Committing offsets
&lt;/h3&gt;

&lt;p&gt;The last step in implementing the consumer is to commit our offsets. In short, we’re telling Kafka that we have finished processing a message and do not want to consume it again. This should be done once you no longer require the message data for any processing. If you commit offsets too early, you may lose the ability to easily reconsume an event when something goes wrong. Let’s say you’re writing the event contents straight to a database: don’t commit offsets before you’ve written the contents of the event to your database successfully. That way, should the database operation fail, you can just reconsume the event to try again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// The loop will iterate each time a message is written to the underlying channel
for msg := range consumer.Messages() {
  // Now we can access the individual fields of the message and react
  // based on msg.Topic
  switch msg.Topic {
  case "topic-1":
    // Do everything we need for this topic
    handleTopic1(msg.Value)

    // Mark the message as processed. The sarama-cluster library will
    // automatically commit marked offsets in the background. You can
    // commit them manually using consumer.CommitOffsets().
    consumer.MarkOffset(msg, "")
    // ...
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is everything you need in order to implement a simple Kafka consumer group. The &lt;code&gt;sarama-cluster&lt;/code&gt; library provides a lot more configuration options to suit your needs based on how you maintain your Kafka brokers. I’d recommend browsing through all the config values yourself to determine if you need to tweak any.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://confluent.io/"&gt;http://confluent.io/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cwiki.apache.org/confluence/display/KAFKA/Powered+By"&gt;https://cwiki.apache.org/confluence/display/KAFKA/Powered+By&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kafka.apache.org/"&gt;https://kafka.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/bsm/sarama-cluster"&gt;https://github.com/bsm/sarama-cluster&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/shopify/sarama"&gt;https://github.com/shopify/sarama&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.shopify.com"&gt;https://www.shopify.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>kafka</category>
      <category>sarama</category>
      <category>cluster</category>
    </item>
    <item>
      <title>Go: Debugging memory leaks using pprof</title>
      <dc:creator>David Bond</dc:creator>
      <pubDate>Wed, 08 Aug 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/davidsbond/golang-debugging-memory-leaks-using-pprof-5di8</link>
      <guid>https://dev.to/davidsbond/golang-debugging-memory-leaks-using-pprof-5di8</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I work as a software engineer at &lt;a href="https://www.ovoenergy.com/" rel="noopener noreferrer"&gt;OVO Energy&lt;/a&gt; where my team are implementing the CRM solution used by customer services. We’re currently building a new set of microservices to replace the existing services. One of our microservices is responsible for migrating data from the old system into the new one.&lt;/p&gt;

&lt;p&gt;A few days after deploying a new version of the service, I opened the relevant monitoring dashboard and saw this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdavidsbond.github.io%2Fassets%2F2018-08-08-debugging-memory-leaks-using-pprof%2F1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdavidsbond.github.io%2Fassets%2F2018-08-08-debugging-memory-leaks-using-pprof%2F1.png" alt="Memory usage graph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;According to this graph, we have a memory leak somewhere. This is most likely due to an issue with the management of goroutines within the service. However, the service relies heavily on concurrency, so finding where the leak is might not be so easy. Luckily, goroutines are lightweight, allowing a reasonable amount of time to figure out where the leak is before it becomes a real/expensive problem. The two spikes at the 12pm marks are times when migrations occurred.&lt;/p&gt;

&lt;h3&gt;
  
  
  Background
&lt;/h3&gt;

&lt;p&gt;Over the course of a few weeks I designed and implemented the service and hosted it in our &lt;a href="https://kubernetes.io" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; cluster on &lt;a href="https://cloud.google.com/" rel="noopener noreferrer"&gt;GCP&lt;/a&gt;, ensuring that I added monitoring functionality in order to make it ready for production. This included an HTTP endpoint for health checks, log-based metrics and uptime checks using &lt;a href="https://cloud.google.com/stackdriver/" rel="noopener noreferrer"&gt;Stackdriver&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This service has to communicate with a handful of external dependencies, these are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;Apache Kafka&lt;/a&gt; - Kafka allows services to publish and subscribe to streams of records, similar to a message queue or enterprise messaging system. An event is published from another service to signify that a customer is ready for us to migrate.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.confluent.io/current/schema-registry/docs/index.html" rel="noopener noreferrer"&gt;Confluent Schema Registry&lt;/a&gt; - The registry allows us to apply versioned schemas to our Kafka events and is used to decode messages from &lt;a href="https://avro.apache.org/" rel="noopener noreferrer"&gt;Apache Avro&lt;/a&gt; format into its JSON counterpart.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt; - A relational database used to store information on the migration of a customer (records created, any errors and warnings etc).&lt;/li&gt;
&lt;li&gt;Two &lt;a href="https://www.salesforce.com/" rel="noopener noreferrer"&gt;Salesforce&lt;/a&gt; instances - These are where the customer support staff work on a day-to-day basis. One containing the source of the V1 data and one to store the new V2 data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From all these dependencies, we have a health check that looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "status": "UP",
    "uptime": "22h54m27.491102074s",
    "goroutines": 24,
    "version": "go1.9.7",
    "sf1": {
        "status": "UP"
    },
    "sf2": {
        "status": "UP"
    },
    "database": {
        "status": "UP"
    },
    "kafka": {
        "status": "UP"
    },
    "registry": {
        "status": "UP"
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first thing I would note: using &lt;code&gt;runtime.NumGoroutine()&lt;/code&gt; to track the number of running goroutines is extremely helpful in identifying the source of a memory leak. I recommend having some way to monitor this in your production environments. In this scenario, our HTTP health check returns the number of running goroutines.&lt;/p&gt;

&lt;p&gt;On the day of the leak, I saw the number of goroutines exceed 100,000 and keep rising steadily with each health check request. Below are the steps I took to debug the issue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enabling pprof output via HTTP
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;pprof&lt;/code&gt; tool describes itself as “a tool for visualization and analysis of profiling data”; you can view its GitHub repository &lt;a href="https://github.com/google/pprof" rel="noopener noreferrer"&gt;here&lt;/a&gt;. This tool allows us to obtain various metrics on the low-level operations of a Go program. For our purposes, it allows us to get detailed information on running goroutines. The only problem is that &lt;code&gt;pprof&lt;/code&gt; is a binary, meaning we would have to run commands against the service in production to get meaningful results. The application also runs within a &lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; container based on a &lt;code&gt;scratch&lt;/code&gt; image, which makes using the binary somewhat invasive. How, then, can we get the profiling data we need?&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;net/http/pprof&lt;/code&gt; package within the standard library exposes &lt;code&gt;pprof&lt;/code&gt; methods that provide profiling data via HTTP endpoints. This project uses &lt;a href="https://github.com/gorilla/mux" rel="noopener noreferrer"&gt;mux&lt;/a&gt; as its URL router, so exposing the endpoints can be done using the &lt;code&gt;HandleFunc&lt;/code&gt; and &lt;code&gt;Handle&lt;/code&gt; methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Requires importing the net/http/pprof and gorilla/mux packages.

// Create a new router
router := mux.NewRouter()

// Register pprof handlers
router.HandleFunc("/debug/pprof/", pprof.Index)
router.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
router.HandleFunc("/debug/pprof/profile", pprof.Profile)
router.HandleFunc("/debug/pprof/symbol", pprof.Symbol)

router.Handle("/debug/pprof/goroutine", pprof.Handler("goroutine"))
router.Handle("/debug/pprof/heap", pprof.Handler("heap"))
router.Handle("/debug/pprof/threadcreate", pprof.Handler("threadcreate"))
router.Handle("/debug/pprof/block", pprof.Handler("block"))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once I had added these handlers, I spun up a local instance of the service and navigated to the &lt;code&gt;/debug/pprof/goroutine&lt;/code&gt; endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding pprof output
&lt;/h3&gt;

&lt;p&gt;The response I got from &lt;code&gt;/debug/pprof/goroutine&lt;/code&gt; was fairly easy to interpret. Here’s a sample that shows the goroutines spun up by the Kafka consumer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;goroutine profile: total 25
2 @ 0x434420 0x4344e5 0x404747 0x40451b 0x8a25af 0x8f2486 0x8ee88c 0x461d61
#   0x8a25ae    /vendor/github.com/Shopify/sarama.(*Broker).responseReceiver+0xfe
            /vendor/github.com/Shopify/sarama/broker.go:682
#   0x8f2485    /vendor/github.com/Shopify/sarama.(*Broker).(/vendor/github.com/Shopify/sarama.responseReceiver)-fm+0x35
            /vendor/github.com/Shopify/sarama/broker.go:149
#   0x8ee88b    /vendor/github.com/Shopify/sarama.withRecover+0x4b
            /vendor/github.com/Shopify/sarama/utils.go:45

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first line tells us the total number of running goroutines. In this example, I was running a version of the service which had fixed the memory leak. As you can see, we have a total of 25 running goroutines. The following lines tell us how many goroutines belong to specific package methods. In this example, we can see the &lt;code&gt;responseReceiver&lt;/code&gt; method from the &lt;code&gt;Broker&lt;/code&gt; type in the &lt;code&gt;sarama&lt;/code&gt; package is currently using 2 goroutines. This was the silver bullet in locating the culprit of the leak.&lt;/p&gt;

&lt;p&gt;In the leaking version of the service, two particular lines stood out, each with an ever-increasing number of active goroutines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;14 @ 0x434420 0x444c4e 0x7c87fd 0x461d61
#   0x7c87fc    net/http.(*persistConn).writeLoop+0x15c C:/Go/src/net/http/transport.go:1822

14 @ 0x434420 0x444c4e 0x7c761e 0x461d61
#   0x7c761d    net/http.(*persistConn).readLoop+0xe9d  C:/Go/src/net/http/transport.go:1717

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Somewhere in the code we’re creating HTTP connections that are stuck in a read/write loop. I decided to take a look into the source code of the standard library to understand this behavior. The first place I looked was the location at which these routines are spawned: the &lt;code&gt;dialConn&lt;/code&gt; method in &lt;code&gt;net/http/transport.go&lt;/code&gt;, the full contents of which can be viewed &lt;a href="https://golang.org/src/net/http/transport.go" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// transport.go:1234

pconn.br = bufio.NewReader(pconn)
pconn.bw = bufio.NewWriter(persistConnWriter{pconn})
go pconn.readLoop() // &amp;lt;- Here is the source of our leak
go pconn.writeLoop()
return pconn, nil

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we’ve identified where our leak is coming from, we need to understand what scenario is causing these goroutines to never return. I noticed that the number of goroutines only increased after a health check. In the production system, this was happening approximately once a minute using Stackdriver’s uptime checks from different regions.&lt;/p&gt;
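&lt;p&gt;A quick way to confirm this kind of per-request growth locally is to fire a batch of requests and compare goroutine counts before and after. The throwaway test server below is a stand-in for the real dependencies (none of this is the original service’s code); healthy client code should show a small, constant delta rather than one that scales with the number of requests:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"runtime"
)

// goroutineDelta fires n requests at a throwaway server and returns how
// many extra goroutines are alive afterwards. A leak shows up as a delta
// that grows with n; a healthy client leaves behind only a couple of
// idle keep-alive connection loops.
func goroutineDelta(n int) int {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"status":"UP"}`))
	}))
	defer srv.Close()

	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		resp, err := http.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
	}
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("delta after 20 requests:", goroutineDelta(20))
}
```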

&lt;p&gt;After a little bit of searching, I determined the source of the leak was during our request to the Confluent schema registry to assert its availability. I had made some rather naive mistakes when writing this package. First off, here’s the &lt;code&gt;New&lt;/code&gt; method that creates the client for the registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func New(baseURL, user, pass string, cache *cache.Cache) Registry {
    return &amp;amp;registry{
        baseURL: baseURL,
        username: user,
        password: pass,
        cache: cache,
        client: &amp;amp;http.Client{},
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mistake number one: &lt;strong&gt;always configure your HTTP clients with sensible values&lt;/strong&gt;. This issue can be half-resolved by the inclusion of a timeout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func New(baseURL, user, pass string, cache *cache.Cache) Registry {
    return &amp;amp;registry{
        baseURL: baseURL,
        username: user,
        password: pass,
        cache: cache,
        client: &amp;amp;http.Client{
            Timeout: time.Second * 10,
        },
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this change in place, the leaking routines were cleaned up after about 10 seconds.&lt;/p&gt;

&lt;p&gt;While this works, one more one-line change fully resolved the issue, this time within the method that builds the HTTP requests. Looking through the definition of the &lt;code&gt;http.Request&lt;/code&gt; type, I found the &lt;code&gt;Close&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// request.go:197

// Close indicates whether to close the connection after
// replying to this request (for servers) or after sending this
// request and reading its response (for clients).
//
// For server requests, the HTTP server handles this automatically
// and this field is not needed by Handlers.
//
// For client requests, setting this field prevents re-use of
// TCP connections between requests to the same hosts, as if
// Transport.DisableKeepAlives were set.
Close bool

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I decided to check whether setting this flag to true would stop these goroutines from hanging around. Here’s what it looked like in code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (r *registry) buildRequest(method, url string) (*http.Request, error) {
    req, err := http.NewRequest(method, url, nil)

    if err != nil {
        return nil, errors.Annotate(err, "failed to create http request")
    }

    req.SetBasicAuth(r.username, r.password)
    req.Close = true

    return req, nil
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
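&lt;p&gt;The effect of the flag is easy to observe: with &lt;code&gt;Close&lt;/code&gt; set, every request dials a fresh TCP connection rather than parking a keep-alive one whose read/write loops linger. The sketch below (not from the original service) counts accepted connections on a throwaway server:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countConns sends n requests at a throwaway server and returns how many
// TCP connections the server accepted. With closeRequests set, the client
// dials a new connection per request instead of reusing a keep-alive one.
func countConns(closeRequests bool, n int) int64 {
	var conns int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	}))
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&conns, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	for i := 0; i < n; i++ {
		req, _ := http.NewRequest(http.MethodGet, srv.URL, nil)
		req.Close = closeRequests
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&conns)
}

func main() {
	fmt.Println("with Close:", countConns(true, 3))
	fmt.Println("without Close:", countConns(false, 3))
}
```

&lt;p&gt;This is the trade-off of the flag: the goroutines can no longer pile up, but you also give up connection reuse entirely, paying a new TCP handshake per request.&lt;/p&gt;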



&lt;p&gt;After implementing these changes and deploying them to production, the memory usage of the service stayed at a healthy level from then on:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdavidsbond.github.io%2Fassets%2F2018-08-08-debugging-memory-leaks-using-pprof%2F2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdavidsbond.github.io%2Fassets%2F2018-08-08-debugging-memory-leaks-using-pprof%2F2.png" alt="Memory usage graph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Lessons learned
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Monitor the number of active goroutines, especially in services that rely on concurrency patterns&lt;/li&gt;
&lt;li&gt;Add functionality to your services to expose profiling data using &lt;code&gt;pprof&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set reasonable configuration values for your HTTP clients and requests&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.ovoenergy.com/" rel="noopener noreferrer"&gt;https://www.ovoenergy.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io" rel="noopener noreferrer"&gt;https://kubernetes.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/" rel="noopener noreferrer"&gt;https://cloud.google.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/stackdriver/" rel="noopener noreferrer"&gt;https://cloud.google.com/stackdriver/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;https://kafka.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/current/schema-registry/docs/index.html" rel="noopener noreferrer"&gt;https://docs.confluent.io/current/schema-registry/docs/index.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://avro.apache.org/" rel="noopener noreferrer"&gt;https://avro.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;https://www.postgresql.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.salesforce.com/" rel="noopener noreferrer"&gt;https://www.salesforce.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/pprof" rel="noopener noreferrer"&gt;https://github.com/google/pprof&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;https://www.docker.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/gorilla/mux" rel="noopener noreferrer"&gt;https://github.com/gorilla/mux&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://golang.org/src/net/http/transport.go" rel="noopener noreferrer"&gt;https://golang.org/src/net/http/transport.go&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>http</category>
      <category>debugging</category>
      <category>pprof</category>
    </item>
  </channel>
</rss>
