Diagrams as code for infrastructure as code

#infrastructure #diagrams #documentation

Blueprint — a diagram is worth a thousand lines of poorly documented code

If you’re working with infrastructure, you’re undoubtedly leveraging infrastructure as code, right? I sure hope so. Otherwise, start now. I’ll be waiting.

A natural evolution might be creating a graphical representation of your infrastructure. Diagrams as code, if you will. It is not a new idea. This technique appeared in the tech radar back in 2015. You can hark back to the period when people thought that generating code from UML diagrams was a sensible idea.

Let’s forget that ever happened! Instead, let’s think about using visualization to understand complex architectures. And keeping those artifacts up to date while we’re at it. I’m going to talk about a tool that builds on top of Graphviz called Diagrams.

What is Diagrams?

A tool with a very generic name, that’s for sure. If you check its website, it’s described as:

Diagrams lets you draw the cloud system architecture in Python code.

Pretty straightforward. Installing it is a one-liner.

pip3 install diagrams
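
One caveat: Diagrams renders through Graphviz, so the Graphviz binaries have to be installed separately with your system’s package manager. A minimal preflight check might look like the sketch below. The helper name `graphviz_available` is made up for this example and is not part of Diagrams.

```python
import shutil


def graphviz_available(binary: str = "dot") -> bool:
    """Return True if the given Graphviz executable can be found on PATH."""
    return shutil.which(binary) is not None


if not graphviz_available():
    print("Graphviz not found: install it before rendering any diagram.")
```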

We’re going to create infrastructure diagrams with Python. We’ll commit both the images and the code used to generate them so that everybody in the team can make changes. Let’s get coding, err, drawing. I have two examples for you.

Alerting workflow

Let’s say you have a workflow where you publish alarms to an SNS topic. The messages on that topic are processed by a Lambda function, which transforms the data and writes it back to CloudWatch. Through an event rule, the right alerts reach our event bus.

So much text for such a little thing. I’m surprised you’re still paying attention. Let me hit you with a fancy diagram.

If you check the code, you’ll notice how simple it is. There are three elements.

The nodes, which are single components. Each one has a recognizable icon and a name.
The edges, which are the connections between components. They can be directed or undirected.
The clusters, which group nodes logically.

And there is not much else. The diagram is built with this snippet.

from diagrams import Diagram, Cluster

from diagrams.aws.compute import Lambda
from diagrams.aws.integration import SNS, Eventbridge
from diagrams.aws.management import Cloudwatch
from diagrams.aws.storage import S3  # needed for the S3 node below
from diagrams.onprem.queue import ActiveMQ

with Diagram("Alerting Workflow", show=True):
    with Cluster('main account'):
        topic = SNS('SNS Topic')

        with Cluster('Lambda'):
            processor = Lambda('processor')
            topic >> processor               # directed edge
            S3('lambda source') - processor  # undirected edge

        cl = Cloudwatch('Cloudwatch')
        processor >> cl

        event = Eventbridge('Cloudwatch\nevent rule')
        cl >> event

    with Cluster('Event Bus'):
        event_bus = ActiveMQ('bus')
        event >> event_bus

Notice how quickly you can get a decent overview of your system with just a bit of Python.
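
The edges in that snippet are plain arrows and lines. If you want to annotate them, Diagrams also has an `Edge` class that accepts a label, a color, and a style. Here is a small sketch of the three edge flavours. The function name, the "retry" label, and the file name are invented for illustration, and the imports sit inside the function only so the file still loads on a machine without Diagrams.

```python
def draw_edge_styles() -> None:
    """Render a tiny diagram with directed, undirected, and labeled edges."""
    # Imported lazily so this module can be loaded even if diagrams is missing.
    from diagrams import Diagram, Edge
    from diagrams.aws.compute import Lambda
    from diagrams.aws.integration import SNS
    from diagrams.aws.storage import S3

    # show=False skips opening an image viewer; the PNG is still written.
    with Diagram("Edge styles", filename="edge_styles", show=False):
        topic = SNS("SNS Topic")
        processor = Lambda("processor")
        source = S3("lambda source")

        topic >> processor   # directed edge
        source - processor   # undirected edge
        # An explicit Edge carries extra attributes such as a label or style.
        processor >> Edge(label="retry", style="dashed") >> topic
```

Calling `draw_edge_styles()` should write `edge_styles.png` to the current directory, provided Diagrams and Graphviz are installed.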

Network diagrams

Networking seems to be uniquely suited for this approach. While researching VPC endpoints, I realized that a clear drawing aids understanding significantly.

Let’s say we are connecting to a Kubernetes cluster because that’s what we all do these days. We’re routing a bunch of different domains to a VPC endpoint. The cluster resides in a different account, so we use an endpoint service to make it available. Add a few more routes, and the whole thing becomes a tangled mess, much like the pile of cables behind your desk. That is until you see this diagram.

Better, isn’t it? What about the code used to generate it?

from diagrams import Cluster, Diagram
from diagrams.aws.network import VPC, PrivateSubnet, Endpoint, ELB, Route53
from diagrams.aws.compute import EKS

with Diagram("Connecting two accounts", show=True):
    with Cluster("Account 1"):
        with Cluster("Hosted Zone\nmain.io"):
            star = Route53("*.test.main.io")
            subdomain = Route53("test.main.io")

        with Cluster("Hosted Zone\nalt.dev"):
            alt = Route53("other.alt.dev")

        with Cluster("VPC"):
            VPC()

            with Cluster(""):
                PrivateSubnet()

                endpoint = Endpoint("VPC Endpoint")
                [star, subdomain, alt] >> endpoint

    with Cluster("Account 2"):
        with Cluster("VPC"):
            VPC()

            with Cluster(""):
                PrivateSubnet()

                service = ELB("VPC\nEndpoint Service")
                cluster = EKS("Kubernetes\nCluster")

                endpoint >> service >> cluster

The amount of code used to represent it has grown compared to the previous example. Luckily, using Python gives us access to functions, comprehensions, and other tools to manage this complexity.
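
As a rough sketch of what that could look like, the two near-identical VPC blocks can come from one helper, and the Route 53 records from a comprehension over a small table. The names `RECORDS`, `zone_label`, `account_vpc`, and `build_diagram` are all hypothetical, and this is only one way to slice it. The Diagrams imports are done inside the functions so the data part can be loaded on its own.

```python
# Hypothetical table: hosted zone -> record names, copied from the example above.
RECORDS = {
    "main.io": ["*.test.main.io", "test.main.io"],
    "alt.dev": ["other.alt.dev"],
}


def zone_label(zone: str) -> str:
    """Cluster label for a hosted zone, in the style used above."""
    return f"Hosted Zone\n{zone}"


def account_vpc(make_nodes):
    """Draw the VPC and private subnet scaffolding, then create nodes inside it.

    Must be called while a Diagram (and usually an account Cluster) is active.
    Returns whatever make_nodes() returns.
    """
    from diagrams import Cluster
    from diagrams.aws.network import VPC, PrivateSubnet

    with Cluster("VPC"):
        VPC()
        with Cluster(""):
            PrivateSubnet()
            return make_nodes()


def build_diagram() -> None:
    """Rebuild the two-account diagram using the helpers above."""
    from diagrams import Cluster, Diagram
    from diagrams.aws.compute import EKS
    from diagrams.aws.network import ELB, Endpoint, Route53

    with Diagram("Connecting two accounts", show=False):
        with Cluster("Account 1"):
            records = []
            for zone, names in RECORDS.items():
                with Cluster(zone_label(zone)):
                    # One Route53 node per record name.
                    records += [Route53(name) for name in names]
            endpoint = account_vpc(lambda: Endpoint("VPC Endpoint"))
            records >> endpoint

        with Cluster("Account 2"):
            service, cluster = account_vpc(
                lambda: (ELB("VPC\nEndpoint Service"), EKS("Kubernetes\nCluster"))
            )

        endpoint >> service >> cluster
```

Adding a third hosted zone is then a one-line change to `RECORDS` rather than another copy-pasted block.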

The curious case of the diagram that could not be updated

That’s how you use Diagrams. Is it really valuable? I certainly like the pretty icons. Apart from that, where it shines is in enabling visual documentation to evolve.

I’ve tried to digitize technical diagrams before. I really tried. Sketch, Sketchboard, or even just taking a picture of a hand-drawn diagram. It works until you need to update it, and you don’t have the source. Perhaps it was done by somebody else, who preferred a completely different tool. I’ve often seen a project’s documentation get more and more out of date because nobody can update the damn diagrams. If it’s just source code, your chances get a lot better.
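
Once the diagrams are plain Python files, regenerating them can be scripted too. Below is a minimal sketch, assuming all diagram scripts live in a single directory and each one writes its image relative to the working directory. The function name `regenerate` and that layout are my assumptions, not something Diagrams prescribes.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def regenerate(diagram_dir: Path) -> List[Path]:
    """Run every *.py file in diagram_dir so each one rewrites its image.

    Returns the scripts that were executed, in sorted order.
    """
    scripts = sorted(diagram_dir.glob("*.py"))
    for script in scripts:
        # cwd=diagram_dir makes the generated images land next to their source;
        # script.name is used because the path is resolved relative to cwd.
        subprocess.run([sys.executable, script.name], cwd=diagram_dir, check=True)
    return scripts
```

You would call it as `regenerate(Path("docs/diagrams"))`, where the path is only an example. For this to run unattended, the diagram scripts should pass `show=False` so nothing tries to open an image viewer.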

Conclusion

Diagrams is a neat, albeit limited, tool. You can’t add much more than what I’ve shown. While that is constraining, it can protect you from yourself. Overly complicated diagrams do more harm than good. If you are representing the system in its entirety, why not check the code directly? The point of abstraction is to make it simpler to understand by omitting some of the details.

In summary, it’s a convenient way to bring clarity into your impenetrable READMEs, and you’ll be able to update the images as your code evolves.