<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: John McBride</title>
    <description>The latest articles on DEV Community by John McBride (@jpmcb).</description>
    <link>https://dev.to/jpmcb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1105969%2F5af7806f-98cb-4f2f-b32a-7b650a4e8fd7.jpeg</url>
      <title>DEV Community: John McBride</title>
      <link>https://dev.to/jpmcb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jpmcb"/>
    <language>en</language>
    <item>
      <title>OpenSauced on Azure: Lessons learned from a near-zero downtime migration</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Tue, 15 Oct 2024 16:50:08 +0000</pubDate>
      <link>https://dev.to/opensauced/opensauced-on-azure-lessons-learned-from-a-near-zero-downtime-migration-40b9</link>
      <guid>https://dev.to/opensauced/opensauced-on-azure-lessons-learned-from-a-near-zero-downtime-migration-40b9</guid>
<description>&lt;p&gt;At the beginning of October, the OpenSauced engineering team completed a weeks-long migration of our infrastructure, data, and pipelines to Microsoft Azure. Before this move, we had several bespoke container apps on DigitalOcean alongside managed PostgreSQL databases.&lt;/p&gt;

&lt;p&gt;This setup worked well for a while and was a great way to bootstrap. But, because we lacked GitOps, infrastructure-as-code (IaC) tooling, and a structured method for storing secrets in those early days, our app configurations could be brittle, prone to breaking during upgrades or releases, and difficult to scale in a streamlined manner.&lt;/p&gt;

&lt;p&gt;We ultimately decided to migrate our core backend infrastructure from DigitalOcean to Azure, consolidating everything into a unified environment. This move allowed us to capitalize on our existing Azure Kubernetes Service (AKS) infrastructure and fully commit to Kubernetes as our primary service and container orchestration platform.&lt;/p&gt;

&lt;h3&gt;Azure Kubernetes Service for container runtimes&lt;/h3&gt;

&lt;p&gt;If you've read any of my previous engineering deep dives (including &lt;a href="https://opensauced.pizza/blog/technical-deep-dive:-how-we-built-the-pizza-cli-using-go-and-cobra" rel="noopener noreferrer"&gt;Technical Deep Dive: How We Built the Pizza CLI Using Go and Cobra&lt;/a&gt;, &lt;a href="https://opensauced.pizza/blog/how-we-use-kubernetes-jobs-to-scale-openssf-scorecard" rel="noopener noreferrer"&gt;How we use Kubernetes jobs to scale OpenSSF Scorecard&lt;/a&gt;, and &lt;a href="https://opensauced.pizza/blog/how-we-saved-thousands-of-dollars-deploying-low-cost-open-source-ai-technologies" rel="noopener noreferrer"&gt;How We Saved 10s of Thousands of Dollars Deploying Low Cost Open Source AI Technologies At Scale with Kubernetes&lt;/a&gt;), you know that we already deploy several AI services and core data pipelines on AKS (primarily the services that power StarSearch).&lt;/p&gt;

&lt;p&gt;To simplify our infrastructure and make the most of our existing compute resources in our AKS clusters, we adopted a "monolithic cluster" approach. This means we’re deploying all infrastructure, APIs, and services to the same AKS clusters, centralizing control, management, deployment, and scaling.&lt;/p&gt;

&lt;p&gt;The benefits are clear: we avoid the complexity of multi-cluster management, consolidate our networking within a single region, and streamline operations for our small, agile engineering team.&lt;/p&gt;

&lt;p&gt;However, this approach has trade-offs we may need to tackle in the future. As OpenSauced grows and scales, we’ll need to reassess and likely adopt a multi-region or multi-cluster strategy to support a globally distributed network. We made this decision fully aware of the scalability challenges ahead, but for now, this approach gives us the flexibility and simplicity we need.&lt;/p&gt;

&lt;h3&gt;Choosing a Kubernetes Ingress controller&lt;/h3&gt;

&lt;p&gt;With AKS now handling all our backend infrastructure, including public-facing APIs, we needed an ingress solution for routing external traffic into our clusters, along with load balancing, firewall management, TLS certificates from Let's Encrypt, and security policies.&lt;/p&gt;

&lt;p&gt;We chose Traefik as our Kubernetes ingress controller. Traefik, a popular choice in the Kubernetes community, is an "application proxy" that offers a rich set of features while being easy to set up. With Traefik, what could have been a complex, error-prone task became an intuitive and streamlined integration into our infrastructure.&lt;/p&gt;

&lt;h3&gt;Using Pulumi for infrastructure as code and deployment&lt;/h3&gt;

&lt;p&gt;A key part of our migration was adopting Pulumi as our infrastructure-as-code solution. Before this, our infrastructure setup was a bit ad-hoc, with various configurations and third-party services stitched together manually. When we needed a new cloud service or were ready to deploy a new API service, we'd piece the different bits together in cloud dashboards and build custom automation in GitHub Actions. While this worked in the very early stages of OpenSauced, it quickly became brittle and hard to manage at scale or across an engineering team.&lt;/p&gt;

&lt;p&gt;Pulumi offers several benefits that have already had a noticeable impact on our workflows and engineering culture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Environment Reproducibility: We can easily create and replicate environments, whether spinning up a new Kubernetes cluster or a full staging environment. It’s as simple as creating a new Pulumi stack.&lt;/li&gt;
&lt;li&gt;Simple, Consistent Deployments: Deployments are straightforward, repeatable, and integrated into our CI/CD pipelines.&lt;/li&gt;
&lt;li&gt;State and Secret Management: Pulumi provides a built-in mechanism for storing state and secrets, which can be securely shared across the entire engineering team.&lt;/li&gt;
&lt;li&gt;GitOps Compatibility: By leveraging Pulumi’s tight integration with Git, we can adopt deeper GitOps workflows, bringing more automation and consistency to our infrastructure management.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, Pulumi has significantly reduced the friction around infrastructure management and deploying new services, allowing us to focus on what really matters — building OpenSauced!&lt;/p&gt;
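&lt;p&gt;To make this concrete, here’s a minimal sketch of what a Pulumi program looks like in Go. The resource names here are hypothetical, not our actual stack definitions; our real stacks declare the AKS clusters, databases, and supporting services discussed in this post the same way:&lt;/p&gt;

```go
package main

import (
	"github.com/pulumi/pulumi-azure-native-sdk/resources/v2"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		// Declare an Azure resource group; Pulumi records it in the stack's
		// state, so `pulumi up` creates it and later runs reconcile any drift.
		rg, err := resources.NewResourceGroup(ctx, "opensauced-rg", &resources.ResourceGroupArgs{
			Location: pulumi.String("eastus"),
		})
		if err != nil {
			return err
		}

		// Export the generated name so other stacks or tooling can reference it.
		ctx.Export("resourceGroupName", rg.Name)
		return nil
	})
}
```

Running `pulumi up` against a program like this is what makes environments reproducible: spinning up a staging copy is just creating a new stack from the same code.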

&lt;h3&gt;Azure Flexible Server for managed Postgres&lt;/h3&gt;

&lt;p&gt;For the data layer at OpenSauced (including user data, user assets, and GitHub repository metadata), we previously used DigitalOcean’s managed PostgreSQL service. For our migration to Azure, we opted for Azure Database for PostgreSQL with the Flexible Server deployment option.&lt;/p&gt;

&lt;p&gt;This service gives us all the benefits of a managed database solution, including automated backups, restoration capabilities, and high availability. The bonus here is that we can co-locate our data with our AKS clusters in the same region, ensuring low-latency networking between our services on-cluster and the database.&lt;/p&gt;

&lt;p&gt;Looking ahead, as our user base grows, we’ll need to explore data replication and distribution to additional regions to enhance availability and redundancy. But for now, this managed solution meets our needs and positions us well for future scalability.&lt;/p&gt;

&lt;p&gt;Hats off to the Azure Postgres team for enabling a smooth and near-zero-downtime migration of our data. All in all, using Azure's provided migration tools, moving everything over took less than 5 minutes, and we completed the production migration with minimal end-user impact. Because we used Pulumi both to configure all our containers on-cluster and to deploy the Postgres flexible servers, we could quickly and easily re-deploy our containers with updated configurations pointing at the new databases.&lt;/p&gt;

&lt;p&gt;Between our Kubernetes environment, Pulumi IaC tooling, and Azure's sublime migration tools, we were able to complete a full production migration seamlessly.&lt;/p&gt;

&lt;h3&gt;Grafana Observability&lt;/h3&gt;

&lt;p&gt;As part of this migration, we also made some enhancements to our observability stack to ensure that our backend infrastructure is properly monitored. We use Grafana for observability, and during the migration, we deployed Grafana Alloy on our clusters. Alloy integrates seamlessly with Prometheus for metrics and Loki for log aggregation, giving us a powerful observability framework.&lt;/p&gt;

&lt;p&gt;With these tools in place, we have a comprehensive view of our system’s health, allowing us to monitor performance, detect anomalies, and respond to issues before they impact our users. Additionally, our integration with Grafana’s on-call and alerting features enables our engineering team to respond to incidents and keep OpenSauced healthy.&lt;/p&gt;




&lt;p&gt;A huge thank you to our Microsoft Azure partners for enabling us to make this transition, providing their expertise, and supporting us along the way!&lt;/p&gt;

&lt;p&gt;As always, stay saucy friends!!&lt;/p&gt;

</description>
      <category>azure</category>
      <category>kubernetes</category>
      <category>infrastructureascode</category>
    </item>
    <item>
      <title>Technical Deep Dive: How We Built the Pizza CLI Using Go and Cobra</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Mon, 23 Sep 2024 16:07:21 +0000</pubDate>
      <link>https://dev.to/opensauced/technical-deep-dive-how-we-built-the-pizza-cli-using-go-and-cobra-oad</link>
      <guid>https://dev.to/opensauced/technical-deep-dive-how-we-built-the-pizza-cli-using-go-and-cobra-oad</guid>
      <description>&lt;p&gt;Last week, &lt;a href="https://opensauced.pizza/blog/introducing-the-pizza-cli" rel="noopener noreferrer"&gt;the OpenSauced engineering team released the Pizza CLI&lt;/a&gt;, a powerful and composable command-line tool for generating CODEOWNER files and integrating with the OpenSauced platform. Building robust command-line tools may seem straightforward, but without careful planning and thoughtful paradigms, CLIs can quickly become tangled messes of code that are difficult to maintain and riddled with bugs. In this blog post, we'll take a deep dive into how we built this CLI using Go, how we organize our commands using Cobra, and how our lean engineering team iterates quickly to build powerful functionality.&lt;/p&gt;

&lt;h2&gt;Using Go and Cobra&lt;/h2&gt;

&lt;p&gt;The Pizza CLI is a Go command-line tool that leverages several well-established libraries. Go’s simplicity, speed, and systems programming focus make it an ideal choice for building CLIs. At its core, the Pizza-CLI uses &lt;a href="https://github.com/spf13/cobra" rel="noopener noreferrer"&gt;spf13/cobra&lt;/a&gt;, a CLI bootstrapping library for Go, to organize and manage the entire tree of commands.&lt;/p&gt;

&lt;p&gt;You can think of Cobra as the scaffolding that makes the command-line interface work: it wires up the command tree, keeps flag handling consistent, and communicates with users via help messages and automated documentation.&lt;/p&gt;

&lt;h3&gt;Structuring the Codebase&lt;/h3&gt;

&lt;p&gt;One of the first (and biggest) challenges when building a Cobra-based Go CLI is how to structure all your code and files. Contrary to popular belief, there is &lt;strong&gt;&lt;em&gt;no&lt;/em&gt;&lt;/strong&gt; prescribed way to do this in Go. Neither the &lt;code&gt;go build&lt;/code&gt; command nor the &lt;code&gt;gofmt&lt;/code&gt; utility will complain about how you name your packages or organize your directories. This is one of the best parts of Go: its simplicity and power make it easy to define structures that work for you and your engineering team!&lt;/p&gt;

&lt;p&gt;Ultimately, in my opinion, it's best to think of and structure a Cobra-based Go codebase as a tree of commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── Root command
│   ├── Child command
│   ├── Child command
│   │   └── Grandchild command
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the base of the tree is the root command: this is the anchor for your entire CLI application and will get the name of your CLI. Attached as child commands, you’ll have a tree of branching logic that informs the structure of how your entire CLI flow works.&lt;/p&gt;

&lt;p&gt;One of the things that’s incredibly easy to miss when building CLIs is the user experience. I typically recommend people follow a “root verb noun” paradigm when building commands and child-command structures since it flows logically and leads to excellent user experiences.&lt;/p&gt;

&lt;p&gt;For example, in &lt;a href="https://kubernetes.io/docs/reference/kubectl/" rel="noopener noreferrer"&gt;Kubectl&lt;/a&gt;, you’ll see this paradigm everywhere: “kubectl get pods”, “kubectl apply …“, or “kubectl label pods …” This ensures a sensible flow to how users will interact with your command-line application and helps a lot when talking about commands with other people.&lt;/p&gt;

&lt;p&gt;In the end, this structure and suggestion can inform how you organize your files and directories, but again, ultimately it’s up to you to determine how you structure your CLI and present the flow to end-users.&lt;/p&gt;

&lt;p&gt;In the Pizza CLI, we have a well-defined structure for where child commands (and the grandchildren of those child commands) live: under the &lt;code&gt;cmd&lt;/code&gt; directory, each command gets its own implementation in its own package. The root command scaffolding lives in a &lt;code&gt;pkg/utils&lt;/code&gt; directory, since it's useful to think of the root command as a top-level utility used by &lt;code&gt;main.go&lt;/code&gt; rather than a command that needs much ongoing maintenance. Typically, your root command implementation is boilerplate that sets things up and rarely changes, so it’s nice to get that out of the way.&lt;/p&gt;

&lt;p&gt;Here's a simplified view of our directory structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── main.go
├── pkg/
│   ├── utils/
│   │   └── root.go
├── cmd/
│   ├── Child command dir
│   ├── Child command dir
│   │   └── Grandchild command dir
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure allows for clear separation of concerns and makes it easier to maintain and extend the CLI as it grows and as we add more commands.&lt;/p&gt;

&lt;h2&gt;Using go-git&lt;/h2&gt;

&lt;p&gt;One of the main libraries we use in the Pizza-CLI is the &lt;a href="https://github.com/go-git/go-git" rel="noopener noreferrer"&gt;go-git&lt;/a&gt; library, a pure git implementation in Go that is highly extensible. During &lt;code&gt;CODEOWNERS&lt;/code&gt; generation, this library enables us to iterate the git ref log, look at code diffs, and determine which git authors are associated with the configured attributions defined by a user.&lt;/p&gt;

&lt;p&gt;Iterating the git ref log of a local git repo is actually pretty simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// 1. Open the local git repository&lt;/span&gt;
&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PlainOpen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/path/to/your/repo"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"could not open git repository"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// 2. Get the HEAD reference for the local git repo&lt;/span&gt;
&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"could not get repo head"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// 3. Create a git ref log iterator based on some options&lt;/span&gt;
&lt;span class="n"&gt;commitIter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogOptions&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;From&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hash&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"could not get repo log iterator"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;commitIter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c"&gt;// 4. Iterate through the commit history&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;commitIter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ForEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;commit&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// process each commit as the iterator iterates them&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"could not process commit iterator"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re building a Git based application, I definitely recommend using go-git: it’s fast, integrates well within the Go ecosystem, and can be used to do all sorts of things!&lt;/p&gt;

&lt;h2&gt;Integrating Posthog telemetry&lt;/h2&gt;

&lt;p&gt;Our engineering and product team is deeply invested in bringing the best possible command line experience to our end users: this means we’ve taken steps to integrate anonymized telemetry that can report to Posthog on usage and errors out in the wild. This has allowed us to fix the most important bugs first, iterate quickly on popular feature requests, and understand how our users are using the CLI.&lt;/p&gt;

&lt;p&gt;Posthog has &lt;a href="https://github.com/PostHog/posthog-go" rel="noopener noreferrer"&gt;a first-party Go library&lt;/a&gt; that supports this exact functionality. First, we define a Posthog client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"github.com/posthog/posthog-go"&lt;/span&gt;

&lt;span class="c"&gt;// PosthogCliClient is a wrapper around the posthog-go client and is used as a&lt;/span&gt;
&lt;span class="c"&gt;// API entrypoint for sending OpenSauced telemetry data for CLI commands&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;PosthogCliClient&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// client is the Posthog Go client&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="n"&gt;posthog&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;

    &lt;span class="c"&gt;// activated denotes if the user has enabled or disabled telemetry&lt;/span&gt;
    &lt;span class="n"&gt;activated&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;

    &lt;span class="c"&gt;// uniqueID is the user's unique, anonymous identifier&lt;/span&gt;
    &lt;span class="n"&gt;uniqueID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, after initializing a new client, we can use it through the various struct methods we’ve defined. For example, when logging into the OpenSauced platform, we capture specific information on a successful login:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// CaptureLogin gathers telemetry on users who log into OpenSauced via the CLI&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;PosthogCliClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CaptureLogin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;activated&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;posthog&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Capture&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;DistinctId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;      &lt;span class="s"&gt;"pizza_cli_user_logged_in"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During command execution, the various “capture” functions get called to capture error paths, happy paths, etc.&lt;/p&gt;

&lt;p&gt;For the anonymized IDs, we use &lt;a href="https://github.com/google/uuid" rel="noopener noreferrer"&gt;Google’s excellent UUID Go library&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;newUUID&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These UUIDs get stored locally on end users' machines as JSON under their home directory: &lt;code&gt;~/.pizza-cli/telemetry.json&lt;/code&gt;. This gives end users complete authority and autonomy to delete this telemetry data if they want (or to disable telemetry altogether through configuration options!), ensuring they stay anonymous when using the CLI.&lt;/p&gt;

&lt;h2&gt;Iterative Development and Testing&lt;/h2&gt;

&lt;p&gt;Our lean engineering team follows an iterative development process, focusing on delivering small, testable features rapidly. Typically, we do this through GitHub issues, pull requests, milestones, and projects. We use Go's built-in testing framework extensively, writing unit tests for individual functions and integration tests for entire commands.&lt;/p&gt;

&lt;p&gt;Unfortunately, Go’s standard testing library doesn’t have great assertion functionality out of the box. It’s easy enough to use “==” or other operators, but most of the time, when going back and reading through tests, it’s nice to be able to eyeball what’s going on with assertions like “assert.Equal” or “assert.Nil”.&lt;/p&gt;

&lt;p&gt;We’ve integrated the excellent &lt;a href="https://github.com/stretchr/testify" rel="noopener noreferrer"&gt;testify library&lt;/a&gt; with its “assert” functionality to allow for smoother test implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;LoadConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nonExistentPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;require&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Nil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Using Just&lt;/h2&gt;

&lt;p&gt;We heavily use &lt;a href="https://github.com/casey/just" rel="noopener noreferrer"&gt;Just&lt;/a&gt; at OpenSauced, a command-runner utility much like GNU Make, for easily executing small scripts. This has enabled us to quickly onboard new team members and community members to our Go ecosystem, since building and testing is as simple as “just build” or “just test”!&lt;/p&gt;

&lt;p&gt;For example, to create a simple build utility in Just, within a justfile, we can have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;build:
  go build main.go -o build/pizza
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This builds a Go binary into the build/ directory. Now, building locally is as simple as running “just build”.&lt;/p&gt;

&lt;p&gt;But we’ve been able to integrate more functionality into using Just and have made it a cornerstone of how our entire build, test, and development framework is executed. For example, to build a binary for the local architecture with injected build time variables (like the sha the binary was built against, the version, the date time, etc.), we can use the local environment and run extra steps in the script before executing the “go build”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;build:
    #!/usr/bin/env sh
  echo "Building for local arch"

  export VERSION="${RELEASE_TAG_VERSION:-dev}"
  export DATETIME=$(date -u +"%Y-%m-%d-%H:%M:%S")
  export SHA=$(git rev-parse HEAD)

  go build \
    -ldflags="-s -w \
    -X 'github.com/open-sauced/pizza-cli/pkg/utils.Version=${VERSION}' \
    -X 'github.com/open-sauced/pizza-cli/pkg/utils.Sha=${SHA}' \
    -X 'github.com/open-sauced/pizza-cli/pkg/utils.Datetime=${DATETIME}' \
    -X 'github.com/open-sauced/pizza-cli/pkg/utils.writeOnlyPublicPosthogKey=${POSTHOG_PUBLIC_API_KEY}'" \
    -o build/pizza
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ve even extended this to enable cross-architecture and cross-OS builds: Go uses the GOARCH and GOOS environment variables to know which CPU architecture and operating system to build for. To build other variants, we can create specific Just commands for that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Builds for Darwin linux (i.e., MacOS) on arm64 architecture (i.e. Apple silicon)
build-darwin-arm64:
  #!/usr/bin/env sh

  echo "Building darwin arm64"

  export VERSION="${RELEASE_TAG_VERSION:-dev}"
  export DATETIME=$(date -u +"%Y-%m-%d-%H:%M:%S")
  export SHA=$(git rev-parse HEAD)
  export CGO_ENABLED=0
  export GOOS="darwin"
  export GOARCH="arm64"

  go build \
    -ldflags="-s -w \
    -X 'github.com/open-sauced/pizza-cli/pkg/utils.Version=${VERSION}' \
    -X 'github.com/open-sauced/pizza-cli/pkg/utils.Sha=${SHA}' \
    -X 'github.com/open-sauced/pizza-cli/pkg/utils.Datetime=${DATETIME}' \
    -X 'github.com/open-sauced/pizza-cli/pkg/utils.writeOnlyPublicPosthogKey=${POSTHOG_PUBLIC_API_KEY}'" \
    -o build/pizza-${GOOS}-${GOARCH}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Building the Pizza CLI using Go and Cobra has been an exciting journey and we’re thrilled to share it with you. The combination of Go's performance and simplicity with Cobra's powerful command structuring has allowed us to create a tool that's not only robust and powerful, but also user-friendly and maintainable.&lt;/p&gt;

&lt;p&gt;We invite you to explore the &lt;a href="https://github.com/open-sauced/pizza-cli" rel="noopener noreferrer"&gt;Pizza CLI GitHub repository&lt;/a&gt;, try out the tool, and &lt;a href="https://github.com/orgs/open-sauced/discussions/categories/general-feedback-or-bugs" rel="noopener noreferrer"&gt;let us know your thoughts&lt;/a&gt;. Your feedback and contributions are invaluable as we work to make code ownership management easier for development teams everywhere!&lt;/p&gt;

</description>
      <category>go</category>
    </item>
    <item>
      <title>Introducing the Pizza CLI</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Mon, 16 Sep 2024 15:13:37 +0000</pubDate>
      <link>https://dev.to/opensauced/introducing-the-pizza-cli-1g6f</link>
      <guid>https://dev.to/opensauced/introducing-the-pizza-cli-1g6f</guid>
      <description>&lt;p&gt;As software engineering teams and projects scale, a common problem larger organizations can find themselves in is deciphering the “who’s who” of a codebase. This problem only compounds itself if large mono-repos are in use or multiple teams interact in the same space. Developers may find themselves asking “Who do I ask for a review on this? What team owns this part of the code base? Who do I ask for help?”&lt;/p&gt;

&lt;p&gt;On the receiving end of this, you may find yourself getting asked questions about things you haven’t been involved in for years. Or you may be missing critical notifications on pieces of code you or your team maintain. Or even worse, there may be cross-team miscommunications that cause problems for the code you own.&lt;/p&gt;

&lt;p&gt;As teams grow and codebases expand, before too long, it can be very easy to lose the thread on who owns what piece of “knowledge” across your engineering org. This lost context can slow down development, hinder open collaboration, and slowly erode an organization’s engineering culture.&lt;/p&gt;

&lt;p&gt;But what if there was a way to automate code ownership, streamline collaboration, and reduce this knowledge debt? Today, the OpenSauced team is very excited to introduce the Pizza CLI, a powerful command-line tool designed to help maintainers, teams, and organizations manage their engineering “ownership” culture and derive insights right on the command line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing the Pizza CLI
&lt;/h2&gt;

&lt;p&gt;The Pizza CLI is our solution to the challenges of lost context and asking “Who’s who?” Born from discussions with industry experts like &lt;a href="https://x.com/bdougieYO/" rel="noopener noreferrer"&gt;Bdougie&lt;/a&gt; and &lt;a href="https://twitter.com/kelseyhightower" rel="noopener noreferrer"&gt;Kelsey Hightower&lt;/a&gt;, and inspired by robust tools used by hyperscale tech companies across the industry, the Pizza CLI empowers teams to automate code ownership and enhance cross-org collaboration directly from the command line.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;CODEOWNERS Generation:&lt;/em&gt;&lt;/strong&gt; Easily generate GitHub-style CODEOWNERS or Google-style OWNERS files, granularly mapping out who owns which parts of your codebase based on git history, number of lines touched, and current activity. These owner files can then be used in &lt;a href="https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners" rel="noopener noreferrer"&gt;GitHub CODEOWNERS automation&lt;/a&gt; or as part of more robust CI/CD.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Attribution Configuration:&lt;/em&gt;&lt;/strong&gt; Use a simple YAML configuration to map commit emails to GitHub usernames and teams, ensuring accurate ownership assignments and easy management of entities within your engineering org.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;OpenSauced Integration:&lt;/em&gt;&lt;/strong&gt; Seamlessly connect with &lt;a href="https://opensauced.pizza" rel="noopener noreferrer"&gt;OpenSauced&lt;/a&gt; to create Contributor Insights pages and metrics, helping you visualize and understand your project's contributor and owner landscape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enhanced collaboration for large teams
&lt;/h2&gt;

&lt;p&gt;By clearly defining code ownership in a granular manner and integrating seamlessly with GitHub CODEOWNERS functionality, the Pizza CLI helps you:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Improve Efficiency:&lt;/em&gt;&lt;/strong&gt; Developers know exactly who to reach out to for code reviews or questions, reducing delays, miscommunications, and team misalignments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Enhance Collaboration:&lt;/em&gt;&lt;/strong&gt; Ownership transparency creates a culture of shared responsibility and teamwork, further enhancing open “inner-source” between teams and developers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Streamline Onboarding:&lt;/em&gt;&lt;/strong&gt; New team members can quickly identify code owners, making it easier for them to ramp up and contribute confidently. Oftentimes, this can be automated through GitHub’s integration with CODEOWNER files through automatic PR review requests and notifications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Pizza CLI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

brew &lt;span class="nb"&gt;install &lt;/span&gt;open-sauced/tap/pizza


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We offer a number of flexible options for installing the Pizza CLI onto your system (including Homebrew, NPM, Docker, and more). Check out the docs in the repository for a full rundown of the ways you can install this tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Validate your install
&lt;/h3&gt;

&lt;p&gt;Check to make sure you can run the Pizza CLI:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

pizza version


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This will print out the version you have installed. If the command runs in your terminal, you have successfully installed the CLI!&lt;/p&gt;

&lt;h3&gt;
  
  
  Generate a config
&lt;/h3&gt;

&lt;p&gt;Before you can start generating codeowner files, you’ll need a YAML configuration that maps git commit emails to GitHub user logins. You can generate a config with the “pizza generate config” command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

pizza generate config /path/to/your/git/repo -i


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The “-i” flag tells the Pizza CLI to use “interactive” mode. This iterates through the git log, looking at commits and who authored them. It will then ask you to attribute those commit emails to individuals on your team:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7evvc4prifgseqzirpkj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7evvc4prifgseqzirpkj.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you’ve finished generating your config, you’ll see that a &lt;code&gt;.sauced.yaml&lt;/code&gt; file now exists in the git repo you originally pointed the pizza command at. It’s been populated with the associated logins and emails that can be used to attribute changes in the repository to individual owners.&lt;/p&gt;

&lt;p&gt;We encourage you to commit this file to your repository as a core piece of configuration “infrastructure” that records which attributions are associated with which individuals. Alternatively, if exposing commit emails in the config is not acceptable, you can add it to a private secret store and pull it down manually to make code attributions for individuals on your teams.&lt;/p&gt;

&lt;p&gt;You can also attribute GitHub teams to long lists of emails in your YAML configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="na"&gt;attribution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Keys may also be GitHub team names.&lt;/span&gt;
  &lt;span class="na"&gt;open-sauced/engineering&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;john@opensauced.pizza&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;other-user@email.com&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;other-user@no-reply.github.com&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This way, multiple people can be associated with a single team within your configuration, and the team receives the same attribution as any individual who is a code owner for those files. In other words, this is a powerful way to compose your teams and manage ownership across a whole group of engineers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generate a CODEOWNERS file
&lt;/h3&gt;

&lt;p&gt;Now that you have a config, you can generate a CODEOWNERS file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

pizza generate codeowners /path/to/your/git/repo


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This will read the &lt;code&gt;.sauced.yaml&lt;/code&gt; configuration file that you generated in your repo to know which git commit emails are associated with GitHub logins or teams.&lt;/p&gt;

&lt;p&gt;The codeowners generation iterates the git log and looks at the number of lines touched and the frequency of updates from individuals. It will find the top 3 codeowners per file who’ve done the most work within the configured time range (note: you can use the &lt;code&gt;--range&lt;/code&gt; flag to change how far back to look in the git log!)&lt;/p&gt;

&lt;p&gt;We’re very excited to be bringing this tool to you! Knowing who owns what and how to get help from other teams can improve your workflow and minimize bottlenecks. Using tools that help you do that can make it easier to connect with people when it matters most.&lt;/p&gt;

&lt;p&gt;Be sure to &lt;a href="https://github.com/open-sauced/pizza-cli" rel="noopener noreferrer"&gt;check out the open source Pizza CLI repository&lt;/a&gt; for a full rundown of everything that’s possible with the Pizza CLI. And feel free to ask any questions or give us feedback in the GitHub issues!&lt;/p&gt;

&lt;p&gt;As always, stay saucy!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>codeowners</category>
      <category>cli</category>
    </item>
    <item>
      <title>How we use Kubernetes jobs to scale OpenSSF Scorecard</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Thu, 08 Aug 2024 15:13:26 +0000</pubDate>
      <link>https://dev.to/opensauced/how-we-use-kubernetes-jobs-to-scale-openssf-scorecard-5bf2</link>
      <guid>https://dev.to/opensauced/how-we-use-kubernetes-jobs-to-scale-openssf-scorecard-5bf2</guid>
      <description>&lt;p&gt;&lt;a href="https://opensauced.pizza/blog/introducing-openssf-scorecard-for-opensauced" rel="noopener noreferrer"&gt;We recently released integrations with the OpenSSF Scorecard&lt;/a&gt; on the OpenSauced platform. The OpenSSF Scorecard is a powerful Go command line interface that anyone can use to begin understanding the security posture of their projects and dependencies. It runs several checks for dangerous workflows, CICD best practices, if the project is still maintained, and much more. This enables software builders and consumers to understand their overall security picture, deduce if a project is safe to use, and where improvements to security practices need to be made.&lt;/p&gt;

&lt;p&gt;But one of our goals with integrating the OpenSSF Scorecard into the OpenSauced platform was to make this available to the broader open source ecosystem at large. If it’s a repository on GitHub, we wanted to be able to display a score for it. This meant scaling the Scorecard CLI to target nearly any repository on GitHub. Much easier said than done!&lt;/p&gt;

&lt;p&gt;In this blog post, let’s dive into how we did that using Kubernetes and what technical decisions we made with implementing this integration.&lt;/p&gt;

&lt;p&gt;We knew that we would need to build a cron-type microservice that would frequently update scores across a myriad of repositories; the true question was how we would do that. It wouldn’t make sense to run the Scorecard CLI ad hoc: the platform could too easily get overwhelmed, and we wanted to be able to do deeper analysis on scores across the open source ecosystem, even if the OpenSauced repo page hasn’t been visited recently. Initially, we looked at using the Scorecard Go library as direct dependent code and running scorecard checks within a single, monolithic microservice. We also considered using serverless jobs to run one-off scorecard containers that would give back the results for individual repositories.&lt;/p&gt;

&lt;p&gt;The approach we ended up landing on, which marries simplicity, flexibility, and power, is to use Kubernetes Jobs at scale, all managed by a “scheduler” Kubernetes controller microservice. Instead of building a deeper code integration with Scorecard, running one-off Kubernetes Jobs gives us the same benefits as a serverless approach, but with reduced cost, since we’re managing it all directly on our Kubernetes cluster. Jobs also offer a lot of flexibility in how they run: they can have long, extended timeouts, they can use disk, and, like any other Kubernetes paradigm, they can have multiple pods doing different tasks.&lt;/p&gt;

&lt;p&gt;Let’s break down the individual components of this system and see how they work in depth:&lt;/p&gt;

&lt;p&gt;The first and biggest part of this system is the “scorecard-k8s-scheduler”; a Kubernetes controller-like microservice that kicks off new jobs on-cluster. While this microservice follows many of the principles, patterns, and methods used when &lt;a href="https://kubernetes.io/docs/concepts/architecture/controller/" rel="noopener noreferrer"&gt;building a traditional Kubernetes controller or operator&lt;/a&gt;, it does not watch for or mutate custom resources on the cluster. Its function is to simply kick off Kubernetes Jobs that run the Scorecard CLI and gather finished job results.&lt;/p&gt;

&lt;p&gt;Let’s look first at the main control loop in the Go code. This microservice uses the Kubernetes Client-Go library to interface directly with the cluster the microservice is running on: this is often referred to as an on-cluster config and client. Within the code, after bootstrapping the on-cluster client, we poll for repositories in our database that need updating. Once some repos are found, we kick off Kubernetes jobs on individual worker “threads” that will wait for each job to finish.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// buffered channel, sort of like semaphores, for threaded working&lt;/span&gt;
&lt;span class="n"&gt;sem&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;numConcurrentJobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// continuous control loop&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// blocks on getting semaphore off buffered channel&lt;/span&gt;
    &lt;span class="n"&gt;sem&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// release the hold on the channel for this Go routine when done&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;sem&lt;/span&gt;
        &lt;span class="p"&gt;}()&lt;/span&gt;

        &lt;span class="c"&gt;// grab repo needing update, start scorecard Kubernetes Job on-cluster,&lt;/span&gt;
        &lt;span class="c"&gt;// wait for results, etc. etc.&lt;/span&gt;

        &lt;span class="c"&gt;// sleep the configured amount of time to relieve backpressure&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This “infinite control loop” method, with a buffered channel, is a common Go pattern for doing continuous work with a fixed number of threads. The number of concurrent Go funcs running at any given time is capped by the configured “numConcurrentJobs” value, which sets the buffered channel’s capacity so that it acts as a worker pool. Since the buffered channel is a shared resource that all threads can use and inspect, I often like to think of it as a semaphore: a resource, much like a mutex, that multiple threads can attempt to lock on and access.&lt;/p&gt;

&lt;p&gt;In our production environment, we’ve scaled up the number of threads this scheduler runs at once. Since the scheduler itself isn’t very computationally heavy and will just kick off jobs and wait for results to eventually surface, we can push the envelope of what it can manage. We also have a built-in backoff system that attempts to relieve pressure when needed: it increments the configured “backoff” value if there are errors or if there are no repos found to calculate a score for. This ensures we’re not continuously slamming our database with queries, and the scorecard scheduler itself can remain in a “waiting” state, not taking up precious compute resources on the cluster.&lt;/p&gt;

&lt;p&gt;Within the control loop, we do a few things: first, we query our database for repositories needing their scorecard updated. This is a simple database query that is based on some timestamp metadata we watch for and have indexes on. Once a configured amount of time passes since the last score was calculated for a repo, it will bubble up to be crunched by a Kubernetes Job running the Scorecard CLI.&lt;/p&gt;

&lt;p&gt;Next, once we have a repo to get the score for, we kick off a Kubernetes Job using the “gcr.io/openssf/scorecard” image. Bootstrapping this job in Go code using Client-Go looks very similar to how it would look with yaml, just using the various libraries and apis available via “k8s.io” imports and doing it programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// defines the Kubernetes Job and its spec&lt;/span&gt;
&lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;batchv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// structs and details for the actual Job&lt;/span&gt;
    &lt;span class="c"&gt;// including metav1.ObjectMeta and batchv1.JobSpec&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// create the actual Job on cluster&lt;/span&gt;
&lt;span class="c"&gt;// using the in-cluster config and client&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clientset&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BatchV1&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Jobs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ScorecardNamespace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metav1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateOptions&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the job is created, we wait for it to signal it has completed or errored. Much like with kubectl, Client-Go offers a helpful way to “watch” resources and observe their state when they change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// watch selector for the job name on cluster&lt;/span&gt;
&lt;span class="n"&gt;watch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clientset&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BatchV1&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Jobs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ScorecardNamespace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Watch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metav1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListOptions&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;FieldSelector&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"metadata.name="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;jobName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c"&gt;// continuously pop off the watch results channel for job status&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;watch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResultChan&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// wait for job success, error, or other states&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, once we have a successful job completion, we can grab the results from the Job’s pod logs which will have the actual json results from the scorecard CLI! Once we have those results, we can upsert the scores back into the database and mutate any necessary metadata to signal to our other microservices or the OpenSauced API that there’s a new score!&lt;/p&gt;

&lt;p&gt;As mentioned before, the scorecard-k8s-scheduler can have any number of concurrent jobs running at once: in our production setting we have a large number of jobs running at once, all managed by this microservice. The intent is to be able to update scores every 2 weeks across all repositories on GitHub. With this kind of scale, we hope to be able to provide powerful tooling and insights to any open source maintainer or consumer!&lt;/p&gt;

&lt;p&gt;The “scheduler” microservice ends up being a small part of this whole system: anyone familiar with Kubernetes controllers knows that there are additional pieces of Kubernetes infrastructure that are needed to make the system work. In our case, we needed some role-based access control (RBAC) to enable our microservice to create Jobs on the cluster.&lt;/p&gt;

&lt;p&gt;First, we need a service account: this is the account that will be used by the scheduler and have access controls bound to it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scorecard-sa&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scorecard-ns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We place this service account in our “scorecard-ns” namespace where all this runs.&lt;/p&gt;

&lt;p&gt;Next, we need to have a role and role binding for the service account. This includes the actual access controls (including being able to create Jobs, view pod logs, etc.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scorecard-scheduler-role&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scorecard-ns&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jobs"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pods"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pods/log"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

---

&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scorecard-scheduler-role-binding&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scorecard-ns&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scorecard-sa&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scorecard-ns&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scorecard-scheduler-role&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You might be asking yourself, “Why do I need to give this service account access to get pods and pod logs? Isn’t that an overextension of the access controls?” Remember! Jobs have pods, and in order to get the pod logs that contain the actual results of the Scorecard CLI, we must be able to list a job’s pods and then read their logs!&lt;/p&gt;

&lt;p&gt;The second part of this, the “RoleBinding”, is where we actually attach the Role to the service account. This service account can then be used when kicking off new jobs on the cluster.&lt;/p&gt;

&lt;p&gt;—&lt;/p&gt;

&lt;p&gt;Huge shout out to &lt;a href="https://github.com/alexellis" rel="noopener noreferrer"&gt;Alex Ellis&lt;/a&gt; and his excellent &lt;a href="https://github.com/alexellis/run-job" rel="noopener noreferrer"&gt;run-job&lt;/a&gt; controller: this was a huge inspiration and reference for correctly using Client-Go with Jobs!&lt;/p&gt;

&lt;p&gt;Stay saucy everyone!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>go</category>
      <category>security</category>
    </item>
    <item>
      <title>Introducing OpenSSF Scorecard for OpenSauced</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Tue, 06 Aug 2024 17:48:11 +0000</pubDate>
      <link>https://dev.to/opensauced/introducing-openssf-scorecard-for-opensauced-1ba7</link>
      <guid>https://dev.to/opensauced/introducing-openssf-scorecard-for-opensauced-1ba7</guid>
      <description>&lt;p&gt;In September of 2022, the European Parliament introduced the &lt;a href="https://digital-strategy.ec.europa.eu/en/policies/cyber-resilience-act" rel="noopener noreferrer"&gt;“Cyber Resilience Act”&lt;/a&gt;, commonly called the CRA: a new piece of legislation that requires anyone providing digital products in the EU to meet certain security and compliance requirements.&lt;/p&gt;

&lt;p&gt;But there’s a catch: before the CRA, companies providing or distributing software would often take on much of the risk of ensuring safe and reliable software was shipped to end users. Now, software maintainers further down the supply chain will have to carry more of that weight. Not only may some open source maintainers need to meet certain requirements, but they may also have to provide an up-to-date security profile of their project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linuxfoundation.org/blog/understanding-the-cyber-resilience-act" rel="noopener noreferrer"&gt;As the Linux Foundation puts it&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Act shifts much of the security burden onto those who develop software, as opposed to the users of software. This can be justified by two assumptions: first, software developers know best how to mitigate vulnerabilities and distribute patches; and second, it’s easier to mitigate vulnerabilities at the source than requiring users to do so.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There’s a lot to unpack in the CRA. And it’s still not clear how individual open source projects, maintainers, foundations, or companies will be directly impacted. But, it’s clear that the broader open source ecosystem needs easier ways to understand the security risk of projects deep within dependency chains. With all that in mind, we are very excited to introduce the OpenSSF Scorecard ratings within the OpenSauced platform. &lt;/p&gt;

&lt;h2&gt;
  
  
  What is the OpenSSF Scorecard?
&lt;/h2&gt;

&lt;p&gt;The OpenSSF is &lt;a href="https://openssf.org/" rel="noopener noreferrer"&gt;the Open Source Security Foundation&lt;/a&gt;: a multidisciplinary group of software developers, industry leaders, security professionals, researchers, and government liaisons. The OpenSSF aims to enable the broader open source ecosystem “to secure open source software for the greater public good.” They interface with critical personnel across the software industry to fight for a safer technological future.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ossf/scorecard" rel="noopener noreferrer"&gt;The OpenSSF Scorecard project&lt;/a&gt; is an effort to unify what best practices open source maintainers and consumers should use to judge if their code, practices, and dependencies are safe. Ultimately, the “scorecard” command line interface gives any the capability to inspect repositories, run “checks” against those repos, and derive an overall score for the risk profile of that project. It’s a very powerful software tool that gives you a general picture of where a piece of software is considered risky. It can also be a great starting point for any open source maintainer to develop better practices and find out where they may need to make improvements. By providing a standardized approach to assessing open source security and compliance, the Scorecard helps organizations more easily identify supply chain risks and regulatory requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenSauced OpenSSF Scorecards
&lt;/h2&gt;

&lt;p&gt;Using the scorecard command line interface as a cornerstone, we’ve built infrastructure and tooling to enable OpenSauced to capture scores for nearly all repositories on GitHub. Anything over a 6 or a 7 is generally considered safe to use with no glaring issues. Scores of 9 or 10 are doing phenomenally well. And projects with lower scores should be inspected closely to understand what’s gone wrong.&lt;/p&gt;
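&lt;p&gt;As a rough illustration of those bands (the scorecard CLI emits a 0-10 score; the exact cutoffs in this sketch are an assumption, since the post only gives approximate ranges), interpreting a score programmatically might look like:&lt;/p&gt;

```python
# Rough interpretation of the score bands above. The scorecard CLI emits a
# 0-10 score; the exact cutoffs here are this sketch's assumption.

def interpret_score(score: float) -> str:
    if score >= 9:
        return "doing phenomenally well"
    if score >= 7:
        return "generally safe to use"
    return "inspect closely"

interpret_score(9.6)  # "doing phenomenally well"
```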

&lt;p&gt;Scorecards are enabled across all repositories. With this integration, we aim to make it easier for software maintainers to understand the security posture of their project and for software consumers to be assured that their dependencies are safe to use.&lt;/p&gt;

&lt;p&gt;Starting today, you can see the score for any project within individual &lt;a href="https://opensauced.pizza/docs/features/repo-pages/" rel="noopener noreferrer"&gt;Repository Pages&lt;/a&gt;. For example, in &lt;a href="https://app.opensauced.pizza/s/%20kubernetes/kubernetes" rel="noopener noreferrer"&gt;kubernetes/kubernetes&lt;/a&gt;, we can see the project is safe for use:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgist.github.com%2Fuser-attachments%2Fassets%2F268078b3-4315-4354-a98b-835660f30023" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgist.github.com%2Fuser-attachments%2Fassets%2F268078b3-4315-4354-a98b-835660f30023" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s look at another example: &lt;a href="https://app.opensauced.pizza/s/crossplane/crossplane" rel="noopener noreferrer"&gt;crossplane/crossplane&lt;/a&gt;. These maintainers are doing an awesome job of ensuring they are following best practices for open source security and compliance!!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgist.github.com%2Fuser-attachments%2Fassets%2Fc9c6c47c-ab2e-40e4-b884-4bd330f2c72e" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgist.github.com%2Fuser-attachments%2Fassets%2Fc9c6c47c-ab2e-40e4-b884-4bd330f2c72e" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The checks that the OpenSSF Scorecard runs cover a wide range of common open source security practices, both “in code” and in the maintenance of the project: e.g. code review best practices, whether “dangerous workflows” are present (like untrusted code being checked out and run during CI/CD runs), whether the project is actively maintained, the use of signed releases, and many more.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future of OpenSSF Scorecards at OpenSauced
&lt;/h2&gt;

&lt;p&gt;We plan to bring the OpenSSF Scorecard to more of the OpenSauced platform, as we aim to be the definitive place for open source security and compliance for maintainers and consumers. As part of that, we’ll be bringing more details to the OpenSSF Scorecard with how individual checks are ranked:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fr7m53vrk%2Fproduction%2Fc2362e76a4d6f9a2beb5f3628ad38e381ed70f2d-808x790.png%3Fw%3D450" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fr7m53vrk%2Fproduction%2Fc2362e76a4d6f9a2beb5f3628ad38e381ed70f2d-808x790.png%3Fw%3D450" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We’ll also be bringing OpenSSF Scorecard to our premium offering, &lt;a href="https://opensauced.pizza/docs/features/workspaces/" rel="noopener noreferrer"&gt;Workspaces&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgist.github.com%2Fuser-attachments%2Fassets%2Fc3f16287-ada5-4354-ac3f-532f1a7dbf1e" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgist.github.com%2Fuser-attachments%2Fassets%2Fc3f16287-ada5-4354-ac3f-532f1a7dbf1e" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Within a Workspace, you’ll soon be able to see how each of the projects you are tracking stacks up against the others on open source security and compliance. You can use the OpenSSF score together with all the Workspace insights and metrics, all in one dashboard, to get a good idea of what’s happening within a set of repositories and what their security posture is. In this example, I’m tracking all the repositories within the bottlerocket-os org on GitHub, a security-focused, Linux-based operating system: I can see that each of the repositories has a good rating, which gives me greater confidence in the maintenance status and security posture of this ecosystem. This also enables stakeholders and maintainers of Bottlerocket to have a bird’s-eye snapshot of the compliance and maintenance status of the entire org.&lt;/p&gt;

&lt;p&gt;As the CRA and similar regulations push more of the security burden onto developers, tools like the OpenSSF Scorecard become invaluable. They offer a standardized, accessible way to assess and improve the security of open source projects, helping maintainers meet new compliance requirements and giving software consumers confidence in their choices. &lt;/p&gt;

&lt;p&gt;Looking ahead, we're committed to expanding these capabilities at OpenSauced. By providing comprehensive security insights, from individual repository scores to organization-wide overviews in Workspaces, we're working to create a more secure and transparent open source ecosystem: one where anyone in the open source community can better understand their software dependencies and feel empowered to make meaningful changes where needed, and where open source maintainers have helpful tools to better maintain their projects.&lt;/p&gt;

&lt;p&gt;Stay saucy!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Understanding the Lottery Factor</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Wed, 22 May 2024 22:08:11 +0000</pubDate>
      <link>https://dev.to/opensauced/understanding-the-lottery-factor-41gc</link>
      <guid>https://dev.to/opensauced/understanding-the-lottery-factor-41gc</guid>
      <description>&lt;p&gt;It’s 2:36am on a Sunday morning. You’re on-call and your pager is going off with a critical alert. You flip a light on, roll out of bed, and groggily open your laptop. Maybe it’s nothing and you can go back to bed, addressing whatever it is in the morning. You log on, silence the alert, and start digging into whatever’s going on. Something’s obviously not right: clients don’t seem to be connecting to your databases correctly. Or there’s some problem with the schema, but that wouldn’t make sense since no one should have pushed changes this late at night on a weekend. You start sifting through logs. You feel your pulse pick up as you notice strange logs from the databases. Really strange logs. Connection logs from IP addresses that you don’t recognize and aren’t within your VPC. Clients still aren’t able to connect so you decide to use the “break-glass” service account to investigate what’s going on inside one of your production databases and debug further. Maybe there’s a weird configuration that needs updating or something needs to be hard-reset to start working again.&lt;/p&gt;

&lt;p&gt;What you see startles you: every single row of your production database has garbled up messes of data, not the textual data you were expecting. Digging further in, you find a recent change to the schema and pushes from the database root account. One change in particular catches your attention: a new table called “ransom_note”. You pause, shocked, waiting to see if you’ll suddenly wake up from a bad dream. You cautiously begin to inspect the new table: &lt;em&gt;“SELECT COUNT(*) FROM ransom_note”&lt;/em&gt; returns only 1 row. &lt;em&gt;“SELECT * FROM ransom_note”&lt;/em&gt; reveals your worst suspicions: “&lt;em&gt;all your data has been encrypted, pay us 10 BTC to have the decryption key&lt;/em&gt;”.&lt;/p&gt;




&lt;p&gt;This is a nightmare scenario for almost every technology business owner, Chief Information Security Officer, and security red-team: a sudden and unexpected attack orchestrated through some unknown means that completely cripples your operations. Maybe it was a well orchestrated social engineering attack. Maybe it was an extremely unfortunate misconfiguration that let some bad actors into your networks. Or maybe it was a sophisticated supply-chain attack from one of the many hundreds of open source dependencies you have within your product’s stack.&lt;/p&gt;

&lt;p&gt;Supply-chain attacks have become very popular among nefarious actors for a few reasons: open source software is used nearly everywhere and many open source maintainers are spread incredibly thin. Open source software has become the critical infrastructure of the commons that we all depend on today. But it’s not uncommon to find solo-maintained or completely abandoned projects that have millions of downloads and sit in the critical dependency path within the software supply chain of many large enterprise products.&lt;/p&gt;

&lt;p&gt;A good example of this is the recent &lt;a href="https://en.wikipedia.org/wiki/XZ_Utils_backdoor"&gt;xz supply-chain attack against ssh&lt;/a&gt;: a malicious actor was able to inject a backdoor into ssh, a secure way to connect to other computers through a network, by adding nefarious code to the xz library, a lossless data compression library. In theory, if this had not been detected as early as it was, it would have given the attackers a way to remotely execute code on, or gain access to, any affected Linux computer. One thing that stands out in this example, like so many other supply-chain attacks, is the maintenance status of xz: it went relatively untouched with only a few people around to maintain it. With the maintainer burned out, no other volunteers, and very few resources dedicated to the project, the attacker was easily able to slip in malicious code. A new co-maintainer automatically “inherits trust built up by the original maintainer”, and the attacker used that goodwill to make nefarious changes.&lt;/p&gt;

&lt;p&gt;For further reading and analysis on the tragedy of the xz attack, I highly recommend &lt;a href="https://robmensching.com/blog/posts/2024/03/30/a-microcosm-of-the-interactions-in-open-source-projects/"&gt;this piece from Rob Mensching&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While there’s no one catch-all solution for preventing these kinds of problems in open source, one piece of the bigger puzzle is the “Lottery Factor”: a metric that looks at open source communities and the weight and distribution of work being done by individuals within a project.&lt;/p&gt;

&lt;p&gt;The way we at OpenSauced are defining the Lottery Factor is as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The minimum number of team members that have to suddenly disappear from a project (they won the lottery!) before the project stalls due to lack of knowledgeable or competent personnel. If 1 contributor makes over 50% of commits: Very high risk. 2 contributors make over 50% of commits: High risk. 3 to 5 contributors make over 50% of commits: Moderate risk. And over 5 contributors make over 50% of commits: Low risk.&lt;/p&gt;
&lt;/blockquote&gt;
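&lt;p&gt;That definition translates directly into code. Here’s a minimal sketch, assuming a simple mapping of author to commit count (a real analysis would pull these counts from the project’s contribution history):&lt;/p&gt;

```python
# Sketch of the Lottery Factor rating defined above: count how many of the
# top contributors it takes to exceed 50% of commits, then map that count
# to a risk level. (The commit counts below are illustrative.)

def lottery_factor(commits_by_author: dict[str, int]) -> str:
    total = sum(commits_by_author.values())
    running, needed = 0, 0
    for count in sorted(commits_by_author.values(), reverse=True):
        running += count
        needed += 1
        if running * 2 > total:  # this group now holds over 50% of commits
            break
    if needed == 1:
        return "Very High"
    if needed == 2:
        return "High"
    if needed >= 6:
        return "Low"
    return "Moderate"

# One contributor with the majority of commits is the riskiest case:
lottery_factor({"solo-maintainer": 120, "driveby": 6})  # "Very High"
```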

&lt;p&gt;The Lottery Factor can help uncover this sort of burnout and identify projects that need an injection of critical engineering resources. This can begin to give you an idea of how catastrophic it would be if someone who makes the majority of contributions in a project suddenly disappeared (because they won the lottery and went off to live their best life on the beach!). This may happen for any number of reasons and it’s important to note that the Lottery Factor is unique to each individual project: it’s not a hard and fast rule, but rather, another important metric in understanding the full story of a project.&lt;/p&gt;

&lt;p&gt;With all that in mind, we are very excited to unveil the inclusion of the Lottery Factor in &lt;a href="https://docs.opensauced.pizza/features/repo-pages/"&gt;OpenSauced Repo Pages&lt;/a&gt; as an additional metric and insight you can inspect!!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjxx5p6oum3x3irfz7kk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjxx5p6oum3x3irfz7kk.png" alt="analog repo page" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Through the lens of the Lottery Factor, we can begin to look at projects with a better understanding of where the critical “human” links in the secure software supply chain are, where funding resources need to be spent, and where to allocate crucial engineering resources.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://app.opensauced.pizza/s/analogjs/analog"&gt;analogjs/analog&lt;/a&gt; example above, we can see that in the last 30 days, about 50% of contributions were made by ~2 contributors, 50% of that being &lt;a href="https://app.opensauced.pizza/user/brandonroberts"&gt;Brandon&lt;/a&gt;. This gives the overall Lottery factor as “High” and would start to unveil critical personnel in the Analog and Angular ecosystem.&lt;/p&gt;

&lt;p&gt;An example of a project where the lottery factor is critically high is &lt;a href="https://app.opensauced.pizza/s/zloirock/core-js?range=90"&gt;core-js&lt;/a&gt;, a widely used JavaScript standards library in use by Amazon, Netflix, and many other Fortune 500 companies across the web:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefweezn6szdh3igw0eop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefweezn6szdh3igw0eop.png" alt="core-js repo page" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Over the last 90 days, the core maintainer “&lt;a href="https://beta.app.opensauced.pizza/user/zloirock"&gt;zloirock&lt;/a&gt;” has made the majority of the contributions. And, because of the wide adoption of core-js, this library could be a good candidate for an injection of critical resources to ensure the good standing and governance of the library.&lt;/p&gt;

&lt;p&gt;Now, let’s look at a project with a “Low” Lottery Factor over the last year where there are no single individuals with the majority of the commits, &lt;a href="https://app.opensauced.pizza/s/kubernetes/kubernetes"&gt;kubernetes/kubernetes&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw73tskfmnzjatruevk3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw73tskfmnzjatruevk3d.png" alt="kubernetes repo page" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because there are so many different people from so many different companies invested in the success of the Kubernetes platform and the cloud-native ecosystem, it makes sense that there are no single critical individuals that would be the sole point of failure if they were no longer working on the project.&lt;/p&gt;

&lt;p&gt;The Lottery Factor can help tell a story unique to each individual community and project. And it can help open source project offices, small teams, or individual contributors better understand the landscape of any open source project or piece of technology they depend on.&lt;/p&gt;

&lt;p&gt;We at OpenSauced hope this can start to help you understand where the critical human factor is within projects you contribute to and depend on! Make sure to &lt;a href="https://docs.opensauced.pizza/features/repo-pages/"&gt;check out OpenSauced Repo Pages&lt;/a&gt; and stay saucy, everyone!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>github</category>
    </item>
    <item>
      <title>How We Saved 10s of Thousands of Dollars Deploying Low Cost Open Source AI Technologies At Scale with Kubernetes</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Tue, 14 May 2024 05:54:00 +0000</pubDate>
      <link>https://dev.to/opensauced/how-we-saved-10s-of-thousands-of-dollars-deploying-low-cost-open-source-ai-technologies-at-scale-with-kubernetes-57j8</link>
      <guid>https://dev.to/opensauced/how-we-saved-10s-of-thousands-of-dollars-deploying-low-cost-open-source-ai-technologies-at-scale-with-kubernetes-57j8</guid>
      <description>&lt;p&gt;When you first start building AI applications with generative AI, you'll likely end up using OpenAI's API at some point in your project's journey. And for good reason! Their API is well-structured, fast, and supported by great libraries. At a small scale or when you’re just getting started, using OpenAI can be relatively economical. There’s also a huge amount of really great educational material out there that walks you through the process of building AI applications and understanding complex techniques using OpenAI’s API.&lt;/p&gt;

&lt;p&gt;One of my personal favorite OpenAI resources these days is the &lt;a href="https://cookbook.openai.com/"&gt;OpenAI Cookbook&lt;/a&gt;: this is an excellent way to start learning how their different models work, how to start taking advantage of the many cutting edge techniques in the AI space, and how to start integrating your data with AI workloads.&lt;/p&gt;

&lt;p&gt;However, as soon as you need to scale up your generative AI operations, you'll quickly encounter a pretty significant obstacle: the cost. Once you start generating thousands (and eventually tens of thousands) of texts via GPT-4, or even the lower-cost GPT-3.5 models, you'll quickly find your OpenAI bill is also growing into the thousands of dollars every month.&lt;/p&gt;

&lt;p&gt;Thankfully, for small and agile teams, there are a lot of great options out there for deploying low cost open source technologies to reproduce an OpenAI compatible API that uses the latest and greatest of the very solid open source models (which in many cases, rival the performance of the GPT 3.5 class of models).&lt;/p&gt;

&lt;p&gt;This is the very situation we at OpenSauced found ourselves in when building the infrastructure &lt;a href="https://oss.fyi/wait-starsearch"&gt;for our new AI offering, StarSearch&lt;/a&gt;: we needed a data pipeline that would continuously get summaries and embeddings of GitHub issues and pull requests in order to do a &lt;em&gt;“needle in the haystack”&lt;/em&gt; cosine similarity search in our vector store as part of a Retrieval Augmented Generation (RAG) flow. RAG is a very popular technique that enables you to provide additional context and search results to a large language model where it wouldn’t have that information in its foundational data otherwise. In this way, an LLM’s answers can be much more accurate for queries that you can "augment" with data you’ve given it context on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqu40vr0tcgd0ati84rn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqu40vr0tcgd0ati84rn.png" alt="Simple RAG" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cosine similarity search on top of a vector store is a way to enhance this RAG flow even further: because much of our data is unstructured and would be very difficult to parse through using a full text search, we’ve created vector embeddings on AI generated summaries of relevant rows in our database that we want to be able to search on. Vectors are really just lists of numbers, but they represent an “understanding” from an embedding machine learning model that can be compared against a query’s vector embedding to find the “nearest neighbor” data to the end user’s question.&lt;/p&gt;
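&lt;p&gt;For intuition, here’s a minimal sketch of cosine similarity and a naive nearest-neighbor lookup in Python (a real vector store indexes its vectors so the search stays fast at scale):&lt;/p&gt;

```python
import math

# Minimal cosine similarity: 1.0 means the vectors point in the same
# direction (very similar meaning), 0.0 means orthogonal (unrelated).

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# A naive nearest-neighbor lookup is then just "rank stored vectors by
# similarity to the query vector":
def nearest(query: list[float], store: list[dict]) -> dict:
    return max(store, key=lambda item: cosine_similarity(query, item["vec"]))
```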

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgp9ycosldmbb8wj3mluh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgp9ycosldmbb8wj3mluh.png" alt="Advanced RAG techniques with vector store" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Initially, for the summary generation part of our RAG data pipeline, we were using OpenAI directly and wanted to target "knowing" about the events and communities of the top 40,000+ repositories on GitHub. This way, anyone could ask about and gain unique insights into what's going on across the most prominent projects in the open source ecosystem. But since new issues and pull request events are always flowing through this pipeline, on any given day, upwards of 100,000 new events for the 40,000+ repos would flow through to have summaries generated: that’s a lot of calls to the OpenAI API!!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr664t6fbu88eckqrijya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr664t6fbu88eckqrijya.png" alt="Vector generation data pipeline architecture" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this kind of scale, we quickly ran into "cost" bottlenecks: we considered further optimizing our usage of OpenAI's APIs to reduce our overall usage, but felt that there was a powerful path forward by using open source technologies at a significantly lower cost to accomplish the same goal at our target scale.&lt;/p&gt;

&lt;p&gt;And while this post won’t get too deep into how we implemented the actual RAG part of StarSearch, we will look at how we bootstrapped the infrastructure to be able to consume many tens of thousands of GitHub events, generate AI summaries from them, and surface those as part of a nearest neighbor search using vLLM and Kubernetes. This was the biggest unlock to getting StarSearch to be able to surface relevant information about various technologies and "know" about what's going on across the open source ecosystem.&lt;/p&gt;

&lt;p&gt;There’s a lot more that could be said about RAG and vector search - I recommend the following resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learnbybuilding.ai/tutorials/rag-from-scratch"&gt;A beginner's guide to building a Retrieval Augmented Generation (RAG) application from scratch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.blog/2023/10/09/from-prototype-to-production-vector-databases-in-generative-ai-applications/"&gt;Vector databases in generative AI applications&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.datastax.com/guides/what-is-cosine-similarity"&gt;What is Cosine Similarity: A Comprehensive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Running open source inference engines locally
&lt;/h3&gt;

&lt;p&gt;Today, thanks to the power and ingenuity of the open source ecosystem, there are a lot of great options for running AI models and doing "generative inference" on your own hardware.&lt;/p&gt;

&lt;p&gt;A few of the most prominent that come to mind are llama.cpp, vLLM, llamafile, llm, gpt4all, and Hugging Face Transformers. One of my personal favorites is &lt;a href="https://app.opensauced.pizza/s/ollama/ollama"&gt;Ollama&lt;/a&gt;: it allows me to easily run an LLM with &lt;code&gt;ollama run&lt;/code&gt; on the command line of my MacBook. All of these, with their own spin and flavors on the open source AI space, provide a very solid way for you to run open source large language models (like Meta's llama3, Mistral's mixtral model, etc.) locally on your own hardware without the need for a third party API.&lt;/p&gt;

&lt;p&gt;Maybe even more importantly, these pieces of software are well optimized for running models on consumer grade hardware like personal laptops and gaming computers: you don't need a cluster of enterprise grade GPUs or an expensive third party service in order to start playing around with generating text! You can get started today and start building AI applications right from your laptop using open source technology with no 3rd party API.&lt;/p&gt;

&lt;p&gt;This is exactly how I started transitioning our generative AI pipelines from OpenAI to a service we run on top of Kubernetes for &lt;a href="https://app.opensauced.pizza/star-search"&gt;StarSearch&lt;/a&gt;: I started simple with Ollama running a Mistral model locally on my laptop. Then, I began transitioning our OpenAI data pipelines that read from our database and generate summaries to start using my local Ollama server. Ollama, along with many of the other inference engines out there, provide an OpenAI compatible API. Using this, I didn’t have to re-write much of the client code: simply replace the OpenAI API endpoint with the &lt;code&gt;localhost&lt;/code&gt; pointed to Ollama.&lt;/p&gt;
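&lt;p&gt;As a sketch of how small that client-side change is (the endpoint path and payload shape follow the OpenAI chat completions API; port 11434 is Ollama’s default, and the model name here is illustrative):&lt;/p&gt;

```python
import json

# The only client-side change when swapping OpenAI for a local
# OpenAI-compatible server is the base URL; the request shape is identical.
# (Port 11434 is Ollama's default; the model name is illustrative.)

OPENAI_BASE = "https://api.openai.com/v1"
OLLAMA_BASE = "http://localhost:11434/v1"

def chat_request(base_url: str, model: str, prompt: str):
    """Build the URL and JSON body for an OpenAI-style chat completion."""
    url = f"{base_url}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

url, body = chat_request(OLLAMA_BASE, "mistral", "Summarize this issue for me.")
```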

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhu91g0sstwnrck4m5hwn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhu91g0sstwnrck4m5hwn.png" alt="Using Ollama locally" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Choosing vLLM for production
&lt;/h3&gt;

&lt;p&gt;Eventually, I ran into a real bottleneck using Ollama: it didn't support servicing concurrent clients. And at the kind of scale we're targeting, at any given time we likely need a couple dozen of our data-pipeline microservice runners all batch-processing summaries from the generative AI service concurrently. This way, we could keep up with the constant load from 40,000+ repos on GitHub. Obviously OpenAI's API can handle this kind of load, but how would we replicate it with our own service?&lt;/p&gt;
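&lt;p&gt;The load pattern we needed looks roughly like this stubbed Python sketch: the &lt;code&gt;summarize&lt;/code&gt; coroutine stands in for the real call to the inference server, and the concurrency cap is an arbitrary example value, not our production setting:&lt;/p&gt;

```python
import asyncio

# Stubbed sketch of the load pattern: many summaries generated concurrently,
# with a cap on in-flight requests. The summarize() coroutine stands in for
# the real call to the inference server.

async def summarize(event_id: int) -> str:
    await asyncio.sleep(0)  # placeholder for the actual API round trip
    return f"summary-{event_id}"

async def run_batch(event_ids, max_in_flight: int = 8) -> list[str]:
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(eid: int) -> str:
        async with sem:
            return await summarize(eid)

    return await asyncio.gather(*(bounded(e) for e in event_ids))

summaries = asyncio.run(run_batch(range(100)))
```

&lt;p&gt;An inference server that batches these concurrent requests efficiently is exactly what we were missing.&lt;/p&gt;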

&lt;p&gt;Eventually, I found &lt;a href="https://app.opensauced.pizza/s/vllm-project/vllm"&gt;vLLM&lt;/a&gt;, a fast inference runner that can service multiple clients behind an OpenAI compatible API and take advantage of multiple GPUs on a given computer with request batching and an efficient use of &lt;em&gt;"PagedAttention"&lt;/em&gt; when doing inference. Also like Ollama, the vLLM community provides a container runtime image which makes it very easy to use on a number of different production platforms. Excellent!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27ckva495gwas41bbv79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27ckva495gwas41bbv79.png" alt="Using vLLM at scale" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note to the reader:&lt;/em&gt; Ollama very recently merged changes to support concurrent clients. At the time of this writing, that support hadn't yet landed in the main upstream image, but I’m very excited to see how it performs compared to other multi-client inference engines!&lt;/p&gt;
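&lt;p&gt;To make the concurrency requirement concrete, here's a minimal Python sketch of the fan-out pattern our pipeline runners need. The &lt;code&gt;summarize&lt;/code&gt; function and repo names are placeholders for illustration, not our production code:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(repo: str) -> str:
    # Placeholder: in production this would POST to the inference
    # server's OpenAI-compatible /v1/chat/completions endpoint.
    return f"summary of {repo}"

def summarize_batch(repos: list[str], workers: int = 24) -> list[str]:
    # A couple dozen workers issue requests concurrently; the
    # inference server has to batch them to keep up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(summarize, repos))

results = summarize_batch([f"org/repo-{i}" for i in range(100)])
```

&lt;p&gt;This is exactly the access pattern a single-client server falls over on: many in-flight requests that the engine should batch together on the GPU.&lt;/p&gt;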

&lt;h3&gt;
  
  
  Running vLLM locally
&lt;/h3&gt;

&lt;p&gt;To run vLLM locally, you’ll need a Linux system and a Python runtime with the &lt;code&gt;vllm&lt;/code&gt; package installed (&lt;code&gt;pip install vllm&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; vllm.entrypoints.openai.api_server &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model&lt;/span&gt; mistralai/Mistral-7B-Instruct-v0.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts the OpenAI-compatible server, which you can then hit locally on port 8000:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8000/v1/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"list"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mistralai/Mistral-7B-Instruct-v0.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1715528945&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"owned_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vllm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"root"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mistralai/Mistral-7B-Instruct-v0.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"parent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"permission"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"modelperm-020c373d027347aab5ffbb73cc20a688"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model_permission"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1715528945&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"allow_create_engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"allow_sampling"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"allow_logprobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"allow_search_indices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"allow_view"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"allow_fine_tuning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"organization"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"group"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"is_blocking"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, to run the OpenAI-compatible API as a container, you can use Docker on your Linux system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--runtime&lt;/span&gt; nvidia &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-v&lt;/span&gt; ~/.cache/huggingface:/root/.cache/huggingface &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--ipc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host &lt;span class="se"&gt;\&lt;/span&gt;
    vllm/vllm-openai:latest &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model&lt;/span&gt; mistralai/Mistral-7B-Instruct-v0.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mounts the local Hugging Face cache from my Linux machine into the container and shares the host’s IPC namespace (which vLLM needs for PyTorch’s shared memory). Then, using localhost again, we can hit the OpenAI-compatible server running in Docker. Let’s do a chat completion now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl localhost:8000/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
      "model": "mistralai/Mistral-7B-Instruct-v0.2",
      "messages": [
        {"role": "user", "content": "Who won the world series in 2020?"}
      ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cmpl-9f8b1a17ee814b5db6a58fdfae107977"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1715529007&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mistralai/Mistral-7B-Instruct-v0.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The Major League Baseball (MLB) World Series in 2020 was won by the Tampa Bay Rays. They defeated the Los Angeles Dodgers in six games to secure their first-ever World Series title. The series took place from October 20 to October 27, 2020, at Globe Life Field in Arlington, Texas."&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"logprobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"stop_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;136&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;115&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
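&lt;p&gt;Since this is a standard OpenAI chat-completion payload, client code can consume it the same way it would consume a response from OpenAI's API. A small sketch of pulling out the message and the token accounting, using the field names from the response above (the content string is abbreviated here):&lt;/p&gt;

```python
import json

# An abbreviated copy of the chat.completion response shown above.
response = json.loads("""
{
  "id": "cmpl-9f8b1a17ee814b5db6a58fdfae107977",
  "object": "chat.completion",
  "model": "mistralai/Mistral-7B-Instruct-v0.2",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "The 2020 World Series..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 21, "total_tokens": 136, "completion_tokens": 115}
}
""")

# The assistant's reply lives in the first choice's message.
content = response["choices"][0]["message"]["content"]

# Token usage is handy for tracking throughput and cost at scale.
completion_tokens = response["usage"]["completion_tokens"]
```

&lt;p&gt;Tracking &lt;code&gt;usage&lt;/code&gt; per request is a cheap way to watch aggregate token throughput once many runners are hitting the service.&lt;/p&gt;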



&lt;h3&gt;
  
  
  Using Kubernetes for a large scale vLLM service
&lt;/h3&gt;

&lt;p&gt;Running vLLM locally works just fine for testing, developing, and experimenting with inference. But at the kind of scale we're targeting, I knew we'd need an environment that could easily handle any number of compute instances with GPUs, scale up with our needs, and load balance vLLM behind an agnostic service that our data pipeline microservices could hit at a production rate. Enter Kubernetes, a familiar and popular container orchestration system!&lt;/p&gt;

&lt;p&gt;This, in my opinion, is a perfect use case for Kubernetes and would make scaling up an internal AI service that looked like OpenAI's API relatively seamless.&lt;/p&gt;

&lt;p&gt;In the end, the architecture for this kind of deployment looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy any number of Kubernetes nodes, each with any number of GPUs, into a node pool

&lt;ul&gt;
&lt;li&gt;Install GPU drivers per the managed Kubernetes service provider instructions. We're using Azure AKS so &lt;a href="https://learn.microsoft.com/en-us/azure/aks/gpu-cluster?tabs=add-ubuntu-gpu-node-pool"&gt;they provide these instructions for utilizing GPUs on cluster.&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Deploy a daemonset for vLLM to run on each node with a GPU&lt;/li&gt;
&lt;li&gt;Deploy a Kubernetes service to load balance internal requests to vLLM's OpenAI compatible API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0zwn4dmxvphjxvmzbdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0zwn4dmxvphjxvmzbdp.png" alt="kubernetes architecture" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting the cluster ready
&lt;/h3&gt;

&lt;p&gt;If you're following along at home and looking to reproduce these results, I'm assuming at this point you have a Kubernetes cluster already up and running, likely through a managed Kubernetes provider, and have also installed the necessary GPU drivers onto the nodes that have GPUs.&lt;/p&gt;

&lt;p&gt;Again, on Azure’s AKS, where we deployed this service, we needed to run a daemonset that installs the Nvidia drivers for us on each of the nodes with a GPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DaemonSet&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia-device-plugin-daemonset&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-resources&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia-device-plugin-ds&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia-device-plugin-ds&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcr.microsoft.com/oss/nvidia/k8s-device-plugin:v0.14.1&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia-device-plugin-ctr&lt;/span&gt;
        &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;All&lt;/span&gt;
        &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/lib/kubelet/device-plugins&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;device-plugin&lt;/span&gt;
      &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;accelerator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
      &lt;span class="na"&gt;tolerations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CriticalAddonsOnly&lt;/span&gt;
        &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exists&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NoSchedule&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;
        &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exists&lt;/span&gt;
      &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;hostPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/lib/kubelet/device-plugins&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;device-plugin&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This daemonset installs the Nvidia device plugin pod on each node that has the &lt;code&gt;accelerator: nvidia&lt;/code&gt; node selector label and can tolerate a few taints from the system. Again, this is more or less platform-specific, but it gives our AKS cluster the necessary drivers on the nodes that have GPUs, so vLLM can take full advantage of those compute units.&lt;/p&gt;

&lt;p&gt;Eventually, we end up with a cluster node configuration that has the default nodes and the nodes with GPUs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❯ kubectl get nodes -A
NAME                     STATUS   ROLES    AGE   VERSION
defaultpool-88943984-0   Ready    &amp;lt;none&amp;gt;   5d    v1.29.2
defaultpool-88943984-1   Ready    &amp;lt;none&amp;gt;   5d    v1.29.2
gpupool-42074538-0       Ready    &amp;lt;none&amp;gt;   41h   v1.29.2
gpupool-42074538-1       Ready    &amp;lt;none&amp;gt;   41h   v1.29.2
gpupool-42074538-2       Ready    &amp;lt;none&amp;gt;   41h   v1.29.2
gpupool-42074538-3       Ready    &amp;lt;none&amp;gt;   41h   v1.29.2
gpupool-42074538-4       Ready    &amp;lt;none&amp;gt;   41h   v1.29.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of these GPU nodes gets a device plugin pod, managed by the daemonset, where the drivers get installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❯ kubectl get daemonsets.apps -n gpu-resources
NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR        AGE
nvidia-device-plugin-daemonset   5         5         5       5            5           accelerator=nvidia   41h
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing to note for this setup: each of these GPU nodes has an &lt;code&gt;accelerator: nvidia&lt;/code&gt; label and a taint for &lt;code&gt;nvidia.com/gpu&lt;/code&gt;. These ensure that no other pods are scheduled on these nodes, since we anticipate vLLM consuming all the compute and GPU resources on each of them.&lt;/p&gt;
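&lt;p&gt;Depending on your provider, the label and taint may be applied automatically when the node pool is created. If not, they can be applied by hand; this is a sketch using the node names from the output above (the taint value &lt;code&gt;present&lt;/code&gt; is arbitrary, only the key matters for the toleration):&lt;/p&gt;

```shell
# Label a GPU node so the device plugin and vLLM daemonsets select it
kubectl label nodes gpupool-42074538-0 accelerator=nvidia

# Taint it so ordinary workloads are kept off the GPU hardware
kubectl taint nodes gpupool-42074538-0 nvidia.com/gpu=present:NoSchedule
```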

&lt;h3&gt;
  
  
  Deploying a vLLM DaemonSet
&lt;/h3&gt;

&lt;p&gt;In order to take full advantage of each of the GPUs deployed on the cluster, we can deploy an additional vLLM daemonset that also selects for each of the Nvidia GPU nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DaemonSet&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-daemonset-ec9831c8&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-ns&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--model&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mistralai/Mistral-7B-Instruct-v0.2&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--gpu-memory-utilization&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.95"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--enforce-eager&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HUGGING_FACE_HUB_TOKEN&lt;/span&gt;
          &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;secretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HUGGINGFACE_TOKEN&lt;/span&gt;
              &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-huggingface-token&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm/vllm-openai:latest&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
          &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
      &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;accelerator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
      &lt;span class="na"&gt;tolerations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NoSchedule&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;
        &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exists&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s break down what’s going on here:&lt;/p&gt;

&lt;p&gt;First, we create the metadata and label selectors for the vLLM daemonset pods on the cluster. Then, in the container spec, we provide the arguments to the vLLM container running on the cluster. You’ll notice a few things here: we’re utilizing about 95% of GPU memory in this deployment, and we’re enforcing eager mode, which skips CUDA graph capture to reduce memory consumption at the cost of some inference performance. One of the things I like about vLLM is its many options for tuning and running on different hardware: there are lots of capabilities for tweaking how the inference works or how your hardware is consumed. &lt;a href="https://docs.vllm.ai/en/latest/"&gt;Check out the vLLM docs for further reading!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, you’ll notice we provide a Huggingface token: this is so that vLLM can pull down the model from Huggingface’s API and bypass any “gated” models that we’ve been given permission to access.&lt;/p&gt;

&lt;p&gt;Next, we expose port 8000 for the pod. This will be used later by a service that selects these pods and provides an agnostic, load-balanced endpoint in front of the deployed vLLM pods on port 8000. Then, we request an &lt;code&gt;nvidia.com/gpu&lt;/code&gt; resource, which is provided as a node-level resource by the Nvidia device plugin daemonset (again, depending on your managed Kubernetes provider and how you installed the GPU drivers, this may vary). And finally, we provide the same node selector and taint tolerations to ensure that vLLM runs only on the GPU nodes! Now, when we deploy this, we’ll see the vLLM daemonset has successfully deployed onto each of the GPU nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❯ kubectl get daemonsets.apps -n vllm-ns
NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR        AGE
vllm-daemonset-ec9831c8   5         5         5       5            5           accelerator=nvidia   41h
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Load balancing with an internal Kubernetes service
&lt;/h3&gt;

&lt;p&gt;In order to provide an OpenAI-like API to other microservices internally on the cluster, we can apply a Kubernetes service that selects the vllm pods in the vllm namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-service&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-ns&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
    &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm&lt;/span&gt;
  &lt;span class="na"&gt;sessionAffinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;None&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterIP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simply selects the &lt;code&gt;app: vllm&lt;/code&gt; pods and targets vLLM's port 8000. The service gets picked up by the internal Kubernetes DNS server, and we can use the resolved “vllm-service.vllm-ns” endpoint to be load balanced across the vLLM APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;Let's hit this vLLM Kubernetes service endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# hitting the vllm-service internal api endpoint resolved by Kubernetes DNS&lt;/span&gt;

curl vllm-service.vllm-ns.svc.cluster.local/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
     "model": "mistralai/Mistral-7B-Instruct-v0.2",
     "messages": [{"role": "user", "content": "Why is the sky blue?"}]
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This "vllm-service.vllm-ns" internal Kubernetes service domain name will resolve to one of the nodes running a vLLM daemonset (again, load-balanced across all the running vLLM pods) and will return inference generation for the prompt "Why is the sky blue?":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cmpl-76cf74f9b05c4026aef7d64c06c681c4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1715533000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mistralai/Mistral-7B-Instruct-v0.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The color of the sky appears blue due to a natural phenomenon called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it interacts with molecules and particles in the air, such as nitrogen and oxygen. These particles scatter short-wavelength light, like blue and violet light, more than longer wavelengths, like red, orange, and yellow. However, we perceive the sky as blue and not violet because our eyes are more sensitive to blue light and because sunlight reaches us more abundantly in the blue part of the spectrum.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Additionally, some of the violet light gets absorbed by the ozone layer in the stratosphere, which prevents us from seeing a violet sky. At sunrise and sunset, the sky can take on hues of red, orange, and pink due to the scattering of sunlight through the Earth's atmosphere at those angles."&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"logprobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"stop_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;186&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;In the end, this gives our internal microservices running on the cluster a way to generate summaries without an expensive third-party API. We've gotten very good results from the Mistral models, and for this use case at this scale, running the service on our own GPUs has been significantly more economical.&lt;/p&gt;

&lt;p&gt;You could expand on this by adding network policies or additional configuration to your internal service, or even an ingress controller to provide this as a service to others outside of your cluster. The sky is the limit with what you can do from here! Good luck, and stay saucey!&lt;/p&gt;
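&lt;p&gt;As a sketch of that last idea (the hostname is hypothetical, this assumes your cluster already runs an ingress controller, and TLS and ingress class configuration are omitted), an Ingress pointing at the internal service might look something like:&lt;/p&gt;

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vllm-ingress           # hypothetical name
  namespace: vllm-ns
spec:
  rules:
  - host: llm.example.com      # hypothetical hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: vllm-service # the ClusterIP service defined above
            port:
              number: 80
```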

&lt;p&gt;If you want to check out StarSearch, &lt;a href="https://oss.fyi/wait-starsearch"&gt;join our waitlist now&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Awk: A beginners guide for humans</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Sun, 03 Mar 2024 22:16:52 +0000</pubDate>
      <link>https://dev.to/jpmcb/awk-a-beginners-guide-for-humans-3l25</link>
      <guid>https://dev.to/jpmcb/awk-a-beginners-guide-for-humans-3l25</guid>
      <description>&lt;p&gt;Earlier this week, I had a file of names, each delimited by a newline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;john
jack
jill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But really, I needed this file to be in the form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "full_name": "name"
},
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file wasn't absolutely huge, but it was big enough that editing it manually would have been annoying. I thought to myself, "instead of editing this file manually or generating it correctly, how can I spend the maximum amount of time using a bespoke tool to get it in the right format? A neovim macro? Sed? Write some python? Why not awk!"&lt;/p&gt;

&lt;p&gt;In the end, here's the awk command I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print "{\n    \"full_name\": \"" $0 "\"\n},"}'&lt;/span&gt; names.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This printed each line surrounded by the appropriate curly braces and whitespace.&lt;/p&gt;




&lt;p&gt;Let's break down how I did this and build the command one bit at a time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Awk is a Linux command line utility just like any other. But, similar to something like python or lua, it's a special program interpreter that is especially good at scanning and processing inputs with small (or big) one-liner programs you give it.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;an-awk-program&amp;gt;'&lt;/span&gt; some-input-file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Let's start simple and just print the names from the file directly to stdout:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $0}'&lt;/span&gt; names.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;john
jack
jill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Within the &lt;code&gt;''&lt;/code&gt;, we provide awk with a small program to execute. This is basically the "hello world" of awk: it prints each line exactly as it appears, unedited, in the file.&lt;/p&gt;

&lt;p&gt;But what is &lt;code&gt;$0&lt;/code&gt;? Awk has the concept of "columns" in a file: these are typically space delimited. So a file like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 2 3
4 5 6 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;has 3 columns and 2 rows.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;$0&lt;/code&gt; variable is a special one and represents the entire input record (the whole line). Each &lt;code&gt;$N&lt;/code&gt; is then the N-th field in that record, where &lt;code&gt;$1&lt;/code&gt; is the first column.&lt;/p&gt;

&lt;p&gt;So, if we only wanted the 1st column in the above file with 3 columns, we could run the following awk program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $1}'&lt;/span&gt; numbers.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1
4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we only wanted the 2nd and 3rd columns, we could run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $2 " " $3}'&lt;/span&gt; numbers.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2 3
5 6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Notice the blank &lt;code&gt;" "&lt;/code&gt; we provide as a string to force some whitespace so the columns are formatted closer to what exists in the original file.)&lt;/p&gt;
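&lt;p&gt;As an aside (not needed for our task, but handy to know): awk also tracks the number of fields in the current record in the built-in &lt;code&gt;NF&lt;/code&gt; variable, so &lt;code&gt;$NF&lt;/code&gt; is always the last field:&lt;/p&gt;

```shell
# NF is the field count for the current record, so $NF is the last field
printf '1 2 3\n4 5 6\n' | awk '{print NF, $NF}'
# 3 3
# 3 6
```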

&lt;ol&gt;
&lt;li&gt;Next, let's add in some additional text to print out:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print "{\"full_name\": \"" $0 "\"},"}'&lt;/span&gt; names.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first thing you'll notice is a confusing array of &lt;code&gt;"&lt;/code&gt; characters: the first &lt;code&gt;"&lt;/code&gt; denotes the beginning of a string for awk to print. The subsequent &lt;code&gt;\"&lt;/code&gt; are literal escaped quotes which we &lt;em&gt;want&lt;/em&gt; to appear in the output. We end the first string with a standalone &lt;code&gt;"&lt;/code&gt;, print the line with the &lt;code&gt;$0&lt;/code&gt; variable, and then enter a string again to add the trailing bracket &lt;code&gt;}&lt;/code&gt; and comma &lt;code&gt;,&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When run, this outputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"full_name": "john"},
{"full_name": "jack"},
{"full_name": "jill"},
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Now we're getting somewhere! Let's finish this off by adding the additional white spacing:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print "{\n    \"full_name\": \"" $0 "\"\n},"}'&lt;/span&gt; names.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "full_name": "john"
},
{
    "full_name": "jack"
},
{
    "full_name": "jill"
},
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The added whitespace within the strings (by including the literal escaped newlines &lt;code&gt;\n&lt;/code&gt;) are printed to give the correct, desired output!&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Bonus: what if we wanted to remove the trailing comma? What if we wanted to wrap this all in &lt;code&gt;[...]&lt;/code&gt; to be closer to valid JSON? Yeah, yeah, I know, &lt;code&gt;jq&lt;/code&gt; exists, but by the power of our lord and savior awk, all things are possible!!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To remove the trailing comma, we can use a sliding window technique:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'NR &amp;gt; 1 {print prev ","} {prev = "{\n    \"full_name\": \"" $0 "\"\n}"} END {print prev}'&lt;/span&gt; names.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This introduces a bit more complexity.&lt;/p&gt;

&lt;p&gt;First, we add the &lt;code&gt;NR&lt;/code&gt; concept: &lt;code&gt;NR&lt;/code&gt; is the "number of records" processed so far, which inside a rule is effectively the current line number. This can be really useful for checking progress, doing different things based on the number of records processed, etc.&lt;/p&gt;
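&lt;p&gt;A quick illustration of &lt;code&gt;NR&lt;/code&gt; on its own (not part of our final command, just to show the counter) is numbering each line of input:&lt;/p&gt;

```shell
# NR increments once per record, so this prefixes each line with its number
printf 'john\njack\njill\n' | awk '{print NR, $0}'
# 1 john
# 2 jack
# 3 jill
```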

&lt;p&gt;So, after the first record, we print the previous chunk followed by a comma. We also always store the current chunk in a &lt;code&gt;prev&lt;/code&gt; variable: this is the sliding window. Nothing is actually printed when the first record is processed; its output is simply stored in &lt;code&gt;prev&lt;/code&gt; to be printed on the next iteration. This way, we're always one record behind the current one, and when we reach the very end (using the &lt;code&gt;END&lt;/code&gt; keyword), we can print the final chunk without the trailing comma!&lt;/p&gt;
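&lt;p&gt;Running the sliding-window one-liner over the same names makes the effect concrete: every object gets a trailing comma except the last one, which is printed by the &lt;code&gt;END&lt;/code&gt; block:&lt;/p&gt;

```shell
# The sliding-window one-liner from above, run against the same names
printf 'john\njack\njill\n' > names.txt
awk 'NR > 1 {print prev ","} {prev = "{\n    \"full_name\": \"" $0 "\"\n}"} END {print prev}' names.txt
# {
#     "full_name": "john"
# },
# {
#     "full_name": "jack"
# },
# {
#     "full_name": "jill"
# }
```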

&lt;p&gt;To wrap the entire output in square brackets and give it the correct spacing, we can use this awk program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight awk"&gt;&lt;code&gt;&lt;span class="kr"&gt;BEGIN&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# Print the opening bracket for the JSON array&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="s2"&gt;"["&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kc"&gt;NR&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# after the first line, print the previously stored chunk&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="s2"&gt;","&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# Store the current line in a JSON object format&lt;/span&gt;
    &lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"    {\n        \"full_name\": \""&lt;/span&gt; &lt;span class="nv"&gt;$0&lt;/span&gt; &lt;span class="s2"&gt;"\"\n    }"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;END&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# Print the last line stored in prev and close the JSON array&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="s2"&gt;"\n]"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can run this awk program via a file instead of doing all of that on the command line directly. This greatly helps with readability, maintainability, etc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; format_names.awk names.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
    {
        "full_name": "john"
    },
    {
        "full_name": "jack"
    },
    {
        "full_name": "jill"
    }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just like the previous awk program, we are printing each segment and then at the end, leaving off the trailing comma. But this time, at the beginning of the program, using &lt;code&gt;BEGIN&lt;/code&gt; and &lt;code&gt;END&lt;/code&gt;, we print an opening and closing bracket.&lt;/p&gt;




&lt;p&gt;Happy awk-ing and good luck!&lt;/p&gt;

</description>
      <category>linux</category>
      <category>cli</category>
      <category>terminal</category>
    </item>
    <item>
      <title>Job scheduling with tmux</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Mon, 15 Jan 2024 23:41:43 +0000</pubDate>
      <link>https://dev.to/jpmcb/job-scheduling-with-tmux-5hb4</link>
      <guid>https://dev.to/jpmcb/job-scheduling-with-tmux-5hb4</guid>
      <description>&lt;p&gt;Tmux is one of my favorite utilities: it's a terminal multiplexer that lets you create persistent shell sessions, panes, windows, etc. all within a single terminal. It's a great way to organize your shell sessions and natively give you multi-shell environments to work in without having to rely on a terminal program for those features.&lt;/p&gt;

&lt;p&gt;You'd think in a world of modern applications and fancy terminals like iTerm 2 and Kitty, you wouldn't need such a utility. But time and time again, tmux has proven itself to be a powerful and essential tool. Especially when working with remote machines in the cloud or across SSH sessions, tmux is critical in maintaining my organization and getting things done.&lt;/p&gt;

&lt;p&gt;Beyond multiplexing, tmux has some incredible capabilities that extend its functionality to be able to run and schedule jobs, automatically execute scripts within given contexts, and much more.&lt;/p&gt;

&lt;p&gt;Let's look at a few use cases where we can schedule jobs to run and even create a whole production like environment, all organized and managed from tmux!&lt;/p&gt;

&lt;h2&gt;
  
  
  Running commands
&lt;/h2&gt;

&lt;p&gt;Tmux offers a way to run scripts in new sessions automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; my-session &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-c&lt;/span&gt; /path/to/directory &lt;span class="s1"&gt;'echo "Hello Tmux!" &amp;amp;&amp;amp; sleep 100'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break this down: this arbitrary example creates a new session named "my-session", sets the session directory using the &lt;code&gt;-c&lt;/code&gt; flag, and then executes a command.&lt;/p&gt;

&lt;p&gt;This command will echo "Hello Tmux!" and then sleep for 100 seconds.&lt;/p&gt;

&lt;p&gt;When running this tmux command, we are automatically attached to the session and see "Hello Tmux!" printed at the top of the screen and then the &lt;code&gt;sleep&lt;/code&gt; command takes over. Once the &lt;code&gt;sleep&lt;/code&gt; command is done, the session exits.&lt;/p&gt;

&lt;p&gt;If we wanted to run this in the background, we could provide the &lt;code&gt;-d&lt;/code&gt; flag: this will keep the new session detached and run the given commands behind the scenes in the background.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ tmux new -s my-session 
  -d -c ~/workspace 'echo "hello world!" &amp;amp;&amp;amp; sleep 1000'

$ tmux ls
my-session: 1 windows (created Mon Jan 15 11:02:21 2024)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;tmux ls&lt;/code&gt; we can list out the current sessions and see &lt;code&gt;my-session&lt;/code&gt; is running with 1 window in the background. This is part of the power of tmux: you can have sessions exist and persist &lt;em&gt;outside&lt;/em&gt; of the current shell or session you are attached to. The sky is really the limit here and using multiple sessions, windows, and panes has become a cornerstone of my workflows.&lt;/p&gt;

&lt;p&gt;If we wanted to attach to the session and see the progress of the command we gave it, we could run &lt;code&gt;tmux a -t my-session&lt;/code&gt;. This will attach to the session named &lt;code&gt;my-session&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Persisting sessions
&lt;/h2&gt;

&lt;p&gt;This is all great, but not all that useful when we need to later observe the results of our command or persist the history: a session created to run a script will automatically close once the script completes.&lt;/p&gt;

&lt;p&gt;Instead, we can create a regular session and send it some commands remotely.&lt;/p&gt;

&lt;p&gt;As an example, let's say we needed to run some tests in the background on our TypeScript project with &lt;code&gt;npm run test&lt;/code&gt; and later observe the results. We can do this with the &lt;code&gt;send-keys&lt;/code&gt; command for sessions. Here, I'll be using the OpenSauced API as my playground:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new named session:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a new named, detached session&lt;/span&gt;
&lt;span class="c"&gt;# that starts in the given directory&lt;/span&gt;
tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; my-npm-tests &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; ~/workspace/opensauced/api
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Send the command
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Send the test command to the session&lt;/span&gt;
tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; my-npm-tests &lt;span class="s2"&gt;"npm run test"&lt;/span&gt; Enter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note here:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Enter&lt;/code&gt; uses the special "key binding syntax" for sending a literal &lt;code&gt;Enter&lt;/code&gt; key at the end of the command. If we needed to send something else, like "control c", we could do that with &lt;code&gt;C-c&lt;/code&gt;, or &lt;code&gt;M-c&lt;/code&gt; for "alt c". Check the official man page, which has &lt;a href="https://man.openbsd.org/tmux.1#KEY_BINDINGS"&gt;a full description&lt;/a&gt; of what's possible with sending key bindings to sessions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Attach to the session:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tmux a -t my-npm-tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we've sent our test command to the session, at any point in the future we can attach to the session to see how it did and check the results. Since the session will be persisted after the command has run, there's no rush to observe the results! The shell's full history for that session will be right there when we need it!&lt;/p&gt;
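&lt;p&gt;If you only want a quick peek without attaching at all, tmux can also print a pane's current contents to stdout with &lt;code&gt;capture-pane -p&lt;/code&gt;. A small round-trip sketch (the session name here is arbitrary):&lt;/p&gt;

```shell
# Run a command in a detached session and read the pane contents
# without ever attaching
tmux new -d -s peek-demo
tmux send-keys -t peek-demo "echo hello from tmux" Enter
sleep 1                            # give the command a moment to run
tmux capture-pane -t peek-demo -p  # print the pane's contents to stdout
tmux kill-session -t peek-demo
```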

&lt;ol&gt;
&lt;li&gt;Check results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Within the attached session, we can see the full history of the &lt;code&gt;npm&lt;/code&gt; command that was sent and check the results! This session is persisted so we can use the shell from this session to do additional work, detach, close it, etc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ npm run test
npm info using npm@9.6.7
npm info using node@v18.17.1

&amp;gt; @open-sauced/api@2.3.0-beta.2 test
&amp;gt; jest

npm info ok

$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Script it!
&lt;/h2&gt;

&lt;p&gt;What if there are 5 or 6 things I want to do behind the scenes? Maybe I have a build and test process that can run many things in parallel at once? Instead of using &lt;code&gt;send-keys&lt;/code&gt; manually, let's create a small script that can do this all for us!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;

&lt;span class="c"&gt;# Create named, detached sessions&lt;/span&gt;
tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; npm-test &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; ~/workspace/opensauced/api
tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; npm-build &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; ~/workspace/opensauced/api

&lt;span class="c"&gt;# Send commands to the detached sessions&lt;/span&gt;
tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; npm-test &lt;span class="s2"&gt;"npm run test"&lt;/span&gt; Enter
tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; npm-build &lt;span class="s2"&gt;"npm run build"&lt;/span&gt; Enter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this script yields the following tmux sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ tmux ls
npm-build: 1 windows (created Mon Jan 15 11:31:28 2024)
npm-test: 1 windows (created Mon Jan 15 11:31:28 2024)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and can be attached to in order to inspect the results of each command.&lt;/p&gt;

&lt;p&gt;If the commands to run within individual sessions are more complex than a sole one-liner, &lt;code&gt;send-keys&lt;/code&gt; can also run a script or &lt;code&gt;make&lt;/code&gt; command!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; kubernetes &lt;span class="s2"&gt;"make build"&lt;/span&gt; Enter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this article, I'm assuming you always want to create a new session. But many of the same rules, flags, and syntaxes also apply to creating new windows, panes, etc. Tmux has a strong, consistent paradigm across the different ways to multiplex shells, so it'd be just as simple to create two windows instead of two sessions and then send commands to those:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;

&lt;span class="c"&gt;# Create named windows&lt;/span&gt;
tmux new-window &lt;span class="nt"&gt;-n&lt;/span&gt; npm-test &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; ~/workspace/opensauced/api
tmux new-window &lt;span class="nt"&gt;-n&lt;/span&gt; npm-build &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; ~/workspace/opensauced/api

&lt;span class="c"&gt;# Send commands to the detached sessions&lt;/span&gt;
tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; 0:npm-test &lt;span class="s2"&gt;"npm run test"&lt;/span&gt; Enter
tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; 0:npm-build &lt;span class="s2"&gt;"npm run build"&lt;/span&gt; Enter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note here: instead of &lt;code&gt;-s&lt;/code&gt; for a session name, we provide &lt;code&gt;-n&lt;/code&gt; for the new window name. You'll also notice the &lt;code&gt;send-keys&lt;/code&gt; target now includes a &lt;code&gt;:&lt;/code&gt;. The first part is the name of the session (in my case, the default session named &lt;code&gt;0&lt;/code&gt;) and the second part is the name of the window to send the keys to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting env variables for sessions
&lt;/h3&gt;

&lt;p&gt;An important and powerful thing to remember here is environment variables: tmux can set global environment variables (env vars available to all new sessions) and session-based ones. In newer versions of tmux, I recommend setting session-local variables with the &lt;code&gt;-e&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; my-session &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MYVAR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myvalue &lt;span class="nt"&gt;-c&lt;/span&gt; /dir
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This session named &lt;code&gt;my-session&lt;/code&gt; will have access to the &lt;code&gt;MYVAR&lt;/code&gt; environment variable we provided when creating the new session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ echo $MYVAR
myvalue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Scheduling jobs with &lt;code&gt;at&lt;/code&gt; and scripts
&lt;/h2&gt;

&lt;p&gt;One of the more powerful things I've used this all for is local job scheduling. Let's look at 2 examples using &lt;code&gt;at&lt;/code&gt; and scripts:&lt;/p&gt;

&lt;h3&gt;
  
  
  One off &lt;code&gt;at&lt;/code&gt; scheduling
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;at&lt;/code&gt; is a very basic command line utility that comes packaged with many desktop Linux distros and lets you do very simple one-off scheduling.&lt;/p&gt;

&lt;p&gt;For example, let's say that you needed to do a git push 3 hours from now in a specific directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tmux new &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; git-push-later &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-c&lt;/span&gt; /path/to/your/repo &lt;span class="s1"&gt;'echo "git push" | at now + 3 hours'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create a new detached session named &lt;code&gt;git-push-later&lt;/code&gt; within the directory for your git repo, and it sends &lt;code&gt;git push&lt;/code&gt; to the &lt;code&gt;at&lt;/code&gt; command via a pipe with the argument "now + 3 hours".&lt;/p&gt;

&lt;p&gt;Looking at scheduled jobs via &lt;code&gt;at&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ at -l
1       Mon Jan 15 14:46:00 2024
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I can see there is a scheduled job! Cool!! This isn't &lt;em&gt;too&lt;/em&gt; different from just running &lt;code&gt;at&lt;/code&gt; manually from the current directory, but it can be really useful and powerful if I'm working in a different directory or need to quickly load up some env vars. Better yet, you can easily combine this into a script that loads some global tmux environment variables and then executes many &lt;code&gt;at&lt;/code&gt; commands in sequence.&lt;/p&gt;
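&lt;p&gt;As a sketch of that idea, here's a small script that creates a dedicated session with a session env var and queues a couple of &lt;code&gt;at&lt;/code&gt; jobs in it. The session name, variable, and delays are all hypothetical, and it's guarded so it does nothing where &lt;code&gt;tmux&lt;/code&gt; or &lt;code&gt;at&lt;/code&gt; aren't installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash

# Only proceed if both tmux and at exist on this machine
if command -v tmux &amp;gt;/dev/null 2&amp;gt;&amp;amp;1 &amp;&amp; command -v at &amp;gt;/dev/null 2&amp;gt;&amp;amp;1; then
  # A detached session with a session-scoped env var (tmux 3.2+)
  tmux new -d -s scheduled-jobs -e DEPLOY_ENV=staging 2&amp;gt;/dev/null || true

  # Queue several one-off jobs in sequence
  for delay in "now + 1 hour" "now + 2 hours"; do
    tmux send-keys -t scheduled-jobs "echo 'git push' | at $delay" Enter
  done
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;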

&lt;h3&gt;
  
  
  Shell script scheduling
&lt;/h3&gt;

&lt;p&gt;There are &lt;em&gt;a lot&lt;/em&gt; of ways in Linux to do what I'm suggesting here, primarily through &lt;code&gt;cron&lt;/code&gt; and &lt;code&gt;crontab&lt;/code&gt;. But for a quick and dirty job that needs to repeat every so often in a background shell, it can be easier to just wrap the command in a loop with a &lt;code&gt;sleep&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c"&gt;# The command to continously run&lt;/span&gt;
    npm run &lt;span class="nb"&gt;test&lt;/span&gt;

    &lt;span class="c"&gt;# Sleep for 5 minutes between runs&lt;/span&gt;
    &lt;span class="nb"&gt;sleep &lt;/span&gt;5m
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This can then be thrown in a script and executed via a tmux &lt;code&gt;send-keys&lt;/code&gt; command like we've seen:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; my-npm-tests &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"./run-tests-every-5-mins.sh"&lt;/span&gt; Enter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why do it this way and not just have a cron job in the background?&lt;/p&gt;

&lt;p&gt;For observable things, like builds, tests, etc., I really like to have a persistent shell session that I can attach to, detach from, and occasionally keep track of.&lt;/p&gt;

&lt;p&gt;Usually with this method, these aren't things that are &lt;em&gt;too&lt;/em&gt; important, so if the tmux server dies, it's nothing I can't quickly spin back up with a little tmux script. It's nice having a sort of "location" where these jobs are running in the background but always reachable from a different tmux window or tab. I sometimes find I've lost track of things Linux abstracts away with &lt;code&gt;cron&lt;/code&gt;, &lt;code&gt;systemd&lt;/code&gt;, etc. (which is generally a good thing: I don't want to have to think about the things &lt;code&gt;systemd&lt;/code&gt; is managing!) So, instead, for the little things I need to keep an eye on, I choose to keep track of them in a tmux session!&lt;/p&gt;

&lt;h2&gt;
  
  
  Building production-like environments
&lt;/h2&gt;

&lt;p&gt;Using all of this, and with my weird tendency to keep track of things in tmux sessions, let's build a simple production-like environment using a starter script, Docker, and a few tmux sessions!&lt;/p&gt;

&lt;p&gt;Let's again look at an OpenSauced example: this starts a Postgres database in Docker, boots up the API (which will then attach to that database), and then starts the frontend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;

&lt;span class="c"&gt;# Create named, detached sessions&lt;/span&gt;
tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; database &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; ~/workspace/opensauced/api
tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; api &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; ~/workspace/opensauced/api
tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; frontend &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; ~/workspace/opensauced/app

&lt;span class="c"&gt;# Start the database up&lt;/span&gt;
tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; database &lt;span class="s2"&gt;"docker run -it --rm --name database -p 25060:5432 my_postgres_image:latest"&lt;/span&gt; Enter

&lt;span class="c"&gt;# Start the API&lt;/span&gt;
tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; api &lt;span class="s2"&gt;"npm run start"&lt;/span&gt; Enter

&lt;span class="c"&gt;# Start the frontend app&lt;/span&gt;
tmux send-keys &lt;span class="nt"&gt;-t&lt;/span&gt; frontend &lt;span class="s2"&gt;"npm run start"&lt;/span&gt; Enter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Horrifying, I know.&lt;/p&gt;

&lt;p&gt;But surprisingly, I've found this to be a really great way to keep the various components of our system organized in a system I know well and can easily wrap my head around.&lt;/p&gt;

&lt;p&gt;Then, when I'm done with this environment, I can easily tear it down by stopping the tmux sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tmux kill-session database
tmux kill-session api
tmux kill-session frontend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's it! Easy organization, job scheduling, and multi tasking with tmux! Let me know if you have questions!!&lt;/p&gt;

</description>
      <category>linux</category>
      <category>tmux</category>
      <category>cli</category>
      <category>terminal</category>
    </item>
    <item>
      <title>How we made our Go microservice 24x faster</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Thu, 14 Sep 2023 15:34:49 +0000</pubDate>
      <link>https://dev.to/opensauced/how-we-made-our-go-microservice-24x-faster-5h3l</link>
      <guid>https://dev.to/opensauced/how-we-made-our-go-microservice-24x-faster-5h3l</guid>
      <description>&lt;p&gt;As data intensive backend applications scale and grow, with larger data sets scaled out to higher availability, performance bottlenecks can quickly become major hurdles. Processing requests that once took mere milliseconds can suddenly become multi-minute problems.&lt;/p&gt;

&lt;p&gt;In this blog post, let’s take a look at some optimization strategies the OpenSauced pizza micro-service recently underwent. This backend service is a Go server that processes git commits by request, sometimes handling thousands of commits in a single request. You can almost think of it as a real-time batch processor that arbitrary clients can call to fetch and process the git commits within any git repo.&lt;/p&gt;

&lt;p&gt;These commits are all eventually indexed in a Postgres database, and most of these optimizations revolve around “batching” the Postgres calls instead of executing them one by one.&lt;br&gt;
For simplicity, our examples will use an arbitrary table called “my_table” with data that fits into the “my_data” column. Let’s dive in and take a look at how we can optimize!&lt;/p&gt;
&lt;h3&gt;
  
  
  Some setup first
&lt;/h3&gt;

&lt;p&gt;Before we can go too much further, let’s make sure the database connection is bootstrapped correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"database/sql"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;

    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="s"&gt;"github.com/lib/pq"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// In a real world scenario, use good password handling practices&lt;/span&gt;
    &lt;span class="c"&gt;// to handle connecting to the Postgres cluster!&lt;/span&gt;
    &lt;span class="n"&gt;connectString&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"host=my_host port=54321 user=my_postgres_user sslmode=require"&lt;/span&gt;

    &lt;span class="c"&gt;// Acquire the *sql.DB instance&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connectString&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not open database connection: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// ping once to ensure the database connection is working&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ping&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not ping database: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This little bit of Go code sets up our Postgres connection and makes a single Ping to the database to ensure that everything is set up correctly. We now have a working db instance, which abstracts away a pool of connections that makes concurrently querying and writing to the database a breeze. We don’t have to manage those connection pools ourselves; we get all of that for free through Go’s database/sql package and the pq driver!&lt;/p&gt;

&lt;h3&gt;
  
  
  The brute force approach
&lt;/h3&gt;

&lt;p&gt;When first written, the &lt;code&gt;pizza&lt;/code&gt; micro-service would process each individual piece of data one row at a time. Here’s a very arbitrary example that demonstrates inserting data values one at a time into a Postgres database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"INSERT INTO my_table(my_data) VALUES($1)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is essentially a raw, brute force approach.&lt;/p&gt;

&lt;p&gt;Round-trip inserts into the database for all data members become an O(n) operation which, depending on network latency and the power of your Postgres database, can quickly become a massive bottleneck. Even on a &lt;code&gt;localhost&lt;/code&gt; network, where latency can generally be ignored, a hunk of data containing many thousands of entries means inserts that take several milliseconds each, and that adds up very quickly.&lt;/p&gt;
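&lt;p&gt;To put some rough numbers on that (the ~2ms per statement here is just an assumed example latency, not a measurement from our service):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import "fmt"

func main() {
    // 10,000 rows at an assumed ~2ms round trip per INSERT
    rows := 10000
    perRowMs := 2
    fmt.Printf("%d rows x %dms = %d seconds of waiting on the database\n",
        rows, perRowMs, rows*perRowMs/1000)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;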

&lt;h3&gt;
  
  
  Just make it parallel!?
&lt;/h3&gt;

&lt;p&gt;In theory, if you never really needed to handle conflicts within the database or elegantly surface errors, making the whole process parallel may work just fine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"INSERT INTO my_table(my_data) VALUES($1)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we are doing the same thing as the brute force approach, but we’re firing off a new goroutine for each insert.&lt;br&gt;
While you may see marginal performance improvements (depending on the system and how much real parallelism the machine’s processor can offer), this still requires O(n) inserts into the database and can quickly throttle the pool of connections available in the &lt;code&gt;*sql.DB&lt;/code&gt; we are using. And again, this doesn’t do a great job of handling multiple inserts that may conflict, ignores errors entirely, and never waits for the goroutines to finish. In other words, going with a parallel solution may seem like the ideal quick fix, but in reality, it may create more problems down the road.&lt;br&gt;
So, generally, this approach isn’t recommended.&lt;/p&gt;
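&lt;p&gt;If you genuinely did need parallelism here, you’d at least want to wait for the goroutines to finish and bound how many run at once so the connection pool isn’t overwhelmed. Here’s a minimal sketch of that pattern; the &lt;code&gt;insert&lt;/code&gt; function is a hypothetical stand-in for the real &lt;code&gt;db.Exec&lt;/code&gt; call so the example is self-contained:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
    "fmt"
    "sync"
)

func main() {
    data := []string{"a", "b", "c", "d", "e"}

    // A local stand-in for db.Exec so this sketch runs anywhere
    var mu sync.Mutex
    inserted := 0
    insert := func(d string) error {
        mu.Lock()
        defer mu.Unlock()
        inserted++
        return nil
    }

    // A buffered channel as a semaphore: at most 2 in-flight "inserts"
    sem := make(chan struct{}, 2)
    var wg sync.WaitGroup
    for _, v := range data {
        wg.Add(1)
        sem &lt;- struct{}{}
        go func(d string) {
            defer wg.Done()
            defer func() { &lt;-sem }()
            if err := insert(d); err != nil {
                fmt.Println("insert failed:", err)
            }
        }(v)
    }
    wg.Wait()

    fmt.Println(inserted, "rows inserted")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Even so, this still makes O(n) round trips to the database, so batching remains the better fix.&lt;/p&gt;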
&lt;h3&gt;
  
  
  Using &lt;code&gt;CopyIn&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Thankfully, Postgres and the pq library offer powerful “transaction” paradigms that make it easy to batch massive sets of data all at once. If this were raw SQL, we’d be using the &lt;code&gt;COPY ... FROM&lt;/code&gt; statement to bulk-load data from a “file” directly into a table, all in one statement. Go’s pq library abstracts all that with the &lt;code&gt;CopyIn&lt;/code&gt; method and allows for large batching operations.&lt;/p&gt;

&lt;p&gt;Let’s take a quick look at how you would implement this and how it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Start a psql transaction.&lt;/span&gt;
&lt;span class="n"&gt;txn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Begin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not start psql transaction: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Make a "statement" to use for the psql transaction. The "CopyIn" takes&lt;/span&gt;
&lt;span class="c"&gt;// our table name and the columns we are coping into.&lt;/span&gt;
&lt;span class="c"&gt;//&lt;/span&gt;
&lt;span class="c"&gt;// The error handling will rollback the transaction if there's a&lt;/span&gt;
&lt;span class="c"&gt;// problem with preparing the statement.&lt;/span&gt;
&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;txn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CopyIn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_table"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"my_data"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;txn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not prepare psql statement: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Iterate the data and add the data to the psql statement&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not execute the statement: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Execute, commit, and close the transaction&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not close the psql statement: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;txn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not commit the psql transaction: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All in all, this takes our number of round trips to the database from O(n) to just O(1) with a constant, predictable number of Postgres statements that will be executed. Much more efficient!&lt;/p&gt;

&lt;h3&gt;
  
  
  What about conflicts with unique constraints?
&lt;/h3&gt;

&lt;p&gt;Taking all the data wholesale works fine if you can be relatively assured that there won’t ever be conflicts within it. But as soon as one of the rows you’re copying in has a unique identifier or some other unique constraint, you’ll run into major problems. For example, let’s say we’re processing a batch of emails and those emails being inserted into the database should all be unique: the above approach will fail as soon as a duplicate email is processed.&lt;/p&gt;

&lt;p&gt;Unfortunately, the &lt;code&gt;CopyIn&lt;/code&gt; approach we’re using doesn’t have a way to handle conflicts directly. We need a different approach: enter the temporary table!&lt;br&gt;
Postgres offers a pretty powerful way to take a temporary table and pivot it into your real data tables, all while giving you the ability to handle conflicts. We’ll use a similar approach as above, but instead of adding everything to the real &lt;code&gt;my_table&lt;/code&gt;, we’ll first create a temporary table to insert the data into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;tmpTableName&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"my_tmp_table"&lt;/span&gt;

&lt;span class="c"&gt;// Create a temporary table and use the real table as a template.&lt;/span&gt;
&lt;span class="c"&gt;// "WHERE 1=0" is a trick to select no rows in psql but still copy 1 for 1&lt;/span&gt;
&lt;span class="c"&gt;// all the data column types and names from the real table.&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE TEMPORARY TABLE %s AS SELECT * FROM my_table WHERE 1=0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tmpTableName&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not create temporary table: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we have a temporary table, we can use that in our CopyIn to do a mass insert:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Start a psql transaction.&lt;/span&gt;
&lt;span class="n"&gt;txn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Begin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not start psql transaction: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Make a "statement" to use for the psql transaction.&lt;/span&gt;
&lt;span class="c"&gt;// Notice the "my_tmp_table" as the table name&lt;/span&gt;
&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;txn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CopyIn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_tmp_table"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"my_data"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;txn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not prepare psql statement: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Iterate the data, add the data to the psql statement&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Execute, commit, and close the transaction&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not close the psql statement: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;txn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not commit the psql transaction: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, our temporary table has all the data: the table was created, the statement prepared, each data item added to the statement, and the transaction was committed.&lt;br&gt;
Now, we can attempt to pivot the data from the temporary table into the real table, handling conflicts along the way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`
    INSERT INTO my_table(my_data)
    SELECT my_data FROM my_tmp_table
    ON CONFLICT (my_data)
    DO NOTHING
`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not pivot temporary table data: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Drop the temporary table now that we're done pivoting the data&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"DROP TABLE %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tmpTableName&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Could not drop temporary table: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our example here, we use the temporary table’s data to mass-insert into the real table. We avoid conflicts by doing nothing, discarding the conflicting rows. In a real-world scenario, you may want to do something with that data: the &lt;code&gt;ON CONFLICT&lt;/code&gt; clause is really powerful, and there’s a lot you can do with it in Postgres.&lt;/p&gt;

&lt;h3&gt;
  
  
  Table name clashes
&lt;/h3&gt;

&lt;p&gt;If you’re running the temporary table pivot on a server that handles many concurrent requests at scale, an obvious problem arises: clashes on a static temporary table name. Since we create the temporary table for a request and then drop it once we’re done, other threads may still be using a table of the same name for operations of their own.&lt;/p&gt;

&lt;p&gt;There are many methods for handling temporary table name clashes, but a good place to start is to use a unique identifier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;rawUUID&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReplaceAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rawUUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tmpTableName&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"temp_table_%s_%d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddInt64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses the github.com/google/uuid library to generate a UUID and replaces “-” with empty strings (dashes are not valid within unquoted Postgres table names). We then combine this with a thread-safe Go atomic counter to generate a unique table name: since these tables are short-lived, individual UUID clashes are extremely unlikely, and the atomic counter guarantees distinct suffixes, the likelihood of a table name clash is nearly zero with this basic approach.&lt;/p&gt;

&lt;p&gt;If you’re going to horizontally scale out your service to many additional instances, it may be advantageous to develop an orchestration method to ensure there are no conflicts with temporary table names across your scaled deployment.&lt;/p&gt;

&lt;p&gt;Overall, using batch inserts and table pivots in Postgres is a really powerful way to optimize your Go backends. Compared to the brute-force approach, we found that this generally improved performance about 24x: when processing a git repository with over 30,000 commits, the standard “one by one” approach took about 1 minute, but the batch approach laid out above takes only about 3 seconds. Wow! What an improvement!&lt;/p&gt;

&lt;p&gt;If you’re interested in diving in deeper on these methodologies and how we implemented them at OpenSauced, check out the original PR for this here!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://insights.opensauced.pizza/feed/471"&gt;https://insights.opensauced.pizza/feed/471&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stay saucy friends!!&lt;/p&gt;

</description>
      <category>go</category>
      <category>opensource</category>
      <category>programming</category>
      <category>postgres</category>
    </item>
    <item>
      <title>There is no secure software supply-chain</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Sun, 03 Sep 2023 17:03:58 +0000</pubDate>
      <link>https://dev.to/jpmcb/there-is-no-secure-software-supply-chain-244m</link>
      <guid>https://dev.to/jpmcb/there-is-no-secure-software-supply-chain-244m</guid>
      <description>&lt;p&gt;Years ago, entrepreneurs and innovators predicated that &lt;a href="https://a16z.com/2011/08/20/why-software-is-eating-the-world/"&gt;“software would eat the world”.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And to little surprise, year after year, the world has become more and more reliant on software solutions. Oftentimes, that software is (or indirectly depends on) some open source software, maintained by a group of people whose only affiliation to one another may be participation in that open source project’s community.&lt;/p&gt;

&lt;p&gt;But we’re in trouble. The security of open source software is under threat and we’re running out of people to reliably maintain those projects. And as our stacks get deeper, our dependencies become more interlinked, leading to terrifying compromises in the secure software supply-chain. For a perfect example of what’s happening in the open source world right now, we don’t need to look much further than the extremely popular &lt;a href="https://github.com/orgs/gorilla/repositories"&gt;Gorilla toolkit for Go.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In December of 2022, Gorilla, a project that provided powerful web framework technology like mux and sessions, was archived. Over its lengthy tenure, it was the de facto Go framework for web servers, routing requests, handling HTTP traffic, and using websockets. It was used by tens of thousands of other software packages, and it came as a shock to most people in the Go community that the project would be no more: no longer maintained, no more releases, and no community support. But for anyone paying close enough attention, the signs of turmoil were clear: &lt;a href="https://github.com/gorilla/websocket/issues/370"&gt;open calls for maintainers&lt;/a&gt; went unanswered, there were few active outside contributors, and the burden of maintainership was very heavy.&lt;/p&gt;

&lt;p&gt;The Gorilla framework was one of those “important dependencies”. It sat at the critical intersection of providing nice quality of life tools while still securely handling important payloads. Developers would mold their logic around the APIs provided by Gorilla and entire codebases would be shaped by the use of the framework. The community at large trusted Gorilla; the last thing you want in your server is a web framework riddled with bugs and CVEs. In the secure software supply-chain, much like Nginx and OpenSSL, it’s a project that was at the cornerstone of many other supply-chains and dependencies. If something went wrong in the Gorilla framework, it had the potential to impact millions of servers, services, and other projects.&lt;/p&gt;

&lt;p&gt;The secure software supply-chain is one of those abstract concepts that giant tech companies, security firms, and news outlets all love to turn into a buzzword. It’s the “idea” that the software you are consuming as a dependency, all the way through your stack, is exactly the software you’re expecting to consume. In other words, it’s the assurance that some hacker didn’t inject a backdoor into a library or build tool you use, compromising your entire product, software library, or even company. Supply-chain attacks are mischievous because they almost never go after the actual intended target. Instead, they compromise some dependency to then go after the intended target.&lt;/p&gt;

&lt;p&gt;The classic example, still to this day, is &lt;a href="https://www.gao.gov/blog/solarwinds-cyberattack-demands-significant-federal-and-private-sector-response-infographic"&gt;the SolarWinds attack:&lt;/a&gt; an unnamed, Russian state-backed hacker group was able to compromise the internal SolarWinds build system, leaving any subsequent software built using that system injected with backdoors and exploits. &lt;a href="https://www.nytimes.com/2020/12/14/us/politics/russia-hack-nsa-homeland-security-pentagon.html"&gt;The fallout from this attack was massive.&lt;/a&gt; Many government agencies, including the State Department, confirmed massive data breaches. The estimated cost of this attack continues to rise and &lt;a href="https://www.nytimes.com/2020/12/16/us/politics/russia-hack-putin-trump-biden.html"&gt;is estimated to be in the billions of dollars.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Product after product has popped up in the last few years to try and solve these problems: software signing solutions, automated security scanning tools, up-to-date CVE databases, automation bots, AI-assisted coding tools, etc. There was even a whole White House council on the subject. The federal government knows this is the most important (and most critically vulnerable) vector for the well-being of our nation’s software infrastructure, and it has been taking direct action to fight these kinds of attacks.&lt;/p&gt;

&lt;p&gt;But the secure software supply-chain is also one of those things that falls apart quickly; without delicate handling and meticulous safeguarding, things go south fast. For months, the Gorilla toolkit had an open call for maintainers, seeking additional people to keep its codebases up to date, secure, and well maintained. But in the end, the Gorilla maintainers couldn’t find enough people to keep the project afloat. Many people volunteered but then were never seen again. &lt;a href="https://github.com/gorilla#gorilla-toolkit"&gt;And the bar for maintainer-ship was rightfully very high:&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;just handing the reins of even a single software package that has north of 13k unique clones a week (mux) is just not something I’d ever be comfortable with. This has tended to play out poorly with other projects.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And in the past, this has played out poorly in other projects:&lt;/p&gt;

&lt;p&gt;In 2018, GitHub user FallingSnow opened &lt;a href="https://github.com/dominictarr/event-stream/issues/116"&gt;the issue “I don’t know what to say.”&lt;/a&gt; in the popular, but somewhat unknown, NPM JavaScript package event-stream. He'd found something very peculiar in recent commits to the library. A new maintainer, not seen in the community before, with what appeared to be an entirely new GitHub account, had committed a strange piece of code directly to the main branch. This unknown new maintainer had also cut a new package to the NPM registry, forcing this change onto anyone tracking the latest packages in their project.&lt;/p&gt;

&lt;p&gt;The changes looked like this: in a new file, a long inline encrypted string was added. The string would be decrypted using some unknown environment variable, and then the decrypted string would be injected as a JavaScript module into the package, effectively executing whatever code was hidden behind the encrypted string. In short, unknown code was being deciphered, injected, and executed at runtime.&lt;/p&gt;

&lt;p&gt;The GitHub issue went viral. And through sheer brute force, a bit of luck, and hundreds of commenters, the community was able to decrypt the string, revealing the injected code’s purpose: a cryptocurrency “wallet stealer”. If the code detected a specific wallet on the system, it used a known exploit to steal all the crypto stored in that wallet.&lt;/p&gt;

&lt;p&gt;This exploitative code lived in the event-stream NPM module for months, going undetected by security scanners, consumers, and the project’s owner. Only when someone in the community was curious enough to take a look did this obvious code-injection attack become clear. But what made this attack especially bad was that the event-stream module was used by many other modules (and those modules used by other modules, and so on). In theory, this potentially affected thousands of software packages and millions of end-users. Developers who had no idea their JavaScript used event-stream deep in their dependency stack were now suddenly having to quickly patch their code. How was this even possible? Who approved and allowed this to happen?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/dominictarr/event-stream/issues/116#issuecomment-440927400"&gt;The owner of the GitHub repository, and original author of the code, said:&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;he emailed me and said he wanted to maintain the module, so I gave it to him. I don't get any thing from maintaining this module, and I don't even use it anymore, and havn't for years.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;note: I no longer have publish rights to this module on npm.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Just like that, just by asking, some bad actor was able to compromise tens of thousands of software packages, going undetected through the veil of “maintainership”.&lt;/p&gt;

&lt;p&gt;In the past, I’ve referred to this as “The Risks of Single Maintainer Dependencies”: the overwhelming, often lonely, and sometimes dangerous experience of maintaining a widely distributed software package on your own. Like the owner of event-stream, most solo maintainers drift away, fading into the background to let their software go into disarray.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/gorilla#gorilla-toolkit"&gt;This was the case with Gorilla:&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The original author and maintainer, moraes, had moved on a long time ago. kisielk and garyburd had the longest run, maintaining a mix of the HTTP libraries and gorilla/websocket respectively. I (elithrar) got involved sometime in 2014 or so, when I noticed kisielk doing a lot of the heavy lifting and wanted to help contribute back to the libraries I was using for a number of personal projects. Since about ~2018 or so, I was the (mostly) sole maintainer of everything but websocket, which is about the same time garyburd put out an (effectively unsuccessful) call for new maintainers there too.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The secure software supply-chain will never truly be strong and secure as long as a single solo maintainer is able to disrupt an entire ecosystem of packages by giving their package away to some bad actor. In truth, there is no secure software supply-chain: we are only as strong as the weakest among us and too often, those weak links in the chain are already broken, left to rot, or given up to those with nefarious purposes.&lt;/p&gt;

&lt;p&gt;Whenever I bring up this topic, someone always asks about money. Oh, money, life’s truest satisfaction! And yes! Money can be a powerful motivator for some people. But it’s a sad excuse for what the secure software supply-chain really needs: true reliability. The software industry can throw all the money it wants at maintainers of important open source projects, &lt;a href="https://www.theverge.com/23499215/valve-steam-deck-interview-late-2022"&gt;something Valve has started doing:&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Griffais says the company is also directly paying more than 100 open-source developers to work on the Proton compatibility layer, the Mesa graphics driver, and Vulkan, among other tasks like Steam for Linux and Chromebooks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;but at some point, it becomes unreasonable to ask just a handful of people to hold up the integrity, security, and viability of your company’s entire product stack. If it’s that important, why not hire some of those people, build a team of maintainers, create processes for contribution, and allocate developer time to the open source? Too often I hear about solving open source problems by just throwing money at them, but at some point, the problems of scaling software delivery outweigh any amount you can possibly pay a few people. Let’s say you were building a house: it might make sense to have one or two people work on the foundation. But if you’re zoning and building an entire city block, I’d sure hope you’d put an entire team on planning, building, and maintaining those foundations. No amount of money will make just a few people build a strong and safe foundation all by themselves. But what we’re asking some open source maintainers to do is to plan, build, and coordinate the foundations for an entire world.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/gorilla#gorilla-toolkit"&gt;And this is something the Gorilla maintainers recognized as well:&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No. I don’t think any of us were after money here. The Gorilla Toolkit was, looking back at the most active maintainers, a passion project. We didn’t want it to be a job.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For them, it wasn’t about the money, so throwing any amount at the project wouldn’t have helped. It was about the software’s quality, maintainability, and the kind of intrinsic satisfaction it provided.&lt;/p&gt;

&lt;p&gt;So then, how can we incentivize open source maintainers to maintain their software in a scalable, realistic way? Some people are motivated by the altruistic value they provide to a community. Some are motivated by fame, power, and recognition. Others still just want to have fun and work on something cool. It’s impossible to understand the complicated, interlinked way&lt;br&gt;
different people in an open source community are all motivated. Instead, the best solution is obvious: if you are on a team that relies on some piece of open source software, allocate real engineering time to contributing, being a part of the community, and helping maintain that software. Eventually, you’ll get a really good sense of how a project operates and what motivates its main players. And better yet, you’ll help alleviate the heavy burden of solo maintainership.&lt;/p&gt;

&lt;p&gt;Sometimes, I like to think of software like it’s a wooden canoe, its many dependencies making up the wooden strips of the boat. When first built, it seems sturdy, strong, and able to withstand the harshest of conditions. Its first coat of oil finish is fresh and beautiful, its wood grains smooth and&lt;br&gt;
unbent. But as the years wear on, eventually, its finish fades, its wooden strips need replacing, and maybe, if it takes on water, it requires time and new material to repair. Neglected long enough, its wood could mold and rot from the inside, completely compromising the integrity of the boat. And just like a boat, software requires time, energy, maintenance, and “hands-on-deck” to ensure its many links in the secure software supply-chain are strong. Otherwise, the termites of time and the rot of bad actors weaken links in the chain, compromising the stability of it all.&lt;/p&gt;

&lt;p&gt;In the end, the maintainers of the Gorilla framework did the right thing: they decommissioned a widely used project that was at risk of rotting from the inside out. And instead of letting it live in disarray or potentially fall into the hands of bad actors, it is simply gone. Its link on the chain of software has been purposefully broken to force anyone using it to choose a better, and hopefully more secure, option.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I do believe that open source software is entitled to a lifecycle — a beginning, a middle, and an end — and that no project is required to live on forever. That may not make everyone happy, but such is life.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But earlier this year, people in the Gorilla community noticed something: a new group of individuals from Red Hat had been added as maintainers to the Gorilla GitHub org. Was Red Hat taking the project over? No, but ironically, the emeritus maintainers had done exactly what they promised they would never do: at the 11th hour, they handed over the project to people with little vetting from the community.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To address many comments that we have seen - we would like to clarify that Red Hat is not taking over this project. While the new Core Maintainers all happen to work at Red Hat, our hope is that developers from many different organizations and backgrounds will join the project over time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Maybe Gorilla was too important to drift slowly into obscurity and Red Hat rightfully allocated some engineering resources to the project. Gorilla lives on. Here's hoping the code is in good hands.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>go</category>
    </item>
    <item>
      <title>Prestige Over Influence: Choosing A More Impactful Online Presence</title>
      <dc:creator>John McBride</dc:creator>
      <pubDate>Fri, 11 Aug 2023 21:54:24 +0000</pubDate>
      <link>https://dev.to/jpmcb/prestige-over-influence-choosing-a-more-impactful-online-presence-5cje</link>
      <guid>https://dev.to/jpmcb/prestige-over-influence-choosing-a-more-impactful-online-presence-5cje</guid>
      <description>&lt;p&gt;The world of software engineering influencers, what I typically like to refer to as “tech-fluencers”, has grown significantly in the last few years. There are people who have built entire personal brands and businesses solely on the basis of their online tech content. And many massive technology companies now participate in the same spheres that 5 years ago would have been unheard of (just think about all the memes major tech companies have created in the last few years).&lt;/p&gt;

&lt;p&gt;And with the rise of platforms that promote short-form video content, like TikTok and YouTube Shorts, it’s now easier than ever to build branding and create a catalog of niche content designed to fulfill a void somewhere out there on the internet.&lt;/p&gt;

&lt;p&gt;But I’ve seen a big problem with all of this.&lt;/p&gt;

&lt;p&gt;We often see others with significant reach in online tech spaces and assume that the only way to achieve that kind of corporate success, financial well-being, confidence, seniority status, or whatever else their persona amplifies, is to emulate them and make content to also achieve that reach, success, and influence in the industry.&lt;/p&gt;

&lt;p&gt;From my first hand experience, this is simply not true.&lt;/p&gt;

&lt;p&gt;Years ago, I fell into the mental trap of creating tech content online: partly out of boredom during the pandemic and partly because I was looking for new ways to level up my career. I thought that creating content online, like I saw so many other people doing, would be an accelerator for me. I started a TikTok account. During its heyday, the account reached over 140 thousand followers. This led to a YouTube channel, a Twitch stream, daily content generation, and much more.&lt;/p&gt;

&lt;p&gt;And honestly, after hundreds and hundreds of videos, none of it really sticks out as actually being significant to my career. After all, most of it was fluff and memes without a lot of substance.&lt;/p&gt;

&lt;p&gt;This is the trap of content creation that is all too tantalizing: you may start with pure intent, but eventually you find yourself feeding the algorithms a never-ending stream of content in the hopes of achieving some amorphous goal that has warped into something you don’t recognize anymore.&lt;/p&gt;

&lt;p&gt;I eventually took a big step back from the content creator grind and ultimately felt pretty disappointed in what seemed like a huge wasted effort.&lt;/p&gt;

&lt;p&gt;I think Will Larson sums this all up incredibly well in his piece &lt;a href="https://lethain.com/tech-influencer/"&gt;“How to be a tech influencer”&lt;/a&gt;. He says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Most successful people are not well-known online.&lt;/em&gt; If you participate frequently within social media, it’s easy to get sucked into the reality distortion field it creates. Being well-known in an online community feels equivalent to professional credibility when you’re spending a lot of time in that community. My experience is that very few of the most successful folks I know are well-known online, and many of the most successful folks I know don’t create content online at all.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead, there is an alternative approach: prestige.&lt;/p&gt;

&lt;p&gt;Building a long term, successful tech career is not about having large followings in online tech spaces or massive engagement on content. Chasing those metrics will only lead you down that road of churning out content for the sake of staying relevant in whatever algorithm you’re participating in.&lt;/p&gt;

&lt;p&gt;No, one of the many puzzle pieces in building a fruitful tech career involves building prestige.&lt;/p&gt;

&lt;p&gt;Prestige is the “idea” of someone, based on respect for the things they’ve achieved, the battles they’ve won, and the quality of their character.&lt;/p&gt;

&lt;p&gt;When I was at AWS, I could tell who the prestigious engineers were based on the way other people talked about them, how others approached that person’s code, and how that person could command a room. Prestige is easy to see, difficult to measure, and elusive to obtain.&lt;/p&gt;

&lt;p&gt;Don’t be mistaken: you may read that and assume prestige and fear are close neighbors. But prestige is not about control, making others do what you want, or power. Prestige on one hand is about gaining others’ respect. But on the other, it’s about having self-respect, owning your mistakes, humility, kindness, and above all, keeping yourself accountable to the high bar of quality and character that you hold for yourself.&lt;/p&gt;

&lt;p&gt;Measuring your prestige is much more difficult than tracking your influence. It’s easy to see the number of followers on your online accounts go up, but tracking the respect and repute people have for you is a whole different challenge.&lt;/p&gt;

&lt;p&gt;This can make attempting to generate prestige difficult. How can I drum up respect and prestige for myself across the industry if I can’t really measure it effectively?&lt;/p&gt;

&lt;p&gt;Ironically, generating prestige with online content can be a very successful way to go about amplifying your existing reputation. Experimenting with different forms of content and distribution models is important, but I want to stress that creating content to amplify your prestige should not be the same as content creation (at least in the typical, 2023 sense). You should not fall prey to the temptations of algorithms designed to steal your attention and sap your creative energy. You should simply use them as a tool of distribution if necessary.&lt;/p&gt;

&lt;p&gt;But more importantly, the quality of your content matters significantly more than the quantity. Typical social media influence dictates that you must post on a regular schedule. But for the engineering leader looking to grow their prestige, one or two extremely high quality pieces go a very, very long way. It’s not necessary to always be churning out content, since relevance in typical social media algorithms should not be your end goal.&lt;/p&gt;

&lt;p&gt;So, how do you actually go about building prestige? Here are my 5 approaches to growing your prestige within your engineering organization and online:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Invent
&lt;/h3&gt;

&lt;p&gt;You should be finding ways to solve big technical problems that have increasing impact and that grow your status within the engineering org.&lt;/p&gt;

&lt;p&gt;This should really be the prerequisite to building any sort of prestige. But it may not be obvious to all: it can be easy to get stuck in a loop of finishing tickets and completing all your tasks during a sprint without expanding into more challenging territories.&lt;/p&gt;

&lt;p&gt;But if you’re not finding technical problems to solve that require innovation, expertise, and a bit of the inventor’s mindset, you’ll eventually hit a career ceiling.&lt;/p&gt;

&lt;p&gt;It is possible to build prestige without inventing. You can get pretty good at taking credit for others’ work or faking it till you make it. But eventually, this catches up with you, and you reach a point where your persona is hollow and it’s clear the achievements your reputation is built upon can’t be trusted or respected.&lt;/p&gt;

&lt;p&gt;Inventing, building, and solving increasingly challenging technical problems is the backbone of building any kind of technical prestige.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Newsletters
&lt;/h3&gt;

&lt;p&gt;Internal newsletters to your organization are a great way to communicate what you’re doing, what you’ve invented, and brag a bit about some of your technical achievements.&lt;/p&gt;

&lt;p&gt;For some, this may seem out of reach. Aren’t these kinds of newsletters reserved for VPs and engineering leaders?&lt;/p&gt;

&lt;p&gt;Not necessarily. An opt-in type newsletter is the best place to start (i.e. don’t start a newsletter and send it to everyone in the company). Your manager and other teammates will likely want to opt in. After all, why wouldn’t they want a regular email of what you’ve been working on, things that interest you, and pieces of work you’re particularly proud of that week?&lt;/p&gt;

&lt;p&gt;Newsletters are also a great habit to build, since they force you to quantify and qualify your work on a regular cadence, which can later be translated into talks, deep dives, promotion documents, or other content you can share with your org or the wider world.&lt;/p&gt;

&lt;p&gt;Some people take this to the next level and publish a public newsletter. This can be a really cool avenue for those working “in public” and can be a great way to start connecting with other technical leaders out in the industry.&lt;/p&gt;

&lt;h3&gt;3. Talks&lt;/h3&gt;

&lt;p&gt;Technical talks come in many different shapes and sizes. I would consider a “talk” to be anything from showing something off during your team’s weekly demos all the way up to an international keynote at a large conference.&lt;/p&gt;

&lt;p&gt;The different ends of that spectrum obviously have different levels of reach and impact, but both help establish you as a subject-matter expert in the thing you’re talking about. It’s an automatic way to gain some prestige around the topic, and it’ll likely open you up to connections with others in the audience that may lead to further opportunities (as the wheels of prestige go round)!&lt;/p&gt;

&lt;h3&gt;4. Deep dives&lt;/h3&gt;

&lt;p&gt;Technical deep dives also come in many shapes and forms. It may be a written piece (like this!), a video, a seminar, or really anything that can deeply communicate a technical topic.&lt;/p&gt;

&lt;p&gt;Deep dives are great for generating prestige since they can be easily referenced later. They end up being a sort of time machine you can use and recycle in powerful ways. I’ve seen people turn deep dives into conference talks, business pitches, and even entire products!&lt;/p&gt;

&lt;p&gt;But above all, they are useful for establishing your expertise and prowess in a given technical area.&lt;/p&gt;

&lt;h3&gt;5. Get others to talk about it&lt;/h3&gt;

&lt;p&gt;The most powerful, and maybe most difficult, avenue to building prestige is getting other people to talk about you and your work. At this point, the wheels of prestige are fully turning and will keep moving on their own for a fair amount of time.&lt;/p&gt;

&lt;p&gt;Having a wealth of talks, deep dives, and newsletters ensures that other people (like your boss or your co-workers) have something to talk about.&lt;/p&gt;

&lt;p&gt;And remember, prestige holds you to the highest bar of quality. So regardless of how many years it’s been, if people are talking about you, discussing a talk you gave, or chatting about something you’ve achieved, you know it’s something you can be proud of and respect yourself for.&lt;/p&gt;

&lt;p&gt;Prestige is an incredible asset to build within your engineering organization and out in public. It’s a great approach for anyone looking to really level up their career. And in my experience, it’s a much better method than the typical “tech-fluencer” content grind.&lt;/p&gt;

</description>
      <category>career</category>
    </item>
  </channel>
</rss>
