<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Super Payments</title>
    <description>The latest articles on DEV Community by Super Payments (@superpayments).</description>
    <link>https://dev.to/superpayments</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F8622%2F812d8f21-e0e1-4f69-bb76-cfe5e464cb8c.png</url>
      <title>DEV Community: Super Payments</title>
      <link>https://dev.to/superpayments</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/superpayments"/>
    <language>en</language>
    <item>
      <title>Setting Up a Robust Local DevX for Snowflake Python Development</title>
      <dc:creator>Jag Thind</dc:creator>
      <pubDate>Fri, 27 Feb 2026 17:04:28 +0000</pubDate>
      <link>https://dev.to/superpayments/setting-up-a-robust-local-devx-for-snowflake-python-development-12pb</link>
      <guid>https://dev.to/superpayments/setting-up-a-robust-local-devx-for-snowflake-python-development-12pb</guid>
      <description>&lt;p&gt;In the evolving world of data engineering, developing Python-based workloads in Snowflake (via &lt;a href="https://docs.snowflake.com/en/developer-guide/snowpark/index" rel="noopener noreferrer"&gt;Snowpark&lt;/a&gt;, Python UDFs, or Stored Procedures) has become increasingly popular. However, as pipelines become more complex, a critical question arises: &lt;strong&gt;How should we develop and maintain our Python code for Snowflake?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While browser-based editors like Snowflake Workspaces are convenient for quick scripts, a significant "Developer Experience (DevX) gap" emerges when you try to build production-grade Python code in a browser tab.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I'm writing this blog
&lt;/h2&gt;

&lt;p&gt;I've seen many Data Engineers and Analytics Engineers fall into the "UI Trap" of writing complex Python logic directly in Snowflake, only to struggle with inconsistent environments, broken dependencies, and the frustration of "it works on my machine, but not on others" problems. This blog is born out of a desire to share a better way.&lt;/p&gt;

&lt;p&gt;My goal is to encourage people to step out of the browser and into a professional local development environment. By establishing repeatable local dev environments where every developer uses the same Python version, the same dependencies, and the same tooling, we can build Python-based features that are not just functional and robust but, most importantly, maintainable by others.&lt;/p&gt;

&lt;p&gt;One way to democratize data-rich features in a product is to make the code behind them easier to develop and maintain with consistent tools. This is why we need to focus on local DevX!&lt;/p&gt;

&lt;h3&gt;
  
  
  What we'll cover
&lt;/h3&gt;

&lt;p&gt;We will explore the merits of a local-first approach to Snowflake Python development, specifically focusing on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic Python versions&lt;/strong&gt; with &lt;code&gt;pyenv&lt;/code&gt; and &lt;code&gt;.python-version&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robust dependency management&lt;/strong&gt; with &lt;code&gt;Poetry&lt;/code&gt; and &lt;code&gt;pyproject.toml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent tooling&lt;/strong&gt; configured in a single file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified task execution&lt;/strong&gt; with &lt;code&gt;Poe the Poet&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Python version management with pyenv
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/pyenv/pyenv" rel="noopener noreferrer"&gt;pyenv&lt;/a&gt; is a tool for managing multiple Python versions (the installation steps below assume macOS with Homebrew). It allows you to install and switch between different Python versions on a per-project basis by creating a &lt;code&gt;.python-version&lt;/code&gt; file in the project root.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters for DevX:&lt;/strong&gt; By pinning the Python version in version control, you ensure that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every developer uses the same Python version for the project.&lt;/li&gt;
&lt;li&gt;Your CI/CD pipeline can install the exact same version.&lt;/li&gt;
&lt;li&gt;You avoid subtle bugs that arise from Python version differences.&lt;/li&gt;
&lt;li&gt;Dependencies work consistently (some packages require specific Python versions).&lt;/li&gt;
&lt;li&gt;Debugging is easier when issues are reproducible across all environments.&lt;/li&gt;
&lt;/ul&gt;
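&lt;p&gt;To make the version-matching behaviour concrete: pyenv pins by prefix, so &lt;code&gt;3.10&lt;/code&gt; matches any 3.10.x interpreter but not 3.11.x. The sketch below (a hypothetical helper, not part of pyenv itself) shows the same prefix check in plain Python, which you could use to fail fast at startup:&lt;/p&gt;

```python
import sys

def matches_pin(pin, version_info=None):
    """Return True if the interpreter version matches a pin like "3.10".

    pyenv pins by prefix: "3.10" matches 3.10.x but not 3.11.x.
    """
    if version_info is None:
        version_info = sys.version_info
    pinned = tuple(int(part) for part in pin.split("."))
    return tuple(version_info)[: len(pinned)] == pinned

# Warn early if the running interpreter differs from the project's pin.
if not matches_pin("3.10"):
    print("Warning: interpreter differs from the pinned Python version")
```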

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Install &lt;code&gt;pyenv&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;pyenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a &lt;code&gt;.python-version&lt;/code&gt; file in the project root:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the Python version specified in the &lt;code&gt;.python-version&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pyenv &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Verify the desired Python version is installed and is set for the project:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pyenv version
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Dependency management with Poetry and the &lt;code&gt;pyproject.toml&lt;/code&gt; file
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://python-poetry.org/docs" rel="noopener noreferrer"&gt;Poetry&lt;/a&gt; is a tool for dependency management and packaging in Python. It allows you to declare the packages your project depends on and it will manage (install/update) them for you. Poetry offers a lockfile to ensure repeatable installs, and can build your project for distribution.&lt;/p&gt;

&lt;p&gt;It uses the &lt;code&gt;pyproject.toml&lt;/code&gt; file (which we'll explore next) as its source of truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters for DevX:&lt;/strong&gt; With &lt;code&gt;pyproject.toml&lt;/code&gt; and Poetry, you've eliminated the "works on my machine, but not on others" problem at the dependency level. Every developer and every CI/CD runner will install the exact same versions of every package, every time!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installing Poetry&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Install Poetry using Homebrew:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;poetry
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure Poetry to create virtual environments in the project directory (recommended for better DevX). This ensures that when you run &lt;code&gt;poetry install&lt;/code&gt;, it creates a &lt;code&gt;.venv&lt;/code&gt; folder directly in the project, making it easy to activate and manage:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry config virtualenvs.in-project &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The &lt;code&gt;pyproject.toml&lt;/code&gt; file
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;pyproject.toml&lt;/code&gt; file is a standard introduced in &lt;a href="https://peps.python.org/pep-0518/" rel="noopener noreferrer"&gt;PEP 518&lt;/a&gt; that replaces the need for multiple configuration files (&lt;code&gt;setup.py&lt;/code&gt;, &lt;code&gt;requirements.txt&lt;/code&gt;, &lt;code&gt;setup.cfg&lt;/code&gt;, etc.) with one unified file. It uses the &lt;a href="https://toml.io/en/" rel="noopener noreferrer"&gt;TOML (Tom's Obvious, Minimal Language)&lt;/a&gt; format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Single source of truth:&lt;/strong&gt; All project configuration lives in one file.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Version constraints:&lt;/strong&gt; You can specify package versions according to Poetry's &lt;a href="https://python-poetry.org/docs/dependency-specification/#version-constraints" rel="noopener noreferrer"&gt;dependency specification and version constraints&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Deterministic builds:&lt;/strong&gt; Poetry generates a &lt;code&gt;poetry.lock&lt;/code&gt; file that pins every dependency—both direct (what you specify) and transitive (dependencies of your dependencies)—ensuring identical installs across environments.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Tool configuration:&lt;/strong&gt; You can configure multiple tools in the same file (no need for separate config files).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example &lt;code&gt;pyproject.toml&lt;/code&gt; file:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[project]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"PROJECT_NAME"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.0"&lt;/span&gt;
&lt;span class="py"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"PROJECT_DESCRIPTION"&lt;/span&gt;
&lt;span class="py"&gt;authors&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="py"&gt;{name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;"YOUR_NAME",email&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"youremail@domain.com"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;readme&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"README.md"&lt;/span&gt;

&lt;span class="c"&gt;# Production dependencies that your code needs to run.&lt;/span&gt;
&lt;span class="nn"&gt;[tool.poetry.dependencies]&lt;/span&gt;
&lt;span class="py"&gt;python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;"&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;3.11&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="py"&gt;snowflake-snowpark-python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.33.0"&lt;/span&gt; &lt;span class="c"&gt;# Snowflake Snowpark Python library&lt;/span&gt;
&lt;span class="py"&gt;pydantic&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2.11.7"&lt;/span&gt;                  &lt;span class="c"&gt;# Data validation library in Python&lt;/span&gt;

&lt;span class="c"&gt;# Development-only tools that aren't needed in production.&lt;/span&gt;
&lt;span class="nn"&gt;[tool.poetry.group.dev.dependencies]&lt;/span&gt;
&lt;span class="py"&gt;black&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^23.0.0"&lt;/span&gt;           &lt;span class="c"&gt;# Code formatter&lt;/span&gt;
&lt;span class="py"&gt;pylint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^3.0.0"&lt;/span&gt;           &lt;span class="c"&gt;# Linter for code quality&lt;/span&gt;
&lt;span class="py"&gt;isort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^5.13.2"&lt;/span&gt;           &lt;span class="c"&gt;# Import statement organiser&lt;/span&gt;
&lt;span class="py"&gt;poethepoet&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^0.27.0"&lt;/span&gt;      &lt;span class="c"&gt;# Task runner for simplifying development tasks&lt;/span&gt;
&lt;span class="py"&gt;pytest&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^8.1.2"&lt;/span&gt;           &lt;span class="c"&gt;# Testing framework for Python&lt;/span&gt;
&lt;span class="py"&gt;pytest-xdist&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^3.0.0"&lt;/span&gt;     &lt;span class="c"&gt;# Run tests in parallel for faster execution&lt;/span&gt;
&lt;span class="py"&gt;pytest-cov&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^5.0.0"&lt;/span&gt;       &lt;span class="c"&gt;# Generate code coverage reports&lt;/span&gt;

&lt;span class="nn"&gt;[build-system]&lt;/span&gt;
&lt;span class="py"&gt;requires&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;["poetry-core&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;"]&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="py"&gt;build-backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"poetry.core.masonry.api"&lt;/span&gt;

&lt;span class="c"&gt;# Configure all your tools&lt;/span&gt;

&lt;span class="nn"&gt;[tool.black]&lt;/span&gt;
&lt;span class="py"&gt;line-length&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;
&lt;span class="py"&gt;target-version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'py310'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[tool.isort]&lt;/span&gt;
&lt;span class="py"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"black"&lt;/span&gt;
&lt;span class="py"&gt;multi_line_output&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;

&lt;span class="nn"&gt;[tool.pylint]&lt;/span&gt;
&lt;span class="py"&gt;max-line-length&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;
&lt;span class="py"&gt;fail-under&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;9.5&lt;/span&gt;

&lt;span class="c"&gt;# Configure tasks for Poe the Poet&lt;/span&gt;
&lt;span class="nn"&gt;[tool.poe.tasks]&lt;/span&gt;
&lt;span class="c"&gt;# Private tasks (prefixed with _ to hide from the help menu)&lt;/span&gt;
&lt;span class="py"&gt;_format_black&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"black ."&lt;/span&gt;
&lt;span class="py"&gt;_format_isort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"isort ."&lt;/span&gt;
&lt;span class="py"&gt;_pylint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"pylint src/"&lt;/span&gt;

&lt;span class="c"&gt;# Public tasks that compose the individual tools&lt;/span&gt;
&lt;span class="py"&gt;format&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"_format_black"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"_format_isort"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;lint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"_pylint"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;test&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"pytest --cov -vv"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Installing dependencies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once your &lt;code&gt;pyproject.toml&lt;/code&gt; is set up, installing all dependencies (including dev dependencies) is a single command. It will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a virtual environment (in &lt;code&gt;.venv&lt;/code&gt; if you configured Poetry to do so).&lt;/li&gt;
&lt;li&gt;Install all dependencies (including dev dependencies).&lt;/li&gt;
&lt;li&gt;Generate or update &lt;code&gt;poetry.lock&lt;/code&gt; to ensure reproducible installs across environments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For new projects where you haven't written code yet, you'll need to use the &lt;code&gt;--no-root&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-root&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;--no-root&lt;/code&gt; is needed initially:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you first create a project manually or with &lt;code&gt;poetry init&lt;/code&gt;, Poetry assumes you're building a package. If you run &lt;code&gt;poetry install&lt;/code&gt; without any code, you'll get an error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Installing the current project: example-project (0.1.0)
Error: The current project could not be installed: No file/folder found for package example-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--no-root&lt;/code&gt; flag tells Poetry to skip installing your project as a package and only install the dependencies you've specified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you won't need &lt;code&gt;--no-root&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you've written code and added a &lt;code&gt;packages&lt;/code&gt; section to your &lt;code&gt;pyproject.toml&lt;/code&gt; file like the example below, you can use the standard &lt;code&gt;poetry install&lt;/code&gt; command (without &lt;code&gt;--no-root&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.poetry]&lt;/span&gt;
&lt;span class="py"&gt;packages&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;[{include&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;YOUR_PACKAGE_NAME&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;from&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"src"&lt;/span&gt;&lt;span class="err"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuring VS Code (optional):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To use the project's virtual environment in VS Code / Cursor for IntelliSense, debugging, and running code in the IDE:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Press &lt;code&gt;Cmd+Shift+P&lt;/code&gt; (or &lt;code&gt;Ctrl+Shift+P&lt;/code&gt; on Windows/Linux)&lt;/li&gt;
&lt;li&gt;Type "Python: Select Interpreter"&lt;/li&gt;
&lt;li&gt;Select "Enter interpreter path"&lt;/li&gt;
&lt;li&gt;Enter the path to your project's virtual environment: &lt;code&gt;./&amp;lt;PROJECT_ROOT&amp;gt;/.venv/bin/python&lt;/code&gt; (adjust the path to match your project location)&lt;/li&gt;
&lt;li&gt;VS Code will now use the same Python environment as Poetry, giving you access to all installed packages and proper code completion.&lt;/li&gt;
&lt;/ul&gt;
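&lt;p&gt;If you'd rather not select the interpreter by hand on every machine, the path can also be committed as a workspace setting in &lt;code&gt;.vscode/settings.json&lt;/code&gt; (this uses the VS Code Python extension's &lt;code&gt;python.defaultInterpreterPath&lt;/code&gt; setting and assumes the in-project &lt;code&gt;.venv&lt;/code&gt; configured earlier):&lt;/p&gt;

```json
{
  "python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python"
}
```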

&lt;h2&gt;
  
  
  Poe the Poet: Simplifying development tasks
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://poethepoet.natn.io/" rel="noopener noreferrer"&gt;Poe the Poet&lt;/a&gt; is a task runner that lets you define common development commands in your &lt;code&gt;pyproject.toml&lt;/code&gt; file. Instead of remembering long commands like &lt;code&gt;poetry run black . &amp;amp;&amp;amp; poetry run isort . &amp;amp;&amp;amp; poetry run pylint src/&lt;/code&gt;, you can create a simple alias and run &lt;code&gt;poetry run poe lint&lt;/code&gt;. See the &lt;code&gt;[tool.poe.tasks]&lt;/code&gt; section in the example &lt;code&gt;pyproject.toml&lt;/code&gt; file above for the configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistency:&lt;/strong&gt; Everyone on your team uses the same commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity:&lt;/strong&gt; &lt;code&gt;poe lint&lt;/code&gt; instead of remembering multiple flags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composability:&lt;/strong&gt; Chain tasks together (e.g., &lt;code&gt;lint&lt;/code&gt; runs &lt;code&gt;format&lt;/code&gt; then &lt;code&gt;pylint&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation:&lt;/strong&gt; Tasks are self-documenting in &lt;code&gt;pyproject.toml&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
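&lt;p&gt;The composability point can be made concrete with a toy resolver. This is purely illustrative (it is not Poe's actual implementation): it mirrors the &lt;code&gt;[tool.poe.tasks]&lt;/code&gt; table above and shows how a sequence task like &lt;code&gt;lint&lt;/code&gt; expands into the underlying shell commands, in order:&lt;/p&gt;

```python
# Mirrors the [tool.poe.tasks] table above; NOT Poe's actual code.
TASKS = {
    "_format_black": "black .",
    "_format_isort": "isort .",
    "_pylint": "pylint src/",
    "format": ["_format_black", "_format_isort"],
    "lint": ["format", "_pylint"],
}

def expand(name):
    """Recursively flatten a task name into the shell commands it runs."""
    task = TASKS[name]
    if isinstance(task, str):
        return [task]
    commands = []
    for sub in task:
        commands.extend(expand(sub))
    return commands

# "lint" first formats (black, then isort), then runs pylint.
print(expand("lint"))  # ['black .', 'isort .', 'pylint src/']
```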




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;You now have a solid foundation for local Snowflake Python development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Deterministic Python versions&lt;/strong&gt; with &lt;code&gt;pyenv&lt;/code&gt; and &lt;code&gt;.python-version&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Robust dependency management&lt;/strong&gt; with &lt;code&gt;Poetry&lt;/code&gt; and &lt;code&gt;pyproject.toml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Consistent tooling&lt;/strong&gt; configured in a single file&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Simplified task execution&lt;/strong&gt; with &lt;code&gt;Poe the Poet&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup eliminates the "it works on my machine, but not on others" problem at its source. Every developer on your team will have the exact same environment, the same dependencies, and the same tooling automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The DevX payoff:&lt;/strong&gt; By investing in these foundations, you're not just setting up tools; you're creating an environment where Data Engineers can focus on building features instead of fighting with configuration. This is how we democratize data development.&lt;/p&gt;

&lt;p&gt;I hope you find this guide helpful. If you have questions or feedback, I'd love to hear from you!&lt;/p&gt;

</description>
      <category>python</category>
      <category>dataengineering</category>
      <category>snowflake</category>
    </item>
    <item>
      <title>Improving our frontend tracking with Avo</title>
      <dc:creator>Clément Raul</dc:creator>
      <pubDate>Wed, 03 Dec 2025 14:08:23 +0000</pubDate>
      <link>https://dev.to/superpayments/improving-our-frontend-tracking-with-avo-50cc</link>
      <guid>https://dev.to/superpayments/improving-our-frontend-tracking-with-avo-50cc</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;This article explains how we’ve revamped our product analytics frontend tracking at Super using &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt; 📊. For a long time, we relied on Google Sheets to document frontend events, which led to unclear ownership, inconsistent schemas, and slow, manual QA in Segment. We’ve since moved to Avo’s Tracking Plan and Inspector, giving us a single source of truth, a proper branching and peer review process with developers, and automated validation. &lt;br&gt;
➡️ The result: cleaner data, faster debugging, and much smoother collaboration between data and engineering ✅.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Accurate tracking is essential for reliable data monitoring. It helps us confirm that newly released features work as expected, identify and fix bugs, and optimise key user journeys – for example, the funnel for the Super Credit application.&lt;/p&gt;

&lt;p&gt;When tracking goes wrong, the symptoms can vary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Missing events&lt;/li&gt;
&lt;li&gt;Missing properties&lt;/li&gt;
&lt;li&gt;Typos in property values&lt;/li&gt;
&lt;li&gt;Duplicate events being sent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;But the root cause is almost always the same: poor or missing documentation.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Our previous setup: Google Sheets as a tracking plan 📄
&lt;/h2&gt;

&lt;p&gt;Until recently, our main solution for documenting frontend tracking was Google Sheets. For each new feature, we would either create a new document or add a new tab listing all the events that needed to be tracked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglb9ysk7g56v8ucxi30l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglb9ysk7g56v8ucxi30l.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What worked well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It was simple and familiar for everyone.&lt;/li&gt;
&lt;li&gt;The data team could quickly spin up a new sheet and share it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data team was responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating and maintaining the event list&lt;/li&gt;
&lt;li&gt;Sending it to the dev team when new frontend tracking was required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, the limitations quickly became obvious.&lt;/p&gt;

&lt;p&gt;Key pain points ⚠️:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Poor versioning&lt;/strong&gt;: It was difficult to see when events had been removed or updated, and why.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unclear ownership&lt;/strong&gt;: Anyone could edit the sheet, and changes often went unnoticed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weak review process&lt;/strong&gt;: There was no clear “branching” or peer review flow before sending tracking specs to developers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No automated validation&lt;/strong&gt;: We had no way to systematically check that frontend tracking had been implemented correctly. Validating events in Segment’s debugger was manual, time-consuming, and especially painful for complex features like Super Credit with many different paths.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Little support for harmonisation&lt;/strong&gt;: There was nothing to enforce reuse of existing properties, ensure consistent property names/values across features, or keep our schema tidy over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of these limitations, we decided to look for a better solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exploring alternatives and discovering &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;One option we considered was documenting our tracking events in JSON files and using GitHub for version control, branching, and reviews. This would have been free and would have given us better structure, but it would also have been fairly developer-centric and not very user-friendly for non-engineers.&lt;/p&gt;

&lt;p&gt;After some research, we came across &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt;, a tool focused on frontend tracking schema management, observability, and monitoring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt; offers two main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tracking Plan&lt;/li&gt;
&lt;li&gt;Inspector&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Tracking Plan: a single source of truth 📘
&lt;/h2&gt;

&lt;p&gt;The Tracking Plan is where we define all the events sent from the frontend via Segment.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt;, events can be organised by category – for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;App events&lt;/li&gt;
&lt;li&gt;Super Credit&lt;/li&gt;
&lt;li&gt;Merchant checkout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each event includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clear definition of when the event is triggered&lt;/li&gt;
&lt;li&gt;The list of properties to send (e.g. brandId, pageName, memberId)&lt;/li&gt;
&lt;li&gt;The allowed values and formats for those properties, where relevant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkygr9dyw7g8a8k2bn5t7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkygr9dyw7g8a8k2bn5t7.png" alt=" " width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt; improves:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Single source of truth&lt;/strong&gt;: All frontend tracking specs live in one structured place instead of being scattered across multiple Google Sheets.&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Branching and reviews&lt;/strong&gt;: Adding or updating events happens via branches, similar to a development workflow. A contributor creates a branch, a peer reviews it, then it’s sent to developers for implementation and finally merged into the main frontend tracking plan once implemented.&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Better versioning&lt;/strong&gt;: It’s easy to see when events are created, changed, or archived.&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Consistency and harmonisation&lt;/strong&gt;: &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt; encourages consistent event naming and property reuse, and helps keep property values aligned across features.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Inspector: validating implementation automatically 🔎
&lt;/h2&gt;

&lt;p&gt;The second major feature we use is the Inspector.&lt;/p&gt;

&lt;p&gt;The Inspector connects Segment to &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt; so that &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt; can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read the events coming from the frontend&lt;/li&gt;
&lt;li&gt;Compare them against the definitions in the Tracking Plan&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf203rvm5n0c4kevqw9c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf203rvm5n0c4kevqw9c.png" alt=" " width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is extremely useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checking that events have been implemented correctly 🤝&lt;/li&gt;
&lt;li&gt;Spotting typos in property names or values&lt;/li&gt;
&lt;li&gt;Ensuring that all required properties are being sent as defined&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What used to require manual QA in Segment’s debugger can now be done much more quickly and systematically.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we are using &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt; today
&lt;/h2&gt;

&lt;p&gt;We started using &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt; in the context of the Super Credit features. It has already:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Improved collaboration between the data team and developers&lt;/li&gt;
&lt;li&gt;Made it easier to review and refine frontend tracking specifications&lt;/li&gt;
&lt;li&gt;Helped us identify and fix tracking bugs more quickly and efficiently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the moment, we’re using the free version of &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt;, which comes with some limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A cap on the volume of events that Inspector can analyse per month (currently 100,000 events)&lt;/li&gt;
&lt;li&gt;Some paid features that we don’t yet have access to&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether upgrading to the paid version would be worth it is still under review.&lt;/p&gt;

&lt;p&gt;We are also in the process of migrating our legacy frontend tracking documentation. Around 80% of event definitions related to frontend checkout and Webflow (including Super Credit) have been moved from Google Sheets to &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt;. The next step is to complete the migration for app-related frontend events.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion 🎯
&lt;/h2&gt;

&lt;p&gt;Overall, our experience with &lt;a href="https://www.avo.app/" rel="noopener noreferrer"&gt;Avo&lt;/a&gt; has been extremely positive. It is user-friendly, has saved us significant time, and has improved collaboration both within the data team and between data and development.&lt;/p&gt;

&lt;p&gt;By moving away from ad-hoc Google Sheets towards a proper schema management and observability tool, we’ve made our tracking more reliable, our debugging faster, and our analytics more trustworthy – which ultimately helps us build and improve features like Super Credit with much more confidence.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>schema</category>
    </item>
    <item>
      <title>How We Use OpenAI and Gemini Batch APIs to Qualify Thousands of Sales Leads</title>
      <dc:creator>Jag Thind</dc:creator>
      <pubDate>Tue, 09 Sep 2025 11:38:32 +0000</pubDate>
      <link>https://dev.to/superpayments/how-we-use-openai-and-gemini-batch-apis-to-qualify-thousands-of-sales-leads-2knk</link>
      <guid>https://dev.to/superpayments/how-we-use-openai-and-gemini-batch-apis-to-qualify-thousands-of-sales-leads-2knk</guid>
      <description>&lt;p&gt;The following blog details how the Data team used AI to solve a specific problem for our Marketing and Sales teams - &lt;strong&gt;Qualify 3000 websites (Salesforce Accounts) to determine if they are ecommerce and can take payments.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is broken down into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The problem at hand and what we are trying to solve&lt;/li&gt;
&lt;li&gt;Process design&lt;/li&gt;
&lt;li&gt;Why use LLMs from 2 AI providers&lt;/li&gt;
&lt;li&gt;Prompt engineering and using prompt templates&lt;/li&gt;
&lt;li&gt;Scaling up using the OpenAI Batch API and Google Batch Predictions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;We implemented a batch data enrichment pipeline that uses OpenAI and Gemini Large Language Models (LLMs) via the &lt;a href="https://platform.openai.com/docs/guides/batch" rel="noopener noreferrer"&gt;OpenAI Batch API&lt;/a&gt; and &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini" rel="noopener noreferrer"&gt;Google Batch Predictions&lt;/a&gt; as a cost-effective way to enrich data using the power of LLMs.&lt;/p&gt;

&lt;p&gt;To ensure maximum accuracy and minimise the effects of hallucinations from the LLMs, we use a simple consensus system: each website is checked &lt;strong&gt;3 times by each AI&lt;/strong&gt;, and only results where they agree are accepted. Yes, this makes it more expensive, but we optimised for &lt;em&gt;time to value&lt;/em&gt; and getting good leads into the hands of the Sales team.&lt;/p&gt;
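&lt;p&gt;One plausible reading of that consensus rule can be sketched as follows. The vote-aggregation details (majority per provider, then cross-provider agreement) are our illustration, not necessarily the exact production logic:&lt;/p&gt;

```python
# Sketch of the consensus rule: each site is checked three times by each
# provider; a verdict is accepted only when both providers agree, and
# everything else is flagged for manual review.
from collections import Counter

def majority(votes):
    """Most common Y/N verdict from one provider's repeated runs."""
    return Counter(votes).most_common(1)[0][0]

def consensus(openai_votes, gemini_votes):
    a, b = majority(openai_votes), majority(gemini_votes)
    # Disagreement is a signal the site is ambiguous or an edge case
    return a if a == b else "REVIEW"

print(consensus(["Y", "Y", "N"], ["Y", "Y", "Y"]))  # → Y
print(consensus(["Y", "N", "N"], ["Y", "Y", "Y"]))  # → REVIEW
```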

&lt;p&gt;We used a prompt template configured to use the &lt;a href="https://platform.openai.com/docs/guides/tools-web-search" rel="noopener noreferrer"&gt;web search tool&lt;/a&gt; to ground the LLM with real-time information about the website, overcoming the model's static knowledge cutoff date.&lt;/p&gt;

&lt;p&gt;We trained the Marketing team in writing effective prompts for the LLMs before we scaled up using the batch mode.&lt;/p&gt;

&lt;p&gt;A great example of tech and the business working together to achieve a shared outcome and spread the use of AI across the business.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem at Hand
&lt;/h2&gt;

&lt;p&gt;The Marketing team periodically builds lists of potential merchants that can integrate Super as a payment method on their website checkout. These leads are then provided to Account Executives (AEs) to sign up.&lt;/p&gt;

&lt;p&gt;When assigned a website, the first thing AEs do is manually double-check &lt;em&gt;is the website ecommerce&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you buy products on the website?&lt;/li&gt;
&lt;li&gt;Is there a checkout on the website?&lt;/li&gt;
&lt;li&gt;Does it accept card payments?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚠️ Many websites were not ecommerce ⚠️ resulting in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AEs wasting time doing manual checking&lt;/li&gt;
&lt;li&gt;Many leads getting dis-qualified at the top of the sales funnel&lt;/li&gt;
&lt;li&gt;AEs getting frustrated with leads they were assigned&lt;/li&gt;
&lt;li&gt;AEs resorting to self-sourcing leads and taking them away from their core responsibilities of closing deals&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What are we trying to solve?
&lt;/h3&gt;

&lt;p&gt;Questions we asked ourselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can we increase the number of leads at the top of the sales funnel?&lt;/li&gt;
&lt;li&gt;Can we automate the &lt;em&gt;is ecommerce&lt;/em&gt; check instead of manually qualifying each website?&lt;/li&gt;
&lt;li&gt;Can we scale this check across &lt;em&gt;N&lt;/em&gt; (hundreds/thousands) websites?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Process Design
&lt;/h2&gt;

&lt;p&gt;Before we dive into the details of prompt engineering and how the batch pipeline works, the diagrams below illustrate the process and its 2 parts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffa17ev2njumarmfekk2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffa17ev2njumarmfekk2a.png" alt="Process Design 1" width="648" height="764"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dnzo4lipqpmf9ylv7e8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dnzo4lipqpmf9ylv7e8.png" alt="Process Design 2" width="800" height="734"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why use LLMs from 2 AI Providers?
&lt;/h2&gt;

&lt;p&gt;Even though it was more costly to do so, we needed to be confident in the accuracy of what we were telling the AEs in the Sales team. Instead of relying on a single AI, we used LLMs from two different AI providers, then based our final decision on their consensus.&lt;/p&gt;

&lt;p&gt;Think of it like getting a second opinion from a trusted expert. If two independent specialists examine the same data and come to the same conclusion, your confidence in that outcome increases dramatically.&lt;/p&gt;

&lt;p&gt;Some benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Accuracy Through Consensus&lt;/code&gt;: The core of our strategy is built on consensus. An ecommerce qualification is only confirmed if both LLMs independently agree. This simple rule acts as a powerful filter, significantly reducing the risk of a single LLM making a mistake, hallucinating, or misinterpreting a site.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Mitigating Model-Specific Weaknesses&lt;/code&gt;: Every LLM has its own unique architecture, training data, and inherent biases. One LLM might be brilliant at identifying traditional retail sites but struggle with subscription services, while the other might have the opposite strengths. Using a single LLM means you also inherit all of its blind spots. By using two, we diversify our "cognitive portfolio," allowing the strengths of one LLM to compensate for the weaknesses of the other, leading to a more balanced and consistently accurate outcome.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Automatic Quality Control&lt;/code&gt;: Perhaps the most valuable benefit is what happens when the LLMs disagree. A disagreement is a critical signal. It tells us that a website is ambiguous, an edge case, or complex in a way that could have easily fooled a single AI. Our system automatically flags these disagreements for manual review.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;Prompt engineering is the process of writing effective instructions for an LLM, such that it consistently generates content that meets your requirements.&lt;/p&gt;

&lt;p&gt;We used the &lt;a href="https://platform.openai.com/" rel="noopener noreferrer"&gt;OpenAI developer platform&lt;/a&gt; to iteratively develop a &lt;a href="https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts" rel="noopener noreferrer"&gt;reusable prompt&lt;/a&gt; template that could be used in the &lt;a href="https://platform.openai.com/docs/api-reference/responses/create" rel="noopener noreferrer"&gt;responses API&lt;/a&gt;. The platform allows testing different versions of a prompt side-by-side to evaluate changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages of using a Prompt Template
&lt;/h3&gt;

&lt;p&gt;You can use variables via &lt;code&gt;{{placeholder}}&lt;/code&gt; and your integration code remains the same, e.g.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = client.responses.create(
    model="gpt-4.1",
    prompt={
        "id": "pmpt_abc123",
        "version": "2",
        "variables": {
            "website_url": "xyz.com"
        }
    }
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also configure the prompt to use the &lt;a href="https://platform.openai.com/docs/guides/tools-web-search" rel="noopener noreferrer"&gt;web search tool&lt;/a&gt; to allow the LLM to search the web for the latest information before generating a response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "type": "web_search_preview",
    "user_location": {
        "type": "approximate",
        "country": "GB",
        "search_context_size": "high",
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Prompt Template
&lt;/h3&gt;

&lt;p&gt;The Marketing team produced a prompt template that had clear instructions for the LLM to check if a &lt;em&gt;single&lt;/em&gt; website URL is ecommerce.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please research the website {{url}} provided by the user. You must only return the data requested in the "InformationRequested" section and in a format according to the "OutputFormat" section. Do not include any explanations, reasoning, or commentary.

## InformationRequested
- url: {{url}}
- is_url_valid: Y/N — Is the URL valid and accessible?
- is_ecommerce: Y/N - You MUST use rules from section "Evaluation Rules for column is_ecommerce"

## OutputFormat
Output as JSON with the following fields. Do not include markdown around the JSON:
- url
- is_url_valid
- is_ecommerce

## Evaluation Rules for column is_ecommerce

*Mark "Y" only if all of the following are true, based on explicit evidence available*:
* rule 1
* rule 2
* etc

*Mark "N" in any of the following cases*:
* rule 1
* rule 2
* etc

## Final Reminder

- You must only return the data requested in the "InformationRequested" section.
- You must only return it in the format according to the "OutputFormat" section.
- You must not include any explanations, reasoning, or commentary.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Scaling it up - OpenAI Batch API
&lt;/h2&gt;

&lt;p&gt;OpenAI has a &lt;a href="https://platform.openai.com/docs/guides/batch" rel="noopener noreferrer"&gt;Batch API&lt;/a&gt; that allows you to send asynchronous groups of requests with 50% lower costs, a separate pool of significantly higher rate limits, and a clear 24-hour turnaround time. The workflow is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg4da93exwhgb5py4eu2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg4da93exwhgb5py4eu2.png" alt="OpenAI Batch API Workflow" width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The uploaded batch file containing the requests has one line per website, as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"custom_id": "request-[1756480801.159196]-xyz.com", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1", "input": "Run the following prompt", "prompt": {"id": "pmpt_XXX", "version": "2", "variables": {"url": "xyz.com"}}}}
{"custom_id": "request-[1756480802.1434196]-abc.com", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1", "input": "Run the following prompt", "prompt": {"id": "pmpt_XXX", "version": "2", "variables": {"url": "abc.com"}}}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
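&lt;p&gt;A file in that shape can be generated with a few lines of Python. The prompt id &lt;code&gt;pmpt_XXX&lt;/code&gt; is the placeholder from the example above, not a real template id:&lt;/p&gt;

```python
# Sketch of building the JSONL batch input file shown above: one
# request per line, each referencing the saved prompt template with a
# different {{url}} variable. "pmpt_XXX" is a placeholder id.
import json
import time

def batch_lines(urls, prompt_id="pmpt_XXX", version="2", model="gpt-4.1"):
    for url in urls:
        yield json.dumps({
            # custom_id lets you match each output line back to its website
            "custom_id": f"request-[{time.time()}]-{url}",
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": model,
                "input": "Run the following prompt",
                "prompt": {"id": prompt_id, "version": version,
                           "variables": {"url": url}},
            },
        })

with open("batch.jsonl", "w") as f:
    f.write("\n".join(batch_lines(["xyz.com", "abc.com"])) + "\n")
```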



&lt;p&gt;The benefits of this are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Significant Cost Reduction&lt;/code&gt;: The 50% discount on pricing is a major advantage for processing thousands of URLs, leading to substantial cost savings compared to using the real-time API.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Increased Throughput&lt;/code&gt;: The much higher rate limits allow for processing a large volume of requests in parallel, drastically reducing the overall time it takes to enrich a large dataset.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Asynchronous "Fire-and-Forget" Workflow&lt;/code&gt;: You can submit a large batch job and not have to wait for it to complete. This is perfect for non-time-sensitive, offline processing tasks, as you can retrieve the results later without keeping a connection open.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Simplified Client-Side Logic&lt;/code&gt;: It removes the need for you to build and maintain complex logic to handle rate limiting, concurrent requests, and retries. You simply prepare and upload a file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Enhanced Resilience and Error Handling&lt;/code&gt;: Since requests are independent, the success or failure of one doesn't impact others. The output file clearly indicates the status of each request, making it easy to identify and retry only the failed ones.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Up to date context&lt;/code&gt;: The prompt template is configured to use the &lt;a href="https://platform.openai.com/docs/guides/tools-web-search" rel="noopener noreferrer"&gt;web search tool&lt;/a&gt; to ground the LLM with real-time information about the website. This search is performed independently for each website.&lt;/li&gt;
&lt;/ul&gt;
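&lt;p&gt;Because requests are independent, triaging the output file is straightforward. The line format below is a simplified stand-in for the real batch output schema, just to show the idea of splitting successes from failures:&lt;/p&gt;

```python
# Sketch of triaging a batch output file: collect failed custom_ids so
# only those requests need to be resubmitted.
import json

def split_results(output_lines):
    ok, failed = [], []
    for line in output_lines:
        record = json.loads(line)
        # A non-null "error" field marks a failed request
        (failed if record.get("error") else ok).append(record["custom_id"])
    return ok, failed

lines = [
    '{"custom_id": "request-1-xyz.com", "error": null}',
    '{"custom_id": "request-2-abc.com", "error": {"message": "timeout"}}',
]
print(split_results(lines))
# → (['request-1-xyz.com'], ['request-2-abc.com'])
```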




&lt;h2&gt;
  
  
  Scaling it up - Google Batch Predictions
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini" rel="noopener noreferrer"&gt;Google Batch Predictions&lt;/a&gt; also allows you to generate predictions from Gemini models using a &lt;em&gt;Batch Job&lt;/em&gt;, the workflow is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c0oxojcpbag5emls3lg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c0oxojcpbag5emls3lg.png" alt="Google Batch Predictions Workflow" width="800" height="116"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similar to OpenAI, the batch job file contains one request per line, but you cannot use a prompt template, so each request in the file carries the full prompt. Web search tools for Gemini are also not available via Batch Predictions, but we still found the results to be accurate.&lt;/p&gt;
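&lt;p&gt;Without template support, each JSONL line has to carry the fully rendered prompt. The request shape below is a sketch based on Vertex AI's Gemini batch format; check the Batch Predictions docs for the exact schema your model version expects:&lt;/p&gt;

```python
# Sketch of one Gemini batch request line: the prompt is rendered into
# the request body itself, since no template can be referenced.
import json

# Stand-in for the full prompt template from the earlier section
PROMPT_TEMPLATE = "Please research the website {url} provided by the user."

def gemini_batch_line(url):
    return json.dumps({
        "request": {
            "contents": [
                {"role": "user",
                 "parts": [{"text": PROMPT_TEMPLATE.format(url=url)}]}
            ]
        }
    })

print(gemini_batch_line("xyz.com"))
```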




&lt;h2&gt;
  
  
  Where we Ended Up
&lt;/h2&gt;

&lt;p&gt;We now have a repeatable way to enrich data using the power of LLMs for a large number of websites. We have already started using it to conduct other checks.&lt;/p&gt;

&lt;p&gt;The Salesforce Accounts we enriched with &lt;code&gt;is_ecommerce = Y/N&lt;/code&gt; were used to create a better-qualified list at the top of the sales funnel.&lt;/p&gt;

&lt;p&gt;AEs no longer reported their assigned websites as &lt;strong&gt;not&lt;/strong&gt; ecommerce.&lt;/p&gt;

&lt;p&gt;A job well done by AI and Humans!&lt;/p&gt;

</description>
      <category>openai</category>
      <category>gemini</category>
      <category>ai</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>📈 Period-over-Period Measures in Looker: A Simpler, Better Way to Analyze Time Trends</title>
      <dc:creator>Clément Raul</dc:creator>
      <pubDate>Tue, 22 Jul 2025 12:50:48 +0000</pubDate>
      <link>https://dev.to/superpayments/period-over-period-measures-in-looker-a-simpler-better-way-to-analyze-time-trends-3le3</link>
      <guid>https://dev.to/superpayments/period-over-period-measures-in-looker-a-simpler-better-way-to-analyze-time-trends-3le3</guid>
      <description>&lt;p&gt;Tracking change over time—month-over-month, year-over-year—is essential for monitoring performance. Until recently, doing this in Looker meant relying on table calculations, which could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hard to read and maintain&lt;/li&gt;
&lt;li&gt;Prone to human error&lt;/li&gt;
&lt;li&gt;Limited to visualization-only logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looker's new period-over-period measures help bring time-based comparisons into LookML itself - making it easier to build reliable, reusable metrics. (&lt;a href="https://cloud.google.com/looker/docs/period-over-period" rel="noopener noreferrer"&gt;https://cloud.google.com/looker/docs/period-over-period&lt;/a&gt;)&lt;/p&gt;




&lt;h2&gt;
  
  
  🧱 Before: Table Calculations
&lt;/h2&gt;

&lt;p&gt;Previously, getting last year’s value for a metric required a &lt;strong&gt;workaround&lt;/strong&gt; like the following table calculation:&lt;br&gt;
&lt;code&gt;offset(${payment_transactions_fact.count_successful_transactions}, 12)&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjn3j01q1supp8k7s3dz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjn3j01q1supp8k7s3dz.png" alt=" " width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Limitations we encountered with this approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚠️ Only works in visualizations—can’t be filtered or reused elsewhere&lt;/li&gt;
&lt;li&gt;🧩 Logic gets duplicated across dashboards&lt;/li&gt;
&lt;li&gt;🔍 Harder to QA and understand over time&lt;/li&gt;
&lt;li&gt;📉 Fragile if the time grain or sort order changes&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  ✅ Now: LookML Period-over-Period Measures
&lt;/h2&gt;

&lt;p&gt;With the period_over_period measure type, we can now move this logic into our LookML layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Last Year’s Value&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;measure: count_successful_transactions_last_year {
  type: period_over_period
  description: "Successful transactions from the previous year"
  based_on: count_successful_transactions
  based_on_time: created_year
  period: year
  kind: previous
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example: Year-over-Year % Change&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;measure: count_successful_transactions_last_year_relative_change {
  type: period_over_period
  description: "Year-over-year % change in successful transactions"
  based_on: count_successful_transactions
  based_on_time: created_year
  period: year
  kind: relative_change
  value_format_name: percent_0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
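&lt;p&gt;For reference, &lt;code&gt;kind: relative_change&lt;/code&gt; corresponds to the standard period-over-period formula, (current - previous) / previous:&lt;/p&gt;

```python
# The relative_change kind computes (current - previous) / previous.
def relative_change(current, previous):
    return (current - previous) / previous

# e.g. 1,380 successful transactions this year vs 1,200 last year
# (made-up numbers for illustration):
print(f"{relative_change(1380, 1200):.0%}")  # → 15%
```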



&lt;p&gt;&lt;strong&gt;Example: Looker Explore&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtl7e91d2spt1g4c53ik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtl7e91d2spt1g4c53ik.png" alt="Looker table" width="800" height="233"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Looker Visualisation&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1fs7nusggxraolhoyuu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1fs7nusggxraolhoyuu.png" alt="Looker visualisation" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🏢 How We Use It
&lt;/h2&gt;

&lt;p&gt;At Super, we’ve started using these measures to track key metrics like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📦 Sales – Year-over-year trends&lt;/li&gt;
&lt;li&gt;🧲 Customer and merchant retention – Month-over-month comparisons&lt;/li&gt;
&lt;li&gt;🚀 Feature impact – Week-over-week shifts after a product launch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because these are built into our LookML layer, they can be reused across different dashboards, which helps ensure consistency and saves time.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔍 Why This Helps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ Cleaner and centralized logic – Easier to review and test&lt;/li&gt;
&lt;li&gt;📊 Works across dashboards – Filterable, sortable, and visualization-friendly&lt;/li&gt;
&lt;li&gt;🔧 Adaptable – Supports different time periods (week, month, quarter, year)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💡 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This feature might seem small, but it's been a noticeable improvement for our data team. It reduces repetitive work, helps standardise time-based comparisons, and makes dashboards easier to maintain and trust.&lt;/p&gt;

</description>
      <category>data</category>
      <category>analytics</category>
      <category>looker</category>
    </item>
    <item>
      <title>Isolating Integration Tests</title>
      <dc:creator>Sam Adams</dc:creator>
      <pubDate>Fri, 04 Jul 2025 05:57:12 +0000</pubDate>
      <link>https://dev.to/superpayments/isolating-integration-tests-2gjd</link>
      <guid>https://dev.to/superpayments/isolating-integration-tests-2gjd</guid>
      <description>&lt;p&gt;When we write integration tests at Super we follow some principles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Each test should work independently of any other test&lt;/li&gt;
&lt;li&gt;It should only act and assert at the boundary of the service (i.e. how the service is interacted with by other services or users)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In practice this means we have to do some work to manage isolation (that is, prevent state bleed) between each test.&lt;/p&gt;

&lt;p&gt;(Note: if you want to skip to a working code example here is an example repo: &lt;a href="https://github.com/sam-super/example-db-test-isolation" rel="noopener noreferrer"&gt;https://github.com/sam-super/example-db-test-isolation&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;When thinking about isolation it's handy to have a good mental model of how tests are executed in the most popular testing frameworks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6x1f7f454n3wazw54os.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6x1f7f454n3wazw54os.png" alt="test execution model" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This means our test suites run in parallel, but (by default) each test within a suite runs in sequence. It's important to keep this model in mind to make sure our tests can run in parallel and stay isolated.&lt;/p&gt;

&lt;p&gt;Below is an example of testing a simple fastify app that has a DB with a single table for &lt;code&gt;cars&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;FastifyInstance&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fastify&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;knexFactory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;Knex&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;knex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;execa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;buildApp&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../src/app&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;expect&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;randomString&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./helpers/utils&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;startContainers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="s2"&gt;`docker compose up --remove-orphans --wait -d postgres`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;initDb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;testSpecificDbName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;connOpts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// these need to match your docker-compose.yml&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;127.0.0.1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;54323&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;whatever&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;root&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;postgres&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;connOpts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CREATE DATABASE &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;testSpecificDbName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;knex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;knexFactory&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;connOpts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;testSpecificDbName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;migrations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;__dirname&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/../src/migrations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;knex_migrations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;knex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;migrate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;latest&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Created DB &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;testSpecificDbName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (in &lt;/span&gt;&lt;span class="p"&gt;${(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;s)`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;knex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cars api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FastifyInstance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;knex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Knex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;before&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;startContainers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nf"&gt;beforeEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`test_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;randomString&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;knex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;initDb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB_NAME&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;knex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nf"&gt;afterEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;knex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;destroy&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;can get a car it creates&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;postRes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/cars&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;make&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ford&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;postRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;postRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;make&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ford&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getRes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/cars&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;getRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;getRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;&lt;span class="na"&gt;make&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ford&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}]);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gets no cars if none created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getRes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/cars&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;getRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;getRes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;([]);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What is this doing?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Once, before any tests run, we start our containers (postgres; see the docker compose file in the &lt;a href="https://github.com/sam-super/example-db-test-isolation" rel="noopener noreferrer"&gt;GH repo&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Before each test, we create a new database with a random name and run the knex migrations (which create the 'cars' table)&lt;/li&gt;
&lt;li&gt;We then create a new instance of the fastify app, bound to that randomly named database&lt;/li&gt;
&lt;li&gt;We make requests to our API using fastify's built-in &lt;code&gt;.inject()&lt;/code&gt; method, which issues HTTP requests without having to start an HTTP server&lt;/li&gt;
&lt;li&gt;The second test shows the isolation working: it doesn't see the state (the db row) created in the first test, otherwise it would fail&lt;/li&gt;
&lt;/ol&gt;
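The random name in step 2 comes from a `randomString()` helper that the snippet doesn't show. A hypothetical minimal version (ours may differ) sticks to characters that are safe to interpolate into an unquoted `CREATE DATABASE` statement:

```typescript
// Hypothetical helper (not shown in the post): generate an identifier-safe
// suffix, since the name is concatenated into CREATE DATABASE unquoted.
function randomString(length: number = 8): string {
  const chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let out = '';
  for (let i = 0; i < length; i++) {
    out += chars[Math.floor(Math.random() * chars.length)];
  }
  return out;
}

// e.g. the DB name built in beforeEach: `test_${Date.now()}_${randomString()}`
```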

&lt;p&gt;In future articles we can go into more depth on how to optimize our tests and how we work with isolation when using &lt;code&gt;localstack&lt;/code&gt; (dynamo, sqs queues etc).&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why not just re-use the same database?
&lt;/h3&gt;

&lt;p&gt;We could re-use the same db for each test and truncate the data between tests. We have to be careful, though: although our example has a single test file/suite, in practice we want our suites to be able to run in parallel. When we have many test suites, it is advantageous to re-use databases on a per-thread basis (to save the time spent creating and migrating the DB) and simply truncate the tables between tests.&lt;/p&gt;
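That truncate-between-tests idea can be sketched like this (the helper names are illustrative, not from our codebase): wipe every table except knex's bookkeeping tables, so the schema survives between tests and we skip the create/migrate cost.

```typescript
// Pure helper: build the statement that wipes state between tests while
// keeping knex's migration bookkeeping tables intact.
function buildTruncateSql(tables: string[]): string {
  const keep = new Set(['knex_migrations', 'knex_migrations_lock']);
  const targets = tables.filter((t) => !keep.has(t));
  // RESTART IDENTITY resets sequences; CASCADE follows foreign keys.
  return `TRUNCATE TABLE ${targets.map((t) => `"${t}"`).join(', ')} RESTART IDENTITY CASCADE`;
}

// In a beforeEach, with the knex instance from the setup above:
//   const tables = await knex('pg_tables').where({ schemaname: 'public' }).pluck('tablename');
//   await knex.raw(buildTruncateSql(tables));
```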

&lt;h3&gt;
  
  
  Isn't this slow?
&lt;/h3&gt;

&lt;p&gt;On an M1 MacBook it's about 10ms to create each DB and run the migrations. It's only 1 migration, and as the number grows so will the time. However, for us it's worth the trade-off for what we are trying to achieve and the guaranteed isolation it gives us. &lt;br&gt;
There are also a few strategies to speed it up:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;As above, we can re-use databases per worker thread&lt;/li&gt;
&lt;li&gt;We can maintain a single SQL dump file (generated from the migrations themselves) and use it to populate the databases on creation, rather than running each individual migration&lt;/li&gt;
&lt;/ol&gt;
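Strategy 2 might look like the following sketch (file paths and helper names are assumptions): dump the schema once after migrating, e.g. with `pg_dump --schema-only -f schema.sql`, then replay that one file into each fresh database instead of running every individual migration.

```typescript
import { readFileSync } from 'fs';

// Naive split of a dump into statements -- good enough for a schema-only dump
// with no semicolons inside string literals; real dumps may need more care.
function dumpToStatements(dumpSql: string): string[] {
  return dumpSql
    .split(';')
    .map((s) => s.trim())
    .filter((s) => s.length > 0 && !s.startsWith('--'));
}

// Replay the dump into a freshly created database.
async function applyDump(client: { query: (sql: string) => Promise<unknown> }, path: string): Promise<void> {
  for (const stmt of dumpToStatements(readFileSync(path, 'utf8'))) {
    await client.query(stmt);
  }
}
```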

&lt;h3&gt;
  
  
  Shouldn't you be deleting the DBs and removing the containers at the end?
&lt;/h3&gt;

&lt;p&gt;Re-using the containers speeds up our tests (&lt;code&gt;postgres&lt;/code&gt; is fast to start, but it helps a lot if you have something like &lt;code&gt;localstack&lt;/code&gt;, which takes a while to start). Since our tests are isolated, it shouldn't matter what state we leave lying around on our containers. Eventually we could fill up the disk with all our test DBs, but that will take a long time and is easily fixed by killing our containers with &lt;code&gt;docker compose down -v&lt;/code&gt;; the next test run will bring up clean containers.&lt;/p&gt;

&lt;p&gt;There is also the added benefit that the DBs are left around after each run, so you can manually inspect them after a failed test.&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>testing</category>
      <category>ci</category>
    </item>
    <item>
      <title>Backend Testing At Super Payments</title>
      <dc:creator>Sam Adams</dc:creator>
      <pubDate>Fri, 04 Jul 2025 05:33:25 +0000</pubDate>
      <link>https://dev.to/superpayments/backend-testing-at-super-payments-5gkd</link>
      <guid>https://dev.to/superpayments/backend-testing-at-super-payments-5gkd</guid>
<description>&lt;p&gt;At Super Payments we adopt an "integration-first" testing methodology, with unit/e2e-style tests applied more sparingly.&lt;/p&gt;

&lt;p&gt;If you want to read more on why we prefer integration tests to unit/e2e tests, then &lt;a href="https://kentcdodds.com/blog/the-testing-trophy-and-testing-classifications" rel="noopener noreferrer"&gt;Kent Dodds&lt;/a&gt; has done a much better job than I could of &lt;a href="https://kentcdodds.com/blog/write-tests" rel="noopener noreferrer"&gt;explaining the rationale&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To understand how our integration tests work at Super, we need to know a bit more about our architecture:&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;At Super, our services are written in TypeScript and each service encompasses a given domain. Each service has its own DB, SQS queue and EventBridge bus (or whichever subset it needs).&lt;/p&gt;

&lt;p&gt;The main way for our services to talk to each other internally is via EventBridge/SQS messages: EventBridge receives messages as the output of the producing system, and the consuming system binds the EventBridge messages it's interested in to its own dedicated SQS queue for consumption.&lt;/p&gt;

&lt;p&gt;The main way the outside world (customers, third parties etc.) interacts with Super is via HTTP (or a partner EventBridge).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzo6pk8xowpz3yxcee9pq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzo6pk8xowpz3yxcee9pq.png" alt="service inputs/outputs" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Integration Tests
&lt;/h2&gt;

&lt;p&gt;When we say 'integration test' we mean hitting a single service with some real-world input (an event or HTTP request) to see what outputs/side-effects occur.&lt;/p&gt;

&lt;p&gt;Since most of our services follow the same set of inputs (HTTP/SQS) and outputs (HTTP/EventBridge), our tests try to only act and assert on those inputs and outputs (ignoring 'how' the service works internally).&lt;/p&gt;

&lt;p&gt;A typical integration test at Super looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;start supporting containers (postgres and &lt;a href="https://docs.localstack.cloud/overview/" rel="noopener noreferrer"&gt;localstack&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;start the app (we use &lt;a href="https://nestjs.com/" rel="noopener noreferrer"&gt;NestJS&lt;/a&gt; with &lt;a href="https://fastify.dev/" rel="noopener noreferrer"&gt;fastify&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;spin up a new clean DB for that test (we use &lt;a href="https://knexjs.org/guide/migrations.html#migration-api" rel="noopener noreferrer"&gt;Knex migrations&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;use &lt;a href="https://github.com/nock/nock" rel="noopener noreferrer"&gt;nock&lt;/a&gt; to intercept all network requests&lt;/li&gt;
&lt;li&gt;make HTTP requests to the app and/or send SQS events&lt;/li&gt;
&lt;li&gt;assert on what happened&lt;/li&gt;
&lt;/ol&gt;
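Step 4 can be sketched with nock's body-capture callback (the region, endpoint, and event names below are assumptions for illustration, not our production values), together with a small pure helper for pulling out what was emitted:

```typescript
// Shape of an EventBridge PutEvents request body (subset of fields we care about).
type PutEventsBody = { Entries?: { DetailType?: string; Detail?: string }[] };

// Pure helper: list the detail-types emitted across all captured requests.
function emittedDetailTypes(captured: PutEventsBody[]): string[] {
  return captured.flatMap((b) => (b.Entries ?? []).map((e) => e.DetailType ?? ''));
}

// With nock, something like:
//   const captured: PutEventsBody[] = [];
//   nock('https://events.eu-west-1.amazonaws.com')
//     .persist()
//     .post('/')
//     .reply(200, (_uri, body) => {
//       captured.push(body as PutEventsBody);
//       return { Entries: [], FailedEntryCount: 0 };
//     });
//   // ...drive the app, then:
//   expect(emittedDetailTypes(captured)).toContain('PaymentCreated');
```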

&lt;p&gt;We can assert on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the HTTP requests we made to third-party APIs (nock)&lt;/li&gt;
&lt;li&gt;the events we sent (using nock to monitor the EventBridge requests)&lt;/li&gt;
&lt;li&gt;that the SQS queue has been processed/is empty (by directly calling localstack with the AWS-SQS SDK)&lt;/li&gt;
&lt;li&gt;the database state (if we have to)&lt;/li&gt;
&lt;li&gt;the response from our starting HTTP request&lt;/li&gt;
&lt;/ol&gt;
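Assertion 3 might look like the following sketch; the localstack endpoint and queue URL are illustrative assumptions. Note the SQS attributes are approximate by design, so in practice tests tend to poll until the check passes.

```typescript
// Pure helper: an SQS queue is drained when nothing is visible or in flight.
function isQueueEmpty(attrs: Record<string, string>): boolean {
  return attrs['ApproximateNumberOfMessages'] === '0'
    && attrs['ApproximateNumberOfMessagesNotVisible'] === '0';
}

// Against localstack with the AWS SDK, something like:
//   const sqs = new SQSClient({ endpoint: 'http://localhost:4566', region: 'eu-west-1' });
//   const { Attributes } = await sqs.send(new GetQueueAttributesCommand({
//     QueueUrl: queueUrl,
//     AttributeNames: ['ApproximateNumberOfMessages', 'ApproximateNumberOfMessagesNotVisible'],
//   }));
//   expect(isQueueEmpty(Attributes ?? {})).toBe(true);
```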

&lt;p&gt;Since our tests are fully stateful (but fully isolated from other tests) we can see how changes to inputs affect outputs.&lt;/p&gt;

&lt;p&gt;In this series we will dig into the finer details on the above, specifically on how we isolate our tests (while keeping them fast).&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>testing</category>
      <category>ci</category>
    </item>
    <item>
      <title>The Impact of AI on Organizations</title>
      <dc:creator>Aidan McGinley</dc:creator>
      <pubDate>Fri, 20 Jun 2025 12:30:08 +0000</pubDate>
      <link>https://dev.to/superpayments/the-impact-of-ai-on-organizations-2ci4</link>
      <guid>https://dev.to/superpayments/the-impact-of-ai-on-organizations-2ci4</guid>
<description>&lt;p&gt;What if everything we believe about AI's impact on organizational structure is backward? The widespread assumption is that artificial intelligence will inevitably lead to smaller, leaner teams, characterised by the pithy comment 'Do more with less'. However, that assumption may be fundamentally wrong. &lt;/p&gt;

&lt;p&gt;Here, I outline why the organizations that thrive in the AI era won't be those that downsize, but those that leverage AI to build and manage larger, more impactful teams than were previously possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics of Team Size
&lt;/h2&gt;

&lt;p&gt;To understand how AI will impact optimal team size, we first need to understand what determines team size in the first place. In any organization, there are two key functions that determine the optimal number of employees: the cost function and the value function.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cost Function
&lt;/h3&gt;

&lt;p&gt;Adding an employee incurs two types of costs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fixed costs: These are straightforward - salary, benefits, equipment, office space, etc.&lt;/li&gt;
&lt;li&gt;Variable costs: These are subtler but often more significant. They stem from organizational complexity and are based on principles like Dunbar's number and the rapid growth of communication paths as teams expand (n people have n(n-1)/2 possible pairs).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When you plot the marginal cost of hiring employee N, you get a hockey stick curve. Early on, fixed costs dominate, resulting in a relatively flat line. But as the team grows, variable costs take over, causing the curve to bend sharply upward. Anyone who has managed a rapidly growing team will recognize this pattern intuitively - it gets exponentially harder to add people past a certain point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe5oztgx8kop6py4ielzl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe5oztgx8kop6py4ielzl.png" alt=" " width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Value Function
&lt;/h3&gt;

&lt;p&gt;On the value side, let's consider a simplified model of diminishing returns. While real organizations exhibit much more complex patterns, including potential network effects and complementarities between employees, a basic diminishing returns model can still offer useful insights. In this simplified view, your first hire delivers significant value, and each subsequent hire adds value, though typically at a decreasing rate. This creates a generally downward-sloping line when we plot the marginal value of employee N. While this is a significant simplification of organizational reality, it serves to illustrate our key points about AI's impact.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7p5muytpgguti5cijed.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7p5muytpgguti5cijed.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Intersection
&lt;/h3&gt;

&lt;p&gt;The optimal team size occurs where these two curves intersect - the point where the marginal cost of adding another employee equals their marginal value contribution. Past this point, each new hire costs more than the value they bring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6on1bawco5plvn7mluj8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6on1bawco5plvn7mluj8.png" alt=" " width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter AI
&lt;/h2&gt;

&lt;p&gt;Now, here's where it gets interesting. AI doesn't just shift these curves - it fundamentally reshapes them. Let's examine how:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Value Function Changes:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;The baseline value per employee increases as AI amplifies individual productivity&lt;/li&gt;
&lt;li&gt;The diminishing returns effect may be moderated as AI tools support productivity at scale&lt;/li&gt;
&lt;li&gt;AI can facilitate more effective employee specialization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This causes the value curve to move upwards, indicating more value per employee.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Cost Function Changes:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;The fixed cost component might increase slightly due to AI tool costs&lt;/li&gt;
&lt;li&gt;The variable cost curve may become more manageable as AI helps with certain aspects of organizational complexity&lt;/li&gt;
&lt;li&gt;Some communication and coordination costs can be better handled with AI-powered tools&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These changes lower the cost per employee, causing the cost curve to move downwards or to the right.&lt;/p&gt;

&lt;p&gt;When you plot these new curves, the intersection point tends to move to the right. This suggests that the optimal team size for an AI-enabled organization could be larger than for a traditional one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlk3nz7qrwgrqnuxysm1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlk3nz7qrwgrqnuxysm1.png" alt=" " width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;This insight has profound implications for how we think about AI's impact on organizations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Instead of viewing AI as a replacement for human workers, we should see it as a catalyst for organizational scaling&lt;/li&gt;
&lt;li&gt;The focus should be on how AI can help manage complexity and maintain productivity at scale&lt;/li&gt;
&lt;li&gt;Companies that understand this will gain competitive advantages by building larger, more capable teams that deliver more value&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Here we've covered an economic model describing how AI will affect organisations. In the next post we will explore how these theoretical insights translate into practice by examining the impact on a software engineering team.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Improve DBT Incremental Performance on Snowflake using Custom Incremental Strategy</title>
      <dc:creator>Jag Thind</dc:creator>
      <pubDate>Thu, 29 May 2025 11:34:29 +0000</pubDate>
      <link>https://dev.to/superpayments/improve-dbt-incremental-performance-on-snowflake-using-custom-incremental-strategy-3ag3</link>
      <guid>https://dev.to/superpayments/improve-dbt-incremental-performance-on-snowflake-using-custom-incremental-strategy-3ag3</guid>
<description>&lt;p&gt;The following presents how to improve the performance of the DBT built-in &lt;code&gt;delete-insert&lt;/code&gt; &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy" rel="noopener noreferrer"&gt;incremental strategy&lt;/a&gt; on &lt;strong&gt;Snowflake&lt;/strong&gt; so we can control Snowflake query costs. It is broken down into:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Defining the problem, with supporting performance statistics&lt;/li&gt;
&lt;li&gt;Desired solution requirements&lt;/li&gt;
&lt;li&gt;Solution implementation, with supporting performance statistics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We implemented a DBT &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#custom-strategies" rel="noopener noreferrer"&gt;custom incremental strategy&lt;/a&gt;, along with &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#about-incremental_predicates" rel="noopener noreferrer"&gt;incremental predicates&lt;/a&gt;, to improve Snowflake query performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced MBs scanned by ~99.68%&lt;/li&gt;
&lt;li&gt;Reduced &lt;a href="https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions#what-are-micro-partitions" rel="noopener noreferrer"&gt;micro-partitions&lt;/a&gt; scanned by ~99.53%&lt;/li&gt;
&lt;li&gt;Reduced query time from &lt;code&gt;19&lt;/code&gt; seconds to &lt;code&gt;1.3&lt;/code&gt; seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With less data scanned, the Snowflake warehouse spends less time waiting on I/O, so the query completes faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Custom incremental strategies and incremental predicates are more advanced uses of DBT for incremental processing. But I suppose that’s where you have the most fun, so let’s get stuck in!&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem
&lt;/h2&gt;

&lt;p&gt;When using the DBT built-in &lt;code&gt;delete+insert&lt;/code&gt; &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy" rel="noopener noreferrer"&gt;incremental strategy&lt;/a&gt; on large volumes of data, the generated &lt;code&gt;delete&lt;/code&gt; statement can run inefficiently on Snowflake. This means queries take longer and &lt;strong&gt;increase warehouse costs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Take an example target table that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contains &lt;code&gt;~458 million&lt;/code&gt; rows&lt;/li&gt;
&lt;li&gt;Is &lt;code&gt;~26 GB&lt;/code&gt; in size&lt;/li&gt;
&lt;li&gt;Has &lt;code&gt;~2560&lt;/code&gt; micro-partitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With a DBT model that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs every &lt;code&gt;30 minutes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Merges &lt;code&gt;~100K&lt;/code&gt; rows into the target table on a typical run. Because data can arrive out of order, a subsequent run picks up late rows, which means a run can include rows that were already processed.&lt;/li&gt;
&lt;/ul&gt;
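&lt;p&gt;Because rows can arrive out of order, each run has to look back over a window rather than just the newest batch. A minimal Python sketch of the 24-hour lookback used later in this post (the timestamps are hypothetical):&lt;/p&gt;

```python
from datetime import datetime, timedelta

def lookback_window_start(batch_timestamps, hours=24):
    """Start of the delete window: 24 hours before the earliest row in
    the batch, mirroring the SQL predicate
    order_created_at >= dateadd(hour, -24, min(order_created_at))."""
    return min(batch_timestamps) - timedelta(hours=hours)

# Hypothetical order_created_at values from one 30-minute run.
batch = [
    datetime(2025, 5, 1, 12, 0),
    datetime(2025, 5, 1, 12, 15),
    datetime(2025, 5, 1, 11, 50),  # late-arriving row
]
print(lookback_window_start(batch))  # 2025-04-30 11:50:00
```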

&lt;p&gt;With DBT model config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: model_name
    config:
      materialized: "incremental"
      incremental_strategy: "delete+insert"
      on_schema_change: "append_new_columns"
      unique_key: ["dw_order_created_skey"] -- varchar(100)
      cluster_by: ["to_date(order_created_at)"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Default &lt;code&gt;delete&lt;/code&gt; SQL generated by DBT, before it inserts data in the same transaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;delete from target_table as DBT_INTERNAL_DEST
where (dw_order_created_skey) in (
  select distinct dw_order_created_skey
  from source_temp_table as DBT_INTERNAL_SOURCE
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Statistics
&lt;/h3&gt;

&lt;p&gt;To find the rows in the target table to delete with the matching &lt;code&gt;dw_order_created_skey&lt;/code&gt; (see node profile overview image below), Snowflake has to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan &lt;code&gt;~11 GB&lt;/code&gt; of the target table&lt;/li&gt;
&lt;li&gt;Scan all &lt;code&gt;~2560 micro-partitions&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Spend &lt;code&gt;~19 seconds&lt;/code&gt; executing the query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; - The query does &lt;strong&gt;not&lt;/strong&gt; filter on &lt;code&gt;order_created_at&lt;/code&gt;, so Snowflake cannot use the &lt;code&gt;clustering key&lt;/code&gt; of &lt;code&gt;to_date(order_created_at)&lt;/code&gt; to prune micro-partitions when finding the matching rows to delete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query plan&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimzno0g6qh0rjiv3pglt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimzno0g6qh0rjiv3pglt.png" alt="delete query plan"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx4xw5sqxxzu9mmj7mq11.png" alt="node profile overview"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Desired Solution
&lt;/h2&gt;

&lt;p&gt;To limit the data read from the target table above, we can make use of &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#about-incremental_predicates" rel="noopener noreferrer"&gt;incremental_predicates&lt;/a&gt; in the model config. This adds SQL that filters the target table.&lt;/p&gt;

&lt;p&gt;DBT model config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: model_name
    config:
      materialized: "incremental"
      incremental_strategy: "delete+insert"
      on_schema_change: "append_new_columns"
      unique_key: ["dw_order_created_skey"]
      cluster_by: ["to_date(order_created_at)"]
      incremental_predicates:
        - "order_created_at &amp;gt;= (select dateadd(hour, -24, min(order_created_at)) from DBT_INTERNAL_SOURCE)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Issues with this&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#about-incremental_predicates" rel="noopener noreferrer"&gt;incremental_predicates&lt;/a&gt; docs state that &lt;em&gt;dbt does not check the syntax of the SQL statements&lt;/em&gt;, so the predicate is passed through to Snowflake unchanged.&lt;/li&gt;
&lt;li&gt;We get an error when it executes on Snowflake: &lt;code&gt;Object 'DBT_INTERNAL_SOURCE' does not exist or not authorized.&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;We cannot hardcode the Snowflake table name in the incremental_predicates, as it's dynamically generated by DBT.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Solution Implementation
&lt;/h2&gt;

&lt;p&gt;We need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-process each element of &lt;code&gt;incremental_predicates&lt;/code&gt; to replace &lt;code&gt;DBT_INTERNAL_SOURCE&lt;/code&gt; with the actual &lt;code&gt;source_temp_table&lt;/code&gt; name, so that DBT generates SQL like the below for better performance:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;delete from target_table as DBT_INTERNAL_DEST
where (dw_order_created_skey) in (
  select distinct dw_order_created_skey
  from source_temp_table as DBT_INTERNAL_SOURCE
)
-- Added by incremental_predicates
and order_created_at &amp;gt;= (select dateadd(hour, -24, min(order_created_at)) from source_temp_table)
;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Continue to call the &lt;strong&gt;default&lt;/strong&gt; DBT &lt;code&gt;delete+insert&lt;/code&gt; incremental strategy with the new value for &lt;code&gt;incremental_predicates&lt;/code&gt; in the arguments dictionary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How&lt;/strong&gt; - The below macro implements a &lt;em&gt;light-weight&lt;/em&gt; &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#custom-strategies" rel="noopener noreferrer"&gt;custom incremental strategy&lt;/a&gt; to do this. You can see that at the end it calls the default &lt;code&gt;get_incremental_delete_insert_sql&lt;/code&gt; DBT code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{% macro get_incremental_custom_delete_insert_sql(arg_dict) %}
  {% set custom_arg_dict = arg_dict.copy() %}
  {% set source_relation = custom_arg_dict.get('temp_relation') %}
  {% set target_relation = custom_arg_dict.get('target_relation') %}

  {% if source_relation is none %}
    {{ exceptions.raise_compiler_error('temp_relation is not present in arguments!') }}
  {% endif %}

  {% if target_relation is none %}
    {{ exceptions.raise_compiler_error('target_relation is not present in arguments!') }}
  {% endif %}

  {# Cast to strings only after the presence checks; casting a missing
     relation first would silently turn it into the string 'None' #}
  {% set source = source_relation | string %}
  {% set target = target_relation | string %}

  {% set raw_predicates = custom_arg_dict.get('incremental_predicates', []) %}

  {% if raw_predicates is string %}
    {% set predicates = [raw_predicates] %}
  {% else %}
    {% set predicates = raw_predicates %}
  {% endif %}

  {% if predicates %}
    {% set replaced_predicates = [] %}
    {% for predicate in predicates %}
      {% set replaced = predicate
        | replace('DBT_INTERNAL_SOURCE', source)
        | replace('DBT_INTERNAL_DEST', target)
      %}
      {% do replaced_predicates.append(replaced) %}
    {% endfor %}
    {% do custom_arg_dict.update({'incremental_predicates': replaced_predicates}) %}
  {% endif %}

  {{ log('Calling get_incremental_delete_insert_sql with args: ' ~ custom_arg_dict, info=False) }}
  {{ get_incremental_delete_insert_sql(custom_arg_dict) }}
{% endmacro %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
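&lt;p&gt;The heart of the macro is a plain string substitution over each predicate. The same logic can be sketched in Python (the relation names below are hypothetical):&lt;/p&gt;

```python
def rewrite_predicates(predicates, source, target):
    """Replace DBT's placeholder aliases with real relation names,
    as the custom incremental strategy macro does."""
    if isinstance(predicates, str):  # dbt also accepts a single string
        predicates = [predicates]
    return [
        p.replace("DBT_INTERNAL_SOURCE", source).replace("DBT_INTERNAL_DEST", target)
        for p in predicates
    ]

pred = ("order_created_at >= (select dateadd(hour, -24, min(order_created_at)) "
        "from DBT_INTERNAL_SOURCE)")
print(rewrite_predicates(pred, "db.schema.model_name__dbt_tmp", "db.schema.model_name"))
```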



&lt;p&gt;This is now callable from the DBT model config by setting &lt;code&gt;incremental_strategy&lt;/code&gt; to &lt;code&gt;custom_delete_insert&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: model_name
    config:
      materialized: "incremental"
      incremental_strategy: "custom_delete_insert"
      on_schema_change: "append_new_columns"
      unique_key: ["dw_order_created_skey"]
      cluster_by: ["to_date(order_created_at)"]
      incremental_predicates:
        - "order_created_at &amp;gt;= (select dateadd(hour, -24, min(order_created_at)) from DBT_INTERNAL_SOURCE)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Improvement Statistics
&lt;/h3&gt;

&lt;p&gt;To find the &lt;code&gt;~100K&lt;/code&gt; rows to delete in the target table, Snowflake now only has to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan &lt;code&gt;~35 MB&lt;/code&gt; of the target table, 11 GB → 35 MB = &lt;strong&gt;~99.68%&lt;/strong&gt; improvement&lt;/li&gt;
&lt;li&gt;Scan &lt;code&gt;12 micro-partitions&lt;/code&gt;, 2560 → 12 = &lt;strong&gt;~99.53%&lt;/strong&gt; improvement&lt;/li&gt;
&lt;li&gt;Spend &lt;code&gt;~1.3 seconds&lt;/code&gt; executing the query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With less data scanned, the Snowflake warehouse spends less time waiting on I/O, so the query completes faster.&lt;/p&gt;
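&lt;p&gt;The headline percentages follow directly from the scan statistics (treating 11 GB as roughly 11,000 MB):&lt;/p&gt;

```python
def pct_reduction(before, after):
    """Percentage reduction from `before` to `after`, rounded to 2 dp."""
    return round(100 * (1 - after / before), 2)

print(pct_reduction(11_000, 35))  # MB scanned: ~11 GB -> ~35 MB = 99.68
print(pct_reduction(2_560, 12))   # micro-partitions: 2560 -> 12 = 99.53
```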

&lt;p&gt;&lt;strong&gt;Query plan&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqdvdbdos892fe5urwru.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqdvdbdos892fe5urwru.png" alt="delete query plan"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ieoslvjkw1e2j9ep52n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ieoslvjkw1e2j9ep52n.png" alt="node profile overview"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;If you're interested in hearing more about how we use DBT at &lt;a href="https://www.superpayments.com" rel="noopener noreferrer"&gt;Super Payments&lt;/a&gt;, feel free to reach out!&lt;/p&gt;

</description>
      <category>dbt</category>
      <category>snowflake</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Hashicorp Vault at Super</title>
      <dc:creator>Luke Livingstone</dc:creator>
      <pubDate>Thu, 22 May 2025 14:41:55 +0000</pubDate>
      <link>https://dev.to/superpayments/hashicorp-vault-at-super-4lck</link>
      <guid>https://dev.to/superpayments/hashicorp-vault-at-super-4lck</guid>
      <description>&lt;p&gt;At Super, we use HashiCorp Vault to securely store the secrets required by our microservices running on Kubernetes.&lt;/p&gt;

&lt;p&gt;We’ve been long-time fans of Vault. Our Platform team has previous experience deploying and maintaining it, so choosing Vault for our current setup was an easy decision from a knowledge and reliability standpoint.&lt;/p&gt;

&lt;p&gt;Drawing on lessons from past implementations, we were able to build something robust and scalable. Our infrastructure is hosted entirely on AWS and is segmented across multiple accounts. We maintain three separate workload accounts (Staging, Mock, and Production), each running Super's microservices in Kubernetes, alongside an Infrastructure account for Platform tooling.&lt;/p&gt;

&lt;p&gt;Rather than deploying and maintaining a separate Vault cluster for each environment, we opted for a centralised approach. This decision reduced operational overhead and significantly improved the developer experience, avoiding the complexity of managing and switching between multiple Vault interfaces.&lt;/p&gt;




&lt;p&gt;To get started, we deployed our Vault infrastructure via Terraform. Vault’s storage backend is powered by Amazon S3, with DynamoDB providing high availability. We also use AWS KMS for auto-unseal functionality, eliminating the need for manual intervention when restarting Vault. Vault itself is installed using the &lt;a href="https://github.com/hashicorp/vault-helm" rel="noopener noreferrer"&gt;official HashiCorp Helm chart&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kxyvryss9ywn7cwwixn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kxyvryss9ywn7cwwixn.jpg" alt="A overview of the Vault infrastructure" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we provisioned an internal Network Load Balancer (NLB) and exposed it through a VPC Endpoint Service. This design choice enables secure, cross-account connectivity to Vault using VPC Interface Endpoints—avoiding the complexity and security risks of VPC peering.&lt;/p&gt;

&lt;p&gt;To simplify service discovery within our Kubernetes clusters, we created human-readable internal services that resolve &lt;code&gt;super.vault&lt;/code&gt; to the appropriate VPC interface endpoint. This gives our services a clean and consistent way to talk to Vault, regardless of the environment they’re running in.&lt;/p&gt;




&lt;p&gt;That wraps up our simple yet effective centralized Vault infrastructure here at Super. By consolidating our setup, we've kept operations streamlined, secure, and developer-friendly across all environments.&lt;/p&gt;

&lt;p&gt;If you're interested in hearing more or want us to dive deeper into any aspect of our Vault implementation—be it authentication flows, secret injection, or scaling—feel free to reach out. We'd love to share more!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>vault</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Karpenter on EKS Fargate</title>
      <dc:creator>Luke Livingstone</dc:creator>
      <pubDate>Wed, 17 Apr 2024 15:04:30 +0000</pubDate>
      <link>https://dev.to/superpayments/using-fargate-on-eks-for-karpenter-37mk</link>
      <guid>https://dev.to/superpayments/using-fargate-on-eks-for-karpenter-37mk</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;We're reinventing payments. Super powers free payments for businesses and more rewarding shopping for customers, so that everyone wins. &lt;a href="https://www.superpayments.com/" rel="noopener noreferrer"&gt;https://www.superpayments.com/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;We're using &lt;a href="https://karpenter.sh/" rel="noopener noreferrer"&gt;Karpenter&lt;/a&gt; to manage our Kubernetes node scaling. &lt;/p&gt;

&lt;p&gt;We're big fans of how fast Karpenter can provision just-in-time nodes for us across our EKS clusters, but there was one sticking point: for obvious reasons, the Karpenter controller pods can't run on Karpenter-managed nodes.&lt;/p&gt;

&lt;p&gt;To get around this we used AWS EKS managed node groups as &lt;code&gt;init&lt;/code&gt; nodes and pinned Karpenter to said nodes. We provisioned a node group with a minimum and maximum of 2 nodes, mostly for Karpenter (although other pods could run on these nodes too, to avoid wasting compute resources!).&lt;/p&gt;

&lt;p&gt;The downside is that updating managed node groups is slow; updating two nodes, with a maximum of one available at a time, took between six and ten minutes, and we wanted to speed up this process.&lt;/p&gt;

&lt;p&gt;The simple solution? Remove the init nodes! But then, where do we run Karpenter? Enter Fargate.&lt;/p&gt;

&lt;p&gt;We created an EKS Fargate profile via our EKS Terraform module with a selector for the Karpenter namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_eks_fargate_profile" "karpenter" {
  cluster_name           = aws_eks_cluster.cluster.name
  fargate_profile_name   = "karpenter"
  pod_execution_role_arn = aws_iam_role.fargate.arn
  subnet_ids             = var.private_subnets

  selector {
    namespace = "karpenter"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pod Execution Role
&lt;/h3&gt;

&lt;p&gt;If you've ever used ECS, you'll be familiar with the pod execution role. For Fargate and EKS, it's a straightforward role with two AWS managed policies attached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_iam_role" "fargate" {
  name = "${var.cluster_name}-fargate"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks-fargate-pods.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "fargate" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy"
  role       = aws_iam_role.fargate.name
}

resource "aws_iam_role_policy_attachment" "fargate_eks_cni" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.fargate.name
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Karpenter Helm Install
&lt;/h3&gt;

&lt;p&gt;We use the hashicorp/helm Terraform provider to install both the Karpenter and CRD charts directly from our EKS module. This ensures that Karpenter is up and running before anything else, ready to provision compute.&lt;/p&gt;

&lt;p&gt;Next, we set the namespace for the Karpenter chart to match the selector in the Fargate profile, which in our case is karpenter, and we're off to the races!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME                         READY   STATUS    RESTARTS   AGE   IP              NODE                                                  NOMINATED NODE 
karpenter-75c664b7cb-9z9lr   1/1     Running   0          5d    &amp;lt;snip&amp;gt;   fargate-&amp;lt;snip&amp;gt;.eu-west-2.compute.internal   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
karpenter-75c664b7cb-fxhb2   1/1     Running   0          5d    &amp;lt;snip&amp;gt;    fargate-&amp;lt;snip&amp;gt;.eu-west-2.compute.internal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Notes
&lt;/h3&gt;

&lt;p&gt;By default we're using Fargate's minimum resources which are &lt;code&gt;0.25 vCPU&lt;/code&gt; and &lt;code&gt;0.5GB RAM&lt;/code&gt; per task.&lt;/p&gt;

&lt;p&gt;Currently you &lt;a href="https://github.com/aws/containers-roadmap/issues/1629" rel="noopener noreferrer"&gt;can't specify ARM&lt;/a&gt; when creating Fargate tasks on EKS, so we're using x86, but the cost is around $20 per month for both tasks.&lt;/p&gt;

&lt;p&gt;We've generally reduced the number of nodes across our EKS clusters too, resulting in some cost savings and much less waiting around for the Platform team!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>karpenter</category>
      <category>autoscaling</category>
      <category>devops</category>
    </item>
    <item>
      <title>Terraform Modules at Super</title>
      <dc:creator>Luke Livingstone</dc:creator>
      <pubDate>Thu, 28 Mar 2024 12:30:55 +0000</pubDate>
      <link>https://dev.to/superpayments/terraform-modules-at-super-48kp</link>
      <guid>https://dev.to/superpayments/terraform-modules-at-super-48kp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;We're reinventing payments. Super powers free payments for businesses and more rewarding shopping for customers, so that everyone wins. &lt;a href="https://www.superpayments.com/" rel="noopener noreferrer"&gt;https://www.superpayments.com/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Like most startups, we use Terraform to manage and deploy our infrastructure. This post covers how we use Terraform modules at Super to adhere to the DRY principle.&lt;/p&gt;

&lt;p&gt;Early in our Terraform refactor, we aimed to invest in modules. Our goal was to promote high reusability while minimising code.&lt;/p&gt;

&lt;p&gt;At the time of writing, Super has around 70 Terraform modules in use across 10 providers. Some of the modules are small (e.g. IAM Role) and some are larger (e.g. EKS Cluster).&lt;/p&gt;




&lt;h3&gt;
  
  
  Template Module &amp;amp; Code Style 📝
&lt;/h3&gt;

&lt;p&gt;To keep module creation in line with a style guide, we have a template module. Some of the rules below are best practice and some are specific to Super.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don't include provider configurations&lt;/li&gt;
&lt;li&gt;We don't include any backend configuration &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;data.tf&lt;/code&gt; file is used for all &lt;code&gt;data&lt;/code&gt; resources&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;outputs.tf&lt;/code&gt; file is used for all output resources&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;variables.tf&lt;/code&gt; file is used for all variables&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;versions.tf&lt;/code&gt; file is used for &lt;code&gt;required_providers&lt;/code&gt; and &lt;code&gt;required_version&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why no provider?! 😱
&lt;/h3&gt;

&lt;p&gt;The primary reason we avoid including a provider in our modules is to facilitate nesting modules. Nesting modules helps keep resources in a standardised format across modules.&lt;/p&gt;

&lt;p&gt;When using a module inside another module, Terraform deems it incompatible with &lt;code&gt;count&lt;/code&gt;, &lt;code&gt;for_each&lt;/code&gt;, and &lt;code&gt;depends_on&lt;/code&gt; if the module in question has its own local provider configuration.&lt;/p&gt;

&lt;p&gt;We started out removing providers only from nested modules, but decided to make use of Terragrunt's &lt;a href="https://terragrunt.gruntwork.io/docs/reference/config-blocks-and-attributes/#generate" rel="noopener noreferrer"&gt;generate&lt;/a&gt; and &lt;a href="https://terragrunt.gruntwork.io/docs/reference/config-blocks-and-attributes/#include" rel="noopener noreferrer"&gt;include&lt;/a&gt; to remove providers from all modules.&lt;/p&gt;

&lt;p&gt;Let's take the following directory structure for AWS as an example. We have a folder for the AWS region (eu-west-2) and we also have a few hcl files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;├── super-staging
│   ├── eu-west-2
│   ├── aws.hcl
│   ├── terragrunt.hcl
│   └── vault.hcl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;aws.hcl&lt;/code&gt; file uses a Terragrunt generate block to arbitrarily generate a file in the terragrunt working directory (where terraform is called).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;generate&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"aws.tf"&lt;/span&gt;
  &lt;span class="nx"&gt;if_exists&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"overwrite_terragrunt"&lt;/span&gt;
  &lt;span class="nx"&gt;contents&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
provider "aws" {
  region = "eu-west-2"
  default_tags {
    tags = {
      environment = "staging",
    }
  }
}
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When using a module with Terragrunt you can then use the include block with the &lt;code&gt;find_in_parent_folders&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;include&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;find_in_parent_folders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"aws.hcl"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"git@github.com:organisation/terraform-example-module.git?ref=v1.0.0"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Remote State
&lt;/h3&gt;

&lt;p&gt;We use S3 as our state store, along with DynamoDB for locking, all encrypted with KMS.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;terragrunt.hcl&lt;/code&gt; at the root of the directory includes three things: the Terragrunt &lt;code&gt;remote_state&lt;/code&gt; block, &lt;code&gt;iam_role&lt;/code&gt;, and some default &lt;code&gt;inputs&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;remote_state&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt;

  &lt;span class="nx"&gt;generate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"backend.tf"&lt;/span&gt;
    &lt;span class="nx"&gt;if_exists&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"overwrite"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt;                &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"super-staging-eu-west-2-example-bucket"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${path_relative_to_include()}/terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt;                &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eu-west-2"&lt;/span&gt;
    &lt;span class="nx"&gt;encrypt&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;dynamodb_table&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"super-staging-eu-west-2-example-table"&lt;/span&gt;
    &lt;span class="nx"&gt;kms_key_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alias/s3-super-staging-eu-west-2-example-kms"&lt;/span&gt;
    &lt;span class="nx"&gt;disable_bucket_update&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;iam_role&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:iam::&amp;lt;snip&amp;gt;:role/example-role"&lt;/span&gt;

&lt;span class="nx"&gt;inputs&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;environment&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"staging"&lt;/span&gt;
  &lt;span class="nx"&gt;aws_account_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;snip&amp;gt;"&lt;/span&gt;
  &lt;span class="nx"&gt;service_owner&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"devops"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then add the include like we do with the AWS provider. By default &lt;code&gt;find_in_parent_folders&lt;/code&gt; will search for the first &lt;code&gt;terragrunt.hcl&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;include&lt;/span&gt; &lt;span class="s2"&gt;"root"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;find_in_parent_folders&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="nx"&gt;expose&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
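&lt;p&gt;The effect of &lt;code&gt;${path_relative_to_include()}&lt;/code&gt; in the state key is that each module directory gets its own state file under the shared bucket. A Python sketch of that mapping (the paths below are hypothetical):&lt;/p&gt;

```python
import os

def state_key(root_dir, working_dir):
    """Sketch of what `${path_relative_to_include()}/terraform.tfstate`
    evaluates to for a given Terragrunt working directory."""
    rel = os.path.relpath(working_dir, root_dir)
    return f"{rel}/terraform.tfstate"

print(state_key("/repo/super-staging", "/repo/super-staging/eu-west-2/eks"))
# eu-west-2/eks/terraform.tfstate
```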






&lt;h3&gt;
  
  
  Versioning 🔢
&lt;/h3&gt;

&lt;p&gt;Our Platform team are enthusiasts of semantic versioning and we also use conventional commits.&lt;/p&gt;
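&lt;p&gt;For reference, the mapping from conventional commit types to semver bumps that the commit analyzer applies can be sketched as follows (a simplified model: no pre-releases or release channels):&lt;/p&gt;

```python
def next_version(version, commits):
    """Simplified model of how conventional commits drive a semver bump:
    breaking change -> major, feat -> minor, fix -> patch, else no release."""
    major, minor, patch = map(int, version.split("."))
    if any("BREAKING CHANGE" in c or c.split(":")[0].endswith("!") for c in commits):
        return f"{major + 1}.0.0"
    if any(c.startswith("feat") for c in commits):
        return f"{major}.{minor + 1}.0"
    if any(c.startswith("fix") for c in commits):
        return f"{major}.{minor}.{patch + 1}"
    return version  # e.g. only chore/docs commits: no release

print(next_version("1.2.3", ["fix: correct tag output"]))      # 1.2.4
print(next_version("1.2.3", ["feat: add outputs"]))            # 1.3.0
print(next_version("1.2.3", ["feat!: drop old tf support"]))   # 2.0.0
```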

&lt;p&gt;We have a simple Github Action job on each module repository that uses the &lt;code&gt;semantic-release-action&lt;/code&gt;. We use the &lt;code&gt;@semantic-release/commit-analyzer&lt;/code&gt; plugin with the &lt;a href="https://www.conventionalcommits.org/en/v1.0.0/" rel="noopener noreferrer"&gt;&lt;code&gt;conventionalcommits&lt;/code&gt;&lt;/a&gt; preset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Release&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cycjimmy/semantic-release-action@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;semantic_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;23.0.2&lt;/span&gt;
          &lt;span class="na"&gt;extra_plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;@semantic-release/changelog@6.0.3&lt;/span&gt;
            &lt;span class="s"&gt;@semantic-release/git@10.0.1&lt;/span&gt;
            &lt;span class="s"&gt;conventional-changelog-conventionalcommits@7.0.2&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.CI_GITHUB_TOKEN }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>terraform</category>
      <category>iac</category>
      <category>terragrunt</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
