DEV Community: Ken Ahrens

Twenty years of Linux, and now I run my own AI

Ken Ahrens — Wed, 17 Jun 2026 15:27:00 +0000

I've been running Linux and Ubuntu for nearly 20 years. I can date it almost exactly, because I still have the email from my Dad back in early 2007 telling me he'd just installed Ubuntu. That's where this started.

The big trend the last couple of years is obviously AI and the large language models, and this balancing act between what you can run yourself on your own machine versus what you get from a state-of-the-art model that you have to pay quite a bit for. Everyone has seen the challenges with the cost structure. Copilot had to change their license. Cursor had to change their license. Anthropic is changing theirs. It's hard to find the right mix where you give a lot of value to customers and you're also able to charge them an appropriate amount.

Open source and AI is another way to solve this. So I've been investing in building out my own local stack of AI tools I can run on my machine.

I'm on an M-series Mac, the Apple Silicon. You may have heard of tools like Ollama that let you run open source models. There's another one called oMLX that's designed specifically for Apple Silicon, and in my testing it ran better. You download the models straight from Hugging Face. The ones I run are the mlx-community builds, which are already quantized down to 4-bit, so a 31B model that would be around 60GB at full precision comes down to about 17GB on disk. I didn't have to convert anything or do any prep. I just pulled them down and pointed oMLX at the folder. Right now I'm running Gemma 4 and Nemotron, plus a few others.

The thing that makes this actually useful day to day is that I keep oMLX running all the time. It installs as a Homebrew service. Newer versions of Homebrew want you to explicitly trust third-party taps before they'll load anything, so it's a two-step setup that you do once, and then it starts when I log in and restarts itself if it ever crashes:

brew trust jundot/omlx       # one-time; newer brew rejects untrusted taps
brew services start omlx

If you skip the trust step you'll get a Refusing to load formula error and brew won't move on.

Under the hood brew services is just a wrapper around launchd, the macOS process supervisor. Same mechanism I use further down for my own scheduled jobs, just packaged up so I don't have to write the plist by hand.

It asks for a chunk of memory up front, but it only touches the CPU and GPU when something calls it, so the rest of the time it just sits idle. That means I can wire it into all the tools I use day to day, whether it's the Zed code editor or the opencode terminal-based AI agent, and they all just point at a server that's already there.

To give you a sense of what that actually means in practice, here's what the three models I use most do on my machine through oMLX, generating a few hundred tokens from a cold start:

Model	Type	Throughput
Gemma 4	31B, dense	~27 tokens/sec
Qwen 3.6	27B, dense	~30 tokens/sec
Nemotron	30B, mixture-of-experts	~158 tokens/sec

All three are plenty fast for interactive work. The Nemotron number is the interesting one. It's a similar size on disk to the others, but because it's a mixture-of-experts model it only activates a small slice of its weights per token, so it runs five or six times faster. That's fast enough that I don't think twice about putting it in a script that runs on a loop.

That's been a cool boost. I can run AI models, and I can especially use them as part of scripts that run continuously on my machine. I set up a launchd framework so I can run things on a schedule. I have a weekly to-do checklist job, a daily planning job, and a heartbeat that runs every 15 minutes making sure my system is actually in good shape.

The other big model I've been testing is DeepSeek 4. antirez came up with a way to run it in its own package called DS4, and that's part of my stack too. The advantage is it's a lot more powerful. It runs around 35 tokens a second for me, about the same as Gemma even though it's a much bigger model. The disadvantage is it uses most of the resources on the computer when I run it, so it's not something I leave idling in the background the way I do with the oMLX models.

But just being able to tap into open weight models and run them on my own machine is a huge power boost. It gives me the confidence to kick something off and step away from my desk without burning up all my tokens for the week. Or I can run things overnight and come back to see if they got done.

I'm not worried about consuming all my tokens or getting bumped into the next API tier. You've probably seen some of my past blogs where I spent tens of thousands of dollars on tokens in a couple of months. I like the idea of using open source AI to optimize some of that spend. And frankly, I like the freedom and the support for open source too.

I even use VoiceInk on my own machine so I can do my own transcribing without sending it to a third party. And when I record videos, I run Whisper locally to create the transcripts.

So there's quite a bit you can do on your own machine when you're on Apple Silicon. The hardware is expensive. But you also get the ability to bring tons of tools onto your local machine and get a lot more out of the horsepower you already paid for.

If you want to see the actual setup, I put my stack here: github.com/kenahrens/mac-local-ai.

Our follow-up rate was abysmal, so I built my own tool

Ken Ahrens — Tue, 02 Jun 2026 13:43:56 +0000

I had a number stuck in my head: eight to 10 touches before deciding a prospect isn't interested anymore.

I thought we were doing better than that. Then I actually looked.

In 2024 I went through HubSpot, which was our CRM at the time, and checked the outreach history for prospects who had gone quiet. When was the last time we heard from them? How many times had we followed up since then?

The pattern was pretty consistent: one outreach attempt, sometimes two, then nothing. We'd get a no-response or two and move on. The deal would go quiet and we'd call it dead.

My benchmark was eight to 10. We were stopping at one or two.

That's not a team discipline problem. We're not bad at follow-up when a deal is active. We're bad at follow-up when the signal from the other side goes quiet and there's no obvious forcing function to reach back out. People are busy. Their inboxes are full. Sometimes they didn't ignore me on purpose. They just didn't respond yet.

I knew I needed something that would make them make noise — at least for me.

The tool

I called it Radar.

I built it around the places where our customer and prospect interactions actually happen. Slack. Email. CRM. In-product usage. Meeting notes in a markdown folder. PostHog for what people are doing after the call. No single system had the full picture.

Radar syncs all of it. CRM, PostHog, meeting notes, email history, Slack threads. That sync is the workhorse. It builds a timeline of every interaction with each account, then tracks the touches: Slack touch, email touch, meeting, whatever happened.

Different accounts need different cadences. A happy customer might be fine with a 30-day check-in. Someone in the middle of an important project might need a touch every three days. Radar lets me set that standard and highlights the accounts that are outside it.

The first version did one thing: for each flagged account, suggest a draft follow-up email. A draft that knew the last thing we'd talked about, not a generic "checking in." Referenced anything new we'd shipped that might be relevant. Asked something specific.

I didn't use the suggested follow-ups as-is. That wasn't the point. They gave me a starting point, and I could edit the email before sending it.

I ran it the first week it worked. We sent follow-ups to the accounts it surfaced.

We booked five meetings in the next seven days.

Five meetings from one week of follow-up, to accounts that had been quiet. People we'd talked to before who had just gone dark while we weren't watching.

The team noticed. We ended up with our own Slack emoji for Radar. That's how I know something is actually being used.

Why

Following up with prospects who've gone quiet is an important task that competes with everything else on my plate. When there's nothing urgent pushing me toward an account, I don't go looking.

Radar makes it visible. Every morning I can open it and see who has gone past the follow-up window I set. Three days. Thirty days. Whatever makes sense for that account. The accounts I should be thinking about surface automatically instead of disappearing into the CRM.

That changed how I think about what I need from a CRM. Every CRM I've used was designed to track deals. That's useful, but it's backward-looking — it tells me what happened. Radar tells me who needs attention right now.

What went wrong

The first version proved the concept. Then I tried to scale it.

The earliest mistake was letting it send automatically. The original setup queued drafts for me to review before anything went out. I got comfortable and loosened that. Emails went out without a human pass, and some of them weren't ready.

UTF-8 encoding was one I didn't see coming. The body of an email could look completely fine in the preview and still produce a garbled subject line when it hit a mail client that handled encoding differently.

Here's an example of what an average email looked like when things were working:

One issue I keep seeing in service-heavy systems is mock drift. Mocks stay "correct" per the spec, but prod has extra fields, missing fields, odd status codes, retries, and timeouts. Then AI tools and integration tests learn the wrong shape and you get regressions that only show up under real traffic.

Do you have a way to detect when your mocks no longer match what upstreams are actually returning in prod?

Not a disaster. But it needed editing before sending. The problem is that at volume, "needs editing" becomes "probably didn't get edited."

Strong prompts and evals help. They don't fully solve it. The AI knows the account history. It doesn't know whether this person has heard a similar pitch three times this week, or whether the framing that worked on one account falls flat on another. That judgment doesn't transfer.

I scaled back. The tool now focuses more on the intelligence side — surfacing who needs attention, what's happened recently, what context matters before a call — and less on generating copy to blast out. The draft emails are still there, but as a starting point I edit, not output I send.

Since then

We've kept building on it. Now it pulls together the pipeline of new opportunities, what problems specific accounts are running into, and how long since the last meaningful touch. It's tracking across web, email, slack and other systems.

We're still not hitting eight to 10 touches on every prospect. But we're not giving up at one or two anymore either.

My follow-up process had a gap between the standard in my head and what was actually happening. The only way I found that gap was by auditing it. The only way I closed it was by making a tool that made the gap visible every morning.

The accounts that don't make noise aren't necessarily dead. They're just quiet. That's different.

Notes + Local AI: Simpler Than You Think

Ken Ahrens — Tue, 19 May 2026 19:28:06 +0000

I got caught flat-footed twice by the same customer. I have an office hours call with them almost every week. For a couple of weeks in a row they asked the same question: any update on that issue from last month? I'd written it down both times. I just never turned it into a ticket.

Office hours with active customers is some of the best feedback a startup gets. They're in the product, they know what's broken, and they're telling you directly instead of just quietly churning. When they point out you forgot to track their feedback it's embarrassing. But more importantly, it tells you something's wrong with the system, not just your memory.

I've tried a lot of note-taking approaches. Written by hand, scanned handwritten notes, typed everything up, recorded audio and had it transcribed. Capture was fine. The Follow-up section was where things died.

From 2022 to 2024 I used Notion. Got to 500+ notes. I thought the tagging and grouping would help me actually use what I'd captured. But it was just another black box. Notes went in, nothing came back out. Search worked fine, but only if I went looking, and between back-to-back calls, I wasn't going looking. And if I wanted to feed those notes to an LLM, I had to export everything, unzip a folder, run through a whole process. Enough friction that I just didn't.

So last year I exported everything as markdown and wrote a script to add frontmatter to all of them. Title, date, note type, meeting type (customer, vendor, investor, internal), company, tags.

$ ls -al spd-ai-notes | wc -l
    2257

2,200+ files. Any script, any AI tool, any grep can read them directly. No export. No process.

Customer notes

I use four templates: technical, meeting, internal, and finance. The meeting template has four sections: Attendees, Agenda, Notes, Follow-up. I fill in half of it in the minute before the call starts. Customer feedback goes in Notes. Anything I need to act on goes in Follow-up.

---
title: "{{title}}"
date: {{date:YYYY-MM-DD}}
note_type: meeting
meeting_type:
company:
tags:
  -
---
# {{title}}

## Attendees
- 

## Agenda
- 

## Notes
- 

## Follow-up
-

The Follow-up section is the one that used to get lost. Now an AI agent reads it.

I can point Claude or Qwen or DS4 at the notes folder and say "read my meeting notes from the last week, find follow-up items related to product issues, and create a Linear ticket for each one." First time I ran it, it made 20 tickets. A full week of calls where customers had mentioned things in passing, I'd written them down, and nothing had happened. One pass, done.

It works the other direction too. Before a call, point the AI at your notes for that customer and ask for a quick brief: what did we cover last time, what's open, what changed. Fifteen minutes between back-to-back calls isn't enough to dig through notes manually. One prompt is.

Customer notes also flow into our GTM system so the sales context is there when the next call happens. Follow-ups from meetings feed my weekly TODO. And when I'm working with an AI agent on something and it spots an issue, it can drop a note right into that same folder. I'll pick it up later.

Finance checklist

The finance template works differently. Instead of capturing what happened, it's a checklist I run 2-3 times a month: vendor invoices, customer AR aging, bank balance, payroll, board reporting. Every item links directly to the system I need to check. Open the note, work down the list, check things off. Takes about half the time it used to because I'm not trying to remember what I missed last time.

---
title: "{{title}}"
date: {{date:YYYY-MM-DD}}
note_type: internal
topic: finance
participants:
  - 
tags:
  -
---
# {{title}}

## Participants
- 

## Checklist
- [ ] Vendor invoices (AP)
    - [ ] Contractors
    - [ ] SaaS subscriptions
    - [ ] Other vendors
- [ ] Customer invoices (AR aging report)
    - [ ] Outstanding by customer
    - [ ] Past due follow-ups
- [ ] Payment processor
    - [ ] Open invoices
    - [ ] Failed charges
- [ ] Marketplace revenue
    - [ ] Active agreements
    - [ ] Billed revenue
- [ ] Upcoming expenses
    - [ ] Credit card balance
    - [ ] Other payables
- [ ] Confirm bank balance
    - [ ] Payroll clearing
    - [ ] Operating account
- [ ] Bookkeeping review
    - [ ] Recent transactions
    - [ ] Outstanding requests
- [ ] Board/investor reporting
- [ ] Compliance
    - [ ] State and local filings
    - [ ] Audits

Same idea works for quarterly security reviews, board prep, anything repeatable that currently lives in your head.

The wider ecosystem

Because everything is plain text, you're not locked into any one tool.

For viewing and navigating, Obsidian handles large markdown libraries well: graph view, tag search, template plugins. VSCode works too if you'd rather stay in your dev environment. Both read the same folder with no conversion needed.

For AI, you don't have to use a cloud model. Ollama and Apple MLX let you run models locally against the same folder. Useful if you have notes you'd rather not send to an external API. DS4 is worth looking at specifically. The latest models support up to 200k token context windows, so you can feed in most of your notes folder in a single pass.

And if you need a different output format, just ask the AI to convert. Board summary, customer brief, HTML email. Markdown converts cleanly to any of them. The notes become inputs to other systems, not just records you write and forget.

The value of an app like Notion is the UI. The cost is that your data only works inside that app. Plain text inverts that tradeoff.

The note-taking advice you usually see is about capturing or summarizing a meeting. That's not wrong. But capture is the easy part.

The hard part is what happens to a note after it's written. If the answer is "nothing, unless I manually go find it," there's a hole in the system regardless of how good the notes are. Pointing an AI at the folder is what closed it.

That customer still does office hours with me. When they ask about last month's item, I've got a ticket to show them.

The Ultimate Guide to a Smooth Dev Environment

Ken Ahrens — Thu, 09 Apr 2026 15:22:21 +0000

The Ultimate Guide to a Smooth Dev Environment

Originally published on 2025-12-11 at speedscale.com.

Setting up a development environment can be challenging, especially for new developers or those adapting to new developer tools. One common hurdle is setup time, as configuring all the necessary components can delay the start of actual work. A local development environment offers significant advantages for testing and debugging, allowing developers to work efficiently on their own machines without relying on remote resources. A well-configured environment is crucial for efficient coding, testing, and debugging, enhancing productivity and minimizing errors. When setting up your tools, remember that an integrated development environment (IDE) is a type of software application designed to streamline development by integrating coding, debugging, and automation features. This guide will walk you through everything you need to know, from the basics to advanced customizations for different operating systems. Whether you’re starting out or refining your setup, you’ll find practical tips to optimize your workspace, streamline your workflow, and ensure your environment is secure and efficient.

Understanding the Basics of a Development Environment

A development environment is a carefully configured setup of hardware, software, and tools essential for writing, testing, and debugging code. It provides a controlled space that mimics real-world conditions, allowing software developers to identify and fix issues early, reducing errors and saving time. This isolated environment ensures that code can be safely created, tested, and refined without impacting live systems, making the development process more efficient. Simulating different scenarios and configurations allows for optimizing applications for performance and stability before reaching end-users. Whether developing web, mobile apps, or other software, a well-structured development environment is crucial for experimentation, iteration, and perfecting code, making it an invaluable tool for developers at any level. Within the broader field of software engineering, development environments play a key role, but measuring productivity in software engineering remains challenging due to the complexity of workflows and the limitations of traditional metrics.

Setting Up a Development Environment

Setting up your development environment is crucial for efficient coding, testing, and debugging. Here’s a streamlined guide to the essential components.

Install a Code Editor

Choose a code editor like Visual Studio Code or Sublime Text—or an Integrated Development Environment (IDE) like IntellliJ. Modern IDEs offer advanced features such as intelligent code completion, real-time feedback, and seamless integration of development tools, which significantly improve programmer productivity. Look for features such as syntax highlighting, plugins, support for multiple languages, and an integrated terminal to enhance productivity and streamline your workflow. Many modern IDEs also support languages like Visual Basic, especially for visual programming and drag-and-drop application development. Note that Visual Studio Code can in a way be set up to be a fully-fledged IDE, by way of extensions.

Version Control Systems

Install Git for version control to manage changes in your codebase, collaborate with others, and track different project versions. A source repository is used to store and manage different versions of your code externally, enabling seamless collaboration and version tracking. The basic setup includes configuring your username, email, and SSH keys.

Terminal and Shell Options

Execute commands using terminals like Windows Terminal, iTerm2, or built-in options on macOS and Linux. Customizing themes, fonts, and shortcuts can optimize your workflow.

Package Managers

Package managers like Chocolatey (Windows), APT (Linux), and Homebrew simplify software installation and management. They keep your tools up-to-date and reduce dependency conflicts.

Environment Variables

Set up environment variables like PATH to ensure your system can access tools and runtimes. Proper management helps avoid configuration issues and smoothens the development process.

Installing Language Runtimes and Tools

Install runtimes for your programming languages (e.g., Python, Node.js) using package managers. Use version managers (e.g., pyenv, nvm) to handle multiple language versions across projects.

Configuring Your Environment

Enhance your setup with linters, formatters, and debuggers to improve code quality and efficiency. Customize editor settings to personalize your development experience and maintain consistent code standards.

Testing Your Setup

Run basic tests, like a “Hello World” script, to verify that your environment is correctly configured. These checks ensure that your tools, runtimes, and editors are properly integrated before starting more complex projects. This streamlined setup will help create a productive and efficient development environment tailored to your needs.

Windows-Specific Setup

Setting up a development environment on Windows requires specific configurations to optimize your workflow. Below are the key steps to tailor your environment for Windows, focusing on terminal setup, package managers, and environment variables.

Install Windows Terminal

Windows Terminal is a modern, versatile terminal application that supports multiple shells, including Command Prompt, PowerShell, and Git Bash. Unlike the traditional Command Prompt, Windows Terminal offers a more feature-rich experience with support for multiple tabs throughout, customizable themes, and various shell options, making it a preferred choice for developers.

Command Prompt and PowerShell: Command Prompt is the classic shell for executing commands on Windows, while PowerShell offers more advanced scripting capabilities and greater integration with Windows management tasks.
Git Bash: Git Bash provides a Unix-like shell experience on Windows, which can be particularly useful if you are accustomed to Linux command line tools.

Customizing Windows Terminal

To enhance your productivity, you can customize Windows Terminal settings by accessing the settings file (settings.json). Here, you can change the appearance of the terminal, set custom key bindings, and tweak the startup behavior of windows command prompt and different shells. You can adjust font styles, background images, and color schemes to create an environment that is both visually appealing and tailored to your workflow.

Using Windows Package Managers

Windows package managers like Chocolatey and Scoop simplify the installation and management of software on your machine. These tools help automate software setup, allowing you to install, update, and manage applications via the command line.

Chocolatey: A widely-used package manager for Windows that enables you to install software with a single command. For example, to install Node.js, you would use:
```
choco install nodejs
```
Scoop: Another package manager that emphasizes simplicity and avoids requiring administrative permissions for installations. To install Python using Scoop, you would use:
```
scoop install python
```

These package managers are particularly useful for maintaining consistency in your development environment, as they allow you to quickly set up or replicate environments across different systems.

Windows Environment Variables

Environment variables are crucial for configuring how your operating system and applications behave. On Windows, managing environment variables involves navigating through the system settings, which can be slightly different from other operating systems.

Modifying Environment Variables: To add or edit environment variables, you can access the settings through the following steps:
- Open the Start menu and search for “Environment Variables.”
- Click on “Edit the system environment variables.”
- In the System Properties window, click the “Environment Variables…” button.
Navigating Environment Variables: In the Environment Variables window, you can create new variables, modify existing ones, or delete unnecessary entries. For instance, to add a new path to the PATH variable, select “Path” under “System variables,” click “Edit,” and then add the desired directory path.
Practical Example: Adding Git to your PATH variable ensures you can use Git commands from any terminal window. This configuration is essential for a seamless development experience, enabling all your tools to work together harmoniously.

By carefully configuring your terminal, utilizing package managers, and properly setting environment variables, you can create a streamlined and efficient development environment on Windows. These steps not only enhance your coding workflow but also make managing your tools and software much easier.

Linux-Specific Setup

Setting up a development environment on Linux provides flexibility and control, making it an ideal choice for many developers.

Choosing and Configuring a Shell

Linux offers various shell options, each with unique features that can enhance your command-line experience. The default shell on most Linux distributions is Bash, but other popular alternatives include Zsh and Fish.

Bash (Bourne Again Shell): The most common shell on Linux, Bash is powerful, highly scriptable, and familiar to most developers. It provides robust scripting capabilities and is suitable for general-purpose use.
Zsh (Z Shell): Zsh builds on Bash’s functionality, offering enhanced features like auto-suggestions, improved tab completion, and support for custom themes and plugins through frameworks like Oh My Zsh.
Fish (Friendly Interactive Shell): Known for its user-friendly syntax and intuitive command-line interface, Fish provides advanced features like syntax highlighting and smart suggestions out-of-the-box without requiring extensive configuration.
Customizing Your Shell:
- Oh My Zsh: A popular framework for managing Zsh configurations, Oh My Zsh allows you to easily add themes and plugins, enhancing both aesthetics and functionality. Installation is simple and can be done with a single command:
```
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
```
  Powerlevel10k: A highly customizable Zsh theme that displays useful information like Git status, Python virtual environments, and system load in a visually appealing way. To install Powerlevel10k, follow the instructions provided in the Oh My Zsh themes section.

These customizations can significantly improve your efficiency and make your command-line environment visually engaging and informative.

Using Linux Package Managers

Linux package managers are essential for installing and managing software, offering a simple way to keep your system up-to-date and organized.

APT (Advanced Package Tool): Used primarily on Debian-based distributions like Ubuntu, APT is the go-to package manager for installing software. For example, to install Git, you would run:
```
sudo apt update sudo apt install git
```
Homebrew: Originally developed for macOS, Homebrew is now available on Linux and provides an easy way to install newer or alternative software versions. For instance, to install Node.js, use:

brew install node

Snap: A package manager that provides self-contained applications, Snap is particularly useful for installing the latest software versions across different Linux distributions. To install VS Code, you would use:
```
sudo snap install code --classic
```
YUM (Yellowdog Updater, Modified): Used mainly on Red Hat-based distributions like CentOS and Fedora, YUM allows you to manage RPM packages. For example, to install Python, run:
```
sudo yum install python3
```

These package managers streamline the software installation process, ensuring your development environment is equipped with the necessary tools to develop full and up-to-date software.

Linux Environment Variables

Environment variables on Linux control how your system behaves and how applications access resources. Properly configuring these variables can enhance your development experience and prevent common setup issues. Environment variables can be added or modified directly in shell configuration files such as .bashrc, .zshrc, or .config/fish/[config.fish](http://config.fish/) for Fish. To add a directory to your PATH, you would append a line like this to your configuration file:

export PATH="$PATH:/your/new/path"

After editing the file, apply the changes by running source ~/.bashrc (or the equivalent command for your shell).

Persisting Changes Across Sessions

The best practice for environment variable changes is to make them in the shell’s configuration file, ensuring they are loaded each time a new terminal session is started. For changes that should apply system-wide, you can add them to /etc/environment or similar global configuration files. Always back up configuration files before making significant changes to project files to avoid misconfigurations that could impact your system’s behavior. By carefully selecting and configuring your shell, efficiently managing software with package managers, and properly setting environment variables, you can create a highly functional and personalized development environment on Linux. These steps help you leverage the full power of Linux, making your coding experience smoother and more productive.

Development Process Optimization

Optimizing the development process is essential for boosting developer productivity and delivering high-quality software efficiently. By leveraging modern development tools and integrated development environments (IDEs) like Visual Studio Code, teams can streamline their workflows and minimize time spent on repetitive tasks. Features such as syntax highlighting, code completion, and built-in debugging empower developers to write, test, and refine code with greater accuracy and speed. Incorporating robust version control systems, such as Git, further enhances collaboration by making it easy to track code changes, manage branches, and coordinate work across teams. When the development process is optimized with the right tools and practices, teams can increase developer productivity, improve code quality, and accelerate the delivery of software applications—ultimately leading to better outcomes for both developers and end users.

Building and Debugging Applications

Local development tools are essential for enhancing efficiency and accelerating the coding process. They provide immediate feedback, enabling real-time debugging and testing in an environment that closely mirrors production. This setup allows developers to quickly identify and fix issues, ensuring a smoother development experience.

While pull requests are often used to track coding activity, they may not always accurately reflect meaningful contributions or true productivity, as they can sometimes encourage unnecessary busywork instead of impactful development.

Compilers and Interpreters: These tools, such as the JVM for Java or the Python interpreter, are vital for running code locally, enabling you to test and debug applications directly on your machine.
Debuggers: Tools like Chrome DevTools and GDB are critical for diagnosing and resolving issues. They allow you to step through code, inspect variables, and set breakpoints, making troubleshooting more manageable.
Package Managers: Tools like npm and pip streamline the management of dependencies and environment setup, ensuring your project remains consistent and up-to-date.

By leveraging these local development tools, teams can streamline workflows, reduce errors, and improve developer productivity.

Modern Approaches to Local Development and Debugging

Docker is a popular tool that creates isolated, reproducible environments, simplifying the process of running applications locally. It ensures that your development setup is consistent across different machines and stages, reducing “it works on my machine” issues. Some best practices include:

Automate builds using scripts to save time and minimize errors.
Use breakpoints and logs for effective debugging and quicker identification of problems.
Leverage incremental builds and local testing to catch issues early in the development cycle.

By integrating these tools and practices, you can optimize your development workflow, making the process of building and debugging applications more efficient and reliable.

Development Environment Security

Securing your development environment is crucial to protect your code, data, and infrastructure. Implementing key security principles ensures that your environment is safe from unauthorized access and vulnerabilities.

Key Principles of a Secure Development Environment

Implementing robust security measures is essential to protect your development environment from vulnerabilities. This involves a combination of access control, network security, and data protection strategies that help safeguard your code and infrastructure.

Access Control and Authentication: Use strong, unique passwords and multi-factor authentication (MFA) for all tools and services. Restrict access based on the principle of least privilege and use SSH keys for secure server access instead of passwords.
Network Security: Secure your connections with firewalls and VPNs, and avoid using public Wi-Fi. Keep all software, libraries, and dependencies up-to-date to prevent known vulnerabilities.

Tools and Best Practices for Securing Your Development Environment

Utilizing the right tools and adopting best practices can significantly reduce vulnerabilities in your development environment. Implementing these strategies will help you maintain a secure, efficient, and resilient workspace.

Secure Coding Practices: Use linters with security-focused rules and regularly scan code with tools like SonarQube or GitHub’s Dependabot to identify vulnerabilities early.
Environment Hardening: Utilize containers (e.g., Docker) to isolate environments and reduce security risks. Secure your tools and servers by disabling unnecessary services and configuring permissions properly.
Data Protection and Backup: Encrypt sensitive data and use secure storage solutions for credentials, such as environment variables and secret management tools. Regularly back up critical files to safeguard against data loss.

By incorporating these practices, you can create a secure web development environment that minimizes risks and protects your projects from security threats.

Dev Environment Best Practices

Adopting best practices for your development environment is key to ensuring consistency, reliability, and efficiency across your team. One foundational practice is to standardize your development environment—whether on Linux or Windows—to reduce compatibility issues and make onboarding new developers seamless. Utilizing a package manager like Homebrew or Chocolatey simplifies the installation and management of development tools, ensuring that everyone on the team has access to the same essential components. Maintaining code quality is equally important; implementing consistent coding standards and using tools such as linters and formatters helps catch syntax errors early and keeps your codebase clean and maintainable. By following these best practices, you create an environment that supports productivity, reduces errors, and enables developers to focus on building robust software.

Development Environments and Collaboration

A well-configured development environment is a catalyst for effective collaboration among development teams. By adopting shared or cloud-based development environments, teams can work together in real time, share knowledge, and minimize the risk of configuration drift. Tools like Git and GitHub are indispensable for managing code changes, tracking project progress, and resolving conflicts, making it easier for multiple developers to contribute to the same codebase. Integrating a continuous integration and continuous deployment (CI/CD) pipeline further streamlines collaboration by automating testing, building, and deployment processes. This not only improves developer productivity but also ensures that code changes are thoroughly tested and delivered to end users more reliably. Ultimately, a collaborative development environment empowers teams to innovate faster and deliver higher-quality software.

Enhancing Local Development Environments with Speedscale

Speedscale helps create powerful local development environments by leveraging Kubernetes preview environments that closely mirror production settings. Using tools like Minikube and Skaffold, Speedscale enables developers to deploy applications in isolated environments where real-world traffic conditions can be replicated. This approach allows developers to test code changes and validate application behavior in a controlled setting, identifying issues early and reducing inconsistencies between local and production environments. A key advantage of using Speedscale is its traffic replay feature, which allows recorded production traffic to be replayed within the development environment. This enables thorough testing of application performance and behavior against realistic data, providing immediate feedback and enhancing debugging capabilities. By automating the simulation of service interactions and test scenarios, Speedscale helps streamline the development process, making it easier to catch issues early and ensure reliable performance.

Developer Experience and Satisfaction

Fostering a positive developer experience is crucial for attracting and retaining top talent, as well as driving overall productivity. Providing access to a diverse set of development tools—such as Visual Studio Code, IntelliJ, and GitHub—enables developers to choose the best solutions for their workflow and programming languages. Supporting professional growth through training, mentorship, and opportunities to learn new technologies helps developers stay engaged and motivated. Additionally, cultivating a culture of open communication, regular feedback, and recognition creates an inclusive and supportive development environment where developers feel valued. By prioritizing developer experience and satisfaction, organizations can create an environment that not only boosts productivity but also leads to better software outcomes and long-term business success.

Conclusion

Setting up a development environment is a foundational step in the software development process that directly impacts your productivity and the quality of your work. By implementing the tips and best practices outlined in this guide, you can create a smooth and efficient environment tailored to your needs, making coding, testing, and debugging more manageable. Customize your setup to match your workflow and preferences using tools like an Integrated Development Environment (IDE) to streamline tasks and boost productivity. A well-configured development environment supports your immediate project needs and enhances your overall coding experience, leading to better outcomes and a more enjoyable development journey.

Originally published at speedscale.com

Top 5 WireMock Alternatives Best Practices

Ken Ahrens — Tue, 07 Apr 2026 16:00:00 +0000

Top 5 WireMock Alternatives Best Practices

Originally published on 2025-12-22 at speedscale.com.

WireMock is a popular open source tool for simulating APIs in testing environments through the wiremock server in the wiremock cloud. It allows developers to stub HTTP responses, match requests by URL, headers, and body content, record and play back API interactions, and add configurable delays and errors. WireMock is known for its broad adoption and active community, which contribute to its reliability and ongoing updates. In addition to its core capabilities, WireMock offers advanced features for HTTP mocking, such as TLS interception, request verification, and dynamic response conditions. Initially created for Java, WireMock now supports multiple programming languages and technology stacks, making it a favorite among developers for its flexibility and ease of use.

However, sometimes WireMock isn’t the right tool for the job, such as when you’re dealing with large-scale testing frameworks or facing integration challenges. Also, if you prefer enterprise support beyond an open source model, there are other options. Some alternatives provide a comprehensive platform for API design, testing, and governance, streamlining the entire API development lifecycle. This article compares the following five top tools for API simulation and testing—Postman, LocalStack, MockServer, Speedscale, and Microcks—based on scalability, developer and user experience, customization options, integration capabilities (especially with Kubernetes), licensing, and traffic replay functionality. API simulation is a critical capability provided by these tools, enabling high-fidelity testing and development workflows.

Introduction to API Mocking

API mocking is a foundational technique in modern software development that enables teams to simulate the behavior of APIs without relying on the actual backend services. By using API mocking tools, developers can create mock APIs that replicate the expected responses, error codes, and data structures of real APIs. This approach allows application code to be developed and tested in isolation, reducing dependencies on external systems and minimizing delays caused by incomplete or unavailable APIs. With API mocking, teams can confidently test their applications, validate integrations, and ensure that their software behaves as expected, even before the real API is fully implemented or deployed.

Benefits of API Mocking Tools

API mocking tools bring a host of advantages to development teams aiming for speed, quality, and collaboration. By decoupling application code from real API dependencies, these tools allow developers to move forward without waiting for backend services to be ready. This accelerates the development process and enables teams to test a wide range of scenarios, including edge cases and error conditions, that might be difficult or costly to reproduce with live APIs. API mocking tools also foster team collaboration by allowing developers, testers, and frontend engineers to work in parallel, each using mock APIs to simulate the parts of the system they depend on. Ultimately, this approach streamlines the testing process, reduces infrastructure costs, and ensures that applications are robust and reliable across various scenarios.

Choosing the Right Mocking Tool

Selecting the best API mocking tool for your project involves evaluating several key factors. Consider the specific requirements of your application, such as the need to support REST, GraphQL, or gRPC protocols, and whether the tool integrates smoothly with your existing CI/CD pipelines. Look for a mocking tool that offers dynamic responses and precise control over mock behavior, enabling you to simulate complex scenarios, including error handling and latency. Ease of use is also important—some tools are tailored for specific languages or frameworks, while others are more flexible and technology-agnostic. Ultimately, the right tool should empower your team to efficiently manage mock servers, automate tests, and maintain high-quality mock definitions throughout the development lifecycle.

Postman

Postman is a widely-used tool for API testing among developers. The GUI makes it easy to create requests and organize them into collections. Postman provides built-in test snippets and test automation that allows you to quickly create and run tests to validate API functionality, thereby saving time and effort compared to manual software testing. Postman offers built-in support for importing API specifications such as OpenAPI and AsyncAPI, as well as live editing for seamless integration and automation. It allows users to write tests directly within the platform, streamlining the API development and testing workflow. Postman also leverages environment variables to customize mock server responses for different testing scenarios or deployment environments, improving collaboration and consistency. Additionally, Postman's client API enables dynamic configuration and management of mock servers, enhancing flexibility for various programming languages and workflows.

Scalability

Postman manages a wide range of API testing scenarios, from basic unit tests using service virtualization to complex integration testing. It organizes requests into collections that you can execute using the Collection Runner or Newman. Features like Collection Runner are beneficial for large projects with complex workflows that require software testing multiple APIs in a specific sequence. Regardless of the size of the project, Newman is valuable for integrating API tests into your continuous integration and continuous delivery (CI/CD) pipeline.

However, its scalability in load testing is limited by the host machine’s resources - making large scale loads a bit more taxing compared to WireMock, which focuses on mocking HTTP requests without actual network calls.

Developer/User Experience

Postman’s intuitive GUI appeals to beginners, with features like autocomplete and prebuilt templates that simplify the API development and testing process. Experienced developers will appreciate the advanced capabilities for functional testing, collaboration, as well as built-in automation like CI/CD integration, visualizing response data, conditional workflows, and pre-request/post-response scripts. In contrast, WireMock has a steeper learning curve due to its configuration-based approach and reliance on JSON or XML files. Automation has to be manually scripted.

Tutorial: Postman Load Test Tutorial

API Mocking Customization

The tool allows detailed customization of API requests and responses. You can modify headers, set query parameters, and define body data using various formats like raw text, JSON, XML, or form data directly within the user interface. Postman enables users to create and manage mock configurations for different testing scenarios, making it easy to import, export, and share setups across teams. The platform also supports generating random data for use in mock responses, which helps simulate unpredictable API behavior during testing. The GUI also supports pre-request scripts and tests in JavaScript, which enable dynamic data generation and response validation. WireMock provides similar customization through stubbing, but you’d need to manually edit configuration files, which is less straightforward than Postman’s GUI.

Integration

Postman integrates with a variety of CI/CD tools, including Jenkins, GitHub Actions, GitLab CI, and CircleCI. This enables you to automate API testing as part of your continuous integration and delivery pipelines. You can also integrate version control (GitHub, GitLab, Bitbucket, and Azure DevOps), API monitoring (Datadog and New Relic), API design (Apicurio Studio), API automation (Workato), API testing (Speedscale), and a number of other tools for API development. Additionally, the CLI enables your teams to execute test collections and view detailed reports within your CI/CD platforms.

Setup and Running

Postman can be run as a standalone application on Windows, macOS, and Linux. It also has a web service interface for managing API collections and tests. For test automation, it provides the Newman CLI, which can be integrated into CI/CD pipelines.

Licensing

Postman uses a tiered licensing model, including a free tier with limited features and paid plans with advanced capabilities. WireMock is open source, so it doesn’t have licensing costs, which may appeal to budget-conscious teams and developers who prefer open source solutions.

Traffic Replay

With Postman, you can record API interactions and then use the test data from those recordings to build test cases and simulate realistic test scenarios. Postman can also record real traffic, enabling the creation of high-fidelity test cases that closely mirror actual production behavior. These capabilities are useful for identifying performance bottlenecks, thereby ensuring your APIs can handle real-world traffic patterns and maintain the reliability of your API infrastructure.

Features like a built-in proxy for capturing HTTP and HTTPS traffic, an interceptor for browser traffic, and support for importing HAR files to generate collections provide a user-friendly way to capture and replay HTTP requests. Postman offers flexible proxy configuration options for capturing and replaying HTTP traffic, making it easier to intercept and test API calls without complex setup. WireMock supports generic request matching and response stubbing but lacks Postman’s visual interface and analysis tools.

How to load test using Postman

LocalStack

LocalStack is an open source tool that emulates various AWS services, using chaos engineering to help test your website. It allows developers to run and test their applications locally without connecting to the actual AWS cloud environment. LocalStack is particularly useful for mocking and testing AWS-specific HTTP services, enabling teams to simulate real-world scenarios. It also simplifies managing mock servers for AWS service emulation, making it easier to configure, start, and stop mock environments as needed. It has extensive support for AWS-specific services and eliminates the complexity and financial risks associated with using real AWS services during development and functional testing. This means developers get to test their applications in a controlled developer environment without incurring costs or dealing with the potential issues of using live AWS resources.

Scalability

LocalStack can handle multiple concurrent requests and scale to support various application needs. However, its scalability is limited by the local machine’s resources. As mentioned earlier, WireMock is a lightweight HTTP mocking tool so it’s less resource-intensive without the same level of AWS sophistication as LocalStack.

Developer/User Experience

LocalStack provides a local AWS-like environment, which is great for developers familiar with AWS. For newcomers, however, it presents a steeper learning curve to build services compared to WireMock. WireMock’s simpler setup process and syntax make it easier for developers to start mocking HTTP requests and responses quickly.

Customization

LocalStack offers extensive customization options for emulating AWS services, such as setting custom endpoints and defining resource policies. WireMock focuses on HTTP request matching and response stubbing, providing detailed control over individual API interactions. While both offer customization, LocalStack is geared towards AWS-specific services, whereas WireMock is more general.

Integration for Integration Tests

LocalStack integrates well with AWS services, including services like S3, DynamoDB, and Lambda, making it ideal for applications heavily relying on AWS. LocalStack supports infrastructure as code (IaC) tools like Terraform and AWS CloudFormation, allowing your teams to test their cloud infrastructure configurations locally before deploying to production, rather than only testing the web infrastructure in the WireMock cloud. It also works with popular CI/CD platforms such as CircleCI, GitHub Actions, GitLab CI, and Jenkins. As WireMock is a general HTTP mocking tool, it can integrate with any system that communicates over HTTP, which makes its integration options a bit more versatile.

Setup and Running

LocalStack can be deployed using Docker, which enables you to run it on any system that supports Docker containers. It can also be integrated into CI/CD pipelines using native plugins for CircleCI and a generic driver for other CI platforms.

Licensing

LocalStack offers a free community edition and paid enterprise tiers. The community edition is suitable for individual developers or small teams, while the paid tiers offer additional features and support. As mentioned, WireMock is open source and free under the Apache License 2.0.

Traffic Replay

LocalStack doesn’t have built-in traffic replay functionality and focuses on emulating AWS services. Additional tools or custom implementations are needed for traffic replay. WireMock can record and replay HTTP traffic using its HTTP request matching and response stubbing capabilities.

Blog: Speedscale vs. LocalStack for Realistic Mocks

MockServer (Mock Servers)

MockServer has rich request-matching features that allow you to take precise control over mock behavior. It supports matching based on URL, method, headers, cookies, query parameters, and even request body patterns. MockServer allows developers to stub HTTP endpoints and simulate API responses, making it possible to simulate and test specific HTTP endpoints. It also supports request verification to ensure accurate simulation and debugging. MockServer is commonly used to test applications that depend on external APIs and plays a key role in facilitating integration tests by simulating external dependencies. It can also act both as mock servers and proxy servers, which enhances its utility in creating realistic testing environments. Developers can integrate MockServer into their existing infrastructure and CI/CD pipelines by running it as a standalone process, deployed as a WAR (Web Application Resource) file in a servlet container or as a Docker container.

Scalability

MockServer can manage massive amounts of concurrent requests, so it’s suitable for performance testing large APIs at scale. To manage a large number of concurrent connections efficiently, MockServer primarily uses Netty, an asynchronous event-driven network application testing framework, to maximize the scalability of HTTP and HTTPS communication. Netty uses a non-blocking I/O model and a thread pool to handle I/O operations and events. As a result, this allows MockServer to serve many clients with fewer threads compared to traditional blocking I/O models.

Developer/User Experience

MockServer offers multiple deployment options, including Maven, Docker, and Java API, providing flexibility based on the user’s environment. It has a feature-rich UI that enables you to view internal states such as logs, active expectations, received requests, and proxied requests. As a result, this makes it easier to manage and debug API interactions and monitor the behavior of a mock server instance. While it has extensive documentation, new users might find the initial setup more complex than WireMock, which has a simpler setup process. However, running it as a standalone process makes it easier to integrate into existing infrastructure making it useful for expansions.

Customization

For customization in API testing tools, the first thing you want to look at is how the tool handles request matching and response generation. MockServer has detailed request-matching features, including matching by URL, method, headers, cookies, query parameters, and body content using JSON schema, regular expressions, and exact matches. MockServer can use request data such as query parameters and headers to generate dynamic responses, allowing for advanced templating and request verification. It also supports dynamic response generation using JavaScript, which enables the creation of response bodies based on the content of incoming requests. Additionally, MockServer supports fault simulation by introducing delays and errors, making it possible to test network robustness and application resilience under adverse conditions. WireMock also provides good customization features, but MockServer’s level of detail offers more granular control.

Integration

MockServer provides a REST API and a Java library for creating, updating, and deleting expectations programmatically, making it seamless to integrate with CI/CD scripts. As a powerful Java-based library for mocking web services, MockServer is well-suited for JVM-based testing environments. You can integrate MockServer directly into your test code for flexible API mocking during integration testing. It can be integrated with CI/CD tools such as Jenkins, CircleCI, and Travis CI. You can also use it in tandem with API testing tools like Postman and SoapUI. These tools send requests to MockServer and validate the mock responses against the defined expectations. MockServer generates detailed logs of all the incoming requests and responses it handles. You also have the option to integrate these logs with centralized logging and monitoring solutions like the ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk.

Setup and Running

MockServer can be deployed as a Maven plugin, a Docker container, or programmatically via a Java API. It also supports deployment within Kubernetes clusters using Helm charts. Additionally, MockServer can run as a standalone server, allowing independent API simulation without embedding it into your application code.

Licensing

MockServer is open source software released under the Apache License 2.0, which allows for free use, modification, and distribution. This is similar to WireMock.

Traffic Replay

MockServer can act as a proxy to record and replay HTTP traffic, providing realistic test cases and thus more realistic test data. It captures detailed data about the request and response bodies and converts the data into expectations for replay. In addition, MockServer is capable of capturing and reproducing real network behavior, including response timing and data flows, to enable high-fidelity testing environments. WireMock offers similar functionality but with a different approach to recording and replaying interactions.

How to mock APIs in Kubernetes

Speedscale

Speedscale is a service that runs live API tests and mocks for your infrastructure based on your production data. It’s a good option for teams looking for an out-of-the-box solution with minimal configuration requirements. Speedscale serves as a comprehensive platform for API simulation, testing, and governance, enabling high-fidelity simulation of real API behaviors and seamless integration with existing development workflows. Speedscale offers deep integration with Kubernetes and can provide realistic load testing using actual production traffic. It’s also a strong solution for teams looking to optimize their performance testing process and workflows in containerized environments.

Scalability

Speedscale is designed to scale efficiently within Kubernetes clusters. Speedscale’s Kubernetes operators can capture and replay real production traffic without the need for separate load testing infrastructure. Furthermore, Speedscale’s architecture takes advantage of the scalability and resilience of Kubernetes. When running load tests directly within a cluster, Speedscale eliminates the need for additional infrastructure, reducing infrastructure costs and ensuring that the tests reflect the application’s performance in its actual runtime environment. In contrast, WireMock, while capable of handling a wide range of testing scenarios, will require additional configuration and resources to achieve optimal performance under heavy loads. Scaling WireMock involves running multiple instances and load balancing between them on the WireMock server instance, which is much more complex to set up and manage.

Developer/User Experience

Speedscale prioritizes the developer experience by providing a no-scripting-required approach to load testing and API mocking. While WireMock relies on manual configuration and scripting, Speedscale automates much of the process, allowing developers to focus on writing code rather than creating test scripts. This is achieved through Speedscale’s ability to capture and replay real production traffic, which eliminates the need for time-consuming mock creation. Furthermore, Speedscale’s visual interface and rapid feedback loop enable developers to quickly assess the performance of their applications and identify potential issues, lowering the learning curve and making the tool easier to use.

Test Data Customization

Speedscale’s approach to customization focuses on automating the generation of realistic mocks and load tests based on actual production traffic. Speedscale allows you to customize traffic patterns, introduce chaos engineering principles through chaos testing scenarios, and simulate varying network conditions. The traffic replay feature generates mocks based on captured production traffic. You can further customize the mocks using transforms to modify captured traffic data (for example, editing specific fields, parameterizing values, or injecting custom logic before it is replayed). The chaos testing capabilities enable you to introduce variable latency, errors, and unresponsive dependencies during traffic replay. In contrast, WireMock allows you to manually edit configuration files and offers customization through stubbing. This is good enough for individual API endpoints, but if your project prioritizes realistic testing scenarios with minimal manual setup, Speedscale is a better option.

Integration

Speedscale integrates with CI/CD platforms such as Jenkins, GitHub Actions, and GitLab CI. This allows for load test automation and traffic replay as part of your continuous integration and delivery processes. Speedscale also supports integration with monitoring and observability tools like New Relic, which enables you to track performance metrics and identify bottlenecks during tests. You can also import your traffic replay reports into application performance management (APM) platforms like Datadog.

Speedscale has deep integration with Kubernetes, meaning that it’s designed to work seamlessly within the Kubernetes ecosystem. It uses Kubernetes operators to manage test orchestration and teardown, so it’s seamless to run distributed load tests directly within Kubernetes clusters. This Kubernetes-native approach allows the tool to simulate real-world traffic patterns without the need for additional infrastructure, ensuring cost-effective load testing. For example, Speedscale can capture traffic from a production environment and replay it in a staging environment to test how new code changes handle real-world usage.

Setup and Running

Speedscale is designed to run natively within Kubernetes clusters, utilizing Kubernetes operators for data collection and traffic replay. It can also be run in Docker for local testing and development.

Licensing

While WireMock is open source and freely available, Speedscale provides both a free trial and paid enterprise tiers. You can experience the full range of Speedscale’s features with the free trial, and the paid tiers offer additional benefits, such as increased data limits, single sign-on support, and dedicated customer support.

Traffic Replay

So, Speedscale helps you test your applications using real-world traffic patterns. But how does Speedscale implement traffic replay? First, it captures the traffic using a sidecar proxy to intercept and record all incoming and outgoing requests to your application. Once the traffic is captured, Speedscale allows you to analyze and filter the data. You can specify the exact set of calls you want to replicate or specific time periods. After capturing and analyzing the traffic, you can replay it in your preferred environment. Speedscale supports two main methods for traffic replay: through its web service UI or using the command line tool (CLI).

While WireMock can simulate API responses, it cannot replay actual production traffic, making it less effective at creating representative test environments.

Microcks

Microcks is an open source Kubernetes-native tool for API mocking and testing that provides an enterprise-grade solution to speed up, secure, and scale your API strategy. Its support for a broad range of API specifications, including OpenAPI, AsyncAPI, GraphQL schemas, and gRPC/Protobuf schemas, makes it a versatile tool for modern API development and testing. Microcks excels at mocking HTTP services, enabling teams to simulate real-world API interactions and streamline the development and testing process.

Scalability

Microcks’s architecture supports high availability and load handling by deploying multiple instances. Its Kubernetes-native approach enables seamless scaling within clusters. In comparison, WireMock may require additional resources and configuration for heavy loads in the WireMock cloud.

Developer/User Experience

Microcks has a user-friendly web service interface that simplifies managing API mocks and tests. The UI includes features like a “Copy as curl command” button for mock testing and an “Add to your CI/CD” button that generates code snippets for integration into CI/CD pipelines. Microcks also provides detailed summaries of executed unit tests, including metrics like Conformance index and Conformance score, which help assess how well an API implementation adheres to its contract. The summaries also include detailed request and response pairs, allowing you to see the exact payloads and headers exchanged during tests.

Unlike WireMock, which relies on manual configuration, Microcks simplifies the process with its intuitive UI and example-driven approach.

Customization

Microcks has a wide range of customization options for API mocking and testing. It supports multiple API specifications, including OpenAPI, AsyncAPI, GraphQL, gRPC/Protobuf, Postman collections, and SoapUI projects, meaning you can generate mocks from these definitions. It allows you to use templating to create dynamic mock responses and define custom dispatching rules to match requests based on various criteria like URL, method, headers, and body content. It also supports schema validation to ensure that requests and responses conform to their respective API contracts.

WireMock also offers extensive customization, but Microcks’s broad mock API specification support and compatibility with API design tools adds an extra layer of versatility.

Integration

Microcks’s deep integration with Kubernetes makes it suitable for cloud-native API development. It uses Kubernetes-native features and resources to provide an effective testing and mocking experience. Other integration options include popular CI/CD platforms like Jenkins, GitHub Actions, and Tekton through the CLI. You can also integrate private or third-party Java applications and libraries to customize the behavior of Microcks during mock invocation. Microcks integrates with Apicurio Studio, an API design tool that allows you to mock your API definitions with just a single click.

While WireMock can be used in Kubernetes, Microcks’s native design and extensive integration options make it a more fitting choice for such environments.

Setup and Running

Microcks can be deployed on Kubernetes using Helm charts or operators, making it easier to integrate into cloud-native environments. It also supports deployment as a standalone instance using Docker.

Licensing

Microcks is an open source, community-driven tool and is part of the Cloud Native Computing Foundation (CNCF) landscape.

Traffic Replay

Microcks allows you to record HTTP traffic and convert it into mocks, thereby creating realistic test scenarios. When creating and using mock data for testing, it is crucial to manage sensitive data carefully to prevent exposure and ensure security and compliance. These mocks are based on recorded requests and responses. They can be customized using dispatching rules and response templating. When a request matching the recorded traffic is received, Microcks responds with the corresponding predefined response. This process involves capturing detailed request and response data, such as headers, body content, and query parameters, and storing them as mock definitions. It also captures and replays traffic across various mock API specifications and protocols, providing a comprehensive traffic replay solution. While WireMock also supports traffic replay, Microcks extends this functionality across a wider range of mock API specifications and protocols, such as OpenAPI, AsyncAPI, GraphQL, and gRPC/Protobuf.

Best Practices for API Mocking

To maximize the effectiveness of API mocking, it’s important to follow a set of best practices. Start by ensuring that your mock APIs accurately reflect the real API’s behavior, including response formats, error codes, and performance characteristics. This realism helps developers and testers identify issues early and avoid surprises during integration. Make your mock APIs easily configurable to support a variety of testing scenarios, such as simulating different data sets or network conditions. Use version control to manage your mock definitions, so changes are tracked and can be rolled back if needed. Finally, integrate your API mocking tools into your CI/CD workflows, allowing developers to automate tests and maintain consistency across environments. By following these practices, teams can streamline their testing processes and deliver more reliable software.

Rapid Prototyping with API Mocking

API mocking is a powerful enabler for rapid prototyping, allowing teams to quickly simulate API interactions and validate application functionality even before backend services are complete. With API mocking tools, developers can build and test user interfaces and business logic in parallel with API development, significantly shortening the development cycle. This approach provides immediate feedback, as stakeholders can interact with a working prototype that mimics real API-driven features. By supporting iterative testing and refinement, API mocking accelerates time to market and helps ensure that the final product meets user expectations. For teams aiming to innovate quickly and deliver high-quality applications, API mocking is an essential part of the modern development toolkit.

Conclusion

In this article, you explored five alternatives to WireMock for API mocking and testing: Postman, LocalStack, MockServer, Speedscale, and Microcks. Each tool has its own strengths and weaknesses and caters to different testing needs and environments. Postman is known for its user-friendly interface and features for API development and testing, but it struggles with scalability in load testing due to the limitations of the host machine’s resources when simulating virtual users locally. LocalStack is great for emulating AWS services locally, offering a cost-effective and secure way to test AWS-dependent applications, but it’s limited to AWS services or tools. MockServer shines with its detailed request-matching features and flexible deployment options, making it ideal for complex testing scenarios and integration with Kubernetes. Finally, Microcks is great if you want a more simplified approach that’s also community-driven and open source.

While all these tools have their strengths, Speedscale stands out as the best alternative for Kubernetes load testing. Its deep integration with Kubernetes, ability to run distributed load tests directly in clusters, support for chaos testing, and seamless CI/CD integration make it the go-to choice for developers and teams looking to optimize their testing workflows.

Experience the benefits of Speedscale firsthand by exploring the Speedscale Sandbox, which comes preloaded with traffic to help you get started quickly. To see how Speedscale’s traffic replication and automated mocking can streamline your testing workflows, start your free thirty-day trial now or schedule a personalized demo.

Originally published at speedscale.com

How to Tame Your AI Agents: From $900 in 18 Days to Coding Smarter

Ken Ahrens — Tue, 12 Aug 2025 23:23:53 +0000

It started with a curiosity and ended with a $900 bill. Eighteen days. Three AI coding agents: Claude Code, Gemini CLI, Cursor and Codex. What could possibly go wrong? Turns out, everything—until I learned how to tame them.

When I first fired up Cursor back in March, it was like having a hyperactive coding partner who never needed coffee breaks. I used it to freshen up product docs and tweak a few demo apps. Then Claude Code hit the scene in June and I dove headfirst into something more ambitious: vibecoding a complete CRM demo app (react frontend, go backend, postgres database). That worked so well, I figured—why not push it further?

Gemini CLI arrived just in time for me to test it on an even bigger challenge: building a banking microservice application with full OpenTelemetry tracing. Since we use Google Workspace, working with Gemini AI Agent seemed like a no-brainer. But where Claude kept pace and Cursor quickly showed off code changes, Gemini sometimes got lost in its own loops—one particularly wild day ended with it racking up $300 in charges all by itself.

By the end of July, I’d also migrated our marketing site from WordPress to an Astro content site, and GPT-5 Codex had entered the chat. I had four AI development tools at my fingertips and an itch to see how far I could take them. In less than three weeks, I burned through $900 for API costs and monthly subscription fees (about $50 per day of #vibecoding).

The Costly Lessons

Don't Let the AI Drive

The biggest mistake I made early on was treating AI agents like senior developers who could just "figure it out." I'd give them vague instructions like "build a microservices app" and watch them spiral into increasingly complex solutions that solved problems I didn't have.

AI agents work best when managed like talented junior engineers: give them clear requirements, specific constraints, and well-defined deliverables. Create a PLAN.md that breaks down exactly what you want, in what order, with clear boundaries. Then supervise each step before letting them move to the next one. This is a great primer from Rich Stone on how to Code with LLMS and a PLAN.

Think of it as technical leadership, not delegation. You're the architect; they're the implementers. If you learn something new about your architecture while building a task from the list, then tell the AI Agent to make a note about it in ARCHITECTURE.md so it will keep the standards. It really wants to not follow the standards so you may need to remind it frequently.

The Docker Identity Crisis

Another one of my painful headaches came from letting an AI mix Docker Compose (for local) and Kubernetes (for production) configs without clear boundaries. One minute it’s spinning up a clean docker-compose.yml for local dev, the next it’s sprinkling Kubernetes Deployment YAML into the mix—resulting in setups that ran nowhere. And when I asked it to test something, it would run part in docker and part in K8S and get itself easily confused.

The fix? Separate everything. I now keep local and production infra in completely different directories and make it painfully clear to the AI which world we’re in before it writes a single line.

├── kubernetes
│   ├── base
│   │   ├── configmaps
│   │   │   ├── app-config.yaml
│   │   │   └── app-secrets.yaml
│   │   ├── database
│   │   │   ├── postgres-configmap.yaml
│   │   │   ├── postgres-deployment.yaml
│   │   │   ├── postgres-pvc.yaml
│   │   │   └── postgres-service.yaml
│   │   ├── deployments
│   │   │   ├── accounts-service-deployment.yaml
│   │   │   ├── api-gateway-deployment.yaml
│   │   │   ├── frontend-deployment.yaml
│   │   │   ├── transactions-service-deployment.yaml
│   │   │   └── user-service-deployment.yaml
│   │   ├── ingress
│   │   │   ├── frontend-ingress-alternative.yaml
│   │   │   └── frontend-ingress.yaml
│   │   ├── kustomization.yaml
│   │   ├── namespace
│   │   │   └── namespace.yaml
│   │   └── services
│   │       ├── accounts-service-service.yaml
│   │       ├── api-gateway-service.yaml
│   │       ├── frontend-service-nodeport.yaml
│   │       ├── frontend-service.yaml
│   │       ├── transactions-service-service.yaml
│   │       └── user-service-service.yaml

OpenTelemetry Overload

Then came observability. I trusted the AI to set up tracing across Node.js and Spring Boot services. Big mistake. It pulled in deprecated Node OTel APIs, tried to auto- and manually instrument Spring Boot at the same time (hello, duplicate spans), and wrote Jaeger configs that didn’t match my collector.

Now I predefine exactly which observability stack I’m using—library names, versions, and all—and paste that into every session so the AI can’t go rogue. If you're not sure, ask the AI to audit what it installed and double check if those are the right versions or the right configs. It realized that it had the wrong configs for Jaeger and recommended installing the OTEL Collector which cleaned up the config quite a bit.

The 1.8GB Node.js Docker Image

This one was a shocker. Here's what the AI generated for our Next.js frontend—a classic case of "it works" without any thought about efficiency:

# What the AI built (simplified version)
FROM node:20
WORKDIR /app
COPY package*.json ./
RUN npm install  # Installs ALL dependencies, including dev ones
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]

This innocent-looking Dockerfile created a 1.8GB monster. The base Node 20 image alone is 1.1GB, then it installed all dev dependencies (including things like TypeScript, ESLint, and testing frameworks that shouldn't be in production), copied the entire source tree, and kept everything.

I only realized how bad it was when a user casually mentioned, "Your images take forever to start." Sure enough, the startup lag was brutal. The AI had made no attempt to slim things down because I hadn't told it to.

The fix required explicit instructions about multi-stage builds and production optimization—resulting in a 97% size reduction from 1.8GB to ~50MB. If you don't explicitly demand lean builds, it won't even try.

The Wins

1. PLAN.md as a North Star – Writing a detailed PLAN.md with every service, API, and today's focus point keeps the AI grounded. Hallucinations dropped by about 80% once I started using this. It's the one file that gives the AI its "map" before it starts building. Also checking things off your plan makes you feel that incremental progress like something is actually getting done around here.

2. Multi-Agent Workflow – Sometimes one agent just isn't enough. Rather than relying on a single AI that might have blind spots, I started configuring Claude to "call out" to specialized sub-agents for second opinions—like having a Gemini agent act as fact-checker or a critical thinking agent provide analytical feedback. Each sub-agent gets a clean context window and specialized tooling for their specific role. This approach delivered measurably better results: studies show up to 90% improvement over standalone agents on complex tasks. You're essentially building a specialized team where each AI has a focused expertise rather than asking "a chef to fix a car engine." My friend Shaun wrote more about this approach in Is Your Agent Lying?

3. The "Prove It" Step – This is where I make the AI prove it tested its own work. Good is having it run a quick self-check and explain what it tested. Better is TDD—writing the tests first, then building to make them pass. Best is when those tests run automatically in CI with hooks that block anything failing from merging. This one change has caught more silly errors than I'd like to admit.

4. Real Traffic Testing with ProxyMock – Unit tests are great, but they don't catch integration failures or API contract changes. I started using proxymock to record real production traffic patterns, then replay them against new versions of services. This caught several breaking changes that would have slipped through traditional testing—like when the AI "optimized" a JSON response structure without realizing downstream services depended on the original format. Recording actual traffic patterns and replaying them against every code change became the ultimate safety net for AI-generated modifications.

LATENCY / THROUGHPUT
+--------------------+--------+-------+-------+-------+-------+-------+-------+-------+------------+
|      ENDPOINT      | METHOD |  AVG  |  P50  |  P90  |  P95  |  P99  | COUNT |  PCT  | PER-SECOND |
+--------------------+--------+-------+-------+-------+-------+-------+-------+-------+------------+
| /                  | GET    |  1.00 |  1.00 |  1.00 |  1.00 |  1.00 |     1 | 20.0% |      18.56 |
| /api/numbers       | GET    |  4.00 |  4.00 |  4.00 |  4.00 |  4.00 |     1 | 20.0% |      18.56 |
| /api/rocket        | GET    |  4.00 |  4.00 |  4.00 |  4.00 |  4.00 |     1 | 20.0% |      18.56 |
| /api/rockets       | GET    |  4.00 |  5.00 |  5.00 |  5.00 |  5.00 |     1 | 20.0% |      18.56 |
| /api/latest-launch | GET    | 34.00 | 34.99 | 34.99 | 34.99 | 34.99 |     1 | 20.0% |      18.56 |
+--------------------+--------+-------+-------+-------+-------+-------+-------+-------+------------+

1 PASSED CHECKS
 - check "requests.response-pct != 100.00" was not violated - observed requests.response-pct was 100.00

Was It Worth It?

As a startup co-founder, my world isn’t measured in billable hours—it’s measured in how quickly we can get something in people’s hands, learn from it, and ship the next iteration. The banking demo wasn’t just an experiment; it was a race against the clock to have something ready for KubeCon India.

We made it. The team presented the project on stage, showing off our “Containerized Time Travel” with traffic replay. It was the perfect proof point that speed and iteration matter more than perfection in the early days.

You can watch their talk here: Containerized Time Travel with Traffic Replay – KubeCon India.

AI Agent Troubleshooting Checklist

When your AI agent starts spinning its wheels or burning through tokens, stop and check:

Context overload: Is the conversation too long? Start fresh with a clear, focused prompt
Vague requirements: Did you give it a specific goal or just say "make it better"?
Missing constraints: Have you defined boundaries (tech stack, file structure, performance requirements)?
No success criteria: How will the AI know when it's done?
Tool confusion: Is it trying to use the wrong approach for the task (e.g., complex Kubernetes for a simple local dev setup)?
Infinite loops: Is it repeatedly "fixing" the same issue? Stop and reframe the problem
Scope creep: Has it started solving problems you didn't ask it to solve?

When in doubt, restart with a PLAN.md that breaks down exactly what you want, then hand it one piece at a time.

How I'll Avoid Another $900 Sprint

Choose a main model and go for their version of an "unlimited" plan. As of August 2025 for example you can get Claude Max for $200 with high limits and no per-API costs.
The web interfaces are good for building out a plan, have it research and draft the initial plan, which you then hand over to the AI Agent.
Check the dependencies of your project. The AI tools readily add new libraries, keep it in line with ARCHITECTURE.md. An easy way to tell is when you check in code see if your pom.xml or package.json or go.mod has new entries.
Enforce small diffs. Have it make a branch and separate check-in for each change. Then run "/clean" in between steps on your PLAN.md

Ready to Tame Your AI Agents?

The journey from chaos to control with AI coding agents isn't about avoiding them—it's about learning to tame them. With the right approach, these tools can accelerate your development without draining your bank account.

I'd love to hear your story. What's the most expensive lesson you've learned with AI coding agents? Share it—we might just build the ultimate survival guide together.

Record API calls in prod, replay in dev to test

Ken Ahrens — Sun, 28 Jul 2024 20:07:26 +0000

Introduction

Have you ever experienced the problem where your code is broken in production, but everything runs correctly in your dev environment? This can be really challenging because you have limited information once something is in production, and you can’t easily make changes and try different code. Speedscale production data simulation lets you securely capture the production application traffic, normalize the data, and replay it directly in your dev environment.red

There are a lot of challenges with trying to replicate the production environment in non-prod:

Data - Production has much more data and a much wider variety than non-prod
Third Parties - It’s not always possible to integrate non-prod with third party sandboxes
Scale - The scale of non-prod environment is typically just a fraction of production

By using production data simulation, you can bring the realistic data and scale from production back into the non-prod dev and staging environments. Like any good process implementing Speedscale boils down to 3 simple steps:

Record - utilize the Speedscale sidecar to capture traffic
Analyze - identify the exact set of calls you want to replicate from prod into dev
Replay - utilize the Speedscale operator to run the traffic against your dev cluster

“Works on my machine” -Henry Ford (not a real quote)

Record

In order to capture traffic from your production cluster, you’re going to want to install the operator (helm chart is usually the preferred method). During the installation, don’t forget to configure the Data Loss Prevention (DLP) to identify sensitive fields you want to mask, a good example is the HTTP Authentication header. Configuring DLP is as easy as these settings in your values.yaml file:

# Data Loss Prevention settings.
dlp:
    enabled: true
    config: "standard"

Once you have the operator installed, then annotate the workload you’d like to record, for example if you have an nginx deployment, you can run something like this (or the GitOps equivalent if you prefer):

kubectl annotate deployment nginx sidecar.speedscale.com/inject="true"

Check and make sure your pod got the sidecar added, you should see an additional container.

⚡ Note there are additional configuration options as needed for more complex use cases

Analyze

Now that you have the sidecar, you should see the service show up in Speedscale. At a glance you’re able to see how much traffic your service is handling, and what are the real backend systems it relies upon. For example our service needs data in DynamoDB and real connections to Stripe and Plaid to work. In a corporate dev environment this kind of access may not be properly configured. Fortunately with Speedscale, we will be able to replicate even these third-party APIs into our dev cluster.

Drilling down further into the data you can see all the details of the calls, including the fact that the Authorization data has been redacted. There is a ton of data available, and it’s totally secure.

Set the right time range for your data and add some filters to make sure you include just the traffic that you want to replay. Finally hit the Record button to complete the analysis.

Replay

Just like during the record step, you will want to make sure the Speedscale operator is installed in your dev cluster. You can use the same helm chart install as previous, but remember to give your cluster a new name like dev-cluster or whatever is your favorite name.

The wizard lets you pick and choose which ingress and egress services you want to replay in your dev cluster. This is how you’ll solve the problem for not having the right data in DynamoDB, or how to provide the Stripe and Plaid responses even if you don’t have it configured in the dev cluster.

Finally you can take the traffic you’ve selected and replay it locally in your non-prod dev cluster. Speedscale takes care of normalizing the traffic and modifying the workload so that a full production simulation takes place. The code you have running will behave just the same way it does under production conditions because the same kinds of API traffic and data are being used.

When the traffic replay is complete, you’ll get a nice report to understand how the traffic behaved in your dev cluster, you can even change configurations and easily replay this traffic again.

Conclusion

You now have the ability to replay this traffic in any environment where you need it: development clusters, CI/CD systems, staging or user acceptance environments. This lets you re-create production conditions, run experiments, validate code fixes, and have much higher confidence before pushing these fixes to production. If you are interested in validating this for yourself, feel free to learn more here.

Testing LLMs for Performance with Service Mocking

Ken Ahrens — Tue, 26 Mar 2024 22:15:12 +0000

While incredibly powerful, one of the challenges when building an LLM application (large language model) is dealing with performance implications. However one of the first challenges you'll face when testing LLMs is that there are many evaluation metrics. For simplicity let's take a look at this through a few different test cases for testing LLMs:

Capability Benchmarks - how well can the model answer prompts?
Model Training - what are the costs and time required to train and fine tune models?
Latency and Throughput - how fast will the model respond in production?

A majority of the software engineering blogs you’ll find related to LLM software testing cover capabilities and training. However the reality is that these are edge cases and you'll likely call a 3rd party API to get a response, it's that vendor's job to handle capabilities and training. What you’re left with is figuring out performance testing— how to improve the latency and throughput— which is the focus of the majority of this article.

Capability Benchmarks

Here is an example of a recent benchmark test suite from Anthropic about the comparison of the Claude models compared with generative AI models from OpenAI and Google. These capability benchmarks help you understand how accurate the responses are at tasks like getting a correct answer to a math problem or code generation.

Source: https://www.anthropic.com/news/claude-3-family

The blog is incredibly compelling, however it's all functional testing— there is little performance testing considerations such as expected latency or throughput. The phrase "real-time" is used however specific latency is not measured. The rest of this blog will cover some techniques to get visibility into latency, throughput and various ways to validate how your code will perform against model behavior.

Model Training

If you run searches to learn about LLM, much of the content is related to getting access to GPUs so you can do your machine learning training. Thankfully however there has been so much effort and capital that has been put into machine learning training that most "AI applications" utilize existing models that have already been well trained. Your AI applications might be able to take advantage of an existing model and simply fine tune it on some aspects of your own proprietary data. For the purposes of this blog we will assume your AI systems have already been properly trained and you’re ready to install it in production.

Latency, Throughput and SRE Golden Signals

In order to understand how well your application can scale, you can focus on the SRE golden signals as established in the Google SRE Handbook:

Latency is the response time of your application, usually expressed in milliseconds
Throughput is how many transactions per second or minute your application can handle
Errors is usually measured in a percent of
Saturation is the ability of your application to use the available CPU and Memory

Before you put this LLM into production, you want to get a sense for how your application will perform under load. This starts by getting visibility into the specific endpoints and then driving load throughout the system.

Basic Demo App

For the purposes of this blog, I threw together a quick demo app that uses OpenAI chat completion and image generation models. These have been incorporated into a demo website to add a little character and fun to an otherwise bland admin console.

Chat Completion Data

This welcome message uses some prompt engineering with the OpenAI chat completion API to welcome new users. Because this call happens on the home page, it needs to have low latency performance to enable quick user feedback:

Image Generation

To spice things up a little bit, the app also lets users generate some example images for their profile. This is one of the really powerful capabilities of a large language model but you’ll quickly see these are much more expensive and can take a lot longer to respond. You can’t put this kind of call on the home page for sure.

Here is an example of an image generated by DALL-E 2 of a unicorn climbing a mountain and jumping onto a rainbow. You're welcome.

Validating Application Signals

Now that we have our LLM selected and demo application, we want to start getting an idea of how it scales out with the SRE golden signals. To do this, I turned to a product called Speedscale which allows me to listen to Kubernetes traffic and modify/replay the traffic in dev environments, so. I can simulate different conditions at will. The first step is to install a Speedscale sidecar to capture API interactions running into and out of my user microservice. This lets us start confirming how well this application will scale once it hits a production environment.

Measuring LLM Latency

Now that we have our demo app, we want to start understanding the latency in making calls to OpenAI as part of an interactive web application. Using Speedscale Traffic Viewer, at a glance you can see the response time of the 2 critical inbound service calls:

The Welcome endpoint is responding at 1.5 seconds
The Image endpoint takes nearly 10 seconds to respond

Always compare these response times to your application scenarios. While the image call is fairly slow, it’s not called on the home page so may not be as critical to the overall application performance. The welcome chat however takes over 1 second to respond, so you should ensure the webpage does not wait for this response before loading.

Comparing LLM Latency to Total Latency

By drilling down further into each of the calls, you can find that about 85 - 90% of the time is spent waiting on the LLM to respond. This is by using the standard out of the box model with no additional fine tuning. It's fairly well known that fine tuning your model can improve the quality of the responses but will sacrifice latency and often cost a lot more as well. If you are doing a lot of fine tuning of your models, these validation steps are even more critical.

Validating Responses to Understand Error Rate

The next challenge you may run into is that you want to test your own code and the way it interacts with the external system. By generating a snapshot of traffic, you can replay and compare how the application responds compared with what is expected. It's not a surprise to see that each time the LLM is called, it responds with slightly different data.

While having dynamic responses is incredibly powerful, it's a useful reminder that the LLM is not designed to be deterministic. If your software development uses a continuous integration/continuous deployment pipeline, you want to come up with some way to make the responses consistent based on the inputs. This is one of Service Mocking's best use cases.

Comparing Your Throughput to Rate Limits

After running just 5 virtual users through the application, I was surprised to see the failure rate spike from rate limits. While this load testing is helpful so you don't inadvertently run up your bill, it also has a side effect that you can't learn the performance of your own code.

This is another good reason to implement a service mock so that you can do load testing without making your bill spike off the charts like traditional software testing would experience.

Comparing Rate Limits to Expected Load

You should be able to plan out which API calls are made on which pages and compare against the expected rate limits. You can confirm your account’s rate limits in the OpenAI docs.

Fortunately OpenAI will let you pay more money to increase these limits. However, just running a handful of tests multiple times can quickly run up a bill into thousands of dollars. And remember, this is just non-prod. What you should do instead is create some service mocks and isolate your code from this LLM.

Mocking the LLM Backend

Because the Speedscale sidecar will automatically capture both the inbound and outbound traffic, the outbound data that can be turned into service mocks.

Building a Service Mock

Find the interesting traffic showing both the inbound and outbound calls you’re interested in and simply hit the Save button. Within a few seconds you will have generated a suite of tests and backend mocks without ever writing any scripts.

Replaying a Service Mock

Speedscale has built-in support for service mocking of backend downstream systems. When you are ready to replay the traffic you simply check the box for the traffic you would like to mock. There is no scripting or coding involved, the data and latency characteristics you recorded will be replayed automatically.

Using service mocks lets you decouple your application code from the downstream LLM and helps you understand the throughput that your application can handle. And as an added bonus, you can test the service mock as much as you want without hitting a rate limit and no per-transaction cost.

Confirming Service Mock Calls

You can see all the mocked out calls at a glance on the mock tab of the test report. This is a helpful way to confirm that you’ve isolated your code from external systems which may be adding too much variability to your scenario.

You usually want to have 100% match rate on the mock responses, but in case something is not matching as expected, drill into the specific call to see the reason why. There is a rich transform system that is a good way to customize how traffic is observed and ensure the correct response is returned by the mock.

Running Load

Now that you have your environment running with service mocks, you want to crank up the load to get an understanding of just how much traffic your system can handle.

Test Config

Once the traffic is ready, you can customize how many copies you’ll run and how quickly by customizing your Test Config. It’s easy to ramp up the users or set a target throughput goal.

This is where you should be experimenting with a wide variety of settings. Set it to the number of users you expect to see to make sure you know the number of replicas you should run. Then crank up the load another 2-3x to see if the system can handle the additional stress.

Test Execution

Running the scenario is as easy as combining your workload, your snapshot of traffic and the specific test config. The more experiments you run, the more likely you are to get a deep understanding of your latency profile.

The scenarios should definitely build upon each other. Start with a small run and your basic settings to ensure that the error rate is within bounds. Before you know it you’ll start to see the break points of the application.

Change Application Settings

You’re not only limited to changing your load test configuration, you also should experiment with different memory, cpu, replica or node configurations to try to squeeze out extra performance. Make sure you track each change over time so you can find the ideal configuration for your production environment.

In my case, one simple change was to expand the number of replicas which cut way down on the error rate. The system could handle significantly more users and the error rate was within my goal range.

Sprinkle in some Chaos

Once you have a good understanding of the latency and throughput characteristics you may want to inject some chaos in the responses to see how the application will perform. By making the LLM return errors or stop responding altogether you can sometimes find aspects of the code which may fall down under failure conditions.

While chaos engineering edge cases is pretty fun, it’s important to ensure you check the results without any chaotic responses first to make sure the application scales under ideal conditions.

Reporting

Once you’re running a variety of scenarios through your application, you’ll start to get a good understanding of how things are scaling out. What kind of throughput can your application handle? How do the various endpoints scale out under additional load?

At a glance this view gives a good indication of the golden signals:

Latency overall was 1.3s, however it spiked up to 30s during the middle of the run
Throughput was unable to scale out consistently and even dropped to 0 at one point
Errors were less than 1% which is really good, just a few of the calls timed out
Saturation of Memory and CPU was good, the app did not become constrained

Percentiles

You can dig in even further by looking at the response time percentiles by endpoint to see what the typical user experience was like. For example if you look at the image endpoint, P95 of 8 seconds means that 95% of the users had a response time of 8 seconds or less which really isn’t that great. Even though the average was 6.5 seconds, there were plenty of users that experienced timeouts, so there are still some kinks that need to be worked out of this application related to images.

For even deeper visibility into the response time characteristics you can incorporate an APM (Application Performance Management) solution to understand how to improve the code. However in our case we already know most of the time is spent waiting for the LLM to respond with its clever answer.

Summary

While large language models can bring an enormous boost to your application functionality, you need to ensure that your service doesn’t fall down under the additional load. It’s important to run latency performance profiling in addition to looking at the model capabilities. It's also important to consider avoiding breaking the bank on LLMs in your continuous integration/continuous deployment pipeline. While it can be really interesting to run a model that is incredibly smart with answers, you may want to consider the tradeoff of using a simpler model that can respond to your users more quickly so they stay on your app without closing their browser window. If you'd like to learn more, you can check out a video of this blog in more detail here. If you want to dig into the world of LLM and how to understand performance, feel free to join the Speedscale Community and reach out, we’d love to hear from you.

APIs for Beginners

Ken Ahrens — Thu, 06 Jan 2022 13:28:25 +0000

Are you looking to benefit from automation but lack the experience to leverage an API? To equip you with the tools you need to start utilizing APIs and automation, we’ve put together these helpful Beginner FAQs covering common terminology, methods, and tools for testing APIs.

What is an API?

API stands for Application Programming Interface. An API is a set of programming code that enables data transmission between one software product and another.

How does an API Work?

APIs sit between an application and the web server, acting as an intermediary layer that processes data transfer between systems. Here’s how an API works:

A client application initiates an API call to retrieve information—also known as a request. This request is processed from an application to the web server via the API’s Uniform Resource Identifier (URI) and includes a request verb, headers, and sometimes, a request body.
After receiving a valid request, the API makes a call to the external program or web server.
The server sends a response to the API with the requested information.
The API transfers the data to the initial requesting application.

What is API Testing?

While there are many aspects of API testing, it generally consists of making requests to a single or sometimes multiple API endpoints and validating the response. The purpose of API testing is to determine if the API meets expectations for functionality, performance, and security.

What is the most popular kind of API?

The most used API is a RESTful API (Representational State Transfer API). RESTful APIs allow for interoperability between different types of applications and devices on the internet.

What is REST?

Representational State Transfer (REST) is a software architectural style that developers apply to web APIs. REST relies on HTTP to transfer information using requests, called ‘URLs’, to return specified data, called ‘resources’, to the client. Resources can take many forms (images, text, data). At a basic level, REST is a call and response model for APIs.

What is a REST API?

A REST API conforms to the design principles of the REST, or representational state transfer architectural style. Restful APIs are extremely simple when it comes to building and scaling as compared to other types of APIs. When these types of APIs are put into action, they help facilitate client-server communications with ease. Because RESTful APIs are simple, they can be the perfect APIs for beginners.

What is REST API Testing?

REST API Testing is a web automation testing technique for testing REST-based APIs for web applications without using the user interface. The purpose of REST API testing is to record the response of REST API by sending various HTTP requests to check if REST API is working correctly. You can test a REST API with GET, POST, PUT, PATCH and DELETE methods.

What is the most Popular Response Data Format?

JSON is the most popular response data format amongst developers. JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write and it’s simple for machines to parse and generate. Plus, JSON is a is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. JSON is widely used due to its lighter payloads, greater readability, reduced machine overhead for Serialization/Deserialization and easier consumption by JavaScript. These properties make JSON an ideal data-interchange language.

How Can I Improve My API Testing & Performance?

Speedscale helps operation teams prevent costly incidents by validating how new code will perform under production-like workload conditions. Site Reliability Engineers use Speedscale to measure the golden signals of latency, throughput and errors before the code is released. Speedscale Traffic Replay is an alternative to legacy API testing approaches which take days or weeks to run and do not scale well for modern architectures.

Now that you know some of the basics of APIs and API testing methods, you’re one step closer to being able to leverage the full power of API automation. Learn how Speedscale’s solutions can help improve your API testing & performance.