Every team I've worked on has had the same conversation at some point. Someone opens the coverage report, sees a sea of red, and asks: "How do we get this up?" The answer is always some version of "we need to write more tests," followed by a long silence, because everyone knows what that actually means — hours of boilerplate, test file setup, mock wiring, and fixture scaffolding before you've written a single meaningful assertion.
That's the problem TestSmith was built to solve.
The Real Bottleneck Isn't Willingness
Developers generally want to write tests. The resistance isn't laziness — it's the setup cost. For every new module you want to test, you have to:
1. Create the test file in the right location with the right naming convention
2. Import the module under test
3. Import the test framework and any mock libraries
4. Set up fixtures for external dependencies
5. Write the boilerplate class or function structure that the framework expects
6. Then, finally, write the actual test logic
For a well-understood module with clear inputs and outputs, steps 1 through 5 can easily take longer than step 6. You're doing janitorial work before you can do the meaningful work. And if you're adding coverage to a large existing codebase — the kind of coverage catch-up project every team eventually faces — you're doing that setup dozens or hundreds of times.
Why Python First
We wrote the first version of TestSmith in Python for the most straightforward of reasons: our immediate problem was a Python codebase.
But Python also happened to be a good fit for the tool itself. Python's ast module is excellent: ast.parse() gives you a full abstract syntax tree in a single call, and walking it to extract class names, function signatures, and import statements is straightforward. For a tool that needs to understand source code structure without actually running it, static AST analysis is exactly right, and Python's standard library makes it easy.
import ast

# source_code holds the text of the module being analysed
tree = ast.parse(source_code)
public_functions = []
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        if not node.name.startswith('_'):  # skip private
            public_functions.append(node.name)
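To make the "class names, function signatures, and import statements" part concrete, here is a rough sketch that goes one step beyond the snippet above. It is not TestSmith's actual code: `summarize_module` and `SAMPLE_SOURCE` are names made up for illustration, and unlike the `ast.walk` loop it only looks at top-level statements so that methods are not counted as module-level functions. It also ignores `async def` for brevity.

```python
import ast

# Hypothetical sample input, loosely modelled on a small service module.
SAMPLE_SOURCE = '''
import os
from decimal import Decimal

class PaymentService:
    def __init__(self, stripe_client, db):
        self.stripe = stripe_client
        self.db = db

    def process_payment(self, order_id: str, amount: int) -> dict:
        ...

    def _audit(self):
        ...

def format_amount(amount):
    ...
'''


def summarize_module(source: str) -> dict:
    """Collect imports, public top-level functions, and public methods per class."""
    tree = ast.parse(source)
    summary = {"imports": [], "functions": [], "classes": {}}
    # Only top-level statements are inspected, so methods stay with their class.
    for node in tree.body:
        if isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; record "." in that case.
            summary["imports"].append(node.module or ".")
        elif isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            info = {"init_params": [], "methods": []}
            for item in node.body:
                if not isinstance(item, ast.FunctionDef):
                    continue
                if item.name == "__init__":
                    # Drop the leading `self` argument.
                    info["init_params"] = [a.arg for a in item.args.args[1:]]
                elif not item.name.startswith("_"):
                    info["methods"].append(item.name)
            summary["classes"][node.name] = info
    return summary


summary = summarize_module(SAMPLE_SOURCE)
print(summary)
```

The constructor parameters are the interesting part: they are the natural candidates for mock fixtures later on.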
The other reason was speed of iteration. We were solving our own problem — we needed the tool to work on Python projects, and we were Python developers. Building it in Python meant we could use it on itself from day one, which is a useful forcing function for catching rough edges.
What the Tool Actually Does
The core idea is simple: given a source file, generate the test scaffold that you'd write by hand.
For a Python service like this:
# src/services/payment.py
class PaymentService:
    def __init__(self, stripe_client, db):
        self.stripe = stripe_client
        self.db = db

    def process_payment(self, order_id: str, amount: int) -> dict:
        ...

    def refund(self, payment_id: str) -> bool:
        ...
TestSmith generates:
# tests/services/test_payment.py
import pytest
from unittest.mock import MagicMock, patch
from src.services.payment import PaymentService


@pytest.fixture
def stripe_client():
    return MagicMock()


@pytest.fixture
def db():
    return MagicMock()


@pytest.fixture
def payment_service(stripe_client, db):
    return PaymentService(stripe_client=stripe_client, db=db)


class TestPaymentService:
    def test_process_payment(self, payment_service):
        # TODO: implement
        pass

    def test_refund(self, payment_service):
        # TODO: implement
        pass
It's not a complete test. It's the scaffold — the file is in the right place, the imports are correct, the fixtures for the constructor dependencies are wired up, and the test methods exist. The developer fills in the assertion logic. The janitorial work is already done.
The tool also handles the details that are easy to get wrong: where test files should live relative to source files (which varies by framework and project convention), how to name fixtures after constructor parameters, which mock library to use, and whether to generate a test class (for class-based source) or plain test functions (for function-based source).
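The file placement rule is the easiest of these to show. The sketch below assumes the mirrored `src/` to `tests/` layout from the example above; `scaffold_path_for` and its `source_root` / `tests_root` parameters are hypothetical names, and other conventions (tests next to the source, a `_test.py` suffix, and so on) would need their own rule.

```python
from pathlib import PurePosixPath


def scaffold_path_for(source_path: str, source_root: str = "src", tests_root: str = "tests") -> str:
    """Map a source file to its test file under a mirrored tests/ tree."""
    # Path of the source file relative to the source root, e.g. services/payment.py
    relative = PurePosixPath(source_path).relative_to(source_root)
    # Same sub-directory under tests/, with a test_ prefix on the file name.
    return str(PurePosixPath(tests_root) / relative.parent / f"test_{relative.name}")


print(scaffold_path_for("src/services/payment.py"))
print(scaffold_path_for("src/app.py"))
```

Passing a path that is not under `source_root` raises `ValueError` from `relative_to`, which is arguably the right behaviour for a generator that should not guess.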
The Gap Analysis Problem
Coverage reports tell you what's untested, but they don't prioritise it. A file with three simple utility functions and a file with a complex payment processing pipeline both show up as "uncovered." Knowing which one to tackle first requires reading the code.
TestSmith added a coverage gap command that went a step further: it computed a coupling score for each untested module based on how many other modules imported it. A module imported by ten others is higher priority than one imported by none — because a bug in the heavily-imported module has a wider blast radius.
$ testsmith gaps
Coverage gaps (by coupling score):
  src/services/payment.py   coupling: 8   functions: 5   ← fix this first
  src/utils/currency.py     coupling: 6   functions: 3
  src/models/order.py       coupling: 4   functions: 7
  src/scripts/backfill.py   coupling: 0   functions: 12
This gave teams a principled answer to "where do we start?" rather than requiring someone to manually audit the codebase.
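The coupling score itself is just an import count. Here is a small sketch of the idea, assuming the score is "number of other modules that import this one" and that imports are absolute; the module names and sources in `SOURCES` are invented for the example, and the real command also has to know which modules are untested, which this sketch skips.

```python
import ast


def imported_modules(source: str) -> set:
    """Return the dotted module names a source file imports (absolute imports only)."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
    return found


def coupling_scores(sources: dict) -> dict:
    """Count, for each module in `sources`, how many other modules import it."""
    scores = {name: 0 for name in sources}
    for importer, source in sources.items():
        # A set means each importer counts once, however many names it pulls in.
        for target in imported_modules(source):
            if target in sources and target != importer:
                scores[target] += 1
    return scores


# Hypothetical project: module name -> source text.
SOURCES = {
    "src.services.payment": "from src.utils.currency import to_cents\n",
    "src.models.order": (
        "import src.utils.currency\n"
        "from src.services.payment import PaymentService\n"
    ),
    "src.utils.currency": "import decimal\n",
    "src.scripts.backfill": (
        "from src.models.order import Order\n"
        "from src.utils.currency import to_cents\n"
    ),
}

scores = coupling_scores(SOURCES)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for module, score in ranked:
    print(f"{module}  coupling: {score}")
```

A one-off script like the backfill job scores zero because nothing depends on it, which matches the intuition that it can wait.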
What v1 Didn't Do Well
The tool worked. Teams used it and got value from it. But two things became clear over time.
Distribution was painful. pip install testsmith sounds simple, but in practice it meant managing Python versions, virtual environments, and dependency conflicts — especially in CI. A testing tool that requires its own setup to work in CI is fighting against itself.
One language wasn't enough. Once word got around that the tool existed, the first question from every team was "does it work for TypeScript?" or "can it do Java?" The Python-only design wasn't a deliberate choice — it was an artifact of solving our own immediate problem. But the architecture didn't make adding languages easy. Every language-specific piece of logic was woven through the core code rather than isolated.
Those two problems drove the v2 rewrite in Go: a single static binary that drops into any environment, and a plugin architecture where each language is an isolated driver.
But that's the next post.
TestSmith is open source at github.com/orieken/testsmith. The v1 Python package is archived at archive/v1/ for reference.