EvvyTools

Posted on May 28

How to A/B Test Email Subject Lines Without Wasting Sends

#tools #writing #productivity

A/B testing email subject lines is one of the most accessible experiments in email marketing, but most teams run it in ways that produce noise rather than signal. Testing the wrong variables, using undersized splits, or failing to track results means the test costs send volume and teaches you nothing.

This guide covers how to structure subject line tests that produce clean results and accumulate into actual knowledge about your list.

Step 1: Define One Variable Per Test

The most common A/B testing mistake: changing multiple things at once. If variant A is "How to cut your invoice payment time in half" and variant B is "3 quick fixes for faster invoice payments," you've changed framing, length, number inclusion, and tone simultaneously. Whichever wins tells you only which combination worked -- not why.

Isolate one variable per test.

Good test pair examples:

Test	Variable	Version A	Version B
1	Framing	"How to..."	"Why..."
2	Length	35-char version	60-char version
3	Personalization	With first name	Without first name
4	Question vs. statement	"Is your subject line working?"	"Why most subject lines fail"
5	Urgency	With deadline reference	Without deadline

Run one test at a time, or at most run two simultaneously on non-overlapping segments. After five to ten tests, you'll have actual findings specific to your list.

Step 2: Determine Minimum Sample Size

Testing a 50/50 split on a list of 200 gives you 100 people per variant. That's not enough to draw conclusions -- a 5-point open rate difference is within normal statistical variance at that sample size.

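A common rule of thumb is at least 1,000 recipients per variant for open rate tests to be statistically useful. At that size, with a baseline open rate near 30%, a gap of roughly 4 to 5 points clears 95% confidence; a 2-point gap is still within noise and takes several thousand recipients per variant to confirm. At 100 recipients per variant, even a 5-point gap is noise.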
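To see how sample size changes the verdict, here is a minimal sketch of a two-proportion z-test using only the Python standard library. The function name and the open counts are illustrative assumptions, not output from any email platform.

```python
import math

def two_proportion_z_test(opens_a, n_a, opens_b, n_b):
    """Return (z, two-sided p-value) for the gap between two open rates."""
    p_a = opens_a / n_a
    p_b = opens_b / n_b
    # Pooled open rate under the assumption that both variants perform the same.
    pooled = (opens_a + opens_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("open rate is 0% or 100% in both variants; nothing to test")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# The same 5-point gap (28% vs. 33%) at two different sample sizes.
z_small, p_small = two_proportion_z_test(28, 100, 33, 100)
z_large, p_large = two_proportion_z_test(280, 1000, 330, 1000)
print(f"100 per variant:   z={z_small:.2f}, p={p_small:.3f}")
print(f"1,000 per variant: z={z_large:.2f}, p={p_large:.3f}")
```

A p-value below 0.05 corresponds to the usual 95% confidence threshold. The identical gap fails that threshold at 100 recipients per variant and passes it at 1,000.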

If your list is under 2,000 total:

Run tests over multiple sends using the same framing hypothesis. If "how-to" framing consistently outperforms "why" framing across five separate sends, that's cumulative evidence even without a single clean experiment.
Focus on the largest observable differences, not marginal ones.
Use Mailchimp's A/B testing documentation or your platform's sample-size calculator before committing to a test structure.
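If no calculator is handy, the standard two-proportion sample-size formula gives a rough estimate. The sketch below assumes a two-sided test at 95% confidence and 80% power; the function name, the 30% baseline, and the lift values are illustrative assumptions, and a platform's own calculator may use slightly different math.

```python
import math
from statistics import NormalDist

def recipients_per_variant(baseline_rate, lift, alpha=0.05, power=0.80):
    """Approximate recipients needed per variant to detect an absolute lift."""
    p1 = baseline_rate
    p2 = baseline_rate + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / lift ** 2)

# Assumed baseline open rate of 30%; lifts are absolute (0.02 = 2 points).
for lift in (0.02, 0.05, 0.10):
    print(f"{lift:.0%} lift: {recipients_per_variant(0.30, lift)} per variant")
```

Required sample size grows with the inverse square of the lift, which is why small lists should chase the largest observable differences rather than marginal ones.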

Step 3: Set Up the Split

Most email platforms with built-in A/B testing let you configure:

Split size: what percentage of the list sees each variant
Winner selection: by open rate, click rate, or manual selection
Wait window: how long before declaring a winner and sending to the remainder

Recommended settings for clean tests:

Split: 40% / 40% (20% held back for winner send)
Winner metric: open rate (for subject line tests)
Wait window: 4 hours minimum (24 hours for low-volume lists)

If you're using a platform without native A/B testing, segment your list manually and track results in a spreadsheet. The manual approach is less convenient but the data is identical.
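For the manual route, one way to segment is to hash each address into a bucket, which keeps assignments stable if the list is re-exported. This is a sketch under assumptions: the function name, the test label, and the sample addresses are hypothetical, and the 40/40/20 proportions simply mirror the recommended settings above.

```python
import hashlib

def assign_group(email, test_name, split=(0.40, 0.40)):
    """Deterministically assign a recipient to 'A', 'B', or 'holdout'."""
    # Including the test name means each test reshuffles the list independently.
    key = f"{test_name}:{email.strip().lower()}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 16 ** 8
    if bucket < split[0]:
        return "A"
    if bucket < split[0] + split[1]:
        return "B"
    return "holdout"

# Hypothetical recipients; in practice these come from your list export.
recipients = [f"user{i}@example.com" for i in range(10)]
for address in recipients:
    print(address, assign_group(address, "framing-test-01"))
```

Because the assignment depends only on the address and the test name, the same person always lands in the same group for a given test, and no separate assignment file has to be stored.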

Platforms like HubSpot, ActiveCampaign, and Klaviyo all have built-in A/B testing with configurable wait windows.

Step 4: Pre-Screen Both Variants Before Sending

Before the test goes out, both subject lines should pass a basic quality check:

No spam trigger words
Both front-loaded with the important content in the first 40 characters
Preview text written intentionally (not defaulted to body copy)
Approximately correct length for the list type

If one variant fails a basic quality check, the test is unfair -- you're testing a good subject line against a bad one, which wastes the test slot. Both variants should be reasonable candidates for the send.
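A rough local version of this checklist can be scripted. Everything in the sketch below is an assumption for illustration: the spam-term list is a tiny placeholder rather than a real filter vocabulary, the 60-character limit is borrowed from the length test example earlier, and whether the first 40 characters carry the important content still needs a human read.

```python
# Placeholder terms for illustration only; real spam filters are far more complex.
SPAM_TERMS = {"free money", "act now", "guaranteed", "100% free"}

def prescreen(subject, preview_text, max_length=60):
    """Return a list of issues found in one subject line variant."""
    issues = []
    lowered = subject.lower()
    for term in sorted(SPAM_TERMS):
        if term in lowered:
            issues.append(f"contains possible spam term: {term!r}")
    if len(subject) > max_length:
        issues.append(f"subject is {len(subject)} characters (limit {max_length})")
    if not preview_text.strip():
        issues.append("preview text is empty")
    return issues

variants = {
    "A": ("How to cut your invoice payment time in half", "Three changes that speed up payment"),
    "B": ("Act now: guaranteed savings", ""),
}
for name, (subject, preview) in variants.items():
    print(name, "| first 40 chars:", repr(subject[:40]), "| issues:", prescreen(subject, preview))
```

If either variant returns issues, fix it before the send so the test compares two reasonable candidates.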

The Email Subject Line Tester on EvvyTools lets you run both variants through spam detection and length scoring before the test launches. This takes five minutes and prevents the most common pre-send failure modes.

Step 5: Record and Accumulate Results

A single test result is an observation. Ten test results on the same hypothesis are a finding. Build a simple tracking log:

Date | List | Subject A | Subject B | Open A | Open B | Winner | Variable tested
2026-05-01 | Newsletter | "How to cut..." | "3 quick fixes..." | 28% | 31% | B | framing
2026-05-08 | Newsletter | "Why freelancers..." | "The mistake..." | 26% | 29% | B | negative framing

Review this log every quarter. Patterns that appear consistently -- "our list always opens negative framing over positive" or "our list doesn't respond to numbered lists" -- are audience-specific findings that no general best-practices guide can tell you.
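The quarterly review can be partly automated. The sketch below parses a log in the pipe-delimited layout shown above and groups open rate lifts by variable. The function names are hypothetical, and the parser assumes no subject line contains a `|` character.

```python
from collections import defaultdict

LOG = '''Date | List | Subject A | Subject B | Open A | Open B | Winner | Variable tested
2026-05-01 | Newsletter | "How to cut..." | "3 quick fixes..." | 28% | 31% | B | framing
2026-05-08 | Newsletter | "Why freelancers..." | "The mistake..." | 26% | 29% | B | negative framing
'''

def parse_log(text):
    """Turn the pipe-delimited log into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    return [dict(zip(header, (cell.strip() for cell in line.split("|"))))
            for line in lines[1:]]

def summarize_by_variable(rows):
    """Group tests by variable and collect the B-minus-A lift in points."""
    summary = defaultdict(lambda: {"tests": 0, "lifts": []})
    for row in rows:
        open_a = float(row["Open A"].rstrip("%"))
        open_b = float(row["Open B"].rstrip("%"))
        entry = summary[row["Variable tested"]]
        entry["tests"] += 1
        entry["lifts"].append(open_b - open_a)
    return dict(summary)

for variable, stats in summarize_by_variable(parse_log(LOG)).items():
    print(variable, stats)
```

Lifts are only comparable across tests if Version B always represents the same side of the hypothesis for a given variable, so it helps to settle that convention when the log is first set up.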

Step 6: Act on the Findings

A finding is only useful if it changes behavior. After 20+ tests, you should have a short list of validated preferences for your specific list:

Which framing style outperforms (how-to vs. why vs. question)
Whether length matters significantly for this audience
Whether personalization moves the needle
Which urgency signals, if any, produce real lifts

Use this list to set defaults for new sends and to decide which variable to favor when you're uncertain, not to write identical subject lines every time.

Common Testing Mistakes to Avoid

Testing too early: Running a test on a list of 300 and drawing conclusions is worse than no test, because it creates false confidence in a false finding.

Retesting a settled variable: Once repeated tests have confirmed that your list doesn't respond to numbered lists, stop testing numbered lists. Move to untested variables.

Declaring winners based on opens alone: A subject line that gets opens but low clicks might be curiosity-gap bait -- it earns the open but the content doesn't deliver. Track click rate as a secondary metric.

Changing send conditions between variants: If variant A goes out Tuesday morning and variant B goes out Friday afternoon, day of week and time of day are now confounding variables. Send both within the same hour.

How Platform Choice Affects What You Can Test

Not all email platforms support the same A/B testing configurations. Before committing to a testing approach, confirm what your platform actually supports:

Native split testing with automatic winner selection (Mailchimp, Klaviyo, HubSpot, ActiveCampaign) lets you define the split size, wait window, and winner metric before the send. The platform handles distribution and winner selection automatically. This is the cleanest setup for systematic testing.

Manual split with external tracking (spreadsheet, UTM parameters, or list segmentation) works on any platform but requires more setup. Segment the list yourself, send to each segment, record opens manually or via campaign reports. The data is equivalent; the process is less automated.

No split testing support on some basic email tools means you're limited to sequential testing: send variant A to the full list in one send, variant B in the next. Time gaps, list churn, and changing send conditions make this the weakest testing approach, but it's better than nothing if the platform doesn't support splits.

For platforms without native A/B, Brevo offers a free tier and may be worth evaluating as a dedicated testing environment even if you're sending through a different platform for production sends. Confirm that the plan you choose includes subject line splits, since A/B testing availability varies by tier.

The testing rigor you can achieve depends on the platform you're using. Know the constraints before designing experiments around them.

Summary

Clean A/B testing of email subject lines requires:

Isolating one variable per test
Using a large enough sample to produce valid data
Pre-screening both variants before sending
Recording results systematically
Reviewing findings over time to identify list-specific preferences

For the deeper context on what makes subject lines work before you get to testing them, the guide "How to Write Email Subject Lines That Actually Get Opened" covers the five core variables that drive open rates and the patterns worth testing across different list types.

DEV Community