DEV Community

Pavel Myslik

Fixing Production Data in Rails: Lessons from a 6,000-Row Backfill

Fixing a bug in production is usually straightforward. But what happens when the data in the database itself is broken? That’s when things get tricky.

I learned this when a routine deploy quietly broke confirmed_at on our Order records. By the time anyone noticed, around 6,000 rows were affected — and dashboards, emails, and downstream services all depended on that field.

The two-line code fix shipped in minutes, but the data backfill required a lot more thought.


The First Naive Attempt

Once we understood the scope of the problem, the first instinct was obvious: fix it fast. Most Rails developers will reach for one of two things here — a quick one-liner in the console, or a small migration that updates the data directly.

Something like this:

Order.where(confirmed_at: nil, status: "confirmed")
     .update_all(confirmed_at: Time.current)

Note: In reality, determining the correct value for confirmed_at was more complicated — we had to derive it from related records and business logic. For the purposes of this article, Time.current keeps the examples simple and focused on the backfill pattern itself.

It looks clean. One line, no fuss. After a two-minute code fix, it feels like the natural next step.

And this is exactly where production backfills can go wrong.


Why This Is Dangerous in Production

At first glance, the one-liner looks harmless — simple, fast, and it works perfectly in development. The problem starts when you run it on real production data.

There are a few risks that are easy to overlook:

  • No visibility into progress. Once the query starts, you have no idea how many records have been updated or how many are left.
  • No safe way to restart. If the process stops halfway — timeout, a deploy, a dropped connection — you don't know what state the data is in.
  • No way to preview the change. There's no dry run. Running it once already changes the data.
  • Hard to test properly. You can't write a meaningful test for a console one-liner or an inline migration. If the logic is wrong, you find out in production.
  • Large transactions put pressure on the database. Updating thousands of rows in a single query can lock tables and slow down the app for real users.

The problem isn't the syntax. It's that once it starts, you've handed over control — and when other systems depend on this data being correct, that's a risk you don't want to take.
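Even the "quick" version should at least batch its writes. This sketch uses `in_batches` to split the update into chunks of 1,000 rows so no single transaction locks thousands of rows at once, but note that it still has none of the visibility, dry-run, or restart safety discussed in the rest of this article:

```ruby
# Sketch only: smaller transactions, but still no dry run, no progress
# log, and no way to test. Each batch commits independently, so the
# database never holds a lock on thousands of rows at once.
Order.where(confirmed_at: nil, status: "confirmed")
     .in_batches(of: 1_000) do |batch|
  batch.update_all(confirmed_at: Time.current)
end
```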


A Safer Approach

Instead of trying to fix everything in one query, we decided to treat the backfill like real code.

Not a console one-liner.
Not an inline migration.
A small Ruby class that lives in the repository and can be tested like anything else.

Something like this:

lib/backfills/backfill_confirmed_at.rb

The idea is simple:

  • the script only triggers the backfill
  • all the logic lives in a dedicated class
  • the class can be tested safely before anything runs in production

This keeps the backfill predictable, restartable, and — most importantly — verifiable before it touches a single row.


The Backfill Class

# lib/backfills/backfill_confirmed_at.rb

module Backfills
  class BackfillConfirmedAt
    attr_reader :dry_run, :logger

    def initialize(dry_run: false, logger: Rails.logger)
      @dry_run = dry_run
      @logger  = logger
    end

    def run
      log "Starting | mode: #{dry_run ? 'DRY RUN' : 'LIVE'}"

      processed = 0

      orders.find_each do |order|
        if dry_run
          log "[DRY RUN] Would update Order ##{order.id}"
        else
          order.update_columns(confirmed_at: confirmed_at)
          log "Updated Order ##{order.id}"
        end

        processed += 1

      rescue StandardError => e
        log "Error processing Order #{order.id}: #{e.message}"
      end

      log "Done. Processed: #{processed}"
    end

    private

    def log(message) = logger.info "[Backfills::BackfillConfirmedAt] #{message}"

    def orders = Order.where(confirmed_at: nil, status: "confirmed").order(:id)

    # In reality, this is where the more complex logic for deriving
    # the date from related records and business logic would live
    def confirmed_at = Time.current
  end
end

Testing the Backfill Class

One of the biggest advantages of extracting the logic into a class is that you can test it like any other Ruby code — before running anything in production.

Here are a few tests that cover the most important cases:

# spec/lib/backfills/backfill_confirmed_at_spec.rb

RSpec.describe Backfills::BackfillConfirmedAt do
  let(:null_logger) { Logger.new(nil) }

  describe "#run" do
    let!(:order) { create(:order, status: "confirmed", confirmed_at: nil) }

    context "when dry_run is enabled" do
      it "does not update any records" do
        described_class.new(dry_run: true, logger: null_logger).run

        expect(order.reload.confirmed_at).to be_nil
      end
    end

    context "when order is confirmed with missing confirmed_at" do
      it "updates confirmed_at" do
        described_class.new(logger: null_logger).run

        expect(order.reload.confirmed_at).not_to be_nil
      end
    end

    context "when order already has confirmed_at" do
      let(:timestamp) { 2.days.ago }

      before { order.update(confirmed_at: timestamp) }

      it "does not overwrite it" do
        described_class.new(logger: null_logger).run

        expect(order.reload.confirmed_at).to be_within(1.second).of(timestamp)
      end
    end

    context "when order has a different status" do
      before { order.update(status: "pending") }

      it "does not touch it" do
        described_class.new(logger: null_logger).run

        expect(order.reload.confirmed_at).to be_nil
      end
    end
  end
end

Notice that we pass a null_logger to keep the test output clean — no need to see backfill logs while running the test suite.

These tests won't catch every edge case, but they give you enough confidence to run the backfill knowing the core logic has been verified.


Running the Backfill

The script becomes very simple — it only decides whether we run a dry run or the real update:

# script/backfill_confirmed_at.rb

dry_run = ARGV.include?("--dry-run")

Backfills::BackfillConfirmedAt.new(dry_run: dry_run).run

Always Start With a Dry Run

Before touching any production data, always preview what would change first:

rails runner script/backfill_confirmed_at.rb --dry-run

The output will look something like this:

[Backfills::BackfillConfirmedAt] Starting | mode: DRY RUN
[Backfills::BackfillConfirmedAt] [DRY RUN] Would update Order #10042
[Backfills::BackfillConfirmedAt] [DRY RUN] Would update Order #10051
[Backfills::BackfillConfirmedAt] [DRY RUN] Would update Order #10063
...
[Backfills::BackfillConfirmedAt] Done. Processed: 6000

This gives you a chance to verify a few things before anything is written:

  • Is the scope correct? Spot-check a few IDs directly in the database.
  • Does the number of affected records match your expectations? If you expected 6,000 rows but the dry run shows 10,000, something is wrong.
  • Are there any surprising records? Maybe some orders have a status you didn't account for.
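For the spot-check itself, a read-only console query is enough. The IDs below are just the ones from the sample output above:

```ruby
# Read-only spot-check in the Rails console; IDs taken from the dry-run log.
# Every row should show status "confirmed" and a nil confirmed_at.
Order.where(id: [10042, 10051, 10063])
     .pluck(:id, :status, :confirmed_at)
```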

A dry run costs you a few minutes.
A botched live backfill can cost you hours.

Then Run It For Real

When everything looks correct:

rails runner script/backfill_confirmed_at.rb

If you want even more control, consider extending the backfill class with a limit: option to test on a small subset first, or a configurable batch_size: for larger datasets.
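A sketch of what those options could look like, assuming the class from earlier. The `limit:` and `batch_size:` keywords are hypothetical additions, not part of the original class, and `find_each` only honors a relation's `limit` on Rails 6.1 and later:

```ruby
module Backfills
  class BackfillConfirmedAt
    attr_reader :dry_run, :limit, :batch_size, :logger

    def initialize(dry_run: false, limit: nil, batch_size: 1_000, logger: nil)
      @dry_run    = dry_run
      @limit      = limit      # cap the number of rows for a trial run
      @batch_size = batch_size # rows loaded per query by find_each
      @logger     = logger
    end

    def run
      scope = orders
      scope = scope.limit(limit) if limit # respected by find_each on Rails 6.1+

      scope.find_each(batch_size: batch_size) do |order|
        # ... same per-record logic as before ...
      end
    end

    private

    def orders = Order.where(confirmed_at: nil, status: "confirmed").order(:id)
  end
end
```

Running with `limit: 100` first lets you verify the live behavior on a small slice before committing to all 6,000 rows.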


How to Safely Stop the Running Backfill

One of the hidden benefits of this approach is that you can stop the script at any time — without worrying about leaving the data in a broken state.

Because find_each processes and commits records one by one, each update is independent. If you stop the script after 2,000 records, those 2,000 rows are correctly updated and stay that way. The remaining records are simply untouched.

To stop the script, a simple Ctrl+C is enough. When you're ready to continue, just run it again. The scope automatically skips already-updated records and picks up where it left off.

This is the key difference from update_all or an inline migration. A large single query either completes fully or rolls back entirely. If something goes wrong halfway through, you're back to square one.

With batch processing, interrupting is not a failure. It's just a pause.
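If you want the stop to be even cleaner than a raw Ctrl+C, you can trap the signal and let the record in flight finish first. This is a self-contained sketch of the pattern; a plain array stands in for the ActiveRecord scope, and `GracefulBackfill` is a hypothetical name, but the real class could wrap its `find_each` loop the same way:

```ruby
require "logger"

class GracefulBackfill
  def initialize(records, logger: Logger.new($stdout))
    @records = records
    @logger  = logger
    @stop    = false
  end

  def run
    # Trap INT so Ctrl+C sets a flag instead of raising mid-update;
    # the record currently being processed always finishes cleanly.
    Signal.trap("INT") { @stop = true }

    processed = 0
    @records.each do |record|
      break if @stop
      # ... per-record update would go here ...
      processed += 1
    end

    if @stop
      @logger.info "Interrupted after #{processed}. Re-run to continue."
    else
      @logger.info "Done. Processed: #{processed}"
    end
    processed
  end
end
```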


The Workflow We Follow Now

Not every data fix needs this treatment. If you're updating a handful of records, a quick one-liner in the console is perfectly fine.

But once you're dealing with thousands of rows — especially when other systems depend on that data — it's worth slowing down and following a simple process.

1. Understand the scope first

Before writing a single line of code, run a COUNT query on production. Know exactly how many records are affected and confirm what the correct value should be.
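In practice this is a couple of read-only queries in a production console, using the same scope as the examples above:

```ruby
# How many rows will the backfill touch?
Order.where(confirmed_at: nil, status: "confirmed").count

# Are there statuses we didn't account for?
Order.where(confirmed_at: nil).group(:status).count
```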

2. Write the backfill class

Extract all logic into a dedicated class in lib/backfills/. Add a dry_run option and logging from the start.

3. Test it

Write at least a few basic tests before running anything. This is the step that's easiest to skip under pressure, and the one you'll be most grateful for.

4. Dry run on production

Run with --dry-run first and review the output carefully. Verify the record count matches your expectations and spot-check a few IDs directly in the database.

5. Run it for real

Only when the dry run looks correct. Monitor the logs as it runs — that's what they're there for.


Final Thoughts

The irony of this whole situation was that the code fix took two minutes. The data fix took almost a day — not because it was technically hard, but because we wanted to do it right.

That ratio is worth remembering. A careless backfill can easily cause more damage than the original bug. Treating it like real code — with tests, dry runs, and logging — is not over-engineering. It's just respect for the data your users depend on.

Next time you're staring at thousands of broken records, resist the one-liner. Take the extra hour. Your future self will thank you.


What does your backfill process look like? I'm curious whether others reach for a similar pattern — or something completely different. Let me know in the comments.
